Case Study: Health Check ¶
Scenario:
- Create Deployment and Service
- Simulate an error (delete index.html)
- Pod becomes unhealthy and is removed from the endpoint list
- Fix the error (revert the index.html)
- Pod returns to normal and rejoins the endpoint list
Create Deployment and Service ¶
Create Deployment nginx-healthcheck and Service nginx-healthcheck.
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-healthcheck
spec:
  replicas: 2
  selector:
    matchLabels:
      name: nginx-healthcheck
  template:
    metadata:
      labels:
        name: nginx-healthcheck
    spec:
      containers:
      - name: nginx-healthcheck
        image: nginx:latest
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 80
        livenessProbe:
          initialDelaySeconds: 5
          periodSeconds: 5
          tcpSocket:
            port: 80
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /
            port: 80
            scheme: HTTP
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-healthcheck
spec:
  ports:
  - port: 80
    targetPort: 80
    protocol: TCP
  type: NodePort
  selector:
    name: nginx-healthcheck
EOF
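Once applied, you can optionally confirm how the two probes were rendered into the Deployment spec. This is just a read-back of the manifest above using jsonpath:
kubectl get deployment nginx-healthcheck \
  -o jsonpath='{.spec.template.spec.containers[0].livenessProbe}{"\n"}{.spec.template.spec.containers[0].readinessProbe}{"\n"}'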
Check Pod nginx-healthcheck.
kubectl get pod -owide
Result
NAME                                 READY   STATUS    RESTARTS   AGE   IP              NODE     NOMINATED NODE   READINESS GATES
nginx-healthcheck-79fc55d944-jw887   1/1     Running   0          9s    10.244.102.14   cka003   <none>           <none>
nginx-healthcheck-79fc55d944-nwwjc   1/1     Running   0          9s    10.244.112.13   cka002   <none>           <none>
Access each Pod IP with curl, using the IPs from the output above.
curl 10.244.102.14
curl 10.244.112.13
Both requests return the default Nginx index.html content.
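The Pod IPs above are specific to this cluster; in your environment, use the IPs reported by kubectl get pod -owide. As a small convenience sketch (assuming the Pod label name=nginx-healthcheck from the manifest above), you can look up a Pod IP with jsonpath instead of copying it by hand:
# Fetch the IP of the first nginx-healthcheck Pod and curl it
POD_IP=$(kubectl get pod -l name=nginx-healthcheck -o jsonpath='{.items[0].status.podIP}')
curl "$POD_IP"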
Check the details of the Service created above.
kubectl describe svc nginx-healthcheck
We will see the output below; both Pod IPs are listed under Endpoints.
Name: nginx-healthcheck
Namespace: dev
Labels: <none>
Annotations: <none>
Selector: name=nginx-healthcheck
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 11.244.238.20
IPs: 11.244.238.20
Port: <unset> 80/TCP
TargetPort: 80/TCP
NodePort: <unset> 31795/TCP
Endpoints: 10.244.102.14:80,10.244.112.13:80
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
We can also check the Endpoints object directly.
kubectl get endpoints nginx-healthcheck
Result
NAME                ENDPOINTS                           AGE
nginx-healthcheck   10.244.102.14:80,10.244.112.13:80   72s
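To watch the endpoint list change in real time during the next steps, you can optionally keep a watch running in a second terminal:
kubectl get endpoints nginx-healthcheck -w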
At this point, both nginx-healthcheck Pods are healthy and serving traffic as expected.
Simulate readinessProbe Failure ¶
Let's simulate an error by deleting the index.html file in one of the nginx-healthcheck Pods and see how the readinessProbe reacts.
First, execute kubectl exec -it <your_pod_name> -- bash to open a shell in one of the nginx-healthcheck Pods, then delete the index.html file.
kubectl exec -it nginx-healthcheck-79fc55d944-jw887 -- bash
cd /usr/share/nginx/html/
rm -rf index.html
exit
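Equivalently, the same file can be removed without an interactive shell (assuming the same Pod name as above):
kubectl exec nginx-healthcheck-79fc55d944-jw887 -- rm /usr/share/nginx/html/index.html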
After that, check the status of the Pod whose index.html file was deleted.
kubectl describe pod nginx-healthcheck-79fc55d944-jw887
We can now see a Readiness probe failed warning in the event messages.
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m8s default-scheduler Successfully assigned dev/nginx-healthcheck-79fc55d944-jw887 to cka003
Normal Pulled 2m7s kubelet Container image "nginx:latest" already present on machine
Normal Created 2m7s kubelet Created container nginx-healthcheck
Normal Started 2m7s kubelet Started container nginx-healthcheck
Warning Unhealthy 2s (x2 over 7s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 403
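The failed readiness probe is also reflected in the Pod's Ready condition, which should now report False, and the READY column of kubectl get pod shows 0/1. A quick check, assuming the same Pod name:
kubectl get pod nginx-healthcheck-79fc55d944-jw887 \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}{"\n"}'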
Let's check the other Pod.
kubectl describe pod nginx-healthcheck-79fc55d944-nwwjc
There are no error events for this Pod.
......
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m46s default-scheduler Successfully assigned dev/nginx-healthcheck-79fc55d944-nwwjc to cka002
Normal Pulled 3m45s kubelet Container image "nginx:latest" already present on machine
Normal Created 3m45s kubelet Created container nginx-healthcheck
Normal Started 3m45s kubelet Started container nginx-healthcheck
Now access each Pod IP with curl again and compare the results.
curl 10.244.102.14
curl 10.244.112.13
Result:
curl 10.244.102.14 fails with a 403 Forbidden error, while curl 10.244.112.13 still works.
Let's check the current status of the Nginx Service now that one of the Pods is failing.
kubectl describe svc nginx-healthcheck
In the output below, only one Pod is listed under Endpoints.
Name: nginx-healthcheck
Namespace: dev
Labels: <none>
Annotations: <none>
Selector: name=nginx-healthcheck
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 11.244.238.20
IPs: 11.244.238.20
Port: <unset> 80/TCP
TargetPort: 80/TCP
NodePort: <unset> 31795/TCP
Endpoints: 10.244.112.13:80
Session Affinity: None
External Traffic Policy: Cluster
Events: <none>
We get the same result by checking the Endpoints object: only the healthy Pod is listed.
kubectl get endpoints nginx-healthcheck
Output:
NAME                ENDPOINTS          AGE
nginx-healthcheck   10.244.112.13:80   6m5s
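Even with one Pod unready, the Service itself keeps serving traffic, because requests are only forwarded to ready endpoints. A hedged check using the ClusterIP from the describe output above (replace it with the value in your cluster):
# ClusterIP from the earlier kubectl describe svc output; every request should hit the healthy Pod
curl 11.244.238.20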
Fix readinessProbe Failure ¶
Let's re-create the index.html file in the Pod.
kubectl exec -it nginx-healthcheck-79fc55d944-jw887 -- bash
cd /usr/share/nginx/html/
cat > index.html << EOF
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>
<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
EOF
exit
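If you prefer a non-interactive fix, writing any minimal page is enough to make the readiness probe pass again; it does not have to be the full default page (a sketch, assuming the same Pod name):
kubectl exec nginx-healthcheck-79fc55d944-jw887 -- \
  sh -c 'echo "<h1>Welcome to nginx!</h1>" > /usr/share/nginx/html/index.html'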
We can now see that both Pods are back in the Endpoints list and serving traffic.
kubectl describe svc nginx-healthcheck
kubectl get endpoints nginx-healthcheck
Access the Pod IPs with curl again; both respond normally.
curl 10.244.102.14
curl 10.244.112.13
Verify the Pod status again.
kubectl describe pod nginx-healthcheck-79fc55d944-jw887
Conclusion:
- By deleting the index.html file, the Pod became unhealthy and was removed from the endpoint list.
- Only the remaining healthy Pod continued to provide normal service.
Clean up
kubectl delete service nginx-healthcheck
kubectl delete deployment nginx-healthcheck
Simulate livenessProbe Failure ¶
Re-create Deployment nginx-healthcheck and Service nginx-healthcheck.
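One way to do this is to save the manifest from the first section to a file (the filename below is just an example) and apply it again, then check the rollout:
kubectl apply -f nginx-healthcheck.yaml
kubectl get deployment nginx-healthcheck
kubectl get pod -l name=nginx-healthcheck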
Deployment:
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
nginx-healthcheck   0/2     2            0           7s
Pods:
NAME                                 READY   STATUS    RESTARTS   AGE
nginx-healthcheck-79fc55d944-lknp9   1/1     Running   0          96s
nginx-healthcheck-79fc55d944-wntmg   1/1     Running   0          96s
Change the nginx default listening port from 80 to 90 to simulate a livenessProbe failure; the livenessProbe checks liveness via TCP on port 80.
kubectl exec -it nginx-healthcheck-79fc55d944-lknp9 -- bash
root@nginx-healthcheck-79fc55d944-lknp9:/# cd /etc/nginx/conf.d
root@nginx-healthcheck-79fc55d944-lknp9:/etc/nginx/conf.d# sed -i 's/80/90/g' default.conf
root@nginx-healthcheck-79fc55d944-lknp9:/etc/nginx/conf.d# nginx -s reload
2022/07/24 12:59:45 [notice] 79#79: signal process started
The Pod runs into failure.
kubectl describe pod nginx-healthcheck-79fc55d944-lknp9
We can see Liveness probe failed warnings in the event messages.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned dev/nginx-healthcheck-79fc55d944-lknp9 to cka003
Normal Pulled 2m47s (x2 over 17m) kubelet Container image "nginx:latest" already present on machine
Normal Created 2m47s (x2 over 17m) kubelet Created container nginx-healthcheck
Normal Started 2m47s (x2 over 17m) kubelet Started container nginx-healthcheck
Warning Unhealthy 2m47s (x4 over 2m57s) kubelet Readiness probe failed: Get "http://10.244.102.46:80/": dial tcp 10.244.102.46:80: connect: connection refused
Warning Unhealthy 2m47s (x3 over 2m57s) kubelet Liveness probe failed: dial tcp 10.244.102.46:80: connect: connection refused
Normal Killing 2m47s kubelet Container nginx-healthcheck failed liveness probe, will be restarted
Once the failure is detected by the livenessProbe, the container is restarted automatically. The modified default.conf is replaced by the default file from the image, so the container comes back up healthy and listening on port 80 again.
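To confirm the self-healing behaviour, check that the Pod's restart counter has increased, that the container is listening on port 80 again, and that curl succeeds (a sketch, assuming the same Pod name and the Pod IP shown in the events above):
kubectl get pod nginx-healthcheck-79fc55d944-lknp9
kubectl exec nginx-healthcheck-79fc55d944-lknp9 -- grep listen /etc/nginx/conf.d/default.conf
curl 10.244.102.46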