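Startup probes, liveness probes, and readiness probes have different purposes and priorities.
Priority and Purpose
The startup probe lets the program inside a Pod tell Kubernetes that its preparation is complete. This preparation mainly covers the prerequisites the workload needs before it can run, such as downloading resource files or an embedded database file. Only after the startup probe succeeds do the liveness and readiness probes begin to work.
Liveness and readiness probes are independent of each other, so neither has priority over the other: once the startup probe reports Success, both begin checking at the same time. When either one fails, its own failure handling is triggered.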
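The liveness probe indicates whether the program is still alive. If the container is judged not alive, the kubelet kills it, and the Pod's restartPolicy then decides whether the container is restarted or left terminated, in which case the Pod ends up Failed.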
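As a rough illustration of how a liveness failure interacts with restartPolicy, here is a minimal sketch of a bare Pod whose liveness probe can never pass. The Pod name, container name, and the /tmp/healthy path are hypothetical. A Deployment only accepts restartPolicy: Always, so a bare Pod is used here.

```yaml
# Hypothetical example; the names and the probed path are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-restart-demo
spec:
  # Never: the killed container stays terminated and the Pod should end up Failed.
  # Always (the default) or OnFailure: the kubelet restarts the container instead.
  restartPolicy: Never
  containers:
  - name: demo
    image: busybox
    command: ["/bin/sh", "-c", "sleep 3600"]
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy      # never created, so every check fails
      periodSeconds: 1
      failureThreshold: 3   # the container is killed after 3 consecutive failures
```

Changing restartPolicy to Always and watching the RESTARTS column of kubectl get pod should show the other branch of the behavior.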
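The readiness probe indicates whether the program can serve requests. A Pod usually serves traffic through a Service. If the readiness probe fails, the Pod is removed from the Service's endpoints so that traffic no longer reaches a Pod that cannot work; when the probe succeeds again, the Pod is added back to the Service.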
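The relationship between a readiness probe and a Service can be sketched as follows. This is only an illustration: the names, the image, the /ready path, and the port numbers are all assumptions and do not come from the experiments in this article.

```yaml
# Hypothetical Service and Pod; every name, path, and port is illustrative.
apiVersion: v1
kind: Service
metadata:
  name: demo-service
spec:
  selector:
    app: demo-app            # selects Pods carrying this label
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
  labels:
    app: demo-app
spec:
  containers:
  - name: demo-app
    image: example.com/demo-app:1.0   # placeholder image, assumed to listen on 8080
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
      failureThreshold: 3    # marked not ready after 3 consecutive failures
```

While the probe is failing, the Pod still matches the selector but is not listed as a ready endpoint of demo-service, so the Service stops routing traffic to it.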
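It might seem that liveness and readiness probes are enough, so why add a startup probe? Because we often cannot know how long the preparation will take; limited network bandwidth, for example, can make a resource download very slow. Liveness or readiness probes tuned to tolerate that would be either inaccurate or slow to react. A startup probe absorbs the uncertain startup time, so the other probes can stay sensitive.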
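One common way to apply this idea is to give the startup probe a generous budget and keep the liveness probe tight. The fragment below is a sketch of the probe section of a container spec, not a complete manifest; the /healthz path, port 8080, and the specific numbers are assumptions chosen for illustration.

```yaml
# Fragment of a container spec (hypothetical endpoint and values).
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 30   # startup may take up to 30 x 10s = 300s
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 3    # once started, a hang is acted on after about 3 x 5s = 15s
```

Without the startup probe, the liveness probe alone would need either a long initialDelaySeconds or a large failureThreshold to survive a slow start, and both make it slower to react later.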
Startup and Liveness Probes
# startup_liveness.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: startup-liveness-deployment
spec:
  selector:
    matchLabels:
      app: startup-liveness
  template:
    metadata:
      labels:
        app: startup-liveness
    spec:
      containers:
      - name: startup-liveness-container
        image: busybox
        command: ["/bin/sh", "-c", "sleep 6; touch /tempdir/ready; sleep 3;touch /tempdir/keepalive; while true; do sleep 5; done"]
        volumeMounts:
        - name: probe-volume
          mountPath: /tempdir
        startupProbe:
          exec:
            command:
            - cat
            - /tempdir/ready
          initialDelaySeconds: 3
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
        livenessProbe:
          exec:
            command:
            - cat
            - /tempdir/keepalive
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
      volumes:
      - name: probe-volume
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
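The logic of this manifest is shown in the figure below: the container creates /tempdir/ready 6 seconds after it starts and /tempdir/keepalive 3 seconds after that, while the startup probe checks for the first file and the liveness probe checks for the second.
We use the following command to view the events that occur along the way: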
kubectl describe pod
Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  15s               default-scheduler  Successfully assigned default/startup-liveness-deployment-66f76576ff-9pnmj to ubuntub
  Normal   Pulling    15s               kubelet            Pulling image "busybox"
  Normal   Pulled     13s               kubelet            Successfully pulled image "busybox" in 2.603715682s (2.603722383s including waiting)
  Normal   Created    13s               kubelet            Created container startup-liveness-container
  Normal   Started    13s               kubelet            Started container startup-liveness-container
  Warning  Unhealthy  7s (x4 over 10s)  kubelet            Startup probe failed: cat: can't open '/tempdir/ready': No such file or directory
  Warning  Unhealthy  4s (x2 over 5s)   kubelet            Liveness probe failed: cat: can't open '/tempdir/keepalive': No such file or directory
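The startup probe failed four times because /tempdir/ready had not been created yet. The fifth check found the file, and the liveness probe then started. Successful probes do not generate events, so the success is inferred from the timing: the first startup failure was recorded 10s before this output and the first liveness failure 5s before it. With periodSeconds set to 1, that 10s - 5s = 5s gap is consistent with four failed startup checks followed by one successful check.
The liveness probe failed twice because /tempdir/keepalive had not been created yet. By the third check the file existed, so the probe passed and the container kept running without being restarted.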
Startup and Readiness Probes
# startup_readiness.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: startup-readiness-deployment
spec:
  selector:
    matchLabels:
      app: startup-readiness
  template:
    metadata:
      labels:
        app: startup-readiness
    spec:
      containers:
      - name: startup-readiness-container
        image: busybox
        command: ["/bin/sh", "-c", "sleep 6; touch /tempdir/ready; sleep 3;touch /tempdir/readiness; while true; do sleep 5; done"]
        volumeMounts:
        - name: probe-volume
          mountPath: /tempdir
        startupProbe:
          exec:
            command:
            - cat
            - /tempdir/ready
          initialDelaySeconds: 3
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
        readinessProbe:
          exec:
            command:
            - cat
            - /tempdir/readiness
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
      volumes:
      - name: probe-volume
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
The flow is similar to the one in the previous section.
The events are as follows:
Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  13s              default-scheduler  Successfully assigned default/startup-readiness-deployment-64cbcc9659-k7m5v to ubuntuc
  Normal   Pulling    13s              kubelet            Pulling image "busybox"
  Normal   Pulled     11s              kubelet            Successfully pulled image "busybox" in 2.10831058s (2.10831728s including waiting)
  Normal   Created    11s              kubelet            Created container startup-readiness-container
  Normal   Started    11s              kubelet            Started container startup-readiness-container
  Warning  Unhealthy  5s (x4 over 8s)  kubelet            Startup probe failed: cat: can't open '/tempdir/ready': No such file or directory
  Warning  Unhealthy  2s (x3 over 4s)  kubelet            Readiness probe failed: cat: can't open '/tempdir/readiness': No such file or directory
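This time the readiness probe failed three times and reported success only on the fourth check.
Together, these two experiments show that the liveness and readiness probes begin checking only after the startup probe has reported success.
Liveness and Readiness Probes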
# liveness_readiness.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: liveness-readiness-deployment
spec:
  selector:
    matchLabels:
      app: liveness-readiness
  template:
    metadata:
      labels:
        app: liveness-readiness
    spec:
      containers:
      - name: liveness-readiness-container
        image: busybox
        command: ["/bin/sh", "-c", "sleep 3; touch /tempdir/keepalive; sleep 3;touch /tempdir/readiness; while true; do sleep 5; done"]
        volumeMounts:
        - name: probe-volume
          mountPath: /tempdir
        livenessProbe:
          exec:
            command:
            - cat
            - /tempdir/keepalive
          initialDelaySeconds: 3
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
        readinessProbe:
          exec:
            command:
            - cat
            - /tempdir/readiness
          failureThreshold: 6
          periodSeconds: 1
          successThreshold: 1
      volumes:
      - name: probe-volume
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
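The Pod's events show that the failures of the liveness and readiness probes span the same time window (both are reported as "over 6s" in the output below), which indicates that the two probes run side by side rather than one after the other. Event ages are rounded, so this is only a rough indication.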
Events:
  Type     Reason     Age              From               Message
  ----     ------     ----             ----               -------
  Normal   Scheduled  10s              default-scheduler  Successfully assigned default/liveness-readiness-deployment-f6db88747-znxsm to ubuntub
  Normal   Pulling    10s              kubelet            Pulling image "busybox"
  Normal   Pulled     8s               kubelet            Successfully pulled image "busybox" in 2.092699902s (2.092706902s including waiting)
  Normal   Created    8s               kubelet            Created container liveness-readiness-container
  Normal   Started    8s               kubelet            Started container liveness-readiness-container
  Warning  Unhealthy  5s (x2 over 6s)  kubelet            Liveness probe failed: cat: can't open '/tempdir/keepalive': No such file or directory
  Warning  Unhealthy  4s (x4 over 6s)  kubelet            Readiness probe failed: cat: can't open '/tempdir/readiness': No such file or directory
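The three experiments each use two probes, but a single container can carry all three. The manifest below is a minimal sketch that reuses the same busybox image and marker-file approach; the file name, the Deployment name, and the all-probes label are hypothetical, and this variant was not part of the experiments above, so no event output is shown for it.

```yaml
# all_probes.yaml (hypothetical file name)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: all-probes-deployment
spec:
  selector:
    matchLabels:
      app: all-probes
  template:
    metadata:
      labels:
        app: all-probes
    spec:
      containers:
      - name: all-probes-container
        image: busybox
        # Creates /tempdir/ready after 6s, then both other marker files 3s later.
        command: ["/bin/sh", "-c", "sleep 6; touch /tempdir/ready; sleep 3; touch /tempdir/keepalive /tempdir/readiness; while true; do sleep 5; done"]
        volumeMounts:
        - name: probe-volume
          mountPath: /tempdir
        startupProbe:          # must succeed before the other two probes start
          exec:
            command:
            - cat
            - /tempdir/ready
          initialDelaySeconds: 3
          failureThreshold: 6
          periodSeconds: 1
        livenessProbe:         # on failure, the kubelet kills the container
          exec:
            command:
            - cat
            - /tempdir/keepalive
          failureThreshold: 6
          periodSeconds: 1
        readinessProbe:        # on failure, the Pod is taken out of Service endpoints
          exec:
            command:
            - cat
            - /tempdir/readiness
          failureThreshold: 6
          periodSeconds: 1
      volumes:
      - name: probe-volume
        emptyDir:
          medium: Memory
          sizeLimit: 1Gi
```

If the behavior matches the earlier experiments, kubectl describe pod should list startup failures first, followed by liveness and readiness failures that begin at about the same time.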