Probes · Kubernetes

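[TOC]

## **Probe types**

There are three kinds of probes:

* Liveness probe (LivenessProbe)
* Readiness probe (ReadinessProbe)
* Startup probe (StartupProbe)

They sit in the Pod spec as follows:

```
apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  containers:
  - name: container1
    ...
    livenessProbe:
      ...
    readinessProbe:
      ...
    startupProbe:
      ...
```

The three differ in what happens on failure. When a liveness probe fails, the kubelet kills the container, which is then handled according to the Pod's `restartPolicy` (for example, restarted). When a readiness probe fails, the Pod is not added to the Endpoints of its Services, so it receives no load-balanced traffic. A startup probe is generally used to protect containers that start slowly; see Reference.

## **Check mechanisms**

All three probe types are configured with the same check mechanisms. The following three are covered here.

#### **TCP check**

Below is a liveness probe that uses a TCP check. The kubelet tries to open a connection to port 8080 of the Pod. If the probe fails, the kubelet restarts the container (not the whole Pod) according to the Pod's `restartPolicy` (`Always`).

```
apiVersion: v1
kind: Pod
metadata:
  name: liveness-example   # Pod names must be lowercase
spec:
  restartPolicy: Always
  containers:
  - name: tomcat
    image: tomcat:8
    livenessProbe:
      tcpSocket:           # the field is tcpSocket, singular
        port: 8080
```

#### **HTTP check**

Below is an example of an HTTP check. The kubelet sends a request to `http://podIP:8080/healthz`; a status code in the range `[200, 400)` counts as success.

```
apiVersion: v1
kind: Pod
metadata:
  name: liveness-example
spec:
  restartPolicy: Always
  containers:
  - name: tomcat
    image: tomcat:8
    livenessProbe:
      httpGet:
        port: 8080
        path: /healthz
```

#### **Command check**

Below is an example of a command check. The kubelet executes a command inside the container; an exit code of 0 counts as success.

```
apiVersion: v1
kind: Pod
metadata:
  name: liveness-example
spec:
  restartPolicy: Always
  containers:
  - name: busybox
    image: k8s.gcr.io/busybox
    # Create the file, remove it after 30 seconds, then keep the container alive
    args: ["/bin/sh", "-c", "touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600"]
    livenessProbe:
      exec:
        command: ["cat", "/tmp/healthy"]
```

For the first 30 seconds of the container's life the file `/tmp/healthy` exists, so `cat /tmp/healthy` returns a success code during that time. After 30 seconds the file is gone and `cat /tmp/healthy` returns a failure code.

## **Probe configuration**

Probes also accept the following settings:

* `initialDelaySeconds`: how many seconds the kubelet waits after the container has started before it begins probing; default 0
* `periodSeconds`: how often, in seconds, the kubelet runs the check; default 10
* `timeoutSeconds`: the timeout for each check; if no response arrives within this time, that check counts as failed; default 1
* `failureThreshold`: how many consecutive failed checks it takes for the probe to be considered failed; default 3
* `successThreshold`: after a probe has failed, how many consecutive successful checks it takes for the probe to be considered successful again; default 1, and it must be 1 for liveness probes

Below is a liveness probe with every setting spelled out:

```
apiVersion: v1
kind: Pod
metadata:
  name: liveness-example
spec:
  restartPolicy: Always
  containers:
  - name: tomcat
    image: tomcat:8
    livenessProbe:
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 1
      failureThreshold: 3
      successThreshold: 1
      tcpSocket:
        port: 8080
```

## **Probe types in detail**

#### **Liveness probe**

To be continued.

#### **Readiness probe**

Below is an example of a readiness probe:

```
apiVersion: v1
kind: Pod
metadata:
  name: peng
spec:
  restartPolicy: Always
  containers:
  - name: linzhe
    image: tomcat:8
    # Override the entrypoint so that Tomcat never starts and no port is opened
    command:
    - /bin/sh
    - -c
    - sleep 6000
    readinessProbe:
      initialDelaySeconds: 10
      tcpSocket:
        port: 8091
  - name: shizhu
    image: tomcat:8
```

The readiness probe of this Pod checks TCP port 8091, which never comes up, so the probe keeps failing. Looking at the Pod, only one of the two containers is ready (second column):

```
$ kubectl get pod peng
NAME   READY   STATUS    RESTARTS   AGE
peng   1/2     Running   0          9s
```

The Pod's YAML shows that the `Ready` condition is `False` with `reason` set to `ContainersNotReady`. It also shows that `ready` is `false` for the `linzhe` container and `true` for the `shizhu` container:

```
$ kubectl get pod peng -o yaml
...
status:
  conditions:
  - type: Ready
    status: "False"
    reason: ContainersNotReady
    message: 'containers with unready status: [linzhe]'
  ...
  containerStatuses:
  - name: linzhe
    ready: false
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-07-07T07:47:15Z"
    ...
  - name: shizhu
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-07-07T07:47:16Z"
    ...
```

Kubernetes does not add a Pod whose `Ready` condition is `False` to Endpoints.

#### **Startup probe**

See Reference.

## **Best practices**

1. Configure a startup probe, or set `initialDelaySeconds` to a larger value, to protect containers that start slowly.
2. All other settings can be left at their defaults.

## **Reference**

* https://kubernetes.io/zh/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/#configure-probes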
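
## **Example: startup probe**

The startup probe section above only points to the reference, so here is a minimal sketch of the first best practice. The Pod name `startup-example` and the specific thresholds are assumptions chosen for illustration, and the image and port are reused from the earlier examples. With `periodSeconds: 10` and `failureThreshold: 30`, the container gets up to 30 × 10 = 300 seconds to start. Liveness and readiness checks do not begin until the startup probe has succeeded.

```
apiVersion: v1
kind: Pod
metadata:
  name: startup-example      # hypothetical name
spec:
  restartPolicy: Always
  containers:
  - name: tomcat
    image: tomcat:8
    startupProbe:
      tcpSocket:
        port: 8080
      periodSeconds: 10
      failureThreshold: 30   # assumed value: 30 * 10s = up to 300s to start
    livenessProbe:
      tcpSocket:
        port: 8080
      periodSeconds: 10
      failureThreshold: 3    # takes over only after the startup probe succeeds
```

If the startup probe never succeeds within that window, the kubelet kills the container and the Pod's `restartPolicy` applies, just as with a failed liveness probe. Compared with a large `initialDelaySeconds`, this lets a fast start pass quickly while still tolerating a slow one.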