[TOC]

### **Phases of a Pod**

In this Measurement, a Pod's lifecycle is divided into four phases:

* create: the time the Pod was created
* schedule: the time the Pod was successfully scheduled
* run: the time the Pod started running (every container must be ready)
* watch: the time the Pod was observed through watch

The SLO defines Pod startup time as the span from creation until the Pod is running and has been observed through watch, that is, `watchTime - createTime`. The clusterloader source computes it exactly this way (see the `pod_startup` entry):

```
var podStartupTransitions = map[string]measurementutil.Transition{
	"create_to_schedule": {
		From: createPhase,
		To:   schedulePhase,
	},
	"schedule_to_run": {
		From: schedulePhase,
		To:   runPhase,
	},
	"run_to_watch": {
		From: runPhase,
		To:   watchPhase,
	},
	"schedule_to_watch": {
		From: schedulePhase,
		To:   watchPhase,
	},
	"pod_startup": {
		From: createPhase,
		To:   watchPhase,
	},
}
```

The first three phases are easy to understand, but what does the fourth phase, watch, mean? Let's first use curl to watch a Pod all the way from creation to running.

On the master host, run the following command to expose kube-apiserver through kubectl's proxy:

```
$ kubectl proxy
Starting to serve on 127.0.0.1:8001
```

Then, in another shell on the master host, run curl to watch events for the Pod named test in the default namespace:

```
$ curl http://127.0.0.1:8001/api/v1/watch/namespaces/default/pods/test
```

Next, create a Pod named test:

```
$ kubectl run test --image harbor.ccse.io:8021/kubernetes/pause:3.6
```

curl then receives the following events:

```
{"type":"ADDED","object":{"kind":"Pod","apiVersion":"v1","metadata":{"name":"test","namespace":"default","uid":"f3038772-4b56-4a1f-87da-f7d0d786532a","resourceVersion":"2988095","creationTimestamp":"2022-09-01T02:14:48Z","labels":{"run":"test"},"managedFields":[{"manager":"kubectl-run","operation":"Update","apiVersion":"v1","time":"2022-09-01T02:14:48Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:run":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"test\"}":{".":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}}}]},"spec":{"volumes":[{"name":"kube-api-access-bsbjw","projected":{"sources":[{"serviceAccountToken":{"expirationSeconds":3607,"path":"token"}},{"configMap":{"name":"kube-root-ca.crt","items":[{"key":"ca.crt","path":"ca.crt"}]}},{"downwardAPI":{"items":[{"path":"namespace","fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}]}}],"defaultMode":420}}],"containers":[{"name":"test","image":"harbor.ccse.io:8021/kubernetes/pause:3.6","resources":{},"volumeMounts":[{"name":"kube-api-access-bsbjw","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"default","serviceAccount":"default","securityContext":{},"schedulerName":"default-scheduler","tolerations":[{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}],"priority":0,"enableServiceLinks":true,"preemptionPolicy":"PreemptLowerPriority"},"status":{"phase
":"Pending","qosClass":"BestEffort"}}} {"type":"MODIFIED","object":{"kind":"Pod","apiVersion":"v1","metadata":{"name":"test","namespace":"default","uid":"f3038772-4b56-4a1f-87da-f7d0d786532a","resourceVersion":"2988096","creationTimestamp":"2022-09-01T02:14:48Z","labels":{"run":"test"},"managedFields":[{"manager":"kubectl-run","operation":"Update","apiVersion":"v1","time":"2022-09-01T02:14:48Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:run":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"test\"}":{".":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}}}]},"spec":{"volumes":[{"name":"kube-api-access-bsbjw","projected":{"sources":[{"serviceAccountToken":{"expirationSeconds":3607,"path":"token"}},{"configMap":{"name":"kube-root-ca.crt","items":[{"key":"ca.crt","path":"ca.crt"}]}},{"downwardAPI":{"items":[{"path":"namespace","fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}]}}],"defaultMode":420}}],"containers":[{"name":"test","image":"harbor.ccse.io:8021/kubernetes/pause:3.6","resources":{},"volumeMounts":[{"name":"kube-api-access-bsbjw","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"default","serviceAccount":"default","nodeName":"10.35.20.5","securityContext":{},"schedulerName":"default-scheduler","tolerations":[{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}],"priority":0,"enableServiceLinks
":true,"preemptionPolicy":"PreemptLowerPriority"},"status":{"phase":"Pending","conditions":[{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:48Z"}],"qosClass":"BestEffort"}}} {"type":"MODIFIED","object":{"kind":"Pod","apiVersion":"v1","metadata":{"name":"test","namespace":"default","uid":"f3038772-4b56-4a1f-87da-f7d0d786532a","resourceVersion":"2988098","creationTimestamp":"2022-09-01T02:14:48Z","labels":{"run":"test"},"managedFields":[{"manager":"Go-http-client","operation":"Update","apiVersion":"v1","time":"2022-09-01T02:14:48Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:conditions":{"k:{\"type\":\"ContainersReady\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Initialized\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Ready\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}}},"f:containerStatuses":{},"f:hostIP":{},"f:startTime":{}}},"subresource":"status"},{"manager":"kubectl-run","operation":"Update","apiVersion":"v1","time":"2022-09-01T02:14:48Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:run":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"test\"}":{".":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}}}]},"spec":{"volumes":[{"name":"kube-api-access-bsbjw","projected":{"sources":[{"serviceAccountToken":{"expirationSeconds":3607,"path":"token"}},{"configMap":{"name":"kube-root-ca.crt","items":[{"key":"ca.crt","path":"ca.crt"}]}},{"downwardAPI":{"items":[{"path":"namespace","fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}]}}],"defaultMode":420}
}],"containers":[{"name":"test","image":"harbor.ccse.io:8021/kubernetes/pause:3.6","resources":{},"volumeMounts":[{"name":"kube-api-access-bsbjw","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"default","serviceAccount":"default","nodeName":"10.35.20.5","securityContext":{},"schedulerName":"default-scheduler","tolerations":[{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}],"priority":0,"enableServiceLinks":true,"preemptionPolicy":"PreemptLowerPriority"},"status":{"phase":"Pending","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:48Z"},{"type":"Ready","status":"False","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:48Z","reason":"ContainersNotReady","message":"containers with unready status: [test]"},{"type":"ContainersReady","status":"False","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:48Z","reason":"ContainersNotReady","message":"containers with unready status: [test]"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:48Z"}],"hostIP":"10.35.20.5","startTime":"2022-09-01T02:14:48Z","containerStatuses":[{"name":"test","state":{"waiting":{"reason":"ContainerCreating"}},"lastState":{},"ready":false,"restartCount":0,"image":"harbor.ccse.io:8021/kubernetes/pause:3.6","imageID":"","started":false}],"qosClass":"BestEffort"}}} 
{"type":"MODIFIED","object":{"kind":"Pod","apiVersion":"v1","metadata":{"name":"test","namespace":"default","uid":"f3038772-4b56-4a1f-87da-f7d0d786532a","resourceVersion":"2988101","creationTimestamp":"2022-09-01T02:14:48Z","labels":{"run":"test"},"annotations":{"cni.projectcalico.org/containerID":"5d74239bea601a89ec85ce46c1af510bf8a51d548c5e8239fc89b59e13cab32c","cni.projectcalico.org/podIP":"10.10.146.67/32","cni.projectcalico.org/podIPs":"10.10.146.67/32"},"managedFields":[{"manager":"Go-http-client","operation":"Update","apiVersion":"v1","time":"2022-09-01T02:14:48Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:conditions":{"k:{\"type\":\"ContainersReady\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Initialized\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Ready\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:message":{},"f:reason":{},"f:status":{},"f:type":{}}},"f:containerStatuses":{},"f:hostIP":{},"f:startTime":{}}},"subresource":"status"},{"manager":"kubectl-run","operation":"Update","apiVersion":"v1","time":"2022-09-01T02:14:48Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:run":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"test\"}":{".":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}}},{"manager":"calico","operation":"Update","apiVersion":"v1","time":"2022-09-01T02:14:50Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:cni.projectcalico.org/containerID":{},"f:cni.projectcalico.org/podIP":{},"f:cni.projectcalico.org/podIPs":{}}}},"subresource":"status"}]},"spec":{"volumes":[{"name":"kube-api-access-bsbjw","projected":{"
sources":[{"serviceAccountToken":{"expirationSeconds":3607,"path":"token"}},{"configMap":{"name":"kube-root-ca.crt","items":[{"key":"ca.crt","path":"ca.crt"}]}},{"downwardAPI":{"items":[{"path":"namespace","fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}]}}],"defaultMode":420}}],"containers":[{"name":"test","image":"harbor.ccse.io:8021/kubernetes/pause:3.6","resources":{},"volumeMounts":[{"name":"kube-api-access-bsbjw","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"default","serviceAccount":"default","nodeName":"10.35.20.5","securityContext":{},"schedulerName":"default-scheduler","tolerations":[{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}],"priority":0,"enableServiceLinks":true,"preemptionPolicy":"PreemptLowerPriority"},"status":{"phase":"Pending","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:48Z"},{"type":"Ready","status":"False","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:48Z","reason":"ContainersNotReady","message":"containers with unready status: [test]"},{"type":"ContainersReady","status":"False","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:48Z","reason":"ContainersNotReady","message":"containers with unready status: 
[test]"},{"type":"PodScheduled","status":"True","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:48Z"}],"hostIP":"10.35.20.5","startTime":"2022-09-01T02:14:48Z","containerStatuses":[{"name":"test","state":{"waiting":{"reason":"ContainerCreating"}},"lastState":{},"ready":false,"restartCount":0,"image":"harbor.ccse.io:8021/kubernetes/pause:3.6","imageID":"","started":false}],"qosClass":"BestEffort"}}} {"type":"MODIFIED","object":{"kind":"Pod","apiVersion":"v1","metadata":{"name":"test","namespace":"default","uid":"f3038772-4b56-4a1f-87da-f7d0d786532a","resourceVersion":"2988108","creationTimestamp":"2022-09-01T02:14:48Z","labels":{"run":"test"},"annotations":{"cni.projectcalico.org/containerID":"5d74239bea601a89ec85ce46c1af510bf8a51d548c5e8239fc89b59e13cab32c","cni.projectcalico.org/podIP":"10.10.146.67/32","cni.projectcalico.org/podIPs":"10.10.146.67/32"},"managedFields":[{"manager":"kubectl-run","operation":"Update","apiVersion":"v1","time":"2022-09-01T02:14:48Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:labels":{".":{},"f:run":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"test\"}":{".":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}}},{"manager":"Go-http-client","operation":"Update","apiVersion":"v1","time":"2022-09-01T02:14:50Z","fieldsType":"FieldsV1","fieldsV1":{"f:status":{"f:conditions":{"k:{\"type\":\"ContainersReady\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Initialized\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}},"k:{\"type\":\"Ready\"}":{".":{},"f:lastProbeTime":{},"f:lastTransitionTime":{},"f:status":{},"f:type":{}}},"f:containerStatuses":{},"f:hostIP":{},"f:phase":{},"f:podIP":{},"f:podIPs":{".":{},"k:{\"ip\":
\"10.10.146.67\"}":{".":{},"f:ip":{}}},"f:startTime":{}}},"subresource":"status"},{"manager":"calico","operation":"Update","apiVersion":"v1","time":"2022-09-01T02:14:50Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{".":{},"f:cni.projectcalico.org/containerID":{},"f:cni.projectcalico.org/podIP":{},"f:cni.projectcalico.org/podIPs":{}}}},"subresource":"status"}]},"spec":{"volumes":[{"name":"kube-api-access-bsbjw","projected":{"sources":[{"serviceAccountToken":{"expirationSeconds":3607,"path":"token"}},{"configMap":{"name":"kube-root-ca.crt","items":[{"key":"ca.crt","path":"ca.crt"}]}},{"downwardAPI":{"items":[{"path":"namespace","fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}]}}],"defaultMode":420}}],"containers":[{"name":"test","image":"harbor.ccse.io:8021/kubernetes/pause:3.6","resources":{},"volumeMounts":[{"name":"kube-api-access-bsbjw","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"IfNotPresent"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"default","serviceAccount":"default","nodeName":"10.35.20.5","securityContext":{},"schedulerName":"default-scheduler","tolerations":[{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}],"priority":0,"enableServiceLinks":true,"preemptionPolicy":"PreemptLowerPriority"},"status":{"phase":"Running","conditions":[{"type":"Initialized","status":"True","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:48Z"},{"type":"Ready","status":"True","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:50Z"},{"type":"ContainersReady","status":"True","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:50Z"},{"type":"PodScheduled","statu
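s":"True","lastProbeTime":null,"lastTransitionTime":"2022-09-01T02:14:48Z"}],"hostIP":"10.35.20.5","podIP":"10.10.146.67","podIPs":[{"ip":"10.10.146.67"}],"startTime":"2022-09-01T02:14:48Z","containerStatuses":[{"name":"test","state":{"running":{"startedAt":"2022-09-01T02:14:50Z"}},"lastState":{},"ready":true,"restartCount":0,"image":"harbor.ccse.io:8021/kubernetes/pause:3.6","imageID":"docker-pullable://harbor.ccse.io:8021/kubernetes/pause@sha256:74bf6fc6be13c4ec53a86a5acf9fdbc6787b176db0693659ad6ac89f115e182c","containerID":"docker://43c5fe83200b0ee5852d81c5cd9d61defdf1a7083a6f413fa5d83cc597c08542","started":true}],"qosClass":"BestEffort"}}}
```

**Note: these events are not the same thing as the Kubernetes Event resource (`kubectl get event`).**

Each event has two fields, type and object. type is one of ADDED (the object was created), MODIFIED (the object was changed), or DELETED (the object was deleted). object is the full JSON of the watched Pod test, although `kubectl get pod test -o json` does not show metadata.managedFields. managedFields records which manager set which fields, so comparing it across events shows what each update changed.

The first event is the Pod's creation. The managedFields array has a single element whose manager is kubectl-run.

The second event is a modification. managedFields still has only one element and its content has not changed (which is a little odd), but the Pod now has a nodeName and a PodScheduled condition with status True, so this is the event for the Pod being scheduled successfully.

In the third event, managedFields gains an element whose manager is Go-http-client, and the fields it changed are mainly under status. Looking at the Pod itself, status.conditions has new entries and status.containerStatuses has appeared. Their values show that the Pod is not yet Running. In other words, once kubelet sees a Pod scheduled to its node, the first thing it does is update the Pod's status.

In the fourth event, managedFields gains another element whose manager is calico. Inspecting the event shows that Calico has assigned the Pod an IP and added several annotations.

In the fifth event, managedFields still has three elements, but the Go-http-client entry has changed (its time is different), and the Pod is now in the Running state.

From the analysis above, Pod creation can be summarized as:

```
create -> schedule -> kubelet initialization -> IP assignment -> run
```

This still does not tell us what the watch phase in clusterloader2 means, so let's turn to the clusterloader2 source.

clusterloader2 stores the timestamp of each phase of each Pod in a two-level map. For example, `map["default/test"]["create"]` holds the creation time of the Pod test in the default namespace.

Next, let's see how clusterloader2 obtains these timestamps. The function below is the key one: during the test, clusterloader2 watches the Pods that carry the specified label, and calls this function for every event it receives. The comments in it are mine:

```
func (p *podStartupLatencyMeasurement) processEvent(event *eventData) {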
	// obj is the object field from the events above; recvTime is when clusterloader2 received the event
	obj, recvTime := event.obj, event.recvTime
	if obj == nil {
		return
	}
	pod, ok := obj.(*corev1.Pod)
	if !ok {
		return
	}
	// Build a key of the form namespace/pod from the namespace and Pod name; it identifies a single Pod
	key := createMetaNamespaceKey(pod.Namespace, pod.Name)
	p.podMetadata.SetStateless(key, isPodStateless(pod))
	// Only handle the event when the Pod's phase is Running, which cuts down on event processing.
	// Creation, scheduling, and initialization events can be skipped because the Running event
	// already carries the Pod's creation and run times.
	if pod.Status.Phase == corev1.PodRunning {
		// If there is no record for this Pod yet, record it now; if it was already recorded, ignore the event
		if _, found := p.podStartupEntries.Get(key, createPhase); !found {
			// The watch time is the time clusterloader2 received this event
			p.podStartupEntries.Set(key, watchPhase, recvTime)
			// The creation time comes from the Pod's metadata.creationTimestamp
			p.podStartupEntries.Set(key, createPhase, pod.CreationTimestamp.Time)
			var startTime metav1.Time
			// Walk .status.containerStatuses and use the latest container start time as the Pod's run time
			for _, cs := range pod.Status.ContainerStatuses {
				if cs.State.Running != nil {
					if startTime.Before(&cs.State.Running.StartedAt) {
						startTime = cs.State.Running.StartedAt
					}
				}
			}
			if startTime != metav1.NewTime(time.Time{}) {
				p.podStartupEntries.Set(key, runPhase, startTime.Time)
			} else {
				klog.Errorf("%s: pod %v (%v) is reported to be running, but none of its containers is", p, pod.Name, pod.Namespace)
			}
		}
	}
}
```

With this function analyzed, it is mostly clear how clusterloader2 obtains the timestamp of each phase. We also finally have the answer: **the timestamp of a Pod's watch phase is the time at which clusterloader2 first receives an event showing the Pod in the Running state**. Why the first such event? Because if a container in the Pod restarts, kubelet updates the Pod's status fields, kube-apiserver sends clusterloader2 another event, and the Pod in that event is still Running.

The function above records the create, run, and watch timestamps of a Pod, but not the schedule timestamp. The scheduling time is collected in the following function:

```
func (p *podStartupLatencyMeasurement) gatherScheduleTimes(c clientset.Interface) error {
	// These two fields select the scheduling events. Note that "event" here means the Event resource
	// shown by kubectl get event, not the watch events that clusterloader2 receives
	selector := fields.Set{
		"involvedObject.kind": "Pod",
		"source":              corev1.DefaultSchedulerName,
	}.AsSelector().String()
	options := metav1.ListOptions{FieldSelector:
		selector}
	schedEvents, err := c.CoreV1().Events(p.selector.Namespace).List(context.TODO(), options)
	if err != nil {
		return err
	}
	// Read each Pod's successful-scheduling time from the Event objects
	for _, event := range schedEvents.Items {
		key := createMetaNamespaceKey(event.InvolvedObject.Namespace, event.InvolvedObject.Name)
		if _, exists := p.podStartupEntries.Get(key, createPhase); exists {
			if !event.EventTime.IsZero() {
				// If .eventTime is set, use it as the scheduling time
				p.podStartupEntries.Set(key, schedulePhase, event.EventTime.Time)
			} else {
				// If .eventTime is empty, fall back to .firstTimestamp
				p.podStartupEntries.Set(key, schedulePhase, event.FirstTimestamp.Time)
			}
		}
	}
	return nil
}
```

As shown, the function first uses a fieldSelector to pick out the scheduling Events of all Pods, and then reads each Pod's scheduling time from specific fields of the Event object. The same Events can be listed with the following command:

```
$ kubectl get events --all-namespaces -o wide --field-selector involvedObject.kind=Pod,source=default-scheduler
NAMESPACE   LAST SEEN   TYPE     REASON      OBJECT     SUBOBJECT   SOURCE              MESSAGE                                            FIRST SEEN   COUNT   NAME
default     17m         Normal   Scheduled   pod/test               default-scheduler   Successfully assigned default/test to 10.35.20.5   17m          1       test.1710a798130957bd
```

Here is the full YAML of the scheduling Event for the Pod default/test. (Its timestamps may be far from the create and run times shown earlier, because by the time this part was written default/test had been created and deleted several times.)

```
$ kubectl get event test.1710a798130957bd -o yaml
apiVersion: v1
count: 1
eventTime: null
firstTimestamp: "2022-09-01T06:08:54Z"
involvedObject:
  apiVersion: v1
  kind: Pod
  name: test
  namespace: default
  resourceVersion: "3016347"
  uid: 258f859a-497d-4cb3-aa0b-1b4029fd51a8
kind: Event
lastTimestamp: "2022-09-01T06:08:54Z"
message: Successfully assigned default/test to 10.35.20.5
metadata:
  creationTimestamp: "2022-09-01T06:08:54Z"
  name: test.1710a798130957bd
  namespace: default
  resourceVersion: "3016349"
  uid: fd2cc914-6a71-4579-9552-2fd7ac4c00c7
reason: Scheduled
reportingComponent: ""
reportingInstance: ""
source:
  component: default-scheduler
type: Normal
```

The involvedObject field shows that this Event was generated by default-scheduler after the Pod default/test was scheduled successfully. This raises a couple of questions.

First, why does the function above prefer eventTime and fall back to firstTimestamp only when eventTime is empty?
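I have not looked into this in depth; reading the kube-scheduler source or running an experiment would settle it. (My guess is that eventTime is the time scheduling succeeded and firstTimestamp is the time of the first scheduling attempt, which does not necessarily succeed.)

Second, schedule times are collected in the gather step, which means clusterloader2 waits until all N*30 Pods have started and then collects their schedule times in one pass. But Events are kept in etcd for only one hour by default. What happens if some Pods' Events are automatically deleted before all the Pods have finished starting?

clusterloader2 does not appear to handle this case: if the Event has already been deleted, the Pod's schedule time is simply skipped.

### **Statistical methods in Clusterloader2**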
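Before any statistics can be computed, each per-Pod latency has to be derived from the phase timestamps described above. The following is a minimal sketch of that step, not clusterloader2's actual code: the names `phaseTimes`, `computeLatencies`, and `sampleEntries` are hypothetical, the phases are plain strings rather than the real constants, and the 300 ms watch delay in the sample data is invented for illustration. It assumes that a Pod missing either end of a transition (for example, because its scheduling Event has expired) is simply left out of that transition.

```go
package main

import (
	"fmt"
	"time"
)

// transition names a latency measured between two phases.
type transition struct{ from, to string }

// podStartupTransitions mirrors the transition table quoted earlier,
// using plain strings instead of clusterloader2's phase constants.
var podStartupTransitions = map[string]transition{
	"create_to_schedule": {"create", "schedule"},
	"schedule_to_run":    {"schedule", "run"},
	"run_to_watch":       {"run", "watch"},
	"schedule_to_watch":  {"schedule", "watch"},
	"pod_startup":        {"create", "watch"},
}

// phaseTimes is a hypothetical stand-in for the two-level map:
// pod key ("namespace/name") -> phase -> timestamp.
type phaseTimes map[string]map[string]time.Time

// computeLatencies returns transition name -> pod key -> duration.
// A Pod that lacks either phase of a transition is skipped for it.
func computeLatencies(entries phaseTimes, transitions map[string]transition) map[string]map[string]time.Duration {
	out := make(map[string]map[string]time.Duration)
	for name, tr := range transitions {
		out[name] = make(map[string]time.Duration)
		for key, phases := range entries {
			from, okFrom := phases[tr.from]
			to, okTo := phases[tr.to]
			if !okFrom || !okTo {
				continue
			}
			out[name][key] = to.Sub(from)
		}
	}
	return out
}

// sampleEntries builds illustrative data. The create and run times follow
// the watch output shown earlier; the watch times are made up.
func sampleEntries() phaseTimes {
	created := time.Date(2022, 9, 1, 2, 14, 48, 0, time.UTC)
	return phaseTimes{
		"default/test": {
			"create":   created,
			"schedule": created,
			"run":      created.Add(2 * time.Second),
			"watch":    created.Add(2*time.Second + 300*time.Millisecond),
		},
		// This Pod has no schedule timestamp, as if its Event had expired.
		"default/no-event": {
			"create": created,
			"run":    created.Add(3 * time.Second),
			"watch":  created.Add(4 * time.Second),
		},
	}
}

func main() {
	lat := computeLatencies(sampleEntries(), podStartupTransitions)
	fmt.Println(lat["pod_startup"]["default/test"]) // prints 2.3s
	_, ok := lat["create_to_schedule"]["default/no-event"]
	fmt.Println(ok) // prints false
}
```

In this sketch the Pod without a schedule timestamp still contributes to `pod_startup` and `run_to_watch`, so an expired scheduling Event only removes that Pod from the schedule-related transitions.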