[TOC]

### **Metrics**

The Kubernetes apiserver exposes the following metric, which records response-time data for every HTTP request it serves. The excerpt below shows only the GET and LIST requests for Node; data for other resource types (such as Pod) and other request types (such as POST) is omitted:

```
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.05"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.1"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.15"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.2"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.25"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.3"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.35"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.4"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.45"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.5"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.6"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.7"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.8"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.9"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="1"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="1.25"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="1.5"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="1.75"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="2"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="2.5"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="3"} 30980 
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="3.5"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="4"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="4.5"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="5"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="6"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="7"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="8"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="9"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="10"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="15"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="20"} 30980 
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="25"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="30"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="40"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="50"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="60"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="+Inf"} 30980 apiserver_request_duration_seconds_sum{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1"} 7.231600085000039 apiserver_request_duration_seconds_count{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1"} 30980 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.05"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.1"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.15"} 48 
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.2"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.25"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.3"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.35"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.4"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.45"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.5"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.6"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.7"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.8"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.9"} 48 
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="1"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="1.25"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="1.5"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="1.75"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="2"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="2.5"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="3"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="3.5"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="4"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="4.5"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="5"} 48 
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="6"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="7"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="8"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="9"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="10"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="15"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="20"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="25"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="30"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="40"} 48 apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="50"} 48 
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="60"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="+Inf"} 48
apiserver_request_duration_seconds_sum{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1"} 0.14604355200000002
apiserver_request_duration_seconds_count{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1"} 48
...
```

Next, we want to compute the following:

* For apiserver GET requests with `resource=nodes`: the latency, in seconds, that 99% of requests stay at or below
* For all apiserver requests: the latency, in seconds, that 99% of requests stay at or below
* For apiserver GET requests with `resource=nodes`: the percentage of requests that complete within 0.1 seconds
* For all apiserver requests: the percentage of requests that complete within 0.1 seconds

### **GET requests for `resource=nodes`: the latency that 99% of requests stay within**

The built-in function computes this directly:

```
histogram_quantile(0.99, apiserver_request_duration_seconds_bucket{resource="nodes",verb="GET"})
```

As the screenshot shows, for requests whose resource type is node (`resource="nodes"`) and whose request type is GET (`verb="GET"`), 99% of requests have a response time of at most 0.0495 seconds. This value is an estimate: every recorded request fell into the first bucket (`le="0.05"`), and `histogram_quantile` interpolates linearly inside a bucket, which gives 0.99 × 0.05 = 0.0495.

![](https://img.kancloud.cn/67/19/67197ac24475cee48e5bc43943f6e28c_1366x489.png)

Note that the query above covers every matching request from the moment the apiserver started until the query runs. To cover only the last 10 minutes, use the following instead (see [the official documentation for histogram_quantile](https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile)):

```
histogram_quantile(0.99, rate(apiserver_request_duration_seconds_bucket{resource="nodes",verb="GET"}[10m]))
```

### **All apiserver requests: the latency that 99% of requests stay within**

The queries above cover only GET requests for nodes. To cover all requests, aggregate across series, as follows (see [the official documentation for histogram_quantile](https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile)):

```
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{}[10m])) by (le))
```

With multiple apiservers, the query above aggregates the requests of all apiserver instances together. To aggregate per apiserver instance instead, use:

```
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{}[10m])) by (le, instance))
```

### **GET requests for `resource=nodes`: the percentage that complete within 0.1 seconds**

Reference: https://prometheus.io/docs/practices/histograms/

```
# Note: a bucket with le="0.1" must exist.
# The denominator uses the same label matchers as the numerator, so the
# result is the share among nodes GET requests rather than among all requests.
sum(rate(apiserver_request_duration_seconds_bucket{le="0.1",resource="nodes",verb="GET"}[10m]))
/
sum(rate(apiserver_request_duration_seconds_count{resource="nodes",verb="GET"}[10m]))
```

The result is a fraction between 0 and 1; multiply it by 100 to get a percentage.

### **All apiserver requests: the percentage that complete within 0.1 seconds**

```
sum(rate(apiserver_request_duration_seconds_bucket{le="0.1"}[10m]))
/
sum(rate(apiserver_request_duration_seconds_count[10m]))
```

### **Summary**

```
# Latency that 99% of requests stay within (GET, nodes, all apiservers, last 10 minutes)
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{resource="nodes",verb="GET"}[10m])) by (le))

# Latency that 99% of requests stay within (GET, nodes, per apiserver, last 10 minutes)
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{resource="nodes",verb="GET"}[10m])) by (le, instance))

# Fraction of requests within 0.1 seconds (GET, nodes, all apiservers, last 10 minutes)
# The denominator must carry the same label matchers as the numerator.
sum(rate(apiserver_request_duration_seconds_bucket{le="0.1",resource="nodes",verb="GET"}[10m]))
/
sum(rate(apiserver_request_duration_seconds_count{resource="nodes",verb="GET"}[10m]))

# Fraction of requests within 0.1 seconds (GET, nodes, per apiserver, last 10 minutes)
sum(rate(apiserver_request_duration_seconds_bucket{le="0.1",resource="nodes",verb="GET"}[10m])) by (instance)
/
sum(rate(apiserver_request_duration_seconds_count{resource="nodes",verb="GET"}[10m])) by (instance)
```

### **References**

* https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile
* https://prometheus.io/docs/practices/histograms/
* https://cloud.tencent.com/developer/news/319419
* https://zhuanlan.zhihu.com/p/76904793
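
### **Appendix: where the 0.0495 figure comes from**

The 0.0495-second result reported for nodes GET requests can be reproduced by hand. The sketch below is a simplified Python model of the linear interpolation that `histogram_quantile` applies inside a bucket, not Prometheus's actual implementation. The function name `estimate_quantile`, the abbreviated bucket list, and the second, made-up histogram are all assumptions introduced for illustration; the bucket counts in `nodes_get` are taken from the metric dump at the top of this page.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) pairs.

    Simplified model: buckets are sorted by upper bound, counts are
    cumulative, and the last bucket is +Inf. The lowest bucket is assumed
    to start at 0.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total  # how many observations lie at or below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the highest
                # finite upper bound instead of interpolating.
                return prev_bound
            # Interpolate linearly between the bucket's lower and upper bound.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Abbreviated from the dump above: every nodes GET request is in le="0.05".
nodes_get = [(0.05, 30980), (0.1, 30980), (math.inf, 30980)]
print(estimate_quantile(0.99, nodes_get))  # approximately 0.0495

# Hypothetical histogram (not from the source) where the quantile lands
# in the second bucket.
hypothetical = [(0.05, 90), (0.1, 100), (math.inf, 100)]
print(estimate_quantile(0.95, hypothetical))  # approximately 0.075
```

Because all 30980 requests sit in the first bucket, the estimate is simply 0.99 of that bucket's width. The true 99th percentile could be much lower; a histogram cannot resolve anything finer than its bucket boundaries.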