[TOC]
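### **Metrics**
The Kubernetes apiserver exposes the histogram metric below, which records the response time of every HTTP request it serves. Only the GET and LIST requests for Node resources are shown here; the series for other resource types (such as Pod) and other verbs (such as POST) are omitted: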
```
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.05"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.1"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.15"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.2"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.25"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.3"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.35"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.4"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.45"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.5"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.6"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.7"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.8"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="0.9"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="1"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="1.25"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="1.5"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="1.75"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="2"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="2.5"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="3"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="3.5"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="4"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="4.5"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="5"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="6"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="7"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="8"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="9"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="10"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="15"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="20"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="25"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="30"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="40"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="50"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="60"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1",le="+Inf"} 30980
apiserver_request_duration_seconds_sum{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1"} 7.231600085000039
apiserver_request_duration_seconds_count{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="GET",version="v1"} 30980
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.05"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.1"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.15"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.2"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.25"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.3"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.35"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.4"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.45"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.5"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.6"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.7"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.8"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="0.9"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="1"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="1.25"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="1.5"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="1.75"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="2"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="2.5"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="3"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="3.5"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="4"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="4.5"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="5"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="6"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="7"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="8"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="9"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="10"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="15"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="20"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="25"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="30"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="40"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="50"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="60"} 48
apiserver_request_duration_seconds_bucket{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1",le="+Inf"} 48
apiserver_request_duration_seconds_sum{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1"} 0.14604355200000002
apiserver_request_duration_seconds_count{component="apiserver",dry_run="",group="",resource="nodes",scope="cluster",subresource="",verb="LIST",version="v1"} 48
...
```
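Histogram buckets are cumulative: each `le` bucket counts every request that took at most that many seconds, so the `le="+Inf"` bucket always equals `_count`. The following is a minimal Python sketch, using the nodes GET values copied from the sample above (the variable names are illustrative only), that checks this and derives the mean latency from `_sum` and `_count`:

```python
# Values copied from the nodes GET sample above.
duration_sum = 7.231600085000039   # apiserver_request_duration_seconds_sum
request_count = 30980              # apiserver_request_duration_seconds_count
inf_bucket = 30980                 # apiserver_request_duration_seconds_bucket{le="+Inf"}

# Buckets are cumulative, so the +Inf bucket must equal the total count.
assert inf_bucket == request_count

# Mean latency over the lifetime of the counters, in seconds.
mean_latency = duration_sum / request_count
print(f"mean latency: {mean_latency * 1000:.3f} ms")
```

The mean comes out at roughly 0.23 ms, which is why every request already lands in the smallest bucket (`le="0.05"`).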
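Next, we compute the following:
* For GET requests on `resource=nodes`, the latency that 99% of requests stay within
* For all apiserver requests, the latency that 99% of requests stay within
* For GET requests on `resource=nodes`, the percentage of requests that finish within 0.1 seconds
* For all apiserver requests, the percentage of requests that finish within 0.1 seconds
### **For GET requests on `resource=nodes`, what latency do 99% of requests stay within?**
The built-in function `histogram_quantile` computes this directly: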
```
histogram_quantile(0.99, apiserver_request_duration_seconds_bucket{resource="nodes",verb="GET"})
```
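As the result shows, for requests on the node resource type (`resource="nodes"`) with the GET verb (`verb="GET"`), 99% of requests have a response time of at most 0.0495 seconds. This value is an estimate: every recorded request falls into the first bucket (`le="0.05"`), and `histogram_quantile` interpolates linearly inside that bucket, which gives 0.99 × 0.05 = 0.0495.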
![](https://img.kancloud.cn/67/19/67197ac24475cee48e5bc43943f6e28c_1366x489.png)
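To make the 0.0495 figure less mysterious, here is a simplified Python sketch of the linear interpolation that `histogram_quantile` performs on classic histogram buckets. It is an illustration rather than Prometheus's actual implementation: the function name and the input shape (a list of `(upper_bound, cumulative_count)` pairs) are assumptions, the lowest bucket is assumed to start at 0, and only a few of the sample buckets are listed.

```python
import math

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    total = buckets[-1][1]          # the +Inf bucket holds the total count
    if total == 0:
        return math.nan
    rank = q * total                # how many observations lie below the quantile
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count

# A subset of the nodes GET buckets from the sample: every bucket holds 30980.
nodes_get = [(0.05, 30980), (0.1, 30980), (0.15, 30980), (math.inf, 30980)]
p99 = histogram_quantile(0.99, nodes_get)
print(round(p99, 4))
```

Because all observations sit in the first bucket, the result depends only on that bucket's upper bound, not on the true latencies, so the real 99th percentile may be far below 0.0495 seconds.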
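Note that the PromQL above covers every request of this type from the moment the apiserver started until the moment the query runs. To cover only the last 10 minutes, use the query below instead (see the [official documentation of histogram_quantile](https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile)):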
```
histogram_quantile(0.99, rate(apiserver_request_duration_seconds_bucket{resource="nodes",verb="GET"}[10m]))
```
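The effect of `rate(...[10m])` can be pictured as taking the difference between the cumulative bucket counts at the start and at the end of the window, then dividing by the window length. The Python sketch below shows that idea with made-up counter values; it is a simplification that ignores counter resets and the extrapolation Prometheus applies at the window edges, and the helper name `bucket_rates` is hypothetical.

```python
WINDOW_SECONDS = 600  # corresponds to the [10m] range selector

# Hypothetical cumulative bucket counts at the start and end of the window.
start = {"0.05": 30000, "0.1": 30010, "+Inf": 30020}
end = {"0.05": 30950, "0.1": 30970, "+Inf": 30980}

def bucket_rates(start_counts, end_counts, window):
    """Per-second increase of each bucket over the window."""
    return {le: (end_counts[le] - start_counts[le]) / window for le in end_counts}

rates = bucket_rates(start, end, WINDOW_SECONDS)
for le, value in rates.items():
    print(f'le="{le}": {value:.3f} req/s')
```

The resulting per-bucket rates are still cumulative across `le`, so `histogram_quantile` can be applied to them exactly as it is applied to the raw counters.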
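### **For all apiserver requests, what latency do 99% of requests stay within?**
The query above only covers GET requests on nodes. To cover all requests, the buckets must be aggregated first, as follows (see the [official documentation of histogram_quantile](https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile)):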
```
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{}[10m])) by (le))
```
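The `sum(...) by (le)` step adds up the bucket rates of every series while keeping the `le` label, which `histogram_quantile` needs in order to tell the buckets apart. A small Python sketch of that grouping, with made-up series and rates (the tuple layout and the `sum_by_le` helper are assumptions for illustration):

```python
from collections import defaultdict

# Hypothetical bucket rates: (labels, le, requests per second).
series = [
    ({"resource": "nodes", "verb": "GET"}, "0.05", 1.5),
    ({"resource": "nodes", "verb": "GET"}, "+Inf", 1.6),
    ({"resource": "pods", "verb": "LIST"}, "0.05", 0.2),
    ({"resource": "pods", "verb": "LIST"}, "+Inf", 0.4),
]

def sum_by_le(samples):
    """Drop every label except le and add up the values per bucket."""
    totals = defaultdict(float)
    for _labels, le, value in samples:
        totals[le] += value
    return dict(totals)

merged = sum_by_le(series)
print(merged)
```

Summing without `by (le)` would collapse all buckets into a single number, leaving `histogram_quantile` with no bucket boundaries to work with.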
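With multiple apiservers, the PromQL above aggregates the requests of all apiserver instances together. To aggregate per apiserver instance, add `instance` to the `by` clause, as in the query below (this example is again restricted to GET requests on nodes; drop the label selector to cover all requests):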
```
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{resource="nodes",verb="GET"}[10m])) by (le, instance))
```
### **For GET requests on `resource=nodes`, what percentage of requests finish within 0.1 seconds?**
Reference: https://prometheus.io/docs/practices/histograms/
```
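# Note: a bucket with le="0.1" must exist.
# The denominator must use the same label selector as the numerator.
sum(rate(apiserver_request_duration_seconds_bucket{le="0.1",resource="nodes",verb="GET"}[10m])) / sum(rate(apiserver_request_duration_seconds_count{resource="nodes",verb="GET"}[10m]))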
```
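The ratio is only meaningful when the numerator and the denominator select the same set of requests. The Python sketch below illustrates this with the lifetime counter values from the sample output in the Metrics section (it uses raw counters rather than 10-minute rates, and the variable names are illustrative):

```python
# Lifetime counter values copied from the sample output in the Metrics section.
nodes_get_le_01 = 30980   # _bucket{resource="nodes",verb="GET",le="0.1"}
nodes_get_count = 30980   # _count{resource="nodes",verb="GET"}
nodes_list_count = 48     # _count{resource="nodes",verb="LIST"}

# Same selector on both sides: the fraction of nodes GET requests within 0.1 s.
matched = nodes_get_le_01 / nodes_get_count

# Broader denominator (GET plus LIST): the ratio no longer describes GET requests.
mismatched = nodes_get_le_01 / (nodes_get_count + nodes_list_count)

print(f"matching selectors:     {matched:.4f}")
print(f"unfiltered denominator: {mismatched:.4f}")
```

With a denominator that also counts other resources and verbs, the result drops even though every nodes GET request was served within 0.1 seconds.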
### **For all apiserver requests, what percentage of requests finish within 0.1 seconds?**
```
sum(rate(apiserver_request_duration_seconds_bucket{le="0.1"}[10m])) / sum(rate(apiserver_request_duration_seconds_count[10m]))
```
### **Summary**
```
# Latency that 99% of requests stay within (GET, nodes, all apiservers combined, last 10 minutes)
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{resource="nodes",verb="GET"}[10m])) by (le))
# Latency that 99% of requests stay within (GET, nodes, per apiserver instance, last 10 minutes)
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{resource="nodes",verb="GET"}[10m])) by (le, instance))
# Fraction of requests within 0.1 seconds (GET, nodes, all apiservers combined, last 10 minutes)
# The numerator and the denominator must use the same label selector.
sum(rate(apiserver_request_duration_seconds_bucket{le="0.1",resource="nodes",verb="GET"}[10m])) / sum(rate(apiserver_request_duration_seconds_count{resource="nodes",verb="GET"}[10m]))
# Fraction of requests within 0.1 seconds (GET, nodes, per apiserver instance, last 10 minutes)
sum(rate(apiserver_request_duration_seconds_bucket{le="0.1",resource="nodes",verb="GET"}[10m])) by (instance) / sum(rate(apiserver_request_duration_seconds_count{resource="nodes",verb="GET"}[10m])) by (instance)
```
### **References**
* https://prometheus.io/docs/prometheus/latest/querying/functions/#histogram_quantile
* https://prometheus.io/docs/practices/histograms/
* https://cloud.tencent.com/developer/news/319419
* https://zhuanlan.zhihu.com/p/76904793