[TOC]

The previous chapters introduced the AlertManager API. In this section we use PostMan to simulate Prometheus sending alerts to AlertManager, and we receive the notifications that AlertManager emits with a program we wrote ourselves.

The experiments in this article verify AlertManager's Group mechanism and the effect of three of its configuration parameters: `group_wait`, `group_interval`, and `repeat_interval`.

## **Starting the custom program**

Start our own program. It listens on port 10000 and exposes a `POST /webhook` API that receives AlertManager's notifications. The source code is in the appendix.

## **The Group mechanism**

Configure AlertManager as follows, then start it:

```
route:
  group_by: ["alertname"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'webhook'
receivers:
- name: webhook
  webhook_configs:
  - url: http://192.168.2.101:10000/webhook
    send_resolved: true
```

Next, use PostMan to call the AlertManager API `POST /api/v2/alerts` and send one alert with the body shown below. Take care with the timestamps: `StartsAt` may be a time in the past, while `EndsAt` should be at least one hour later than the moment you send the request from PostMan.

```
[
  {
    "Labels": {
      "alertname": "NodeCpuPressure",
      "IP": "192.168.2.101"
    },
    "Annotations": {
      "summary": "NodeCpuPressure, IP: 192.168.2.101, Value: 90%, Threshold: 85%"
    },
    "StartsAt": "2020-02-17T23:00:00.000+08:00",
    "EndsAt": "2020-02-18T23:00:00.000+08:00"
  }
]
```

Then query AlertManager for the Alerts (`GET /api/v2/alerts`) and the Groups (`GET /api/v2/alerts/groups`). You can call these endpoints directly from a browser or with curl on the command line.

The Alerts query returns:

```
[
  {
    "annotations": {
      "summary": "NodeCpuPressure, IP: 192.168.2.101, Value: 90%, Threshold: 85%"
    },
    "endsAt": "2020-02-18T23:00:00.000+08:00",
    "fingerprint": "27e1a08813b1ec3b",
    "receivers": [
      {
        "name": "webhook"
      }
    ],
    "startsAt": "2020-02-17T23:00:00.000+08:00",
    "status": {
      "inhibitedBy": [],
      "silencedBy": [],
      "state": "active"
    },
    "updatedAt": "2020-02-17T23:38:38.610+08:00",
    "labels": {
      "IP": "192.168.2.101",
      "alertname": "NodeCpuPressure"
    }
  }
]
```

The Groups query returns:

```
[
  {
    "alerts": [
      {
        "annotations": {
          "summary": "NodeCpuPressure, IP: 192.168.2.101, Value: 90%, Threshold: 85%"
        },
        "endsAt": "2020-02-18T23:00:00.000+08:00",
        "fingerprint": "27e1a08813b1ec3b",
        "receivers": [
          {
            "name": "webhook"
          }
        ],
        "startsAt": "2020-02-17T23:00:00.000+08:00",
        "status": {
          "inhibitedBy": [],
          "silencedBy": [],
          "state": "active"
        },
        "updatedAt": "2020-02-17T23:38:38.610+08:00",
        "labels": {
          "IP": "192.168.2.101",
          "alertname": "NodeCpuPressure"
        }
      }
    ],
    "labels": {
      "alertname": "NodeCpuPressure"
    },
    "receiver": {
      "name": "webhook"
    }
  }
]
```

AlertManager has automatically created a Group whose labels are `{alertname=NodeCpuPressure}`, and that Group contains the alert we just sent.

Now send a second Alert with the following content:

```
[
  {
    "Labels": {
      "alertname": "NodeMemoryPressure",
      "IP": "192.168.2.101"
    },
    "Annotations": {
      "summary": "NodeMemoryPressure, IP: 192.168.2.101, Value: 90%, Threshold: 85%"
    },
    "StartsAt": "2020-02-17T23:00:00.000+08:00",
    "EndsAt": "2020-02-18T23:00:00.000+08:00"
  }
]
```

Query the Groups again. The result below shows that a second Group has been created, with labels `{alertname=NodeMemoryPressure}`:

```
[
  {
    "alerts": [
      {
        "annotations": {
          "summary": "NodeCpuPressure, IP: 192.168.2.101, Value: 90%, Threshold: 85%"
        },
        "endsAt": "2020-02-18T23:00:00.000+08:00",
        "fingerprint": "27e1a08813b1ec3b",
        "receivers": [
          {
            "name": "webhook"
          }
        ],
        "startsAt": "2020-02-17T23:00:00.000+08:00",
        "status": {
          "inhibitedBy": [],
          "silencedBy": [],
          "state": "active"
        },
        "updatedAt": "2020-02-17T23:38:38.610+08:00",
        "labels": {
          "IP": "192.168.2.101",
          "alertname": "NodeCpuPressure"
        }
      }
    ],
    "labels": {
      "alertname": "NodeCpuPressure"
    },
    "receiver": {
      "name": "webhook"
    }
  },
  {
    "alerts": [
      {
        "annotations": {
          "summary": "NodeMemoryPressure, IP: 192.168.2.101, Value: 90%, Threshold: 85%"
        },
        "endsAt": "2020-02-18T23:00:00.000+08:00",
        "fingerprint": "1a354c7333c5b062",
        "receivers": [
          {
            "name": "webhook"
          }
        ],
        "startsAt": "2020-02-17T23:00:00.000+08:00",
        "status": {
          "inhibitedBy": [],
          "silencedBy": [],
          "state": "active"
        },
        "updatedAt": "2020-02-17T23:41:27.790+08:00",
        "labels": {
          "IP": "192.168.2.101",
          "alertname": "NodeMemoryPressure"
        }
      }
    ],
    "labels": {
      "alertname": "NodeMemoryPressure"
    },
    "receiver": {
      "name": "webhook"
    }
  }
]
```

At this point, send the following "resolved" alert, that is, the same alert with `EndsAt` set to a time in the past:

```
[
  {
    "Labels": {
      "alertname": "NodeCpuPressure",
      "IP": "192.168.2.101"
    },
    "Annotations": {
      "summary": "NodeCpuPressure, IP: 192.168.2.101, Value: 90%, Threshold: 85%"
    },
    "StartsAt": "2020-02-17T23:00:00.000+08:00",
    "EndsAt": "2020-02-17T23:01:00.000+08:00"
  }
]
```

Query the Alerts and Groups once more. Only one of each remains:

```
[
  {
    "alerts": [
      {
        "annotations": {
          "summary": "NodeMemoryPressure, IP: 192.168.2.101, Value: 90%, Threshold: 85%"
        },
        "endsAt": "2020-02-18T23:00:00.000+08:00",
        "fingerprint": "1a354c7333c5b062",
        "receivers": [
          {
            "name": "webhook"
          }
        ],
        "startsAt": "2020-02-17T23:00:00.000+08:00",
        "status": {
          "inhibitedBy": [],
          "silencedBy": [],
          "state": "active"
        },
        "updatedAt": "2020-02-17T23:41:27.790+08:00",
        "labels": {
          "IP": "192.168.2.101",
          "alertname": "NodeMemoryPressure"
        }
      }
    ],
    "labels": {
      "alertname": "NodeMemoryPressure"
    },
    "receiver": {
      "name": "webhook"
    }
  }
]
```

## **group_wait**

Stop AlertManager, empty its data directory, and start it again with the same configuration as above. AlertManager now holds no Alerts and no Groups.

Next, send AlertManager one alert with the following content:

```
[
  {
    "Labels": {
      "alertname": "NodeCpuPressure",
      "IP": "192.168.2.101"
    },
    "Annotations": {
      "summary": "NodeCpuPressure, IP: 192.168.2.101, Value: 90%, Threshold: 85%"
    },
    "StartsAt": "2020-02-17T23:00:00.000+08:00",
    "EndsAt": "2020-02-18T23:00:00.000+08:00"
  }
]
```

Then, within 30 seconds, call the API again to send the following Alert:

```
[
  {
    "Labels": {
      "alertname": "NodeCpuPressure",
      "IP": "192.168.2.102"
    },
    "Annotations": {
      "summary": "NodeCpuPressure, IP: 192.168.2.102, Value: 95%, Threshold: 85%"
    },
    "StartsAt": "2020-02-17T23:00:00.000+08:00",
    "EndsAt": "2020-02-18T23:00:00.000+08:00"
  }
]
```

Both alerts carry `alertname=NodeCpuPressure`, so they fall into the same Group. About 30 seconds after the first alert was sent (the configured `group_wait`), our own program receives the notification, with the following content:

```
```

## **Appendix**

webhook-receiver.go

```
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

// MyHandler prints the body of every request it receives.
type MyHandler struct{}

func (am *MyHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
	body, err := io.ReadAll(r.Body)
	if err != nil {
		fmt.Printf("read body err, %v\n", err)
		http.Error(w, "failed to read body", http.StatusBadRequest)
		return
	}
	// Print the arrival time so the interval between notifications can be measured.
	fmt.Println(time.Now())
	fmt.Printf("%s\n\n", string(body))
}

func main() {
	http.Handle("/webhook", &MyHandler{})
	// ListenAndServe only returns on failure, e.g. when port 10000 is already in use.
	log.Fatal(http.ListenAndServe(":10000", nil))
}
```