[TOC]

> Note:
> - Prometheus runs as a regular (non-root) user, so mind the owner of any files you create
> - The target configuration in the Prometheus files must stay identical across all nodes

## Static Monitoring

```yaml
- job_name: "Prometheus"
  static_configs:
    - targets:
        - "localhost:9090"
```

## File-Based Service Discovery

1. Define the scrape job

    ```yaml
    - job_name: "node-exporter"
      file_sd_configs:
        - files:
            - "targets/node-exporter.yml"
          # Re-read the target files at this interval
          refresh_interval: 1m
    ```

2. Create the target file

    ```shell
    mkdir /data/prometheus/targets
    cat <<-EOF | sudo tee /data/prometheus/targets/node-exporter.yml > /dev/null
    - targets:
      - 192.168.31.103:9100
      - 192.168.31.79:9100
      - 192.168.31.95:9100
      - 192.168.31.78:9100
      - 192.168.31.253:9100
    EOF
    chown -R ops. /data/prometheus
    ```

3. Hot-reload the configuration

    ```shell
    sudo systemctl reload prometheus
    ```

4. Sync the files to the other nodes

    ```shell
    # Main config file and the file-discovery directory
    cd /data/prometheus && scp -r prometheus.yml targets ops@k8s-master02:/data/prometheus
    # Adjust the node-specific label (the replica external label)
    ssh ops@k8s-master02 "sed -ri 's@(replica): .*@\1: B@g' /data/prometheus/prometheus.yml"
    # Validate the config file
    ssh ops@k8s-master02 "promtool check config /data/prometheus/prometheus.yml"
    # Hot-reload the configuration
    ssh ops@k8s-master02 "sudo systemctl reload prometheus"
    ```

## Kubernetes Service Discovery

> Because Thanos is deployed from binaries, a ServiceAccount with the required monitoring permissions has to be created in the Kubernetes cluster first.

1. Create the permissions Prometheus needs to monitor the cluster (run on a k8s master node)

    ```yaml
    cat <<-EOF | kubectl apply -f -
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: prometheus
    rules:
    - apiGroups:
      - ""
      resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - "extensions"
      resources:
      - ingresses
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - configmaps
      - nodes/metrics
      verbs:
      - get
    - nonResourceURLs:
      - /metrics
      verbs:
      - get
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: prometheus
      namespace: kube-system
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: prometheus
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus
    subjects:
    - kind: ServiceAccount
      name: prometheus
      namespace: kube-system
    EOF
    ```

2. Extract the token for monitoring Kubernetes (run on a k8s master node; for Kubernetes 1.24+ see the note after this list)

    ```shell
    kubectl -n kube-system get secret $(kubectl -n kube-system get sa prometheus -o jsonpath={.secrets[0].name}) -o jsonpath={.data.token} | base64 --decode > /data/prometheus/token
    ```

3. Example scrape job (on the Thanos node)

    ```yaml
    - job_name: "Service/kube-apiserver"
      scheme: https
      tls_config:
        insecure_skip_verify: true
      # Token extracted in the previous step
      bearer_token_file: /data/prometheus/token
      kubernetes_sd_configs:
        - role: endpoints
          # Entry point for the cluster API
          api_server: https://192.168.31.100:6443
          tls_config:
            insecure_skip_verify: true
          bearer_token_file: /data/prometheus/token
      relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
    ```

4. Hot-reload the configuration

    ```shell
    sudo systemctl reload prometheus
    ```

5. Sync the files to the other nodes

    ```shell
    # Main config file and the file-discovery directory
    cd /data/prometheus && scp -r prometheus.yml targets ops@k8s-master02:/data/prometheus
    # Adjust the node-specific label (the replica external label)
    ssh ops@k8s-master02 "sed -ri 's@(replica): .*@\1: B@g' /data/prometheus/prometheus.yml"
    # Validate the config file
    ssh ops@k8s-master02 "promtool check config /data/prometheus/prometheus.yml"
    # Hot-reload the configuration
    ssh ops@k8s-master02 "sudo systemctl reload prometheus"
    ```
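The `kubectl get secret` command in step 2 relies on the token Secret that Kubernetes auto-creates for a ServiceAccount, which no longer happens as of Kubernetes 1.24. On newer clusters you can request a token explicitly instead; a minimal sketch, where the `--duration` value is an arbitrary example and may be capped by the API server:

```shell
# Kubernetes >= 1.24: SA token secrets are no longer auto-created,
# so mint a token for the prometheus ServiceAccount directly.
# The --duration below is only an example value.
kubectl -n kube-system create token prometheus --duration=8760h > /data/prometheus/token
chown ops. /data/prometheus/token
```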
## Monitoring Kubernetes (Full Version)

> The certificates, token, file-discovery directory, and so on referenced below must be created or copied by hand; this is only an example of the main configuration file.

```yaml
scrape_configs:
  # File-based service discovery
  - job_name: "node-exporter"
    file_sd_configs:
      - files:
          - "targets/node-exporter.yml"
        # Re-read the target files at this interval
        refresh_interval: 1m
    metric_relabel_configs:
      # Strip the port (9100) so the instance label is just the host address
      - source_labels: [__address__]
        action: replace
        regex: (.*):9100
        target_label: instance
        replacement: $1

  # Kubernetes service discovery
  - job_name: "Service/kube-apiserver"
    scheme: https
    tls_config:
      insecure_skip_verify: true
    # Token created as described in the previous section
    bearer_token_file: /data/prometheus/token
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

  - job_name: "Service/kube-controller-manager"
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /data/prometheus/token
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      # Keep only nodes that carry the master role label
      - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
        action: keep
        regex: true
      # Rewrite the kubelet port (10250) to the controller-manager port (10257)
      - source_labels: [__address__]
        action: replace
        regex: (.*):10250
        target_label: __address__
        replacement: $1:10257

  - job_name: "Service/kube-scheduler"
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /data/prometheus/token
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      # Keep only nodes that carry the master role label
      - source_labels: [__meta_kubernetes_node_labelpresent_node_role_kubernetes_io_master]
        action: keep
        regex: true
      # Rewrite the kubelet port (10250) to the scheduler port (10259)
      - source_labels: [__address__]
        action: replace
        regex: (.*):10250
        target_label: __address__
        replacement: $1:10259

  - job_name: "Service/kubelet"
    scheme: https
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /data/prometheus/token
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token

  - job_name: "Service/kube-proxy"
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      # Rewrite the kubelet port (10250) to the kube-proxy metrics port (10249)
      - source_labels: [__address__]
        action: replace
        regex: (.*):10250
        target_label: __address__
        replacement: $1:10249

  - job_name: "Service/etcd"
    scheme: https
    tls_config:
      ca_file: targets/certs/ca.pem
      cert_file: targets/certs/etcd.pem
      key_file: targets/certs/etcd-key.pem
      insecure_skip_verify: true
    file_sd_configs:
      - files:
          - targets/etcd.yml

  - job_name: "Service/calico"
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      # Rewrite the kubelet port (10250) to the Felix metrics port (9091)
      - source_labels: [__address__]
        action: replace
        regex: (.*):10250
        target_label: __address__
        replacement: $1:9091

  - job_name: "Service/coredns"
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: kube-system;kube-dns;metrics

  - job_name: "Service/ingress-nginx"
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: ingress-nginx;ingress-nginx-metrics;metrics

  - job_name: "kube-state-metrics"
    kubernetes_sd_configs:
      - role: endpoints
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_endpoints_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: kube-system;kube-state-metrics;http-metrics

  - job_name: "service-http-probe"
    scrape_interval: 1m
    metrics_path: /probe
    # Use the http_2xx module from the blackbox exporter config
    # (an example annotated Service follows this config block)
    params:
      module: [http_2xx]
    kubernetes_sd_configs:
      - role: service
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      # Keep only services annotated with prometheus.io/scrape: "true" and prometheus.io/http-probe: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_service_annotation_prometheus_io_http_probe]
        action: keep
        regex: true;true
      # Rename __meta_kubernetes_service_name to service_name
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        regex: (.*)
        target_label: service_name
      # Rename __meta_kubernetes_namespace to namespace
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        regex: (.*)
        target_label: namespace
      # Build the probe target as `clusterIP:port/path`
      - source_labels: [__meta_kubernetes_service_cluster_ip, __meta_kubernetes_service_annotation_prometheus_io_http_probe_port, __meta_kubernetes_service_annotation_prometheus_io_http_probe_path]
        action: replace
        regex: (.*);(.*);(.*)
        target_label: __param_target
        replacement: $1:$2$3
      # Use the probe target as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Point __address__ at the blackbox exporter itself
      - target_label: __address__
        replacement: blackbox-exporter:9115

  - job_name: "service-tcp-probe"
    scrape_interval: 1m
    metrics_path: /probe
    # Use the tcp_connect module from the blackbox exporter config
    params:
      module: [tcp_connect]
    kubernetes_sd_configs:
      - role: service
        api_server: https://192.168.31.100:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /data/prometheus/token
    relabel_configs:
      # Keep only services annotated with prometheus.io/scrape: "true" and prometheus.io/tcp-probe: "true"
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe]
        action: keep
        regex: true;true
      # Rename __meta_kubernetes_service_name to service_name
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        regex: (.*)
        target_label: service_name
      # Rename __meta_kubernetes_namespace to namespace
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        regex: (.*)
        target_label: namespace
      # Build the probe target as `clusterIP:port`
      - source_labels: [__meta_kubernetes_service_cluster_ip, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe_port]
        action: replace
        regex: (.*);(.*)
        target_label: __param_target
        replacement: $1:$2
      # Use the probe target as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Point __address__ at the blackbox exporter itself
      - target_label: __address__
        replacement: blackbox-exporter:9115
```
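For reference, here is a hypothetical Service manifest that the `service-http-probe` job above would discover. Prometheus maps annotation names to meta labels by replacing non-alphanumeric characters with underscores, so an annotation such as `prometheus.io/http-probe-port` surfaces as `__meta_kubernetes_service_annotation_prometheus_io_http_probe_port`; every name and port below is a placeholder:

```yaml
# Hypothetical example: a Service annotated for the blackbox HTTP probe.
apiVersion: v1
kind: Service
metadata:
  name: demo-web                             # placeholder name
  namespace: default
  annotations:
    prometheus.io/scrape: "true"             # matched by the keep rule (regex: true;true)
    prometheus.io/http-probe: "true"         # opts in to the http_2xx probe job
    prometheus.io/http-probe-port: "8080"    # becomes $2 in `replacement: $1:$2$3`
    prometheus.io/http-probe-path: "/healthz" # becomes $3 in `replacement: $1:$2$3`
spec:
  selector:
    app: demo-web
  ports:
    - name: http
      port: 8080
```

With the relabeling above, this Service yields a probe target of `<clusterIP>:8080/healthz`, while the actual scrape request goes to `blackbox-exporter:9115`. The TCP variant works the same way with `prometheus.io/tcp-probe: "true"` and `prometheus.io/tcp-probe-port`.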