First steps · Prometheus official documentation (Chinese translation)

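# **First steps with Prometheus**

Welcome to Prometheus! Prometheus is a monitoring platform that collects metrics from monitored targets by scraping their HTTP endpoints. This guide shows you how to install and configure Prometheus and how to monitor your first resource with it. You will download, install, and run Prometheus. You will also download and install an exporter, a tool that exposes time series data about hosts and services. Our first exporter will be Prometheus itself, which provides a wide variety of metrics about its own memory usage, garbage collection, and more.

## **Downloading Prometheus**

[Download the latest release](https://prometheus.io/download) of Prometheus, then extract it:

```
tar xvfz prometheus-*.tar.gz
cd prometheus-*
```

The Prometheus server is a single binary named `prometheus` (or `prometheus.exe` on Microsoft Windows). Run the binary with the `--help` flag to see help on its options.

~~~bash
./prometheus --help
usage: prometheus [<flags>]

The Prometheus monitoring server
~~~

Before starting Prometheus, let's configure it.

## **Configuration**

Prometheus configuration is written in YAML. The download ships with a sample configuration file named `prometheus.yml`. Most of the comments in the sample file have been removed here to keep it concise.

~~~yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
~~~

The sample configuration file contains three blocks: `global`, `rule_files`, and `scrape_configs`.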
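As a rough mental model of how these blocks combine, the sketch below mirrors the sample file as a plain Python dictionary and derives, for each statically configured target, the URL that would be scraped and the interval that applies. The helper `scrape_plan` is hypothetical and is not part of Prometheus. It assumes the defaults of `http` for `scheme`, `/metrics` for `metrics_path`, and `1m` for an unset global `scrape_interval`, and it ignores everything else a real configuration can contain.

~~~python
# Plain-dict mirror of the sample prometheus.yml (no YAML parser needed).
config = {
    "global": {"scrape_interval": "15s", "evaluation_interval": "15s"},
    "rule_files": [],
    "scrape_configs": [
        {
            "job_name": "prometheus",
            "static_configs": [{"targets": ["localhost:9090"]}],
        }
    ],
}


def scrape_plan(cfg):
    """Return (job, url, interval) for every statically configured target.

    Hypothetical helper for illustration only; not a Prometheus API.
    """
    global_interval = cfg.get("global", {}).get("scrape_interval", "1m")
    plan = []
    for job in cfg.get("scrape_configs", []):
        # A job-level scrape_interval overrides the global value.
        interval = job.get("scrape_interval", global_interval)
        scheme = job.get("scheme", "http")
        path = job.get("metrics_path", "/metrics")
        for static in job.get("static_configs", []):
            for target in static.get("targets", []):
                url = f"{scheme}://{target}{path}"
                plan.append((job["job_name"], url, interval))
    return plan


for job_name, url, interval in scrape_plan(config):
    print(f"{job_name}: scrape {url} every {interval}")
~~~

For the sample configuration this prints a single line for the `prometheus` job, pointing at `http://localhost:9090/metrics` with a 15s interval.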
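The `global` block controls the Prometheus server's global configuration. Two options are shown here.

* `scrape_interval`: controls how often Prometheus scrapes targets. You can override it for individual targets. In this example, the global setting is to scrape every 15 seconds.
* `evaluation_interval`: controls how often Prometheus evaluates rules. Prometheus uses rules to create new time series and to generate alerts.

The `rule_files` block specifies any rules we want the Prometheus server to load. For now we have no rules.

The last block, `scrape_configs`, controls which resources Prometheus monitors. Because Prometheus exposes data about itself through an HTTP endpoint, it can scrape and monitor its own health. The default configuration contains a single job, called `prometheus`, which scrapes the time series data exposed by the Prometheus server. The job contains a single, statically configured target: `localhost:9090`. Prometheus expects metrics to be available at the `/metrics` path, so this default job scrapes the URL `http://localhost:9090/metrics`.

The time series data returned describes the state and performance of the Prometheus server in detail.

For a complete specification of configuration options, see the [configuration documentation](https://prometheus.io/docs/operating/configuration).

## **Starting Prometheus**

Change to the directory containing the Prometheus binary and start Prometheus with the configuration file:

~~~bash
./prometheus --config.file=prometheus.yml
~~~

Prometheus should start up. You should also be able to browse to a status page about Prometheus itself at `http://localhost:9090`. Give it about 30 seconds to collect data about itself from its own HTTP metrics endpoint.

You can also verify that Prometheus is serving metrics about itself by visiting its metrics endpoint: `http://localhost:9090/metrics`.

## **Using the expression browser**

Let's look at some of the data Prometheus has collected about itself. Open `http://localhost:9090/graph` and choose the `Console` view within the `Graph` tab to use Prometheus's built-in expression browser.

As you can gather from `http://localhost:9090/metrics`, one metric that Prometheus exports about itself is named `promhttp_metric_handler_requests_total` (the total number of `/metrics` requests the Prometheus server has served). Enter it into the expression console:

~~~
promhttp_metric_handler_requests_total
~~~

This should return several different time series (along with the latest value recorded for each), all with the metric name `promhttp_metric_handler_requests_total` but with different labels. These labels designate different request statuses.

If we are only interested in requests that resulted in HTTP code 200, we can retrieve them with this query:

~~~
promhttp_metric_handler_requests_total{code="200"}
~~~

To count the number of returned time series, you can write:

~~~
count(promhttp_metric_handler_requests_total)
~~~

For more information, see the [expression language documentation](https://prometheus.io/docs/querying/basics/).

## **Using the graphing interface**

To graph expressions, open `http://localhost:9090/graph` and select the `Graph` tab. For example, enter the following expression to graph the per-second rate of HTTP requests to Prometheus itself that returned status code 200:

~~~
rate(promhttp_metric_handler_requests_total{code="200"}[1m])
~~~

## **Monitoring other targets**

Collecting metrics from Prometheus alone does not give a good picture of what Prometheus can do. To get a better sense of its capabilities, we recommend exploring the documentation of other exporters. Monitoring Linux or macOS host metrics with the `node exporter` is a good place to start.

## **Summary**

In this guide, you installed Prometheus, configured a Prometheus instance to monitor resources, and learned some basics of working with time series data in the Prometheus expression browser. To continue learning about Prometheus, check out the [Overview](https://prometheus.io/docs/introduction/overview) for some ideas about what to explore next.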
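As an offline recap of the selectors used in the expression browser, the sketch below parses a few lines in the text format served at `/metrics` and mimics the plain selector, the `{code="200"}` matcher, and `count(...)`. The sample text and its values are invented for illustration, `parse_samples` and `select` are hypothetical helpers, and the parser only handles simple lines (no escaped quotes in label values, no timestamps). It is not how Prometheus evaluates queries.

~~~python
import re

METRIC = "promhttp_metric_handler_requests_total"

# Made-up sample in the text exposition format; the values are illustrative.
SAMPLE = """\
# TYPE promhttp_metric_handler_requests_total counter
promhttp_metric_handler_requests_total{code="200"} 42
promhttp_metric_handler_requests_total{code="500"} 0
promhttp_metric_handler_requests_total{code="503"} 0
"""

# metric_name, optional {label="value",...} section, then the sample value.
SAMPLE_LINE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_PAIR = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Parse simple exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_PAIR.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def select(samples, name, **matchers):
    """Keep samples with the given name whose labels equal every matcher."""
    return [
        s for s in samples
        if s[0] == name and all(s[1].get(k) == v for k, v in matchers.items())
    ]


samples = parse_samples(SAMPLE)
print(select(samples, METRIC))              # like the bare metric name
print(select(samples, METRIC, code="200"))  # like {code="200"}
print(len(select(samples, METRIC)))         # like count(...)
~~~

Against a running server, the same text could be fetched from `http://localhost:9090/metrics` and passed to `parse_samples` in place of `SAMPLE`.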