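Table of Contents
- I. Overview
- II. Deploying Prometheus with docker-compose
- 1) Install Docker
- 2) Install docker-compose
- 3) Configure prometheus.yml
- 4) Configure rules.yml
- 5) Configure alertmanager.yml
- 6) Write the docker-compose YAML file
- 7) Deploy Prometheus
- III. Deploying Consul standalone (skip if you install ConsulManager)
- IV. Installing ConsulManager
- 1) Download
- 2) Install with docker-compose
- 3) Install node_exporter
- 4) Register with Consul via the API
- 5) Configure Prometheus for automatic service discovery
- 6) Monitoring domain expiry with domain_exporter
- 1. Install domain_exporter
- 2. Configure Prometheus for automatic service discovery
- 3. Register with Consul via the API
- 7) Prometheus blackbox_exporter
- 1. Install blackbox_exporter
- 2. Configure Prometheus for automatic service discovery
- 3. Register with Consul via the API
- 4. Configure alerting rules
- 8) Deregistering services from Consul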
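I. Overview
Integrating Prometheus with Consul gives you automatic service registration and discovery. Consul is a service discovery and configuration tool that handles service registration, discovery, and health checking; Prometheus can use Consul's service discovery to find its scrape targets (Targets) dynamically instead of reading a static list.
[Figure: overall architecture diagram]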
II. Deploying Prometheus with docker-compose
1) Install Docker
# Install yum-utils, which provides yum-config-manager
yum -y install yum-utils
# Add the Docker CE repository; the Aliyun mirror is recommended (the official repo is commented out)
#yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum-config-manager --add-repo http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
# Install Docker CE
yum install -y docker-ce
# Start Docker now and enable it at boot
systemctl enable --now docker
docker --version
2) Install docker-compose
curl -SL https://github.com/docker/compose/releases/download/v2.16.0/docker-compose-linux-x86_64 -o /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
docker-compose --version
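3) Configure prometheus.yml
File: /etc/prometheus/prometheus.yml
# Global configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  # scrape_timeout is set to the global default (10s).
# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.182.110:9093']
# Load rules once and evaluate them periodically according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/rules.yml"
# Controls which resources Prometheus scrapes.
# The default configuration has a job named prometheus that scrapes the time series exposed by the Prometheus server itself.
scrape_configs:
  # The job name is added as the label `job=<job_name>` to every time series scraped from this config.
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      # Prometheus runs in a container, where localhost is the container itself,
      # so point at the host IP where node_exporter listens.
      - targets: ['192.168.182.110:9100']
        labels:
          env: dev
          role: docker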
4) Configure rules.yml
File: /etc/prometheus/rules.yml
groups:
  - name: example
    rules:
      # Alert for any instance that is unreachable for >5 minutes.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
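Before loading these files it is worth validating them. The sketch below is a minimal POSIX sh helper, assuming Docker is available and the files live under /etc/prometheus as above; it runs promtool from the prom/prometheus image (which ships it as /bin/promtool). The function names and the DRY_RUN switch are hypothetical conveniences, not part of Prometheus.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Run promtool from the prom/prometheus image with /etc/prometheus mounted.
promtool_docker() {
  run docker run --rm -v /etc/prometheus:/etc/prometheus \
    --entrypoint /bin/promtool prom/prometheus "$@"
}

# "check config" also validates every file listed under rule_files.
check_prometheus_config() {
  promtool_docker check config /etc/prometheus/prometheus.yml
}

# Validate a single rules file on its own (defaults to rules.yml).
check_rules_file() {
  promtool_docker check rules "${1:-/etc/prometheus/rules.yml}"
}
```

A typical use is `check_prometheus_config && curl -X POST http://192.168.182.110:9090/-/reload`, so a broken file is caught before Prometheus tries to reload it.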
5) Configure alertmanager.yml
File: /etc/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m
  # SMTP server as host:port (not an e-mail address)
  smtp_smarthost: 'smtp.xxx.xxx:587'
  smtp_from: 'xxx@xxx'
  smtp_auth_username: 'xxx@xxx'
  smtp_auth_password: 'xxxx'
  smtp_require_tls: true
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'test-mails'
receivers:
  - name: 'test-mails'
    email_configs:
      - to: 'scottcho@qq.com'
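The Alertmanager file can be checked the same way, and a hand-made alert is a quick end-to-end test of the mail route. A sketch assuming Docker, the prom/alertmanager image (which ships amtool as /bin/amtool) and an amtool version that supports `alert add`; the function names, the AM_URL variable and the MailRouteTest alert name are hypothetical.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

AM_URL="${AM_URL:-http://192.168.182.110:9093}"

# Validate alertmanager.yml with amtool from the prom/alertmanager image.
check_alertmanager_config() {
  run docker run --rm -v /etc/alertmanager:/etc/alertmanager \
    --entrypoint /bin/amtool prom/alertmanager \
    check-config /etc/alertmanager/alertmanager.yml
}

# Push a test alert; with group_wait: 10s the mail should follow shortly.
send_test_alert() {
  run docker run --rm --entrypoint /bin/amtool prom/alertmanager \
    --alertmanager.url="$AM_URL" alert add alertname=MailRouteTest severity=info
}
```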
6) Write the docker-compose YAML file (saved here as docker-compose.yaml)
services:
  prometheus:
    image: prom/prometheus
    volumes:
      - /etc/prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
      - '--web.external-url=http://192.168.182.110:9090/'
      # Enables the /-/reload endpoint used later in this article
      - '--web.enable-lifecycle'
      # --storage.tsdb.retention is deprecated; use retention.time instead
      - '--storage.tsdb.retention.time=15d'
    ports:
      - 9090:9090
    links:
      - alertmanager:alertmanager
    restart: always
  alertmanager:
    image: prom/alertmanager
    ports:
      - 9093:9093
    volumes:
      - /etc/alertmanager/:/etc/alertmanager/
      - alertmanager_data:/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
    restart: always
  grafana:
    image: grafana/grafana
    ports:
      - 3000:3000
    volumes:
      - /etc/grafana/:/etc/grafana/provisioning/
      - grafana_data:/var/lib/grafana
    environment:
      - GF_INSTALL_PLUGINS=camptocamp-prometheus-alertmanager-datasource
    links:
      - prometheus:prometheus
      - alertmanager:alertmanager
    restart: always
volumes:
  prometheus_data: {}
  grafana_data: {}
  alertmanager_data: {}
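A compose file with broken indentation fails only at startup, so it can help to let docker-compose parse it first; `config -q` validates the file and prints nothing on success. A sketch assuming the file is saved as docker-compose.yaml in the current directory; COMPOSE_FILE_PATH and the function name are hypothetical (COMPOSE_FILE is avoided because docker-compose itself reads that variable).

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

COMPOSE_FILE_PATH="${COMPOSE_FILE_PATH:-docker-compose.yaml}"

# Create the host directories bind-mounted by the stack, then validate
# the compose file without starting anything.
validate_stack() {
  run mkdir -p /etc/prometheus /etc/alertmanager /etc/grafana &&
    run docker-compose -f "$COMPOSE_FILE_PATH" config -q
}
```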
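7) Deploy Prometheus
# Make sure the bind-mounted config directories are readable by the containers
chmod -R a+rX /etc/prometheus /etc/alertmanager /etc/grafana
# Run from the directory that contains docker-compose.yaml
docker-compose up -d
Access URLs (replace the IP with your own):
- Prometheus server: http://192.168.182.110:9090
- Prometheus server's own metrics: http://192.168.182.110:9090/metrics
- Grafana: http://192.168.182.110:3000 (initial login account/password: admin/admin)
- Alertmanager: http://192.168.182.110:9093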
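Instead of opening each page by hand, the three components can be checked from the shell. Prometheus and Alertmanager expose a `/-/healthy` endpoint and Grafana exposes `/api/health`; the sketch below assumes the host IP used throughout this article, and the HOST variable and function names are hypothetical.

```shell
HOST="${HOST:-192.168.182.110}"

# Health endpoints of the three containers started above.
health_urls() {
  printf '%s\n' \
    "http://$HOST:9090/-/healthy" \
    "http://$HOST:9093/-/healthy" \
    "http://$HOST:3000/api/health"
}

# Probe each endpoint with a 5 s timeout; return non-zero if any fails.
check_stack() {
  rc=0
  for url in $(health_urls); do
    if curl -fsS -m 5 -o /dev/null "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      rc=1
    fi
  done
  return "$rc"
}
```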
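III. Deploying Consul standalone (skip this if you install ConsulManager)
Consul is an open-source tool written in Go for distributed, service-oriented systems. It provides service registration and discovery, health checks, a key/value store, multi-datacenter support, and distributed consistency guarantees.
With the Prometheus setup so far, adding a target means editing the configuration file on the server. Even with file_sd_configs you still have to log in and edit the corresponding JSON file, which quickly becomes tedious. Prometheus, however, officially supports several service discovery mechanisms, and Consul is one of them.
Deploy with Docker:
docker run -d --name consul -p 8500:8500 consul:1.14.5
Web UI: http://192.168.182.110:8500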
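Besides the web UI, Consul's HTTP API gives a quick health signal: `/v1/status/leader` returns the leader address once the agent has bootstrapped, and `/v1/catalog/services` lists registered services with their tags. A small sketch; the CONSUL variable and function names are hypothetical.

```shell
CONSUL="${CONSUL:-http://192.168.182.110:8500}"

# Build a full Consul HTTP API URL from a path.
consul_url() {
  printf '%s%s\n' "$CONSUL" "$1"
}

# Non-empty output (e.g. "172.17.0.2:8300") means a leader is elected.
consul_leader() { curl -fsS "$(consul_url /v1/status/leader)"; }

# JSON map of service name -> tags for everything in the catalog.
consul_services() { curl -fsS "$(consul_url /v1/catalog/services)"; }
```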
IV. Installing ConsulManager
GitHub: https://github.com/starsliao/TenSunS.git
1) Download
git clone https://github.com/starsliao/TenSunS.git
2) Install with docker-compose
Prerequisite: Docker and docker-compose are already installed on the server.
One-step install:
curl -s https://starsl.cn/static/img/all_install.sh|sudo bash
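#Starting the TenSunS (后羿) ops platform...
#[+] Running 3/3
# ⠿ Container consul Running #0.0s
# ⠿ Container flask-consul Started #10.5s
# ⠿ Container nginx-consul Started #0.7s
#The default admin password of the TenSunS ops platform is: eb98033c
#To change it, edit /opt/tensuns/docker-compose.yaml and change the value of the admin_passwd variable
#Open http://{your IP}:1026 in a browser and log in
Check the service status:
docker-compose -f /opt/tensuns/docker-compose.yaml ps
- The script starts TenSunS and Consul with docker-compose; the installation path is /opt/tensuns.
- When the script finishes, it prints usage hints and an auto-generated login password. Open a browser and log in to TenSunS right away.
Access (account/password: admin plus the password printed by the script, eb98033c in this run): http://192.168.182.110:1026/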
3) Install node_exporter
node_exporter collects host-level metrics; it simply exposes them over HTTP for Prometheus to scrape.
On RedHat-family systems it can be installed with yum.
yum installation instructions: https://copr.fedorainfracloud.org/coprs/ibotty/prometheus-exporters/
curl -Lo /etc/yum.repos.d/_copr_ibotty-prometheus-exporters.repo https://copr.fedorainfracloud.org/coprs/ibotty/prometheus-exporters/repo/epel-7/ibotty-prometheus-exporters-epel-7.repo
yum -y install node_exporter
systemctl start node_exporter
systemctl enable node_exporter.service
# Verify: fetch the metrics
curl localhost:9100/metrics
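The full metrics page is long; to spot-check one value, a tiny awk filter over the Prometheus text format is enough. A sketch that only handles unlabelled metrics such as `node_load1`; the function name is hypothetical.

```shell
# Print the value of an unlabelled metric read from Prometheus text format
# on stdin. "# HELP"/"# TYPE" lines never match because their $1 is "#".
# Exits non-zero if the metric is not present.
metric_value() {
  awk -v m="$1" '$1 == m { print $2; found = 1; exit } END { exit !found }'
}
```

Usage: `curl -s localhost:9100/metrics | metric_value node_load1`.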
4) Register with Consul via the API
curl -X PUT -d '{"id": "node1","name": "node_exporter","address": "192.168.182.110","port": 9100,"tags": ["exporter"],"meta": {"job": "node_exporter","instance": "Prometheus server"},"checks": [{"http": "http://192.168.182.110:9100/metrics", "interval": "5s"}]}' http://192.168.182.110:8500/v1/agent/service/register
### Parameters
# id: registration ID, unique within Consul
# name: service name
# address: IP address registered for the service
# port: port registered for the service
# tags: service tags (more than one allowed)
# meta: key/value metadata attached to the service
# checks: health checks
#   http: URL polled by the health check
#   interval: how often the check runs
# http://192.168.182.110:8500/v1/agent/service/register is Consul's registration endpoint
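When many hosts run node_exporter, the same body can be generated instead of edited by hand. A sketch of a hypothetical helper that fills in the fields listed above; it assumes the values contain no double quotes (nothing is JSON-escaped) and uses `curl -d @-` to read the body from stdin.

```shell
CONSUL="${CONSUL:-http://192.168.182.110:8500}"

# Build the registration body for one node_exporter host.
# Arguments: <id> <ip> [instance-label] [port]
node_payload() {
  local id="$1" ip="$2" inst="${3:-$2}" port="${4:-9100}"
  printf '{"id":"%s","name":"node_exporter","address":"%s","port":%s,"tags":["exporter"],"meta":{"job":"node_exporter","instance":"%s"},"checks":[{"http":"http://%s:%s/metrics","interval":"10s"}]}\n' \
    "$id" "$ip" "$port" "$inst" "$ip" "$port"
}

# PUT the body to the agent's register endpoint.
register_node() {
  node_payload "$@" |
    curl -fsS -X PUT -d @- "$CONSUL/v1/agent/service/register"
}
```

Usage (hypothetical host): `register_node node3 192.168.182.112 web01`.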
You can also put the JSON in a file and register with that file:
cat > node_exporter.json<<"EOF"
{
  "id": "node2",
  "name": "node_exporter",
  "address": "192.168.182.110",
  "port": 9100,
  "tags": ["exporter"],
  "meta": {
    "job": "node_exporter",
    "instance": "test server"
  },
  "checks": [{
    "http": "http://192.168.182.110:9100/metrics",
    "interval": "10s"
  }]
}
EOF
Register using the JSON file:
curl --request PUT --data @node_exporter.json http://192.168.182.110:8500/v1/agent/service/register
You may hit a permission error: Permission denied: anonymous token lacks permission 'service:write' on "node_exporter". The anonymous token is used implicitly when a request does not specify a token.
This means Consul ACLs are enabled and the anonymous token lacks service:write. [Fix] Adjust the ACL settings in /opt/tensuns/consul/config/consul.hcl so that the registration is allowed.
Restart Consul (this restarts the whole TenSunS stack):
docker-compose -f /opt/tensuns/docker-compose.yaml restart
Register again:
curl -X PUT -d '{"id": "node1","name": "node_exporter","address": "192.168.182.110","port": 9100,"tags": ["exporter"],"meta": {"job": "node_exporter","instance": "Prometheus server"},"checks": [{"http": "http://192.168.182.110:9100/metrics", "interval": "5s"}]}' http://192.168.182.110:8500/v1/agent/service/register
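An alternative to relaxing the ACL policy is to send a token with the request: the Consul HTTP API accepts it in the `X-Consul-Token` header. A sketch assuming CONSUL_TOKEN holds a token with service:write on the service (how you obtain one depends on your Consul ACL setup); the function name is hypothetical.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

CONSUL="${CONSUL:-http://192.168.182.110:8500}"

# Register the service described in a JSON file, authenticating with a token.
# Aborts with an error if CONSUL_TOKEN is unset or empty.
register_with_token() {
  run curl -fsS -X PUT \
    -H "X-Consul-Token: ${CONSUL_TOKEN:?CONSUL_TOKEN is not set}" \
    --data @"$1" "$CONSUL/v1/agent/service/register"
}
```

Usage: `CONSUL_TOKEN=<your token> register_with_token node_exporter.json`.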
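5) Configure Prometheus for automatic service discovery
Consul is now running and has a registered service. Next, configure Prometheus to use Consul service discovery so that the services added above show up automatically under Targets. Add the following under scrape_configs in prometheus.yml:
  - job_name: 'consul_exporter'
    consul_sd_configs:
      - server: '192.168.182.110:8500'
        services: []
    relabel_configs:
      # Keep only services tagged "exporter", so the domain/blackbox services
      # registered later are not scraped directly by this job
      - source_labels: [__meta_consul_tags]
        regex: .*exporter.*
        action: keep
# Reload Prometheus
curl -X POST http://192.168.182.110:9090/-/reload
Notes: consul_sd_configs selects Consul service discovery, and server is the Consul address, which must match the one used above. After saving the file, reload Prometheus (the /-/reload endpoint is available because of the --web.enable-lifecycle flag), then check the Targets page in the Prometheus UI to confirm the configuration works.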
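The Targets page can also be read from the HTTP API: `/api/v1/targets?state=active` returns every active target with its job label, scrape URL and health. A sketch assuming `jq` is installed; the PROM variable and function names are hypothetical.

```shell
PROM="${PROM:-http://192.168.182.110:9090}"

# Turn /api/v1/targets JSON on stdin into "job<TAB>scrapeUrl<TAB>health" lines.
summarize_targets() {
  jq -r '.data.activeTargets[] | [.labels.job, .scrapeUrl, .health] | @tsv'
}

# List active targets, e.g. to confirm the Consul-discovered ones are "up".
show_targets() {
  curl -fsS "$PROM/api/v1/targets?state=active" | summarize_targets
}
```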
6) Monitoring domain expiry with domain_exporter
1. Install domain_exporter
docker run -d --restart=always --name domain_exporter -p 9222:9222 caarlos0/domain_exporter
# Check (in this example domain_exporter runs on a second host, 192.168.182.111)
curl 192.168.182.111:9222/metrics
2. Configure Prometheus for automatic service discovery
  # Domain expiry checks
  - job_name: consul_domain_exporter
    scrape_interval: 10s
    metrics_path: /probe
    consul_sd_configs:
      - server: '192.168.182.110:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*domain.*
        action: keep
      - regex: __meta_consul_service_metadata_(.+)
        action: labelmap
      - source_labels: [__meta_consul_service_address]
        target_label: __param_target
      - target_label: __address__
        replacement: 192.168.182.111:9222
# Reload Prometheus
curl -X POST http://192.168.182.110:9090/-/reload
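The relabel rules above turn each matching Consul service into a probe request: the service address becomes the `target` query parameter and `__address__` is replaced by the exporter, so the scrape goes to the exporter rather than the domain. The sketch below reproduces that URL so you can request it by hand; the helper names are hypothetical, the encoder only covers ASCII input (letters, digits and `-_.~` are kept, everything else is percent-encoded), and the `module` parameter is there for the blackbox job later on.

```shell
# Percent-encode everything except letters, digits and "-_.~" (ASCII only).
urlencode() {
  local s="$1" out="" c
  while [ -n "$s" ]; do
    c=${s%"${s#?}"}        # first character
    s=${s#?}               # rest of the string
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;
    esac
  done
  printf '%s\n' "$out"
}

# probe_url <exporter host:port> <target> [module]
# __address__ -> exporter, Consul service address -> ?target=,
# params.module (if set) -> ?module=
probe_url() {
  local url="http://$1/probe?"
  if [ -n "${3:-}" ]; then
    url="${url}module=$(urlencode "$3")&"
  fi
  printf '%starget=%s\n' "$url" "$(urlencode "$2")"
}
```

For the service registered in the next step, `curl "$(probe_url 192.168.182.111:9222 baidu.com)"` shows what Prometheus will scrape.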
3. Register with Consul via the API
curl -X PUT -d '{"id": "domain2","name": "domain_exporter","address": "baidu.com","tags": ["domain"],"meta": {"job": "domain_exporter","instance": "test server"},"checks": [{"http": "http://192.168.182.111:9222", "interval": "5s"}]}' http://192.168.182.110:8500/v1/agent/service/register
7) Prometheus blackbox_exporter
blackbox_exporter is one of the official Prometheus exporters; it probes endpoints over HTTP, DNS, TCP, and ICMP.
1. Install blackbox_exporter
docker run -d -p 9115:9115 --name blackbox_exporter quay.io/prometheus/blackbox-exporter:latest
# Check
curl localhost:9115/metrics
2. Configure Prometheus for automatic service discovery
  # HTTP probes
  - job_name: "consul-blackbox_http"
    metrics_path: /probe
    params:
      module: [http_2xx]
    consul_sd_configs:
      - server: '192.168.182.110:8500'
        services: []
    relabel_configs:
      - source_labels: [__meta_consul_tags]
        regex: .*blackbox_http.*
        action: keep
      - regex: __meta_consul_service_metadata_(.+)
        action: labelmap
      - source_labels: [__meta_consul_service_address]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.182.111:9115
# Reload Prometheus
curl -X POST http://192.168.182.110:9090/-/reload
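A single probe can be run by hand to check that the module works before relying on discovery. `curl -G` moves the `--data-urlencode` pairs into the query string, which is the same request Prometheus sends; a result is good when the output contains `probe_success 1`. The function names are hypothetical.

```shell
# Ask blackbox_exporter to run one probe.
probe() {  # probe <exporter host:port> <module> <target>
  curl -fsS -G "http://$1/probe" \
    --data-urlencode "module=$2" --data-urlencode "target=$3"
}

# Succeed only if the probe output on stdin reports probe_success 1.
probe_ok() {
  awk '$1 == "probe_success" { found = 1; ok = ($2 == 1) } END { exit !(found && ok) }'
}
```

Usage: `probe 192.168.182.111:9115 http_2xx https://www.jd.com | probe_ok && echo up`.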
3. Register with Consul via the API
curl -X PUT -d '{"id": "http1","name": "blackbox_http","address": "https://www.jd.com","tags": ["blackbox_http"],"checks": [{"http": "http://192.168.182.111:9115", "interval": "5s"}]}' http://192.168.182.110:8500/v1/agent/service/register
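To monitor a list of sites, the registration can be looped over `<id> <url>` lines. A sketch built on the request above: the helper names are hypothetical, URLs must not contain double quotes, and the health check still points at the exporter as in the original command.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

CONSUL="${CONSUL:-http://192.168.182.110:8500}"

# Body for one blackbox_http service; the address is the URL to probe.
blackbox_payload() {  # blackbox_payload <id> <url>
  printf '{"id":"%s","name":"blackbox_http","address":"%s","tags":["blackbox_http"],"checks":[{"http":"http://192.168.182.111:9115","interval":"5s"}]}\n' "$1" "$2"
}

# Read "<id> <url>" lines on stdin and register each one (blank lines skipped).
register_urls() {
  local id url
  while read -r id url; do
    [ -n "$id" ] || continue
    blackbox_payload "$id" "$url" |
      run curl -fsS -X PUT -d @- "$CONSUL/v1/agent/service/register"
  done
}
```

Usage (hypothetical list): `printf 'http2 https://www.baidu.com\n' | register_urls`.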
4. Configure alerting rules
File: /etc/prometheus/rules/blackbox_exporter.yml
groups:
  - name: blackbox-probe
    rules:
      - alert: BlackboxProbeFailed
        expr: probe_success == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          description: "Blackbox probe failed, current value: {{ $value }}"
          summary: "Blackbox probe failed for {{ $labels.instance }}"
      - alert: HttpStatusCodeCheckFailed
        expr: probe_http_status_code <= 199 or probe_http_status_code >= 400
        for: 1m
        labels:
          severity: warning
        annotations:
          description: "HTTP status code is outside 200-399, current status code: {{ $value }}"
          summary: "HTTP status code check failed for {{ $labels.instance }}"
Add the rules file above to rule_files in the Prometheus configuration:
File: /etc/prometheus/prometheus.yml
rule_files:
  - "/etc/prometheus/rules.yml"
  - "/etc/prometheus/rules/blackbox_exporter.yml"
Reload or restart:
# Reload Prometheus
curl -X POST http://192.168.182.110:9090/-/reload
# Or restart (run in the directory containing docker-compose.yaml)
docker-compose -f docker-compose.yaml restart
8) Deregistering services from Consul
# Syntax (send it to the same agent the service was registered with)
# curl --request PUT http://127.0.0.1:8500/v1/agent/service/deregister/${ID}
# Example:
curl --request PUT http://127.0.0.1:8500/v1/agent/service/deregister/http1
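To clean up a whole group at once, the IDs can be taken from `/v1/agent/services`, which returns the agent's services as a JSON object keyed by service ID. A sketch assuming `jq` is installed; the function names are hypothetical, and DRY_RUN=1 lists the deregister calls without sending them.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

CONSUL="${CONSUL:-http://127.0.0.1:8500}"

# From /v1/agent/services JSON on stdin, print the IDs of services tagged $1.
ids_with_tag() {
  jq -r --arg t "$1" \
    'to_entries[] | select(any((.value.Tags // [])[]; . == $t)) | .key'
}

# Deregister every service on this agent that carries tag $1.
deregister_tag() {
  for id in $(curl -fsS "$CONSUL/v1/agent/services" | ids_with_tag "$1"); do
    run curl -fsS -X PUT "$CONSUL/v1/agent/service/deregister/$id"
  done
}
```

Usage: `DRY_RUN=1 deregister_tag blackbox_http` first, then without DRY_RUN.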
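If you have any questions, feel free to follow my WeChat official account 大数据与云原生技术分享 (Big Data & Cloud Native Tech Sharing) for technical discussion. If this article helped you, a like, share, and bookmark would be much appreciated!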