Using the Nightingale Server Monitoring Software (Part 2)

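Contents

  • I. Collector Installation
    • 1. Introduction to Categraf
    • 2. Deploying Categraf
    • 3. Test Server Deployment
    • 4. System Monitoring Plugins
    • 5. GPU Monitoring Plugins
    • 6. Service Monitoring Plugins
  • II. Monitoring Dashboards
    • 1. Machine List
    • 2. System Monitoring
    • 3. Service Monitoring
  • III. Alert Configuration
    • 1. Email Notifications
    • 2. Alert Rules
    • 3. Alert Self-Healing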


I. Collector Installation

1. Introduction to Categraf

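Categraf must be deployed on every machine you want to monitor, because collecting metrics such as CPU, memory and process data requires reading information from that machine's operating system.
Categraf pushes the collected data to the Nightingale server using the Prometheus RemoteWrite protocol.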

Grafana dashboard marketplace
Categraf plugin documentation
Categraf deployment documentation
Categraf download page
Example download file: categraf-v0.3.45-linux-amd64.tar.gz

2. Deploying Categraf

Some monitoring plugins are hard to configure when Categraf runs in Docker, so Categraf is deployed here as a binary.

  1. Delete the plugin directories you will not use
    categraf-v0.3.45-linux-amd64/conf/input.*
  2. Edit the plugin configuration files (*.toml)
  3. Edit the Categraf configuration file config.toml
[global]
hostname = "machine-label"  # label that identifies this machine
[[writers]]
url = "http://192.168.6.226:17000/prometheus/v1/write"
[ibex]
enable = true
servers = ["192.168.6.226:20090"]
[heartbeat]
url = "http://192.168.6.226:17000/v1/n9e/heartbeat"
  4. Copy Categraf
    Copy all files and directories inside categraf-v0.3.45-linux-amd64 to /home/monitor/categraf on the target machine
  5. Install and start Categraf
cd /home/monitor/categraf && chmod +x categraf && sudo ./categraf --install && sudo ./categraf --start
  • Other commands
# Install as a service (equivalent to adding a service file + systemctl daemon-reload)
sudo ./categraf --install
# Uninstall the service (equivalent to systemctl stop categraf + deleting the service file)
# If categraf was installed before, uninstall it first
sudo ./categraf --remove
# Start the service (equivalent to systemctl start categraf)
sudo ./categraf --start
# Stop the service (equivalent to systemctl stop categraf)
sudo ./categraf --stop
# Show the service status (equivalent to systemctl status categraf)
sudo ./categraf --status
# Test mode: print the mysql metrics that would be collected
sudo ./categraf --test --inputs mysql

3. Test Server Deployment

(Figure: test server deployment diagram)

4. System Monitoring Plugins

  • cpu plugin: collects the machine's CPU usage, idle ratio, etc.
    input.cpu/cpu.toml; the default configuration works
# Collection interval (seconds)
interval = 15
# Whether to collect metrics for each individual core
collect_per_cpu = false
  • disk plugin: collects disk space utilization, inode utilization, etc.
    input.disk/disk.toml; the default configuration works
# Collection interval (seconds)
interval = 15

# Only collect the specified mount points
# mount_points = ["/"]

# Ignore mount points by filesystem type
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs", "nsfs", "CDFS"]

# Mount points to ignore
ignore_mount_points = ["/boot", "/var/lib/kubelet/pods"]
  • diskio plugin: collects disk read/write I/O metrics
    input.diskio/diskio.toml; the default configuration works
# Collection interval (seconds)
interval = 15

# Only collect the specified devices
# devices = ["sda", "sdb", "vd*"]
  • kernel plugin: collects OS boot time, number of context switches, etc.
    input.kernel/kernel.toml; the default configuration works
# Collection interval (seconds)
interval = 15
  • mem plugin: collects memory utilization, etc.
    input.mem/mem.toml; the default configuration works
# Collection interval (seconds)
interval = 15

# Whether to collect platform-specific metrics
collect_platform_fields = true
  • net plugin: collects network interface traffic, packet counts, etc.
    input.net/net.toml; the default configuration works
# Collection interval (seconds)
interval = 15

# Whether to collect protocol statistics on Linux
# collect_protocol_stats = false

# Only collect the specified network interfaces
# interfaces = ["eth0"]
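  • netstat plugin: counts connections by state, e.g. how many are in time_wait and how many are established
    input.netstat/netstat.toml; the default configuration works
# Collection interval (seconds)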
interval = 15

disable_summary_stats = false

# Per-connection statistics are expensive on hosts with many network connections, so they are disabled here
disable_connection_stats = true

tcp_ext = false
ip_ext = false
  • ntp plugin: monitors the machine's clock offset
    input.ntp/ntp.toml
# Collection interval (seconds)
interval = 15

# NTP servers
ntp_servers = ["ntp.aliyun.com"]

# Response timeout
timeout = 5
  • processes plugin: counts how many processes are running, how many are sleeping, and the total
    input.processes/processes.toml; the default configuration works
# Collection interval (seconds)
interval = 15

# Force collection via the ps command
# force_ps = false

# Force collection via /proc
# force_proc = false
  • system plugin: collects system load information
    input.system/system.toml; the default configuration works
# Collection interval (seconds)
interval = 15

# Whether to collect system_n_users
# collect_user_number = false

5. GPU Monitoring Plugins

  • nvidia_smi plugin: monitors NVIDIA GPUs
    input.nvidia_smi/nvidia_smi.toml
# Collection interval (seconds)
interval = 15

# Local command to execute
nvidia_smi_command = "nvidia-smi"

# Run `nvidia-smi --help-query-gpu` to list the available fields
# `AUTO` detects the fields to query automatically
query_field_names = "AUTO"
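To check by hand which values the GPU panels should show, you can query `nvidia-smi` directly in CSV form. The sketch below uses the standard `--query-gpu` and `--format=csv,noheader,nounits` options with a small, hand-picked field list. The field list is an assumption; with `query_field_names = "AUTO"` the plugin chooses its own fields. The sketch converts utilization to a 0-1 ratio and MiB to bytes, which matches the shape of ratio and byte metrics such as `nvidia_smi_utilization_gpu_ratio` and `nvidia_smi_memory_total_bytes` used in the dashboards. Fields that a GPU reports as `[N/A]` are not handled.

```python
import subprocess

# Hand-picked fields for a manual check (an assumption; AUTO mode picks its own).
QUERY_FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total"]
MIB = 1024 * 1024


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=<QUERY_FIELDS> --format=csv,noheader,nounits` output.

    With `nounits`, utilization.gpu is a percentage and memory.* are in MiB.
    Returns one dict per GPU with ratios in 0-1 and memory in bytes.
    """
    gpus = []
    for line in text.strip().splitlines():
        index, util, used, total = (field.strip() for field in line.split(","))
        used_bytes, total_bytes = int(used) * MIB, int(total) * MIB
        gpus.append({
            "index": index,
            "utilization_ratio": int(util) / 100,
            "memory_used_bytes": used_bytes,
            "memory_total_bytes": total_bytes,
            "memory_used_ratio": used_bytes / total_bytes,
        })
    return gpus


def query_gpus(nvidia_smi="nvidia-smi"):
    """Run nvidia-smi on this machine (needs the NVIDIA driver) and parse the result."""
    result = subprocess.run(
        [nvidia_smi, "--query-gpu=" + ",".join(QUERY_FIELDS), "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_csv(result.stdout)
```

On a GPU host, `query_gpus()` returns one entry per card. The parser is kept separate so it can be tried on saved output.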

6. Service Monitoring Plugins

  • docker plugin: monitors Docker containers
    input.docker/docker.toml
# Collection interval (seconds)
interval = 15

[[instances]]
# interval = global.interval * interval_times
interval_times = 1

## Docker Endpoint
endpoint = "unix:///var/run/docker.sock"

# Containers to include / exclude
container_name_include = []
container_name_exclude = []

gather_services = false
gather_extend_memstats = false

container_id_label_enable = true
container_id_label_short_style = false

timeout = "5s"

perdevice_include = []

total_include = ["cpu", "blkio", "network"]

docker_label_include = []
docker_label_exclude = ["annotation*", "io.kubernetes*", "*description*", "*maintainer*", "*hash", "*author*", "*org_*", "*date*", "*url*", "*docker_compose*"]
  • mtail (log) plugin: extracts content from log files and turns it into monitoring metrics
    input.mtail/mtail.toml
# Collection interval (seconds)
interval = 15

[[instances]]
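progs = "/home/monitor/categraf/conf/input.mtail/prog1" # directory containing the log-parsing rule files
logs = ["/home/logs/example/all.log"] # log files to read
labels = { log="6.221-example-log" } # labels attached to the resulting metrics
override_timezone = "Asia/Shanghai" # time zone
emit_metric_timestamp = "true" # whether to emit timestamps with the metrics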

input.mtail/prog1/rule_error.mtail

gauge error_num
/ERROR.*/ {
      error_num++
}

input.mtail/prog1/rule_info.mtail

gauge info_num
/INFO.*/ {
      info_num++
}

input.mtail/prog1/rule_login.mtail

gauge login_num
/登录账户.*/ {
      login_num++
}
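  • mysql plugin: connects to a MySQL instance, runs a set of SQL statements, parses the output and reports it as monitoring data
    input.mysql/mysql.toml
# Collection interval (seconds)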
interval = 15

# Define instances; each [[instances]] entry corresponds to one MySQL instance
[[instances]]
address = "192.168.6.200:3306"
username = "root"
password = "123456"

# Extra connection parameters, e.g. whether to use TLS
parameters = "tls=false"
  • nginx plugin: monitors Nginx status. It relies on Nginx's http_stub_status_module, which must be enabled on the Nginx server
    input.nginx/nginx.toml
# Collection interval (seconds)
interval = 15

[[instances]]
# URL of the Nginx stub_status page
urls = ["http://192.168.6.223:8080/nginx_status"]

response_timeout = "5s"
  • redis plugin: connects to Redis, runs the INFO command, parses the result and reports it as monitoring data
    input.redis/redis.toml
# Collection interval (seconds)
interval = 15

# Define instances; each [[instances]] entry corresponds to one Redis instance
[[instances]]
address = "192.168.6.223:6379"
username = ""
password = ""
pool_size = 2

# Whether to collect the slowlog
gather_slowlog = true

# Maximum number of slowlog entries to collect
slowlog_max_len = 100
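Each of the three mtail rules above keeps a running count of the log lines that match a regular expression. The Python sketch below emulates that behaviour on a list of lines to show what the resulting `error_num`, `info_num` and `login_num` values mean. It illustrates the rules and does not reflect how mtail is implemented. As with mtail patterns, the regexes are unanchored, so they match anywhere in a line, and a single line can increment several counters.

```python
import re

# The three rules from input.mtail/prog1: counter name -> pattern.
# "登录账户" is the literal text ("login account") the application writes to its log.
RULES = {
    "error_num": re.compile(r"ERROR.*"),
    "info_num": re.compile(r"INFO.*"),
    "login_num": re.compile(r"登录账户.*"),
}


def count_matches(lines, rules=RULES):
    """Emulate the mtail programs over a batch of log lines.

    Every rule sees every line; a rule's counter goes up by one for each line
    its regex matches anywhere (mtail patterns are not anchored).
    """
    counts = {name: 0 for name in rules}
    for line in lines:
        for name, pattern in rules.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

These values only ever grow, so dashboards and alert rules would normally apply `increase(...[5m])` or `rate(...)` to the resulting series rather than reading the raw value. mtail's `counter` type is the conventional declaration for such values, although `gauge` with `++`, as used above, also increments.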

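The nginx and redis plugins both turn a plain-text status report into metrics: the nginx plugin reads the stub_status page, and the redis plugin reads the output of the `INFO` command. For manual checks, for example after `curl http://192.168.6.223:8080/nginx_status` or `redis-cli INFO`, the sketch below parses both formats. It was written for this article and is not Categraf's code. The key names in the returned dicts are chosen for illustration.

```python
import re

STUB_STATUS_FIELDS = ["active", "accepts", "handled", "requests", "reading", "writing", "waiting"]


def parse_stub_status(text):
    """Parse the page served by ngx_http_stub_status_module, e.g.

        Active connections: 291
        server accepts handled requests
         16630948 16630948 31070465
        Reading: 6 Writing: 179 Waiting: 106

    The seven numbers always appear in this order.
    """
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if len(numbers) != len(STUB_STATUS_FIELDS):
        raise ValueError(f"expected {len(STUB_STATUS_FIELDS)} numbers, got {len(numbers)}")
    return dict(zip(STUB_STATUS_FIELDS, numbers))


def _to_number(value):
    """Convert "3" -> 3 and "1.25" -> 1.25; leave other strings unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_redis_info(text):
    """Parse Redis INFO output ("key:value" lines, "# Section" headers).

    Values such as "keys=10,expires=2,avg_ttl=0" (the db0 line) become dicts.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if not sep:
            continue
        if "=" in value:
            info[key] = {k: _to_number(v) for k, _, v in (p.partition("=") for p in value.split(","))}
        else:
            info[key] = _to_number(value)
    return info
```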
II. Monitoring Dashboards

1. Machine List

  • Dashboard JSON
{
    "name": "Machine List",
    "tags": "",
    "ident": "",
    "configs": {
        "panels": [
            {
                "type": "table",
                "id": "77bf513a-8504-4d33-9efe-75aaf9abc9e4",
                "layout": {
                    "h": 11,
                    "i": "77bf513a-8504-4d33-9efe-75aaf9abc9e4",
                    "isResizable": true,
                    "w": 24,
                    "x": 0,
                    "y": 5
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "avg(system_uptime{ident=~\"$ident\"}) by (ident)",
                        "refId": "A",
                        "legend": "Uptime"
                    },
                    {
                        "expr": "avg(cpu_usage_active{cpu=\"cpu-total\", ident=~\"$ident\"}) by (ident)",
                        "legend": "CPU Usage",
                        "refId": "B"
                    },
                    {
                        "expr": "avg(mem_used_percent{ident=~\"$ident\"}) by (ident)",
                        "legend": "Memory Usage",
                        "refId": "C"
                    },
                    {
                        "expr": "avg(mem_total{ident=~\"$ident\"}) by (ident)",
                        "legend": "Total Memory",
                        "refId": "D"
                    },
                    {
                        "expr": "avg(disk_used_percent{ident=~\"$ident\",path=\"/\"}) by (ident)",
                        "legend": "Disk Usage",
                        "refId": "E"
                    },
                    {
                        "expr": "avg(disk_total{ident=~\"$ident\"}) by (ident)",
                        "refId": "F",
                        "legend": "Total Disk"
                    },
                    {
                        "expr": "avg(rate(net_bytes_recv{ident=~\"$ident\"}[1m])) by(ident)",
                        "refId": "G",
                        "legend": "Network In"
                    },
                    {
                        "expr": "avg(rate(net_bytes_sent{ident=~\"$ident\"}[1m])) by(ident)",
                        "refId": "H",
                        "legend": "Network Out"
                    },
                    {
                        "expr": "avg(nvidia_smi_utilization_gpu_ratio{ident=~\"$ident\"}) by (ident)",
                        "refId": "I",
                        "legend": "GPU Usage"
                    },
                    {
                        "expr": "avg(nvidia_smi_memory_used_bytes{ident=~\"$ident\"} / nvidia_smi_memory_total_bytes{ident=~\"$ident\"}) by (ident)",
                        "refId": "J",
                        "legend": "GPU Memory Usage"
                    },
                    {
                        "expr": "avg(nvidia_smi_memory_total_bytes{ident=~\"$ident\"}) by (ident)",
                        "refId": "K",
                        "legend": "Total GPU Memory"
                    },
                    {
                        "expr": "avg(ntp_offset_ms{ident=~\"$ident\"}) by (ident)",
                        "refId": "L",
                        "legend": "NTP Offset (ms)"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {
                            "renameByName": {
                                "ident": "Machine"
                            }
                        }
                    }
                ],
                "name": "Machine List",
                "maxPerRow": 4,
                "custom": {
                    "showHeader": true,
                    "colorMode": "background",
                    "calc": "lastNotNull",
                    "displayMode": "labelValuesToRows",
                    "aggrDimension": "ident",
                    "sortColumn": "ident",
                    "sortOrder": "ascend",
                    "linkMode": "cellLink"
                },
                "options": {
                    "standardOptions": {}
                },
                "overrides": [
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "A"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "humantimeSeconds"
                            }
                        }
                    },
                    {
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "B"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "percent",
                                "decimals": 1
                            },
                            "valueMappings": []
                        }
                    },
                    {
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "C"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "percent",
                                "decimals": 1
                            },
                            "valueMappings": []
                        },
                        "type": "special"
                    },
                    {
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "D"
                        },
                        "properties": {
                            "standardOptions": {
                                "decimals": 1,
                                "util": "bytesIEC"
                            },
                            "valueMappings": []
                        },
                        "type": "special"
                    },
                    {
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "E"
                        },
                        "properties": {
                            "standardOptions": {
                                "decimals": 1,
                                "util": "percent"
                            },
                            "valueMappings": []
                        },
                        "type": "special"
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "F"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "bytesIEC",
                                "decimals": 0
                            }
                        }
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "G"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "bytesSecIEC",
                                "decimals": 1
                            }
                        }
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "H"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "bytesSecIEC",
                                "decimals": 1
                            }
                        }
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "I"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "percentUnit",
                                "decimals": 1
                            }
                        }
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "J"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "percentUnit",
                                "decimals": 1
                            }
                        }
                    },
                    {
                        "type": "special",
                        "matcher": {
                            "id": "byFrameRefID",
                            "value": "K"
                        },
                        "properties": {
                            "standardOptions": {
                                "util": "bytesIEC",
                                "decimals": 1
                            }
                        }
                    }
                ]
            }
        ],
        "var": [
            {
                "definition": "prometheus",
                "name": "prom",
                "type": "datasource"
            },
            {
                "allOption": true,
                "datasource": {
                    "cate": "prometheus",
                    "value": "${prom}"
                },
                "definition": "label_values(system_load1,ident)",
                "multi": true,
                "name": "ident",
                "type": "query"
            }
        ],
        "version": "3.0.0"
    }
}
  • Dashboard preview
    (Figure: dashboard screenshot)
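Several columns in this table apply `rate(...[1m])` to a counter and then aggregate with `avg(...) by (ident)`. The sketch below spells out, in simplified form, what those two steps compute. The sample data is made up. It only approximates Prometheus: the real `rate()` also extrapolates to the edges of the window, so its values will differ slightly.

```python
from collections import defaultdict


def simple_rate(samples):
    """Approximate PromQL rate() for one counter series.

    samples: list of (unix_seconds, value) pairs inside the range window,
    sorted by time. A drop in value is treated as a counter reset (the new
    value counts as the increase since the reset), as Prometheus does.
    Unlike real rate(), no extrapolation to the window edges is applied.
    """
    if len(samples) < 2:
        return None  # rate() needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


def avg_by_ident(series):
    """Mimic avg(<expr>) by (ident).

    series: list of (labels: dict, value) pairs; None values are skipped.
    Returns {ident: average over all series carrying that ident}.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for labels, value in series:
        if value is None:
            continue
        sums[labels["ident"]] += value
        counts[labels["ident"]] += 1
    return {ident: sums[ident] / counts[ident] for ident in sums}
```

One consequence is worth checking. If the net plugin reports one `net_bytes_recv` series per network interface (as its commented-out `interfaces` option suggests), then `avg(...) by (ident)` shows the average across interfaces rather than the host's total traffic. `sum(...) by (ident)` would give the total.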

2. System Monitoring

  • Dashboard JSON
{
    "name": "System Monitoring",
    "tags": "",
    "ident": "",
    "configs": {
        "panels": [
            {
                "type": "timeseries",
                "id": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 0,
                    "y": 0,
                    "i": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "cpu_usage_active{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-usage"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "CPU Usage",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "percent",
                        "min": 0,
                        "max": 101,
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "min": null,
                                "max": null,
                                "decimals": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "239aacdf-1982-428b-b240-57f4ce7f946d",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 8,
                    "y": 0,
                    "i": "239aacdf-1982-428b-b240-57f4ce7f946d",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "mem_used_percent{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-usage"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Memory Usage",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "percent",
                        "min": 0,
                        "max": 101,
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "decimals": null,
                                "min": null,
                                "max": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 16,
                    "y": 0,
                    "i": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "rate(diskio_read_bytes{ident=~\"$ident\"}[1m])",
                        "legend": "{{ident}}-{{name}}-read",
                        "refId": "A"
                    },
                    {
                        "expr": "rate(diskio_write_bytes{ident=~\"$ident\"}[1m])",
                        "legend": "{{ident}}-{{name}}-write",
                        "refId": "B"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Disk IO",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "bytesSecIEC",
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "f2ee5d32-737c-4095-b6b7-b15b778ffdb9",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 0,
                    "y": 7,
                    "i": "f2ee5d32-737c-4095-b6b7-b15b778ffdb9",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "rate(net_bytes_recv{ident=~\"$ident\"}[1m])",
                        "legend": "{{ident}}-in",
                        "refId": "A"
                    },
                    {
                        "expr": "rate(net_bytes_sent{ident=~\"$ident\"}[1m])",
                        "legend": "{{ident}}-out",
                        "refId": "B"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Network Traffic",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "bytesSecIEC",
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "6be9a2be-1d4c-488d-b695-aa1d82df3a3c",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 8,
                    "y": 7,
                    "i": "e164a7cb-394c-4670-b83c-e9321a08cbe6",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "nvidia_smi_utilization_gpu_ratio{ident=~\"$ident\"}",
                        "legend": "{{ident}}-usage",
                        "refId": "A"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "GPU Usage",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "percentUnit",
                        "min": 0,
                        "max": 1.01,
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "7873f825-1e41-45e9-a1ee-792a87fd4351",
                "layout": {
                    "h": 7,
                    "w": 8,
                    "x": 16,
                    "y": 7,
                    "i": "37ced102-b020-4e3f-8247-6b2c9240a762",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "nvidia_smi_memory_used_bytes{ident=~\"$ident\"} / nvidia_smi_memory_total_bytes{ident=~\"$ident\"}",
                        "legend": "{{ident}}-usage",
                        "refId": "A"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "GPU Memory Usage",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "util": "percentUnit",
                        "min": 0,
                        "max": 1.01,
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            }
        ],
        "var": [
            {
                "definition": "prometheus",
                "name": "prom",
                "type": "datasource"
            },
            {
                "allOption": true,
                "datasource": {
                    "cate": "prometheus",
                    "value": "${prom}"
                },
                "definition": "label_values(system_load1,ident)",
                "multi": true,
                "name": "ident",
                "type": "query"
            }
        ],
        "version": "3.0.0"
    }
}
  • Dashboard preview
    (screenshot)
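The GPU memory usage panel above divides one metric by another. PromQL pairs the two operands one-to-one by their label sets and drops any series that has no partner on the other side. Because of this, a `{ident=~"$ident"}` selector on just one operand still narrows the result. The Python sketch below is a simplified model of that matching, not Prometheus code. It assumes the metric name is already stripped and that no `on()`/`ignoring()` modifiers are used. The `ident`/`index` values are hypothetical.

```python
# Simplified sketch of PromQL one-to-one vector matching for `a / b`.
# Assumptions: each series is a label set with the metric name already
# dropped; no on()/ignoring()/group_left() modifiers; label values below
# are hypothetical.
from typing import Dict, FrozenSet, Tuple

Labels = FrozenSet[Tuple[str, str]]


def series(**labels: str) -> Labels:
    """Build a hashable label set for one series."""
    return frozenset(labels.items())


def divide(left: Dict[Labels, float], right: Dict[Labels, float]) -> Dict[Labels, float]:
    """Return left / right for every label set present on both sides.

    Series without an exact label-set match on the other side are dropped.
    Zero denominators are skipped here; Prometheus would return +Inf or NaN.
    """
    result: Dict[Labels, float] = {}
    for labels, lv in left.items():
        rv = right.get(labels)
        if rv is None or rv == 0:
            continue
        result[labels] = lv / rv
    return result


# Numerator unfiltered, denominator filtered to one host (as with a
# {ident=~"$ident"} selector on only one operand).
used = {series(ident="gpu-01", index="0"): 4e9, series(ident="gpu-02", index="0"): 2e9}
total = {series(ident="gpu-01", index="0"): 16e9}
ratio = divide(used, total)
print(ratio)  # only the gpu-01 series remains, with value 0.25
```

Filtering both operands is still clearer to read, and it avoids fetching series that will be discarded anyway.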

3. Service Monitoring

  • Dashboard JSON
{
    "name": "Service Monitoring",
    "tags": "",
    "ident": "",
    "configs": {
        "panels": [
            {
                "type": "timeseries",
                "id": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 0,
                    "y": 0,
                    "i": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "mysql_global_status_threads_connected{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-connections"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "MySQL Connections",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "min": null,
                        "max": null,
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "min": null,
                                "max": null,
                                "decimals": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 8,
                    "y": 0,
                    "i": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "mysql_global_status_slow_queries{ident=~\"$ident\"}",
                        "legend": "{{ident}}-slow queries",
                        "refId": "A"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "MySQL Slow Queries",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "3ca8db64-b25e-4e72-8dac-187cec4886ae",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 16,
                    "y": 0,
                    "i": "7174939f-2742-47bd-a023-5d1d3698bf76",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "mtail_login_num{ident=~\"$ident\"}",
                        "legend": "{{ident}}-login",
                        "refId": "A",
                        "time": {
                            "start": "now-24h",
                            "end": "now"
                        }
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Login Log Count",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "093b192e-e991-4590-ab4b-aa768159e00f",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 0,
                    "y": 6,
                    "i": "a18a3bd3-8c2b-4fa2-81f3-7b0d00b49cc9",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "redis_connected_clients{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-connections"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Redis Connections",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "min": null,
                        "max": null,
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0.01,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "min": null,
                                "max": null,
                                "decimals": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "2674442f-937f-4027-806b-10b2286b14f6",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 8,
                    "y": 6,
                    "i": "c8c061df-894d-458e-a89d-86a8428c52c9",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "redis_used_memory{ident=~\"$ident\"}",
                        "legend": "{{ident}}-memory",
                        "refId": "A"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Redis Memory Used",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "d26e8bc3-16a0-4a60-9aa9-36d71b85abc5",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 16,
                    "y": 6,
                    "i": "0a3310ea-74ca-48fa-8c18-52c1b0f71235",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "mtail_error_num{ident=~\"$ident\"}",
                        "legend": "{{ident}}-errors",
                        "refId": "A",
                        "time": {
                            "start": "now-24h",
                            "end": "now"
                        }
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Error Log Count",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "7fa2cdbe-b782-4b71-bd7e-2cdba7455e77",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 0,
                    "y": 12,
                    "i": "9a2e4d49-7a4f-4627-b2f6-cbe0e4ab04b1",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "nginx_active{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-active"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Nginx Active Connections",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "min": null,
                        "max": null,
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "min": null,
                                "max": null,
                                "decimals": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "0cb01432-ea29-41f4-8e6f-e6b9b71e90ab",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 8,
                    "y": 12,
                    "i": "8bf97e38-e840-4804-a686-28bb65fec78d",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "docker_n_containers_running{ident=~\"$ident\"}",
                        "refId": "A",
                        "legend": "{{ident}}-running"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Docker Running Containers",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "min": null,
                        "max": null,
                        "decimals": null
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off",
                            "standardOptions": {
                                "min": null,
                                "max": null,
                                "decimals": null
                            }
                        }
                    }
                ]
            },
            {
                "type": "timeseries",
                "id": "936b934b-6340-4743-8c12-821c63210fd6",
                "layout": {
                    "h": 6,
                    "w": 8,
                    "x": 16,
                    "y": 12,
                    "i": "c6da1998-c1e3-4486-a24c-58e26d349206",
                    "isResizable": true
                },
                "version": "3.0.0",
                "datasourceCate": "prometheus",
                "datasourceValue": "${prom}",
                "targets": [
                    {
                        "expr": "docker_container_mem_usage{ident=~\"$ident\"}",
                        "legend": "{{ident}}-{{container_name}}-memory",
                        "refId": "A"
                    }
                ],
                "transformations": [
                    {
                        "id": "organize",
                        "options": {}
                    }
                ],
                "name": "Docker Memory Usage",
                "maxPerRow": 4,
                "options": {
                    "tooltip": {
                        "mode": "all",
                        "sort": "desc"
                    },
                    "legend": {
                        "displayMode": "hidden",
                        "behaviour": "showItem"
                    },
                    "standardOptions": {
                        "decimals": 0
                    },
                    "thresholds": {
                        "steps": [
                            {
                                "color": "#634CD9",
                                "value": null,
                                "type": "base"
                            }
                        ]
                    }
                },
                "custom": {
                    "drawStyle": "lines",
                    "lineInterpolation": "smooth",
                    "spanNulls": false,
                    "lineWidth": 2,
                    "fillOpacity": 0,
                    "gradientMode": "none",
                    "stack": "off",
                    "scaleDistribution": {
                        "type": "linear"
                    }
                },
                "overrides": [
                    {
                        "matcher": {
                            "id": "byFrameRefID"
                        },
                        "properties": {
                            "rightYAxisDisplay": "off"
                        }
                    }
                ]
            }
        ],
        "var": [
            {
                "definition": "prometheus",
                "name": "prom",
                "type": "datasource"
            },
            {
                "allOption": true,
                "datasource": {
                    "cate": "prometheus",
                    "value": "${prom}"
                },
                "definition": "label_values(system_load1,ident)",
                "multi": true,
                "name": "ident",
                "type": "query"
            }
        ],
        "version": "3.0.0"
    }
}
  • Dashboard preview
    (screenshot)
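The panel `layout` values in the service dashboard follow a regular pattern: every panel is `w: 8, h: 6`, `x` cycles through 0/8/16, and `y` grows by 6 per row. The Python sketch below reproduces that row-major placement, which can help when adding panels by hand. It assumes a 24-column grid, which the 0/8/16 offsets suggest. This is an illustration, not Nightingale's own layout code.

```python
# Sketch: compute row-major positions for equally sized dashboard panels.
# Assumption: a 24-column grid (x + w <= 24), as suggested by x = 0/8/16
# with w = 8 in the JSON above; this is not Nightingale's own code.
from typing import Dict, List

GRID_COLUMNS = 24  # assumed grid width


def layout_panels(count: int, w: int = 8, h: int = 6) -> List[Dict[str, int]]:
    """Place `count` panels left to right, wrapping to a new row when full."""
    per_row = GRID_COLUMNS // w
    return [
        {"x": (i % per_row) * w, "y": (i // per_row) * h, "w": w, "h": h}
        for i in range(count)
    ]


# The service dashboard has 9 panels: three rows of three.
for pos in layout_panels(9):
    print(pos)
```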

III. Alert Configuration

1. Email Notifications

  • Configure SMTP
    (screenshot)
  • Configure user email addresses
    (screenshot)
  • Configure the email notification template
    (screenshot)
<!DOCTYPE html>
	<html lang="en">
	<head>
		<meta charset="UTF-8">
		<meta http-equiv="X-UA-Compatible" content="ie=edge">
		<title>Nightingale Alert Notification</title>
		<style type="text/css">
			.wrapper {
				background-color: #f8f8f8;
				padding: 15px;
				height: 100%;
			}
			.main {
				width: 600px;
				padding: 30px;
				margin: 0 auto;
				background-color: #fff;
				font-size: 12px;
				font-family: verdana,'Microsoft YaHei',Consolas,'Deja Vu Sans Mono','Bitstream Vera Sans Mono';
			}
			header {
				border-radius: 2px 2px 0 0;
			}
			header .title {
				font-size: 14px;
				color: #333333;
				margin: 0;
			}
			header .sub-desc {
				color: #333;
				font-size: 14px;
				margin-top: 6px;
				margin-bottom: 0;
			}
			hr {
				margin: 20px 0;
				height: 0;
				border: none;
				border-top: 1px solid #e5e5e5;
			}
			em {
				font-weight: 600;
			}
			table {
				margin: 20px 0;
				width: 100%;
			}
	
			table tbody tr{
				font-weight: 200;
				font-size: 12px;
				color: #666;
				height: 32px;
			}
			.succ {
				background-color: green;
				color: #fff;
			}
			.fail {
				background-color: red;
				color: #fff;
			}
			.succ th, .succ td, .fail th, .fail td {
				color: #fff;
			}
			table tbody tr th {
				width: 80px;
				text-align: right;
			}
			.text-right {
				text-align: right;
			}
			.body {
				margin-top: 24px;
			}
			.body-text {
				color: #666666;
				-webkit-font-smoothing: antialiased;
			}
			.body-extra {
				-webkit-font-smoothing: antialiased;
			}
			.body-extra.text-right a {
				text-decoration: none;
				color: #333;
			}
			.body-extra.text-right a:hover {
				color: #666;
			}
			.button {
				width: 200px;
				height: 50px;
				margin-top: 20px;
				text-align: center;
				border-radius: 2px;
				background: #2D77EE;
				line-height: 50px;
				font-size: 20px;
				color: #FFFFFF;
				cursor: pointer;
			}
			.button:hover {
				background: rgb(25, 115, 255);
				border-color: rgb(25, 115, 255);
				color: #fff;
			}
			footer {
				margin-top: 10px;
				text-align: right;
			}
			.footer-logo {
				text-align: right;
			}
			.footer-logo-image {
				width: 108px;
				height: 27px;
				margin-right: 10px;
			}
			.copyright {
				margin-top: 10px;
				font-size: 12px;
				text-align: right;
				color: #999;
				-webkit-font-smoothing: antialiased;
			}
		</style>
	</head>
	<body>
	<div class="wrapper">
		<div class="main">
			<header>
				<h3 class="title">{{.RuleName}}</h3>
				<p class="sub-desc"></p>
			</header>
			<hr>
			<div class="body">
				<table cellspacing="0" cellpadding="0" border="0">
					<tbody>
					{{if .IsRecovered}}
					<tr class="succ">
						<th>Status:</th>
						<td>S{{.Severity}} Recovered</td>
					</tr>
					{{else}}
					<tr class="fail">
						<th>Status:</th>
						<td>S{{.Severity}} Triggered</td>
					</tr>
					{{end}}
	
					{{if not .IsRecovered}}
					<tr>
						<th>Trigger value:</th>
						<td>{{.TriggerValue}}</td>
					</tr>
					{{end}}
	
					{{if .TargetIdent}}
					<tr>
						<th>Target:</th>
						<td>{{.TargetIdent}}</td>
					</tr>
					{{end}}
					<tr>
						<th>Tags:</th>
						<td>{{.TagsJSON}}</td>
					</tr>

					{{$time_duration := sub now.Unix .FirstTriggerTime }}
					{{if .IsRecovered}}
					<tr>
						<th>Duration:</th>
						<td>{{humanizeDurationInterface $time_duration}}</td>
					</tr>
					<tr>
						<th>Recovered at:</th>
						<td>{{timeformat .LastEvalTime}}</td>
					</tr>
					{{else}}
					<tr>
						<th>Triggered at:</th>
						<td>
							{{timeformat .TriggerTime}}
						</td>
					</tr>
					{{end}}
					</tbody>
				</table>
			</div>
		</div>
	</div>
	</body>
	</html>
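For recovered alerts, the template computes `$time_duration` as the current Unix time minus `.FirstTriggerTime`, then passes it through `humanizeDurationInterface`. The Python sketch below mirrors that arithmetic so the value can be checked by hand. The function names are hypothetical, and the "1h 2m 5s" output style is an assumption that need not match Nightingale's own formatter.

```python
# Sketch of the template's duration logic: seconds elapsed since the first
# trigger, rendered in a human-readable form. Function names are
# hypothetical; the output format is an assumption, not Nightingale's.
import time
from typing import Optional


def humanize_seconds(seconds: int) -> str:
    """Format a non-negative number of seconds as e.g. '1d 2h 3m 4s'."""
    if seconds <= 0:
        return "0s"
    parts = []
    for unit, size in (("d", 86400), ("h", 3600), ("m", 60), ("s", 1)):
        value, seconds = divmod(seconds, size)
        if value:
            parts.append(f"{value}{unit}")
    return " ".join(parts)


def alert_duration(first_trigger_time: int, now: Optional[int] = None) -> str:
    """Mirror `sub now.Unix .FirstTriggerTime`, then humanize the result."""
    if now is None:
        now = int(time.time())
    return humanize_seconds(now - first_trigger_time)


print(alert_duration(first_trigger_time=1_700_000_000, now=1_700_003_725))  # 1h 2m 5s
```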

2. Alert Rules

  • CPU usage above 90%
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "CPU usage above 90%",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 60,
    "prom_ql": "",
    "rule_config": {
      "inhibit": true,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "cpu_usage_active > 90",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
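The CPU rule above is evaluated every `prom_eval_interval` (15 s) and waits `prom_for_duration` (60 s) before firing. In other words, `cpu_usage_active > 90` must hold at every evaluation for a full minute. A single evaluation below the threshold resets the wait. The Python sketch below simulates that pending-to-firing gate. It is a simplified model under stated assumptions; Nightingale's evaluator also handles recovery, repeat notifications, and inhibition.

```python
# Sketch of how a "for" duration (prom_for_duration) gates an alert.
# Assumption: simplified to inactive -> pending -> firing; the real
# evaluator also handles recovery, repeat notifications, inhibition, etc.
from typing import List, Optional


def firing_at(samples: List[bool], eval_interval: int, for_duration: int) -> List[bool]:
    """Given the rule condition at each evaluation, report whether it fires.

    The alert fires once the condition has held continuously for at least
    `for_duration` seconds; any false evaluation resets the pending timer.
    """
    pending_since: Optional[int] = None
    result = []
    for i, condition in enumerate(samples):
        now = i * eval_interval
        if not condition:
            pending_since = None
            result.append(False)
            continue
        if pending_since is None:
            pending_since = now
        result.append(now - pending_since >= for_duration)
    return result


# CPU rule above: prom_eval_interval = 15, prom_for_duration = 60
states = firing_at([True] * 6, eval_interval=15, for_duration=60)
print(states)  # [False, False, False, False, True, True]
```

The first firing happens at the fifth evaluation (t = 60 s), not the fourth, because the wait is measured from the first evaluation where the condition was true.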
  • MySQL: more than 10 slow queries within 1 minute
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "MySQL: more than 10 slow queries within 1 minute",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 120,
    "prom_ql": "",
    "rule_config": {
      "inhibit": false,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "increase(mysql_global_status_slow_queries[1m]) > 10",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
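mysql_global_status_slow_queries reflects MySQL's Slow_queries status variable, which is a cumulative counter since server start. That is why the rule applies increase() over a 1-minute range instead of comparing the raw value.

Below is a simplified Python model of that calculation. It assumes samples every 15 seconds and treats any drop as a counter reset, for example after mysqld restarts. Real Prometheus also extrapolates to the window edges, so its results can be larger and non-integer. The helper names and sample data are hypothetical.

```python
# Simplified increase() over a 1-minute window, as in
# "increase(mysql_global_status_slow_queries[1m]) > 10".
# Assumption: no extrapolation to the window edges (real Prometheus does it).

def naive_increase(samples):
    """samples: (timestamp, counter_value) pairs sorted by timestamp."""
    total = 0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # The counter dropped (e.g. mysqld restarted): treat it as a reset
            # and count the new value from zero.
            total += cur
    return total


def slow_query_alert(samples, now, window=60, threshold=10):
    """True if the counter grew by more than `threshold` within the last
    `window` seconds (samples with now - window < t <= now)."""
    in_window = [(t, v) for t, v in samples if now - window < t <= now]
    return naive_increase(in_window) > threshold


# Slow_queries scraped every 15 s, with a restart between t=30 and t=45.
samples = [(0, 100), (15, 103), (30, 106), (45, 2), (60, 8)]
print(naive_increase(samples[1:]))        # 11  (3 + 2 + 6)
print(slow_query_alert(samples, now=60))  # True
```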
  • MySQL connections above 80% of max_connections
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "MySQL 连接数超过80%",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 120,
    "prom_ql": "",
    "rule_config": {
      "inhibit": false,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "avg by (instance) (mysql_global_status_threads_connected) / avg by (instance) (mysql_global_variables_max_connections) * 100 > 80",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
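The connection rule divides two series that both carry only the instance label after the avg by (instance) aggregation. Each MySQL instance is therefore compared against its own max_connections.

Here is a minimal Python sketch of that per-instance matching. Plain dicts stand in for the two aggregated vectors, and the instance names and values are made up.

```python
# Per-instance connection usage, mirroring
#   avg by (instance) (mysql_global_status_threads_connected)
#     / avg by (instance) (mysql_global_variables_max_connections) * 100 > 80
# Each dict stands in for one aggregated vector: instance label -> value.

def connection_usage_alerts(threads_connected, max_connections, threshold=80.0):
    alerts = {}
    for instance, connected in threads_connected.items():
        limit = max_connections.get(instance)
        if limit is None:
            # One-to-one vector matching drops series without a partner.
            continue
        if limit == 0:
            # PromQL would yield +Inf or NaN here; skipped in this sketch.
            continue
        percent = connected / limit * 100
        if percent > threshold:
            alerts[instance] = percent
    return alerts


connected = {"db1:3306": 130, "db2:3306": 40, "db3:3306": 500}
limits = {"db1:3306": 151, "db2:3306": 151}  # 151 is MySQL's default max_connections
for instance, percent in connection_usage_alerts(connected, limits).items():
    print(f"{instance}: {percent:.1f}%")  # db1:3306: 86.1%
```

Note that db3:3306 produces no result at all because it has no matching max_connections series. This is the same thing that happens in PromQL when one side of the division is missing.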
  • Memory usage above 85%
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "内存 使用率超过85%",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 60,
    "prom_ql": "",
    "rule_config": {
      "inhibit": true,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "mem_used_percent > 85",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
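Every rule in this section also sets notify_repeat_step: 60 and notify_max_number: 3, which govern how often an alert that stays active is re-sent. The sketch below rests on three assumptions, which you should verify against your Nightingale version:

- The step is in minutes.
- The maximum counts the first notification.
- A value of 0 means unlimited.

Recovery notifications (notify_recovered: 1) are not modelled here, and the function name is hypothetical.

```python
# Sketch of how notify_repeat_step and notify_max_number could shape the
# notifications for an alert that keeps firing.
# Assumptions: step in minutes, max_number counts every send including the
# first, and max_number == 0 means "no limit".

def notification_minutes(fired_at, recovered_at, repeat_step=60, max_number=3):
    """Minutes at which a firing alert is (re)sent, until it recovers."""
    sent = []
    t = fired_at
    while t < recovered_at and (max_number == 0 or len(sent) < max_number):
        sent.append(t)
        t += repeat_step
    return sent


print(notification_minutes(0, 300))                # [0, 60, 120]
print(notification_minutes(0, 90))                 # [0, 60]
print(notification_minutes(0, 300, max_number=0))  # [0, 60, 120, 180, 240]
```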
  • Disk usage above 80%
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "硬盘 使用率超过80%",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 60,
    "prom_ql": "",
    "rule_config": {
      "inhibit": true,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "disk_used_percent > 80",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 30,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "0",
      "1",
      "2",
      "3",
      "4",
      "5",
      "6"
    ],
    "enable_days_of_weeks": [
      [
        "0",
        "1",
        "2",
        "3",
        "4",
        "5",
        "6"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
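The disk rule lists enable_days_of_week as "0" through "6", while the other rules list "1" through "6" followed by "0". Both are the same set of days, and with 00:00 to 23:59 every rule is active around the clock.

Below is a small Python sketch of the time-window check. It assumes "0" is Sunday (the order of Go's time.Weekday) and that both ends of the enable_stime/enable_etime window are inclusive. Windows that cross midnight are not handled, and the function name is hypothetical.

```python
from datetime import datetime

# Sketch of the enable_days_of_week / enable_stime / enable_etime check.
# Assumptions: "0" = Sunday ... "6" = Saturday, both window ends inclusive,
# and start <= end (no windows crossing midnight).

def rule_enabled(now, days_of_week, start="00:00", end="23:59"):
    weekday = str(now.isoweekday() % 7)  # isoweekday(): Mon=1 ... Sun=7 -> Sun=0
    if weekday not in days_of_week:
        return False
    hhmm = now.strftime("%H:%M")  # zero-padded, so string comparison works
    return start <= hhmm <= end


every_day = ["1", "2", "3", "4", "5", "6", "0"]  # same set as "0".."6"
weekdays = ["1", "2", "3", "4", "5"]
print(rule_enabled(datetime(2024, 1, 8, 23, 59, 30), every_day))               # True (Monday)
print(rule_enabled(datetime(2024, 1, 7, 10, 30), weekdays, "09:00", "18:00"))  # False (Sunday)
```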
  • Network inbound traffic above 6 MiB/s
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "网络 入流量超过6M/s",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 60,
    "prom_ql": "",
    "rule_config": {
      "inhibit": false,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "rate(net_bytes_recv[1m]) / 1024 / 1024 > 6",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]
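This assumes net_bytes_recv and net_bytes_sent are cumulative byte counters, as in Telegraf-style net inputs. Under that assumption:

- rate() turns each counter into bytes per second.
- Dividing by 1024 twice converts that to MiB/s.
- The threshold is therefore 6 MiB/s, roughly 50 Mbit/s rather than 6 Mbit/s.

The unit arithmetic is shown below with a simplified rate() (last minus first over elapsed time, with no reset handling or extrapolation) and made-up samples.

```python
# Unit check for "rate(net_bytes_recv[1m]) / 1024 / 1024 > 6" (and the
# matching net_bytes_sent rule).
# Assumption: simplified rate() without counter-reset handling or extrapolation.

MIB = 1024 * 1024


def rate_per_second(samples):
    """samples: (timestamp, counter_value) pairs sorted by timestamp."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def mib_per_second(samples):
    return rate_per_second(samples) / MIB


# Cumulative byte counter scraped every 15 s over one minute.
samples = [(0, 0), (15, 100 * MIB), (30, 210 * MIB), (45, 300 * MIB), (60, 420 * MIB)]
print(mib_per_second(samples))  # 7.0 -> above the 6 MiB/s threshold

# The threshold expressed in bytes and bits per second.
print(6 * MIB)      # 6291456
print(6 * MIB * 8)  # 50331648, roughly 50 Mbit/s
```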
  • Network outbound traffic above 6 MiB/s
[
  {
    "cate": "prometheus",
    "datasource_ids": [
      0
    ],
    "name": "网络 出流量超过6M/s",
    "note": "",
    "prod": "metric",
    "algorithm": "",
    "algo_params": null,
    "delay": 0,
    "severity": 0,
    "severities": [
      1
    ],
    "disabled": 0,
    "prom_for_duration": 60,
    "prom_ql": "",
    "rule_config": {
      "inhibit": false,
      "queries": [
        {
          "keys": {
            "labelKey": "",
            "valueKey": ""
          },
          "prom_ql": "rate(net_bytes_sent[1m]) / 1024 / 1024 > 6",
          "severity": 1
        }
      ]
    },
    "prom_eval_interval": 15,
    "enable_stime": "00:00",
    "enable_stimes": [
      "00:00"
    ],
    "enable_etime": "23:59",
    "enable_etimes": [
      "23:59"
    ],
    "enable_days_of_week": [
      "1",
      "2",
      "3",
      "4",
      "5",
      "6",
      "0"
    ],
    "enable_days_of_weeks": [
      [
        "1",
        "2",
        "3",
        "4",
        "5",
        "6",
        "0"
      ]
    ],
    "enable_in_bg": 0,
    "notify_recovered": 1,
    "notify_channels": [
      "email"
    ],
    "notify_repeat_step": 60,
    "notify_max_number": 3,
    "recover_duration": 60,
    "callbacks": [],
    "runbook_url": "",
    "append_tags": [],
    "annotations": {},
    "extra_config": null
  }
]

3. Alert Self-Healing

  • Self-healing configuration
  • Testing alert self-healing
    Alert Self-Healing > Self-Healing Scripts > Create
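    Alert Self-Healing > Self-Healing Scripts > Create Task (on the "test" script) > Save and Execute Now > Execution History > click the task under the title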

