MySQL 是一个关系型数据库管理系统,由瑞典MySQL AB公司开发,目前属于Oracle旗下的产品。MySQL是最流行的关系型数据库管理系统之一。数据库的稳定运行是保证业务可用性的关键因素之一。这一小节当中将介绍如何使用 k8s 集群中部署的 kube-prometheus 实现对 MySQL 数据库性能以及资源利用率的监控和度量。
二进制部署 MySQL_Exporter
从 GitHub 下载 MySQL Exporter 二进制文件
1
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.13.0/mysqld_exporter-0.13.0.linux-amd64.tar.gz
解压压缩包
1
2tar xf mysqld_exporter-0.13.0.linux-amd64.tar.gz
mv mysqld_exporter-0.13.0.linux-amd64 /usr/local/mysqld_exporter创建 MySQL 监控专用账号
1
2
3GRANT REPLICATION CLIENT, PROCESS, SELECT ON *.* TO 'mysqlexporter'@'127.0.0.1' identified by '12345678';
GRANT SELECT ON performance_schema.* TO 'mysqlexporter'@'127.0.0.1';
flush privileges;注意,如果开启了数据库审计,还需要授权用户对审计表有权限,如下
1
2
3
4
5
6
7
8
9
10
11# 查看 init_connect 参数配置
SQL> show variables like 'init_connect';
+---------------+-------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------+
| init_connect | insert into auditlog.t_audit(id,thread_id,login_time,localname,matchname) values(null,connection_id(),now(),user(),current_user()); |
+---------------+-------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
# 授权用户
SQL> GRANT SELECT,INSERT ON `mysql`.`audit_login` TO 'mysqlexporter'@'127.0.0.1';注意,如果查看 exporter_mysqld 服务日志出现大量
Error 1184: Aborted connection 94 to db: 'unconnected' user: 'mysqlexporter' host: '127.0.0.1' (init_connect command failed)
错误日志,这说明 mysql 的init_connect
配置有问题,普通用户无法使用数据库,需要重新授权1
2
3
4
5
6
7
8// 设置 init_connect 变量
SQL> set global init_connect=`insert into mysql.audit_login(USER,HOST,DB,PROCESSLIST_ID) values(current_user(),substring_index(user(),'@',-1),database(),connection_id());`;
// 给普通用户授权
SQL> GRANT SELECT,INSERT ON `mysql`.`audit_login` TO 'mysqlexporter'@'127.0.0.1';
// 刷新权限表
SQL> flush privileges;创建 mysqld_exporter 配置文件
1
2
3
4
5
6cat >> /usr/local/mysqld_exporter/my.cnf <<EOF
[client]
host = 127.0.0.1
user = mysqlexporter
password = 12345678
EOF创建 systemd 服务管理 mysqld_exporter,为了区分主机上的 mysqld 服务,这里将服务名定义为 exporter_mysqld
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15cat > /usr/lib/systemd/system/exporter_mysqld.service << EOF
[Unit]
Description=mysqld_exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/mysqld_exporter/mysqld_exporter \
--config.my-cnf=/usr/local/mysqld_exporter/my.cnf \
--web.listen-address=0.0.0.0:9104
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF启动 exporter_mysqld 服务
1
systemctl enable --noe exporter_mysqld
服务是否启动正常
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24systemctl status exporter-mysqld.service
● exporter-mysqld.service - mysqld_exporter
Loaded: loaded (/usr/lib/systemd/system/exporter-mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-03-30 14:21:29 CST; 13min ago
Main PID: 29725 (mysqld_exporter)
CGroup: /system.slice/exporter-mysqld.service
└─29725 /usr/local/mysqld_exporter/mysqld_exporter --config.my-cnf=/usr/local/mysqld_exporter/my.cnf --collect.global_status --web.listen-address=0.0.0.0:9104
Mar 30 14:21:29 mysql01 mysqld_exporter[29725]: ts=2023-03-30T06:21:29.947Z caller=mysqld_exporter.go:277 level=info msg="Starting mysqld_exporter" version="(version=0.14.0, branch=HEAD, revision=ca1b9af82a471c849c529eb8aadb1aac73e7b68c)"
Mar 30 14:21:29 mysql01 mysqld_exporter[29725]: ts=2023-03-30T06:21:29.947Z caller=mysqld_exporter.go:278 level=info msg="Build context" (gogo1.17.8,userroot@401d370ca42e,date20220304-16:25:15)=(MISSING)
Mar 30 14:21:29 mysql01 mysqld_exporter[29725]: ts=2023-03-30T06:21:29.947Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=global_status
Mar 30 14:21:29 mysql01 mysqld_exporter[29725]: ts=2023-03-30T06:21:29.947Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=global_variables
Mar 30 14:21:29 mysql01 mysqld_exporter[29725]: ts=2023-03-30T06:21:29.947Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=slave_status
Mar 30 14:21:29 mysql01 mysqld_exporter[29725]: ts=2023-03-30T06:21:29.947Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=info_schema.innodb_cmp
Mar 30 14:21:29 mysql01 mysqld_exporter[29725]: ts=2023-03-30T06:21:29.947Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=info_schema.innodb_cmpmem
Mar 30 14:21:29 mysql01 mysqld_exporter[29725]: ts=2023-03-30T06:21:29.947Z caller=mysqld_exporter.go:293 level=info msg="Scraper enabled" scraper=info_schema.query_response_time
Mar 30 14:21:29 mysql01 mysqld_exporter[29725]: ts=2023-03-30T06:21:29.948Z caller=mysqld_exporter.go:303 level=info msg="Listening on address" address=0.0.0.0:9104
Mar 30 14:21:29 mysql01 mysqld_exporter[29725]: ts=2023-03-30T06:21:29.948Z caller=tls_config.go:195 level=info msg="TLS is disabled." http2=false
# 查看端口
netstat -anptl |grep 9104
tcp6 0 0 :::9104 :::* LISTEN 29725/mysqld_export
tcp6 0 0 192.168.10.51:9104 192.168.10.47:46410 ESTABLISHED 29725/mysqld_export
tcp6 0 0 192.168.10.51:9104 192.168.10.44:64103 ESTABLISHED 29725/mysqld_export测试指标是否暴露,以下命令只显示部分内容
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17# curl -s http://localhost:9104/metrics | grep mysql |head -n 15
# HELP mysql_exporter_collector_duration_seconds Collector time duration.
# TYPE mysql_exporter_collector_duration_seconds gauge
mysql_exporter_collector_duration_seconds{collector="collect.global_status"} 0.005722647
mysql_exporter_collector_duration_seconds{collector="collect.global_variables"} 0.010595977
mysql_exporter_collector_duration_seconds{collector="collect.info_schema.innodb_cmp"} 0.000366369
mysql_exporter_collector_duration_seconds{collector="collect.info_schema.innodb_cmpmem"} 0.00609734
mysql_exporter_collector_duration_seconds{collector="collect.info_schema.query_response_time"} 0.000111068
mysql_exporter_collector_duration_seconds{collector="collect.slave_status"} 0.011771497
mysql_exporter_collector_duration_seconds{collector="connection"} 0.015082806
# HELP mysql_exporter_last_scrape_error Whether the last scrape of metrics from MySQL resulted in an error (1 for error, 0 for success).
# TYPE mysql_exporter_last_scrape_error gauge
mysql_exporter_last_scrape_error 0
# HELP mysql_exporter_scrapes_total Total number of times MySQL was scraped for metrics.
# TYPE mysql_exporter_scrapes_total counter
mysql_exporter_scrapes_total 56
...
配置 Prometheus 采集指标
编辑 prometheus-additional.yaml 文件,新增 job
mysql-exporter
相关内容,如下所示1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26- job_name: 'node-exporter-outer'
file_sd_configs:
- files: ['/etc/prometheus/hosts/hosts.yaml']
refresh_interval: 5s
metrics_path: /metrics
relabel_configs:
- source_labels: [__address__]
regex: (.*)
target_label: instance
replacement: $1
- source_labels: [__address__]
regex: (.*)
target_label: __address__
replacement: $1:9100
# 以下内容为新增
- job_name: "mysql-exporter"
file_sd_configs:
- files: ['/etc/prometheus/mysql/hosts.yaml']
refresh_interval: 5s
metrics_path: /metrics
metric_relabel_configs:
relabel_configs:
- source_labels: [__address__]
regex: (.*):(\d+)
target_label: instance
replacement: $1执行以下命令,动态更新 Secret
1
2
3
4kubectl create secret generic additional-configs \
--from-file=prometheus-additional.yaml \
--dry-run=client \
-oyaml | kubectl replace -f - -n monitoring新增 MySQL 监控主机文件
prometheus-external-mysqld-exporter.yaml
,内容如下1
2
3
4
5
6
7
8
9
10
11
12
13
14- labels:
env: "prod"
hostname: "mysql01"
app: "mysql"
role: "master"
targets:
- '192.168.10.51:9104'
- labels:
env: "prod"
hostname: "mysql01"
app: "mysql"
role: "slave"
targets:
- '192.168.10.52:9104'创建 MySQL 监控主机的配置 configMap
1
2
3kubectl create cm prometheus-external-mysqld-exporter \
--from-file=mysql.yaml=prometheus-external-mysqld-exporter.yaml \
-n monitoring执行以下命令,修改 Prometheus 配置,添加挂载配置,将 MySQL 监控主机文件挂载都 Prometheus 容器中
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
creationTimestamp: "2023-03-29T06:46:51Z"
generation: 5
labels:
prometheus: k8s
name: k8s
namespace: monitoring
resourceVersion: "30943845"
uid: c21d1932-13f6-45a5-8be5-7aae3b8ec652
spec:
additionalScrapeConfigs:
key: prometheus-additional.yaml
name: additional-configs
optional: true
alerting:
alertmanagers:
- name: alertmanager-main
namespace: monitoring
port: web
image: quay.io/prometheus/prometheus:v2.22.1
nodeSelector:
kubernetes.io/os: linux
podMonitorNamespaceSelector: {}
podMonitorSelector: {}
probeNamespaceSelector: {}
probeSelector: {}
replicas: 2
resources:
requests:
memory: 400Mi
ruleSelector:
matchLabels:
prometheus: k8s
role: alert-rules
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: prometheus-k8s
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
version: v2.22.1
volumeMounts:
- mountPath: /etc/prometheus/hosts
name: prometheus-external-node-exporter-hosts-cm
readOnly: true
# 增加配置,挂载 MySQL 主机文件
- mountPath: /etc/prometheus/mysql
name: prometheus-external-mysqld-exporter
readOnly: true
volumes:
- configMap:
name: prometheus-external-node-exporter-hosts-cm
name: prometheus-external-node-exporter-hosts-cm
# 增加以下配置项
- configMap:
name: prometheus-external-mysqld-exporter
name: prometheus-external-mysqld-exporter打开浏览器,访问
http://prometheus-server:9090/targets/
,检查是否有 MySQL 的监控对象
下载 Grafana Dashboard 配置文件,导入 Grafana,这里使用 MySQL Exporter 导出器自带的 dashboard 文件,下载地址:
https://github.com/prometheus/mysqld_exporter/tree/master/mysqld-mixin/dashboards
配置告警
编辑 prometheus-rules.yaml 文件,在文件最下面新增以下配置
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50- name: MySQLStatsAlerts
rules:
- alert: MySQL is down
expr: mysql_up == 0
for: 1m
labels:
severity: critical
annotations:
description: 'MySQL {{$labels.job}} on {{$labels.instance}} is not up.'
summary: "MySQL database is down. This requires immediate action!"
- alert: IO thread stopped
expr: mysql_slave_status_slave_io_running != 1
for: 1m
labels:
severity: critical
annotations:
summary: "Instance {{ $labels.instance }} IO thread stopped"
description: "IO thread has stopped. This is usually because it cannot connect to the Master any more."
- alert: SQL thread stopped
expr: mysql_slave_status_slave_sql_running == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Instance {{ $labels.instance }} SQL thread stopped"
description: "SQL thread has stopped. This is usually because it cannot apply a SQL statement received from the master."
- alert: Slave lagging behind Master
expr: rate(mysql_slave_status_seconds_behind_master[1m]) >30
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} Slave lagging behind Master"
description: "Slave is lagging behind Master. Please check if Slave threads are running and if there are some performance issues!"
- alert: Slave is NOT read only(Please ignore this warning indicator.)
expr: mysql_global_variables_read_only != 0
for: 1m
labels:
severity: page
annotations:
summary: "Instance {{ $labels.instance }} Slave is NOT read only"
description: "Slave is NOT set to read only. You can accidentally manipulate data on the slave and get inconsistencies..."
- alert: Used more than 80% of max connections limited
expr: mysql_global_status_max_used_connections > mysql_global_variables_max_connections * 0.8
for: 1m
labels:
severity: warning
annotations:
summary: "Instance {{ $labels.instance }} Used more than 80% of max connections limited"
description: "Used more than 80% of max connections limited"更多告警规则可以参考 prometheus 监控之 mysql 篇(含mysql报警规则)
更新规则
1
kubectl replace -f prometheus-rules.yaml