1. MHA Overview
MHA (Master HA) is an open-source high-availability program for MySQL that provides automated master failover for MySQL master/slave replication architectures. When MHA detects a failed master node, it promotes the slave holding the most recent data to become the new master; during this process it pulls extra information from the other slaves to avoid consistency problems. MHA also provides online master switching, i.e. switching master/slave roles on demand.
MHA is a mature MySQL high-availability solution developed by yoshinorim of Japan. It can complete a failover within 30 seconds, and during the failover it preserves data consistency as far as possible.
2. MHA Services
2.1 Service Roles
An MHA deployment has two roles, MHA Manager (the management node) and MHA Node (the data node):
MHA Manager: usually deployed on a dedicated machine, it can manage multiple master/slave clusters (groups); each master/slave cluster is called an application. The Manager coordinates and manages the whole cluster.
MHA Node: runs on every MySQL server (master/slave) as well as on the manager host. It provides scripts that can parse and purge logs, which speeds up failover. It acts mainly as an agent that carries out the commands issued by the management node, and it must run on every MySQL node. Put simply, a node collects the binary log events generated on the slave servers, checks whether the slave about to be promoted already has and has applied them, and if not, ships the missing events to the new master candidate so they can be applied locally before it is promoted.
Note that all nodes within a replication group, together with the Manager, must be able to reach one another over passwordless SSH; only then can the Manager connect in when the master fails and carry out the master/slave switch.
2.2 Provided Tools
MHA ships with a number of utility programs; the common ones are listed below.
Manager node:
masterha_check_ssh: checks the SSH environment that MHA depends on;
masterha_check_repl: checks the MySQL replication environment;
masterha_manager: the main MHA service program;
masterha_check_status: probes MHA's running status;
masterha_master_monitor: checks the availability of the MySQL master node;
masterha_master_switch: switches the master node;
masterha_conf_host: adds or removes a configured host;
masterha_stop: stops the MHA service.
Node tools (normally triggered by the MHA Manager's scripts; no manual operation is needed):
save_binary_logs: saves and copies the master's binary logs;
apply_diff_relay_logs: identifies differential relay log events and applies them to the other slaves;
purge_relay_logs: purges relay logs (without blocking the SQL thread).
Custom extensions:
secondary_check_script: checks the master's availability over multiple network routes;
master_ip_failover_script: updates the master IP used by the application;
report_script: sends reports;
init_conf_load_script: loads initial configuration parameters;
master_ip_online_change_script: updates the master node's IP address.
2.3 How It Works
MHA's workflow can be summarized as follows:
(1) Save the binary log events (binlog events) from the crashed master;
(2) Identify the slave holding the most recent updates;
(3) Apply the differential relay logs to the other slaves;
(4) Apply the binary log events saved from the master;
(5) Promote one slave to be the new master;
(6) Point the other slaves at the new master and resume replication.
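In a GTID-based setup like the one built below, step (2) amounts to comparing each slave's executed GTID set; the most up-to-date slave is the one whose set covers the others. A minimal sketch of eyeballing this by hand (MHA performs this comparison itself during failover; the host names and the mhaadmin account are the ones used in the lab that follows):
# Print each slave's executed GTID set; the largest set marks the
# most up-to-date slave (illustration only).
for host in slave1 slave2; do
  echo "== $host =="
  mysql -h "$host" -umhaadmin -p'Mha@123456' -N -e "SELECT @@global.gtid_executed;"
done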
3. Implementation
3.1 Environment Preparation
This lab environment consists of four nodes, with roles assigned as follows (all lab machines run CentOS 7.x):
Hostname | IP | Service role | Notes |
---|---|---|---|
manager | 192.168.121.111 | MHA manager | monitoring and management |
master | 192.168.121.112 | MySQL master | bin-log and relay-log enabled |
slave1 | 192.168.121.113 | MySQL slave | bin-log and relay-log enabled, relay_log_purge disabled |
slave2 | 192.168.121.114 | MySQL slave | bin-log and relay-log enabled, relay_log_purge disabled |
(1) Prepare four CentOS 7.x machines
To simplify later operations, add the following entries to the /etc/hosts file on each node:
192.168.121.111 manager
192.168.121.112 master
192.168.121.113 slave1
192.168.121.114 slave2
(2) Switch the yum repository
CentOS 7 has reached end of life, so the stock yum repositories no longer work and a mirror must be configured manually.
# Back up the original yum repo file:
cp /etc/yum.repos.d/CentOS-Base.repo /etc/yum.repos.d/CentOS-Base.repo.bak
# Edit the CentOS-Base.repo file directly, putting in the following content:
[base]
name=CentOS-$releasever - Base
baseurl=http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
[updates]
name=CentOS-$releasever - Updates
baseurl=http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
[extras]
name=CentOS-$releasever - Extras
baseurl=http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7
enabled=1
# Clean the yum cache
yum clean all
# Rebuild the yum metadata cache
yum makecache
(3) Stop the firewall and disable SELinux
systemctl disable --now firewalld
setenforce 0
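Note that setenforce 0 only lasts until the next reboot; to keep SELinux permissive permanently you can also adjust the config file (a small addition on top of the original steps):
# Persist the SELinux setting across reboots
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config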
(4) Install mysql-community
yum install -y mysql84-community-release-el7-1.noarch.rpm
yum install -y mysql-community-server
# The following steps change the root password
systemctl enable --now mysqld
tmp_pwd=`awk '/temporary password/ {print $NF}' /var/log/mysqld.log `
mysqladmin -uroot -p"$tmp_pwd" password 'MySQL@123'
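A quick sanity check that the new password works (using the password set above):
mysql -uroot -p'MySQL@123' -e "SELECT VERSION();"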
3.2 Configure One Master and Two Slaves
MHA places special requirements on the MySQL replication environment. For example, every node must have binary logging and relay logging enabled, every slave must explicitly enable its read-only attribute, the relay_log_purge feature must be disabled on the slaves, and so on; see the sketch below.
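The my.cnf snippets below cover relay_log_purge but not read-only, so one way to satisfy the latter is to set it at runtime on each slave once replication is up in step (3) (the masterha_check_repl output later merely notes when it is missing):
# On each slave, after replication is configured:
mysql -uroot -p'MySQL@123' -e "SET GLOBAL read_only=1;"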
(1) Configuration on the initial master node
Append the following lines to the end of /etc/my.cnf:
server_id = 112
log-bin = mysql-bin
skip_name_resolve
gtid-mode = on
enforce-gtid-consistency = true
log-slave-updates = 1
relay-log = relay-log
(2) Configuration on all slave nodes
Append the following lines to the end of /etc/my.cnf (shown for slave1; on slave2 use server_id = 114, since server_id must be unique per node):
server_id = 113
log-bin = mysql-bin
skip_name_resolve
gtid-mode = on
enforce-gtid-consistency = true
log-slave-updates = 1
relay-log = relay-log
relay_log_purge = 0
After the settings are in place, restart the MySQL service on all nodes:
systemctl restart mysqld
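To confirm the settings took effect after the restart, you can query a few of the variables (a quick check, not part of the original steps):
mysql -uroot -p'MySQL@123' -e "SHOW VARIABLES WHERE Variable_name IN ('gtid_mode','log_bin','relay_log_purge');"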
(3) Configure master-slave replication
On the master node:
create user 'slave'@'192.168.121.%' identified with mysql_native_password by 'MySQL@123';
grant replication slave,replication client on *.* to 'slave'@'192.168.121.%';
On each slave node:
change master to
master_host='192.168.121.112',
master_user='slave',
master_password='MySQL@123',
master_auto_position=1;
# Start replication
start slave;
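Before moving on, it is worth verifying that both replication threads are running (the same check the recovery step in section 3.8 relies on):
# Both threads should report Yes
mysql -uroot -p'MySQL@123' -e "SHOW SLAVE STATUS\G" | grep -E "Slave_(IO|SQL)_Running:"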
3.3 Install and Configure MHA
(1) Grant privileges on the master
All MySQL nodes need a user with administrative privileges that can log in remotely from the other nodes on the local network. Create it on the master, and replication propagates it to the slaves:
create user 'mhaadmin'@'192.168.121.%' identified with mysql_native_password by 'Mha@123456';
grant all on *.* to 'mhaadmin'@'192.168.121.%';
(2) Prepare the SSH trust environment
All nodes in the MHA cluster need mutual SSH trust so that the Manager can control nodes remotely and manage data.
On all nodes:
ssh-keygen -f ~/.ssh/id_rsa -N '' -q
ssh-copy-id manager
On manager:
scp ~/.ssh/authorized_keys master:~/.ssh/
scp ~/.ssh/authorized_keys slave1:~/.ssh/
scp ~/.ssh/authorized_keys slave2:~/.ssh/
Test connectivity (worth running on every host):
[root@manager ~]# for i in manager master slave1 slave2;do ssh $i hostname;done
manager
master
slave1
slave2
(3) Install the MHA packages
GitHub download links:
https://github.com/yoshinorim/mha4mysql-manager/releases/tag/v0.58
https://github.com/yoshinorim/mha4mysql-node/releases/tag/v0.58
In this step the Manager node needs one extra package. Specifically, the MHA Manager server must have both the manager and node packages installed, and MHA Node depends on perl-DBD-MySQL, so configure the epel repository first:
wget -O /etc/yum.repos.d/epel.repo http://mirrors.aliyun.com/repo/epel-7.repo
yum install mha4mysql-*.rpm
Install mha4mysql-node on the other three nodes:
yum install mha4mysql-node-0.58-0.el7.centos.noarch.rpm -y
(4) Define the MHA management configuration file
The Manager node needs a dedicated configuration file for each monitored master/slave cluster, and all master/slave clusters may also share a global configuration. The global configuration file defaults to /etc/masterha_default.cnf and is optional. When only one master/slave cluster is monitored, the per-server defaults can simply be supplied in the application's own configuration file, whose path is user-defined.
The management user created earlier on the master already exists on all three MySQL nodes thanks to replication. On manager:
# Create the configuration directory
mkdir /etc/mha
# Create the log directory
mkdir -p /var/log/mha/app1
vim /etc/mha/app1.cnf
The configuration file content is as follows:
[server default]
user=mhaadmin
password=Mha@123456
manager_workdir=/var/log/mha/app1
manager_log=/var/log/mha/app1/manager.log
ssh_user=root
repl_user=slave
repl_password=MySQL@123
ping_interval=1
[server1]
hostname=192.168.121.112
ssh_port=22
candidate_master=1
[server2]
hostname=192.168.121.113
ssh_port=22
candidate_master=1
[server3]
hostname=192.168.121.114
ssh_port=22
candidate_master=1
(5)对四个节点进行检测
检测各节点间 ssh 互信通信配置是否 ok:
[root@manager ~]# masterha_check_ssh --conf=/etc/mha/app1.cnf
Mon Feb 10 16:34:07 2025 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Feb 10 16:34:07 2025 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Mon Feb 10 16:34:07 2025 - [info] Reading server configuration from /etc/mha/app1.cnf..
Mon Feb 10 16:34:07 2025 - [info] Starting SSH connection tests..
Mon Feb 10 16:34:10 2025 - [debug]
Mon Feb 10 16:34:07 2025 - [debug] Connecting via SSH from root@192.168.121.113(192.168.121.113:22) to root@192.168.121.112(192.168.121.112:22)..
Mon Feb 10 16:34:09 2025 - [debug] ok.
Mon Feb 10 16:34:09 2025 - [debug] Connecting via SSH from root@192.168.121.113(192.168.121.113:22) to root@192.168.121.114(192.168.121.114:22)..
Mon Feb 10 16:34:10 2025 - [debug] ok.
Mon Feb 10 16:34:10 2025 - [debug]
Mon Feb 10 16:34:07 2025 - [debug] Connecting via SSH from root@192.168.121.112(192.168.121.112:22) to root@192.168.121.113(192.168.121.113:22)..
Mon Feb 10 16:34:08 2025 - [debug] ok.
Mon Feb 10 16:34:08 2025 - [debug] Connecting via SSH from root@192.168.121.112(192.168.121.112:22) to root@192.168.121.114(192.168.121.114:22)..
Mon Feb 10 16:34:09 2025 - [debug] ok.
Mon Feb 10 16:34:11 2025 - [debug]
Mon Feb 10 16:34:08 2025 - [debug] Connecting via SSH from root@192.168.121.114(192.168.121.114:22) to root@192.168.121.112(192.168.121.112:22)..
Mon Feb 10 16:34:09 2025 - [debug] ok.
Mon Feb 10 16:34:09 2025 - [debug] Connecting via SSH from root@192.168.121.114(192.168.121.114:22) to root@192.168.121.113(192.168.121.113:22)..
Mon Feb 10 16:34:10 2025 - [debug] ok.
Mon Feb 10 16:34:11 2025 - [info] All SSH connection tests passed successfully.
Check that the connection parameters of the managed MySQL replication cluster are OK:
[root@manager ~]# masterha_check_repl --conf=/etc/mha/app1.cnf
Mon Feb 10 16:35:32 2025 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Feb 10 16:35:32 2025 - [info] Reading application default configuration from /etc/mha/app1.cnf..
Mon Feb 10 16:35:32 2025 - [info] Reading server configuration from /etc/mha/app1.cnf..
Mon Feb 10 16:35:32 2025 - [info] MHA::MasterMonitor version 0.58.
Mon Feb 10 16:35:33 2025 - [info] GTID failover mode = 1
Mon Feb 10 16:35:33 2025 - [info] Dead Servers:
Mon Feb 10 16:35:33 2025 - [info] Alive Servers:
Mon Feb 10 16:35:33 2025 - [info] 192.168.121.112(192.168.121.112:3306)
Mon Feb 10 16:35:33 2025 - [info] 192.168.121.113(192.168.121.113:3306)
Mon Feb 10 16:35:33 2025 - [info] 192.168.121.114(192.168.121.114:3306)
Mon Feb 10 16:35:33 2025 - [info] Alive Slaves:
Mon Feb 10 16:35:33 2025 - [info] 192.168.121.113(192.168.121.113:3306) Version=8.0.18 (oldest major version between slaves) log-bin:enabled
Mon Feb 10 16:35:33 2025 - [info] GTID ON
Mon Feb 10 16:35:33 2025 - [info] Replicating from 192.168.121.112(192.168.121.112:3306)
Mon Feb 10 16:35:33 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Feb 10 16:35:33 2025 - [info] 192.168.121.114(192.168.121.114:3306) Version=8.0.18 (oldest major version between slaves) log-bin:enabled
Mon Feb 10 16:35:33 2025 - [info] GTID ON
Mon Feb 10 16:35:33 2025 - [info] Replicating from 192.168.121.112(192.168.121.112:3306)
Mon Feb 10 16:35:33 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Feb 10 16:35:33 2025 - [info] Current Alive Master: 192.168.121.112(192.168.121.112:3306)
Mon Feb 10 16:35:33 2025 - [info] Checking slave configurations..
Mon Feb 10 16:35:33 2025 - [info] read_only=1 is not set on slave 192.168.121.113(192.168.121.113:3306).
Mon Feb 10 16:35:33 2025 - [info] read_only=1 is not set on slave 192.168.121.114(192.168.121.114:3306).
Mon Feb 10 16:35:33 2025 - [info] Checking replication filtering settings..
Mon Feb 10 16:35:33 2025 - [info] binlog_do_db= , binlog_ignore_db=
Mon Feb 10 16:35:33 2025 - [info] Replication filtering check ok.
Mon Feb 10 16:35:33 2025 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Feb 10 16:35:33 2025 - [info] Checking SSH publickey authentication settings on the current master..
Mon Feb 10 16:35:34 2025 - [info] HealthCheck: SSH to 192.168.121.112 is reachable.
Mon Feb 10 16:35:34 2025 - [info]
192.168.121.112(192.168.121.112:3306) (current master)
+--192.168.121.113(192.168.121.113:3306)
+--192.168.121.114(192.168.121.114:3306)
Mon Feb 10 16:35:34 2025 - [info] Checking replication health on 192.168.121.113..
Mon Feb 10 16:35:34 2025 - [info] ok.
Mon Feb 10 16:35:34 2025 - [info] Checking replication health on 192.168.121.114..
Mon Feb 10 16:35:34 2025 - [info] ok.
Mon Feb 10 16:35:34 2025 - [warning] master_ip_failover_script is not defined.
Mon Feb 10 16:35:34 2025 - [warning] shutdown_script is not defined.
Mon Feb 10 16:35:34 2025 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
3.4 Start MHA
(1) Start from the command line
# Run the following on the manager node to start MHA:
[root@manager ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
[1] 12005
# Check the status of the master node:
[root@manager ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:12005) is running(0:PING_OK), master:192.168.121.112
# Tail the monitoring log with:
[root@manager ~]# tail -f /var/log/mha/app1/manager.log
(2) Write a service startup script
[root@manager ~]# vim /etc/init.d/masterha_managerd
#!/bin/bash
# chkconfig: 35 80 20
# description: MHA management script.
STARTEXEC="/usr/bin/masterha_manager --conf"
STOPEXEC="/usr/bin/masterha_stop --conf"
CONF="/etc/mha/app1.cnf"
process_count=`ps -ef |grep -w masterha_manager|grep -v grep|wc -l`
PARAMS="--ignore_last_failover"
case "$1" in
start)
if [ $process_count -gt 0 ]
then
echo "masterha_manager exists, process is already running"
else
echo "Starting Masterha Manager"
$STARTEXEC $CONF $PARAMS < /dev/null > /var/log/mha/app1/manager.log 2>&1 &
fi
;;
stop)
if [ $process_count -eq 0 ]
then
echo "Masterha Manager does not exist, process is not running"
else
echo "Stopping ..."
$STOPEXEC $CONF
while(true)
do
process_count=`ps -ef |grep -w masterha_manager|grep -v grep|wc -l`
if [ $process_count -gt 0 ]
then
sleep 1
else
break
fi
done
echo "Master Manager stopped"
fi
;;
*)
echo "Please use start or stop as first argument"
;;
esac
[root@manager ~]# chmod +x /etc/init.d/masterha_managerd
# Register as a system service
[root@manager ~]# chkconfig --add masterha_managerd
# Enable at boot
[root@manager ~]# chkconfig masterha_managerd on
(3) Test the service script
# Before testing, stop the instance started from the command line earlier:
[root@manager ~]# ps -ef | grep masterha
root 12005 1768 0 16:39 pts/1 00:00:00 perl /usr/bin/masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover
root 12473 1768 0 16:46 pts/1 00:00:00 grep --color=auto masterha
[root@manager ~]# kill 12005
# Test the script
[root@manager ~]# systemctl start masterha_managerd
[root@manager ~]# systemctl status masterha_managerd
● masterha_managerd.service - SYSV: MHA management script.
Loaded: loaded (/etc/rc.d/init.d/masterha_managerd; bad; vendor preset: disabled)
Active: active (running) since Mon 2025-02-10 16:48:13 HKT; 8s ago
Docs: man:systemd-sysv-generator(8)
Process: 12493 ExecStart=/etc/rc.d/init.d/masterha_managerd start (code=exited, status=0/SUCCESS)
CGroup: /system.slice/masterha_managerd.service
└─12499 perl /usr/bin/masterha_manager --conf /etc/mha/app1.cnf --ignore_last_failover
Feb 10 16:48:13 manager systemd[1]: Starting SYSV: MHA management script....
Feb 10 16:48:13 manager masterha_managerd[12493]: Starting Masterha Manager
Feb 10 16:48:13 manager systemd[1]: Started SYSV: MHA management script..
3.5 Configure the VIP
The VIP can be managed in two ways: with keepalived floating the virtual IP, or with a script that brings the virtual IP up and down (no keepalived- or heartbeat-style software needed).
To reduce the risk of split-brain, the script-based approach is recommended for production rather than keepalived.
(1) Write the script
[root@manager ~]# vim /usr/local/bin/master_ip_failover
[root@manager ~]# cat /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '192.168.121.110/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
return 0 unless ($ssh_user);
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
[root@manager ~]# chmod +x /usr/local/bin/master_ip_failover
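Before wiring the script into MHA, you can exercise it by hand; per the code above, the status branch only prints a message and exits 0, so no host arguments are needed:
# Dry-run the failover script
[root@manager ~]# /usr/local/bin/master_ip_failover --command=status

IN SCRIPT TEST====/sbin/ifconfig ens33:1 down==/sbin/ifconfig ens33:1 192.168.121.110/24===

Checking the Status of the script.. OK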
(2) Update the manager configuration file
[root@manager ~]# vim /etc/mha/app1.cnf
# Add under [server default]:
master_ip_failover_script=/usr/local/bin/master_ip_failover
(3) On the master, bring up the first VIP manually
[root@master ~]# ifconfig ens33:1 192.168.121.110/24
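Confirm the address came up (ens33 is the interface name assumed throughout this lab):
[root@master ~]# ip addr show ens33 | grep 192.168.121.110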
(4) Restart the service
[root@manager ~]# systemctl restart masterha_managerd
3.6 Email Alerts
(1) Install mailx
[root@manager ~]# yum install -y mailx
[root@manager ~]# vim /etc/mail.rc
[root@manager ~]# tail -n5 /etc/mail.rc
set from=obboda@163.com
set smtp=smtp.163.com
set smtp-auth-user=obboda@163.com
set smtp-auth-password=*******   # fill in the SMTP authorization code here
set smtp-auth=login
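You can verify the mail setup with a quick test message before relying on it for failover reports (the recipient is the address configured above):
echo "MHA mail test" | mail -s "mha test" obboda@163.com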
(2) Write the send_report script
[root@manager ~]# vim /usr/local/bin/send_report
#!/usr/bin/perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.163.com';
my $mail_from='obboda@163.com';
my $mail_user='obboda@163.com';
my $mail_pass='********'; # put the mailbox authorization code here
#my $mail_to=['to1@qq.com','to2@qq.com'];
my $mail_to='obboda@163.com';
GetOptions(
'orig_master_host=s' => \$dead_master_host,
'new_master_host=s' => \$new_master_host,
'new_slave_hosts=s' => \$new_slave_hosts,
'subject=s' => \$subject,
'body=s' => \$body,
);
# Do whatever you want here
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
sub mailToContacts {
my ($smtp, $mail_from, $mail_user, $mail_pass, $mail_to, $subject, $msg ) = @_;
open my $DEBUG, ">/var/log/mha/app1/mail.log"
or die "Can't open the debug file:$!\n";
my $sender = new Mail::Sender {
ctype => 'text/plain;charset=utf-8',
encoding => 'utf-8',
smtp => $smtp,
from => $mail_from,
auth => 'LOGIN',
TLS_allowed => '0',
authid => $mail_user,
authpwd => $mail_pass,
to => $mail_to,
subject => $subject,
debug => $DEBUG
};
$sender->MailMsg(
{
msg => $msg,
debug => $DEBUG
}
) or print $Mail::Sender::Error;
return 1;
}
exit 0;
[root@manager ~]# chmod +x /usr/local/bin/send_report
[root@manager ~]# touch /var/log/mha/app1/mail.log
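The script depends on the Mail::Sender Perl module (packaged in epel as perl-Mail-Sender, assuming the epel repo configured earlier), and per its GetOptions list it can be driven by hand to confirm mail goes out:
# Install the Perl module the script uses
[root@manager ~]# yum install -y perl-Mail-Sender
# Hand-run the report script with a dummy subject and body
[root@manager ~]# /usr/local/bin/send_report --subject="mha test" --body="manual test of send_report"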
(3) Update the manager configuration file:
[root@manager ~]# vim /etc/mha/app1.cnf
[server default]
report_script=/usr/local/bin/send_report
[root@manager ~]# systemctl restart masterha_managerd
3.7 Test MHA Failover
(1) Stop the mysql service on the master node to simulate a crash of the master
[root@master ~]# systemctl stop mysqld.service
(2) Check the log on the manager node
[root@manager ~]# tail /var/log/mha/app1/manager.log
Started automated(non-interactive) failover.
Invalidated master IP address on 192.168.121.112(192.168.121.112:3306)
Selected 192.168.121.113(192.168.121.113:3306) as a new master.
192.168.121.113(192.168.121.113:3306): OK: Applying all logs succeeded.
192.168.121.113(192.168.121.113:3306): OK: Activated master IP address.
192.168.121.114(192.168.121.114:3306): OK: Slave started, replicating from 192.168.121.113(192.168.121.113:3306)
192.168.121.113(192.168.121.113:3306): Resetting slave info succeeded.
Master failover to 192.168.121.113(192.168.121.113:3306) completed successfully.
Mon Feb 10 15:22:31 2025 - [info] Sending mail..
(3) Check the VIP
[root@slave1 ~]# ifconfig -a |grep -A 2 "ens33:1"
ens33:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.121.110 netmask 255.255.255.0 broadcast 192.168.121.255
ether 00:0c:29:43:af:07 txqueuelen 1000 (Ethernet)
(4) Check the mailbox
[root@manager ~]# tail /var/log/mha/app1/mail.log
<< 192.168.121.113(192.168.121.113:3306): OK: Applying all logs succeeded.
<< 192.168.121.113(192.168.121.113:3306): OK: Activated master IP address.
<< 192.168.121.114(192.168.121.114:3306): OK: Slave started, replicating from 192.168.121.113(192.168.121.113:3306)
<< 192.168.121.113(192.168.121.113:3306): Resetting slave info succeeded.
<< Master failover to 192.168.121.113(192.168.121.113:3306) completed successfully.
<<
<< .
>> 250 Mail OK queued as gzsmtp3,PigvCgBnn_8r6qln82FPAQ--.6190S2 1739188780
<< QUIT
>> 221 Bye
This shows that the manager detected the failure of node 192.168.121.112 and then performed an automatic failover, promoting 192.168.121.113 to be the new master.
Note that after the failover completes, the manager stops automatically; running masterha_check_status at this point returns an error, as shown below:
[root@manager ~]# masterha_check_status -conf=/etc/mha/app1.cnf
app1 is stopped(2:NOT_RUNNING).
3.8 Add a New Slave to Repair the Replication Cluster
After the original master node fails, a new MySQL node has to be prepared. Restore its data from a backup taken from the master, then configure it as a slave of the new master. Note that if the node being added is a brand-new machine, its IP address should be set to the original master's IP; otherwise the corresponding IP address in app1.cnf must be updated as well. Then start the manager again and re-check its status.
[root@master ~]# systemctl start mysqld.service
mysql> change master to
-> master_host='192.168.121.113',
-> master_user='slave',
-> master_password='MySQL@123',
-> master_auto_position=1;
mysql> start slave;
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.121.113
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 1266
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 416
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
...
Run the checks again:
[root@manager ~]# systemctl restart masterha_managerd
[root@manager ~]# masterha_check_status -conf=/etc/mha/app1.cnf
app1 (pid:12880) is running(0:PING_OK), master:192.168.121.113