MHA configuration and switchover methods in MySQL

2024-04-02 19:55


This article walks through configuring MHA for MySQL and the ways to switch masters with it: automatic failover, manual failover, and online switchover.

Master node / MHA manager node: 172.31.217.183
Slave node / MHA member node: 172.31.217.182
Semi-synchronous replication is already enabled.

The database version is 5.7.

Configure passwordless SSH login
Master node:
root@bd-dev-mingshuo-183:/opt/soft#ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
36:39:6b:1e:40:f2:85:31:db:d0:3e:ab:05:0e:fd:37 root@bd-dev-mingshuo-183
The key's randomart image is:
+--[ RSA 2048]----+
|      +.         |
|       B.        |
|    ..+.o        |
|    .+o.o.       |
|     oooSo       |
|      .o++E      |
|       o+. .     |
|      .o .       |
|        .        |
+-----------------+
root@bd-dev-mingshuo-183:/opt/soft#ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.182
root@172.31.217.182's password:
Now try logging into the machine, with "ssh 'root@172.31.217.182'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

root@bd-dev-mingshuo-183:/u01#ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.183
root@172.31.217.183's password:
Now try logging into the machine, with "ssh 'root@172.31.217.183'", and check in:

.ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

Slave node:
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.183
ssh-copy-id -i /root/.ssh/id_rsa.pub root@172.31.217.182
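
Before moving on, it can help to confirm that key-based login really works in both directions. The following is a minimal sketch, not part of the original setup: the function name check_passwordless and the SSH_BIN override variable are made up for illustration, while BatchMode and ConnectTimeout are standard OpenSSH options.

```shell
# SSH_BIN is a hypothetical override hook so the function can be dry-run.
SSH_BIN="${SSH_BIN:-ssh}"

# Probe each host non-interactively. BatchMode=yes makes ssh fail
# instead of prompting when key-based login is not working.
check_passwordless() {
    rc=0
    for host in "$@"; do
        if "$SSH_BIN" -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true \
            </dev/null >/dev/null 2>&1; then
            echo "$host: passwordless login OK"
        else
            echo "$host: passwordless login FAILED"
            rc=1
        fi
    done
    return "$rc"
}
```

Run check_passwordless 172.31.217.183 172.31.217.182 on each node. Any FAILED line means ssh-copy-id needs to be repeated for that host.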

Slave node:
mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'read_only'\G
*************************** 1. row ***************************
Variable_name: read_only
        Value: ON
1 row in set (0.00 sec)

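read_only=1 means read-only and 0 means read-write. Making the slave read-only does not affect how it applies replicated logs. Do not write this parameter into the MySQL option file, though: if this slave is later promoted to master and restarted, ordinary users would be unable to write. The parameter is optional when configuring MHA.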
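
To guard against the pitfall described above, one could check whether an option file already sets read_only persistently. This is only a sketch: the function name cnf_sets_read_only is made up here, and it also matches super_read_only (available in MySQL 5.7), which has the same effect on ordinary users.

```shell
# Succeeds (exit 0) if the given MySQL option file sets read_only or
# super_read_only persistently. Commented-out lines are ignored, and
# both underscore and dash spellings are accepted.
cnf_sets_read_only() {
    grep -Eiq '^[[:space:]]*(super[_-])?read[_-]only([[:space:]]*=|[[:space:]]*$)' "$1"
}
```

Usage, assuming the option file lives at /etc/my.cnf (the article does not say where it is): cnf_sets_read_only /etc/my.cnf && echo "read_only is set in the option file".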

Deploy the packages
Install the manager package on the manager node.
Install the node package on all nodes.
Install the node package first:
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
yum install mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

Create the MHA management account on the master
grant all privileges on *.* to mha@'172.31.217.%' identified by 'oracle';
flush privileges;

Create a directory for the MHA configuration file and the MHA logs
mkdir -p /u01/mha/log
chown -R mysql:mysql /u01/mha

Edit the configuration file
vi /u01/mha/mha.cnf

[server default]
manager_log=/u01/mha/log/manager.log
manager_workdir=/u01/mha/log

master_binlog_dir=/u01/mysql/3306/data
user=mha
password=oracle
ping_interval=2
repl_user=repl_user
repl_password=oracle
ssh_user=root

[server1]
hostname=172.31.217.183
port=3306

[server2]
hostname=172.31.217.182
port=3306

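Optional configuration parameters:
[server default] section:
ping_interval=1         //interval between the ping packets sent to monitor the master; the default is 3 seconds, and automatic failover starts after three attempts get no response
remote_workdir=/tmp     //where the remote MySQL host saves binlogs when a switch occurs
report_script=/usr/local/send_report    //script that sends an alert after a switch occurs
shutdown_script=""      //script that shuts down the failed host after a failure (its main purpose is to power the host off to prevent split-brain; not used here)
Per-server (slave) section:
candidate_master=1   //marks the server as a candidate master; with this set, the slave is promoted to master on failover even if it does not hold the most recent events in the cluster
check_repl_delay=0   //by default MHA will not pick a slave as the new master if it is more than 100 MB of relay logs behind the master, because recovering that slave would take a long time; with check_repl_delay=0, MHA ignores replication delay when choosing the new master. This is very useful on a host with candidate_master=1, because that candidate is then guaranteed to become the new master during the switch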
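
To show where these optional parameters sit relative to the sections of the file, here is a sketch that writes a complete example configuration. The values are the ones used in this article; putting candidate_master and check_repl_delay on server2 is an illustrative choice, and the function name write_mha_cnf is made up.

```shell
# Write an example MHA configuration to the path given as $1.
# candidate_master/check_repl_delay on server2 are illustrative only.
write_mha_cnf() {
    cat > "$1" <<'EOF'
[server default]
manager_log=/u01/mha/log/manager.log
manager_workdir=/u01/mha/log
master_binlog_dir=/u01/mysql/3306/data
remote_workdir=/tmp
ping_interval=2
user=mha
password=oracle
repl_user=repl_user
repl_password=oracle
ssh_user=root

[server1]
hostname=172.31.217.183
port=3306

[server2]
hostname=172.31.217.182
port=3306
candidate_master=1
check_repl_delay=0
EOF
}
```

Calling write_mha_cnf /tmp/mha-example.cnf produces a file that can be compared against /u01/mha/mha.cnf before changing the real one.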

Check SSH login and replication
masterha_check_ssh --conf=/u01/mha/mha.cnf
masterha_check_repl --conf=/u01/mha/mha.cnf
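
The two checks can be wrapped so that the replication check is only attempted once the SSH check has passed. A minimal sketch; the function name mha_preflight is made up, and it relies only on each tool returning a non-zero exit status on failure.

```shell
# Run both MHA pre-flight checks against one config file and stop at
# the first failure, so the replication check is never attempted
# while SSH is still broken.
mha_preflight() {
    conf=$1
    for tool in masterha_check_ssh masterha_check_repl; do
        if ! "$tool" --conf="$conf"; then
            echo "$tool failed for $conf" >&2
            return 1
        fi
    done
    echo "all checks passed for $conf"
}
```

Usage: mha_preflight /u01/mha/mha.cnf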

The checks failed many times along the way. Some of the fixes:
ln -s /opt/mysql-5.7.23/bin/mysql /usr/bin/mysql
ln -s /opt/mysql-5.7.23/bin/mysqlbinlog /usr/bin/mysqlbinlog
Uninstall mha4mysql-manager-0.58-0.el7.centos.noarch.rpm and install mha4mysql-manager-0.56-0.el6.noarch.rpm instead.
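
The article does not show the errors behind the two symlinks. A plausible cause, stated here as an assumption, is that MHA could not find mysql and mysqlbinlog on the PATH because they were installed under /opt/mysql-5.7.23/bin. A small helper (the name missing_tools is made up) can confirm which client tools are missing on a node:

```shell
# Print each named tool that cannot be found on the current PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Running missing_tools mysql mysqlbinlog on every node should print nothing once the symlinks are in place.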

Start MHA
nohup masterha_manager --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &

Check MHA status
root@bd-dev-mingshuo-183:/opt/soft#masterha_check_status --conf=/u01/mha/mha.cnf
mha (pid:24910) is running(0:PING_OK), master:172.31.217.183

Configure the VIP
Add the following under the [server default] section:
master_ip_failover_script=/usr/local/bin/master_ip_failover

Copy master_ip_failover from the source package to /usr/local/bin/:
cd /opt/soft/MHAsoft/mha4mysql-manager-0.56/samples/scripts
cp -ra master_ip_failover /usr/local/bin/master_ip_failover

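Edit /usr/local/bin/master_ip_failover and add these lines:
my $vip = '172.31.217.203/24'; # the virtual IP you want to use
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig eth3:$key $vip"; # change eth3 to your NIC name
my $ssh_stop_vip = "/sbin/ifconfig eth3:$key down";

Note: they go at the position marked in the excerpt below, between the variable declarations and GetOptions.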
my (
    $command,          $ssh_user,        $orig_master_host, $orig_master_ip,
    $orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);

(add the lines above here)

GetOptions(
    'command=s'          => \$command,
    'ssh_user=s'         => \$ssh_user,
    'orig_master_host=s' => \$orig_master_host,
    'orig_master_ip=s'   => \$orig_master_ip,
    'orig_master_port=i' => \$orig_master_port,
    'new_master_host=s' => \$new_master_host,
    'new_master_ip=s'    => \$new_master_ip,
    'new_master_port=i' => \$new_master_port,
);

Bind the VIP to the network interface
ifconfig eth3:1 172.31.217.203/24

ifconfig
eth3      Link encap:Ethernet HWaddr 54:0F:5D:2C:4D:77
          inet addr:172.31.217.202 Bcast:172.31.217.255 Mask:255.255.255.0
          inet6 addr: fe80::560f:5dff:fe2c:4d77/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:74742667 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:52680755472 (49.0 GiB) TX bytes:740 (740.0 b)

eth3:1    Link encap:Ethernet HWaddr 54:0F:5D:2C:4D:77
          inet addr:172.31.217.203 Bcast:172.31.217.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
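
Rather than reading the interface listing by eye, the presence of the VIP can be tested in a script. This is a sketch with a made-up function name, vip_is_bound. It reads the listing on standard input and accepts both the "inet addr:" form shown above and the "inet A.B.C.D/NN" form printed by ip addr.

```shell
# Read an interface listing on stdin and succeed if the address $1
# appears as an IPv4 address. Dots in the address are escaped so that
# they only match literal dots.
vip_is_bound() {
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -Eq "inet (addr:)?${pattern}[ /]"
}
```

Usage: ifconfig | vip_is_bound 172.31.217.203 && echo "VIP is up"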

Stop MHA
masterha_stop --conf=/u01/mha/mha.cnf

Start MHA again
nohup masterha_manager --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &

This fails with:
Bareword "FIXME_xxx" not allowed while "strict subs" in use at /usr/local/bin/master_ip_failover line 98.
Execution of /usr/local/bin/master_ip_failover aborted due to compilation errors.
Mon Sep 17 10:56:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln226] Failed to get master_ip_failover_script status with return code 255:0.
Mon Sep 17 10:56:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/bin/masterha_manager line 50
Mon Sep 17 10:56:04 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Mon Sep 17 10:56:04 2018 - [info] Got exit code 1 (Not master dead).

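The quick fix is to comment out the lines that contain FIXME_xxx. Keep in mind that in the sample script these placeholders mark where site-specific logic is expected, so commenting them out only gets the script to compile; the $ssh_start_vip and $ssh_stop_vip commands defined above are not run unless the script is also changed to call them.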
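
Commenting out the placeholder lines can be done with sed instead of by hand. A minimal sketch with a made-up function name, comment_out_fixme. It keeps a .bak copy of the file and leaves lines that are already commented untouched.

```shell
# Comment out every not-yet-commented line that mentions FIXME_xxx.
# A backup of the original file is kept next to it with a .bak suffix.
comment_out_fixme() {
    sed -i.bak -e '/^[[:space:]]*#/!{/FIXME_xxx/s/^/#/;}' "$1"
}
```

Usage: comment_out_fixme /usr/local/bin/master_ip_failover, followed by perl -c /usr/local/bin/master_ip_failover to confirm that the script now compiles.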

Start MHA again
nohup masterha_manager --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &
OK!

Shut down the master
mysqladmin -uroot -poracle shutdown

Check the standby
mysql> show slave status;
Empty set (0.00 sec)

mysql> show master status\G
*************************** 1. row ***************************
             File: slave-relay-bin.000002
         Position: 154
     Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
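The standby has been promoted to master automatically, and the MHA software on the stopped master's host has stopped automatically as well.

Restore the previous master-slave relationship:
Bring the stopped master back up. You will find that it does not rejoin the cluster on its own.
Query the binlog position on the master: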
mysql> show master status\G
*************************** 1. row ***************************
             File: master-bin.000005
         Position: 154
     Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)
On the standby:
change master to
master_host='bd-dev-mingshuo-183',
master_port=3306,
master_user='repl_user',
master_password='oracle',
master_log_file='master-bin.000005',
master_log_pos=154;

start slave;
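
The file name and position in the CHANGE MASTER statement are copied from the SHOW MASTER STATUS output above. As a sketch of how that step could be scripted (both function names are made up), the coordinates can be extracted from the \G output and substituted into the statement:

```shell
# Read "SHOW MASTER STATUS\G" output on stdin and print "<file> <pos>".
master_coords() {
    awk -F': *' '
        $1 ~ /^ *File$/     { file = $2 }
        $1 ~ /^ *Position$/ { pos = $2 }
        END { print file, pos }'
}

# Print a CHANGE MASTER TO statement for binlog file/position based
# replication. Arguments: host port user password log_file log_pos.
change_master_sql() {
    printf "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%s, MASTER_USER='%s', MASTER_PASSWORD='%s', MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n" \
        "$1" "$2" "$3" "$4" "$5" "$6"
}
```

For example, piping the output of mysql -e 'show master status\G' on the master into master_coords yields the two values to pass as the last arguments of change_master_sql.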
Start the MHA manager on the master host. Note that the -ignore_last_failover option must be added here; otherwise it reports an error:
Mon Sep 17 14:45:56 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 17 14:45:56 2018 - [info] Reading application default configuration from /u01/mha/mha.cnf..
Mon Sep 17 14:45:56 2018 - [info] Reading server configuration from /u01/mha/mha.cnf..
Mon Sep 17 14:45:56 2018 - [info] MHA::MasterMonitor version 0.56.
Mon Sep 17 14:45:56 2018 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln193] There is no alive slave. We can't do failover
Mon Sep 17 14:45:56 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln424] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 326
Mon Sep 17 14:45:56 2018 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln523] Error happened on monitoring servers.
Mon Sep 17 14:45:56 2018 - [info] Got exit code 1 (Not master dead).

Start the MHA manager:
nohup masterha_manager -ignore_last_failover --conf=/u01/mha/mha.cnf > /u01/mha/log/manager.log 2>&1 &
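
As background to this option: as far as I understand MHA, it records the outcome of the previous failover as marker files in manager_workdir and refuses a new failover while a recent one is on record. The *.failover.complete and *.failover.error names used below are an assumption about that mechanism, not something shown in this article, so verify them on your own system. A sketch for listing such files, with a made-up function name:

```shell
# List MHA failover marker files found in a manager_workdir.
# The *.failover.complete / *.failover.error naming is an assumption
# about how MHA records the previous failover; verify on your system.
failover_markers() {
    for f in "$1"/*.failover.complete "$1"/*.failover.error; do
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}
```

Usage: failover_markers /u01/mha/log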

That covers automatic failover. Next, test manual failover.
Stop the MHA manager:
masterha_stop --conf=/u01/mha/mha.cnf

Stop the master database
mysqladmin -uroot -poracle shutdown

Switch over manually
masterha_master_switch --master_state=dead --conf=/u01/mha/mha.cnf --dead_master_host=172.31.217.183 --dead_master_port=3306 --new_master_host=172.31.217.182 --new_master_port=3306 --ignore_last_failover
That covers manual failover. Next, test an online switchover.
Manager node:
Stop the MHA manager:
masterha_stop --conf=/u01/mha/mha.cnf
masterha_master_switch --conf=/u01/mha/mha.cnf --master_state=alive --new_master_host=172.31.217.182 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=100
Mon Sep 17 15:47:29 2018 - [info] MHA::MasterRotate version 0.56.
Mon Sep 17 15:47:29 2018 - [info] Starting online master switch..
Mon Sep 17 15:47:29 2018 - [info]
Mon Sep 17 15:47:29 2018 - [info] * Phase 1: Configuration Check Phase..
Mon Sep 17 15:47:29 2018 - [info]
Mon Sep 17 15:47:29 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Sep 17 15:47:29 2018 - [info] Reading application default configuration from /u01/mha/mha.cnf..
Mon Sep 17 15:47:29 2018 - [info] Reading server configuration from /u01/mha/mha.cnf..
Mon Sep 17 15:47:29 2018 - [info] GTID failover mode = 0
Mon Sep 17 15:47:29 2018 - [info] Current Alive Master: 172.31.217.183(172.31.217.183:3306)
Mon Sep 17 15:47:29 2018 - [info] Alive Slaves:
Mon Sep 17 15:47:29 2018 - [info]   172.31.217.182(172.31.217.182:3306) Version=5.7.23-log (oldest major version between slaves) log-bin:enabled
Mon Sep 17 15:47:29 2018 - [info]     Replicating from bd-dev-mingshuo-183(172.31.217.183:3306)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 172.31.217.183(172.31.217.183:3306)? (YES/no): YES
Mon Sep 17 15:47:33 2018 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Mon Sep 17 15:47:33 2018 - [info] ok.
Mon Sep 17 15:47:33 2018 - [info] Checking MHA is not monitoring or doing failover..
Mon Sep 17 15:47:33 2018 - [info] Checking replication health on 172.31.217.182..
Mon Sep 17 15:47:33 2018 - [info] ok.
Mon Sep 17 15:47:33 2018 - [info] 172.31.217.182 can be new master.
Mon Sep 17 15:47:33 2018 - [info]
From:
172.31.217.183(172.31.217.183:3306) (current master)
+--172.31.217.182(172.31.217.182:3306)

To:
172.31.217.182(172.31.217.182:3306) (new master)
+--172.31.217.183(172.31.217.183:3306)

Starting master switch from 172.31.217.183(172.31.217.183:3306) to 172.31.217.182(172.31.217.182:3306)? (yes/NO): yes
Mon Sep 17 15:47:55 2018 - [info] Checking whether 172.31.217.182(172.31.217.182:3306) is ok for the new master..
Mon Sep 17 15:47:55 2018 - [info] ok.
Mon Sep 17 15:47:55 2018 - [info] 172.31.217.183(172.31.217.183:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Mon Sep 17 15:47:55 2018 - [info] 172.31.217.183(172.31.217.183:3306): Resetting slave pointing to the dummy host.
Mon Sep 17 15:47:55 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Sep 17 15:47:55 2018 - [info]
Mon Sep 17 15:47:55 2018 - [info] * Phase 2: Rejecting updates Phase..
Mon Sep 17 15:47:55 2018 - [info]
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
Mon Sep 17 15:48:32 2018 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Mon Sep 17 15:48:32 2018 - [info] Executing FLUSH TABLES WITH READ LOCK..
Mon Sep 17 15:48:32 2018 - [info] ok.
Mon Sep 17 15:48:32 2018 - [info] Orig master binlog:pos is master-bin.000007:154.
Mon Sep 17 15:48:32 2018 - [info] Waiting to execute all relay logs on 172.31.217.182(172.31.217.182:3306)..
Mon Sep 17 15:48:32 2018 - [info] master_pos_wait(master-bin.000007:154) completed on 172.31.217.182(172.31.217.182:3306). Executed 0 events.
Mon Sep 17 15:48:32 2018 - [info]   done.
Mon Sep 17 15:48:32 2018 - [info] Getting new master's binlog name and position..
Mon Sep 17 15:48:32 2018 - [info] slave-relay-bin.000002:154
Mon Sep 17 15:48:32 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='172.31.217.182', MASTER_PORT=3306, MASTER_LOG_FILE='slave-relay-bin.000002', MASTER_LOG_POS=154, MASTER_USER='repl_user', MASTER_PASSWORD='xxx';
Mon Sep 17 15:48:32 2018 - [info] Setting read_only=0 on 172.31.217.182(172.31.217.182:3306)..
Mon Sep 17 15:48:32 2018 - [info] ok.
Mon Sep 17 15:48:32 2018 - [info]
Mon Sep 17 15:48:32 2018 - [info] * Switching slaves in parallel..
Mon Sep 17 15:48:32 2018 - [info]
Mon Sep 17 15:48:32 2018 - [info] Unlocking all tables on the orig master:
Mon Sep 17 15:48:32 2018 - [info] Executing UNLOCK TABLES..
Mon Sep 17 15:48:32 2018 - [info] ok.
Mon Sep 17 15:48:32 2018 - [info] Starting orig master as a new slave..
Mon Sep 17 15:48:32 2018 - [info] Resetting slave 172.31.217.183(172.31.217.183:3306) and starting replication from the new master 172.31.217.182(172.31.217.182:3306)..
Mon Sep 17 15:48:32 2018 - [info] Executed CHANGE MASTER.
Mon Sep 17 15:48:32 2018 - [info] Slave started.
Mon Sep 17 15:48:32 2018 - [info] All new slave servers switched successfully.
Mon Sep 17 15:48:32 2018 - [info]
Mon Sep 17 15:48:32 2018 - [info] * Phase 5: New master cleanup phase..
Mon Sep 17 15:48:32 2018 - [info]
Mon Sep 17 15:48:32 2018 - [info] 172.31.217.182: Resetting slave info succeeded.
Mon Sep 17 15:48:32 2018 - [info] Switching master to 172.31.217.182(172.31.217.182:3306) completed successfully.

Note that at one point during the switch you are asked:
master_ip_online_change_script is not defined. If you do not disable writes on the current master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): yes
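In other words: writes on the current master have not been disabled, so applications connected to it will keep writing to it after the switch. Is that OK?
I was only testing whether the online switchover works, so I answered yes.
After the switch completes, the MHA manager is stopped.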
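
Once the manager has been started again, the status line shown earlier in this article can be parsed to confirm which server it now treats as master. A minimal sketch; the function name status_master is made up, and the pattern is based only on the one status line quoted above.

```shell
# Extract the master address from a masterha_check_status line, e.g.
# "mha (pid:24910) is running(0:PING_OK), master:172.31.217.183".
# Prints nothing if the line does not report a running manager.
status_master() {
    sed -n 's/.*is running(0:PING_OK), master:\([^ ]*\).*/\1/p'
}
```

Usage: masterha_check_status --conf=/u01/mha/mha.cnf | status_master, which should print 172.31.217.182 after the online switchover above.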