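MMM (Multi-Master Replication Manager for MySQL) is a scalable suite of Perl scripts for monitoring, failing over, and managing MySQL master-master replication setups, in which only one node is writable at any given time. MMM can also load-balance reads across the slave servers, so it can be used to bring up virtual IPs on a group of replicating servers. It additionally ships scripts for data backup and for resynchronizing nodes. MySQL itself does not provide a replication failover solution; MMM adds server failover and thereby makes MySQL highly available. Besides providing floating IPs, MMM automatically repoints the back-end slaves to the new master when the current master goes down, so the replication configuration does not have to be changed by hand. It is currently a fairly mature solution. For details, see the official site: http://mysql-mmm.org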
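Advantages: high availability, good scalability, and automatic switchover on failure. With master-master replication, only one database accepts writes at a time, which keeps the data consistent. When the active master goes down, the other master takes over immediately and the slaves switch over automatically, with no manual intervention.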
Role              IP              Hostname  Server-id  Write VIP     Read VIP
monitor           192.168.10.100  monitor1  none
Master1           192.168.10.101  master1   1          192.168.10.2
Master2 (backup)  192.168.10.102  master2   2                        192.168.10.3
Slave1            192.168.10.103  slave1    3                        192.168.10.4
Slave2            192.168.10.104  slave2    4                        192.168.10.5
(2) Preparation
1. Disable SELinux on all hosts
2. Configure NTP so that the clocks stay synchronized
3. Edit /etc/hosts on all hosts and add the following entries:
# vim /etc/hosts
192.168.10.100 monitor1
192.168.10.101 master1
192.168.10.102 master2
192.168.10.103 slave1
192.168.10.104 slave2
4. Install the perl, perl-devel, perl-CPAN, libart_lgpl.x86_64, rrdtool.x86_64, and rrdtool-perl.x86_64 packages on all hosts
# yum -y install perl-* libart_lgpl.x86_64 rrdtool.x86_64 rrdtool-perl.x86_64
Note: the packages are installed from the CentOS 7 online yum repositories.
5. Install the required Perl modules on all hosts
# cpan -i Algorithm::Diff Class::Singleton DBI DBD::mysql Log::Dispatch Log::Log4perl Mail::Send Net::Ping Proc::Daemon Time::HiRes Params::Validate Net::ARP
(3) Configure the basic MySQL environment
1. Edit /etc/my.cnf on each MySQL host and add the settings below. Note that server-id must be unique on every host.
master1 host:
log-bin = mysql-bin
binlog_format = mixed
server-id = 1
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
log-slave-updates = 1
auto-increment-increment = 2
auto-increment-offset = 1
master2 host:
log-bin = mysql-bin
binlog_format = mixed
server-id = 2
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
log-slave-updates = 1
auto-increment-increment = 2
auto-increment-offset = 2
slave1 host:
server-id = 3
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
read_only = 1
slave2 host:
server-id = 4
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
read_only = 1
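The auto-increment-increment and auto-increment-offset settings on the two masters are what keep them from ever generating the same AUTO_INCREMENT value: each master starts at its own offset and steps by the shared increment. The shell sketch below only illustrates that arithmetic; auto_inc_ids is a hypothetical helper written for this article, not part of MySQL or MMM.

```shell
#!/bin/sh
# auto_inc_ids OFFSET INCREMENT COUNT
# Prints the first COUNT values produced by the rule
#   offset + n * increment   (n = 0, 1, 2, ...)
# which is how the two settings interact when offset <= increment.
auto_inc_ids() {
    offset=$1; increment=$2; count=$3
    n=0; out=""
    while [ "$n" -lt "$count" ]; do
        out="$out$((offset + n * increment)) "
        n=$((n + 1))
    done
    # Strip the trailing space before printing.
    printf '%s\n' "${out% }"
}

echo "master1: $(auto_inc_ids 1 2 5)"   # offset 1, increment 2
echo "master2: $(auto_inc_ids 2 2 5)"   # offset 2, increment 2
```

With the values from the configuration above, master1 only hands out odd IDs and master2 only even ones, so rows inserted on either master cannot collide on an auto-increment key.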
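2. After modifying my.cnf, restart the MySQL service
# systemctl restart mysqld
Note: no two MySQL hosts may share the same server UUID; if they do, change the value in /usr/local/mysql/data/auto.cnf.
3. Add firewall rules on the four database hosts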
# firewall-cmd --permanent --add-port=3306/tcp
# firewall-cmd --reload
(4) Configure replication (master1 and master2 as master-master; slave1 and slave2 as slaves of master1)
1. Grant replication privileges on master1 and master2:
mysql> grant replication slave on *.* to rep@'192.168.10.%' identified by '123456';
2. Configure master2, slave1, and slave2 as slaves of master1:
On master1, run show master status; to get the binlog file name and position
mysql> show master status;
On master2, slave1, and slave2, run:
mysql> change master to master_host='192.168.10.101', master_port=3306, master_user='rep', master_password='123456', master_log_file='mysql-bin.000001', master_log_pos=452;
mysql> start slave;
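The file name and position in the change master statement have to be copied from the show master status output by hand, which is easy to get wrong. As a rough sketch, the statement could be assembled from that output instead. The helper name change_master_sql is hypothetical, the account and password are the ones used in this article, and the sample line stands in for what `mysql -N -B -e 'SHOW MASTER STATUS'` prints (tab-separated: file, position, then the filter columns).

```shell
#!/bin/sh
# change_master_sql MASTER_HOST LOG_FILE LOG_POS
# Prints a CHANGE MASTER TO statement using the replication account
# from this article (rep / 123456).
change_master_sql() {
    printf "CHANGE MASTER TO master_host='%s', master_port=3306, master_user='rep', master_password='123456', master_log_file='%s', master_log_pos=%s;\n" "$1" "$2" "$3"
}

# Sample tab-separated line; on a real master this would come from:
#   mysql -N -B -e 'SHOW MASTER STATUS'
status_line=$(printf 'mysql-bin.000001\t452\t\t\t')
log_file=$(printf '%s\n' "$status_line" | cut -f1)   # first column: binlog file
log_pos=$(printf '%s\n' "$status_line" | cut -f2)    # second column: position

change_master_sql 192.168.10.101 "$log_file" "$log_pos"
```

The printed statement can then be reviewed and pasted into the mysql client on master2, slave1, and slave2.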
3. Verify replication
master2 host:
mysql> show slave status\G
***** 1. row *****
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.101
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 452
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
slave1 host:
mysql> show slave status\G
***** 1. row *****
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.101
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 452
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
slave2 host:
mysql> show slave status\G
***** 1. row *****
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.101
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 452
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Note: if both Slave_IO_Running and Slave_SQL_Running are Yes, replication is configured correctly.
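Checking the two thread flags by eye on every host gets tedious. A minimal sketch of the same check as a script follows; replication_ok is a hypothetical helper that reads the output of show slave status\G on standard input, and the sample text below stands in for real output.

```shell
#!/bin/sh
# replication_ok: reads `SHOW SLAVE STATUS\G` output on stdin and succeeds
# only if both Slave_IO_Running and Slave_SQL_Running report "Yes".
replication_ok() {
    awk -F': *' '
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Sample input; on a real slave one would pipe in the mysql client output.
sample='Slave_IO_Running: Yes
Slave_SQL_Running: Yes'

if printf '%s\n' "$sample" | replication_ok; then
    echo "replication OK"
else
    echo "replication BROKEN"
fi
```

The `$` anchors in the patterns keep similarly named fields such as Slave_SQL_Running_State from being mistaken for the thread flags.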
4. Configure master1 as a slave of master2:
On master2, run show master status; to get the binlog file name and position
mysql> show master status;
On master1, run:
mysql> change master to master_host='192.168.10.102', master_port=3306, master_user='rep', master_password='123456',master_log_file='mysql-bin.000001',master_log_pos=452;
mysql> start slave;
Verify replication on master1:
mysql> show slave status\G
***** 1. row *****
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.102
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 452
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
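Note: if both Slave_IO_Running and Slave_SQL_Running are Yes, replication is configured correctly.
(5) mysql-mmm configuration
1. Create the required users on master1, the primary MySQL server:
Note: because replication is already running, the users only need to be created on master1; the other three servers replicate the statements executed there.
Create the agent account:
mysql> grant super, replication client, process on *.* to 'mmm_agent'@'192.168.10.%' identified by '123456';
Create the monitor account:
mysql> grant replication client on *.* to 'mmm_monitor'@'192.168.10.%' identified by '123456';
2. To make sure the accounts were created everywhere, check that the monitor and agent accounts exist on master2, slave1, and slave2: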
mysql> select user,host from mysql.user where user in ('mmm_monitor','mmm_agent');
What these users are for:
mmm_monitor: used by the MMM monitor to health-check the MySQL server processes
mmm_agent: used by the MMM agent to toggle read-only mode, change the replication master, and so on
(6) Install mysql-mmm
1. Install the monitor program on the monitor host (192.168.10.100)
# wget http://pkgs.fedoraproject.org/repo/pkgs/mysql-mmm/mysql-mmm-2.2.1.tar.gz/f5f8b48bdf89251d3183328f0249461e/mysql-mmm-2.2.1.tar.gz
# chmod +x mysql-mmm-2.2.1.tar.gz
# tar -zxf mysql-mmm-2.2.1.tar.gz
# cd mysql-mmm-2.2.1
# make install
2. Install the agent on the database servers (master1, master2, slave1, slave2)
# wget http://pkgs.fedoraproject.org/repo/pkgs/mysql-mmm/mysql-mmm-2.2.1.tar.gz/f5f8b48bdf89251d3183328f0249461e/mysql-mmm-2.2.1.tar.gz
# chmod +x mysql-mmm-2.2.1.tar.gz
# tar -zxf mysql-mmm-2.2.1.tar.gz
# cd mysql-mmm-2.2.1
# make install
(7) Configure MMM
1. Write the common configuration file; it must be identical on all five hosts:
# vim /etc/mysql-mmm/mmm_common.conf
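active_master_role writer # The active master role. All db servers should have read_only enabled; the agent automatically turns read_only off on the writer.
<host default>
cluster_interface eno16777736 # Network interface used by the cluster
pid_path /var/run/mmm_agentd.pid # Path of the pid file
bin_path /usr/lib/mysql-mmm/ # Path of the executables
replication_user rep # Replication user
replication_password 123456 # Replication user's password
agent_user mmm_agent # Agent user
agent_password 123456 # Agent user's password
</host>
<host master1> # Host name of master1
ip 192.168.10.101 # IP of master1
mode master # Role of this host; master means it is a master
peer master2 # Host name of master1's peer, i.e. master2
</host>
<host master2> # Same idea as master1
ip 192.168.10.102
mode master
peer master1
</host>
<host slave1> # Host name of the slave; repeat this block for each additional slave
ip 192.168.10.103 # IP of the slave
mode slave # slave means this host is a slave
</host>
<host slave2> # Same idea as slave1
ip 192.168.10.104
mode slave
</host>
<role writer> # Writer role
hosts master1,master2 # Hosts that may take writes. If you do not want the writer role to switch, list only master1 here; that also avoids switchovers caused by network latency, but if master1 fails, MMM has no writer left and only serves reads.
ips 192.168.10.2 # Virtual IP exposed for writes
mode exclusive # exclusive means only one writer may exist, i.e. only one write IP is served
</role>
<role reader> # Reader role
hosts master2,slave1,slave2 # Hosts that serve reads; master1 could be added here as well
ips 192.168.10.3, 192.168.10.4, 192.168.10.5 # Virtual IPs exposed for reads. They are not mapped one-to-one to the hosts, and the number of ips may differ from the number of hosts; in that case one host is assigned two IPs.
mode balanced # balanced means the IPs are load-balanced across the hosts
</role>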
Copy this file to the other servers without changing it
# for host in master1 master2 slave1 slave2 ; do scp /etc/mysql-mmm/mmm_common.conf $host:/etc/mysql-mmm/ ; done
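2. Agent configuration
Edit /etc/mysql-mmm/mmm_agent.conf on the four MySQL nodes
# vim /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf
this master1 # Set this to the node's own host name
Note: this file is only configured on the db servers; the monitor server does not need it.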
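Because mmm_agent.conf differs between nodes only in the host name on the `this` line, it can be generated rather than edited by hand on each machine. The sketch below assumes a hypothetical helper, render_agent_conf, and assumes that each node's short host name matches its `<host ...>` name in mmm_common.conf, as it does in this setup.

```shell
#!/bin/sh
# render_agent_conf HOSTNAME
# Prints the two-line mmm_agent.conf for the given db host.
render_agent_conf() {
    printf 'include mmm_common.conf\nthis %s\n' "$1"
}

# Preview the file for master2.
render_agent_conf master2

# On a real node one might write the file directly (assumption: the short
# host name equals the <host> name used in mmm_common.conf):
#   render_agent_conf "$(hostname -s)" > /etc/mysql-mmm/mmm_agent.conf
```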
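3. Start the agent process (do this on all four MySQL hosts)
Edit the mysql-mmm-agent init script and add the following line right below #!/bin/sh
# vim /etc/init.d/mysql-mmm-agent
source /root/.bash_profile # Added so that the mysql-mmm-agent service can start at boot
Register it as a system service and enable it at boot
# chkconfig --add mysql-mmm-agent
# chkconfig mysql-mmm-agent on
# /etc/init.d/mysql-mmm-agent start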
Daemon bin: '/usr/sbin/mmm_agentd'
Daemon pid: '/var/run/mmm_agentd.pid'
Starting MMM Agent daemon... Ok
# netstat -antp | grep mmm_agentd
tcp 0 0 192.168.10.101:9989 0.0.0.0:* LISTEN 9693/mmm_agentd
4. Configure the firewall on the four MySQL hosts
# firewall-cmd --permanent --add-port=9989/tcp
# firewall-cmd --reload
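5. Edit mmm_mon.conf on the monitor host
# vim /etc/mysql-mmm/mmm_mon.conf
include mmm_common.conf
<monitor>
ip 127.0.0.1 # For security, listen on localhost only; mmm_mond listens on port 9988 by default
pid_path /var/run/mmm_mond.pid
bin_path /usr/lib/mysql-mmm/
status_path /var/lib/misc/mmm_mond.status
ping_ips 192.168.10.101, 192.168.10.102, 192.168.10.103, 192.168.10.104 # IPs used to test network availability; the network is considered up if any one of them answers ping. Do not list the monitor's own address here
auto_set_online 0 # Seconds a host may stay in AWAITING_RECOVERY before it is set ONLINE automatically; 0 disables this, so a recovered host has to be brought online by hand with mmm_control set_online
</monitor>
<check default>
check_period 5 # Check interval; default: 5s
trap_period 10 # A node is considered failed once a check has kept failing for trap_period seconds; default: 10s
timeout 2 # Check timeout; default: 2s
restart_after 10000 # Restart the checker process after this many checks; default: 10000
max_backlog 86400 # Maximum backlog tolerated by the rep_backlog check; default: 60
</check>
<host default>
monitor_user mmm_monitor # User that monitors the db servers
monitor_password 123456 # Password of the monitor user
</host>
debug 0 # 0 is normal mode, 1 is debug mode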
6. Start the monitor process:
Edit the mysql-mmm-monitor init script and add the following line right below #!/bin/sh
# vim /etc/init.d/mysql-mmm-monitor
source /root/.bash_profile
7. Register it as a system service and enable it at boot
# chkconfig --add mysql-mmm-monitor
# chkconfig mysql-mmm-monitor on
# /etc/init.d/mysql-mmm-monitor start
Daemon bin: '/usr/sbin/mmm_mond'
Daemon pid: '/var/run/mmm_mond.pid'
Starting MMM Monitor daemon: Ok
[root@monitor1 ~]# netstat -anpt | grep 9988
tcp 0 0 127.0.0.1:9988 0.0.0.0:* LISTEN 8546/mmm_mond
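Note: whenever a configuration file is changed, on either the db side or the monitor side, the agent and monitor processes must be restarted. The MMM start order is monitor first, then agents.
Check the cluster status:
[root@monitor1 ~]# mmm_control show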
master1(192.168.10.101) master/ONLINE. Roles: writer(192.168.10.2)
master2(192.168.10.102) master/ONLINE. Roles: reader(192.168.10.4)
slave1(192.168.10.103) slave/ONLINE. Roles: reader(192.168.10.3)
slave2(192.168.10.104) slave/ONLINE. Roles: reader(192.168.10.5)
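Note: if a server is not in the ONLINE state, it can be brought online with the following commands, for example:
[root@monitor1 ~]# mmm_control set_online master1
[root@monitor1 ~]# mmm_control set_online master2
[root@monitor1 ~]# mmm_control set_online slave1
[root@monitor1 ~]# mmm_control set_online slave2
As the output above shows, the write VIP is on master1, and all slave nodes treat master1 as their master.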
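Scripts that need to know which host currently holds the writer role can read it out of the mmm_control show output. A small sketch follows; current_writer is a hypothetical helper, and the here-document stands in for the real command output (on the monitor one would pipe `mmm_control show` into it).

```shell
#!/bin/sh
# current_writer: reads `mmm_control show` output on stdin and prints the
# name of the host whose role list contains the writer role.
current_writer() {
    # $1 looks like "master1(192.168.10.101)"; drop everything from "(" on.
    awk '/writer\(/ { sub(/\(.*/, "", $1); print $1 }'
}

current_writer <<'EOF'
  master1(192.168.10.101) master/ONLINE. Roles: writer(192.168.10.2)
  master2(192.168.10.102) master/ONLINE. Roles: reader(192.168.10.4)
  slave1(192.168.10.103) slave/ONLINE. Roles: reader(192.168.10.3)
  slave2(192.168.10.104) slave/ONLINE. Roles: reader(192.168.10.5)
EOF
```

Since the writer role is exclusive, at most one line is expected to match; an empty result means no host currently holds the role.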
8. Check that the VIPs are up
master1:
[root@master1 ~]# ip addr show dev eno16777736
3: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:82:86:c8 brd ff:ff:ff:ff:ff:ff
inet 192.168.10.101/24 brd 192.168.10.255 scope global eno16777736
valid_lft forever preferred_lft forever
inet 192.168.10.2/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe82:86c8/64 scope link
valid_lft forever preferred_lft forever
master2:
[root@master2 ~]# ip addr show dev eno16777736
3: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:6e:b1:23 brd ff:ff:ff:ff:ff:ff
inet 192.168.10.102/24 brd 192.168.10.255 scope global eno16777736
valid_lft forever preferred_lft forever
inet 192.168.10.4/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe6e:b123/64 scope link
valid_lft forever preferred_lft forever
slave1:
[root@slave1 ~]# ip addr show dev eno16777736
3: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:d4:84:54 brd ff:ff:ff:ff:ff:ff
inet 192.168.10.103/24 brd 192.168.10.255 scope global eno16777736
valid_lft forever preferred_lft forever
inet 192.168.10.3/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fed4:8454/64 scope link
valid_lft forever preferred_lft forever
slave2:
[root@slave2 ~]# ip addr show dev eno16777736
3: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:29:80:59 brd ff:ff:ff:ff:ff:ff
inet 192.168.10.104/24 brd 192.168.10.255 scope global eno16777736
valid_lft forever preferred_lft forever
inet 192.168.10.5/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe29:8059/64 scope link
valid_lft forever preferred_lft forever
On master2, slave1, and slave2, check which master MySQL is replicating from
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.101
Master_User: rep
Master_Port: 3306
Connect_Retry: 60
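(8) MMM high-availability test
Clients read and write through the VIPs. When a failure occurs, the VIP floats to another node, which then serves the requests.
First check the state of the whole cluster; everything is healthy: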
[root@monitor1 ~]# mmm_control show
master1(192.168.10.101) master/ONLINE. Roles: writer(192.168.10.2)
master2(192.168.10.102) master/ONLINE. Roles: reader(192.168.10.4)
slave1(192.168.10.103) slave/ONLINE. Roles: reader(192.168.10.3)
slave2(192.168.10.104) slave/ONLINE. Roles: reader(192.168.10.5)
Simulate a master1 outage by stopping its MySQL service manually, then watch the monitor log. The entries for master1 are:
[root@monitor1 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log
2017/03/31 15:43:00 FATAL State of host 'master1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2017/03/31 15:43:00 INFO Removing all roles from host 'master1':
2017/03/31 15:43:00 INFO Removed role 'writer(192.168.10.2)' from host 'master1'
2017/03/31 15:43:00 INFO Orphaned role 'writer(192.168.10.2)' has been assigned to 'master2'
The log shows that after master1 went down, the writer role was automatically moved to master2.
Check the latest cluster status:
[root@monitor1 ~]# mmm_control show
master1(192.168.10.101) master/HARD_OFFLINE. Roles:
master2(192.168.10.102) master/ONLINE. Roles: reader(192.168.10.4), writer(192.168.10.2)
slave1(192.168.10.103) slave/ONLINE. Roles: reader(192.168.10.3)
slave2(192.168.10.104) slave/ONLINE. Roles: reader(192.168.10.5)
The output shows that master1 changed from ONLINE to HARD_OFFLINE and the write VIP moved to master2.
Check the state of all db servers in the cluster:
[root@monitor1 ~]# mmm_control checks all
master1 ping [last change: 2017/03/31 15:28:30] OK
master1 mysql [last change: 2017/03/31 15:43:00] ERROR: Connect error (host = 192.168.10.101:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.10.101' (111)
master1 rep_threads [last change: 2017/03/31 15:28:30] OK
master1 rep_backlog [last change: 2017/03/31 15:28:30] OK: Backlog is null
slave1 ping [last change: 2017/03/31 15:28:30] OK
slave1 mysql [last change: 2017/03/31 15:28:30] OK
slave1 rep_threads [last change: 2017/03/31 15:28:30] OK
slave1 rep_backlog [last change: 2017/03/31 15:28:30] OK: Backlog is null
master2 ping [last change: 2017/03/31 15:28:30] OK
master2 mysql [last change: 2017/03/31 15:28:30] OK
master2 rep_threads [last change: 2017/03/31 15:28:30] OK
master2 rep_backlog [last change: 2017/03/31 15:28:30] OK: Backlog is null
slave2 ping [last change: 2017/03/31 15:28:30] OK
slave2 mysql [last change: 2017/03/31 15:28:30] OK
slave2 rep_threads [last change: 2017/03/31 15:28:30] OK
slave2 rep_backlog [last change: 2017/03/31 15:28:30] OK: Backlog is null
The output shows that master1 still answers ping, so only the MySQL service is down.
Check the IP addresses on master2:
[root@master2 ~]# ip addr show dev eno16777736
3: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:6e:b1:23 brd ff:ff:ff:ff:ff:ff
inet 192.168.10.102/24 brd 192.168.10.255 scope global eno16777736
valid_lft forever preferred_lft forever
inet 192.168.10.4/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet 192.168.10.2/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe6e:b123/64 scope link
valid_lft forever preferred_lft forever
slave1 host:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.102
Master_User: rep
Master_Port: 3306
slave2 host:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.10.102
Master_User: rep
Master_Port: 3306
Start the MySQL service on master1 and watch the monitor log. The entries for master1 are:
[root@monitor1 ~]# tail -f /var/log/mysql-mmm/mmm_mond.log
2017/03/31 15:54:46 INFO Check 'rep_backlog' on 'master1' is ok!
2017/03/31 15:54:46 INFO Check 'rep_threads' on 'master1' is ok!
2017/03/31 15:54:50 INFO Check 'mysql' on 'master1' is ok!
2017/03/31 15:54:53 FATAL State of host 'master1' changed from HARD_OFFLINE to AWAITING_RECOVERY
The log shows that master1 changed from HARD_OFFLINE to AWAITING_RECOVERY.
Bring the server online with the following command:
[root@monitor1 ~]# mmm_control set_online master1
Check the latest cluster status:
[root@monitor1 ~]# mmm_control show
master1(192.168.10.101) master/ONLINE. Roles:
master2(192.168.10.102) master/ONLINE. Roles: reader(192.168.10.4), writer(192.168.10.2)
slave1(192.168.10.103) slave/ONLINE. Roles: reader(192.168.10.3)
slave2(192.168.10.104) slave/ONLINE. Roles: reader(192.168.10.5)
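As you can see, the recovered master does not take the writer role back; it stays that way until the current master goes down again.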
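III. Summary
(1) If master2, the standby master, goes down, the cluster state is not affected; only master2's reader role is removed.
(2) If master1, the active master, goes down, master2 takes over the writer role, and slave1 and slave2 automatically change master to master2 and replicate from it.
(3) If master1 goes down while master2's replication is lagging behind master1, master2 still becomes the writable master, and data consistency cannot be guaranteed at that point.
If master2, slave1, and slave2 are all lagging behind master1 when master1 goes down, slave1 and slave2 wait until they have caught up with master1's data before repointing to the new master, master2; consistency cannot be guaranteed in this case either.
(4) When using the MMM high-availability architecture, give the active and standby masters identical hardware, and either enable semi-synchronous replication for extra safety or use MariaDB/MySQL 5.7 multi-threaded slave replication to improve replication performance.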
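Point (3) above comes down to how far a node is behind its master at the moment of failure. One rough way to keep an eye on that is the Seconds_Behind_Master field of show slave status\G. The sketch below is not part of MMM; lag_within is a hypothetical helper, and the 5-second threshold is an arbitrary value chosen for illustration.

```shell
#!/bin/sh
# lag_within MAX_SECONDS: reads `SHOW SLAVE STATUS\G` output on stdin and
# succeeds only if Seconds_Behind_Master is a number no greater than
# MAX_SECONDS. A NULL value (replication not running) counts as a failure.
lag_within() {
    awk -F': *' -v max="$1" '
        $1 ~ /Seconds_Behind_Master$/ { lag = $2; seen = 1 }
        END { exit !(seen && (lag ~ /^[0-9]+$/) && (lag + 0 <= max + 0)) }
    '
}

# Sample input; on a real slave one would pipe in the mysql client output.
if printf 'Seconds_Behind_Master: 0\n' | lag_within 5; then
    echo "lag acceptable"
else
    echo "lag too high or unknown"
fi
```

Run periodically against master2, a check like this gives early warning that a failover at that moment would promote a master with missing data.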
For more details, see my blog: http://wuyunkeji.blog.51cto.com