postgresql 主备及切换-恢复方案
前言
前期的gitlab 已经开始推广测试,最近对postgresql 做了主备,这里说下方案及在实施过程中遇到的坑。
postgresql 的具安装不在此介绍。
基础信息
primary_ip: 192.168.10.2,
standby_ip: 192.168.10.3,
PGDATA: /opt/gitlab/postgresql/data,
postgresql_version:(PostgreSQL) 9.6.8,
PGCONF_DIR: $PGDATA,
涉及修改的配置文件有:
- postgresql.conf --------- postgresql 主配置文件
- pg_hba.conf ------------- postgresql 访问规则文件
- recovery.conf ----------- postgresql 备库访问主库配置文件
注意事项!
1. 主备postgresql 版本需保持一致!
2. postgresql.conf 配置文件需保持一致!
3. 备库提权为主库后,切记不要直接启动原主库!
准备操作
在primary 192.168.10.2 主机操作
1.为备库准备主库,修改配置文件
cat postgresql.conf
wal_level = hot_standby # minimal, replica, or logical
max_wal_senders = 2 # max number of walsender processes
hot_standby = on # "on" allows queries during recovery
max_connections = 300 # (change requires restart)
archive_mode = on
restore_command = ''
cat pg_hba.conf
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
host replication gitlab_replicator 192.168.10.3/32 trust
cat recovery.done
restore_command = ''
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.10.3 port=5432 user=gitlab_replicator'
2.创建用于复制的帐号,并赋予replication 权限
postgres=#CREATE USER gitlab_replicator REPLICATION LOGIN;
3.基本备份为备库准备引导数据
postgres=#SELECT pg_start_backup(back_20180929);
cd /opt/gitlab/postgresql && tar zcf base_data.tar.gz data
postgres=#SELECT pg_start_stop();
在 standby 192.168.10.3 主机操作
1.解压基本数据
将主库上创建的base_data.tar.gz
上传到备库主机,并解压到数据目录
tar zxf base_data.tar.gz -C /opt/gitlab/postgresql/
2.修改配置文件
注: postgresql.conf 文件内此部分一定要与主库的配置保持一致,否则可能会在主从切换恢复时产生错误
cat postgresql.conf
wal_level = hot_standby # minimal, replica, or logical
max_wal_senders = 2 # max number of walsender processes
hot_standby = on # "on" allows queries during recovery
max_connections = 300 # (change requires restart)
archive_mode = on
restore_command = ''
cat pg_hba.conf
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
host replication gitlab_replicator 192.168.10.2/32 trust
cat recovery.conf
restore_command = ''
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.10.2 port=5432 user=gitlab_replicator'
3.启动备库,在主库执行sql,并在备库验证
主从切换
主备库的判断是根据当前是否存在recovery.conf文件
在将备库提升为主库时,会自动重命名recovery.conf文件为recovery.done。同时要将主库降为备库,降备方式为重命名recovery.done文件mv recover.done recovery.conf
这样在处理完主库故障后,才会将提升到主库的更新数据同步过来
这里提供个简单的思路及脚本,前提是假设主备之间不存在网络故障,且不存在同时为主或备的情况
判断主库的状态
1.为shut down
时
判断备库是否为in archive recovery
并执行将主库降为备库,将备库升为主库,其余状态发送报警
2.为in production
时
判断备库是否为in archive recovery
,其余状态发送报警
3.为in archive recovery
时
判断备库是否为in production
,其余状态发送报警
4.为shut down in recovery
时
发送报警
shell script
#!/bin/bash
PRIMARY_IP="192.168.10.2"
STANDBY_IP="192.168.10.3"
PGDATA="/DATA/postgresql/data"
SYS_USER="root"
PG_USER="postgresql"
PGPREFIX="/opt/pgsql"
pg_status()
{
ssh ${SYS_USER}@$1 /
"su - ${PG_USER} -c '${PGPREFIX}/bin/pg_controldata -D ${PGDATA} /
| grep cluster' | awk -F : '{print \$2}' | sed 's/^[ \t]*\|[ \t]*$//'"
}
# recover to primary
recovery_primary()
{
ssh ${SYS_USER}@$1 /
"su - ${PG_USER} -c '${PGPREFIX}/bin/pg_ctl promote -D ${PGDATA}'"
}
# primary to recovery
primary_recovery()
{
ssh ${SYS_USER}@$1 /
"su - ${PG_USER} -c 'cd ${PGDATA} && mv recovery.done recovery.conf'"
}
send_mail()
{
echo "send SNS"
}
case "`pg_status ${PRIMARY_IP}`" in
"shut down")
case "`pg_status ${STANDBY_IP}`" in
"in archive recovery")
primary_recovery ${PRIMARY_IP}
recovery_primary ${STANDBY_IP}
;;
"shut down in recovery"|"in production")
send_mail
;;
esac
;;
"in production")
case "`pg_status ${STANDBY_IP}`" in
"shut down in recovery"|"shut down"|"in production")
send_mail
;;
esac
echo "primary"
;;
"in archive recovery")
case "`pg_status ${STANDBY_IP}`" in
"shut down")
primary_recovery ${STANDBY_IP}
recovery_primary ${PRIMARY_IP}
;;
"shut down in recovery"|"in archive recovery")
send_mail
;;
esac
echo "recovery"
;;
"shut down in recovery")
case "`pg_status ${STANDBY_IP}`" in
"shut down in recovery"|"shut down"|"in archive recovery")
send_mail
;;
esac
echo "recovery down"
;;
esac
报错处理
error 1
FATAL: no pg_hba.conf entry for replication connection from host "192.168.1.2", user "standby", SSL off
需要将用户加入到192.168.1.2
的pg_hba.conf
文件内,并配置好认证方式及口令
error 2
FATAL: database system identifier differs between the primary and standby
DETAIL: The primary's identifier is 6589099331306617531, the standby's identifier is 6605061381709180314
这是因为在将备库提升为主库后,将原先的主库恢复为主库时没有完全将缺少的数据同步过来导致的
error 3
FATAL: number of requested standby connections exceeds max_wal_senders (currently 0)
FATAL: hot standby is not possible because max_connections = 100 is a lower setting than on the master server (its value was 200)
FATAL: hot standby is not possible because max_locks_per_transaction = 64 is a lower setting than on the master server (its value was 128)
这是因为备库的数量超过主库配置的允许备库最大连接数量了
这里配置的为0
此问提出现在将备库升为主库后,将原主库降为备库同步数据时,因此需要注意这部分的配置主备要一致
后记
postgresql 主主同步需要使用三方中间件实现,有需要的可查询相关资料
本文参考资料为postgresql 官方文档
免责声明:
① 本站未注明“稿件来源”的信息均来自网络整理。其文字、图片和音视频稿件的所属权归原作者所有。本站收集整理出于非商业性的教育和科研之目的,并不意味着本站赞同其观点或证实其内容的真实性。仅作为临时的测试数据,供内部测试之用。本站并未授权任何人以任何方式主动获取本站任何信息。
② 本站未注明“稿件来源”的临时测试数据将在测试完成后最终做删除处理。有问题或投稿请发送至: 邮箱/279061341@qq.com QQ/279061341