
Building a 1-Master-2-Slave PostgreSQL Streaming Replication HA Cluster with PGPool


PostgreSQL's streaming replication provides a good foundation for HA, but deploying an HA cluster still requires a dedicated HA component, such as the general-purpose Pacemaker+Corosync. pgpool, a middleware for PostgreSQL, also offers HA functionality.

pgpool monitors the health of the backend PostgreSQL servers and performs failover. Since all application traffic passes through pgpool, a failed node can be isolated easily. At the same time, pgpool itself must have standbys so that it does not become a single point of failure. pgpool ships with a watchdog component that uses a quorum mechanism to prevent split-brain, so it is recommended to run at least 3 pgpool nodes, and an odd number of them. When quorum is lost, the watchdog automatically removes the VIP and blocks client connections.

Below, pgpool is used to build a 3-node HA cluster on top of PostgreSQL streaming replication. The goal is an HA cluster with strong data consistency, along the following lines:

  • 1-master-2-slave synchronous replication based on PostgreSQL (a quick check of this topology is sketched right after this list).
  • The slaves' replication connection strings use the fixed hostname pgsql_primary for the master; /etc/hosts maps the master's IP to pgsql_primary, so a slave's replication source (master) is switched simply by editing /etc/hosts. This approach is used because editing recovery.conf directly would require restarting the postgres process, which pgpool would detect and then mark the backend as down.
  • pgpool runs on all 3 nodes. The pgpool master and the PostgreSQL primary should preferably not sit on the same node, so that when the PostgreSQL primary goes down the failed machine can be isolated cleanly.
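Before putting pgpool in front of the databases, it is worth confirming the 1-master-2-slave synchronous topology directly. A minimal check, assuming the environment described below (port 5433, admin account), might be:

# on the primary (node1): both standbys should be listed, with sync_state sync/potential
psql -h node1 -p 5433 -U admin postgres -c "select application_name, client_addr, state, sync_state from pg_stat_replication;"

# on a standby (node2 or node3): returns 't' while the node is in recovery
psql -h node2 -p 5433 -U admin postgres -c "select pg_is_in_recovery();"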

Environment

Software

  • CentOS 7.0
  • PGPool 3.5
  • PostgreSQL 9.5

Nodes

  • node1 192.168.0.211
  • node2 192.168.0.212
  • node3 192.168.0.213
  • vip 192.168.0.220

Configuration

  • PostgreSQL port: 5433
  • Replication account: replication/replication
  • Admin account: admin/admin

Prerequisites

  • Passwordless ssh trust is established among the 3 nodes.
  • Hostname resolution (/etc/hosts) is configured on all 3 nodes,
    with pgsql_primary resolving to the master node's IP

    [postgres@node3 ~]$ cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.0.211 node1
    192.168.0.212 node2
    192.168.0.213 node3
    192.168.0.211 pgsql_primary
  • PostgreSQL is installed on all 3 nodes beforehand, with 1-master-2-slave synchronous streaming replication configured; node1 is the master (a sketch of the primary-side settings follows this block).
    On the two slave nodes, node2 and node3, set the replication source hostname in recovery.conf to pgsql_primary

    [postgres@node3 ~]$ cat /data/postgresql/data/recovery.conf
    standby_mode = 'on'
    primary_conninfo = 'host=pgsql_primary port=5433 application_name=node3 user=replication password=replication keepalives_idle=60 keepalives_interval=5 keepalives_count=5'
    restore_command = ''
    recovery_target_timeline = 'latest'
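The article does not show the primary-side settings, so the following is only a sketch of what the relevant postgresql.conf lines on node1 might look like for synchronous streaming replication to both standbys (the values are assumptions chosen to match PostgreSQL 9.5 and the recovery.conf above, not the author's verbatim configuration):

# postgresql.conf on node1 -- sketch only
port = 5433
wal_level = hot_standby                     # 9.5 name; 'replica' from 9.6 onwards
max_wal_senders = 5
hot_standby = on
synchronous_standby_names = 'node2,node3'   # first connected standby listed is synchronous

# pg_hba.conf must also allow the replication user from the other nodes, e.g.
# host  replication  replication  192.168.0.0/24  md5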

Installing pgpool

Install pgpool-II on node1, node2 and node3

yum install http://www.pgpool.net/yum/rpms/3.5/redhat/rhel-7-x86_64/pgpool-II-release-3.5-1.noarch.rpm
yum install pgpool-II-pg95 pgpool-II-pg95-extensions

Install the pgpool_recovery extension on the master (optional)

[postgres@node1 ~]$ psql template1 -p5433
psql (9.5.2)
Type "help" for help.
template1=# CREATE EXTENSION pgpool_recovery;

The pgpool_recovery extension defines functions for controlling PostgreSQL remotely, which removes the need to rely on ssh; the steps below, however, do not use these functions.

template1=> \dx+ pgpool_recovery
    Objects in extension "pgpool_recovery"
              Object Description
-----------------------------------------------
 function pgpool_pgctl(text,text)
 function pgpool_recovery(text,text,text)
 function pgpool_recovery(text,text,text,text)
 function pgpool_remote_start(text,text)
 function pgpool_switch_xlog(text)
(5 rows)

Configuring pgpool.conf

The following is the configuration on node3; configure node1 and node2 analogously (a sketch of the node-specific lines for node1 follows the listing)

$ cp /etc/pgpool-II/pgpool.conf.sample-stream /etc/pgpool-II/pgpool.conf
$ vi /etc/pgpool-II/pgpool.conf
listen_addresses = '*'
port = 9999
pcp_listen_addresses = '*'
pcp_port = 9898
backend_hostname0 = 'node1'
backend_port0 = 5433
backend_weight0 = 1
backend_data_directory0 = '/data/postgresql/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_hostname1 = 'node2'
backend_port1 = 5433
backend_weight1 = 1
backend_data_directory1 = '/data/postgresql/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_hostname2 = 'node3'
backend_port2 = 5433
backend_weight2 = 1
backend_data_directory2 = '/data/postgresql/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
enable_pool_hba = off
pool_passwd = 'pool_passwd'
pid_file_name = '/var/run/pgpool/pgpool.pid'
logdir = '/var/log/pgpool'
connection_cache = on
replication_mode = off
load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 10
sr_check_user = 'admin'
sr_check_password = 'admin'
sr_check_database = 'postgres'
delay_threshold = 10000000
follow_master_command = ''
health_check_period = 3
health_check_timeout = 20
health_check_user = 'admin'
health_check_password = 'admin'
health_check_database = 'postgres'
health_check_max_retries = 0
health_check_retry_delay = 1
connect_timeout = 10000
failover_command = '/home/postgres/failover.sh %h %H %d %P'
failback_command = ''
fail_over_on_backend_error = on
search_primary_node_timeout = 10
use_watchdog = on
wd_hostname = 'node3'     ## hostname of this node
wd_port = 9000
wd_priority = 1
wd_authkey = ''
wd_ipc_socket_dir = '/tmp'
delegate_IP = '192.168.0.220'
if_cmd_path = '/usr/sbin'
if_up_cmd = 'ip addr add $_IP_$/24 dev eno16777736 label eno16777736:0'
if_down_cmd = 'ip addr del $_IP_$/24 dev eno16777736'
arping_path = '/usr/sbin'
arping_cmd = 'arping -U $_IP_$ -w 1 -I eno16777736'
wd_monitoring_interfaces_list = ''
wd_lifecheck_method = 'heartbeat'
wd_interval = 10
wd_heartbeat_port = 9694
wd_heartbeat_keepalive = 2
wd_heartbeat_deadtime = 30
heartbeat_destination0 = 'node1'    ## hostname of one of the other PostgreSQL nodes
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eno16777736'
heartbeat_destination1 = 'node2'    ## hostname of one of the other PostgreSQL nodes
heartbeat_destination_port1 = 9694
heartbeat_device1 = 'eno16777736'
other_pgpool_hostname0 = 'node1'    ## hostname of one of the other pgpool nodes
other_pgpool_port0 = 9999
other_wd_port0 = 9000
other_pgpool_hostname1 = 'node2'    ## hostname of one of the other pgpool nodes
other_pgpool_port1 = 9999
other_wd_port1 = 9000
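Only a handful of watchdog-related lines differ between nodes. As a sketch inferred from the pattern above (not copied from the original article), the node-specific values on node1 would be:

# node-specific lines in node1's pgpool.conf; everything else matches node3's
wd_hostname = 'node1'
heartbeat_destination0 = 'node2'
heartbeat_destination1 = 'node3'
other_pgpool_hostname0 = 'node2'
other_pgpool_hostname1 = 'node3'

For reference, the placeholders in failover_command ('%h %H %d %P') expand to the failed node's hostname, the new master's hostname, the failed node's backend id and the old primary's node id; failover.sh below reads its arguments in that order.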

Configuring the PCP command interface

pgpool-II has an administrative interface that can be used over the network to fetch database node information, shut down pgpool-II, and so on. Using the PCP commands requires user authentication, so a user and password must be defined in the pcp.conf file.

$ pg_md5 pgpool
ba777e4c2f15c11ea8ac3be7e0440aa0
$ vi /etc/pgpool-II/pcp.conf
root:ba777e4c2f15c11ea8ac3be7e0440aa0

To avoid typing the password every time a pcp command is executed, a password file can be configured.

$ vi ~/.pcppass
localhost:9898:root:pgpool
$ chmod 0600 ~/.pcppass
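With ~/.pcppass in place, passing -w makes the pcp tools read the password from the file instead of prompting. A quick sanity check (host, port and user match the pcp.conf above) could be:

# number of backend nodes known to pgpool
pcp_node_count -h localhost -p 9898 -U root -w

# details of backend node 0
pcp_node_info -h localhost -p 9898 -U root -w 0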

Configuring pool_hba.conf (optional)

pgpool can perform its own host-based authentication, configured in much the same way as PostgreSQL's pg_hba.conf. Every client connection to pgpool is then authenticated, which solves the problem that the backend PostgreSQL servers cannot directly restrict the IP addresses of the frontend hosts.

Enable pgpool's hba authentication

$ vi /etc/pgpool-II/pgpool.conf
enable_pool_hba = on

Edit pool_hba.conf. Note that client authentication requests are ultimately forwarded by pgpool to the backend PostgreSQL servers, so the entries in pool_hba.conf must be consistent with the backend pg_hba.conf: for example, if pgpool authenticates clients with md5, then PostgreSQL must also use md5 authentication for the connections forwarded by pgpool, with the same passwords.

$ vi /etc/pgpool-II/pool_hba.conf 
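As a sketch only (the article does not show its pool_hba.conf, and the client subnet is an assumption), a matching pair of md5 entries could look like this:

# /etc/pgpool-II/pool_hba.conf -- clients authenticating to pgpool
host    all    all    192.168.0.0/24    md5

# backend pg_hba.conf -- the connection forwarded by pgpool must pass the same check
host    all    all    192.168.0.0/24    md5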

If pgpool uses md5 authentication, a password file must be set up on the pgpool side.

The password file name is set by the pool_passwd parameter in pgpool.conf and defaults to /etc/pgpool-II/pool_passwd.

Entries are added to pool_passwd as follows.

$ pg_md5 -m -u admin admin 
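Presumably (this check is not part of the original article) the command above appends an md5 entry for the admin user to pool_passwd, which can be verified with:

$ grep admin /etc/pgpool-II/pool_passwd
admin:md5<32-hex-digit-hash>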

Starting pgpool

Start pgpool on each of the 3 nodes.

[root@node3 ~]# service pgpool start
Redirecting to /bin/systemctl start  pgpool.service

Check pgpool's log output to confirm that it started successfully.

[root@node3 ~]# tail /var/log/messages
Nov  8 12:53:47 node3 pgpool: 2016-11-08 12:53:47: pid 31078: LOG:  pgpool-II successfully started. version 3.5.4 (ekieboshi)

Checking the cluster state with pcp_watchdog_info

[root@node3 ~]# pcp_watchdog_info  -w -v
Watchdog Cluster Information
Total Nodes          : 3
Remote Nodes         : 2
Quorum state         : QUORUM EXIST
Alive Remote Nodes   : 2
VIP up on local node : NO
Master Node Name     : Linux_node2_9999
Master Host Name     : node2

Watchdog Node Information
Node Name      : Linux_node3_9999
Host Name      : node3
Delegate IP    : 192.168.0.220
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 7
Status Name    : STANDBY

Node Name      : Linux_node1_9999
Host Name      : node1
Delegate IP    : 192.168.0.220
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 7
Status Name    : STANDBY

Node Name      : Linux_node2_9999
Host Name      : node2
Delegate IP    : 192.168.0.220
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 4
Status Name    : MASTER

Checking the cluster state with psql

[root@node3 ~]# psql -hnode3 -p9999 -U admin postgres
...
postgres=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt 
---------+----------+------+--------+-----------+---------+------------
 0       | node1    | 5433 | 2      | 0.333333  | standby | 0
 1       | node2    | 5433 | 2      | 0.333333  | standby | 0
 2       | node3    | 5433 | 2      | 0.333333  | primary | 0
(3 rows)

Preparing the failover scripts

Prepare the failover scripts below and deploy them on all 3 nodes

/home/postgres/failover.sh

#!/bin/bash
# Arguments passed by pgpool's failover_command '%h %H %d %P':
#   $1 down node hostname (%h), $2 new master hostname (%H),
#   $3 down node id (%d), $4 old primary node id (%P)
pgsql_nodes="node1 node2 node3"
logfile=/var/log/pgpool/failover.log
down_node=$1
new_master=$2
down_node_id=$3
old_master_id=$4
old_master=$down_node
export PGDATA="/data/postgresql/data"
export PGPORT=5433
export PGDATABASE=postgres
export PGUSER=admin
export PGPASSWORD=admin
trigger_command="pg_ctl -D $PGDATA promote -m fast"
stop_command="pg_ctl -D $PGDATA stop -m fast"
start_command="pg_ctl -D $PGDATA start"
restart_command="pg_ctl -D $PGDATA restart -m fast"
CHECK_XLOG_LOC_SQL="select pg_last_xlog_replay_location(),pg_last_xlog_receive_location()"

log(){
  echo "$*" >&2
  echo "`date +'%Y-%m-%d %H:%M:%S'` $*" >> $logfile
}

# Execute SQL and return the result.
exec_sql() {
    local host="$1"
    local sql="$2"
    local output
    local rc
    output=`psql -h $host -Atc "$sql"`
    rc=$?
    echo $output
    return $rc
}

# Return the newer of the replay/receive xlog locations of a node, zero-padded for comparison.
get_xlog_location() {
    local rc
    local output
    local replay_loc
    local receive_loc
    local output1
    local output2
    local log1
    local log2
    local newer_location
    local target_host=$1
    output=`exec_sql "$target_host" "$CHECK_XLOG_LOC_SQL"`
    rc=$?
    if [ $rc -ne 0 ]; then
        log "Can't get xlog location from $target_host.(rc=$rc)"
        exit 1
    fi
    replay_loc=`echo $output | cut -d "|" -f 1`
    receive_loc=`echo $output | cut -d "|" -f 2`
    output1=`echo "$replay_loc" | cut -d "/" -f 1`
    output2=`echo "$replay_loc" | cut -d "/" -f 2`
    log1=`printf "%08s\n" $output1 | sed "s/ /0/g"`
    log2=`printf "%08s\n" $output2 | sed "s/ /0/g"`
    replay_loc="${log1}${log2}"
    output1=`echo "$receive_loc" | cut -d "/" -f 1`
    output2=`echo "$receive_loc" | cut -d "/" -f 2`
    log1=`printf "%08s\n" $output1 | sed "s/ /0/g"`
    log2=`printf "%08s\n" $output2 | sed "s/ /0/g"`
    receive_loc="${log1}${log2}"
    newer_location=`printf "$replay_loc\n$receive_loc" | sort -r | head -1`
    echo "$newer_location"
    return 0
}

get_newer_location(){
    local newer_location
    newer_location=`printf "$1\n$2" | sort -r | head -1`
    echo "$newer_location"
}

log "##########failover start:$0 $*"

# if standby down do nothing
if [ "X$down_node_id" != "X$old_master_id" ]; then
   log "standby node '$down_node' down,skip"
   exit
fi

# check the old_master dead
log "check the old_master '$old_master' dead ..."
exec_sql $old_master "select 1" >/dev/null 2>&1
if [ $? -eq 0 ]; then
  log "the old master $old_master is alive, cancel failover"
  exit 1
fi

# check all nodes other than the old master alive and is standby
log "check all nodes '$pgsql_nodes' other than the old master alive and is standby ..."
for host in $pgsql_nodes ; do
    if [ $host != $old_master ]; then
        is_in_recovery=`exec_sql $host "select pg_is_in_recovery()"`
        if [ $? -ne 0 ]; then
          log "failed to check $host"
          exit 1
        fi
        if [ "$is_in_recovery" != 't' ];then
            log "$host is not a valid standby(is_in_recovery=$is_in_recovery)"
            exit
        fi
    fi
done

# find the node with the newer xlog
log "find the node with the newer xlog ..."
# TODO wait for all xlog replayed
newer_location=$(get_xlog_location $new_master)
log "$new_master : $newer_location"
new_primary=$new_master
for host in $pgsql_nodes ; do
  if [ $host != $new_primary -a $host != $old_master ]; then
    location=$(get_xlog_location $host)
    log "$host : $location"
    if [ "$newer_location" != "$(get_newer_location $location $newer_location)" ]; then
      newer_location=$location
      new_primary=$host
      log "change new primary to $new_primary"
    fi
  fi
done

# change replication source to the new primary in all standbys
for host in $pgsql_nodes ; do
  if [ $host != $new_primary -a $host != $old_master ]; then
    log "change replication source to $new_primary in $host ..."
    output=`ssh -T $host "/home/postgres/change_replication_source.sh $new_primary" 2>&1`
    rc=$?
    log "$output"
    if [ $rc -ne 0 ]; then
      log "failed to change replication source to $new_primary in $host"
      exit 1
    fi
  fi
done

# trigger failover
log "trigger failover to '$new_primary' ..."
ssh -T $new_primary su - postgres -c "'$trigger_command'"
rc=$?
log "fire promote '$new_primary' to be the new primary (rc=$rc)"
exit $rc

/home/postgres/change_replication_source.sh

#!/bin/bash
# Repoint the pgsql_primary entry in /etc/hosts at the new primary's IP.
new_primary=$1
cat /etc/hosts | grep -v ' pgsql_primary$' >/tmp/hosts.tmp
echo "`resolveip -s $new_primary` pgsql_primary" >>/tmp/hosts.tmp
cp -f /tmp/hosts.tmp /etc/hosts
rm -f /tmp/hosts.tmp
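Presumably the switch can then be verified on each standby; getent is a standard way to see what pgsql_primary now resolves to (this check is not in the original article):

# should print the new primary's address, e.g. 192.168.0.212 after a switch to node2
getent hosts pgsql_primary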

Make both scripts executable

[postgres@node1 ~]$ chmod +x /home/postgres/failover.sh /home/postgres/change_replication_source.sh

Note: the scripts above are not particularly rigorous and are provided for reference only.

Failover test

Cluster state before the failure

[root@node3 ~]# psql -h192.168.0.220 -p9999 -U admin postgres
Password for user admin: 
psql (9.5.2)
Type "help" for help.
postgres=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt 
---------+----------+------+--------+-----------+---------+------------
 0       | node1    | 5433 | 2      | 0.333333  | primary | 3
 1       | node2    | 5433 | 2      | 0.333333  | standby | 0
 2       | node3    | 5433 | 2      | 0.333333  | standby | 0
(3 rows)
postgres=> select inet_server_addr();
 inet_server_addr 
------------------
 192.168.0.211
(1 row)

Kill the postgres processes on the master node

[root@node1 ~]# killall -9 postgres 

Check the cluster state: the primary has switched over to node2. (In show pool_nodes, status 2 means the node is up and serving connections, 3 means it has been detached as down, and 1 means up but with no pooled connections yet.)

postgres=> show pool_nodes;
FATAL:  unable to read data from DB node 0
DETAIL:  EOF encountered with backend
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
postgres=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt 
---------+----------+------+--------+-----------+---------+------------
 0       | node1    | 5433 | 3      | 0.333333  | standby | 27
 1       | node2    | 5433 | 2      | 0.333333  | primary | 11
 2       | node3    | 5433 | 2      | 0.333333  | standby | 0
(3 rows)
postgres=> select inet_server_addr();
 inet_server_addr 
------------------
 192.168.0.212
(1 row)

Recovery

Rebuild node1 as a slave of the new master

Change the name resolution of pgsql_primary to the new master's IP

vi /etc/hosts
...
192.168.0.212 pgsql_primary

Take a base backup from the new master and restore from it

su - postgres
cp /data/postgresql/data/recovery.done  /tmp/
rm -rf /data/postgresql/data
pg_basebackup -hpgsql_primary -p5433 -Ureplication -D /data/postgresql/data -X stream -P
cp /tmp/recovery.done /data/postgresql/data/recovery.conf
pg_ctl -D /data/postgresql/data start
exit
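Before re-attaching node1 to pgpool, it is worth confirming that it is streaming from the new primary. A minimal check on node2 (the new primary), using the accounts from this article, might be:

# on node2: node1 (192.168.0.211) should appear as a connected standby
psql -p 5433 -U admin postgres -c "select application_name, client_addr, state, sync_state from pg_stat_replication;"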

Re-attach node1 to the cluster

pcp_attach_node -w 0 

Confirm the cluster state

postgres=> show pool_nodes;
 node_id | hostname | port | status | lb_weight |  role   | select_cnt 
---------+----------+------+--------+-----------+---------+------------
 0       | node1    | 5433 | 1      | 0.333333  | standby | 27
 1       | node2    | 5433 | 2      | 0.333333  | primary | 24
 2       | node3    | 5433 | 2      | 0.333333  | standby | 0
(3 rows)

Troubleshooting

  1. pgpool fails to start because the address is already in use

    Nov 15 02:33:56 node3 pgpool: 2016-11-15 02:33:56: pid 3868: FATAL:  failed to bind a socket: "/tmp/.s.PGSQL.9999"
    Nov 15 02:33:56 node3 pgpool: 2016-11-15 02:33:56: pid 3868: DETAIL:  bind socket failed with error: "Address already in use"

    This happens when pgpool was not shut down cleanly the previous time. Fix:

    rm -f /tmp/.s.PGSQL.9999 
  2. Connections block after the pgpool master loses its network

After the network of the pgpool master node (node1) was cut, connections through pgpool became blocked. The remaining pgpool nodes negotiated a new master, but the blockage continued, affecting new connections as well, and no failover took place.

The pgpool log kept printing the following messages

Nov 15 23:12:37 node3 pgpool: 2016-11-15 23:12:37: pid 4088: ERROR:  Failed to check replication time lag
Nov 15 23:12:37 node3 pgpool: 2016-11-15 23:12:37: pid 4088: DETAIL:  No persistent db connection for the node 0
Nov 15 23:12:37 node3 pgpool: 2016-11-15 23:12:37: pid 4088: HINT:  check sr_check_user and sr_check_password
Nov 15 23:12:37 node3 pgpool: 2016-11-15 23:12:37: pid 4088: CONTEXT:  while checking replication time lag
Nov 15 23:12:39 node3 pgpool: 2016-11-15 23:12:39: pid 4088: LOG:  failed to connect to PostgreSQL server on "node1:5433", getsockopt() detected error "No route to host"
Nov 15 23:12:39 node3 pgpool: 2016-11-15 23:12:39: pid 4088: ERROR:  failed to make persistent db connection
Nov 15 23:12:39 node3 pgpool: 2016-11-15 23:12:39: pid 4088: DETAIL:  connection to host:"node1:5433" failed

node2 and node3 had already negotiated a new master, but connections stayed blocked until the old master's network interface was re-enabled.

[root@node3 ~]# pcp_watchdog_info -w -v
Watchdog Cluster Information
Total Nodes          : 3
Remote Nodes         : 2
Quorum state         : QUORUM EXIST
Alive Remote Nodes   : 2
VIP up on local node : YES
Master Node Name     : Linux_node3_9999
Master Host Name     : node3

Watchdog Node Information
Node Name      : Linux_node3_9999
Host Name      : node3
Delegate IP    : 192.168.0.220
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 4
Status Name    : MASTER

Node Name      : Linux_node1_9999
Host Name      : node1
Delegate IP    : 192.168.0.220
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 8
Status Name    : LOST

Node Name      : Linux_node2_9999
Host Name      : node2
Delegate IP    : 192.168.0.220
Pgpool port    : 9999
Watchdog port  : 9000
Node priority  : 1
Status         : 7
Status Name    : STANDBY

According to the stack trace below, pgpool blocks while asking the watchdog to degenerate a backend. This appears to be a bug.

[root@node3 ~]# ps -ef|grep pgpool.conf
root      4048     1  0 Nov15 ?        00:00:00 /usr/bin/pgpool -f /etc/pgpool-II/pgpool.conf -n
root      5301  4832  0 00:10 pts/3    00:00:00 grep --color=auto pgpool.conf
[root@node3 ~]# pstack 4048
#0  0x00007f73647e98d3 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000493d2e in issue_command_to_watchdog ()
#2  0x0000000000494ac3 in wd_degenerate_backend_set ()
#3  0x000000000040bcf3 in degenerate_backend_set_ex ()
#4  0x000000000040e1c4 in PgpoolMain ()
#5  0x0000000000406ec2 in main ()

Summary

In this 1-master-2-slave setup, using pgpool for PostgreSQL HA did not work out well. This is partly because pgpool and PostgreSQL were co-located on the same nodes; a more reliable approach is to deploy pgpool on dedicated nodes or alongside the application servers.

  1. In a 1-master-2-slave (or 1-master-N-slave) architecture, after the primary is switched the remaining slaves have to follow the new primary; this must be implemented yourself, and doing it rigorously and reliably is not easy.
  2. A network outage on the pgpool master can hang the whole cluster, which appears to be a bug; in real deployments, avoid placing pgpool and PostgreSQL on the same nodes as far as possible.

