特色
MySQL/Galera 是一種多主同步集群,但只限於使用 MySQL/InnoDB 引擎,並具有下面特點
同步復制
多個主服務器的拓撲結構
可以在任意節點上進行讀寫
自動控制成員,自動刪除故障節點
自動加入節點
真正給予行級別的並發復制
調度客戶連接
優勢
參考下面基於 DBMS 集群的解決方法
不存在從服務器角色
不存在事務丟失
讀寫操作可根據需要進行隨意擴展
更少的闩操作
知識點
MySQL/Galera 集群使用 Galera 庫執行復制,對應 Galera 復制接口,我們需要MySQL 服務器支持 wsrep API 接口
http://www.codership.com/products/mysql-write-set-replication-project
是否可以使用 MySQL 而不使用 mariadb?
不可以,因為 mysql 中沒有支持 wsrep_ 數據復制的參數,當然代碼級別上也具有很大差別
工作原理
mariadb 可以看做是常見的數據庫,負責連接應用(web, API 等)
單純的 mariadb 無法實現多個主服務器數據同步
多台數據庫中數據同步由 wsrep 接口實現
最終目標,實現多個 MySQL 同時讀寫
wsrep API
wsrep API 是一種數據庫插件接口,比較類似一種應用程序,主要針對寫復制
該程序主要用於定義應用程序如何調用復制庫實現回寫
wsrep API 由支持改庫的應用程序動態裝載
全局事務ID(GTID)
wsrep API 描述下面復制模型,一個應用程序,如數據庫當前的一個對象,當前被客戶端修改,對象改變導致事務產生一系列的原子性改變, 在集群中所有的節點都具備相同的對象,並由同步復制應用都各自節點,按照相同的順序產生相同變化從而實現數據同步
到最後,wsrep API 將會分配一個全局事務ID 該 ID 具有下面功能
標識對象的改變
標識對象自身 ID 最後狀態(正常情況下,ID 是連續不中斷的)
GTID 包含
一個 UUID 作為對象標識及經歷改變的序號,序號會發生連續的改變
GTID 允許比較應用程序狀態,建立對象改變的順序,決定對象的變化是否需要更新 GTID
通常 GTID 會卑記錄成下面格式
45eec521-2f34-11e0-0800-2a36050b826b:94530586304
言歸正傳,我們需要編譯 mariadb-mysql 及 galera 插件
galera/mysql 編譯步驟
https://downloads.mariadb.org/interstitial/mariadb-galera-5.5.33a/kvm-tarbake-jaunty-x86/mariadb-galera-5.5.33a.tar.gz/from/http://mirrors.scie.in/mariadb
yum install -y cmake
tar xf mariadb-galera-5.5.33a.tar.gz
cd mariadb-5.5.33a/
cmake -LAH
參考 CMakeCache.txt 文件中的配置信息
cmake -DINSTALL_MYSQLDATADIR:STRING=/mdb -DINSTALL_UNIX_ADDRDIR:STRING=/var/run/mysqld/mysql5.socket make make install
默認情況下, mariadb 安裝在 /usr/local/mysql
galera 編譯
https://launchpad.net/galera/2.x/23.2.7/+download/galera-23.2.7-src.tar.gz
添加數據源
baseurl=http://mirror.neu.edu.cn/fedora/epel//6Server/x86_64/
添加下面軟件包
yum erase -y mysql.x86_64 mysql-devel.x86_64 mysql-libs.x86_64 yum install -y boost-devel.x86_64 libodb-boost-devel.x86_64 bzr scons
解壓 galera-23.2.7-src.tar.gz 並進行編譯
cd /usr/src tar xf galera-23.2.7-src.tar.gz cd galera-23.2.7-src scons
編譯後能生成 libgalera_smm.so
復制編譯好的庫至下面位置 /usr/local/galera/lib/libgalera_smm.so
mkdir /usr/local/galera/lib -p cp /usr/src/galera-23.2.7-src/libgalera_smm.so /usr/local/galera/lib/libgalera_smm.so
復制 啟動腳本 /usr/src/galera-23.2.7-src/scripts/mysql/mysql-galera 到 /usr/local
cp /usr/src/galera-23.2.7-src/scripts/mysql/mysql-galera /usr/local
創建 /usr/local/mysql/etc/my.cnf
[mysqld] basedir=/usr/local/mysql big-tables bind-address=0.0.0.0 character-set-server=utf8 datadir=/mdb log-error=/var/log/mysqld/mysql5-error.log socket=/var/run/mysqld/mysql5.socket pid-file=/var/run/mysqld/mysql5.pid port=3306 user=mysql binlog_format = ROW binlog_cache_size = 1M character_set_server = utf8 collation_server = utf8_general_ci default-storage-engine = InnoDB expire_logs_days = 10 innodb_buffer_pool_size = 300M innodb_thread_concurrency = 16 innodb_log_buffer_size = 8M innodb_doublewrite = 1 innodb_file_per_table = 1 innodb_flush_log_at_trx_commit = 2 server-id = 1 max_connections = 1000 net_buffer_length = 8K open-files-limit = 65535 wsrep_cluster_address = 'gcomm://192.168.200.163,192.168.200.171,172.18.8.49,172.18.8.50' wsrep_provider = /usr/local/galera/lib/libgalera_smm.so wsrep_retry_autocommit = 0 wsrep_sst_method = rsync wsrep_provider_options="gcache.size=256m; gcache.page_size=256m" wsrep_slave_threads=16 wsrep_cluster_name='my_cluster' wsrep_node_name='db5' wsrep_sst_auth=tt:tt123 maridb 啟動測試
初始化數據庫
mkdir /mdb cd /usr/local/mysql ./scripts/install_mysql_db --datadir=/mdb
啟動腳本 /etc/rc.d/init.d/mysql5 確保文件可執行權限
#!/bin/sh # chkconfig: 2345 64 36 # description: A very fast and reliable SQL database engine.
basedir=/usr/local/mysql datadir=/mdb
# Default value, in seconds, afterwhich the script should timeout waiting # for server start. # Value here is overriden by value in my.cnf. # 0 means don't wait at all # Negative numbers mean to wait indefinitely service_startup_timeout=900 startup_sleep=1
# Lock directory for RedHat / SuSE. lockdir='/var/lock/subsys' lock_file_path="$lockdir/mysql"
# The following variables are only set for letting mysql.server find things.
# Set some defaults mysqld_pid_file_path=/var/run/mysqld/mysql5.pid if test -z "$basedir" then basedir=/usr/local/mysql bindir=/usr/local/mysql/bin if test -z "$datadir" then datadir=/usr/local/mysql/data fi sbindir=/usr/local/mysql/bin libexecdir=/usr/local/mysql/bin
else bindir="$basedir/bin" if test -z "$datadir" then datadir="$basedir/data" fi sbindir="$basedir/sbin" if test -f "$basedir/bin/mysqld" then libexecdir="$basedir/bin" else libexecdir="$basedir/libexec" fi fi
# datadir_set is used to determine if datadir was set (and so should be # *not* set inside of the --basedir= handler.) datadir_set=
# # Use LSB init script functions for printing messages, if possible # lsb_functions="/lib/lsb/init-functions" if test -f $lsb_functions ; then . $lsb_functions else log_success_msg() { echo " SUCCESS! $@" } log_failure_msg() { echo " ERROR! $@" } fi
PATH="/sbin:/usr/sbin:/bin:/usr/bin:$basedir/bin"
export PATH
mode=$1 # start or stop
[ $# -ge 1 ] && shift
other_args="$*" # uncommon, but needed when called from an RPM upgrade action # Expected: "--skip-networking --skip-grant-tables" # They are not checked here, intentionally, as it is the resposibility # of the "spec" file author to give correct arguments only.
case `echo "testing\c"`,`echo -n testing` in *c*,-n*) echo_n= echo_c= ;; *c*,*) echo_n=-n echo_c= ;; *) echo_n= echo_c='\c' ;; esac
parse_server_arguments() { for arg do case "$arg" in --basedir=*) basedir=`echo "$arg" | sed -e 's/^[^=]*=//'` bindir="$basedir/bin" if test -z "$datadir_set"; then datadir="$basedir/data" fi sbindir="$basedir/sbin" if test -f "$basedir/bin/mysqld" then libexecdir="$basedir/bin" else libexecdir="$basedir/libexec" fi libexecdir="$basedir/libexec" ;; --datadir=*) datadir=`echo "$arg" | sed -e 's/^[^=]*=//'` datadir_set=1
;; --pid-file=*) mysqld_pid_file_path=`echo "$arg" | sed -e 's/^[^=]*=//'` ;; --service-startup-timeout=*) service_startup_timeout=`echo "$arg" | sed -e 's/^[^=]*=//'` ;; esac done }
wait_for_pid () { verb="$1" # created | removed pid="$2" # process ID of the program operating on the pid-file pid_file_path="$3" # path to the PID file.
sst_progress_file=$datadir/sst_in_progress i=0 avoid_race_condition="by checking again"
while test $i -ne $service_startup_timeout ; do
case "$verb" in 'created') # wait for a PID-file to pop into existence. test -s "$pid_file_path" && i='' && break ;; 'removed') # wait for this PID-file to disappear test ! -s "$pid_file_path" && i='' && break ;; *) echo "wait_for_pid () usage: wait_for_pid created|removed pid pid_file_path" exit 1 ;; esac
# if server isn't running, then pid-file will never be updated if test -n "$pid"; then if kill -0 "$pid" 2>/dev/null; then : # the server still runs
else # The server may have exited between the last pid-file check and now. if test -n "$avoid_race_condition"; then avoid_race_condition="" continue # Check again. fi
# there's nothing that will affect the file. log_failure_msg "The server quit without updating PID file ($pid_file_path)." return 1 # not waiting any more. fi fi
if test -e $sst_progress_file && [ $startup_sleep -ne 100 ];then echo $echo_n "SST in progress, setting sleep higher" startup_sleep=100 fi
echo $echo_n ".$echo_c" i=`expr $i + 1` sleep $startup_sleep
done
if test -z "$i" ; then log_success_msg return 0 else log_failure_msg return 1 fi }
# Get arguments from the my.cnf file, # the only group, which is read from now on is [mysqld] if test -x ./bin/my_print_defaults then
print_defaults="./bin/my_print_defaults" elif test -x $bindir/my_print_defaults then print_defaults="$bindir/my_print_defaults" elif test -x $bindir/mysql_print_defaults then print_defaults="$bindir/mysql_print_defaults" else # Try to find basedir in /etc/my.cnf conf=/usr/local/mysql/etc/my.cnf print_defaults= if test -r $conf then subpat='^[^=]*basedir[^=]*=\(.*\)$' dirs=`sed -e "/$subpat/!d" -e 's//\1/' $conf` for d in $dirs do d=`echo $d | sed -e 's/[ ]//g'` if test -x "$d/bin/my_print_defaults" then print_defaults="$d/bin/my_print_defaults" break fi if test -x "$d/bin/mysql_print_defaults" then print_defaults="$d/bin/mysql_print_defaults" break fi done fi
# Hope it's in the PATH ... but I doubt it test -z "$print_defaults" && print_defaults="my_print_defaults" fi
# # Read defaults file from 'basedir'. If there is no defaults file there
# check if it's in the old (depricated) place (datadir) and read it from there #
extra_args="" if test -r "$basedir/my.cnf" then extra_args="-e $basedir/my.cnf" else if test -r "$datadir/my.cnf" then extra_args="-e $datadir/my.cnf" fi fi
parse_server_arguments `$print_defaults $extra_args mysqld server mysql_server mysql.server`
# # Set pid file if not given # if test -z "$mysqld_pid_file_path" then mysqld_pid_file_path=$datadir/`hostname`.pid else case "$mysqld_pid_file_path" in /* ) ;; * ) mysqld_pid_file_path="$datadir/$mysqld_pid_file_path" ;; esac fi
case "$mode" in 'start') # Start daemon
# Safeguard (relative paths, core dumps..) cd $basedir
echo $echo_n "Starting MySQL"
if test -x $bindir/mysqld_safe then # Give extra arguments to mysqld with the my.cnf file. This script # may be overwritten at next upgrade. $bindir/mysqld_safe --datadir="$datadir" --pid-file="$mysqld_pid_file_path" $other_args >/dev/null 2>&1 & wait_for_pid created "$!" "$mysqld_pid_file_path"; return_value=$?
# Make lock for RedHat / SuSE if test -w "$lockdir" then touch "$lock_file_path" fi
exit $return_value else log_failure_msg "Couldn't find MySQL server ($bindir/mysqld_safe)" fi ;;
'stop') # Stop daemon. We use a signal here to avoid having to know the # root password.
if test -s "$mysqld_pid_file_path" then mysqld_pid=`cat "$mysqld_pid_file_path"`
if (kill -0 $mysqld_pid 2>/dev/null) then echo $echo_n "Shutting down MySQL" kill $mysqld_pid # mysqld should remove the pid file when it exits, so wait for it. wait_for_pid removed "$mysqld_pid" "$mysqld_pid_file_path"; return_value=$? else log_failure_msg "MySQL server process #$mysqld_pid is not running!" rm "$mysqld_pid_file_path" fi
# Delete lock for RedHat / SuSE if test -f "$lock_file_path" then rm -f "$lock_file_path" fi exit $return_value else log_failure_msg "MySQL server PID file could not be found!" fi ;;
'restart') # Stop the service and regardless of whether it was # running or not, start it again. if $0 stop $other_args; then $0 start $other_args else log_failure_msg "Failed to stop running server, so refusing to try to start." exit 1 fi ;;
'reload'|'force-reload') if test -s "$mysqld_pid_file_path" ; then read mysqld_pid < "$mysqld_pid_file_path" kill -HUP $mysqld_pid && log_success_msg "Reloading service MySQL" touch "$mysqld_pid_file_path" else log_failure_msg "MySQL PID file could not be found!" exit 1 fi ;; 'status') # First, check to see if pid file exists if test -s "$mysqld_pid_file_path" ; then read mysqld_pid < "$mysqld_pid_file_path"
if kill -0 $mysqld_pid 2>/dev/null ; then log_success_msg "MySQL running ($mysqld_pid)" exit 0 else log_failure_msg "MySQL is not running, but PID file exists" exit 1 fi else # Try to find appropriate mysqld process mysqld_pid=`pidof $libexecdir/mysqld`
# test if multiple pids exist pid_count=`echo $mysqld_pid | wc -w` if test $pid_count -gt 1 ; then log_failure_msg "Multiple MySQL running but PID file could not be found ($mysqld_pid)" exit 5 elif test -z $mysqld_pid ; then if test -f "$lock_file_path" ; then log_failure_msg "MySQL is not running, but lock file ($lock_file_path) exists" exit 2 fi log_failure_msg "MySQL is not running" exit 3 else log_failure_msg "MySQL is running but PID file could not be found" exit 4 fi fi ;; 'configtest') # Safeguard (relative paths, core dumps..) cd $basedir echo $echo_n "Testing MySQL configuration syntax" daemon=$bindir/mysqld if test -x $libexecdir/mysqld then daemon=$libexecdir/mysqld elif test -x $sbindir/mysqld then daemon=$sbindir/mysqld elif test -x `which mysqld` then daemon=`which mysqld` else log_failure_msg "Unable to locate the mysqld binary!" exit 1 fi help_out=`$daemon --help 2>&1`; r=$? if test "$r" != 0 ; then log_failure_msg "$help_out" log_failure_msg "There are syntax errors in the server configuration. Please fix them!" else log_success_msg "Syntax OK" fi exit $r ;; 'bootstrap') # Bootstrap the cluster, start the first node # that initiate the cluster echo $echo_n "Bootstrapping the cluster" $0 start $other_args --wsrep-new-cluster ;; *) # usage basename=`basename "$0"` echo "Usage: $basename {start|stop|restart|reload|force-reload|status|configtest|bootstrap} [ MySQL server options ]" exit 1 ;; esac
exit 0
啟動每一台數據庫
service mysql5 start
在每台數據庫中建立下面用戶, 用於 sst 認證
GRANT USAGE ON *.* to tt@'%' IDENTIFIED BY 'tt123'; GRANT ALL PRIVILEGES on *.* to tt@'%'; GRANT USAGE ON *.* to tt@'localhost' IDENTIFIED BY 'tt123'; GRANT ALL PRIVILEGES on *.* to tt@'localhost'; flush privileges;
關閉所有數據庫
service mysql5 stop
創建並加入集群
集群中第一個節點啟動 (192.168.200.163)
創建軟鏈接,並啟動集群,集群啟動過程中會自動啟動 mariadb
ln -s /usr/local/mysql/bin/ /usr/local/mysql/sbin cd /usr/local/ ./mysql-galera -g gcomm:// start
測試是否成功啟動方法, 查詢是否會自動啟動 4567 端口
[root@db2 local]# netstat -ntl Active Internet connections (only servers) Proto Recv-Q Send-Q Local Address Foreign Address State tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN tcp 0 0 0.0.0.0:4567 0.0.0.0:* LISTEN
登錄 mysql 之後,查詢當前是否啟用 galery 插件
MariaDB [(none)]> show status like 'wsrep_ready'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | wsrep_ready | ON | +---------------+-------+ 1 row in set (0.00 sec)
注,on 為已經啟動插件狀態
關閉方法
cd /usr/local ./mysql-galera stop
其他節點加入集群方法
第一台 (192.168.200.163) 節點已經啟動成功
第二台 (192.168.200.171) 需要加入集群
cd /usr/local/ ./mysql-galera -g gcomm://192.168.200.163 start
可按上述方法進行集群啟動測試, 也可以參照下面方法, 觀察集群地址是否增加兩個服務器地址
MariaDB [(none)]> show status like 'wsrep_incoming_addresses'; +--------------------------+-------------------------------------------+ | Variable_name | Value | +--------------------------+-------------------------------------------+ | wsrep_incoming_addresses | 192.168.200.171:3306,192.168.200.163:3306 | +--------------------------+-------------------------------------------+ 1 row in set (0.00 sec)
第三台 (172.18.8.49) 需要加入集群
cd /usr/local/ ./mysql-galera -g gcomm://192.168.200.163,192.168.200.171 start
第四台 (172.18.8.50) 需要加入集群
cd /usr/local/ ./mysql-galera -g gcomm://192.168.200.163,192.168.200.171,172.18.8.49 start
注: 每次集群啟動, 將會啟用數據同步機制,令每個集群中的數據同步 假如,集群工作期間,節點 3(172.18.8.49) 脫離集群,重啟,發生故障 而脫機期間,節點1,2,4 仍可繼續工作 當節點3 重新在線時,加入集群前,將會自動進行數據同步 重新在線方法與上文中加入節點方法一致 常見 wsrep 參數注釋
MariaDB [terry]> show status like 'wsrep%'; +----------------------------+--------------------------------------+ | Variable_name | Value | +----------------------------+--------------------------------------+ | wsrep_local_state_uuid | bb5b9e17-66c8-11e3-86ba-96854521d205 | uuid 集群唯一標記 | wsrep_protocol_version | 4 | | wsrep_last_committed | 16 | sql 提交記錄 | wsrep_replicated | 4 | 隨著復制發出的次數 | wsrep_replicated_bytes | 692 | 數據復制發出的字節數 | wsrep_received | 18 | 數據復制接收次數 | wsrep_received_bytes | 3070 | 數據復制接收的字節數 | wsrep_local_commits | 4 | 本地執行的 sql | wsrep_local_cert_failures | 0 | 本地失敗事務 | wsrep_local_bf_aborts | 0 |從執行事務過程被本地中斷 | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | 本地發出的隊列 | wsrep_local_send_queue_avg | 0.142857 | 隊列平均時間間隔 | wsrep_local_recv_queue | 0 | 本地接收隊列 | wsrep_local_recv_queue_avg | 0.000000 | 本地接收時間間隔 | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_sent | 0 | | wsrep_flow_control_recv | 0 | | wsrep_cert_deps_distance | 0.000000 | 並發數量 | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 1.000000 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 1.000000 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_cert_index_size | 0 | | wsrep_causal_reads | 0 | | wsrep_incoming_addresses | 172.18.8.50:3306,172.18.8.49:3306 | 連接中的數據庫 | wsrep_cluster_conf_id | 18 | | wsrep_cluster_size | 2 | 集群成員個數 | wsrep_cluster_state_uuid | bb5b9e17-66c8-11e3-86ba-96854521d205 | 集群 ID | wsrep_cluster_status | Primary | 主服務器 | wsrep_connected | ON | 當前是否連接中 | wsrep_local_index | 1 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <[email protected]> | | wsrep_provider_version | 2.7(rXXXX) | | wsrep_ready | ON | 插件是否應用中 +----------------------------+--------------------------------------+ 40 rows in set (0.05 sec)
時間關系,還沒有時間進行壓力測試,也沒有比對 galera 與 Percona XtraDB Cluster 集群之間區別
另,如使用 rpm 則十分方便,網路很多教程, 不詳細描述