pt-heartbeat是用來監測主從延遲的情況的,眾所周知,傳統的通過show slave status\G命令中的Seconds_Behind_Master值來判斷主從延遲並不靠譜。
pt-heartbeat的思路比較巧妙,它在master中插入一條帶有當前時間(MySQL中的now()函數)的記錄到心跳表中,然後,該記錄會復制到slave中。slave根據當前的系統時間戳(Perl中的time函數)減去heartbeat表中的記錄值來判斷主從的延遲情況。具體,可參考下面--skew參數的說明。
常見用法:
Master上
需用--update參數
# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test
其中,--update會每秒更新一次heartbeat表的記錄 -D指的是heartbeat表所在的database
-D是--database的縮寫,--database後面即可以直接加參數,如--database test,也可用等號加參數,如--database=test,但是縮寫只能通過-D test加參數。
# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 --database test # pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 --database=test
注意:在第一次運行時,需帶上 --create-table參數創建heartbeat表並插入第一條記錄。也可加上--daemonize參數,讓該腳本以後台進程運行。
Slave上
可用--monitor參數或者--check參數
--monitor參數是持續監測並輸出結果
# pt-heartbeat -D test --monitor -h 192.168.244.20 --master-server-id=1 -u monitor -p monitor123
10061.00s [ 167.68s, 33.54s, 11.18s ] 10062.00s [ 335.38s, 67.08s, 22.36s ] 10063.01s [ 503.10s, 100.62s, 33.54s ] ...
--check參數是只監測一次就退出了
# pt-heartbeat -D test --check -h 192.168.244.20 --master-server-id=1 -u monitor -p monitor123
10039.00
注意:--update, --monitor和--check三者是互斥的,--daemonize只適用於--update場景
看看各參數的意義
--ask-pass
連接數據庫時提示密碼
Prompt for a password when connecting to MySQL.
--charset
short form: -A
默認字符集(個人感覺這個選項很雞肋)
short form: -A; type: string Default character set. If the value is utf8, sets Perl’s binmode on STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any other value sets binmode on STDOUT without the utf8 layer, and runs SET NAMES after connecting to MySQL.
--check
檢查從的延遲後退出,如果在級聯復制中,還可以指定--recurse參數,這時候,會檢測從庫的從庫的延遲情況。
Check slave delay once and exit. If you also specify --recurse, the tool will try to discover slave’s of the given slave and check and print their lag, too. The hostname or IP and port for each slave is printed before its delay. --recurse only works with MySQL.
--check-read-only
檢查server是否是只讀的,如果是只讀,則會跳過插入動作。
Check if the server has read_only enabled; If it does, the tool skips doing any inserts.
--config
Read this comma-separated list of config files; if specified, this must be the first option on the command line.
將參數寫入到參數文件中,
有以下幾點需要注意:
1> # pt-heartbeat --config pt-heartbeat.conf,而不能是# pt-heartbeat --config=pt-heartbeat.conf
2> 參數文件中只支持如下寫法
option option=value
option面前不能帶上--,而且option只能是全拼,不能是縮寫,譬如database,不能縮寫為-D
具體寫法可參考:https://www.percona.com/doc/percona-toolkit/2.1/configuration_files.html
試舉一例,如下所示
# cat pt-heartbeat.conf host=192.168.244.20 user=monitor password=monitor123 monitor database=test master-server-id=1
--create-table
創建heartbeat表如果該表不存在,該表由--database和--table參數來確認。
其中--table表的定義如下所示:
CREATE TABLE heartbeat ( ts varchar(26) NOT NULL, server_id int unsigned NOT NULL PRIMARY KEY, file varchar(255) DEFAULT NULL, -- SHOW MASTER STATUS position bigint unsigned DEFAULT NULL, -- SHOW MASTER STATUS relay_master_log_file varchar(255) DEFAULT NULL, -- SHOW SLAVE STATUS exec_master_log_pos bigint unsigned DEFAULT NULL -- SHOW SLAVE STATUS );
Create the heartbeat --table if it does not exist. This option causes the table specified by --database and --table to be created with the following MAGIC_create_heartbeat table definition: CREATE TABLE heartbeat ( ts varchar(26) NOT NULL, server_id int unsigned NOT NULL PRIMARY KEY, file varchar(255) DEFAULT NULL, -- SHOW MASTER STATUS position bigint unsigned DEFAULT NULL, -- SHOW MASTER STATUS relay_master_log_file varchar(255) DEFAULT NULL, -- SHOW SLAVE STATUS exec_master_log_pos bigint unsigned DEFAULT NULL -- SHOW SLAVE STATUS ); The heartbeat table requires at least one row. If you manually create the heartbeat table, then you must insert a row by doing: INSERT INTO heartbeat (ts, server_id) VALUES (NOW(), N); or if using --utc: INSERT INTO heartbeat (ts, server_id) VALUES (UTC_TIMESTAMP(), N); where N is the server’s ID; do not use @@server_id because it will replicate and slaves will insert their own server ID instead of the master’s server ID. This is done automatically by --create-table. A legacy version of the heartbeat table is still supported: CREATE TABLE heartbeat ( id int NOT NULL PRIMARY KEY, ts datetime NOT NULL ); Legacy tables do not support --update instances on each slave of a multi-slave hierarchy like “master ->slave1 -> slave2”. To manually insert the one required row into a legacy table: INSERT INTO heartbeat (id, ts) VALUES (1, NOW()); or if using --utc: INSERT INTO heartbeat (id, ts) VALUES (1, UTC_TIMESTAMP()); The tool automatically detects if the heartbeat table is legacy.
--create-table-engine
指定heartbeat表的存儲引擎
type: string Sets the engine to be used for the heartbeat table. The default storage engine is InnoDB as of MySQL 5.5.5.
--daemonize
腳本以守護進程運行,這樣即使腳本執行的終端斷開了,腳本也不會停止運行。
Fork to the background and detach from the shell. POSIX operating systems only.
--database
指定heartbeat表所在的數據庫
short form: -D; type: string The database to use for the connection.
--dbi-driver
pt-heartbeat不僅能檢測MySQL之間的心跳延遲情況,還可以檢測PG。
該參數指定連接使用的驅動,默認為mysql,也可指定為Pg
default: mysql; type: string Specify a driver for the connection; mysql and Pg are supported.
--defaults-file
指定參數文件的位置,必須為絕對路徑。
short form: -F; type: string Only read mysql options from the given file. You must give an absolute pathname.
--file
將最新的--monitor信息輸出到文件中,注意最新,新的信息會覆蓋舊的信息。
如果不加該參數,則monitor的信息會直接輸出到終端上,該選項通常和--daemonize參數一起使用。
譬如,
# pt-heartbeat -D test --monitor -h 192.168.244.20 --master-server-id=1 -u monitor -p monitor123 --file=result
該命令會在當前目錄下生成一個result文件,記錄最新的檢測信息
# cat result 1376.00s [ 1126.25s, 225.25s, 75.08s ]
type: string Print latest --monitor output to this file. When --monitor is given, prints output to the specified file instead of to STDOUT. The file is opened, truncated,and closed every interval, so it will only contain the most recent statistics. Useful when --daemonize is given.
--frames
統計的時間窗口,默認為1m,5m,15m,即分別統計1min,5min和15min內的平均延遲情況。
單位可以是s,m,h,d,注意:時間窗口越大,需要緩存的結果越多,對內存的消耗也越大。
type: string; default: 1m,5m,15m Timeframes for averages. Specifies the timeframes over which to calculate moving averages when --monitor is given. Specify as a comma-separated list of numbers with suffixes. The suffix can be s for seconds, m for minutes, h for hours, or d for days. The size of the largest frame determines the maximum memory usage, as up to the specified number of per-second samples are kept in memory to calculate the averages. You can specify as many timeframes as you like.
--help
Show help and exit.
--host
指定連接的主機,可縮寫為-h
short form: -h; type: string Connect to host.
--[no]insert-heartbeat-row
官方解釋如下:
default: yes Insert a heartbeat row in the --table if one doesn’t exist. The heartbeat --table requires a heartbeat row, else there’s nothing to --update, --monitor, or --check! By default, the tool will insert a heartbeat row if one is not already present. You can disable this feature by specifying --no-insert-heartbeat-row in case the database user does not have INSERT privileges.
事實上,在執行如下命令時,
# pt-heartbeat -D test --update -h 192.168.244.10 -u monitor -p monitor123
如果,heartbeat表為空,則會自動insert一條記錄。
但如果指定了--no-insert-heartbeat-row參數,則不會自動創建,此時,會提示如下信息:
# pt-heartbeat -D test --update -h 192.168.244.10 -u monitor -p monitor123 --no-insert-heartbeat-row No row found in heartbeat table for server_id 1. At least one row must be inserted into the heartbeat table for server_id 1. Please read the DESCRIPTION section of the pt-heartbeat POD.
PS:在測試的過程中,發現官方並沒有完整的校驗這個參數,即便傳入--no-insert-heartbeat和--insert-heartbeat參數也不會報錯,但是傳入--123-insert-heartbeat-ro,會報錯“Unknown option: 123-insert-heartbeat-ro”。
default: yes Insert a heartbeat row in the --table if one doesn’t exist. The heartbeat --table requires a heartbeat row, else there’s nothing to --update, --monitor, or --check! By default, the tool will insert a heartbeat row if one is not already present. You can disable this feature by specifying --no-insert-heartbeat-row in case the database user does not have INSERT privileges.
--interval
update和check heartbeat表的頻率,默認是1s。
type: float; default: 1.0 How often to update or check the heartbeat --table. Updates and checks begin on the first whole second then repeat every --interval seconds for --update and every --interval plus --skew seconds for --monitor. For example, if at 00:00.4 an --update instance is started at 0.5 second intervals, the first update happens at 00:01.0, the next at 00:01.5, etc. If at 00:10.7 a --monitor instance is started at 0.05 second intervals with the default 0.5 second --skew, then the first check happens at 00:11.5 (00:11.0 + 0.5) which will be --skew seconds after the last update which, because the instances are checking at synchronized intervals, happened at 00:11.0. The tool waits for and begins on the first whole second just to make the interval calculations simpler. Therefore,the tool could wait up to 1 second before updating or checking. The minimum (fastest) interval is 0.01, and the maximum precision is two decimal places, so 0.015 will be rounded to 0.02. If a legacy heartbeat table (see --create-table) is used, then the maximum precision is 1s because the ts column is type datetime.
--log
在腳本以守護進程執行時,將結果輸出到log指定的文件中。
type: string Print all output to this file when daemonized.
--master-server-id
指定master的server_id,在檢測從的延遲時,必須指定該參數,不然會報如下錯誤:
The --master-server-id option must be specified because the heartbeat table `test`.`heartbeat` uses the server_id column for --update or --check but the server's master could not be automatically determined.
type: string Calculate delay from this master server ID for --monitor or --check. If not given, pt-heartbeat attempts to connect to the server’s master and determine its server id.
--monitor
持續的檢測並輸出從的延遲情況
其中,檢測並輸出的頻率有--interval參數決定,默認為1s
注意:與--check的區別在於:
1> --monitor是持續輸出的,而--check是檢測一次即退出。
2> --monitor可與--file參數搭配,而--check與--file參數搭配無效。
Monitor slave delay continuously. Specifies that pt-heartbeat should check the slave’s delay every second and report to STDOUT (or if --file is given, to the file instead). The output is the current delay followed by moving averages over the timeframe given in --frames. For example, 5s [ 0.25s, 0.05s, 0.02s ]
--password
指定登錄的密碼,縮寫為-p
short form: -p; type: string Password to use when connecting. If password contains commas they must be escaped with a backslash: “exam,ple”
--pid
創建pid文件
type: string Create the given PID file. The tool won’t start if the PID file already exists and the PID it contains is different than the current PID. However, if the PID file exists and the PID it contains is no longer running, the tool will overwrite the PID file with the current PID. The PID file is removed automatically when the tool exits.
--port
指定登錄的端口,縮寫為-P
short form: -P; type: int Port number to use for connection.
--print-master-server-id
同時輸出主的server_id,在--monitor情況下,默認輸出為
1272.00s [ 21.20s, 4.24s, 1.41s ]
如果指定了該參數,則輸出為
1272.00s [ 21.20s, 4.24s, 1.41s ] 1
Print the auto-detected or given --master-server-id. If --check or --monitor is specified, specifying this option will print the auto-detected or given --master-server-id at the end of each line.
--recurse
在--check模式下,用於檢測級聯復制中從的延遲情況。其中,--recurse用於指定級聯的層級。
type: int Check slaves recursively to this depth in --check mode. Try to discover slave servers recursively, to the specified depth. After discovering servers, run the check on each one of them and print the hostname (if possible), followed by the slave delay. This currently works only with MySQL. See --recursion-method.
--recursion-method
在級聯復制中,找到slave的方法。有show processlist和show slave hosts兩種。
type: array; default: processlist,hosts Preferred recursion method used to find slaves. Possible methods are: METHOD USES =========== ================== processlist SHOW PROCESSLIST hosts SHOW SLAVE HOSTS none Do not find slaves The processlist method is preferred because SHOW SLAVE HOSTS is not reliable. However, the hosts method is required if the server uses a non-standard port (not 3306). Usually pt-heartbeat does the right thing and finds the slaves, but you may give a preferred method and it will be used first. If it doesn’t find any slaves, the other methods will be tried.
--replace
在--update模式下,默認是使用update操作進行記錄的更新,但有時候你不太確認heartbeat表中是否任何記錄時,此時可使用replace操作。
注意:如果是通過update進行記錄的更新,如果在腳本運行的過程中,truncate heartbeat表,腳本並不會異常退出,但是heartbeat表也有不會生成新的記錄。
但如果是通過replace方式進行記錄的更新,則即便是在上面這種場景下,heartbeat表仍舊會生成新的記錄。個人感覺通過replace操作進行記錄的更新更靠譜。
Use REPLACE instead of UPDATE for –update. When running in --update mode, use REPLACE instead of UPDATE to set the heartbeat table’s timestamp.The REPLACE statement is a MySQL extension to SQL. This option is useful when you don’t know whether the table contains any rows or not. It must be used in conjunction with –update.
--run-time
指定腳本運行的時間,無論是針對--update操作還是--monitor操作均實用。
type: time Time to run before exiting.
--sentinel
“哨兵”,如果指定的文件存在則提出,默認為/tmp/pt-heartbeat-sentinel
type: string; default: /tmp/pt-heartbeat-sentinel Exit if this file exists.
經測試,即便沒有帶上--sentinel參數,如果/tmp/pt-heartbeat-sentinel文件存在,則腳本一執行時就直接退出。
--sentinel作用在於自定義監控文件。
譬如在執行如下命令時, /root/123文件並不存在,則該腳本會繼續運行,在腳本運行的過程中,創建該文件,則腳本會馬上退出。
# pt-heartbeat -D test --update -h 192.168.244.10 -u monitor -p monitor123 --sentinel=/root/123
--slave-user
設置連接slave的用戶
type: string Sets the user to be used to connect to the slaves. This parameter allows you to have a different user with less privileges on the slaves but that user must exist on all slaves.
--slave-password
設置連接slave的用戶密碼
type: string Sets the password to be used to connect to the slaves. It can be used with –slave-user and the password for the user must be the same on all slaves.
--set-vars
設置腳本在與MySQL交互過程時的會話變量,但似乎並沒有什麼用
type: Array Set the MySQL variables in this comma-separated list of variable=value pairs. By default, the tool sets: wait_timeout=10000 Variables specified on the command line override these defaults. For example, specifying --set-vars wait_timeout=500 overrides the defaultvalue of 10000. The tool prints a warning and continues if a variable cannot be set.
--skew
指定check相對於update的延遲時間。默認為0.5秒
即--update更新一次後,--check會在0.5秒後檢查此次更新所對應的主從延遲情況。
可能有人會比較好奇,腳本是如何知道記錄是何時更新的,實際上,每次--update的時間都是秒的整點值,譬如,其中一次記錄的值為“2016-09-25T13:04:06.003130”。然後,0.5s後,腳本獲取slave上的系統時間,然後減去heartbeat中記錄值,來作為主從延遲的時間。這就要求,主從上的系統時間需要保持一致,不然得到的結果就沒有參考價值。
下面,可看看源碼實現,這個是整個腳本的核心邏輯。
my ($ts, $hostname, $server_id) = $sth->fetchrow_array(); my $now = time; PTDEBUG && _d("Heartbeat from server", $server_id, "\n", " now:", ts($now, $utc), "\n", " ts:", $ts, "\n", "skew:", $skew); my $delay = $now - unix_timestamp($ts, $utc) - $skew; PTDEBUG && _d('Delay', sprintf('%.6f', $delay), 'on', $hostname); # Because we adjust for skew, if the ts are less than skew seconds # apart (i.e. replication is very fast) then delay will be negative. # So it's effectively 0 seconds of lag. $delay = 0.00 if $delay < 0;
type: float; default: 0.5 How long to delay checks. The default is to delay checks one half second. Since the update happens as soon as possible after the beginning of the second on the master, this allows one half second of replication delay before reporting that the slave lags the master by one second. If your clocks are not completely accurate or there is some other reason you’d like to delay the slave more or less, you can tweak this value. Try setting the PTDEBUG environment variable to see the effect this has.
--socket
short form: -S; type: string Socket file to use for connection.
--table
指定心跳表的名字,默認為heartbeat
type: string; default: heartbeat The table to use for the heartbeat. Don’t specify database.table; use --database to specify the database. See --create-table.
--update
更新master中heartbeat表的記錄
Update a master’s heartbeat.
--user
指定連接的用戶
short form: -u; type: string User for login if not current user.
--utc
忽略系統時區,而使用UTC。如果要使用該選項,則--update,--monitor,--check中必須同時使用。
Ignore system time zones and use only UTC. By default pt-heartbeat does not check or adjust for different system or MySQL time zones which can cause the tool to compute the lag incorrectly. Specifying this option is a good idea because it ensures that the tool works correctly regardless of time zones. If used, this option must be used for all pt-heartbeat instances: --update, --monitor, --check, etc. You should probably set the option in a --config file. Mixing this option with pt-heartbeat instances not using this option will cause false-positive lag readings due to different time zones (unless all your systems are set to use UTC, in which case this option isn’t required).
--version
打印版本信息
--[no]version-check
檢查pt,連接的MySQL Server,Perl以及DBD::mysql的版本信息。
並且打印這些軟件特定版本的問題
Check for the latest version of Percona Toolkit, MySQL, and other programs. 84 Chapter 2. Tools Percona Toolkit Documentation, Release 2.2.19 This is a standard “check for updates automatically” feature, with two additional features. First, the tool checks the version of other programs on the local system in addition to its own version. For example, it checks the version of every MySQL server it connects to, Perl, and the Perl module DBD::mysql. Second, it checks for and warns about versions with known problems. For example, MySQL 5.5.25 had a critical bug and was re-released as 5.5.25a. Any updates or known problems are printed to STDOUT before the tool’s normal output. This feature should never interfere with the normal operation of the tool.