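Today, during a routine inspection for a client, I found that on a replica carrying no application traffic, mysqld was pinning one CPU core at 100%. The slow query log on the master showed no write SQL either.
This is the classic case of single-threaded replication saturating one CPU core. Once the cause is known, the analysis is straightforward.
The next step is to look at what is being written to the binlog and see what can be optimized or sped up. Tool: parsebinlog.
Use show slave status\G; to find the position the replica has currently replicated to, then parse that log file.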
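As a rough sketch of that lookup, the helper below pulls the master binlog file and position that the replica's SQL thread has reached out of `show slave status\G` output. The function name `current_applied_binlog` is made up for illustration; it relies only on the standard `Relay_Master_Log_File` and `Exec_Master_Log_Pos` fields.

```shell
# Read `SHOW SLAVE STATUS\G` output on stdin and print the master binlog
# file and position that the SQL thread has executed up to.
# (Hypothetical helper name; not part of any MySQL tool.)
current_applied_binlog() {
    awk -F': ' '
        /^ *Relay_Master_Log_File:/ { file = $2 }
        /^ *Exec_Master_Log_Pos:/   { pos = $2 }
        END { if (file != "") print file, pos }
    '
}

# Usage (assumes a local mysql client with credentials already configured):
#   mysql -e 'SHOW SLAVE STATUS\G' | current_applied_binlog
```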
    git clone https://github.com/wubx/mysql-binlog-statistic.git
    cd mysql-binlog-statistic/bin/
    parsebinlog /u1/mysql/logs/mysql-bin.000806
    ...
    ====================================
    Table xx_db.xxtable:
    Type DELETE opt: 101246
    Type INSERT opt: 103265
    ================================
    ...
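Sorting by operation count points to xx_db.xxtable: a single log file contains over 100,000 deletes and over 100,000 inserts for this table. Could writes to this table be the slow part?
Check the InnoDB state on the replica: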
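The sorting step can be sketched with a small filter like the one below. It assumes the report has exactly the layout shown in the excerpt (a `Table db.table:` line followed by `Type <OP> opt: <count>` lines); the real parsebinlog output may differ, and `rank_tables_by_ops` is a hypothetical name.

```shell
# Sum the per-type operation counts for each table in a parsebinlog-style
# report read from stdin, then list tables from busiest to quietest.
# Assumed input layout (taken from the excerpt in this post):
#   Table xx_db.xxtable:
#   Type DELETE opt: 101246
rank_tables_by_ops() {
    awk '
        $1 == "Table" { tbl = $2; sub(/:$/, "", tbl) }
        $1 == "Type" && $3 == "opt:" && tbl != "" { total[tbl] += $4 }
        END { for (t in total) print total[t], t }
    ' | sort -rn
}

# Usage (path as in the post):
#   parsebinlog /u1/mysql/logs/mysql-bin.000806 | rank_tables_by_ops | head
```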
    MySQL> show engine innodb status\G;
    ...
    ---TRANSACTION 1C0C2DFDF, ACTIVE 3 sec fetching rows
    mysql tables in use 1, locked 1
    3361 lock struct(s), heap size 407992, 477888 row lock(s), undo log entries 42
    MySQL thread id 43, OS thread handle 0x7fc1800c4700, query id 1908504 Reading event from the relay log
    TABLE LOCK table xx_db.xxtable trx id 1C0C2DFDF lock mode IX
    RECORD LOCKS space id 1002 page no 1975 n bits 1120 index `AK_movieid` of table xx_db.xxtable trx id 1C0C2DFDF lock_mode X locks rec but not gap
    RECORD LOCKS space id 1002 page no 6965 n bits 264 index `GEN_CLUST_INDEX` of table xx_db.xxtable trx id 1C0C2DFDF lock_mode X locks rec but not gap
    RECORD LOCKS space id 1002 page no 6967 n bits 256 index `GEN_CLUST_INDEX` of table xx_db.xxtable trx id 1C0C2DFDF lock_mode X locks rec but not gap
    RECORD LOCKS space id 1002 page no 6973 n bits 264 index `GEN_CLUST_INDEX` of table xx_db.xxtable trx id 1C0C2DFDF lock_mode X locks rec but not gap
    RECORD LOCKS space id 1002 page no 6982 n bits 256 index `GEN_CLUST_INDEX` of table xx_db.xxtable trx id 1C0C2DFDF lock_mode X locks rec but not gap
    RECORD LOCKS space id 1002 page no 6983 n bits 256 index `GEN_CLUST_INDEX` of table xx_db.xxtable trx id 1C0C2DFDF lock_mode X locks rec but not gap
    RECORD LOCKS space id 1002 page no 6987 n bits 256 index `GEN_CLUST_INDEX` of table xx_db.xxtable trx id 1C0C2DFDF lock_mode X locks rec but not gap
    RECORD LOCKS space id 1002 page no 6999 n bits 256 index `GEN_CLUST_INDEX` of table xx_db.xxtable trx id 1C0C2DFDF lock_mode X locks rec but not gap
    RECORD LOCKS space id 1002 page no 7000 n bits 256 index `GEN_CLUST_INDEX` of table xx_db.xxtable trx id 1C0C2DFDF lock_mode X locks rec but not gap
    TOO MANY LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS
    ----------------------------
    END OF INNODB MONITOR OUTPUT
    ...
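The InnoDB monitor output also shows that the replication thread's transaction holds a table-level lock (IX) on xx_db.xxtable together with a huge number of record locks, which keeps concurrency low. Most of those record locks are on GEN_CLUST_INDEX, and they all belong to a single transaction. GEN_CLUST_INDEX means the table has no primary key, so InnoDB generates a hidden one internally, and an internally generated key can easily lead to page splits. The 477888 row locks against only 42 undo log entries, in the state "fetching rows", also suggest that the SQL thread may be scanning far more rows than it actually changes.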
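To spot this pattern without reading the whole dump, a filter along the following lines counts the printed record locks on GEN_CLUST_INDEX per table. It is only a sketch: it assumes the `RECORD LOCKS ... index ... of table <name> ...` line layout shown above, the name `count_implicit_pk_locks` is invented here, and InnoDB truncates the lock list ("TOO MANY LOCKS PRINTED"), so the counts are a lower bound.

```shell
# Count RECORD LOCKS lines that reference the hidden clustered index
# (GEN_CLUST_INDEX), grouped by the table name that follows the word "table".
# Reads `SHOW ENGINE INNODB STATUS` output on stdin.
count_implicit_pk_locks() {
    awk '
        /^RECORD LOCKS/ && /GEN_CLUST_INDEX/ {
            for (i = 1; i < NF; i++)
                if ($i == "table") { locks[$(i + 1)]++; break }
        }
        END { for (t in locks) print locks[t], t }
    ' | sort -rn
}

# Usage (assumes a local mysql client with credentials already configured):
#   mysql -e 'SHOW ENGINE INNODB STATUS\G' | count_implicit_pk_locks
```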
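At this point the fix is essentially clear:
Add a primary key to xx_db.xxtable. In the end I added an auto-increment id int primary key with no business meaning. The CPU usage of mysqld on that core immediately dropped to about 3%, replication ran normally afterwards, and a day of observation showed no replication lag.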
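The post does not show the exact statements used, so the following is a sketch under assumptions rather than the author's commands. The first helper prints an `information_schema` query that lists InnoDB tables without an explicit primary key (tables with a NOT NULL unique index are listed too, although InnoDB clusters on that index rather than on GEN_CLUST_INDEX). The second prints an ALTER TABLE statement of the kind described above. Both helper names are hypothetical, and the statements are only printed, so they can be reviewed before being run.

```shell
# Print a query listing InnoDB base tables that have no PRIMARY KEY constraint.
no_pk_tables_sql() {
    cat <<'SQL'
SELECT t.TABLE_SCHEMA, t.TABLE_NAME
FROM information_schema.TABLES AS t
LEFT JOIN information_schema.TABLE_CONSTRAINTS AS c
       ON c.TABLE_SCHEMA = t.TABLE_SCHEMA
      AND c.TABLE_NAME = t.TABLE_NAME
      AND c.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND t.ENGINE = 'InnoDB'
  AND c.CONSTRAINT_NAME IS NULL;
SQL
}

# Print (do not execute) DDL that adds a surrogate auto-increment primary key.
# $1 = schema, $2 = table. The column name `id` and type INT follow the post;
# change them if the table already has an `id` column or may outgrow INT.
surrogate_pk_ddl() {
    printf 'ALTER TABLE `%s`.`%s` ADD COLUMN `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY;\n' "$1" "$2"
}

# Usage (assumes a local mysql client with credentials already configured):
#   no_pk_tables_sql | mysql
#   surrogate_pk_ddl xx_db xxtable      # review the output, then run it
```

On a large table this ALTER rebuilds the table, so it is worth scheduling it and checking how it replicates before running it in production.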