
Recovering a MySQL-MMM Master from the REPLICATION_FAIL State


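I have probably read too much Qiushibaike, so I always have to start with the background.

A few days ago the website suddenly became inaccessible: pages showed only the frame, with no content. The system's runtime log contained these errors: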

Communications link failure

The last packet successfully received from the server was 7,875,055 milliseconds ago. The last packet sent successfully to the server was 7,875,055 milliseconds ago.

Finally I saw this line:

Caused by: java.sql.SQLException: The table 'message' is full

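This was hard to believe. With our current number of users, the database should not have been able to fill up. So I logged in to the database server Master1, ran df -h, and found that /var/ was full. Only then did I remember: when the database was created, all the data files were placed in a separate directory, and /var/lib/mysql/ held only symlinks to them. This table must have been created afterwards and never moved to that directory, so it had been growing directly under /var/. The fix was: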

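1. Stop the MySQL service with service mysql stop.

2. Move the table files to the intended directory and create symlinks to them.

3. Start the MySQL service with service mysql start.

4. On the MySQL-MMM monitor, run mmm_control set_offline db01 and then mmm_control set_online db01 to bring Master1 (db01) back online.

Afterwards, mmm_control show reported the host as ONLINE.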
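Step 2 can be sketched in shell roughly as follows. This is a minimal sketch, not the exact commands used at the time: it assumes a whole database directory is moved and symlinked back, the function name relocate_db_dir and every path are hypothetical, and it should only be run while mysqld is stopped.

```shell
#!/bin/sh
# Sketch: move one database directory to a larger volume and leave a
# symlink in its place. Run only while mysqld is stopped.
# The function name and all paths are hypothetical.

# relocate_db_dir SRC_DIR DEST_PARENT
#   Moves SRC_DIR into DEST_PARENT, then makes SRC_DIR a symlink to it.
relocate_db_dir() {
    src=$1
    dest_parent=$2
    name=$(basename "$src")

    # Refuse to act if the source is already a symlink or is missing.
    if [ -L "$src" ] || [ ! -d "$src" ]; then
        echo "skip: $src is a symlink or not a directory" >&2
        return 1
    fi
    # Refuse to overwrite an existing target.
    if [ -e "$dest_parent/$name" ]; then
        echo "skip: $dest_parent/$name already exists" >&2
        return 1
    fi

    # mv keeps file ownership when run as root, so mysqld can still
    # read the files through the symlink.
    mv "$src" "$dest_parent/$name" || return 1
    ln -s "$dest_parent/$name" "$src"
}
```

For example, relocate_db_dir /var/lib/mysql/db1 /mysql/data_vol, where /mysql/data_vol stands in for whichever larger volume already holds the other data files. Calling it a second time on the same directory is refused, because the source is by then a symlink.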

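So that was the end of it? No, no! By Qiushibaike convention (am I advertising for Qiushibaike? absolutely not), this is not the climax yet.

Today, while listening to a talk, I suddenly felt like checking on MySQL-MMM. I ran mmm_control show and saw exactly what I did not want to see: db01 was in the REPLICATION_FAIL state. Running set_offline and set_online, and restarting the MySQL service, all made no difference.

In the error log on db01 I found the following: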

111104 13:19:19 [ERROR] /usr/sbin/mysqld: Table 'table1' is marked as crashed and should be repaired

111104 13:19:19 [ERROR] Slave SQL: Error 'Table 'table1' is marked as crashed and should be repaired' on query. Default database: 'db1'. Query: '...'

111104 13:19:19 [Warning] Slave: Table './db1/table1' is marked as crashed and should be repaired Error_code: 145

111104 13:19:19 [Warning] Slave: Table 'table1' is marked as crashed and should be repaired Error_code: 1194

111104 13:19:19 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin-master2.000022' position 110544518
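When more than one table may be affected, the log can be scanned for every table reported as crashed. The helper below is only a sketch: the name crashed_tables is made up for illustration, and it assumes the log uses the same "Table './db/table' is marked as crashed" wording as the lines above.

```shell
#!/bin/sh
# Sketch: list each db/table path that a MySQL error log reports as
# crashed. Reads the log on stdin; crashed_tables is a hypothetical name.
crashed_tables() {
    # Match only the "./db/table" form so that both the database and
    # the table name are captured, then drop duplicates.
    sed -n "s|.*Table '\./\([^']*\)' is marked as crashed.*|\1|p" | sort -u
}
```

It would be called as crashed_tables < error.log, with error.log replaced by the actual path of the server's error log.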

I logged in to MySQL and ran:

mysql> repair table table1;

mysql> start slave;

Checking the error log again showed:

111104 13:19:19 [Note] Slave I/O thread: connected to master 'replication@db02:3306',replication started in log 'mysql-bin-master2.000022' at position 679172934

111104 13:24:18 [Note] Found 11845 of 11846 rows when repairing './db1/table1'

111104 13:27:03 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin-master2.000022' at position 110544518, relay log '/mysql/log_vol/replication/mysql-bin-master1.004525' position: 844646
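Besides reading the error log, the replication threads can be checked directly from the output of SHOW SLAVE STATUS. The following is a sketch under assumptions: slave_threads_ok is a hypothetical helper name, and it relies on the usual vertical (\G) output in which each field appears as "Name: value".

```shell
#!/bin/sh
# Sketch: succeed only if both replication threads are running.
# Reads `SHOW SLAVE STATUS\G` output on stdin.
# slave_threads_ok is a hypothetical helper name.
slave_threads_ok() {
    awk -F': ' '
        # Field names are right-aligned, so allow leading spaces.
        $1 ~ /^ *Slave_IO_Running$/  { io  = $2 }
        $1 ~ /^ *Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

A possible use, assuming the mysql client can log in without prompting: mysql -e 'SHOW SLAVE STATUS\G' | slave_threads_ok && echo "replication running". Any value other than Yes for either thread, or empty input, makes the check fail.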

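On the MySQL-MMM monitoring server, the status of db01 went from REPLICATION_FAIL to REPLICATION_DELAY to ONLINE. I waited a while and it stayed ONLINE, so it appeared to be stable. The writer role, however, was still on db02. So I first ran set_offline on db02 and then set_online on db02, after which the writer had switched to db01.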
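To confirm which host holds the writer role without reading the whole status listing, the output of mmm_control show can be filtered. This is a sketch only: current_writer is a hypothetical name, and it assumes each host line looks like "db02(ip) master/ONLINE. Roles: writer(ip)", which may differ between MMM versions.

```shell
#!/bin/sh
# Sketch: print the host that currently holds the writer role.
# Reads `mmm_control show` output on stdin. The line layout is an
# assumption; current_writer is a hypothetical helper name.
current_writer() {
    awk '/Roles:.*writer\(/ {
        sub(/\(.*/, "", $1)   # strip "(ip)" from the host field
        print $1
    }'
}
```

It would be used as mmm_control show | current_writer. MMM 2.x also documents mmm_control move_role writer db01 for moving the role explicitly, which may be a gentler alternative to taking db02 offline and online again, provided the installed version supports it.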

Was there a climax? Heh, the problem is solved, and that is what matters :-)

Excerpted from mydeman's study log