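I have probably been reading too much Qiubai (糗百), because I always have to set the scene first.
A few days ago the website suddenly stopped working: the pages showed nothing but the layout frame. The errors in the application's runtime log were: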
Communications link failure
The last packet successfully received from the server was 7,875,055 milliseconds ago. The last packet sent successfully to the server was 7,875,055 milliseconds ago.
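And at the very end there was this line:
Caused by: java.sql.SQLException: The table 'message' is full
This was hard to believe. With the number of users we had at that point, the database could not possibly have been written full. So I logged in to the database server Master1 and ran df -h, which showed that /var/ was full. Only then did I remember: when the database was first set up, all the data files had been placed in another directory, with softlinks under /var/lib/mysql/ pointing to them. Given what I was seeing, the tables created later had clearly never been moved to that directory. The next steps were: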
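1. Stop the MySQL service with service mysql stop.
2. Move the table files to the designated directory and create the softlinks.
3. Start the MySQL service with service mysql start.
4. On the MySQL-mmm monitor, run mmm_control set_offline db01 and then mmm_control set_online db01 to bring master01 back online.
Afterwards, mmm_control show reported the status as ONLINE.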
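Step 2 above can be sketched as a small shell helper. This is only an illustration under stated assumptions: relocate_with_symlink is a hypothetical function name, the paths in the example are placeholders, and MySQL is assumed to be stopped already (step 1) before anything is moved.

```shell
#!/bin/sh
# Sketch only: relocate_with_symlink is a hypothetical helper, not part of MySQL.
# Run it only while the MySQL service is stopped.

# Move a file or directory into dest_dir and leave a symlink at the old path.
# dest_dir should be an absolute path so the link resolves from anywhere.
relocate_with_symlink() {
    src=$1
    dest_dir=$2
    if [ -L "$src" ]; then
        # Nothing to do: this entry was already relocated earlier.
        echo "already a symlink: $src" >&2
        return 0
    fi
    if [ ! -e "$src" ]; then
        echo "missing: $src" >&2
        return 1
    fi
    if [ ! -d "$dest_dir" ]; then
        echo "not a directory: $dest_dir" >&2
        return 1
    fi
    target=$dest_dir/$(basename "$src")
    if [ -e "$target" ]; then
        echo "refusing to overwrite: $target" >&2
        return 1
    fi
    mv "$src" "$target" || return 1
    ln -s "$target" "$src"
}

# Example (both paths are placeholders):
#   relocate_with_symlink /var/lib/mysql/somedb /data/mysql
```

After the move, the relocated files still need to be readable and writable by the user the MySQL server runs as, otherwise the service will not come back up cleanly in step 3.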
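So that was the end of it? NO! NO! By Qiubai convention (am I advertising for Qiubai? Absolutely not), this is not the climax yet.
Today, while sitting in on a talk, I suddenly felt like checking how MySQL-mmm was doing. I ran mmm_control show and saw exactly what I did not want to see: db01 was in state REPLICATION_FAIL. set_offline, set_online, and restarting the MySQL service all made no difference.
I checked the error log on db01 and found the following: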
111104 13:19:19 [ERROR] /usr/sbin/mysqld: Table 'table1' is marked as crashed and should be repaired
111104 13:19:19 [ERROR] Slave SQL: Error 'Table 'table1' is marked as crashed and should be repaired' on query. Default database: 'db1'. Query: '...'
111104 13:19:19 [Warning] Slave: Table './db1/table1' is marked as crashed and should be repaired Error_code: 145
111104 13:19:19 [Warning] Slave: Table 'table1' is marked as crashed and should be repaired Error_code: 1194
111104 13:19:19 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin-master2.000022' position 110544518
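When more than one table is affected, it can help to pull the list of crashed tables out of the error log instead of reading it by eye. The following is a minimal sketch: crashed_tables is a hypothetical helper, and it assumes the warning lines keep the `Table './db/table' is marked as crashed` shape shown above.

```shell
#!/bin/sh
# Sketch only: crashed_tables is a hypothetical helper name.
# Prints each distinct db/table mentioned in a "marked as crashed" warning
# that uses the './db/table' form, one per line.
crashed_tables() {
    sed -n "s/.*Table '\.\/\([^']*\)' is marked as crashed.*/\1/p" "$1" | sort -u
}

# Example (the log path is a placeholder):
#   crashed_tables /path/to/mysql-error.log
```

Lines that name the table without the `./db/` prefix are deliberately skipped, so each table shows up once together with its database.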
I logged in to the database and ran:
mysql> repair table table1;
mysql> start slave;
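Besides reading the error log, the replication threads can be checked directly with SHOW SLAVE STATUS, whose output includes the Slave_IO_Running and Slave_SQL_Running fields. Here is a small sketch of that check; slave_threads_running is a hypothetical helper that reads the \G-style output on standard input.

```shell
#!/bin/sh
# Sketch only: slave_threads_running is a hypothetical helper name.
# Reads the output of SHOW SLAVE STATUS\G on stdin and succeeds (exit 0)
# only if both the I/O thread and the SQL thread report "Yes".
slave_threads_running() {
    awk -F': *' '
        { gsub(/^ +/, "", $1) }                 # drop the left padding of \G output
        $1 == "Slave_IO_Running"  { io  = $2 }
        $1 == "Slave_SQL_Running" { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Example (connection options omitted):
#   mysql -e 'SHOW SLAVE STATUS\G' | slave_threads_running && echo "replication running"
```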
Checking the error log again showed:
111104 13:19:19 [Note] Slave I/O thread: connected to master 'replication@db02:3306',replication started in log 'mysql-bin-master2.000022' at position 679172934
111104 13:24:18 [Note] Found 11845 of 11846 rows when repairing './db1/table1'
111104 13:27:03 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin-master2.000022' at position 110544518, relay log '/mysql/log_vol/replication/mysql-bin-master1.004525' position: 844646
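On the MySQL-mmm monitoring server, I watched db01 go from REPLICATION_FAIL to REPLICATION_DELAY to ONLINE. I waited a while and it stayed ONLINE, so it looked stable. The writer role was still on db02, though. So I ran set_offline on db02 and then set_online on db02, after which the writer switched over to db01.
Was there a climax? Heh, the problem got solved, and that is what counts :-)
Excerpted from mydeman's study log (mydeman的學習日志)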
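To confirm where the writer role ended up without reading the status by eye, the output of mmm_control show can be filtered. This is a rough sketch only: writer_host is a hypothetical helper, and the line format it expects (host(ip) mode/STATE. Roles: writer(ip), ...) is an assumption based on typical mmm_control show output, so check it against your own monitor before relying on it.

```shell
#!/bin/sh
# Sketch only: writer_host is a hypothetical helper name.
# ASSUMPTION: each status line looks like
#   db01(10.0.0.1) master/ONLINE. Roles: writer(10.0.0.100)
# Reads such lines on stdin and prints the host that holds the writer role.
writer_host() {
    awk '/Roles:.*writer\(/ { sub(/\(.*/, "", $1); print $1 }'
}

# Example:
#   mmm_control show | writer_host
```

mmm_control also has a move_role command (for example mmm_control move_role writer db01), which may be a more direct way to move the writer than taking db02 offline and online again.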