知道innodb_flush_log_at_trx_commit的意思,但是對它取值0,1,2一直有點模糊不清。特地找了MySQL 5.1的refrence,自己翻譯一下。雖然,也有官方的中文版翻譯,但是不好意思,有點不相信它。
英文原文如下:
innodb_flush_log_at_trx_commit
Command-Line Format Cinnodb_flush_log_at_trx_commit[=#]
Config-File Format innodb_flush_log_at_trx_commit
Option Sets Variable Yes, innodb_flush_log_at_trx_commit
Variable Name innodb_flush_log_at_trx_commit
Variable Scope Global
Dynamic Variable Yes
Permitted Values
Type numeric
Default 1
Valid Values 0, 1, 2
If the value of innodb_flush_log_at_trx_commit is 0, the log buffer is written out to the log file once per second and the flush to disk operation is performed on the log file, but nothing is done at a transaction commit. When the value is 1 (the default), the log buffer is written out to the log file at each transaction commit and the flush to disk operation is performed on the log file. When the value is 2, the log buffer is written out to the file at each commit, but the flush to disk Operation is not performed on it. However, the flushing on the log file takes place once per second also when the value is 2. Note that the once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues.
如果innodb_flush_log_at_trx_commit設置為0,log buffer將每秒一次地寫入log file中,並且log file的flush(刷到磁盤)操作同時進行;但是,這種模式下,在事務提交的時候,不會有任何動作。如果 innodb_flush_log_at_trx_commit設置為1(默認值),log buffer每次事務提交都會寫入log file,並且,flush刷到磁盤中去。如果innodb_flush_log_at_trx_commit設置為2,log buffer在每次事務提交的時候都會寫入log file,但是,flush(刷到磁盤)操作並不會同時進行。這種模式下,MySQL會每秒一次地去做flush(刷到磁盤)操作。注意:由於進程調度策略問題,這個“每秒一次的flush(刷到磁盤)操作”並不是保證100%的“每秒”。
The default value of 1 is the value required for ACID compliance. You can achIEve better performance by setting the value different from 1, but then you can lose at most one second worth of transactions in a crash. With a value of 0, any MySQLd process crash can erase the last second of transactions. With a value of 2, then only an Operating system crash or a power outage can erase the last second of transactions. However, InnoDB’s crash recovery is not affected and thus crash recovery does work regardless of the value.
默認值1是為了ACID (atomicity, consistency, isolation, durability)原子性,一致性,隔離性和持久化的考慮。如果你不把innodb_flush_log_at_trx_commit設置為1,你將獲得更好的性能,但是,你在系統崩潰的情況,可能會丟失最多一秒鐘的事務數據。當你把innodb_flush_log_at_trx_commit設置為0,mysqld進程的崩潰會導致上一秒鐘所有事務數據的丟失。如果你把innodb_flush_log_at_trx_commit設置為2,只有在操作系統崩潰或者系統掉電的情況下,上一秒鐘所有事務數據才可能丟失。(下面的這句話到底是針對 innodb_flush_log_at_trx_commit為2說的,還是針對前面這一整段說的,我就搞不清楚了,下次問問編寫這一段文檔的 MySQL的人去。感覺是針對整段的:就是說InnoDB的crash recovery會利用log file來恢復數據文件,跟innodb_flush_log_at_trx_commit的值沒有關系,管你這個值怎麼設置的,我從log file拿到多少數據,就恢復多少數據。)InnoDB的crash recovery崩潰恢復機制並不受這個值的影響,不管這個值設置為多少,crash recovery崩潰恢復機制都會工作。
For the greatest possible durability and consistency in a replication setup using InnoDB with transactions, use innodb_flush_log_at_trx_commit = 1 and sync_binlog = 1 in your master server my.cnf file.
為了在使用InnoDB事務的搭建復制環境中,達到最大的持久化和一致性,你需要在你的master主機的my.cnf中設置innodb_flush_log_at_trx_commit = 1並且設置sync_binlog = 1。
Caution
Many operating systems and some disk hardware fool the flush-to-disk Operation. They may tell MySQLd that the flush has taken place, even though it has not. Then the durability of transactions is not guaranteed even with the setting 1, and in the worst case a power outage can even corrupt the InnoDB database. Using a battery-backed disk cache in the SCSI disk controller or in the disk itself speeds up file flushes, and makes the Operation safer. You can also try using the Unix command hdparm to disable the caching of disk writes in hardware caches, or use some other command specific to the hardware vendor.
注意:
很多操作系統和一些磁盤硬件系統並不會真正的做flush-to-disk刷新到磁盤的這個操作。他們即使並沒有真正刷到磁盤也會告訴MySQLd說 flush刷新到磁盤的操作已經完成了。這樣的話,即使innodb_flush_log_at_trx_commit設置為1,也不能保證事務的持久化,最糟的情況下,一個主機掉電,就有可能導致InnoDB數據庫崩潰。你可以考慮在SCSI磁盤控制器裡面或者磁盤本身中,使用帶蓄電池後備電源的磁盤緩存disk cache,來提高文件刷新操作的速度,使得這個操作更加安全。你同樣可以嘗試使用Unix的hdparm命令來阻止硬件緩存hardware cache的寫磁盤緩存操作,或者使用其他硬件提供商hardware vendor提供的命令來避免寫磁盤緩存。
這裡既然提到了sync_binlog就順便把它也翻譯一下。
sync_binlog
Command-Line Format Csync-binlog=#
Config-File Format sync_binlog
Option Sets Variable Yes, sync_binlog
Variable Name sync_binlog
Variable Scope Global
Dynamic Variable Yes
Permitted Values
Platform Bit Size 32
Type numeric
Default 0
Range 0-4294967295
Permitted Values
Platform Bit Size 64
Type numeric
Default 0
Range 0-18446744073709547520
If the value of this variable is greater than 0, the MySQL Server synchronizes its binary log to disk (using fdatasync()) after every sync_binlog writes to the binary log. There is one write to the binary log per statement if autocommit is enabled, and one write per transaction otherwise. The default value of sync_binlog is 0, which does no synchronizing to disk ― in this case, the server relIEs on the Operating system to flush the binary log’s contents from to time as for any other file. A value of 1 is the safest choice because in the event of a crash you lose at most one statement or transaction from the binary log. However, it is also the slowest choice (unless the disk has a battery-backed cache, which makes synchronization very fast).
當sync_binlog變量設置為大於0的值時,MySQL在每次“sync_binlog”這麼多次寫二進制日志binary log時,會使用fdatasync()函數將它的寫二進制日志binary log同步到磁盤中去。如果啟用了autocommit,那麼每一個語句statement就會有一次寫操作;否則每個事務對應一個寫操作。 sync_binlog的默認值是0,這種模式下,MySQL不會同步到磁盤中去。這樣的話,MySQL依賴操作系統來刷新二進制日志binary log,就像操作系統刷其他文件的機制一樣。當sync_binlog變量設置為1是最安全的,因為在crash崩潰的情況下,你的二進制日志 binary log只有可能丟失最多一個語句或者一個事務。但是,這也是最慢的一種方式(除非磁盤有使用帶蓄電池後備電源的緩存cache,使得同步到磁盤的操作非常快)。
找了一下man fdatasync:
fdatasync() flushes all data buffers of a file to disk (before the system call returns). It resembles fsync() but is not required to update the metadata such as Access time.
fdatasync() (在系統調用system call返回前)將文件中所有的數據緩存區data buffers都flush刷到磁盤中去。它類似於fsync()函數,但是它不會更新元數據metadata:比如最後訪問時間等。