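Comparing the performance of atomic, spinlock and mutex
I was curious how different synchronization primitives perform, so I ran the following experiments to compare atomic operations, a spinlock and a mutex: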
1. No synchronization
#include <functional>  // std::bind
#include <future>
#include <iostream>

volatile int value = 0;

int loop (bool inc, int limit) {
    std::cout << "Started " << inc << " " << limit << std::endl;
    for (int i = 0; i < limit; ++i) {
        if (inc) {
            ++value;
        } else {
            --value;
        }
    }
    return 0;
}

int main () {
    // Run loop() on a separate thread (std::async is a C++11 feature).
    auto f = std::async (std::launch::async, std::bind (loop, true, 20000000));
    loop (false, 10000000);
    f.wait ();
    std::cout << value << std::endl;
}
Compile with clang and run:
clang++ -std=c++11 -stdlib=libc++ -O3 -o test test.cpp && time ./test
Output:
SSttaarrtteedd 10 2100000000000000
11177087
real 0m0.070s
user 0m0.089s
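sys 0m0.002s
The result makes it obvious that the increment and decrement are not atomic: the final content of value is indeterminate (garbage). Strictly speaking, unsynchronized concurrent writes to a non-atomic variable are a data race, which is undefined behavior in C++11, and volatile does not change that. The scrambled first line of the output is simply both threads writing their "Started ..." message to std::cout at the same time.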
2. Assembly LOCK prefix
#include <functional>  // std::bind
#include <future>
#include <iostream>

volatile int value = 0;

int loop (bool inc, int limit) {
    std::cout << "Started " << inc << " " << limit << std::endl;
    for (int i = 0; i < limit; ++i) {
        if (inc) {
            // Emits a bare x86 LOCK prefix and relies on the compiler
            // placing the increment instruction right after it.
            asm("LOCK");
            ++value;
        } else {
            asm("LOCK");
            --value;
        }
    }
    return 0;
}

int main () {
    // Run loop() on a separate thread (std::async is a C++11 feature).
    auto f = std::async (std::launch::async, std::bind (loop, true, 20000000));
    loop (false, 10000000);
    f.wait ();
    std::cout << value << std::endl;
}
Run:
SSttaarrtteedd 10 2000000100000000
10000000
real 0m0.481s
user 0m0.779s
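sys 0m0.005s
This time value ends up with the correct result, but the code is not portable: it only runs on x86 hardware, and it only works correctly when compiled with -O3, presumably because only the optimized build emits the increment as a single instruction operating on memory. LOCK is only valid in front of such a read-modify-write instruction, and since the compiler is free to place other instructions between the LOCK prefix and the increment or decrement, the program can easily crash with an "illegal instruction" exception.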
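One way around the fragility described above is to put the lock prefix and the instruction it applies to into the same inline-assembly statement, so the compiler cannot schedule anything between them. The following is only a sketch, assuming GCC/Clang extended asm syntax; the helper names locked_inc, locked_dec and run_locked_counter are mine and this variant was not part of the measurements. On non-x86 targets it falls back to the GCC/Clang __atomic builtins.

```cpp
#include <thread>

// Hypothetical helpers, not part of the original test.
// On x86 the lock prefix and the instruction are emitted by ONE asm
// statement, so nothing can end up between them.
inline void locked_inc (volatile int *p) {
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("lock; incl %0" : "+m" (*p) : : "memory", "cc");
#else
    __atomic_fetch_add (p, 1, __ATOMIC_SEQ_CST);  // GCC/Clang builtin fallback
#endif
}

inline void locked_dec (volatile int *p) {
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__ ("lock; decl %0" : "+m" (*p) : : "memory", "cc");
#else
    __atomic_fetch_sub (p, 1, __ATOMIC_SEQ_CST);  // GCC/Clang builtin fallback
#endif
}

// A second thread increments `incs` times while the calling thread
// decrements `decs` times; returns the final counter value.
int run_locked_counter (int incs, int decs) {
    volatile int value = 0;
    std::thread t ([&value, incs] {
        for (int i = 0; i < incs; ++i) {
            locked_inc (&value);
        }
    });
    for (int i = 0; i < decs; ++i) {
        locked_dec (&value);
    }
    t.join ();
    return value;  // safe to read plainly: the other thread has finished
}
```

Calling run_locked_counter (20000000, 10000000) mirrors the workload of the test above. It is still compiler-specific code, so it is an illustration of the idea rather than a recommendation.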
3. Atomic operations
#include <functional>  // std::bind
#include <future>
#include <iostream>
#include "boost/interprocess/detail/atomic.hpp"

using namespace boost::interprocess::ipcdetail;

volatile boost::uint32_t value = 0;

int loop (bool inc, int limit) {
    std::cout << "Started " << inc << " " << limit << std::endl;
    for (int i = 0; i < limit; ++i) {
        if (inc) {
            atomic_inc32 (&value);
        } else {
            atomic_dec32 (&value);
        }
    }
    return 0;
}

int main () {
    auto f = std::async (std::launch::async, std::bind (loop, true, 20000000));
    loop (false, 10000000);
    f.wait ();
    std::cout << atomic_read32 (&value) << std::endl;
}
Run:
SSttaarrtteedd 10 2100000000000000
10000000
real 0m0.457s
user 0m0.734s
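sys 0m0.004s
The result is correct, and the time is about the same as with the assembly LOCK. That is no surprise: on x86 the atomic operations are themselves implemented with LOCK-prefixed instructions, only wrapped in a portable interface.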
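For comparison, the same counter can be written against the standard library instead of Boost. This is a minimal sketch, assuming a toolchain that ships C++11 <atomic>; run_atomic_counter is a name introduced here, and this variant was not benchmarked for this post.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: a second thread increments `incs` times while the
// calling thread decrements `decs` times; returns the final counter value.
int run_atomic_counter (int incs, int decs) {
    std::atomic<int> value (0);
    std::thread t ([&value, incs] {
        for (int i = 0; i < incs; ++i) {
            value.fetch_add (1);  // atomic read-modify-write
        }
    });
    for (int i = 0; i < decs; ++i) {
        value.fetch_sub (1);
    }
    t.join ();
    return value.load ();
}
```

run_atomic_counter (20000000, 10000000) corresponds to the workload used in the tests here. No volatile is needed because std::atomic already guarantees that every access is performed atomically.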
4. Spinlock
#include <functional>  // std::bind
#include <future>
#include <iostream>
#include <mutex>       // std::lock_guard
#include "boost/smart_ptr/detail/spinlock.hpp"

boost::detail::spinlock lock;
volatile int value = 0;

int loop (bool inc, int limit) {
    std::cout << "Started " << inc << " " << limit << std::endl;
    for (int i = 0; i < limit; ++i) {
        // Hold the spinlock for the duration of one update.
        std::lock_guard<boost::detail::spinlock> guard (lock);
        if (inc) {
            ++value;
        } else {
            --value;
        }
    }
    return 0;
}

int main () {
    auto f = std::async (std::launch::async, std::bind (loop, true, 20000000));
    loop (false, 10000000);
    f.wait ();
    std::cout << value << std::endl;
}
Run:
SSttaarrtteedd 10 2100000000000000
10000000
real 0m0.541s
user 0m0.675s
sys 0m0.089s
The result is correct. It is a little slower than the previous two variants, but not by much.
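To show what a spinlock boils down to, here is a minimal one built on std::atomic_flag. It is a sketch under the assumption that C++11 <atomic> is available, and it is not how boost::detail::spinlock is implemented (that one may, for example, back off or yield while waiting). The names SpinLock and run_spinlock_counter are made up for this example.

```cpp
#include <atomic>
#include <mutex>   // std::lock_guard
#include <thread>

// Hypothetical minimal spinlock. It provides lock() and unlock(), so it
// can be used with std::lock_guard.
class SpinLock {
public:
    void lock () {
        // test_and_set returns the previous state of the flag: keep
        // spinning for as long as another thread already holds it.
        while (flag_.test_and_set (std::memory_order_acquire)) {
        }
    }
    void unlock () {
        flag_.clear (std::memory_order_release);
    }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// A second thread increments `incs` times while the calling thread
// decrements `decs` times; returns the final counter value.
int run_spinlock_counter (int incs, int decs) {
    SpinLock lock;
    int value = 0;  // plain int: every concurrent access happens under the lock
    std::thread t ([&lock, &value, incs] {
        for (int i = 0; i < incs; ++i) {
            std::lock_guard<SpinLock> guard (lock);
            ++value;
        }
    });
    for (int i = 0; i < decs; ++i) {
        std::lock_guard<SpinLock> guard (lock);
        --value;
    }
    t.join ();
    return value;
}
```

Each lock acquisition here is itself one atomic read-modify-write, which fits the observation that the spinlock costs about as much as the atomic increment when the critical section is this small.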
5. Mutex
#include <functional>  // std::bind
#include <future>
#include <iostream>
#include <mutex>       // std::mutex, std::lock_guard

std::mutex mutex;
volatile int value = 0;

int loop (bool inc, int limit) {
    std::cout << "Started " << inc << " " << limit << std::endl;
    for (int i = 0; i < limit; ++i) {
        // Hold the mutex for the duration of one update.
        std::lock_guard<std::mutex> guard (mutex);
        if (inc) {
            ++value;
        } else {
            --value;
        }
    }
    return 0;
}

int main () {
    auto f = std::async (std::launch::async, std::bind (loop, true, 20000000));
    loop (false, 10000000);
    f.wait ();
    std::cout << value << std::endl;
}
Run:
SSttaarrtteedd 10 2010000000000000
10000000
real 0m25.229s
user 0m7.011s
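sys 0m22.667s
The mutex is much slower than any of the previous variants. Most of the elapsed time is system time (sys), that is, time spent in the kernel rather than in the loop itself.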
Benchmark
Method               Time (sec., real)
No synchronization    0.070
LOCK                  0.481
Atomic                0.457
Spinlock              0.541
Mutex                25.229
Of course, the results depend on the platform and the compiler (I ran the tests on a MacBook Air with clang). But for me it was quite interesting to see that the spinlock, in spite of its more sophisticated implementation compared to atomics, is not much slower.
Sadly, my clang 3.1 still doesn't support atomic, so I had to use Boost.
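Since the numbers vary by platform and compiler, it can be handy to time several variants inside one program instead of running each binary under time. Below is a rough sketch using std::chrono, assuming C++11 <atomic> and <mutex> are available; time_counter, time_mutex and time_atomic are hypothetical helpers, not the code that produced the table above.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical harness: calls `inc` `incs` times on a second thread while
// the calling thread calls `dec` `decs` times. Returns the elapsed
// wall-clock time in seconds.
template <typename Inc, typename Dec>
double time_counter (Inc inc, Dec dec, int incs, int decs) {
    auto start = std::chrono::steady_clock::now ();
    std::thread t ([&inc, incs] {
        for (int i = 0; i < incs; ++i) {
            inc ();
        }
    });
    for (int i = 0; i < decs; ++i) {
        dec ();
    }
    t.join ();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now () - start;
    return elapsed.count ();
}

// Counter protected by std::mutex; the final value is stored in `result`.
double time_mutex (int incs, int decs, int &result) {
    std::mutex m;
    int value = 0;
    double secs = time_counter (
        [&] { std::lock_guard<std::mutex> guard (m); ++value; },
        [&] { std::lock_guard<std::mutex> guard (m); --value; },
        incs, decs);
    result = value;
    return secs;
}

// Counter held in a std::atomic<int>; the final value is stored in `result`.
double time_atomic (int incs, int decs, int &result) {
    std::atomic<int> value (0);
    double secs = time_counter (
        [&] { value.fetch_add (1); },
        [&] { value.fetch_sub (1); },
        incs, decs);
    result = value.load ();
    return secs;
}
```

Calling time_mutex (20000000, 10000000, r) and time_atomic (20000000, 10000000, r) and printing the returned seconds gives numbers comparable to the "real" column of the table. Note that this measures wall-clock time only, so it will not show the user/sys split that time reports.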