常寫程序的人都很清楚,小對象放棧上,大對象放堆上。問題是有時候太多小對象迫不得已也得放堆上,例如鏈表節點、樹節點等等,很多時候這些數據結構中每個節點存儲的都是一個小對象。
以一個雙鏈表為例(std::list),假設用於存儲32位整數大約1M(1048575)個,在32位系統上,這個鏈表用於存儲這些整數將耗費24MB內存,而並不上理論上的12MB(每個節點有兩個指針和一個32位整數,應該耗費12字節,因此1M個這樣的節點大約耗費12MB內存)。真正的原因何在?
其實無論是new還是malloc每分配一塊內存都要多分幾個字節用於存儲塊大小信息、狀態等,當創建小節點很多(分配的小塊內存很多)時,這種開銷就會相當可觀。對於分配的小塊內存一般會有8字節的開銷,然後再圓整成8的倍數,分配大塊內存則未必是這樣,通常不是圓整成8的倍數,而是圓整成一個更大值的倍數,甚至圓整成2的指數個字節。
回到上面的雙鏈表,對於每塊12字節的內存,new至少分配20(12+8)字節,然後圓整成8的倍數,這樣就成了24字節了。在32位系統上,其實沒有必要用8字節存儲那些額外信息(塊大小、狀態...),4字節足矣,因為4的二進制為100,後面有兩個0足可以存儲狀態了(一個0就夠),但是為了在64位系統上不出問題,通常用8字節。
在內存相對比較緊張的嵌入式應用中,過多的動態小塊內存的總開銷有時可能到了無法忍受的程度。此時,我們不得不自己設計一個小塊內存管理器,下面將會講述怎麼用位圖算法設計它。
每塊內存用一個二進制位表示它的使用狀態,如果該塊內存被占用,則把對應位圖中的對應位置0,如果空閒則置1,原理十分簡單。
實現中,需要解決以下幾個問題。
1) 如何快速找到某塊內存的位圖中的對應位?
這個問題涉及到位運算,見本主頁的某個帖子"如何找到一個整數二進制的第一個1"。但是僅僅靠那麼簡單的位運算還解決不了問題。需要考慮一個位圖多大,如果用64位整數表示一個位圖的話,那麼每塊內存需要用1字節的開銷記錄在對應位圖中的索引狀態位,因為1字節有8位,最大能表示到256(0~255),而一個64位整數僅僅64位,可見能表示的范圍綽綽有余。
2) 一塊內存回收怎麼獲得它的大小?
對於鏈表和樹節點,每個這樣的容器中各個節點的大小相同,因此沒有必要存儲這個大小,只有這樣才可能節約每塊內存的overhead開銷。
3) 需要用多大的緩存呢?
向系統申請的每個chunk字節數是:8字節的位圖 + 64 * (申請塊大小+1字節的索引)。當一個chunk全部被使用時,可以不保留也沒有必要保留它的信息。如果該chunk僅僅使用一部分,則緩存這個chunk,且僅僅緩存一個chunk,別的chunk即使僅僅使用一部分,也置之不理,只有某個非當前chunk全空時才會交給系統。這樣既不至於緩存太多chunk導致其它進程動態申請內存成為問題,也能不至於分配和回收內存速度很慢。
但是這存在一點問題,尤其那些純粹搞理論的人可能會提出,在極端情況下,如果每個chunk僅僅剩下一塊內存使用,由於這些chunk均不能交給系統,理論上內存利用率極低。這種情況不予考慮,實際上任何一個內存管理器均存在這種極端情況下的隱患,包括new/malloc....,操作系統的堆內存管理器等等。
1 /// bitmap_fixed_size_allocator.hpp
2 #ifndef SINGLE_OBJECT_ALLOCATOR_HPP_
3 #define SINGLE_OBJECT_ALLOCATOR_HPP_
4
5 #include <cstdlib> /// malloc, free
6 #include "mutex.hpp"
7 #include "guard.hpp"
8
10
11 /// It is specialy designed for small identical byte object memory management.
12
13 /// It is not the fastest and memory efficient small allocator, because for every
14 /// memory block there is a metadata field. To manage this data field, a lot of
15 /// bitwise operations are necessary although this will degrade performance. The
16 /// metadata field should not be removed to improve performance (like a freelist?),
17 /// because that design will cause memory blow up at some intensive situations.
18 /// Even though, it is still a bit faster and memory efficient than a general purpose
19 /// malloc/free function or new/delete operator built-in most C/C++ compilers.
20
21 /// SERIOUS WARNINGS:
22 /// Never try to use (directly or indirectly) more than one allocator object in
23 /// the same scope if you do not want to crash your program!!!
24 /// It is designed for internal use only. Do not use it in multi-processor
25 /// environment, because it can not prevent false sharing and blow up even you can
26 /// endure its snail speed!!!
27
28
29 class bitmap_fixed_size_allocator : public default_mutex
30 {
31 public:
32 #ifdef CXX0x
33 typedef uint64_t size_type;
34 typedef uint8_t index_type;
35 #else
36 typedef unsigned long long size_type;
37 typedef unsigned char index_type;
38 #endif
39
40 typedef bitmap_fixed_size_allocator alloc;
41 public:
42 bitmap_fixed_size_allocator(size_type, size_type = 256);
43 ~bitmap_fixed_size_allocator();
44
45 void* allocate(); /// allocate a block
46 void deallocate(void*); /// deallocate a block
47 void clear(); /// return all chunks to OS
48
49 bool operator == (const bitmap_fixed_size_allocator&) const;
50 bool operator != (const bitmap_fixed_size_allocator&) const;
51
52 protected:
53 union object { index_type idx; char pad[1U]; }; /// metadata of a block
54 union chunk { size_type map; char pad[1U]; }; /// metadata of a chunk
55
56 static const size_type FREE = static_cast<size_type>(-1);
57
58 static chunk* chk_hd; /// only one chunk reserved
59 size_type blk_sz;
60 size_type sz_lmt; /// threshold of block (maximum block)
61
62 void mark_bit(size_type&, index_type) const; /// reserved one block by set 0
63 void clear_bit(size_type&, index_type) const; /// released one block by set 1
64 index_type find_fbit(size_type) const; /// get free bit in map
65 object* get_fblk(chunk*); /// get a free block in chunk
66 object* chunk_alloc(); /// allocate a chunk from OS
67
68 private:
69 /// prevent some silly operations
70 bitmap_fixed_size_allocator(const bitmap_fixed_size_allocator&);
71 bitmap_fixed_size_allocator& operator =(const bitmap_fixed_size_allocator);
72 };
73
74
75 typename bitmap_fixed_size_allocator::chunk* bitmap_fixed_size_allocator::chk_hd = 0;
76
77 /// Only one allocator at same scope, so it is always equal if a comparison necessary.
78
79 bool bitmap_fixed_size_allocator::operator ==(const bitmap_fixed_size_allocator&) const
80 {
81 return true;
82 }
83
84
85 bool bitmap_fixed_size_allocator::operator !=(const bitmap_fixed_size_allocator&) const
86 {
87 return false;
88 }
89
90 bitmap_fixed_size_allocator::bitmap_fixed_size_allocator(size_type block_size,
91 size_type max_block_size) :
92 default_mutex(),
93 blk_sz(block_size + sizeof(object)),
94 sz_lmt(max_block_size + sizeof(object))
95 {}
96
97
98 bitmap_fixed_size_allocator::~bitmap_fixed_size_allocator()
99 {
100 std::free(chk_hd);
101 }
102
103 inline void bitmap_fixed_size_allocator::mark_bit(size_type& map, index_type idx) const
104 {
105 map &= ~(1U << idx);
106 } /// set 0
107
108 inline void bitmap_fixed_size_allocator::clear_bit(size_type& map, index_type idx) const
109 {
110 map |= (1U << idx);
111 } /// set 1
112
113 inline void* bitmap_fixed_size_allocator::allocate()
114 {
115 void* ptr = 0;
116 if(sz_lmt < blk_sz)
117 ptr = std::malloc(blk_sz - sizeof(object));
118 else
119 {
120 chunk* cptr = chk_hd;
121 if(0 == cptr || 0 == cptr->map)
122 ptr = chunk_alloc();
123 else
124 ptr = get_fblk(cptr);
125 }
126 return ptr;
127 }
128
129 inline void bitmap_fixed_size_allocator::deallocate(void* p)
130 {
131 if(0 == p)
132 return;
133 if(sz_lmt < blk_sz)
134 std::free(p);
135 else
136 {
137 object* bptr = (object*)p - 1U;
138 index_type bidx = bptr->idx;
139 chunk* cptr = (chunk*)((char*)bptr - bidx * blk_sz) - 1U;
140 clear_bit(cptr->map, bidx);
141 if(FREE == cptr->map && cptr != chk_hd)
142 std::free(cptr);
143 }
144 }
145
146 inline typename bitmap_fixed_size_allocator::index_type
147 bitmap_fixed_size_allocator::find_fbit(size_type map) const
148 {
149 index_type idx;
150 #if (defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__)))
151 int x1, x2;
152 asm ("bsf %0,%0\n" "jnz 1f\n" "bsf %1,%0\n" "jz 1f\n" "addl $32,%0\n"
153 "1:": "=&q" (x1), "=&q" (x2):"1" ((int) (map >> 32)),
154 "0" ((int) map));
155 idx = x1;
156 #else
157 static const index_type lsb_64_table[64] =
158 {
159 63, 30, 3, 32, 59, 14, 11, 33,
160 60, 24, 50, 9, 55, 19, 21, 34,
161 61, 29, 2, 53, 51, 23, 41, 18,
162 56, 28, 1, 43, 46, 27, 0, 35,
163 62, 31, 58, 4, 5, 49, 54, 6,
164 15, 52, 12, 40, 7, 42, 45, 16,
165 25, 57, 48, 13, 10, 39, 8, 44,
166 20, 47, 38, 22, 17, 37, 36, 26
167 };
168 unsigned long folded;
169 map ^= map - 1;
170 folded = (int) map ^ (map >> 32);
171 idx = lsb_64_table[folded * 0x78291ACF >> 26];
172 #endif
173 return idx;
174 }
175
176 inline typename bitmap_fixed_size_allocator::object*
177 bitmap_fixed_size_allocator::get_fblk(chunk* cptr)
178 {
179 index_type fbit = find_fbit(cptr->map);
180 mark_bit(cptr->map, fbit);
181 object* bptr = (object*)((char*)(cptr + 1U) + fbit * blk_sz);
182 bptr->idx = fbit;
183 return bptr + 1U;
184 }
185
186 inline typename bitmap_fixed_size_allocator::object*
187 bitmap_fixed_size_allocator::chunk_alloc()
188 {
189 chk_hd = (chunk*)std::malloc((sizeof(size_type) << 3U) * blk_sz + sizeof(chunk));
190 if(0 == chk_hd)
191 return 0;
192 chk_hd->map = FREE;
193 return get_fblk(chk_hd);
194 }
195
196 inline void bitmap_fixed_size_allocator::clear()
197 {
198 if(0 != chk_hd && FREE == chk_hd->map)
199 {
200 std::free(chk_hd);
201 chk_hd = 0;
202 }
203 }
204
205
206 /// only wrappers below.
207
208 template<typename T>
209 class single_object_allocator;
210
211 /// single_object_allocator<void> specialization.
212
213 template<>
214 class single_object_allocator<void>
215 {
216 public:
217 typedef unsigned long size_type;
218 typedef long difference_type;
219 typedef void* pointer;
220 typedef const void* const_pointer;
221 typedef void value_type;
222
223 template<typename T>
224 struct rebind { typedef single_object_allocator<T> other; };
225 };
226
227 template <typename T>
228 class single_object_allocator
229 {
230 static bitmap_fixed_size_allocator bm;
231 public:
232 typedef T value_type;
233 typedef T* pointer;
234 typedef const T* const_pointer;
235 typedef T& reference;
236 typedef const T& const_reference;
237 typedef unsigned long size_type;
238 typedef long difference_type;
239
240 template<typename U>
241 struct rebind { typedef single_object_allocator<U> other; };
242
243 single_object_allocator() {}
244
245 single_object_allocator(const single_object_allocator&) {}
246
247 template<typename U>
248 single_object_allocator(const single_object_allocator<U>&) {}
249
250 ~single_object_allocator() throw() {}
251
252 pointer allocate(size_type n) // n僅僅用於校驗而已,幾乎沒有任何價值,稍微修改下代碼,可以去掉它
253 {
254 if(n == sizeof(T))
255 {
256 guard<default_mutex> g(bm);
257 return static_cast<pointer>(bm.allocate());
258 }
259 else
260 throw "memory corruption";
261 }
262
263 void deallocate(pointer p, size_type n) // n 僅用於校驗而已,幾乎沒有什麼價值,可以稍微修改下,去掉該參數
264 {
265 guard<default_mutex> g(bm);
266 if(n == sizeof(T))
267 bm.deallocate(p);
268 else
269 {
270 bm.clear();
271 bm.~bitmap_fixed_size_allocator();
272 throw "memory corruption";
273 }
274 }
275
276 pointer address(reference x) { return (pointer)&x; }
277
278 size_type max_size() throw()
279 { return (static_cast<size_type>(-1) / sizeof(T)) >> 6; }
280
281 void destroy(pointer p) throw() { p->~T(); }
282
283 void construct(pointer p, const T& val)
284 { ::new((void*)p) T(val); }
285
286 };
287
288 template <typename T>
289 bitmap_fixed_size_allocator single_object_allocator<T>::
290 bm(sizeof(T), (static_cast<size_type>(-1) / sizeof(T)) >> 6);
291
292
293 template<typename T1, typename T2>
294 inline bool operator ==(const single_object_allocator<T1>&, const single_object_allocator<T2>&)
295 { return true; }
296
297 template<typename T>
298 inline bool operator ==(const single_object_allocator<T>&, const single_object_allocator<T>&)
299 { return true; }
300
301 template<typename T1, typename T2>
302 inline bool operator !=(const single_object_allocator<T1>&, const single_object_allocator<T2>&)
303 { return false; }
304
305 template<typename T>
306 inline bool operator !=(const single_object_allocator<T>&, const single_object_allocator<T>&)
307 { return false; }
308
309
310 #endif /// SINGLE_OBJECT_ALLOCATOR_HPP_
1 // Code copied from boost with minor changes by Chipset
2
3 // Copyright (C) 2000 Stephen Cleary
4 //
5 // Distributed under the Boost Software License, Version 1.0. (See
6 // accompanying file LICENSE_1_0.txt or copy at
7 // http://www.boost.org/LICENSE_1_0.txt)
8 //
9 // See http://www.boost.org for updates, documentation, and revision history.
10
11 #ifndef ALLOCATOR_MUTEX_HPP_
12 #define ALLOCATOR_MUTEX_HPP_
13
14 // Light-Weight wrapper classes for OS thread
15
16
17 #if defined(WIN32) || defined(_WIN32) || defined(__WIN32__) || defined(__CYGWIN__) || defined(__MINGW32__)
18 #include <windows.h>
19 #endif
20
21 #if defined(_POSIX_THREADS)
22 #include <pthread.h>
23 #endif
24
25
26 // Some compatible macros on Windows system
27 #if defined(WIN32) || defined(_WIN32) || defined(__WIN32__) || defined(__CYGWIN__) || defined(__MINGW32__)
28 // How about on Win64? Sorry, I do not know which API to bind yet.
29
30 class win32_mutex
31 {
32 private:
33 ::CRITICAL_SECTION mtx;
34 win32_mutex(const win32_mutex&);
35 void operator =(const win32_mutex&);
36 public:
37 win32_mutex() { ::InitializeCriticalSection(&mtx); }
38 ~win32_mutex() { ::DeleteCriticalSection(&mtx); }
39 void lock() { ::EnterCriticalSection(&mtx); }
40 void unlock() { ::LeaveCriticalSection(&mtx); }
41 };
42
43 typedef win32_mutex default_mutex;
44
45 #elif defined(_POSIX_THREADS)
46
47 class pthread_mutex
48 {
49 private:
50 ::pthread_mutex_t mtx;
51 pthread_mutex(const pthread_mutex&);
52 void operator =(const pthread_mutex&);
53 public:
54 pthread_mutex() { ::pthread_mutex_init(&mtx, 0); }
55 ~pthread_mutex() { ::pthread_mutex_destroy(&mtx); }
56 void lock() { ::pthread_mutex_lock(&mtx); }
57 void unlock() { ::pthread_mutex_unlock(&mtx); }
58 };
59
60 typedef pthread_mutex default_mutex;
61
62 #else // defined(_POSIX_THREADS)
63
64
65 class null_mutex
66 {
67 private:
68 null_mutex(const null_mutex &);
69 void operator =(const null_mutex&);
70 public:
71 null_mutex() {}
72 static void lock() {}
73 static void unlock() {}
74 };
75
76 typedef null_mutex default_mutex;
77
78 #endif
79
80
81 #endif
1 // Code copied from boost with minor changes by Chipset
2
3 // Copyright (C) 2000 Stephen Cleary
4 //
5 // Distributed under the Boost Software License, Version 1.0. (See
6 // accompanying file LICENSE_1_0.txt or copy at
7 // http://www.boost.org/LICENSE_1_0.txt)
8 //
9 // See http://www.boost.org for updates, documentation, and revision history.
10
11 #ifndef ALLOCATOR_GUARD_HPP_
12 #define ALLOCATOR_GUARD_HPP_
13
14
15 template <typename Mutex>
16 class guard
17 {
18 private:
19 Mutex& mtx;
20 guard(const guard&);
21 void operator =(const guard&);
22 public:
23 explicit guard(Mutex& nmtx) : mtx(nmtx) { mtx.lock(); }
24 ~guard() { mtx.unlock(); }
25 };
26
27 #endif
理論上一般情況下上述內存管理器比較節約內存,因為每塊內存的開銷僅僅1字節,64個這樣的小塊組成一個chunk,僅再需要8字節開銷,遠遠比一般的new/malloc節約內存。當然,有人可能爭辯說這1字節的開銷也可以去掉,用自由列表就是了,而問題恰恰出在這裡。用自有列表可以避免搜索位圖,分配和回收內存均在一個內存池裡進行,速度更快,但是對於長生命周期的對象來說,跟洩漏內存沒有什麼分別。還是用上面的那個雙鏈表為例,用自由列表理論上1M個節點僅需要12MB堆內存,但是如果該鏈表哪怕只有1個節點,從操作系統看上去也會用12MB堆內存,而此時我們的單對象位圖內存管理器需要840(8 + 64 * (12+1) = 840)字節堆內存,從操作系統看上去是1頁(多數32位系統是4096字節)。
需要指出的是single_object_allocator跟STL的allocator分配和回收內存的行為完全不同,不能用它來當作std::list的第二個參數,否則需要修改allocate/deallocate的實現代碼。
single_object_allocator只能管理等大的內存塊,此時可能比一般比系統的malloc/free, new/delete速度稍快一些,盡管它的設計初衷是為了盡可能的節約內存。