程式師世界 >> 編程語言 >> C語言 >> C++ >> C++入門知識 >> hashmap 設計

hashmap 設計

編輯：C++入門知識

先來看看泛型的數據隊列。
自然界中的數據關系多分為兩種，拿人類來看，一類是靠人與人之間的關系來互相關聯，我稱為關系型，另一種是靠屁股相互關聯，我稱為位置型。
對於關系型，一個富豪家族，女屌絲想成為其中成員很簡單，與其中一個老頭建立夫妻關系就加入了這個家族，被虐待了後想脫離它也很簡單，解除夫妻關系就完事了。
再看看位置型，比如廁所裡有20個坑，上面都貼上數字，有人報告第17個坑堵了，找起來會很方便。再比如某市一個市有20個市長，內部等級順序都分好了的，在一次內部會議合影時要排序，攝影員喊一聲第14位的拉鏈開了，第14位市長馬上就什麼知道是自己。
這兩種類型的優點都出來了：
關系型：插入，刪除成員方便。
位置型：查找方便。

並不是報喜不報憂，對方的優點就是已方的缺點，反之，已方的優點就是對方的缺點，大自然是很奇秒的。

人類最大的創新就是學習和模擬大自然，因為大自然沒有申請專利保護。飛機是模彷鳥類，潛艇是模彷魚類，程序員也要模彷，於是便有了我們的數組和鏈表，分別對應位置型和關系型。
在標准庫裡，分別是 vector和 list.
vector 是分配一段連續的內存塊，一塊一塊的被填充，當填充完的時候，就整個重新分配，並把數據遷移過去，就像那些富豪家族在國內發展不了了，整個搬遷米國。
list 來一個分配一個，建立關系，走一個釋放一個。

vector，list 不是基於 key-value 型的，要把它封裝才能被當作緩存。
於是便有了前文提到的紅黑樹和本文的主角hash map.
紅黑樹的思想是關系型的，不過關系復雜點，模彷的更像點，有了父親，左孩子，右孩子的概念，其實理解成兒子和女兒又有何不可呢。
hash map 自然就是基本位置型的了。

那麼hash map的工作原理是什麼呢？
從名字上就可以看出來，新數據插入和查找都要經過兩步：
1.hash，就是散列。
2.map,就是存儲映射。

那麼為什麼要散列呢，一切都是為了找位置，明白了麼，就像那麼多副市長，是怎麼排序的，可以靠體型，也可以靠體重，散列的方法有很多種。
比如兩個字符串key:
1.hello
2.world

如何才能快速的定位到哪個位置呢，可以把以首字母的ascii碼為key, 就能快速存儲和快速查找了。hello 對應的是104, world 對應的是119.
hello 104
world 119

建立了一個對應表，既是hash表，也是map表。
再復雜一點，比如有下面的場景：
1.hello
2.hello world
3.world

第1種和第2種如何區別呢，共同對應104位，這個問題在學術上叫散列沖突，比如恰好有兩個副市長的體重一樣。

下面看看我是怎麼設計hash map的

[cpp]
typedef struct yumei_hash_map_s yumei_hash_map_t;
typedef struct yumei_cache_data_s yumei_cache_data_t;

struct yumei_hash_map_s
{
    int reserve;
    yumei_cache_data_t*   data;

};

struct yumei_cache_map_data_s
{
    int key;
    int times;
    int start_time;

    void* data;

    yumei_cache_map_data_t* next;
};

yumei_hash_map_t*   yumei_hash_map_create();
int yumei_hash_map_release( yumei_hash_map_t* hm );

int yumei_hash_map_insert( yumei_hash_map_t* hm, int key, void* data );
int yumei_hash_map_get( yumei_hash_map_t* hm, int key, void** data );
int yumei_hash_map_check( yumei_hash_map_t* hm );
int yumei_hash_map_update( yumei_hash_map_t* hm, int key, void* data );

typedef struct yumei_hash_map_s yumei_hash_map_t;
typedef struct yumei_cache_data_s yumei_cache_data_t;

struct yumei_hash_map_s
{
int reserve;
yumei_cache_data_t* data;

};

struct yumei_cache_map_data_s
{
int key;
int times;
int start_time;

void* data;

yumei_cache_map_data_t* next;
};

yumei_hash_map_t* yumei_hash_map_create();
int yumei_hash_map_release( yumei_hash_map_t* hm );

int yumei_hash_map_insert( yumei_hash_map_t* hm, int key, void* data );
int yumei_hash_map_get( yumei_hash_map_t* hm, int key, void** data );
int yumei_hash_map_check( yumei_hash_map_t* hm );
int yumei_hash_map_update( yumei_hash_map_t* hm, int key, void* data );

整個框架就搭起來了，核心是散列和沖突的解決。
散列算法：

[cpp]
int kv = key &( reserve - 1 );

int kv = key &( reserve - 1 );
沖突解決：
[cpp]
while( tmp ){
    if( tmp->key == key ){
        break;
    }

    tmp = tmp->next;
}

data = tmp->data;

while( tmp ){
if( tmp->key == key ){
break;
}

tmp = tmp->next;
}

data = tmp->data;

其實是在同一個位置上多掛幾個，就像一個市長位置上有幾個候選人，散列以後再來比對,這是最簡單有效的方法，當然如果需要在掛列上再進行排列或者特殊場景下的二次散列可能效果更好。