程式師世界 >> 編程語言 >> C語言 >> C++ >> C++入門知識 >> unordered容器，unorderedmap

unordered容器，unorderedmap

編輯：C++入門知識

unordered容器，unorderedmap

散列容器(hash container)：
通常比二叉樹的存儲方式可以提供更高的訪問效率.
#include <boost/unordered_set.hpp>
#include <boost/unordered_map.hpp>
using namespace boost;
散列集合簡介:
unordered庫提供兩個散列集合類unordered_set和unordered_multiset,STLport也提供hash_set和hash_multiset，它們的接口，用法與stl裡的標准關聯容器set/multiset相同，只是內部使用散列表代替了二叉樹實現，因此查找復雜度由數降為常數。
unordered_set/unordered_multiset簡要聲明：
template<class Key, class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<Key>>
class unordered_set;

template<class Key, class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<Key>>
class unordered_multiset;
與std::set相比，unorder_set的模板增加了一個計算散列值的模板類型參數，通常是boost::hash,最好不要去改變它，另外比較謂詞參數使用std::equal_to<>,而不是set中的less<>,這是因為散列容器不需要保持有序。

散列集合的用法:
注意散列容器的無序性，不能再散列容器上使用binary_search,lower_bound和upper_bound這樣用於已序區間的算法，散列容器自身也不提供這樣的成員函數。
示范：程序定義了一個模板函數hash_func()，用以操作hash_set/unordered_set,兩者的表現是完全一樣的，首先使用boost::assign初始化散列集合，以迭代器遍歷輸出，然後用size()顯示容器大小，用clear()清空集合，再用insert()插入兩個元素，用find()查找元素，最後用erase()刪除一個元素，這些都是標准容器的標准操作。

#include <iostream>
#include <hash_set>
#include <boost/unordered_set.hpp>
#include <boost/assign/list_of.hpp>
using namespace boost;
using namespace std;
template<typename T>
void hash_func()
{
using namespace boost::assign;
T s = (list_of(1), 2, 3, 4, 5); //初始化數據
for (T::iterator p = s.begin(); p != s.end(); ++p) //使用迭代器遍歷集合
{ cout<< *p<<" "; }
cout<<endl;
cout<<s.size()<<endl;
s.clear();
cout<<s.empty()<<endl;
s.insert(8);
s.insert(45);
cout<<s.size()<<endl;
cout<<*s.find(8)<<endl;
s.erase(45);
}

int main()
{
hash_func<unordered_set<int>>();
system("pause");
return 0;
}

散列映射簡介:
unordered庫提供兩個散列映射類undorderd_map和unordered_multimap,它們的接口，用法與stl裡的標准關聯容器map/multimap相同，只是內部使用散列表代替了二叉樹，模板參數多了散列計算函數，比較謂詞使用equal_to<>。
unordered_map和unordered_multimap的簡要聲明:
template<class Key, class Mapped,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<Key>>
class unordered_map;

template<class Key, class Mapped,
class Hash = boost::hash<Key>,
class Pred = std::equal_to<Key>,
class Alloc = std::allocator<Key>>
class unordered_multimap;
散列映射的用法：
unordered_map/unordered_multimap屬於關聯式容器，采用std::pair保存key-value形式的數據，可以理解一個關聯數組，提供operator[]重載，用法與標准容器map相同.
unordered_multimap允許有重復的key-value映射，因此不提供operator[].
示范：
#include <iostream>
#include <hash_map>
#include <boost/unordered_map.hpp>
#include <boost/assign/list_of.hpp>
#include <boost/typeof/typeof.hpp>
using namespace boost;
using namespace std;
using namespace stdext;
int main()
{
using namespace boost::assign;
//使用assign初始化
unordered_map<int, string> um = map_list_of(1, "one")(2, "two")(3, "three");
um.insert(make_pair(10, "ten"));
cout<<um[10]<<endl;
um[11] = "eleven";
um[15] = "fifteen";
for (BOOST_AUTO(p, um.begin()); p != um.end(); ++p)
cout<<p->first<<"-"<<p->second<<",";
cout<<endl;
um.erase(11);
cout<<um.size()<<endl;
hash_map<int, string> hm = map_list_of(4, "four")(5, "five")(6, "six");
for (BOOST_AUTO(p, hmbegin()); p != hm.end(); ++p)
cout<<p->first<<"-"<<p->second<<",";
cout<<endl;
system("pause");
return 0;
}

性能比較：
示范：程序使用boost隨機庫random()向容器插入10000個1到100之間的整數，然後執行count和find操作；
#include <iostream>
#include <typeinfo>
#include <hash_map>
#include <set>
#include <boost/unordered_set.hpp>
#include <boost/assign/list_of.hpp>
#include <boost/typeof/typeof.hpp>
#include <boost/random.hpp>
#include <boost\Progress.hpp>
using namespace boost;
using namespace std;
using namespace stdext;

template<typename T>
void fill_set(T &c)
{
variate_generator<mt19937, uniform_int<>> gen(mt19937(), uniform_int<>(0, 100));
for (int i = 0; i < 10000; ++i)//插入一萬個整數
  c.insert(gen());
}
template<typename T>
void test_perform()
{
T c;
cout<<typeid(c).name()<<endl;
{
  boost::progress_timer t;
  fill_set(c);
}
{
  boost::progress_timer t;
  c.count(10);
}
{
  boost::progress_timer t;
  c.find(20);
}
}

int main()
{
test_perform<multiset<int>>();
//test_perform<hash_multiset<int>>();
test_perform<unordered_multiset<int>>();
system("pause");
return 0;
}

高級議題：
內部數據結構：
unordered庫使用“桶（bucket）”來存儲元素，散列值相同的元素被放入同一個桶中，當前散列容器的桶的數量可以用成員函數bucket_count()來獲得，bucket_size()返回桶中的元素數量，例如：
unordered_set<int> us = (list_of(1), 2, 3, 4);
cout<< us.bucket_count()<<endl;
for(int i = 0; i < us.bucket_count(); ++i)//訪問每個桶
{cout<<us.bucket_size(i)<<",";}
當散列容器中有大量數據時，桶中的元素數量也會增多，會造成訪問沖突。為了提高散列容器的性能，unordered庫會在插入元素時自動增加桶的數量，用戶不能直接指定桶的數量，但可以在構造函數或者rehash()函數指定最小的桶的數量。例如:
unordered_set<int> us(100);//使用100個桶存儲數據
us.rehash(200);//使用200個桶
c++0x RT1草案還規定有一個函數max_load_factor()，它可以獲取或設定散列容器的最大負載因子，即桶中元素的最大平均數量，通常最大負載因子都是1，用戶不應當去改變它，過大或過小都沒有意義。

支持自定義類型：
unordered庫支持c++內建類型和大多數標准庫容器，但不支持用戶自定義的類型，因為它無法計算自定義類型的散列值。
如果要使unordered支持自定義類型，需要定制類模板的第二個和第三個參數，也就是供散列函數和相等比較謂詞。
相等比較謂詞，unordered庫默認使用std::equal_to,這是一個標准庫中的函數對象，它使用operator==，只有自定義類實現了這個操作符就可以了，不必再特意編寫一個函數對象，如果需要用一個特別的相等判斷規則，那麼可以額外寫函數對象，傳遞給unordered容器。
散列函數則是必須要實現的，這也是為什麼它被放在模板參數列表前面的原因，我們需要使用boost.hash庫來計算自定義類型的散列值，組簡單的使用方法是編寫一個hash_value()函數，創建一個hash函數對象，然後使用它的operator()返回散列值。
下面的代碼定義了一個類demo_class，它實現了operator==和散列函數，可以被unordered所容納：
#include <iostream>
#include <typeinfo>
#include <hash_map>
#include <set>
#include <boost/unordered_set.hpp>
#include <boost/assign/list_of.hpp>
#include <boost/typeof/typeof.hpp>
#include <boost/random.hpp>
#include <boost/Progress.hpp>
#include <boost/functional/hash.hpp>
using namespace boost;
using namespace std;
using namespace stdext;
using namespace boost::assign;
struct demo_class
{
int a;
friend bool operator==(const demo_class& l, const demo_class& r)
{ return l.a == r.a; }
};
size_t hash_value(demo_class & s)
{ return boost::hash<int>()(s.a);}

int main()
{
unordered_set<demo_class> us;
demo_class a1;
a1.a =100;
cout<<hash_value(a1)<<endl;
demo_class a2;
a2.a =100;
cout<<hash_value(a2)<<endl;

system("pause");
return 0;
}
與TR1的差異：
boost.unordered基本上依據c++0x標准草案來實現，但它有個小的”擴充“，增加了比較操作符operator==和operator !=；注意：這兩個操作符不屬於c++0x標准，如果將來程序要切換到新標准可能會遇到不可用移植的問題。