Posted 2022-09-14Updated 2026-03-04Algorithms8 minutes read (About 1249 words)

Hash map

导言

IPCC比赛的时候，发现用hash map会比红黑树的map慢。感觉不对劲，我hash空间声明大，冲突不就低了吗？

hash的相关基本知识

散列表的装填因子定义为^1：

$α= 填入表中的元素个数 / 散列表的长度$

α是散列表装满程度的标志因子。
由于表长是定值，α与“填入表中的元素个数”成正比，所以，
α越大，填入表中的元素较多，产生冲突的可能性就越大；
α越小，填入表中的元素较少，产生冲突的可能性就越小。

unordered_map定义

在插入元素时,根据待插入元素的关键码,以hashfunc计算出该元素的存储位置,在结构中按此位置进行存放

在搜索元素时,对关键码进行同样的计算,把求得的函数值当做元素的存储位置,在结构中按此位置去元素比较,若关键码相等,则搜索成功

C++11 定义了std::hash

template<
    class Key,
    class T,
    class Hash = std::hash<Key>,
    class KeyEqual = std::equal_to<Key>,
    class Allocator = std::allocator< std::pair<const Key, T> >
> class unordered_map;

很明显可以reserve设置大小

unordered_map详解

自定义结构体hash函数

hashfunc设计原则

哈希函数的定义域必须包括需要存储的全部关键码,而如果散列表允许m个地址时,其值域必须在0-(m-1)之间

哈希函数计算出来的地址能均匀分布在整个空间中

哈希函数应该比较简单

举例

如果想让自定义的class作为key（unordered_map<key,value>）来使用unordered_map，需要实现^3：

哈希函数，需要实现一个class重载operator()，将自定义class变量映射到一个size_t类型的数。一般常用std::hash模板来实现。
1. For two parameters k1 and k2 that are equal, std::hash<Key>()(k1) == std::hash<Key>()(k2).
2. For two different parameters k1 and k2 that are not equal, the probability that std::hash<Key>()(k1) == std::hash<Key>()(k2) should be very small, approaching 1.0/std::numeric_limits<std::size_t>::max().
判断两个自定义class类型的key变量是否相等的函数，一般在自定义class里重载operator==。

template <>
   struct hash<Myclass>
   {
       size_t operator()(const Myclass &k) const
       {
           int h = k.first;
           for (auto x : k.second)
           {
               h ^= x;
           }
           return h;
       }
   };

或者

struct MyHash
{
  size_t operator() (const pair<int,int>& p) const
  {
    return hash<long long>()( (static_cast<long long>(x.first) ) 
    ^ ( (static_cast<long long>(x.first))<<32) );
  }
};

防止冲突的自定义hash

防止黑客攻击^4

struct custom_hash {
    static uint64_t splitmix64(uint64_t x) {
        // http://xorshift.di.unimi.it/splitmix64.c
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }

    size_t operator()(uint64_t x) const {
        static const uint64_t FIXED_RANDOM = chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

结构体万能hash

解释见转载的作者

template <typename T>
inline void hash_combine(size_t& seed, const T& val)
{
    seed ^= std::hash<T>()(val) + 0X9E3779B9 + (seed << 6) + (seed >> 2);
}

template <typename T>
inline void hash_val(size_t& seed, const T& val)
{
    hash_combine(seed, val);
}

template <typename T, typename... Types>
inline void hash_val(size_t& seed, const T& val, const Types&... args)
{
    hash_combine(seed, val);
    hash_val(seed, args...);
}

template <typename... Types>
inline size_t hash_val(const Types&... args)
{
    size_t seed = 0;
    hash_val(seed, args...);
    return seed;
}

class MyObject{
public:
    string a;
    char b;
    unsigned c;
    bool operator==(const MyObject& rhs) const
    {
        return a == rhs.a && b == rhs.b && c == rhs.c;
    }

    bool operator!=(const MyObject& rhs) const
    {
        return !operator==(rhs);
    }

    friend ostream& operator<<(ostream& os, const MyObject& ptr)
    {
        return os << ptr.a << "   " << ptr.b << "   " << ptr.c << endl;
    }
};

class MyHashFunction{
public:
    std::size_t operator()(const MyObject& obj) const {
        return hash_val(obj.a, obj.b, obj.c);
    }
};

unordered_map性能

加快 std::unorderd_map 的速度

为了避免动态改变哈希表的大小，可以预估容器中带插入元素的数量，并且预先申请好内存空间，避免动态分配过程中造成的 rehash 现象。当容器中的总元素超出了填充因子时，容器会重新申请更大的内存扩充桶的数量，然后重新哈希。

例如，我会预先调用 reverse 成员函数预先分配内存空间，以及设置好最大填充因子。

1
2
3

unordered_map<int,int> hash;
mp.reserve(1024);
mp.max_load_factor(0.25);

但是没什么用？

Google 开源的 abesil flat hash map

Google 则采用了开放寻址法中最简单的线性探测方法来解决哈希碰撞，取得了更快的速度。

开放定址法,当哈希表未满,在插入同义字时,可以把key值存放在下一个空位置(线性探测)
线性探测: 从发生冲突的位置开始,依次向后探测,直到寻找下一个空位置为止.

更快，解释开发寻址法，cache更优秀

高性能

https://github.com/greg7mdp/parallel-hashmap

https://github.com/greg7mdp/sparsepp

参考文献

Hash map

http://icarus.shaojiemike.top/2022/09/14/Work/Algorithms/datastructure/hashmap/

Author

Shaojie Tan

Posted on

2022-09-14

Updated on

2026-03-04

Licensed under

#C++hash map

Hash map

hash的相关基本知识

unordered_map定义

unordered_map详解

自定义结构体hash函数

hashfunc设计原则

举例

防止冲突的自定义hash

结构体万能hash

unordered_map性能

加快 std::unorderd_map 的速度

Google 开源的 abesil flat hash map

高性能

参考文献

Author

Posted on

Updated on

Licensed under

Like this article? Support the author with

Categories

Subscribe for updates

follow.it

Links

Recents

Archives

Tags