Algorithm Basics: A Deep Dive into Hash Tables (with a Detailed C++ Implementation) - CSDN Blog

Article link: https://blog.csdn.net/jay_515/article/details/148782007

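Hash tables are the wizards of fast lookup in the algorithm world: they retrieve data in O(1) time on average. This article takes you through this core data structure from scratch!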

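I. Why Do We Need Hash Tables?

In algorithms and data structures we constantly need fast lookup. Searching an unsorted array takes O(n) time, and binary search over a sorted array takes O(log n), whereas a hash table finds an element in O(1) time on average. This gain matters a great deal when processing large amounts of data.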
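To make the comparison concrete, here is a minimal sketch of the three lookup styles using standard library calls. The function names are purely illustrative; note that binary search only works on a sorted range.

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Linear search: inspects elements one by one, O(n).
bool linearContains(const std::vector<int>& data, int target) {
    return std::find(data.begin(), data.end(), target) != data.end();
}

// Binary search: `sorted` must be sorted in ascending order, O(log n).
bool binaryContains(const std::vector<int>& sorted, int target) {
    return std::binary_search(sorted.begin(), sorted.end(), target);
}

// Hash lookup: hashes the key and inspects a single bucket, O(1) on average.
bool hashContains(const std::unordered_set<int>& set, int target) {
    return set.count(target) > 0;
}
```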

Application scenarios

Database indexes
Caching systems (e.g., Redis)
Compiler symbol tables
Spell checkers
Data deduplication
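As an illustration of the deduplication use case, here is a minimal sketch (the function name is hypothetical) that removes duplicates while keeping the first occurrence of each value. It relies on the fact that `std::unordered_set::insert` returns a pair whose `bool` member tells whether the element was newly inserted.

```cpp
#include <unordered_set>
#include <vector>

// Removes duplicates, keeping each value's first occurrence in its original position.
// Each insert() is O(1) on average, so the whole pass is O(n) on average.
std::vector<int> dedupKeepOrder(const std::vector<int>& input) {
    std::unordered_set<int> seen;
    std::vector<int> result;
    for (int x : input) {
        // insert() returns {iterator, bool}; the bool is true only for a new element
        if (seen.insert(x).second) {
            result.push_back(x);
        }
    }
    return result;
}
```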

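II. Core Principles of Hash Tables

A hash table uses a hash function to map each key to a storage location (a bucket), which enables fast access. The core idea is to compute index = hash(key) mod (number of buckets) and then read or write that bucket directly instead of scanning the data.

Key concepts

Hash function: maps data of arbitrary size to a fixed-size value
Hash collision: different keys are mapped to the same position (bucket)
Load factor: number of elements / number of buckets, a measure of how full the table is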
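These three concepts can be observed directly through the bucket interface of `std::unordered_map`. The sketch below assumes the textbook index formula `hash(key) % bucketCount` for `bucketIndexOf` (a hypothetical helper); how `std::unordered_map` itself maps hashes to buckets is implementation-defined, so query it with `bucket()` instead.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Textbook bucket index: hash(key) mod bucketCount.
std::size_t bucketIndexOf(const std::string& key, std::size_t bucketCount) {
    return std::hash<std::string>{}(key) % bucketCount;
}

// Load factor = number of elements / number of buckets.
// std::unordered_map also reports this itself via load_factor() (as a float).
double loadFactorOf(const std::unordered_map<std::string, int>& m) {
    return static_cast<double>(m.size()) / m.bucket_count();
}
```

Two keys collide in a table of `n` buckets exactly when `bucketIndexOf` returns the same value for both.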

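III. Resolving Hash Collisions

1. Separate Chaining

The most common approach: each bucket holds a linked list of all elements whose keys map to that bucket, so colliding keys simply share one chain.

2. Open Addressing

All elements are stored directly in the bucket array. When a collision occurs, the table probes other buckets according to a predetermined rule until it finds an empty one (h is the key's hash, i = 0, 1, 2, ... is the probe number, m is the number of buckets):

Linear probing: (h + i) mod m
Quadratic probing: (h + i²) mod m
Double hashing: (h1 + i·h2) mod m, where h2 comes from a second hash function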
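The probe rules above can be sketched as small functions, together with a minimal fixed-capacity set that uses linear probing. This is an illustrative sketch, not a production design: it assumes int keys, supports no deletion (deletion under open addressing needs tombstones), never resizes, and requires C++17 for `std::optional`. All names are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// i-th probe position for each strategy; h is reduced mod m first to avoid overflow.
std::size_t linearProbe(std::size_t h, std::size_t i, std::size_t m) {
    return (h % m + i) % m;
}
std::size_t quadraticProbe(std::size_t h, std::size_t i, std::size_t m) {
    return (h % m + (i * i) % m) % m;
}
// h2 must be nonzero (ideally coprime with m) so that all slots can be reached.
std::size_t doubleHashProbe(std::size_t h1, std::size_t h2, std::size_t i, std::size_t m) {
    return (h1 % m + (i % m) * (h2 % m)) % m;
}

// Minimal set using linear probing: no deletion, no resizing.
class LinearProbingSet {
public:
    explicit LinearProbingSet(std::size_t capacity) : slots_(capacity) {}

    // Returns false if the key is already present or the table is full.
    bool insert(int key) {
        std::size_t h = std::hash<int>{}(key);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::optional<int>& slot = slots_[linearProbe(h, i, slots_.size())];
            if (!slot) { slot = key; return true; }  // empty slot: store the key here
            if (*slot == key) return false;           // key already stored
        }
        return false;                                 // every slot probed: table full
    }

    bool contains(int key) const {
        std::size_t h = std::hash<int>{}(key);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const std::optional<int>& slot = slots_[linearProbe(h, i, slots_.size())];
            if (!slot) return false;                  // empty slot ends the probe chain
            if (*slot == key) return true;
        }
        return false;
    }

private:
    std::vector<std::optional<int>> slots_;
};
```

Stopping at the first empty slot in `contains` is only correct because nothing is ever removed; once deletion is allowed, removed slots must be marked rather than emptied.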

IV. Hash Tables in the C++ STL

C++11 introduced standard hash table containers, std::unordered_map and std::unordered_set:

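#include <iostream>
#include <string>
#include <unordered_map>
#include <unordered_set>

int main() {
    // Hash map example
    std::unordered_map<std::string, int> wordCount;
    wordCount["apple"] = 5;      // insert (or overwrite) via operator[]
    wordCount["banana"] = 3;

    // find() returns end() when the key is absent; reuse the returned iterator
    // instead of calling operator[] again, which would repeat the lookup
    auto it = wordCount.find("apple");
    if (it != wordCount.end()) {
        std::cout << "Apple count: " << it->second << std::endl;
    }

    // Hash set example
    std::unordered_set<int> uniqueNumbers;
    uniqueNumbers.insert(10);
    uniqueNumbers.insert(20);
    uniqueNumbers.insert(10);  // duplicate: not inserted, size() stays 2

    return 0;
}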
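One detail of the example above is worth knowing: `operator[]` inserts a value-initialized entry (0 for `int`) when the key is missing. That makes counting very concise, but it also means `operator[]` should not be used for read-only checks. A minimal sketch, with a hypothetical `countWords` helper that splits on whitespace:

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

// Counts word frequencies in whitespace-separated text.
// counts[word] creates the entry with value 0 if it is missing, then ++ increments it.
std::unordered_map<std::string, int> countWords(const std::string& text) {
    std::unordered_map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        ++counts[word];
    }
    return counts;
}
```

To test for a key without inserting it, use `find()` or `count()`; `at()` also avoids insertion but throws `std::out_of_range` for a missing key.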

V. Implementing a Hash Table by Hand (Separate Chaining)

Below we implement a simple but complete hash table:

#include <cstddef>
#include <functional> // for std::hash
#include <iostream>
#include <list>
#include <utility>    // for std::move
#include <vector>

template <typename K, typename V>
class HashMap {
private:
    struct Node {
        K key;
        V value;
        Node(const K &k, const V &v) : key(k), value(v) {}
    };

    std::vector<std::list<Node>> buckets;  // each bucket is a linked list (chain)
    std::size_t capacity;                  // current number of buckets
    std::size_t size;                      // number of stored key-value pairs
    static constexpr double LOAD_FACTOR = 0.75;

    std::size_t getBucketIndex(const K &key) const {
        // Use the C++ standard library hash function, then map it onto a bucket
        return std::hash<K>{}(key) % capacity;
    }

    void rehash() {
        capacity *= 2;
        std::vector<std::list<Node>> newBuckets(capacity);

        // Reinsert every element: its bucket index depends on the new capacity
        for (auto &bucket : buckets) {
            for (auto &node : bucket) {
                newBuckets[getBucketIndex(node.key)].push_back(std::move(node));
            }
        }

        buckets = std::move(newBuckets);
    }

public:
    // A capacity of 0 would make the modulo in getBucketIndex divide by zero
    explicit HashMap(std::size_t initialCapacity = 10)
        : capacity(initialCapacity > 0 ? initialCapacity : 1), size(0) {
        buckets.resize(capacity);
    }

    void insert(const K &key, const V &value) {
        std::size_t index = getBucketIndex(key);

        // If the key already exists, update its value
        for (auto &node : buckets[index]) {
            if (node.key == key) {
                node.value = value;
                return;
            }
        }

        // Otherwise append a new node to the chain
        buckets[index].push_back(Node(key, value));
        size++;

        // Grow when the load factor (elements / buckets) exceeds the threshold
        if (static_cast<double>(size) / capacity > LOAD_FACTOR) {
            rehash();
        }
    }

    // Copies the value stored for `key` into `value`; returns false if absent
    bool get(const K &key, V &value) const {
        std::size_t index = getBucketIndex(key);
        for (const auto &node : buckets[index]) {
            if (node.key == key) {
                value = node.value;
                return true;
            }
        }
        return false;
    }

    bool remove(const K &key) {
        auto &bucket = buckets[getBucketIndex(key)];

        for (auto it = bucket.begin(); it != bucket.end(); ++it) {
            if (it->key == key) {
                bucket.erase(it);  // return right away: `it` is invalid after erase
                size--;
                return true;
            }
        }
        return false;
    }

    std::size_t getSize() const {
        return size;
    }

    bool isEmpty() const {
        return size == 0;
    }
};

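Code walkthrough:

Data structure: a std::vector holds the buckets, and each bucket is a std::list (a linked list)
Hash function: std::hash from the C++ standard library, reduced modulo the bucket count
Dynamic resizing: when the load factor exceeds 0.75, the bucket count doubles and every element is rehashed
Basic operations: insert, lookup, and delete take O(1) time on average (amortized for insert, since an occasional rehash costs O(n))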
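To see exactly when the resizing rule fires, the sketch below replays the HashMap's growth policy (check `size / capacity > 0.75` after each new key, then double) without storing any data. `rehashPoints` is a hypothetical helper written only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Replays the growth policy: after the size-th new key, if size / capacity > 0.75,
// the capacity doubles. Returns the element counts at which a rehash happens.
std::vector<std::size_t> rehashPoints(std::size_t initialCapacity, std::size_t insertions) {
    const double loadFactor = 0.75;
    std::size_t capacity = initialCapacity;
    std::vector<std::size_t> points;
    for (std::size_t size = 1; size <= insertions; ++size) {
        if (static_cast<double>(size) / capacity > loadFactor) {
            points.push_back(size);
            capacity *= 2;
        }
    }
    return points;
}
```

With the default capacity of 10, inserting 40 keys triggers rehashes at sizes 8 (8/10 = 0.8), 16 (16/20 = 0.8), and 31 (31/40 = 0.775); sizes 15 and 30 give exactly 0.75, which does not exceed the threshold.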

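VI. Principles of Hash Function Design

A good hash function should be:

Deterministic: equal keys always produce the same hash value
Efficient: fast to compute
Uniform: spreads keys evenly across the buckets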
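Uniformity is the property most easily violated in practice. The sketch below contrasts a deliberately poor hash (string length, which is deterministic and fast but not uniform) with `std::hash<std::string>` by counting how many distinct buckets a set of keys occupies. Both helpers are hypothetical, and the exact spread of `std::hash` is implementation-defined.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <vector>

// Number of distinct buckets (out of bucketCount) that the keys land in under `hash`.
// Fewer used buckets means longer chains and more collisions.
std::size_t bucketsUsed(const std::vector<std::string>& keys,
                        const std::function<std::size_t(const std::string&)>& hash,
                        std::size_t bucketCount) {
    std::unordered_set<std::size_t> used;
    for (const auto& k : keys) used.insert(hash(k) % bucketCount);
    return used.size();
}

// Poor hash: many different keys share the same length, so they all collide.
std::size_t lengthHash(const std::string& s) { return s.size(); }
```

For the keys "key0" through "key99", `lengthHash` yields only two values (4 and 5), so all 100 keys pile into two buckets, whereas a general-purpose hash typically spreads them over most of the buckets.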

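A simple string hash function (the classic djb2 scheme: start from 5381 and compute hash * 33 + c for each character):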

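#include <cstddef>
#include <string>

std::size_t stringHash(const std::string &str) {
    std::size_t hash = 5381; // initial seed

    // unsigned char keeps non-ASCII bytes from being added as negative values
    for (unsigned char c : str) {
        // hash * 33 + c
        hash = ((hash << 5) + hash) + c;
    }

    return hash;
}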
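A hash function like this can be plugged into `std::unordered_map` through its third template parameter, which must be a function object type. A minimal sketch, assuming the same djb2 formula wrapped in a hypothetical `Djb2Hash` struct:

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// djb2 wrapped in a function object so that std::unordered_map can use it.
struct Djb2Hash {
    std::size_t operator()(const std::string& str) const {
        std::size_t hash = 5381;
        for (unsigned char c : str) {
            hash = ((hash << 5) + hash) + c;  // hash * 33 + c
        }
        return hash;
    }
};

// The third template argument replaces the default std::hash<std::string>.
using Djb2Map = std::unordered_map<std::string, int, Djb2Hash>;
```

For example, `Djb2Hash{}("a")` is 5381 * 33 + 97 = 177670. In most real programs the default `std::hash` is the better choice; a custom hasher is mainly useful for user-defined key types.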

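VII. Performance Analysis and Optimization

Time complexity

Operation	Average case	Worst case
Insert	O(1)	O(n)
Lookup	O(1)	O(n)
Delete	O(1)	O(n)

The worst case arises when many keys collide into the same bucket, so every operation degenerates into a scan of one long chain.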
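A degenerate hash function makes this worst case visible. In the sketch below, a hypothetical `ConstantHash` maps every key to 0, so all entries of a `std::unordered_map` end up in a single bucket and each lookup must walk that whole chain.

```cpp
#include <cstddef>
#include <unordered_map>

// Degenerate hash: every key hashes to 0, so every key collides.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};

// Builds a map whose n entries all share one bucket.
std::unordered_map<int, int, ConstantHash> buildDegenerateMap(int n) {
    std::unordered_map<int, int, ConstantHash> m;
    for (int i = 0; i < n; ++i) m[i] = i;  // each insert scans the chain: O(n) per insert
    return m;
}
```

After `buildDegenerateMap(100)`, `m.bucket_size(m.bucket(0))` reports all 100 elements, no matter how many buckets the map has.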

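Optimization strategies

Dynamic resizing: keep the load factor in a reasonable range (0.75 is a common threshold, e.g. the default of Java's HashMap; std::unordered_map defaults to a maximum load factor of 1.0)
High-quality hash function: reduces the probability of collisions
Bucket structure optimization: convert an overly long chain into a balanced tree (Java 8+ HashMap turns a chain into a red-black tree once it grows past TREEIFY_THRESHOLD = 8 entries and the table has at least 64 buckets)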
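With `std::unordered_map`, the resizing strategy can be tuned through `max_load_factor()` and `reserve()`. A minimal sketch, assuming the number of elements is known in advance; `presize` is a hypothetical helper. `reserve(n)` sets the bucket count to at least `n / max_load_factor()`, so inserting those `n` elements afterwards triggers no intermediate rehash.

```cpp
#include <cstddef>
#include <unordered_map>

// Prepares `m` to hold `expected` elements without rehashing along the way.
void presize(std::unordered_map<int, int>& m, std::size_t expected, float maxLoad) {
    m.max_load_factor(maxLoad);  // the default maximum load factor is 1.0
    m.reserve(expected);         // equivalent to rehash(ceil(expected / max_load_factor()))
}
```

Lowering the maximum load factor trades memory (more buckets) for shorter chains; calling `reserve` up front avoids the O(n) cost of repeated rehashing while the map fills.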

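VIII. Hands-on Practice

Two Sum (LeetCode 1): scan the array once and store each value's index in a hash map; for each element, check whether target - nums[i] has already been seen. This takes O(n) time on average instead of the O(n²) brute-force double loop.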

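#include <unordered_map>
#include <vector>
using namespace std;

vector<int> twoSum(vector<int>& nums, int target) {
    unordered_map<int, int> numMap;  // value -> index where it was seen
    for (int i = 0; i < static_cast<int>(nums.size()); i++) {
        int complement = target - nums[i];
        auto it = numMap.find(complement);
        if (it != numMap.end()) {
            return {it->second, i};
        }
        numMap[nums[i]] = i;  // record after the check so an element never pairs with itself
    }
    return {};  // no pair adds up to target
}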