C# Dictionary的实现原理

在 C# 中，Dictionary<TKey, TValue> 是一个基于哈希表（Hash Table）实现的键值对集合。它提供了高效的插入、删除和查找操作，平均时间复杂度接近 O(1)。下面是 Dictionary 的核心实现原理：

1. Dictionary 的核心数据结构

C# 的 Dictionary<TKey, TValue> 主要由以下几个部分组成：

数组（buckets）：存储哈希桶（Bucket）的索引。
数组（entries）：存储键值对（哈希表中的实际数据）。
哈希函数（GetHashCode）：将键映射到哈希桶。
碰撞解决（拉链法）：采用 链地址法（Chaining with Linked List）。
负载因子（Load Factor）：决定何时扩容，默认为 0.75。

数据结构

private int[] buckets;         // 哈希桶数组，存储 entry 的索引
private Entry[] entries;       // 存储实际的键值对
private int count;             // 当前存储的元素个数
private int freeList;          // 指向空闲 entry（删除后形成的空位）
private int freeCount;         // 空闲 entry 的数量

private struct Entry
{
    public int hashCode;       // 计算出的哈希值
    public int next;           // 指向下一个冲突的元素（-1 表示无冲突）
    public TKey key;           // 键
    public TValue value;       // 值
}

2. Dictionary 的主要操作

（1）添加元素

当调用 dictionary.Add(key, value) 时，Dictionary 执行以下步骤：

计算哈希值：调用 key.GetHashCode() 计算哈希值 hashCode。
计算索引：对哈希值取模，index = hashCode % buckets.Length，确定应该存放的桶。
检查冲突：
- 如果该桶为空，则直接存储。此时count字段记录字典中元素个数。令bucket[index] = count，然后将内容存储到Entry[count]中，随后count++。
- 如果发生哈希冲突，则使用链地址法存储，即将 entries 中的 next 指向旧的 Entry，形成链表。

扩容（如有必要）：

如果 count > capacity * 0.75，则触发扩容，通常是 2 倍扩容。

public void Add(TKey key, TValue value)

{

 int hashCode = key.GetHashCode() & 0x7FFFFFFF;  // 计算哈希值
 int index = hashCode % buckets.Length;  // 计算桶的索引

 // 处理哈希冲突：如果桶为空，直接插入
 if (buckets[index] == -1)
 {
 	buckets[index] = count;  // 将当前元素插入桶数组
 	entries[count] = new Entry { hashCode = hashCode, key = key, value = value, next = -1 	};
 	count++;
 }
 else
 {
 	// 发生哈希冲突，遍历链表
 	int current = buckets[index];
 	while (current >= 0)
 	{
     	Entry entry = entries[current];
     	if (entry.hashCode == hashCode && EqualityComparer<TKey>.Default.Equals(entry.key, key))
     	{
         	entries[current].value = value;  // 如果找到相同的键，更新值
         	return;
     	}
     	current = entry.next;  // 查找下一个节点
 	}

 	// 如果没有找到，插入新的元素到链表头部
 	entries[count] = new Entry { hashCode = hashCode, key = key, value = value, next = buckets[index] };
 	buckets[index] = count;  // 更新桶的索引
 	count++;
 }

 // 扩容检查
 if (count > buckets.Length * 0.75)
 {
 	Resize();  // 扩容
 }

}

（2）查找元素

当调用 dictionary[key] 或 TryGetValue(key, out value) 时：

计算哈希值：hashCode = key.GetHashCode()。
计算索引：index = hashCode % buckets.Length。
遍历链表

：
- 如果桶中第一个元素匹配，则直接返回。
- 如果有哈希冲突（next != -1），遍历 entries 直到找到 key。

（3）删除元素

当调用 dictionary.Remove(key) 时：

计算哈希值，找到哈希桶索引。
遍历链表，找到匹配的 Entry。
更新链表指针：
- 如果是第一个元素，则更新 buckets 指向 next 位置。
- 如果在链表中，则调整 next 以跳过该元素。
回收 entry：
- 将 entry 记录到 freeList，方便后续使用。

（4）扩容机制

当 count >= capacity * 0.75 时，Dictionary 会进行 2 倍扩容：

创建更大的 buckets 和 entries 数组（通常是 2 倍大小）。
重新计算索引，重新分配 buckets 和 entries。
重新插入所有 Entry，因为 hashCode % newCapacity 结果不同，所有键值对需要重新计算索引。

3. Dictionary 的碰撞处理

C# 的 Dictionary 采用 链地址法 处理哈希冲突：

entries 形成链表，每个 Entry 记录 next 指向同一个桶的下一个 Entry。
遍历链表时，检查 hashCode 和 key 是否匹配。

示例假设 buckets 长度为 5，并插入以下键值对：

dict.Add(1, "Apple");
dict.Add(6, "Banana"); // 6 % 5 == 1，与 1 发生哈希冲突

哈希桶结构如下：

buckets: [ -1,  0 -> 1, -1, -1, -1 ]   // index 1 存储链表 (1 → 6)
entries: 
    0: { hashCode=1, key=1, value="Apple", next=1 }
    1: { hashCode=6, key=6, value="Banana", next=-1 }

查找 dict[6] 时：

计算 index = 6 % 5 = 1。
遍历链表：
- entries[0] (key=1) 不匹配，继续遍历 next=1。
- entries[1] (key=6) 匹配，返回 "Banana"。

4. Dictionary 的优缺点

优点

查找快：平均时间复杂度 O(1)。
插入删除快：平均时间复杂度 O(1)。
自动扩容：避免手动管理大小。

缺点

内存占用大：数组 + 链表可能浪费额外空间。
哈希碰撞影响性能：冲突越多，查找速度降至 O(n)。
不保证顺序：Dictionary 无序存储键值对。

5. Dictionary 的替代方案

数据结构	适用场景
`SortedDictionary<TKey, TValue>`	需要按键排序（基于红黑树，O(log n)）
`SortedList<TKey, TValue>`	适用于小数据量，查找快但插入慢
`HashSet<T>`	仅存储键，不存储值
`ConcurrentDictionary<TKey, TValue>`	多线程安全字典