理解数据结构 hashtable的简易理解思路

结构图

为了方便演示，下图中分区算法为下标取模

private int hashFun(int id) {
      //使用 hash并取模

      return id % size;
  }

在这里插入图片描述

Hashtable的结构如图所示：是一个数组（元素为各个链表的表头）+ 多个链表组成，也就是说 hashtable 结合了数组和链表的优点。

数组的特点：寻址容易，插入和删除困难；
链表的特点：寻址困难，而插入和删除操作容易；
哈希表的特点：寻址插入和删除操作都容易。

简单的代码实现

该代码实现的功能有分区存储与查找

数据 Emp 的存储：存储时采用ID 取模算法确定数据的存储位置。
数据 Emp的查找：根据 ID 查找数据

public class HashTableDemo {
    public static void main(String[] args) {
        HashTable hashTable = new HashTable(7);
        hashTable.add(new Emp(1, "jack"));
        hashTable.add(new Emp(2, "tom"));
        hashTable.add(new Emp(3, "smith"));
        hashTable.add(new Emp(4, "mary"));
        hashTable.add(new Emp(5, "king"));
        hashTable.add(new Emp(6, "scott"));
        hashTable.add(new Emp(7, "peter"));
        hashTable.add(new Emp(8, "jack1"));
        hashTable.add(new Emp(9, "jack2"));
        hashTable.add(new Emp(10, "jack3"));
        hashTable.add(new Emp(11, "jack4"));
        hashTable.add(new Emp(12, "jack5"));
        hashTable.add(new Emp(13, "jack6"));
        hashTable.add(new Emp(14, "jack7"));
        hashTable.add(new Emp(15, "jack8"));
        hashTable.list();
    }
}

class HashTable {
    /**
     * 数组
     */
    private EmpLinkedList[] empLinkedListArray;
    /**
     * 数组的大小
     */
    private int size;

    public HashTable(int size) {
        this.size = size;
        empLinkedListArray = new EmpLinkedList[size];
        for (int i = 0; i < size; i++) {
            empLinkedListArray[i] = new EmpLinkedList();
        }
    }

    public void add(Emp emp) {
        // 使用散列函数确定放入哪个链表
        int empLinkedListNo = hashFun(emp.id);
        empLinkedListArray[empLinkedListNo].add(emp);
    }

    /**
     * 使用某种算法确定数据的存放位置
     * 这里简单的使用模运算，来确定数据应该落在那个数组元素指定的链表上，
     * 只是一种特例，因为所存的数据包含 ID，所以采用对 ID 取模运算。
     * 更具一般性的算法是使用 哈希函数，对数组大小取模，这里只是个例子。
     *
     * @param id 数组的索引
     * @return
     */
    private int hashFun(int id) {
        //使用 hash并取模

        return id % size;
    }

    public void list() {
        for (int i = 0; i < size; i++) {
            empLinkedListArray[i].list(i);
        }
    }
}

class EmpLinkedList {
    private Emp head;

    public void add(Emp emp) {
        if (head == null) {
            head = emp;
            return;
        }
        Emp temp = head;//游标
        while (temp.next != null) {
            temp = temp.next;
        }
        temp.next = emp;
    }

    public void list(int no) {
        if (head == null) {
            System.out.println("第" + no + "链表为空");
            return;
        }
        System.out.print("第" + no + "链表信息为：");
        Emp temp = head;
        while (temp != null) {
            System.out.print("id=" + temp.id + " name=" + temp.name + " ");
            temp = temp.next;
        }
        System.out.println();
    }

    public Emp findEmpById(int id) {
        if (head == null) {
            return null;
        }
        Emp temp = head;
        while (temp != null) {
            if (temp.id == id) {
                return temp;
            }
            temp = temp.next;
        }
        return temp;
    }
}

class Emp {
    public int id;
    public String name;
    public Emp next;

    public Emp(int id, String name) {
        this.id = id;
        this.name = name;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }
}

哈希表的理解

可以将每一条链表理解为一个区间，数据按照某种算法添加到与之相适应的区间，也称为分区算法， kafka中分区的设计思路也是基于此。

这里的算法是对与数据相关的那个属性或者特征值（称为 key）进行某种运算,得到一个唯一的一个数据，该算法的结果决定的所谓的数据落盘。

哈希表的主要功能：

分区存储：对具有相同的特征的数据进行跟组存储。
分区查找：依据数据的特性值，在特定的分组中查找数据，提高了检索的效率。

哈希表名字的由来

为了保证数据查找的唯一性，采用哈希函数才作为分区算法，即是哈希表的由来。

为什么采用哈希函数？

基于哈希函数的性质，所以采用哈希函数作为分区算法。

哈希函数是一种将任意长度的数据映射为固定长度输出的算法。哈希函数广泛应用于计算机科学的多个领域，如数据索引、密码学、数据完整性校验等。以下是哈希函数的主要概念及其性质：

概念

输入与输出：
- 输入：可以是任意长度的数据（字符串、文件等）。
- 输出：固定长度的哈希值（通常是固定长度的二进制串或十六进制字符串）。
用途：
- 数据索引：快速查找数据。
- 数据完整性校验：检测数据是否被篡改。
- 密码学：安全地存储密码。
- 散列表：实现高效的数据结构。

性质

确定性：
- 对于相同的输入，哈希函数总是产生相同的输出。
- 这一性质确保了哈希值的可预测性和一致性。
均匀分布：
- 哈希函数应尽量使不同的输入产生不同的哈希值，以减少冲突。
- 均匀分布有助于提高哈希表的性能。
单向性：
- 从哈希值很难反推出原始输入数据。
- 这一性质在密码学中尤为重要，确保了数据的安全性。
抗碰撞性：
- 弱抗碰撞性：给定一个输入数据，很难找到另一个不同的输入数据，使得两者的哈希值相同。
- 强抗碰撞性：很难找到两个不同的输入数据，使得它们的哈希值相同。
- 抗碰撞性是哈希函数在密码学应用中的关键属性。
计算效率：
- 哈希函数应具有较高的计算效率，以便在实际应用中能够快速生成哈希值。
敏感性：
- 即使输入数据有微小的变化，哈希值也会发生显著变化。
- 这一性质有助于检测数据的微小修改。

常见的哈希函数

MD5：128位哈希值，已不再推荐用于安全性要求高的场景。
SHA-1：160位哈希值，同样不推荐用于安全性要求高的场景。
SHA-256：256位哈希值，广泛用于安全敏感的应用。
SHA-3：新一代哈希函数，提供更高的安全性和性能。

哈希表的实质

根据对哈希表的结构和算法分析可知，

哈希表的结构：数组（这里体现了哈希）+链表（这里体现里表）；
哈希表的实现逻辑：就是由哈希算法（分区算法）、数组（存放分区首元素）和链表（可以理解为分区）来实现的。

Java 中的 hashtable

public class Hashtable<K,V>
    extends Dictionary<K,V>
    implements Map<K,V>, Cloneable, java.io.Serializable {

    /**
     * The hash table data.
     */
    private transient Entry<?,?>[] table;

    /**
     * The total number of entries in the hash table.
     */
    private transient int count;

    /**
     * The table is rehashed when its size exceeds this threshold.  (The
     * value of this field is (int)(capacity * loadFactor).)
     *
     * @serial
     */
    private int threshold;

    /**
     * The load factor for the hashtable.
     *
     * @serial
     */
    private float loadFactor;

    /**
     * The number of times this Hashtable has been structurally modified
     * Structural modifications are those that change the number of entries in
     * the Hashtable or otherwise modify its internal structure (e.g.,
     * rehash).  This field is used to make iterators on Collection-views of
     * the Hashtable fail-fast.  (See ConcurrentModificationException).
     */
    private transient int modCount = 0;

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    @java.io.Serial
    private static final long serialVersionUID = 1421746759512286392L;

    /**
     * Constructs a new, empty hashtable with the specified initial
     * capacity and the specified load factor.
     *
     * @param      initialCapacity   the initial capacity of the hashtable.
     * @param      loadFactor        the load factor of the hashtable.
     * @throws     IllegalArgumentException  if the initial capacity is less
     *             than zero, or if the load factor is nonpositive.
     */
    public Hashtable(int initialCapacity, float loadFactor) {
    }

    /**
     * Constructs a new, empty hashtable with the specified initial capacity
     * and default load factor (0.75).
     *
     * @param     initialCapacity   the initial capacity of the hashtable.
     * @throws    IllegalArgumentException if the initial capacity is less
     *              than zero.
     */
    public Hashtable(int initialCapacity) {
        this(initialCapacity, 0.75f);
    }

    /**
     * Constructs a new, empty hashtable with a default initial capacity (11)
     * and load factor (0.75).
     */
    public Hashtable() {
        this(11, 0.75f);
    }
    /**
     * Constructs a new hashtable with the same mappings as the given
     * Map.  The hashtable is created with an initial capacity sufficient to
     * hold the mappings in the given Map and a default load factor (0.75).
     *
     * @param t the map whose mappings are to be placed in this map.
     * @throws NullPointerException if the specified map is null.
     * @since   1.2
     */
    public Hashtable(Map<? extends K, ? extends V> t) {
        this(Math.max(2*t.size(), 11), 0.75f);
        putAll(t);
    }

}

从源码中可以看出来hashtable 一共提供了 4 个构造方法

public Hashtable(int initialCapacity, float loadFactor)：用指定初始容量和指定加载因子构造一个新的空哈希表
public Hashtable(int initialCapacity) ：用指定初始容量和默认的加载因子 (0.75) 构造一个新的空哈希表
public Hashtable()：默认构造函数，容量为 11，加载因子为 0.75
public Hashtable(Map<? extends K, ? extends V> t)：构造一个与给定的Map具有相同映射关系的新哈希表