9_哈希表.md

哈希表

什么是哈希表

哈希表表（Hash table，散列表），是根据关键码值(Key value)而直接进行访问的数据结构。也就是说，它通过把关键码值映射到表中一个位置来访问记录，以加快查找的速度。这个映射函数叫做哈希函数，存放记录的数组叫做散列表。给定表M，存在函数f(key)，对任意给定的关键字值key，代入函数后若能得到包含该关键字的记录在表中的地址，则称表M为哈希(Hash）表，函数f(key)为哈希(Hash) 函数。

举个例子：统计 s="fsgererhgerh"中字符出现次数（其中s只包含小写字母）

我们通常会使用这样的数组： int[] freq=new int[26];

这个freq就是一个哈希表

其中每一个字符都和一个索引对应：

哈希表设计思想：

哈希表充分体现了算法设计领域的经典思想：空间换时间。
哈希表是空间和时间之间的平衡
哈希函数的设计十分重要
“键”通过哈希函数得到的“索引”分布越均匀越好

哈希函数的设计

哈希函数设计原则：

1.一致性：如果a==b,则hash(a)==hash(b)，反之，则不成立

2.高效性：计算高效简便

3.均匀性：哈希值均匀分布

整数

小范围正整数：

直接使用

小范围负整数：

进行偏移，比如[-100,100]，左右区间都+100，进行偏移，得到[0,200]

大整数：比如身份证号 342401199607198888

通常做法：取模

1.取后4位，相当于 mod 10000 -->获得 8888

2.取后6位，相当于mod 1000000 --> 获得 198888，但是获取的数据不可能超过32万，因为在身份证中19为表示的是一个月的某一天，最大值是31，即所谓分布不均匀问题; 只取后6位，就没有利用所有信息。

3.解决2中的两个问题的简单解决办法：模一个素数

浮点型

浮点型在计算机中也是32位或者64位的二进制表示的，只不过是计算机解析成浮点数的，因此，可以根据浮点数的二进制表示，转化为整数。

字符串

转换成整型处理

"163" --> 1 * 10^2 + 6 * 10^1 + 3 * 10^0

"code" --> c * 26^3 + o * 26^2 + d * 26^1 + e * 26^0

"code"也可以这样转换：

"code" --> c * B^3 + o * B^2 + d * B^1 + e * B^0

(26和B都是进制，合理选择进制即可)

因此有如下哈希函数：

hash("code")= (c * B^3 + o * B^2 + d * B^1 + e * B^0) % M

进一步转化：

hash("code")= (((c * B + o)*B + d)*B + e) % M

(((c * B + o)*B + d)*B + e)数据有可能过大，再进一步进行转化：

hash("code")= ((((c % M)* B + o) % M * B + d) %M * B + e) % M

对应代码如下：

int hash=0;
for(int i=0;i<s.length();i++){
    hash=( hash * B + s.charAt(i) )%M;
}

复合类型

转化成整型处理

比如：对于自定义的Date

class Date{
    int year;
    int month;
    int day;
}

hash(date)=(((date.year % M ) * B + date.month ) % M * B + date.day) % M

注意：以上都是都是转化为整型处理，但这并不是唯一的方法。

Java中hashCode方法

Java中hashCode函数的使用

public class Main {
    public static void main(String[] args) {
        Integer a=new Integer(3);
        System.out.println(a.hashCode()); //3

        Integer b=new Integer(-3);
        System.out.println(b.hashCode());//-3

        Double c=new Double(123.4);
        System.out.println(c.hashCode());//--641253373

        String s="abcd";
        System.out.println(s.hashCode());//2987074
    }
}

在自定义对象中重写hashCode函数

public class Student {
    private int id;
    private double score;
    private String name;
    
    public Student(int id,double score,String name){
        this.id=id;
        this.score=score;
        this.name=name;
    }

    @Override
    public int hashCode() {
        int B=26;
        
        int hash=0;
        hash = hash * B + id;
        hash = hash * B + ((Double)score).hashCode();
        //忽略name的大小写问题
        hash = hash * B + name.toLowerCase().hashCode();
        return hash;
    }
}

public class Main2 {
    public static void main(String[] args) {
        Student s=new Student(1,90.0,"aaa");
        System.out.println(s.hashCode()); //-1999996187

        Student s2=new Student(1,90.0,"aaa");
        System.out.println(s2.hashCode());//-1999996187
    }
}

重写equals()方法

检查是否为同一个对象的引用，如果是直接返回 true；
检查是否是同一个类型(判断Class对象是否相同)，如果不是，直接返回 false；
将 Object 对象进行转型
判断每个关键域是否相等

@Override
public boolean equals(Object obj) {
    if(this==obj){
        return true;
    }
    if(obj==null || this.getClass()!=obj.getClass()){
        return false;
    }
    Student another=(Student)obj;
    return this.id==another.id &&
            this.score==another.score &&
            this.name.toLowerCase().equals(another.name.toLowerCase());
}

public class Main2 {
    public static void main(String[] args) {
        Student s=new Student(1,90.0,"aaa");
        System.out.println(s.hashCode()); //-1999996187

        Student s2=new Student(1,90.0,"aaa");
        System.out.println(s2.hashCode());//-1999996187

        //s和s2的地址是不同的，但是内容是相同的
        System.out.println(s==s2); //false
        System.out.println(s.equals(s2)); //true
    }
}

注意：Java中每个对象默认有默认的hashCode实现。hashCode就是该对象的地址。

重写hashCode的情况：

Student s=new Student(1,90.0,"aaa");
System.out.println(s.hashCode()); //-1999996187

Student s2=new Student(1,90.0,"aaa");
System.out.println(s2.hashCode());//-1999996187

未重写hashCode的情况:

Student s=new Student(1,90.0,"aaa");
System.out.println(s.hashCode()); //21685669

Student s2=new Student(1,90.0,"aaa");
System.out.println(s2.hashCode());//2133927002

s和s2的hashCode不同，因为s和s2的地址不同。

哈希冲突的处理 Seperate Chaining

计算哈希值 (hashCode(k1) & 0x7fffffff) % M

采用查找表解决哈希冲突，查找表也可以使用平衡树结构

使用HashMap作为查找表

注意：

1.Java8之前，每个位置对应一个链表

2.Java8之后，当哈希冲突达到一定程度，每个位置从链表转成红黑树

哈希表编程

public class HashTable<K,V> {
    private TreeMap<K,V>[]  hashtable;
    private int M;
    private int size;

    public HashTable(int M){
        this.M=M;
        this.size=0;
        this.hashtable=new TreeMap[M];
        for(int i=0;i<M;i++){
            hashtable[i]=new TreeMap<>();
        }
    }

    public HashTable(){
        this(97);
    }

    private int hash(K key){
        return (key.hashCode() & 0x7fffffff) % M;
    }

    public int getSize(){
        return size;
    }
}

哈希表具体功能实现

public void add(K key,V value){
    TreeMap<K,V> map=hashtable[hash(key)];
    if(map.containsKey(key)){
        map.put(key,value);
    }else{
        map.put(key,value);
        size++;
    }
}

public V remove(K key){
    TreeMap<K,V> map=hashtable[hash(key)];
    V ret=null;
    if(map.containsKey(key)){
        ret=map.remove(key);
        size--;
    }
    return ret;
}

public void set(K key,V value){
    TreeMap<K,V> map=hashtable[hash(key)];
    if(!map.containsKey(key)) {
        throw new IllegalArgumentException(key + "does not exist!");
    }
    map.put(key,value);
}

public boolean contains(K key){
    return hashtable[hash(key)].containsKey(key);
}

public V get(K key){
    return hashtable[hash(key)].get(key);
}

哈希表链表法时间复杂度分析

哈希表的动态空间处理

public class HashTable2<K,V> {
    private static final int upperTol=10;
    private static final int lowerTol=2;
    private static final int initCapacity=7;

    private TreeMap<K,V>[]  hashtable;
    private int M;
    private int size;

    public HashTable2(int M){
        this.M=M;
        this.size=0;
        this.hashtable=new TreeMap[M];
        for(int i=0;i<M;i++){
            hashtable[i]=new TreeMap<>();
        }
    }

    public HashTable2(){
        this(initCapacity);
    }

    private int hash(K key){
        return (key.hashCode() & 0x7fffffff) % M;
    }

    private void resize(int newM){
        TreeMap<K,V>[] newHashtable=new TreeMap[newM];
        for(int i=0;i<newM;i++){
            newHashtable[i]=new TreeMap<>();
        }

        int oldM=M;
        this.M=newM;
        for(int i=0;i<oldM;i++){
            TreeMap<K,V> map=hashtable[i];
            for(K key:map.keySet()){
                newHashtable[hash(key)].put(key,map.get(key));
            }
        }
        hashtable=newHashtable;
    }

    public int getSize(){
        return size;
    }

    public void add(K key,V value){
        TreeMap<K,V> map=hashtable[hash(key)];
        if(map.containsKey(key)){
            map.put(key,value);
        }else{
            map.put(key,value);
            size++;
            if(size >= upperTol*M){
                resize(2*M);
            }
        }
    }

    public V remove(K key){
        TreeMap<K,V> map=hashtable[hash(key)];
        V ret=null;
        if(map.containsKey(key)){
            ret=map.remove(key);
            size--;
            if(size < lowerTol*M && M/2>=initCapacity){
                resize(M/2);
            }
        }
        return ret;
    }

    public void set(K key,V value){
        TreeMap<K,V> map=hashtable[hash(key)];
        if(!map.containsKey(key)) {
            throw new IllegalArgumentException(key + "does not exist!");
        }
        map.put(key,value);
    }

    public boolean contains(K key){
        return hashtable[hash(key)].containsKey(key);
    }

    public V get(K key){
        return hashtable[hash(key)].get(key);
    }
}

时间复杂度分析：

平均时间复杂度：O(1)

实际上每个操作的时间复杂度是：O( log(lowerTol) ) ~ O( log(upperTol) ),而lowerTol和upperTol都是常数。

优化哈希表

public class HashTable3<K,V> {
    //哈希表容量的常量表
    private final int[] capacity = {
            53, 97, 193, 389, 769, 1543, 3079, 6151, 12289, 24593,
            49157, 98317, 196613, 393241, 786433, 1572869, 3145739,
            6291469, 12582917, 25165843, 50331653, 100663319, 201326611,
            402653189, 805306457, 1610612741};
    //int类型的最大素数就是 1610612741

    private static final int upperTol=10;
    private static final int lowerTol=2;
    private int capacityIndex=0;

    private TreeMap<K,V>[]  hashtable;
    private int M;
    private int size;

    public HashTable3(){
        this.M=capacity[capacityIndex];
        this.size=0;
        this.hashtable=new TreeMap[M];
        for(int i=0;i<M;i++){
            hashtable[i]=new TreeMap<>();
        }
    }

    private int hash(K key){
        return (key.hashCode() & 0x7fffffff) % M;
    }

    private void resize(int newM){
        TreeMap<K,V>[] newHashtable=new TreeMap[newM];
        for(int i=0;i<newM;i++){
            newHashtable[i]=new TreeMap<>();
        }

        int oldM=M;
        this.M=newM;
        for(int i=0;i<oldM;i++){
            TreeMap<K,V> map=hashtable[i];
            for(K key:map.keySet()){
                newHashtable[hash(key)].put(key,map.get(key));
            }
        }
        hashtable=newHashtable;
    }

    public int getSize(){
        return size;
    }

    public void add(K key,V value){
        TreeMap<K,V> map=hashtable[hash(key)];
        if(map.containsKey(key)){
            map.put(key,value);
        }else{
            map.put(key,value);
            size++;
            if(size >= upperTol*M && capacityIndex+1<capacity.length){
                capacityIndex++;
                resize(capacity[capacityIndex]);
            }
        }
    }

    public V remove(K key){
        TreeMap<K,V> map=hashtable[hash(key)];
        V ret=null;
        if(map.containsKey(key)){
            ret=map.remove(key);
            size--;
            if(size < lowerTol*M && capacityIndex-1>=0){
                capacityIndex--;
                resize(capacity[capacityIndex]);
            }
        }
        return ret;
    }

    public void set(K key,V value){
        TreeMap<K,V> map=hashtable[hash(key)];
        if(!map.containsKey(key)) {
            throw new IllegalArgumentException(key + "does not exist!");
        }
        map.put(key,value);
    }

    public boolean contains(K key){
        return hashtable[hash(key)].containsKey(key);
    }

    public V get(K key){
        return hashtable[hash(key)].get(key);
    }
}

设计的哈希表中的不足

Java8之前，每一个位置对应一个链表，K不要求具有可比性,所以是无序的。

Java8之后，当哈希冲突得到一定程度时，每一个位置从链表转化成红黑树，这时就要求K具有可比性。

开放地址法解决哈希冲突

1.线性探测法

每次遇到哈希冲突，就+1

2.平方探测法

遇到哈希冲突, 就+1

再次遇到哈希冲突，就+4

再次遇到哈希冲突，就+9

...

3.二次哈希

每次遇到哈希冲突，就 +hash2(key)

其他解决哈希冲突的方法：

再哈希法 Rehashing
Coalesced Hashing(综合了开放地址法和链地址法)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expand file tree

哈希表

什么是哈希表

哈希函数的设计

整数

浮点型

字符串

复合类型

Java中hashCode方法

哈希冲突的处理 Seperate Chaining

哈希表编程

哈希表的动态空间处理

优化哈希表

设计的哈希表中的不足

开放地址法解决哈希冲突

Search code, repositories, users, issues, pull requests...

FilesExpand file tree

9_哈希表.md

Latest commit

History

9_哈希表.md

File metadata and controls

哈希表

什么是哈希表

哈希函数的设计

整数

浮点型

字符串

复合类型

Java中hashCode方法

哈希冲突的处理 Seperate Chaining

哈希表编程

哈希表的动态空间处理

优化哈希表

设计的哈希表中的不足

开放地址法解决哈希冲突

Expand file tree