Jdk1.6 Collections Framework Source Code Analysis (6) - IdentityHashMap

BrokenDreams


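This post covers java.util.IdentityHashMap. The class name suggests that it is still a hash table at heart; the Identity prefix marks it as a special kind of hash map (it extends AbstractMap, not HashMap).
In short, IdentityHashMap and HashMap differ in how they compare keys.
HashMap calls the key's hashCode method, which a class may override, and compares keys with equals, which may also be overridden.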

    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);
        // calls the key's hashCode method
        int hash = hash(key.hashCode());
        int i = indexFor(hash, table.length);
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {
            Object k;
            // comparison: k1 == k2 or k1.equals(k2)
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue;
            }
        }

        modCount++;
        addEntry(hash, key, value, i);
        return null;
    }

IdentityHashMap instead calls System.identityHashCode(x) to obtain the hash code (the value hashCode would return if the class had not overridden it), and it compares keys with == only.

    public V put(K key, V value) {
        Object k = maskNull(key);
        Object[] tab = table;
        int len = tab.length;
        // hash() calls System.identityHashCode(x)
        int i = hash(k, len);

        Object item;
        while ( (item = tab[i]) != null) {
            // compares with == only
            if (item == k) {
                V oldValue = (V) tab[i + 1];
                tab[i + 1] = value;
                return oldValue;
            }
            i = nextKeyIndex(i, len);
        }

        modCount++;
        tab[i] = k;
        tab[i + 1] = value;
        if (++size >= threshold)
            resize(len); // len == 2 * current capacity.
        return null;
    }

    private static int hash(Object x, int length) {
        int h = System.identityHashCode(x);
        // Multiply by -127, and left-shift to use least bit as part of hash
        return ((h << 1) - (h << 8)) & (length - 1);
    }
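
To make the difference between the two hash codes concrete, here is a minimal sketch. The class name IdentityHashCodeDemo and its helper methods are made up for illustration; only hashCode, equals and System.identityHashCode come from the JDK.

```java
public class IdentityHashCodeDemo {
    // Compares hashCode() results; String overrides hashCode to depend on its characters.
    static boolean sameContentHash(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    // Compares identity hash codes, i.e. what Object.hashCode() would return.
    static boolean sameIdentityHash(Object a, Object b) {
        return System.identityHashCode(a) == System.identityHashCode(b);
    }

    public static void main(String[] args) {
        String a = new String("a");
        String b = new String("a"); // equal content, distinct object

        System.out.println("equals: " + a.equals(b));
        System.out.println("==: " + (a == b));
        System.out.println("same hashCode(): " + sameContentHash(a, b));
        // Two distinct objects usually get different identity hash codes,
        // but this is not guaranteed, so the value is only printed here.
        System.out.println("same identityHashCode: " + sameIdentityHash(a, b));
    }
}
```

For a class that does not override hashCode, such as Object itself, both methods return the same number.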

Here is a concrete example:

    import java.util.HashMap;
    import java.util.IdentityHashMap;

    public class IdentityHashMapExample {
        public static void main(String[] args) {
            // k1 and k2 are equal according to equals() but are distinct objects
            String k1 = new String("a");
            String v1 = new String("A");
            String k2 = new String("a");
            String v2 = new String("A");
            HashMap<String, String> hashMap = new HashMap<String, String>();
            hashMap.put(k1, v1);
            hashMap.put(k2, v2);
            System.out.println("hashMap:" + hashMap);

            IdentityHashMap<String, String> identityHashMap = new IdentityHashMap<String, String>();
            identityHashMap.put(k1, v1);
            identityHashMap.put(k2, v2);
            System.out.println("identityHashMap:" + identityHashMap);
        }
    }

Output:

hashMap:{a=A}
identityHashMap:{a=A, a=A}
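
The same effect shows up on lookup. The sketch below (the class IdentityLookupDemo and its lookup helper are hypothetical names) stores one key and then queries with a different object that has equal content.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

public class IdentityLookupDemo {
    // Stores a value under one String and looks it up with another String
    // that is equals() to the first but is a different object.
    static String lookup(Map<String, String> map) {
        String stored = new String("a");
        String probe = new String("a"); // equal content, distinct reference
        map.put(stored, "A");
        return map.get(probe);
    }

    public static void main(String[] args) {
        // HashMap finds the entry through hashCode/equals: prints A
        System.out.println("HashMap: " + lookup(new HashMap<String, String>()));
        // IdentityHashMap requires the very same object: prints null
        System.out.println("IdentityHashMap: " + lookup(new IdentityHashMap<String, String>()));
    }
}
```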

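Both IdentityHashMap and HashMap are implemented as hash tables, but they resolve hash collisions differently. HashMap uses separate chaining: entries that hash to the same bucket are linked into a list. IdentityHashMap uses open addressing: when a collision occurs, the entry goes into the next free slot (this is also called linear probing).
As a result, the two differ considerably in implementation detail. Let's take a look.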

public class IdentityHashMap<K,V>
    extends AbstractMap<K,V>
    implements Map<K,V>, java.io.Serializable, Cloneable
{
    /**
     * The initial capacity used by the no-args constructor.
     * MUST be a power of two.  The value 32 corresponds to the
     * (specified) expected maximum size of 21, given a load factor
     * of 2/3.
     */
    private static final int DEFAULT_CAPACITY = 32;

    /**
     * The minimum capacity, used if a lower value is implicitly specified
     * by either of the constructors with arguments.  The value 4 corresponds
     * to an expected maximum size of 2, given a load factor of 2/3.
     * MUST be a power of two.
     */
    private static final int MINIMUM_CAPACITY = 4;

    /**
     * The maximum capacity, used if a higher value is implicitly specified
     * by either of the constructors with arguments.
     * MUST be a power of two <= 1<<29.
     */
    private static final int MAXIMUM_CAPACITY = 1 << 29;

    /**
     * The table, resized as necessary. Length MUST always be a power of two.
     */
    private transient Object[] table;

    /**
     * The number of key-value mappings contained in this identity hash map.
     *
     * @serial
     */
    private int size;

    /**
     * The number of modifications, to support fast-fail iterators
     */
    private transient volatile int modCount;

    /**
     * The next size value at which to resize (capacity * load factor).
     */
    private transient int threshold;

    /**
     * Value representing null keys inside tables.
     */
    private static final Object NULL_KEY = new Object();

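A few things to note. MAXIMUM_CAPACITY is 2 to the 29th power, whereas HashMap's is 2 to the 30th. Why the factor of 2? Keep that question in mind. Also, the table is an Object array. Since each array slot holds a single object (or so it seems for now), a NULL_KEY object is defined to stand for a null key, so that it can be told apart from an empty (null) slot. Then where do the values go? Read on.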

    /**
     * Constructs a new, empty identity hash map with a default expected
     * maximum size (21).
     */
    public IdentityHashMap() {
        init(DEFAULT_CAPACITY);
    }

    /**
     * Constructs a new, empty map with the specified expected maximum size.
     * Putting more than the expected number of key-value mappings into
     * the map may cause the internal data structure to grow, which may be
     * somewhat time-consuming.
     *
     * @param expectedMaxSize the expected maximum size of the map
     * @throws IllegalArgumentException if <tt>expectedMaxSize</tt> is negative
     */
    public IdentityHashMap(int expectedMaxSize) {
        if (expectedMaxSize < 0)
            throw new IllegalArgumentException("expectedMaxSize is negative: "
                                               + expectedMaxSize);
        init(capacity(expectedMaxSize));
    }

    /**
     * Returns the appropriate capacity for the specified expected maximum
     * size.  Returns the smallest power of two between MINIMUM_CAPACITY
     * and MAXIMUM_CAPACITY, inclusive, that is greater than
     * (3 * expectedMaxSize)/2, if such a number exists.  Otherwise
     * returns MAXIMUM_CAPACITY.  If (3 * expectedMaxSize)/2 is negative, it
     * is assumed that overflow has occurred, and MAXIMUM_CAPACITY is returned.
     */
    private int capacity(int expectedMaxSize) {
        // Compute min capacity for expectedMaxSize given a load factor of 2/3
        int minCapacity = (3 * expectedMaxSize)/2;

        // Compute the appropriate capacity
        int result;
        if (minCapacity > MAXIMUM_CAPACITY || minCapacity < 0) {
            result = MAXIMUM_CAPACITY;
        } else {
            result = MINIMUM_CAPACITY;
            while (result < minCapacity)
                result <<= 1;
        }
        return result;
    }

    /**
     * Initializes object to be an empty map with the specified initial
     * capacity, which is assumed to be a power of two between
     * MINIMUM_CAPACITY and MAXIMUM_CAPACITY inclusive.
     */
    private void init(int initCapacity) {
        // assert (initCapacity & -initCapacity) == initCapacity; // power of 2
        // assert initCapacity >= MINIMUM_CAPACITY;
        // assert initCapacity <= MAXIMUM_CAPACITY;

        threshold = (initCapacity * 2)/3;
        table = new Object[2 * initCapacity];
    }

    /**
     * Constructs a new identity hash map containing the keys-value mappings
     * in the specified map.
     *
     * @param m the map whose mappings are to be placed into this map
     * @throws NullPointerException if the specified map is null
     */
    public IdentityHashMap(Map<? extends K, ? extends V> m) {
        // Allow for a bit of growth
        this((int) ((1 + m.size()) * 1.1));
        putAll(m);
    }

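Recall that HashMap has a configurable load factor. There is no such field here; the load factor is hard-coded as 2/3. Also note table = new Object[2 * initCapacity] in the init method: why is the array twice the capacity? Let's look at the put method.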
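
To see what numbers the constructors produce, the sizing arithmetic can be copied into a standalone class. This is only a sketch: CapacitySketch and the helpers threshold and tableLength are hypothetical names, while the formulas are the ones in capacity and init above.

```java
public class CapacitySketch {
    static final int MINIMUM_CAPACITY = 4;
    static final int MAXIMUM_CAPACITY = 1 << 29;

    // Same arithmetic as the capacity(int) method quoted above.
    static int capacity(int expectedMaxSize) {
        int minCapacity = (3 * expectedMaxSize) / 2;
        if (minCapacity > MAXIMUM_CAPACITY || minCapacity < 0)
            return MAXIMUM_CAPACITY;
        int result = MINIMUM_CAPACITY;
        while (result < minCapacity)
            result <<= 1;
        return result;
    }

    // Same arithmetic as init(int): resize threshold and array length.
    static int threshold(int cap) { return (cap * 2) / 3; }
    static int tableLength(int cap) { return 2 * cap; }

    public static void main(String[] args) {
        for (int expected : new int[] {0, 21, 100}) {
            int cap = capacity(expected);
            System.out.println("expectedMaxSize=" + expected
                    + " capacity=" + cap
                    + " threshold=" + threshold(cap)
                    + " table.length=" + tableLength(cap));
        }
    }
}
```

For an expected size of 21 this gives a capacity of 32, a threshold of 21 and an array of 64 slots, which matches the comment on DEFAULT_CAPACITY.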

    /**
     * Associates the specified value with the specified key in this identity
     * hash map.  If the map previously contained a mapping for the key, the
     * old value is replaced.
     *
     * @param key the key with which the specified value is to be associated
     * @param value the value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     * @see     Object#equals(Object)
     * @see     #get(Object)
     * @see     #containsKey(Object)
     */
    public V put(K key, V value) {
        Object k = maskNull(key);
        Object[] tab = table;
        int len = tab.length;
        int i = hash(k, len);

        Object item;
        while ( (item = tab[i]) != null) {
            if (item == k) {
                V oldValue = (V) tab[i + 1];
                tab[i + 1] = value;
                return oldValue;
            }
            i = nextKeyIndex(i, len);
        }

        modCount++;
        tab[i] = k;
        tab[i + 1] = value;
        if (++size >= threshold)
            resize(len); // len == 2 * current capacity.
        return null;
    }

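First comes the handling of a null key: maskNull converts it to the NULL_KEY object seen earlier, so that a hash can be computed for it and so that it can be distinguished from a genuinely empty (null) slot.
Next, a hash function computes the index: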

    /**
     * Returns index for Object x.
     */
    private static int hash(Object x, int length) {
        int h = System.identityHashCode(x);
        // Multiply by -127, and left-shift to use least bit as part of hash
        return ((h << 1) - (h << 8)) & (length - 1);
    }

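In this hash function, the expression ((h << 1) - (h << 8)) & (length - 1) has two parts. The trailing & (length - 1) was analyzed earlier: because length is a power of two, it is a modulo operation. The leading (h << 1) - (h << 8) equals h * -254, which is what the source comment describes (multiply by -127, then shift left by one). Its low 8 bits are the low 7 bits of h followed by a 0; the higher bits are mixed by the subtraction. The intent behind this is not entirely clear to me (pointers from passing experts are welcome), but one consequence is that the result of the whole hash function is always even. Why even? Read on.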
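
The two claims about this expression (the result is even, and it equals h * -254 masked to the table length) can be checked with a few lines. This is a sketch: IdentityHashIndexDemo and index are made-up names, and index takes a raw int instead of calling System.identityHashCode so that arbitrary values can be tried.

```java
public class IdentityHashIndexDemo {
    // Same expression as hash(Object, int) above, applied to a raw int.
    static int index(int h, int length) {
        return ((h << 1) - (h << 8)) & (length - 1);
    }

    public static void main(String[] args) {
        int length = 64; // array length for the default capacity of 32
        for (int h : new int[] {1, 2, 3, 12345, -7}) {
            int i = index(h, length);
            System.out.println(h + " -> " + i
                    + ", even: " + ((i & 1) == 0)
                    + ", same as (-254 * h) & (length - 1): " + (i == ((-254 * h) & (length - 1))));
        }
        // With a real object the input is its identity hash code.
        Object o = new Object();
        System.out.println("index for a fresh Object: " + index(System.identityHashCode(o), length));
    }
}
```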
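The resulting value is then used as an index into the internal array. If that slot is empty (no collision), the key is stored there and the value is stored in the next slot (index + 1). In other words, all keys live at even indices of the array and all values at odd indices. That explains why the array is allocated at twice the capacity and why the hash function only returns even numbers. It also answers the earlier question about MAXIMUM_CAPACITY: the array has 2 * capacity slots, so a capacity of 1 << 29 already needs an array of length 1 << 30, the largest power of two that fits in an int. When a collision does occur, the search moves on to the next available slot: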

    /**
     * Circularly traverses table of size len.
     */
    private static int nextKeyIndex(int i, int len) {
        return (i + 2 < len ? i + 2 : 0);
    }
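
The interleaved layout and the wrap-around probing can be seen on a tiny array. The following is a toy sketch, not the JDK code: InterleavedLayoutSketch is a hypothetical class, and the home parameter stands in for the result of hash(key, len) so that collisions can be forced by hand.

```java
import java.util.Arrays;

public class InterleavedLayoutSketch {
    // Copied from the method quoted above.
    static int nextKeyIndex(int i, int len) {
        return (i + 2 < len ? i + 2 : 0);
    }

    // Toy insert: home plays the role of hash(key, len) and must be even.
    static void put(Object[] tab, int home, Object key, Object value) {
        int i = home;
        while (tab[i] != null && tab[i] != key)
            i = nextKeyIndex(i, tab.length);
        tab[i] = key;       // key at an even index
        tab[i + 1] = value; // value right after it, at an odd index
    }

    static Object[] demo() {
        Object[] tab = new Object[8];  // room for 4 key/value pairs
        put(tab, 6, "k1", "v1");       // lands in slots 6 and 7
        put(tab, 6, "k2", "v2");       // collides, wraps around to slots 0 and 1
        put(tab, 2, "k3", "v3");       // no collision, slots 2 and 3
        return tab;
    }

    public static void main(String[] args) {
        // prints [k2, v2, k3, v3, null, null, k1, v1]
        System.out.println(Arrays.toString(demo()));
    }
}
```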

With put understood, get is straightforward.

    /**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key == k)},
     * then this method returns {@code v}; otherwise it returns
     * {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        Object k = maskNull(key);
        Object[] tab = table;
        int len = tab.length;
        int i = hash(k, len);
        while (true) {
            Object item = tab[i];
            if (item == k)
                return (V) tab[i + 1];
            if (item == null)
                return null;
            i = nextKeyIndex(i, len);
        }
    }
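
As the Javadoc for get points out, a null return is ambiguous. A short sketch of the three cases follows; NullHandlingDemo, KEY and build are names invented for this example, while put, get and containsKey are the real IdentityHashMap methods.

```java
import java.util.IdentityHashMap;

public class NullHandlingDemo {
    static final String KEY = new String("k");

    static IdentityHashMap<String, String> build() {
        IdentityHashMap<String, String> map = new IdentityHashMap<String, String>();
        map.put(null, "null-key value"); // a null key is stored internally as NULL_KEY
        map.put(KEY, null);              // a present key mapped to null
        return map;
    }

    public static void main(String[] args) {
        IdentityHashMap<String, String> map = build();
        String absent = new String("k"); // equal to KEY but not the same object

        System.out.println(map.get(null));           // null-key value
        System.out.println(map.get(KEY));            // null (mapped to null)
        System.out.println(map.containsKey(KEY));    // true
        System.out.println(map.get(absent));         // null (no mapping)
        System.out.println(map.containsKey(absent)); // false
    }
}
```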

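Adding and looking up are easy to implement, but removing an entry is painful, because collisions have to be taken into account. If earlier insertions collided at the slot being removed and were pushed further along, then removal must not only clear the slot but also shift those displaced entries back (loosely speaking). Otherwise the linear-probe chain is broken, and other operations such as get produce wrong results. Here is the implementation: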

    /**
     * Removes the mapping for this key from this map if present.
     *
     * @param key key whose mapping is to be removed from the map
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V remove(Object key) {
        Object k = maskNull(key);
        Object[] tab = table;
        int len = tab.length;
        int i = hash(k, len);

        while (true) {
            Object item = tab[i];
            if (item == k) {
                modCount++;
                size--;
                V oldValue = (V) tab[i + 1];
                tab[i + 1] = null;
                tab[i] = null;
                closeDeletion(i);
                return oldValue;
            }
            if (item == null)
                return null;
            i = nextKeyIndex(i, len);
        }

    }

As you can see, the first part resembles put; the difference is that after the entry has been cleared, a closeDeletion method is called.

    /**
     * Rehash all possibly-colliding entries following a
     * deletion. This preserves the linear-probe
     * collision properties required by get, put, etc.
     *
     * @param d the index of a newly empty deleted slot
     */
    private void closeDeletion(int d) {
        // Adapted from Knuth Section 6.4 Algorithm R
        Object[] tab = table;
        int len = tab.length;

        // Look for items to swap into newly vacated slot
        // starting at index immediately following deletion,
        // and continuing until a null slot is seen, indicating
        // the end of a run of possibly-colliding keys.
        Object item;
        for (int i = nextKeyIndex(d, len); (item = tab[i]) != null;
             i = nextKeyIndex(i, len) ) {
            // The following test triggers if the item at slot i (which
            // hashes to be at slot r) should take the spot vacated by d.
            // If so, we swap it in, and then continue with d now at the
            // newly vacated i.  This process will terminate when we hit
            // the null slot at the end of this run.
            // The test is messy because we are using a circular table.
            int r = hash(item, len);
            if ((i < r && (r <= d || d <= i)) || (r <= d && d <= i)) {
                tab[d] = item;
                tab[d + 1] = tab[i + 1];
                tab[i] = null;
                tab[i + 1] = null;
                d = i;
            }
        }
    }

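Roughly, the method walks forward from the removed slot (via nextKeyIndex). For each entry it finds, it recomputes the entry's hash index r to decide whether the entry was displaced past the vacated slot by a collision: that is the case when r does not lie cyclically between d (exclusive) and i (inclusive). If so, the entry is moved into the vacated position (d, d+1), its old position (i, i+1) is cleared, and i becomes the new d. The walk continues until a null slot is reached.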
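
To illustrate why the gap has to be closed, here is a simplified sketch under stated assumptions: CloseDeletionSketch is a hypothetical toy table that stores only non-negative int keys, one per slot, with key % length as the home slot. It is not the JDK layout, but removeAndClose applies the same test as closeDeletion.

```java
public class CloseDeletionSketch {
    final Integer[] slots = new Integer[8];

    int home(int key) { return key % slots.length; } // assumes non-negative keys
    int next(int i) { return (i + 1) % slots.length; }

    void add(int key) {
        int i = home(key);
        while (slots[i] != null) i = next(i); // linear probing
        slots[i] = key;
    }

    boolean contains(int key) {
        for (int i = home(key); slots[i] != null; i = next(i))
            if (slots[i] == key) return true;
        return false; // an empty slot ends the probe chain
    }

    // Naive removal: just empties the slot and leaves a hole in the chain.
    void removeNaive(int key) {
        for (int i = home(key); slots[i] != null; i = next(i))
            if (slots[i] == key) { slots[i] = null; return; }
    }

    // Removal followed by gap closing, with the same condition as closeDeletion.
    void removeAndClose(int key) {
        for (int d = home(key); slots[d] != null; d = next(d)) {
            if (slots[d] != key) continue;
            slots[d] = null;
            for (int i = next(d); slots[i] != null; i = next(i)) {
                int r = home(slots[i]);
                if ((i < r && (r <= d || d <= i)) || (r <= d && d <= i)) {
                    slots[d] = slots[i]; // move the displaced entry back
                    slots[i] = null;
                    d = i;               // the hole is now at i
                }
            }
            return;
        }
    }

    // Inserts three colliding keys, removes the first, and checks the other two.
    static boolean survivorsFound(boolean closeGap) {
        CloseDeletionSketch t = new CloseDeletionSketch();
        t.add(1);  // home slot 1
        t.add(9);  // home slot 1, displaced to slot 2
        t.add(17); // home slot 1, displaced to slot 3
        if (closeGap) t.removeAndClose(1); else t.removeNaive(1);
        return t.contains(9) && t.contains(17);
    }

    public static void main(String[] args) {
        System.out.println("naive removal: " + survivorsFound(false));           // false
        System.out.println("removal with gap closing: " + survivorsFound(true)); // true
    }
}
```

After the naive removal, slot 1 is empty, so lookups for 9 and 17 stop immediately at their home slot even though both keys are still in the table.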
With the analysis above, the rest of the code is easy to follow. That wraps up java.util.IdentityHashMap.


2013-08-27 14:10