.NET 集合类型的发展

  • Jonathan Allen
  • 朱永光

2008 年 7 月 4 日

话题:.NET开源DevOps语言 & 开发架构

.NET Framework 中的集合类型在这几年经历了显著的发展。得益于微软新的开放策略,让我们可以展现一个常用数据结构的两个版本:在.NET 和 Mono 中的哈希表。

理论上,哈希表是一个非常简单的构造,就是数组或链表的集合被划分到有限数量的存储体中。然而,即使你擅长于多线程逻辑,在获取该数据结构的元素项时,仍然有些复杂。

//   Copyright (c) Microsoft Corporation.  All rights reserved.

public virtual Object this[Object key] {

         get {

             if (key == null) {

                 throw new ArgumentNullException("key", Environment.GetResourceString("ArgumentNull_Key"));

             }

             uint seed;

             uint incr;

             // Take a snapshot of buckets, in case another thread does a resize

             bucket[] lbuckets = buckets;

             uint hashcode = InitHash(key, lbuckets.Length, out seed, out incr);

             int  ntry = 0;

             bucket b;

             int bucketNumber = (int) (seed % (uint)lbuckets.Length);

             do

             {

                 int currentversion;

                 //     A read operation on hashtable has three steps:

                 //        (1) calculate the hash and find the slot number.

                 //        (2) compare the hashcode, if equal, go to step 3. Otherwise end.

                 //        (3) compare the key, if equal, go to step 4. Otherwise end.

                 //        (4) return the value contained in the bucket.

                 //     After step 3 and before step 4. A writer can kick in a remove the old item and add a new one

                 //     in the same bukcet. So in the reader we need to check if the hash table is modified during above steps.

                 //

                 // Writers (Insert, Remove, Clear) will set 'isWriterInProgress' flag before it starts modifying

                 // the hashtable and will ckear the flag when it is done.  When the flag is cleared, the 'version'

                 // will be increased.  We will repeat the reading if a writer is in progress or done with the modification

                 // during the read.

                 //

                 // Our memory model guarantee if we pick up the change in bucket from another processor,

                 // we will see the 'isWriterProgress' flag to be true or 'version' is changed in the reader.

                 //

                 int spinCount = 0;

                 do {

                     // this is violate read, following memory accesses can not be moved ahead of it.

                     currentversion = version;

                     b = lbuckets[bucketNumber];

                     // The contention between reader and writer shouldn't happen frequently.

                     // But just in case this will burn CPU, yield the control of CPU if we spinned a few times.

                     // 8 is just a random number I pick.

                     if( (++spinCount) % 8 == 0 ) {

                         Thread.Sleep(1);   // 1 means we are yeilding control to all threads, including low-priority ones.

                     }

                 } while ( isWriterInProgress || (currentversion != version) );

                 if (b.key == null) {

                     return null;

                 }

                 if (((b.hash_coll & 0x7FFFFFFF) == hashcode) &&

                     KeyEquals (b.key, key))

                     return b.val;

                 bucketNumber = (int) (((long)bucketNumber + incr)% (uint)lbuckets.Length);

             } while (b.hash_coll < 0 && ++ntry < lbuckets.Length);

             return null;

         }

是的,各位读者请注意,这里存在一个伪自循环锁(pseudo-spin lock),并调用了 Threading.Sleep。

而在.NET 2.0 和泛型集合中,微软决定放弃集合的内部锁定机制。相反,要求任何锁定都必须在外部调用。我们在 Generic.Dictionary 类中可以发现更为简洁的代码。

//   Copyright (c) Microsoft Corporation.  All rights reserved.

private int FindEntry(TKey key) {

    if( key == null) {

        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);

    }

    if (buckets != null) {

        int hashCode = comparer.GetHashCode(key) & 0x7FFFFFFF;

        for (int i = buckets[hashCode % buckets.Length]; i >= 0; i = entries[i].next) {

            if (entries[i].hashCode == hashCode && comparer.Equals(entries[i].key, key)) return i;

        }

    }

    return -1;

}

Mono 版本的 Hashtable 和 Dictionary 在实现上仍然是大相径庭,而且也不同于微软的实现。

// Copyright (C) 2004-2005 Novell, Inc (http://www.novell.com)

public virtual Object this[Object key]

{

     get

     {

         if (key == null)

             throw new ArgumentNullException("key", "null key");

         Slot[] table = this.table;

         int[] hashes = this.hashes;

         uint size = (uint)table.Length;

         int h = this.GetHash(key) & Int32.MaxValue;

         uint indx = (uint)h;

         uint step = (uint)((h >> 5) + 1) % (size - 1) + 1;

         for (uint i = size; i > 0; i--)

         {

             indx %= size;

             Slot entry = table[indx];

             int hashMix = hashes[indx];

             Object k = entry.key;

             if (k == null)

                 break;

             if (k == key || ((hashMix & Int32.MaxValue) == h

                 && this.KeyEquals(key, k)))

             {

                 return entry.value;

             }

             if ((hashMix & CHAIN_MARKER) == 0)

                 break;

             indx += step;

         }

         return null;

     }

而 Mono 的 dictionary 实现则为:

// Copyright (C) 2004 Novell, Inc (http://www.novell.com)

// Copyright (C) 2005 David Waite

// Copyright (C) 2007 HotFeet GmbH (
http://www.hotfeet.ch)

public TValue this [TKey key] {

    get {

        if (key == null)

            throw new ArgumentNullException ("key");

        // get first item of linked list corresponding to given key

        int hashCode = hcp.GetHashCode (key);

        int cur = table [(hashCode & int.MaxValue) % table.Length] - 1;

        // walk linked list until right slot is found or end is reached

        while (cur != NO_SLOT) {

            // The ordering is important for compatibility with MS and strange

            // Object.Equals () implementations

            if (linkSlots [cur].HashCode == hashCode && hcp.Equals (keySlots [cur], key))

                return valueSlots [cur];

            cur = linkSlots [cur].Next;

        }

        throw new KeyNotFoundException ();

    }

好了,现在你可以看到,四种方法本质上实现了同样的功能。毋庸置疑,使用 Sleep 命令的那种是最差的,至于哪一个最好,我想还得由你来决定。

查看英文原文:On the Evolution of the .NET Collections
.NET开源DevOps语言 & 开发架构