On The .NET Implementation of String.Equals

June 14, 2012

6 comments

Poking around in dotPeek, I came across the implementation of String.Equals(string) in the .NET framework. This is how it looks:

   1: [ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]

   2:     [SecuritySafeCritical]

   3:     private static unsafe bool EqualsHelper(string strA, string strB)

   4:     {

   5:       int length = strA.Length;

   6:       if (length != strB.Length)

   7:         return false;

   8:       fixed (char* chPtr1 = &strA.m_firstChar)

   9:         fixed (char* chPtr2 = &strB.m_firstChar)

  10:         {

  11:           char* chPtr3 = chPtr1;

  12:           char* chPtr4 = chPtr2;

  13:           while (length >= 10 && (*(int*) chPtr3 == *(int*) chPtr4 && *(int*) (chPtr3 + 2) == *(int*) (chPtr4 + 2)) 

  14:               && (*(int*) (chPtr3 + 4) == *(int*) (chPtr4 + 4) && *(int*) (chPtr3 + 6) == *(int*) (chPtr4 + 6) && *(int*) (chPtr3 + 8) == *(int*) (chPtr4 + 8)))

  15:           {

  16:             chPtr3 += 10;

  17:             chPtr4 += 10;

  18:             length -= 10;

  19:           }

  20:           while (length > 0 && *(int*) chPtr3 == *(int*) chPtr4)

  21:           {

  22:             chPtr3 += 2;

  23:             chPtr4 += 2;

  24:             length -= 2;

  25:           }

  26:           return length <= 0;

  27:         }

  28:     }

As you can tell, this is some funky stuff! As this is a performance critical method (comparing strings is a pretty common task, obviously), the implementers have gone all pointery. There are several things to note here:

  • The fixed keyword at lines 8-9: it tells the GC to leave our strings alone and not move them around while we’re using the pointers to them. Pointers won’t do much good if the managed objects are moved in the heap before the comparison is done.
  • The beautiful while loop at line 13 compares 20 bytes at a time (int = 4 bytes and we’re looking at 5 integers at a time). When we’re at the last 20 bytes, we do 4 bytes at a time. I assume this rather weird method of comparing is faster, perhaps due to less pointer increments/decrements. If one of you hackers would like to explain, that would be awesome.
  • One might wonder, since a char = 2 bytes, and we’re looking at 4 bytes at a time at a minimum, how come this method works on strings with an odd length? Well, our current guess is that since the string is null terminated in memory, we either compare the null char (for odd strings) or not (for even strings). Either way – it shouldn’t matter to the result, we can always assume an even amount of characters. This assumption would of course not work for strings in different lengths, but lines 5-7 take care of that for us. (This my colleague Yuval Pinter’s insight. Thanks!)

Like I said, funky stuff.

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

6 comments

  1. vvvladJune 15, 2012 ב 9:48

    Hi,
    I can’t see line 13 last part because it is cut by the layout of the blog.
    Could you please adjust it?

    Thanks

    Reply
  2. doronyJune 17, 2012 ב 10:20

    Fixed, thanks

    Reply
  3. Sasha GoldshteinJune 26, 2012 ב 10:30

    I should note that this is a REALLY slow implementation. CRT’s wcscmp beats it by an order of magnitude IIRC.

    Reply
  4. Paul MannSeptember 2, 2012 ב 6:07

    Thank you for the excellent posts!

    Reply
  5. Tommy SimpsonSeptember 4, 2012 ב 6:33

    Hi. The content is quite Good

    Reply
  6. seoOctober 22, 2012 ב 18:14

    Anyone did not remember to add Playlist. com, everywhere it certainly is not also important for that you register and steady flow any kind of music you desire.

    Reply