Sets of Mutable Objects is a Bad Idea

October 27, 2011


I guess this is something that is obvious to many, but it really had me stumped yesterday. I was looking at code that was joining two HashSets, that both had the same item. Not the same reference, but they had the same hash code, and Equals between the items returned true. And yet, Set1.UnionWith(Set2) changed Set1 to have two items. That seemed insane. Isn’t it in the contract of HashSet to not contain the same item twice? How could it be? It took a while, but I figured out the issue. Let’s try to recreate a situation where a set contains two identical items.

First, let’s define a Person class.

   1: public class Person : IEquatable<Person>

   2: {

   3:     public Person(string firstName, string lastName)

   4:     {

   5:         FirstName = firstName;

   6:         LastName = lastName;

   7:     }


   9:     public string FirstName { get; set; }

  10:     public string LastName { get; set; }


  12:     public bool Equals(Person other)

  13:     {

  14:         if (ReferenceEquals(null, other)) return false;

  15:         if (ReferenceEquals(this, other)) return true;

  16:         return Equals(other.FirstName, FirstName) && Equals(other.LastName, LastName);

  17:     }


  19:     public override bool Equals(object obj)

  20:     {

  21:         if (ReferenceEquals(null, obj)) return false;

  22:         if (ReferenceEquals(this, obj)) return true;

  23:         if (obj.GetType() != typeof (Person)) return false;

  24:         return Equals((Person) obj);

  25:     }


  27:     public override int GetHashCode()

  28:     {

  29:         unchecked

  30:         {

  31:             return ((FirstName != null ? FirstName.GetHashCode() : 0)*397) ^ (LastName != null ? LastName.GetHashCode() : 0);

  32:         }

  33:     }

  34: }

Note how the hash code computation takes both properties into account (this is Resharper’s auto generated code).

And now, let’s pass a test:

   1: [Test]

   2: public void MakeASetWithDuplicateValues()

   3: {

   4:     //The objects start out different

   5:     var p1 = new Person("Doron", "Yaacoby");

   6:     var p2 = new Person("Doron", "Y");


   8:     //we put them in sets

   9:     var set1 = new HashSet<Person> { p1};            

  10:     var set2 = new HashSet<Person> { p2 };


  12:     //we now make them equal

  13:     p2.LastName = "Yaacoby";

  14:     Assert.That(p1, Is.EqualTo(p2));


  16:     //union create a set with duplicate values

  17:     set2.UnionWith(set1);

  18:     Assert.That(set2.Count, Is.EqualTo(2));

  19: }

So what happens here? We change set2’s item, but the set doesn’t know about it. It already computed p2’s hash code and stored it accordingly. It doesn’t know that its hash code has changed later. So when it comes to add p1 to the set, it thinks p1 is a brand new item with a brand new hash code, and lets it in.

Conclusion: do not change objects that are in sets, or even better – don’t create sets of mutable objects.

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>



  1. GillyNovember 2, 2011 ב 23:37

    You should look into the Iesi.Collections library. It has an ImutableSet collection and a HashSet collection that might be useful for you.

    I don’t have a lot of experience with them, but i’ve seen their use in the NHibernate source code and know it comes with unit tests…

    – Gilly.

  2. KodeySeptember 4, 2012 ב 6:06

    Based on the two tables spefciied in the join clause, all data is returned from the right table. On the left table, the matching data is returned in addition to NULL values where a record exists in the right table but not in the left table.inspiroHost recently posted..