DCSIMG
Google+ October 2011 - Posts - Doron's .NET Space

October 2011 - Posts

I guess this is something that is obvious to many, but it really had me stumped yesterday. I was looking at code that was joining two HashSets, that both had the same item. Not the same reference, but they had the same hash code, and Equals between the items returned true. And yet, Set1.UnionWith(Set2) changed Set1 to have two items. That seemed insane. Isn’t it in the contract of HashSet to not contain the same item twice? How could it be? It took a while, but I figured out the issue. Let’s try to recreate a situation where a set contains two identical items.

First, let’s define a Person class.

   1: public class Person : IEquatable<Person>
   2: {
   3:     public Person(string firstName, string lastName)
   4:     {
   5:         FirstName = firstName;
   6:         LastName = lastName;
   7:     }
   8:  
   9:     public string FirstName { get; set; }
  10:     public string LastName { get; set; }
  11:  
  12:     public bool Equals(Person other)
  13:     {
  14:         if (ReferenceEquals(null, other)) return false;
  15:         if (ReferenceEquals(this, other)) return true;
  16:         return Equals(other.FirstName, FirstName) && Equals(other.LastName, LastName);
  17:     }
  18:  
  19:     public override bool Equals(object obj)
  20:     {
  21:         if (ReferenceEquals(null, obj)) return false;
  22:         if (ReferenceEquals(this, obj)) return true;
  23:         if (obj.GetType() != typeof (Person)) return false;
  24:         return Equals((Person) obj);
  25:     }
  26:  
  27:     public override int GetHashCode()
  28:     {
  29:         unchecked
  30:         {
  31:             return ((FirstName != null ? FirstName.GetHashCode() : 0)*397) ^ (LastName != null ? LastName.GetHashCode() : 0);
  32:         }
  33:     }
  34: }

Note how the hash code computation takes both properties into account (this is Resharper’s auto generated code).

And now, let’s pass a test:

   1: [Test]
   2: public void MakeASetWithDuplicateValues()
   3: {
   4:     //The objects start out different
   5:     var p1 = new Person("Doron", "Yaacoby");
   6:     var p2 = new Person("Doron", "Y");
   7:  
   8:     //we put them in sets
   9:     var set1 = new HashSet<Person> { p1};            
  10:     var set2 = new HashSet<Person> { p2 };
  11:  
  12:     //we now make them equal
  13:     p2.LastName = "Yaacoby";
  14:     Assert.That(p1, Is.EqualTo(p2));
  15:     
  16:     //union create a set with duplicate values
  17:     set2.UnionWith(set1);
  18:     Assert.That(set2.Count, Is.EqualTo(2));
  19: }

So what happens here? We change set2’s item, but the set doesn’t know about it. It already computed p2’s hash code and stored it accordingly. It doesn’t know that its hash code has changed later. So when it comes to add p1 to the set, it thinks p1 is a brand new item with a brand new hash code, and lets it in.

Conclusion: do not change objects that are in sets, or even better – don’t create sets of mutable objects.

Posted by dorony | 3 comment(s)
תגים:

Lately I’ve been refactoring some really old code, and it helped me realize that in about 90% of the cases, inheritance from a class (unlike interface implementation) is a Bad Thing. Of course, I’m hardly the first to think that, but it’s not until I had to refactor deep, and absolutely wrong, object graphs until I felt it in my bones.

But why? Isn’t inheritance a legitimate way to reuse code?

Well, no, for several reasons:

  1. It makes the code harder to understand. To understand the flow of your class, you have to go all the way up to the highest base class, and follow the protected method route, which are implemented in various child classes. It’s a torture.
  2. It makes the code harder to test. Are your tests really testing the class itself, or its base class. If you want to test the class itself, and not base class, how do you mock it out?
  3. It makes the code harder to maintain. Adding new functionality can be complicated, as you decide to add it to the base class, or its child classes.

In fact, I suggest that whenever you’re writing a “BaseThingy” class, you should rethink your design, and prefer interface implementation and composition instead. Composition solves all three issues, and in most cases, is a much better choice. In fact, "Composition over Inheritance” is such a well accepted notion, that it even has its own Wikipedia article.

Posted by dorony | with no comments
תגים: