The Enumerable.Distinct<T> method is a very useful helper; it was definitely required to complete the LINQ to SQL offering (to support T-SQL’s DISTINCT), and the corresponding support in LINQ to Objects is very nice indeed. However, DISTINCT in the DB runs on primitives, while objects may need something more complex than a simple byte-to-byte comparison, so the plot might thicken.
This test passes easily.
[Test]
public void DistinctOnPrimitives()
{
var arr = new[] {1, 1, 2};
Assert.AreEqual(2, arr.Distinct().Count());
}
But this one fails:
public class A { public int B { get; set; } }
[Test]
public void NaiveDistinctOnObjects()
{
var arr = new[] { new A { B = 1 }, new A { B = 1 }, new A { B = 2 } };
Assert.AreEqual(2, arr.Distinct().Count());
}
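Why does it fail? With no comparer supplied, Distinct falls back to EqualityComparer<T>.Default, and for a class that overrides neither Equals nor GetHashCode that ends up comparing references. A minimal sketch of what that means (Plain and DefaultEqualityDemo are made-up names for illustration, not part of the tests above):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the A class above: no Equals/GetHashCode overrides.
public class Plain { public int B { get; set; } }

public static class DefaultEqualityDemo
{
    // Counts distinct items with the default comparer, as the parameterless Distinct() does.
    public static int CountDistinct(IEnumerable<Plain> items)
    {
        return items.Distinct().Count();
    }

    // Compares two separately constructed objects that carry equal property values.
    public static bool SameValuesAreEqual()
    {
        var first = new Plain { B = 1 };
        var second = new Plain { B = 1 };
        return EqualityComparer<Plain>.Default.Equals(first, second);
    }
}
```

Feeding CountDistinct the same three objects as the failing test gives 3 rather than 2, because every `new` produces a different reference.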
We could override the Equals and GetHashCode to make the test pass.
Btw, have ReSharper do it for you. Click “Generate” on the type and select “Equality Members”.


And BAM:
public class A
{
public int B { get; set; }
public bool Equals(A other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return other.B == B;
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != typeof (A)) return false;
return Equals((A) obj);
}
public override int GetHashCode()
{
return B;
}
}
And that’s one way to make the previous test pass. But what if we don’t want to implement equality? What if the object should normally be compared by ref, and not by value? What if the equality semantics for this object are not what’s required for this specific Distinction? What if we cannot change the class?
A lovely overload of Distinct accepts an IEqualityComparer<T> parameter. That’s a whole new type. So if we need to send an instance of a comparer, we need to both implement the comparison and instantiate it.
(Note that I’ve omitted a lot of required checks (nulls) for brevity)
public class AEqualityComparer : IEqualityComparer<Tests.A>
{
public bool Equals(Tests.A x, Tests.A y)
{
return x.B == y.B;
}
public int GetHashCode(Tests.A obj)
{
return obj.B;
}
}
And now we can have this passing test:
[Test]
public void EqualityComparerDistinctOnObjects()
{
var arr = new[] { new A { B = 1 }, new A { B = 1 }, new A { B = 2 } };
Assert.AreEqual(2, arr.Distinct(new AEqualityComparer()).Count());
}
But I have a distinct feeling (pun intended. Send your complaints here) that real production logic would call this Distinct(new AEqualityComparer()) a lot, and the instantiation is totally useless there.
<SingletonsAhead>
Fortunately, there’s a pattern to help us with that issue: we can have a single instance of the comparer to be used on every call. Since our comparer object is stateless, and is basically just an encapsulation of unchanging comparison logic, I have no problems actually using this pattern here.
I’m using a cool little Singleton<T> helper class which looks like this:
public abstract class Singleton<T>
{
public static T Instance { get { return Nested.InnerInstance; } }
private static class Nested
{
public static T InnerInstance { get; private set; }
static Nested()
{
InnerInstance = (T)Activator.CreateInstance(typeof (T), true);
}
}
}
(It’s lockless and lazy, so it’s a good enough implementation).
Now our comparer class looks like this:
public class AEqualityComparer : Singleton<AEqualityComparer>, IEqualityComparer<Tests.A>
{
protected AEqualityComparer() {}
public bool Equals(Tests.A x, Tests.A y)
{
return x.B == y.B;
}
public int GetHashCode(Tests.A obj)
{
return obj.B;
}
}
And the passing test looks like this:
[Test]
public void SingletonEqualityComparerDistinctOnObjects()
{
var arr = new[] { new A { B = 1 }, new A { B = 1 }, new A { B = 2 } };
Assert.AreEqual(2, arr.Distinct(AEqualityComparer.Instance).Count());
}
Which is, in my mind, HORRIBLE.
The actual comparison is twice removed from where it’s executed, and I had to create a new class for this pointless exercise.
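For what it’s worth, the helper base class isn’t strictly needed to get a single shared comparer: a plain static readonly field does a similar job. This is only a sketch of an alternative, not the helper above, and the names Item and ItemComparer are invented for the example. The runtime executes the field initializer once, in a thread-safe way, though as far as I know it may run somewhat earlier than first use, so it is a little less strictly lazy than the nested-class trick.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, standing in for the A class from the tests.
public class Item { public int B { get; set; } }

public sealed class ItemComparer : IEqualityComparer<Item>
{
    // Initialized once by the runtime; no locks needed in user code.
    public static readonly ItemComparer Instance = new ItemComparer();

    // Private constructor so callers have to go through Instance.
    private ItemComparer() { }

    public bool Equals(Item x, Item y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.B == y.B;
    }

    public int GetHashCode(Item obj)
    {
        return obj == null ? 0 : obj.B;
    }
}
```

The call site stays the same shape: arr.Distinct(ItemComparer.Instance). It is still a whole class per comparison rule, so the complaint above stands.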
</SingletonsAhead>
A much much cooler method to do this would be like so:
[Test]
public void DistinctByPropertyOnObjects()
{
var arr = new[] { new A { B = 1 }, new A { B = 1 }, new A { B = 2 } };
Assert.AreEqual(2, arr.DistinctByX(x => x.B).Count());
}
Or, if we have more than one property to compare (assuming A also had a second property, C), like this:
[Test]
public void DistinctByPropertyOnMoreComplexObjects()
{
var arr = new[] { new A { B = 1 }, new A { B = 1 }, new A { B = 2 } };
Assert.AreEqual(2, arr.DistinctByX(x => x.B, x => x.C).Count());
}
So how do we go about implementing that? Long story short – here’s the code for the extension method:
public static class EnumerableOfTExtensions
{
public static IEnumerable<T> DistinctByX<T>(this IEnumerable<T> source,
params Func<T, object>[] membersToCompare)
{
return source.Distinct(AdHocComparer<T>.For(membersToCompare));
}
private class AdHocComparer<T> : IEqualityComparer<T>
{
private readonly Func<T, object>[] _MembersToCompare;
private AdHocComparer(Func<T, object>[] membersToCompare)
{
_MembersToCompare = membersToCompare;
}
public static IEqualityComparer<T> For(Func<T, object>[] membersToCompare)
{
return new AdHocComparer<T>(membersToCompare);
}
public bool Equals(T x, T y)
{
return _MembersToCompare.All(c => c(x).Equals(c(y)));
}
public int GetHashCode(T obj)
{
return _MembersToCompare.Aggregate(0, (agg, c) => agg ^ c(obj).GetHashCode());
}
}
}
Now, it’s not perfect by any means: we’re still instantiating a new comparer on every call (need to cache those somehow, and use an efficient key for that cache), we have an additional overhead from the delegate access to the properties (plus boxing, since the delegates return object), we’re still not checking for nulls and all kinds of funky cases (need to write some more code), but by golly, that’s EXPRESSIVE, ain’t it?
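The missing null checks could look roughly like this. It is a sketch under my own assumptions rather than a finished implementation: NullSafeMemberComparer and Person are made-up names, and treating two null items (or two null member values) as equal is a choice, not a rule.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class with a member that can be null.
public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// Sketch of a null-tolerant variant of the ad hoc comparer.
public sealed class NullSafeMemberComparer<T> : IEqualityComparer<T>
{
    private readonly Func<T, object>[] _members;

    public NullSafeMemberComparer(params Func<T, object>[] members)
    {
        if (members == null) throw new ArgumentNullException("members");
        _members = members;
    }

    public bool Equals(T x, T y)
    {
        // Two nulls are equal; a null and a non-null are not.
        if (x == null || y == null) return x == null && y == null;
        // The static object.Equals(a, b) copes with null member values on either side.
        return _members.All(m => object.Equals(m(x), m(y)));
    }

    public int GetHashCode(T obj)
    {
        if (obj == null) return 0;
        // Null member values contribute 0 to the XOR, keeping the original hashing scheme.
        return _members.Aggregate(0, (agg, m) =>
        {
            var value = m(obj);
            return agg ^ (value == null ? 0 : value.GetHashCode());
        });
    }
}
```

Usage would be along the lines of people.Distinct(new NullSafeMemberComparer<Person>(p => p.Name, p => p.Age)).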
So after having my fun with the extension method – I’ve reverted my code to use the singleton-based comparer. It felt like the grownup thing to do. Sorry for the anti-climax.
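For completeness: a similar result can be had from the built-in operators alone, by grouping on the key and taking the first item of each group. A sketch, with a made-up Pair class that carries two properties (newer .NET versions, 6 and later, also ship an Enumerable.DistinctBy):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical two-property class; not the A from the tests above.
public class Pair
{
    public int B { get; set; }
    public int C { get; set; }
}

public static class GroupByDistinct
{
    // Keeps the first element seen for each (B, C) combination.
    public static List<Pair> DistinctByBAndC(IEnumerable<Pair> source)
    {
        return source
            // Anonymous types get value-based Equals/GetHashCode from the compiler.
            .GroupBy(p => new { p.B, p.C })
            .Select(g => g.First())
            .ToList();
    }
}
```

No comparer class is needed, at the price of building the groups in memory before anything is returned.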
Yesterday at the Alt.Net tools night I gave a short talk about how to get started with NHibernate. “Getting started” means we’re talking about a greenfield project, heavily favoring Conventions (and automappings) over Configuration, and not having to mess with an existing, untouchable codebase or DB schema.
The weird part was sitting there, having used NH, a 5-year-old project, for about a week, while across the table sat Ayende, one of the main contributors to NH (and author of NHProf). Most of the time he wasn’t throwing heavy items at me, so I called that session a success.
The goal of the demo was to make this integration test pass:
[Test]
public void SaveAndLoadEntity_AssertSavedValueIsSame()
{
// Arrange
var instance = new Saver();
var entity = new MyEntity();
entity.Val = 34;
instance.Save(entity);
// Act
var retVal = instance.Load();
// Assert
Assert.AreEqual(34, retVal.Val);
}
(The test was created in exactly three seconds using QuickUnit. Try it out.)
I won’t re-run all the “run test, fix error” iterations here, just review the concepts and show the outcome.
DB config Issues:
- We want the test to pass as easily as possible, have no side effects, and require minimal setup costs, so we’ve used an SQLite in-memory DB, via System.Data.SQLite (free). The single DLL is both the ADO.NET provider and the actual DB.
- The standard in-memory SQLite configuration discards the DB whenever its connection closes (that is, after every session), so we needed a different connection string.
- Since we’re creating a new DB for every test, we need to create the DB schema every time, using the SchemaExport tool.
- Make sure the reference to System.Data.SQLite has the “Copy local = true” flag (as it is not a completely managed DLL, and does not default to true).
- Also watch out for 64-bit issues and .NET Framework 4.0 issues with System.Data.SQLite.
Mapping issues:
- We desperately want to use FluentNHibernate’s automapping, so we provide an assembly to scan for entities, and a minimal adaptation of the DefaultAutomappingConfiguration object (I’ve chosen to use attributes to mark persisted types. Inheritance can do just the same).
- Every entity needs a primary key in an RDBMS, so we’ve added the Id property to MyEntity.
- NH wants to provide you with lazy loading for your data, so it needs to override your properties (creating dynamic proxies at runtime), so you need to make them virtual (and add a reference to the NHibernate.ByteCode.Castle assembly).
NHibernate usage issues:
- An ISession object is lightweight (and can be thought of as “Cache Scope”). Create one when it’s a logical thing to do.
- An ISessionFactory object is VERY heavyweight. Create and save it.
- Always open (and commit) a transaction (saving OR loading).
- The three main query mechanisms for NH are HQL, ICriteria and Linq2NH. I’d find it silly not to use LINQ to express my queries in 2010, so I went with Linq2NH: for NH 2.1.2 (only) you need to add NHibernate.Linq.dll, while NH v3.0 incorporates that syntax in the core.
Test issues:
- Make sure you add the required references above to the tests project, so that NHibernate can, at runtime, use all those assemblies required.
1: using System;
2: using FluentNHibernate.Automapping;
3: using FluentNHibernate.Cfg;
4: using FluentNHibernate.Cfg.Db;
5: using NHibernate;
6: using System.Linq;
7: using NHibernate.Cfg;
8: using NHibernate.Linq;
9: using NHibernate.Tool.hbm2ddl;
10:
11: namespace NHStart
12: {
13: public class Saver
14: {
15: public void Save(MyEntity entity)
16: {
17: using (var session = GetSession())
18: using(var tx = session.BeginTransaction())
19: {
20: session.Save(entity);
21:
22: tx.Commit();
23: }
24: }
25:
26: private ISession GetSession()
27: {
28: var sessionFactory = Factory;
29: return sessionFactory.OpenSession();
30: }
31:
32: protected ISessionFactory _Factory;
33: protected ISessionFactory Factory
34: {
35: get
36: {
37: if (_Factory == null)
38: {
39: Configuration config = null;
40:
41: _Factory =
42: Fluently
43: .Configure()
44: .Database(SQLiteConfiguration.Standard.ConnectionString(x => x.Is("Data Source=:memory:;Version=3;New=True;Pooling=True;Max Pool Size=1")))
45: .Mappings(x=>x.AutoMappings.Add(AutoMap
46: .Assemblies(new MyAutomappingConfiguration(), typeof(MyEntity).Assembly)))
47: .ExposeConfiguration(x=>config = x)
48: .BuildSessionFactory();
49:
50: using (var session = _Factory.OpenSession())
51: {
52: new SchemaExport(config).Execute(false, true, false, session.Connection, Console.Out);
53: }
54:
55: }
56: return _Factory;
57: }
58: }
59:
60: public MyEntity Load()
61: {
62: using (var session = GetSession())
63: using (var tx = session.BeginTransaction())
64: {
65: var item = session
66: .Linq<MyEntity>()
67: .First();
68:
69: tx.Commit();
70:
71: return item;
72: }
73: }
74: }
75:
76: public class MyAutomappingConfiguration : DefaultAutomappingConfiguration
77: {
78: public override bool ShouldMap(Type type)
79: {
80: return type.GetCustomAttributes(true).OfType<PersistedAttribute>().Any();
81: }
82: }
83:
84: [Persisted]
85: public class MyEntity
86: {
87: public virtual int Id { get; set; }
88: public virtual int Val { get; set; }
89: public virtual string Val2 { get; set; }
90: }
91:
92: public class PersistedAttribute : Attribute {}
93: }
I’ve even added line numbers to show how 93 lines of code can create a (simple) persistence layer. It was fun.
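One caveat on the listing: the Factory property builds the session factory lazily but without any locking, and since _Factory is an instance field, every new Saver builds its own. A sketch of a process-wide, thread-safe variant using Lazy<T>, with a stand-in type instead of the real ISessionFactory so that it runs on its own (ExpensiveFactory and FactoryHolder are made-up names):

```csharp
using System;
using System.Threading;

// Stand-in for an expensive-to-build object such as ISessionFactory (hypothetical type).
public sealed class ExpensiveFactory
{
    private static int _buildCount;

    public static int BuildCount { get { return _buildCount; } }

    public ExpensiveFactory()
    {
        // Count constructions so the "built once" behavior can be checked.
        Interlocked.Increment(ref _buildCount);
    }
}

public static class FactoryHolder
{
    // Lazy<T> (available since .NET 4.0) defaults to a thread-safe mode
    // in which the value factory runs at most once.
    private static readonly Lazy<ExpensiveFactory> _factory =
        new Lazy<ExpensiveFactory>(() => new ExpensiveFactory());

    public static ExpensiveFactory Factory
    {
        get { return _factory.Value; }
    }
}
```

In the real thing, the lambda would hold the Fluently.Configure()... chain and the schema export from the listing.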