Longest Common Prefix with C# and LINQ

10/05/2013

one comment

I recently can across an interesting problem:  Given a set of n directories, find the most nested directory that is an ancestor of all of them.  This is equivalent to finding the longest common prefix. For example, if I have the following paths:

/dir1/dir2/dir3
/dir1/dir2/dir4

Then the longest common prefix is just /dir1/dir2.  If we now add /dir1/dir5 to the set, we get:

/dir1/dir2/dir3
/dir1/dir2/dir4
/dir1/dir5

Then the longest common prefix changes to /dir1.  In order to do this, I start by comparing the first and second directories by splitting them on the / symbol, and then comparing the elements of the resulting arrays.  The common elements are the components of the common prefix, and the common prefix ends at the index where the elements in the arrays are different.  I then take this common prefix, compare it with the next directory of the set and repeat.

For example, for /dir1/dir2/dir3 and /dir1/dir2/dir4 above:

image

OK, so nothing overly complex there.  There are probably better algorithms for doing this, but the code is not performance-critical and I have more important things to spend my time on. 

Here is the C# code for implementing the algorithm:

public string GetLongestCommonPrefix(string[] directories)
{
    if (directories == null || directories.Length == 0)
        return String.Empty;

    const char SEPARATOR = '/';

    IList<string> directoryParts = new List<string>(
        directories[0].Split(SEPARATOR));

    for (int index = 1; index < directories.Length; index++)
    {
        IList<string> first = directoryParts;
        string[] second = directories[index].Split(SEPARATOR);
        int maxPrefixLength = Math.Min(first.Count, second.Length);
        var tempDirectoryParts = new List<string>(maxPrefixLength);

        for (int part = 0; part < maxPrefixLength; part++)
        {
            if (first[part] == second[part])
                tempDirectoryParts.Add(first[part]);
        }
        directoryParts = tempDirectoryParts;
    }

    return String.Join(SEPARATOR.ToString(), directoryParts);
}

This works and gets the job done.  But it’s ugly.  In particular:

  • Where’s the algorithm?  All I see is a bunch of assignments and loops.  It’s really difficult to tell what’s going on here and see the big picture.
  • With so many loops and indices and assignments, it’s practically guaranteed that you’ll get lost and have off-by-1 errors and other such fun bugs.  I actually needed tests (plural) for this ‘simple’ piece of code.

Let’s compare this with an implementation based on LINQ (which in essence is an implementation based on the functional programming paradigm):

 

public string GetLongestCommonPrefix(string[] directories)
{

    if (directories == null || directories.Length == 0)
        return String.Empty;

    const char SEPARATOR = '/';


    string[] prefixParts =
        directories.Select(dir => dir.Split(SEPARATOR))
        .Aggregate(
            (first, second) => first.Zip(second, (a, b) => 
                                    new { First = a, Second = b })
                                .TakeWhile(pair => pair.First.Equals(pair.Second))
                                .Select(pair => pair.First)
                                .ToArray()
        );

    return string.Join(SEPARATOR.ToString(), prefixParts);
}

 

MUCH nicer.  Note how I have no state management of any kind (no array indices, temporary lists, etc.), which means that the chances I’ll miss an edge case are much slimmer.  The code is just function composition, the kind that programmers do all day, every day.  Also note that the algorithm discussed previously is pretty much right in your face.  We’re telling the compiler WHAT we want done rather then HOW to do it.  Granted, you need to know the LINQ syntax and methods to understand what’s going on.  But LINQ has been around long enough so that many people are familiar with it and there are plenty of examples available. 

But you know what’s really cool about this code?  The fact that once it compiles it Just Works.  So once the compiler and I finished arguing about the types, all of my tests passed without a hitch.  That’s impressive.

I look forward to giving my C# code a functional touch!

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>

*

one comment

  1. Steven Gravitz06/07/2013 ב 19:01

    I enjoyed firing up VStudio today and playing around with your linq expression. Unfortunately, at work, we’re still stuck on 3.5, so no “Zip” method is available.

    Ever since discovering Linq 5 years ago, it has totally transformed the way we develop .net apps, and javascript as well. Underscore.js is an open source library that brings a Linq approach to working with javascript arrays and objects. Check it out if you do any js development.

    Thanks for posting this — I’m really glad I ran across it.

    -Steve

    Reply