.NET Geek

"It is upon the Trunk that a gentleman works" - Confucius
CodeRush Plugin – Navigate to Implementation

EDIT: As of CodeRush version 9.3.2, functionality similar to what this plugin provides is included in the core product, so support and builds for this plugin are discontinued. If you need assistance with an earlier version of CodeRush, just leave a comment and I'll get back to you.

Instead of writing a new post every time a new build of CodeRush (and the plugin) becomes available, I’ll post the binaries here by updating this post. If you are interested in the source code, you can get it here. If you want a walk-through of how the plugin was built, you can find the first post in a series here.

Plugin binaries built against CodeRush version:

9.2.4
9.2.8
9.2.9

To install the plugin, copy the binaries to: C:\Users\{your user name}\Documents\DevExpress\IDE Tools\Community\PlugIns

Calling the Google Closure Javascript Compiler - Code

Here’s a follow-up to the last post that contained a few screenshots showing the integration of the Google Closure Javascript compiler in Visual Studio.

Since I said in the last post that I would make the source available if there was interest, I’m doing that now.

The project structure is as follows:

solution struct

All of the code for accessing the compilation service is in the GoogleClosure project. Initially I didn’t have a Winform project. I split the projects for this post, realizing that some people interested in the code might not have CodeRush.

The GoogleClosure project also contains the UI in a user control. As such, no code had to be written in either the VS plugin or the Winform application. Placing the user control from the GoogleClosure project is enough.

About the code: The code was written in brain-storming mode. As such, I won’t stand trial for any best practice not followed or any shortcut taken. ;-)
Seriously though, there’s not a lot of code here. In essence, the Compilation namespace in the GoogleClosure project contains classes that map to the compiler service results and options. The ClosureCompilerService class does all the communication with the service. The compiler service can return its results as XML, JSON or text. I implemented parsing of the XML results only, which is simple in .Net using Linq2Xml.

P.S (The “CR_” naming convention is common for CodeRush plugins.)

You can download the source code here.

Integrating Google Javascript Compiler in Visual Studio

On Friday, Google released some of their Javascript tools, among them the Closure Javascript compiler. Being excited that Google had released Closure (the compiler), I decided to give it a go. So late Saturday night, after the party was over and the kids were sleeping, I played a little with the online UI and saw that the compiler is exposed through a REST web service. At that point it would have been a crime to go to sleep.

2 hours later…

In this screenshot you can see some compiler statistics. (Yes, you can use the compiler service as a pretty-printer or minifier)

stats

Dubious code is flagged with warnings. (I like the “Is there a bug?” message)

warning

If you get the script plain wrong, you’ll be greeted with an error.

error

I love the fact that it is possible to whip up something like this in just a couple of hours.

The Visual Studio Integration was built as a CodeRush plugin (which takes a full 5 minutes). The rest of the time was spent reading the Closure web service spec and parsing the results from the web service.

It’s not all good however. In the current implementation you need to copy the source from the editor to the code pane in the tool window which isn’t a great user experience. Also, there’s no easy way to “jump” to the error location when there is a warning or an error. There is also no UI for setting the compiler options.

Future Plans: (Maybe)

  1. Compile the enclosing scope under the caret by pressing a key combination.
  2. Integrate with Visual Studio Error tool window. (I have no idea how to do that)
  3. Expose the compiler options through an option page. (Currently I use some decent defaults)
  4. Any other ideas?

If there’s interest, I’m considering writing up a small series of how the plugin was built including source and binaries. (It’s small enough that it won’t take all my spare time)

Update: Code can be found here - http://blogs.microsoft.co.il/blogs/kim/archive/2009/11/10/calling-the-google-closure-javascript-compiler-code.aspx

My Favorite SQL Server 2k8 Feature

Since I'm typing this on my new Eee 1101HA I'm going to keep it short. (I hate these mini keyboards)

Keeping an eye on our production SQL Server box is an integral part of my daily routine. I would not categorize myself as a DBA, as I don't do much administrative work related to our databases. I deal mainly with issues related to development and with making sure our system runs efficiently. One of those tasks is making sure that our database is properly indexed, and maintaining the indexes as the database grows. Anyone who has done any significant work against a database knows the importance of proper indexing, and that proper indexing is a balancing act.
Clustered, non clustered, covering, fill factors and page splits are just a tiny subset of the things that you have to keep on top of.

My absolute favorite feature in SQL Server 2008 is filtered indexes. In a nutshell, you create an index with a filter (a WHERE clause on the index definition) that determines which rows go into the index and which do not.
Let's say you have a table with some bit flags. The selectivity on these columns might be too low to be of any real use to even the smartest query optimizer, and you'll often see scans when filtering on these bit flags.
In some of our tables we have bit flags where the vast majority of the records have the flag set to false. In these scenarios filtered indexes just shine. After we created a filtered index on only the true values of the bit flag, queries that used to take minutes to execute dropped to a few milliseconds.

For more information and some important issues related to maintaining filtered indexes, you should check out the following excellent posts.
http://www.sqlskills.com/BLOGS/KIMBERLY/post/Filtered-indexes-and-filtered-stats-might-become-seriously-out-of-date.aspx
http://blogs.technet.com/josebda/archive/2009/03/17/indexing-best-practices-for-sql-server-2008.aspx

CodeRush plugin "Navigate to Implementation" - binaries for CodeRush 9.2.4

For convenience, here are the binaries for the Navigate to Implementation CodeRush plugin. It is built against CodeRush 9.2.4. A detailed overview of the functionality of the plugin can be found here. The source code is available on Google Code: http://dxcorecommunityplugins.googlecode.com/svn/trunk/CR_NavigationContrib

 

Importing Large Xml Files to SQL Server Using SqlBulkCopy

Say you have a large Xml file that contains relatively tabular data that you want to import into SQL Server. There are several ways to go about this. Let’s look at a couple of options.

  1. Load the file into an XDocument. Extract elements from the DOM using Linq and then use ADO.Net to insert the data into the database.
  2. Load the data into a DataSet using ReadXml and save the data to the database
  3. Read through the data using an Xml reader and save each record to the database

Options 1 and 2 require that we load the entire document into memory before processing. We will work under the assumption that the files we receive are too large to load into memory. That leaves us with option 3, which allows us to read the file in a fast, forward-only mode where we only hold a small portion of the file in memory. Once an item has been processed and the reader moves forward, the previously read data is no longer available to the reader. The main bottleneck now is how to push the data into the database in a really efficient way. First of all, let’s look at some sample data.

<?xml version="1.0" encoding="utf-8"?>
<lab_results>
  <result type="A01" origin="xb102">
    <name>aaa</name>
    <description>sample description1</description>
  </result>
  <result type="A02" origin="xb103">
    <name>bbb</name>
    <description>sample description2</description>
  </result>
</lab_results> 

The files we are about to process contain lab results. Each file will contain somewhere between 5 and 8 million result elements. Each result element will contain approximately 150 characters. 150 characters * 2 bytes (Unicode) * 5 million = about 1.5GB per file. Obviously we can’t read that into memory all at once. Another detail is that the file contains type and origin codes which need to be mapped to their appropriate values in the database.

Here’s the sample table structure that will have to hold the data we import.

lab results

LabResultOrigin and LabResultType are standard lookup tables. Here’s some sample data so we can conceptualize the conversions we will need to make later.

LabResultType

TypeId  Description
1       A01
2       A02
3       B01

LabResultOrigin

OriginId  Description
1         xb102
2         xb103
3         xz101

For example, when we process the Xml file we will need to convert the textual result type “A01” to the TypeId 1.

Importing the data

When importing using SqlBulkCopy there are two ways to feed it data. One is to use a DataTable which we discarded above because of the size of the files. Another option is to use a DataReader. The problem is that there is no class in .Net that reads xml and implements the IDataReader interface.

That means we will have to roll our own. Looking at the IDataReader interface it seems we will have to implement a boatload of methods. Or do we?
Here’s a neat little trick that was mentioned in passing in a comment by “jezemine” in this blog post.

  1. Create a class that implements IDataReader.
  2. Press Alt+Shift+F10 to implement the interface members (with a throw NotImplementedException)
  3. Pass your class to SqlBulkCopy. It will break on a NotImplementedException of course.
  4. Go ahead and implement the method it crashed on.
  5. Repeat steps #3 and #4 until all required methods have been implemented.

Only 3 out of more than 20 members of IDataReader are actually used by SqlBulkCopy in this scenario: Read(), FieldCount and GetValue(). Not too bad.

We will create an XmlDataReader class that implements IDataReader so that we can pass it to SqlBulkCopy. The calling code will look something like the following:

using (XmlTextReader xmlTextReader = new XmlTextReader(message.FileName))
using (SqlBulkCopy sqlBulkCopy = new SqlBulkCopy(connectionString, SqlBulkCopyOptions.TableLock))
{
    SetupBulkCopy(sqlBulkCopy);
 
    var reader = new LabResultDataReader(xmlTextReader);
    sqlBulkCopy.WriteToServer(reader);
}

As you can see, there is no mention of an XmlDataReader, but rather a LabResultDataReader. I decided to split the reader into two distinct pieces. One is a general purpose XmlDataReader that can be used for other xml structures and not only the one in the sample. A special purpose LabResultDataReader is derived from XmlDataReader and this class knows how to handle the structure of the lab result xml file.

In order to tell SqlBulkCopy how to map between the data in the file and the columns in the database we need some plumbing as seen in the call to SetupBulkCopy().

private static void SetupBulkCopy(SqlBulkCopy sqlBulkCopy)
{
    sqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(0, "TypeId"));
    sqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(1, "OriginId"));
    sqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(2, "ResultName"));
    sqlBulkCopy.ColumnMappings.Add(new SqlBulkCopyColumnMapping(3, "Description"));
 
    sqlBulkCopy.DestinationTableName = "LabResult";
}

Note that we didn’t include the LabResultId column in the mapping above. The reason for this is that the LabResultId is an Identity column and is generated automatically by SQL Server.

The general purpose XmlDataReader is as follows:

public abstract class XmlDataReader : IDataReader
{
    private readonly string m_rowElementName;
 
    private readonly XmlReader m_xmlReader;
    private readonly int m_fieldCount = -1;
 
    private bool m_disposed;
 
    protected IEnumerator<XElement> m_enumerator;
 
    public abstract object GetValue(int i);
 
    /// <summary>
    /// Initialize the XmlDataReader. After initialization call Read() to move the reader forward.
    /// </summary>
    /// <param name="xmlReader">XmlReader used to iterate the data. Will be disposed when done.</param>
    /// <param name="fieldCount">IDataReader FieldCount.</param>
    /// <param name="rowElementName">Name of the XML element that contains row data</param>
    public XmlDataReader(XmlReader xmlReader, int fieldCount, string rowElementName)
    {
        m_rowElementName = rowElementName;
        m_fieldCount = fieldCount;
        m_xmlReader = xmlReader;
        m_enumerator = GetXmlStream().GetEnumerator();
    }
 
    public bool Read()
    {
        return m_enumerator.MoveNext();
    }
 
    public int FieldCount
    {
        get { return m_fieldCount; }
    }
 
    public XElement CurrentElement
    {
        get { return m_enumerator.Current; }
    }
 
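    /// <summary>
    /// Streams the row elements one at a time. See
    /// http://msdn.microsoft.com/en-us/library/system.xml.linq.xstreamingelement.aspx
    /// </summary>
    /// <returns>The row elements in document order.</returns>
    private IEnumerable<XElement> GetXmlStream()
    {
        using (m_xmlReader)
        {
            m_xmlReader.MoveToContent();
 
            // XElement.ReadFrom leaves the reader on the node that follows the
            // element it consumed, so only advance with Read() when the current
            // node is not a row. Calling Read() unconditionally would skip a row
            // whenever two row elements are not separated by whitespace.
            while (!m_xmlReader.EOF)
            {
                if (IsRowElement())
                {
                    XElement rowElement = XElement.ReadFrom(m_xmlReader) as XElement;
                    if (rowElement != null)
                    {
                        yield return rowElement;
                    }
                }
                else
                {
                    m_xmlReader.Read();
                }
            }
        }
    }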
 
    private bool IsRowElement()
    {
        if (m_xmlReader.NodeType != XmlNodeType.Element)
            return false;
 
        return m_xmlReader.Name == m_rowElementName;
    }
 
    protected virtual void Dispose()
    {
        if (m_disposed)
            return;
 
        m_enumerator.Dispose();
        m_disposed = true;
    }
 
 
    #region Members not required by SqlBulkCopy
 
    #region IDataReader Members
 
 
 
    public bool NextResult()
    {
        throw new NotImplementedException();
    }
 
    public int RecordsAffected
    {
        get { throw new NotImplementedException(); }
    }
 
    public string GetDataTypeName(int i)
    {
        throw new NotImplementedException();
    }
 
    // Deleted tons of methods not required...
 
    #endregion
 
    #endregion
}

Now let’s have a look at the special purpose LabResult reader.

public class LabResultDataReader : XmlDataReader
{
    private const string XmlTagRow = "result";
 
    // Named ColumnCount so it does not hide the inherited FieldCount property.
    private const int ColumnCount = 4;
    private const int InvalidItemId = -1;
 
    public LabResultDataReader(XmlReader xmlReader)
        : base(xmlReader, ColumnCount, XmlTagRow){ }
 
    public override object GetValue(int i)
    {
        switch (i)
        {
            case 0:
                return CurrentElement.Attribute("type").EnumFromValue<ResultType>();
            case 1:
                return CurrentElement.Attribute("origin").EnumFromValue<Origin>();
            case 2:
                return CurrentElement.Element("name").Value;
            case 3:
                return CurrentElement.Element("description").Value;
            default:
                throw new InvalidOperationException("Column count mismatch.");
        }
    }
}

There are two main features of the XmlDataReader. The first is that it flattens the hierarchical Xml structure to resemble a row. For each iteration of the enumerator, it extracts the <result> element and all its children. The special purpose reader (in our example the LabResultDataReader) maps the nested <result> tag to a set of column values. The second significant feature is the streaming nature of the reader. We never hold more than a single <result> element in memory at any given time.
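To make the streaming behaviour concrete, here is a minimal, self-contained sketch that is separate from the classes in this post. The names StreamingDemo and StreamElements are made up for the illustration, and the sample Xml is deliberately written without whitespace between elements. One detail worth noting: XNode.ReadFrom leaves the reader positioned on the node after the element it consumed, so the loop only calls Read() when the current node is not a row element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class StreamingDemo
{
    // Hypothetical helper: yields one row element at a time, so only a
    // single <result> element is materialized in memory at any moment.
    public static IEnumerable<XElement> StreamElements(XmlReader reader, string rowElementName)
    {
        reader.MoveToContent();
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == rowElementName)
            {
                // ReadFrom consumes the whole element and advances the reader past it.
                yield return (XElement)XNode.ReadFrom(reader);
            }
            else
            {
                reader.Read();
            }
        }
    }

    public static void Main()
    {
        const string xml =
            "<lab_results>" +
            "<result type=\"A01\" origin=\"xb102\"><name>aaa</name><description>sample description1</description></result>" +
            "<result type=\"A02\" origin=\"xb103\"><name>bbb</name><description>sample description2</description></result>" +
            "</lab_results>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            foreach (XElement row in StreamElements(reader, "result"))
            {
                Console.WriteLine("{0} {1} {2}",
                    (string)row.Attribute("type"),
                    (string)row.Attribute("origin"),
                    (string)row.Element("name"));
            }
        }
    }
}
```

For a real file you would pass an XmlReader created over the file instead of a StringReader; the loop itself stays the same.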

If you looked at the LabResultDataReader code and wondered what those .EnumFromValue<> methods on the XAttribute objects are, well spotted. They are just extension methods that I added for convenience.

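public static T EnumFromValue<T>(this XAttribute attribute)
{
    // XElement.Attribute() returns null when the attribute is missing.
    if (attribute == null)
        return default(T);
 
    string value = attribute.Value;
    if (string.IsNullOrEmpty(value))
        return default(T);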
 
    try
    {
        T converted = (T)Enum.Parse(typeof(T), value, true);
        return converted;
    }
    catch (Exception)
    {
        return default(T);
    }
}
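The ResultType and Origin enums are not shown in this post. The sketch below is an assumption about what such an enum could look like: member names mirror the codes in the Xml file and the numeric values mirror the TypeId column of the LabResultType lookup table shown earlier. EnumMappingDemo, ToTypeId and the Unknown member are hypothetical names added for the illustration. It uses Enum.TryParse, which requires .Net 4.0 or later, instead of the try/catch around Enum.Parse.

```csharp
using System;

// Hypothetical enum: names match the codes in the file,
// values match TypeId in the LabResultType lookup table.
public enum ResultType
{
    Unknown = 0,
    A01 = 1,
    A02 = 2,
    B01 = 3
}

public static class EnumMappingDemo
{
    // Converts a textual code such as "A01" to its numeric id.
    // Unknown or missing codes map to 0 (ResultType.Unknown).
    public static int ToTypeId(string code)
    {
        ResultType parsed;
        // TryParse also accepts numeric strings, so IsDefined guards
        // against values that are not declared members of the enum.
        if (Enum.TryParse(code, true, out parsed) && Enum.IsDefined(typeof(ResultType), parsed))
            return (int)parsed;

        return (int)ResultType.Unknown;
    }

    public static void Main()
    {
        Console.WriteLine(ToTypeId("A01"));
        Console.WriteLine(ToTypeId("b01"));
        Console.WriteLine(ToTypeId("ZZZ"));
    }
}
```

An Origin enum could follow the same pattern, since codes like xb102 are valid C# identifiers. The trade-off is that the enum has to be kept in sync with the lookup tables by hand.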

As you can see, even though this turned into a not so short post, there’s not that much code required for this to work. Hopefully this will give you enough to assist you in building your own high performance ETL solution.

The proposed solution has a few drawbacks. I have only had to process a very limited set of files. If many files with different schemas had to be parsed and processed, the design of having a dedicated derived class for each type of file could become a pain point. There is also no support for parsing files with a schema that is not known at compile time. This implies that if the schema changes or a new type of file needs to be imported, a new build is needed.

Suggestions, comments, improvements etc. are welcome.

Multi-Threading is Hard – Who do you trust?

Something has been bothering me for quite some time and I am frankly not entirely sure what to do.

I have been working on a multi-threaded server application for over a year and feel that I have a decent grasp on multi-threading. More importantly, I have learned to respect the complexity.

So what is bothering me? Over the last few months I have encountered a number of blog posts that propose some solution with code that is multi-threaded.

The problem – It’s Wrong!

Sometimes I send an email to the author, but for the most part the posts are never updated. That leaves the potential for other developers to take code from a source they trust and use it, only to find out down the road that it is broken, maybe after a year in production when the volume strains the system. You might claim that you cannot trust anyone, but multi-threading is complex enough that it is tempting to grab something from someone you otherwise believe knows their stuff. The problem I see most often is a lack of understanding of memory barriers and volatility. These are basic concepts if you do any kind of multi-threading, so I am mostly surprised.

Besides sending an email to the author I am not going to start to blog about bad sources for multi-threading.

So where do you go?

Don’t trust code you find online. If you don’t understand it, don’t use it. (assuming it is not copyrighted and that you are allowed to grab it)

Do you know of any good sources for multi-threaded programming?

A Must Read - Release It!

I just finished reading Release It! by Michael Nygard. The book deals with the topic of having software in production. Over the years I've been on quite a few projects from the requirements phase through development and eventually production. (No, not all of them reached production)

I can't give enough compliments about the book. The writing style is a brilliant mix of development related issues and "war" stories from Michael's own experiences. It covers both anti-patterns and patterns required for successful survival in a production environment.

From the editorial review:

In Release It!, Michael T. Nygard shows you how to design and
architect your application for the harsh realities it will face.
You'll learn how to design your application for maximum uptime,
performance, and return on investment.

Nitpicker's corner: The only issue I found less than optimal was the placement of the sidebars. The sidebars are sometimes located such that I either had to interrupt my reading flow to read them, or go back a page later.

 

Posted: May 30 2009, 08:21 PM by Kim | with no comments
SqlBulkCopy Bug Workaround

We are using SqlBulkCopy to import large xml documents into a database. However, we encountered a “minor” bug in the .Net Framework related to table naming. It turns out that if you have a dot “.” in the table name, SqlBulkCopy doesn’t work. The problem has been reported and a KB article is available, but without a workaround besides renaming the table. Our problem wasn’t with the actual name of the table(s), but with the name of the schema. The schema naming convention used in that specific database is [CompanyName.Project].TableName.

The following code fails with the exception below. (Note the assignment to DestinationTableName.)

IDataReader reader = new XmlDataStreamer(/* other stuff here */);
using (SqlBulkCopy sqlBulkCopy = new SqlBulkCopy("Server=(local);Database=ScratchDb;Trusted_Connection=True;", SqlBulkCopyOptions.TableLock))
{
    sqlBulkCopy.DestinationTableName = "[SchemaPart1.Part2].ImportTable";
    sqlBulkCopy.WriteToServer(reader);
}

SqlBulkException

For the sake of argument, let’s just say that it is beyond our control to change the naming of the schema. After Googling around for a solution, it turned out that more than a few people had the same problem. Now what do we do?

My first thought was to try to figure out if there’s a way to manipulate .Net into accepting the table name. After spending some quality time with Reflector it was time to take a walk and think about something else. I discussed the issue with one of the other developers on the team. While describing the issue a solution took form.

If I can’t change the .Net Framework and I can’t change the schema/table name, maybe I can somehow disguise the table name. Being inspired by dynamic languages lately - If it looks like a table and behaves like a table it must be a table. (Not really, but maybe .Net won’t notice the difference)

Solution: Use a view!

  • In the database create a new schema without dots in the name.
  • Create a view over the import table with a one-to-one mapping to the table.

That leaves us with a view named: [CompanyNameProjectImport].ImportTableView

The following works:

IDataReader reader = new XmlDataStreamer(/* other stuff here */);
using (SqlBulkCopy sqlBulkCopy = new SqlBulkCopy("Server=(local);Database=ScratchDb;Trusted_Connection=True;", SqlBulkCopyOptions.TableLock))
{
    sqlBulkCopy.DestinationTableName = "[CompanyNameProjectImport].ImportTableView";
    sqlBulkCopy.WriteToServer(reader);
}
How Many Rows in a Table?

Have you ever wondered how many rows your SQL Server database tables contain?

A few times now I've wanted to grab a list of table names with the row count for each table. In a post about a few undocumented Stored Procedures I came across this neat script.

EXEC
sp_MSforeachtable 'SELECT ''?'', Count(*) as NumberOfRows FROM ?'

It uses an undocumented stored procedure that iterates over all user tables in the database. The only issue I had with it was that the output was really tedious to read. I wanted a simple tabular display showing me a list of tables with their name and number of rows, sorted in descending order by the number of rows. Note that you can also get this information in Management Studio.

Here's a short script that does that...

if (object_id('tempdb.dbo.#TableRowCount') is not null)
    drop table #TableRowCount
go

create table #TableRowCount
(
    Id int identity(1,1) NOT NULL primary key
   ,TableName nvarchar(100)
   ,RowsInTable int
)

insert into #TableRowCount (TableName, RowsInTable)
    exec sp_MSforeachtable 'SELECT ''?'', Count(*) as NumberOfRows FROM ?'

select * from #TableRowCount order by RowsInTable desc

Alternatively, you can use the following script (which is what Management Studio uses):

select
    [name] AS TableName
   ,TableRowCount = (
        select sum(spart.rows) from sys.partitions spart
        where spart.object_id = tbl.object_id and spart.index_id < 2)
from sys.tables tbl
order by TableRowCount desc

Which do you prefer? 

Posted: Apr 30 2009, 06:06 PM by Kim | with no comments
Solving an Expensive Database Lookup

We had an interesting problem the other day. In our database (SQL Server 2008) we have a few tables with possibly many millions of records. We send some of the data from these and related tables to a third party service for processing and get status reports back. The problem was that the reports that we get back cannot easily be correlated back to the original records in our database.

Let’s say we have a person table and an address table with a 1:m relationship between person and address.
The tables could look something like:

Capture

Together with the person and address data we send some additional data points for processing. The processing report contains the following data.

ProcessingResult, FirstName, LastName, AddressLine1, AddressLine2, City, State, Zip.

When we receive the processing report we need to correlate between the person record(s) and the processing result for that name and address. (Could you have two people with the same name at the same address?)

So what do you do if there are 10,000,000 person records with the same number of addresses?

I don’t know what your first thought is, but my first thought was that either we index the data in some way or this won’t work. You can’t do a table-scan on 10 million rows. From there the next thought that came to mind was to index on all the address columns to limit the resultset down to the persons with a given address. Essentially a covering index on Address. Doable, but there has to be a better way. My issue with this approach is that this index is expensive both in terms of space and IO.

Let’s rephrase the question:
“How can we create an alternative representation of Address that allows us to limit the number of matches?”
It doesn’t have to be exact. If the result would return, say 100 addresses, we could just do an exhaustive search on those and compare all columns with the data returned from the service.

Spoiler below…

public static string Hash(Address address)
{
    StringBuilder sb = new StringBuilder(150);
    sb.Append(address.AddressLine1);
    sb.Append(address.AddressLine2);
    sb.Append(address.City);
    sb.Append(address.Zip);
    sb.Append(address.State);
    return new Hasher(HashType.MD5).Hash(sb.ToString().ToLowerInvariant());
}

Create a condensed representation of Address and store that representation with an index.

By hashing the address object using MD5, we obtain a 32 character string representation of the address. To store this hashed representation we can add a column to the address table named AddressHash and create an index on that column. Now, when we receive the processing report, we hash the address in the report and do a lookup against the hash value stored in the database. We then take any matching records (possibly only one) and do a comparison between the found record and the address from the service.
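The two-step lookup described above (find candidates by hash, then compare every column) can be sketched in memory as follows. This is an illustration only: the Address class, AddressLookupDemo, SameAddress and Find are hypothetical names, an ILookup stands in for the indexed AddressHash column, and the hashing is inlined so the sample is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Hypothetical shape of an address record.
public class Address
{
    public string AddressLine1 { get; set; }
    public string AddressLine2 { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Zip { get; set; }
}

public static class AddressLookupDemo
{
    // Same idea as the Hash method above: concatenate, lower-case, MD5, hex.
    public static string Hash(Address a)
    {
        string key = string.Concat(a.AddressLine1, a.AddressLine2, a.City, a.Zip, a.State)
                           .ToLowerInvariant();
        using (MD5 md5 = MD5.Create())
        {
            byte[] digest = md5.ComputeHash(Encoding.Unicode.GetBytes(key));
            StringBuilder sb = new StringBuilder(digest.Length * 2);
            foreach (byte b in digest)
                sb.Append(b.ToString("X2")); // two hex digits per byte
            return sb.ToString();
        }
    }

    private static bool Same(string x, string y)
    {
        return string.Equals(x ?? string.Empty, y ?? string.Empty, StringComparison.OrdinalIgnoreCase);
    }

    // Exhaustive column-by-column comparison, used only on hash candidates.
    public static bool SameAddress(Address x, Address y)
    {
        return Same(x.AddressLine1, y.AddressLine1) && Same(x.AddressLine2, y.AddressLine2)
            && Same(x.City, y.City) && Same(x.State, y.State) && Same(x.Zip, y.Zip);
    }

    // Step 1: narrow down by hash. Step 2: verify each candidate.
    public static List<Address> Find(ILookup<string, Address> index, Address reported)
    {
        return index[Hash(reported)].Where(c => SameAddress(c, reported)).ToList();
    }

    public static void Main()
    {
        var stored = new List<Address>
        {
            new Address { AddressLine1 = "1 Main St", City = "Springfield", State = "IL", Zip = "62701" },
            new Address { AddressLine1 = "2 Oak Ave", City = "Portland", State = "OR", Zip = "97201" }
        };
        ILookup<string, Address> index = stored.ToLookup(a => Hash(a));

        var reported = new Address { AddressLine1 = "1 MAIN ST", City = "springfield", State = "IL", Zip = "62701" };
        Console.WriteLine(Find(index, reported).Count);
    }
}
```

The verification step matters for more than hash collisions: because the fields are concatenated without a separator, two different addresses can produce the same input string, and the full comparison filters those out.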

Here’s the Hasher class.

using System;
using System.Security.Cryptography;
using System.Text;
 
public enum HashType
{
    MD5,
    Sha1,
    Sha256,
    Sha384,
    Sha512,
}
 
public class Hasher
{
    private HashAlgorithm m_hasher;
 
    private readonly HashType DefaultHashType = HashType.MD5;
 
    /// <summary>
    /// Instantiates Hasher with the default HashType
    /// </summary>
    public Hasher()
    {
        m_hasher = GetHasher(DefaultHashType);
    }
 
    public Hasher(HashType hashType)
    {
        m_hasher = GetHasher(hashType);
    }
 
    /// <summary>
    /// Hash input and return the hashed result in a Hex string.
    /// </summary>
    /// <param name="input"></param>
    /// <returns>Hash of input, or an empty string if input is null or empty</returns>
    public string Hash(string input)
    {
        if (String.IsNullOrEmpty(input))
            return string.Empty;
 
        byte[] data = m_hasher.ComputeHash(Encoding.Unicode.GetBytes(input));
        return ToHex(data);
    }
 
    /// <summary>
    /// Verify that hashing input matches an existing hash value
    /// </summary>
    /// <param name="input">The string to check against its hash</param>
    /// <param name="hash"></param>
    /// <returns>Match result or false if either parameter is null or empty</returns>
    public bool IsHashMatch(string input, string hash)
    {
        if (String.IsNullOrEmpty(input) || String.IsNullOrEmpty(hash))
            return false;
 
        string inputHash = Hash(input);
        return string.Compare(inputHash, hash, StringComparison.OrdinalIgnoreCase) == 0;
    }
 
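    private string ToHex(byte[] data)
    {
        StringBuilder builder = new StringBuilder(data.Length * 2);
        for (int i = 0; i < data.Length; i++)
        {
            // "X2" pads each byte to two hex digits. Plain "X" drops leading
            // zeros, so the result would not always be 32 characters for MD5.
            builder.Append(data[i].ToString("X2"));
        }
        return builder.ToString();
    }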
 
    private static HashAlgorithm GetHasher(HashType hashType)
    {
        switch (hashType)
        {
            case HashType.MD5:
                return MD5.Create();
            case HashType.Sha1:
                return SHA1.Create();
            case HashType.Sha256:
                return SHA256.Create();
            case HashType.Sha384:
                return SHA384.Create();
            case HashType.Sha512:
                return SHA512.Create();
            default:
                throw new ArgumentOutOfRangeException("hashType", "Unknown HashType");
        }
    }
 
}

P.S While the Hex conversion of the hashed result is not an absolute requirement, we wanted to store the hash in a human readable format.

How would you solve this problem to achieve better performance? (Storage is less important)

Happy Coding!

Posted: Apr 26 2009, 10:57 PM by Kim | with no comments
Adapters and Functional Abstraction

I can’t decide what I think about the following implementation so I decided to throw it out here. Any thoughts are welcome.

In a previous post I wrote about our wrapper around Bouncy Castle PGP encryption. It turned out that it worked pretty well except for one little problem. The PGP implementation on the receiving end would complain about our signature. In despair I even posted (and later answered) a question on StackOverflow. :-)

The problem was that they use an old version of a PGP library that couldn’t deal with our newer v4 signatures. Oh, forget it, I’m not going to bore you with the details…

I looked at the code for creating the object that does the signing and it returns an instance of a PgpSignatureGenerator (A Bouncy Castle class). Instead of using this signature generator we needed to use a PgpV3SignatureGenerator (Also a Bouncy Castle class). It was no big deal to change the code to instantiate and return a different type. However, we wanted to be able to return the v4 signature generator as well. Since both PgpSignatureGenerator and PgpV3SignatureGenerator are concrete implementations without a common base class we could not just return a common abstraction. Ok, still not a big deal. We can just create a wrapper (adapter) around the implementations and return that.

That’s when you’re hit by the principle/rule police. SOLID good? SOLID bad? Should you prefer composition over inheritance, or use an inheritance-based version of the adapter pattern? Oh no, is this Adapter or is it Facade? … Just kidding…
But I couldn’t resist it after the last couple of weeks of noise around principles and rules and whether they’re good or not.

After a closer look at the two classes, I saw that they conform to the same interface (for the parts relevant to us at least). So I decided to skip over any design pattern I knew of and just take the shortest path to a solution that works. We need to make two calls on the signature generator while encrypting. This is after the initial setup of the signature generator. You can see the two signatures below. For this discussion it doesn’t matter what they accomplish.

public void Update(byte[] buffer, int offset, int length);
public PgpSignature Generate();

The following class captures this functionality as two delegate properties, so the caller doesn’t know at the time of calling the methods whether it’s dealing with a v4 or a v3 signature generator.

public class Signer
{
    public Action<byte[], int, int> Update { get; set; }
    public Func<PgpSignature> Generate { get; set; }
}

Signer can now be instantiated with either a v3 or v4 signature generator.

public Signer GetSignatureGenerator(Stream compressedOut, 
                                    SignatureVersion signatureVersion)
{
    if (signatureVersion == SignatureVersion.V3)
    {
        PgpV3SignatureGenerator pgpV3SignatureGenerator = 
            new PgpV3SignatureGenerator(...);
 
        // some more initialization code
        return new Signer
        {
            Update = (data, offset, length) => pgpV3SignatureGenerator.Update(data, offset, length),
            Generate = () => pgpV3SignatureGenerator.Generate()
        };
    }
    else if (signatureVersion == SignatureVersion.V4)
    {
        PgpSignatureGenerator pgpSignatureGenerator = 
            new PgpSignatureGenerator(...);
 
        // some more initialization code
        pgpSignatureGenerator.GenerateOnePassVersion(false).Encode(compressedOut);
        return new Signer
        {
            Update = (data, offset, length) => pgpSignatureGenerator.Update(data, offset, length),
            Generate = () => pgpSignatureGenerator.Generate()
        };
    }
    else
    {
        throw new ArgumentException("Invalid signature version");
    }
}

You can get a v3 signature generator by using the following.

var signer = GetSignatureGenerator(compressedOut, SignatureVersion.V3);
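
To make the idea easy to try without Bouncy Castle, here is a minimal self-contained sketch of the same delegate-based wrapper. Everything in it is hypothetical: FakeV3Generator and FakeV4Generator stand in for the two Bouncy classes, DemoSigner mirrors Signer, and Generate returns a string instead of a PgpSignature. It also shows that method groups convert straight to the delegate types, so the lambdas are optional.

```csharp
using System;

// Hypothetical stand-ins for the two unrelated Bouncy Castle generators.
// They share method shapes but no base class or interface.
public sealed class FakeV3Generator
{
    private int total;
    public void Update(byte[] buffer, int offset, int length) { total += length; }
    public string Generate() { return "v3:" + total; }
}

public sealed class FakeV4Generator
{
    private int total;
    public void Update(byte[] buffer, int offset, int length) { total += length; }
    public string Generate() { return "v4:" + total; }
}

// Same shape as the Signer class in the post, with string as the result type.
public class DemoSigner
{
    public Action<byte[], int, int> Update { get; set; }
    public Func<string> Generate { get; set; }
}

public static class SignerDemo
{
    public static DemoSigner Create(bool useV3)
    {
        if (useV3)
        {
            var v3 = new FakeV3Generator();
            // Method groups bind to the instance, just like the lambdas do.
            return new DemoSigner { Update = v3.Update, Generate = v3.Generate };
        }

        var v4 = new FakeV4Generator();
        return new DemoSigner { Update = v4.Update, Generate = v4.Generate };
    }

    // The consumer only sees the two delegates, never the concrete generator.
    public static string SignInChunks(DemoSigner signer, byte[] data, int chunkSize)
    {
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int length = Math.Min(chunkSize, data.Length - offset);
            signer.Update(data, offset, length);
        }
        return signer.Generate();
    }
}
```

Calling SignerDemo.SignInChunks(SignerDemo.Create(true), new byte[10], 4) feeds the fake v3 generator three chunks (4, 4 and 2 bytes) and returns "v3:10"; passing false gives "v4:10" through exactly the same consumer code.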

While it might not be entirely idiomatic to encapsulate by capturing only the behavior of the adaptee, I kind of like the minimalistic approach. It might break in more complex scenarios without a common interface, but for the simple stuff I think I like it.
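
For comparison, this is roughly what the more conventional interface-based adapter would look like. It is only a sketch with made-up names (LegacyGenerator, ModernGenerator, ISigner and the two adapter classes are all hypothetical, and Generate returns a string rather than a PgpSignature), but it shows the trade-off: one extra class per adaptee in exchange for a named abstraction that can grow more members later.

```csharp
using System;

// Hypothetical stand-ins for two concrete classes with no common base type.
public sealed class LegacyGenerator
{
    private int total;
    public void Update(byte[] buffer, int offset, int length) { total += length; }
    public string Generate() { return "legacy:" + total; }
}

public sealed class ModernGenerator
{
    private int total;
    public void Update(byte[] buffer, int offset, int length) { total += length; }
    public string Generate() { return "modern:" + total; }
}

// The common abstraction the two classes are missing.
public interface ISigner
{
    void Update(byte[] buffer, int offset, int length);
    string Generate();
}

// One adapter per adaptee; each simply forwards the calls.
public sealed class LegacySignerAdapter : ISigner
{
    private readonly LegacyGenerator inner;
    public LegacySignerAdapter(LegacyGenerator inner) { this.inner = inner; }
    public void Update(byte[] buffer, int offset, int length) { inner.Update(buffer, offset, length); }
    public string Generate() { return inner.Generate(); }
}

public sealed class ModernSignerAdapter : ISigner
{
    private readonly ModernGenerator inner;
    public ModernSignerAdapter(ModernGenerator inner) { this.inner = inner; }
    public void Update(byte[] buffer, int offset, int length) { inner.Update(buffer, offset, length); }
    public string Generate() { return inner.Generate(); }
}

public static class SignerFactory
{
    public static ISigner Create(bool legacy)
    {
        if (legacy)
            return new LegacySignerAdapter(new LegacyGenerator());
        return new ModernSignerAdapter(new ModernGenerator());
    }
}
```

With only two methods to forward, the delegate version is clearly shorter. The interface version starts to pay off once the set of forwarded members grows or the adapters need state of their own.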

So what do you think?
Nice? Ugly? Other?

Visual Studio Color Scheme – Done for now

After a tough last week where I spent far too much time debugging encryption keys, I needed something to relax. I haven’t tweaked my Visual Studio color settings in a while (a couple of years) so I spent a couple of hours today refining them.

You probably don’t want to know what I do when I’ve had a really bad week. Ok, I’ll let you know. I put on some music I like and watch Defrag. Preferably at 2 AM. I don’t know whose idiotic idea it was to remove the graphical defrag interface from Windows, but I have Diskeeper which also does a better job of defragging.

Where was I? Oh, color settings.

Here it is. Let me know what you think…

VS2008 Settings file

PGP Zip Encrypted Files With C#

On a recent project here at Renaissance, we needed to send files over FTP to a third-party vendor. One of the requirements was that the files had to be encrypted using PGP (Pretty Good Privacy). After some research we decided to use Bouncy Castle. Bouncy Castle is an open source C# implementation of the OpenPGP standard. It is available in Java as well.
An additional requirement was that the PGP Encrypted files needed to be signed as well.

If you have no background in cryptology or PGP and this sounds like gibberish, here’s a short simplified background on public key (asymmetric) encryption.

To share PGP encrypted files, the sender and the recipient each need two keys: one public and one private. The parties first exchange public keys. The sender then encrypts the file with the recipient’s public key and signs it with his own private key. The recipient decrypts the file using its own private key and verifies who sent it using the sender’s public key.
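
The key roles can be tried out with the RSA class from System.Security.Cryptography. This is a sketch of the principle only and not OpenPGP: it uses plain RSA on a short message, the class and method names are made up for the illustration, and it assumes a runtime where RSA.Create(int) is available. Real PGP encrypts the data with a random symmetric session key and only uses the public key to protect that session key.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class PublicKeySketch
{
    // Returns true when the recipient can decrypt the message and verify the sender.
    public static bool RoundTrip(string message)
    {
        using (RSA senderKeys = RSA.Create(2048))      // sender's key pair
        using (RSA recipientKeys = RSA.Create(2048))   // recipient's key pair
        using (RSA recipientPublic = RSA.Create())     // what the sender gets to see
        using (RSA senderPublic = RSA.Create())        // what the recipient gets to see
        {
            // "Exchange public keys": export only the public parameters.
            recipientPublic.ImportParameters(recipientKeys.ExportParameters(false));
            senderPublic.ImportParameters(senderKeys.ExportParameters(false));

            byte[] plain = Encoding.UTF8.GetBytes(message);

            // Sender: encrypt with the recipient's public key, sign with own private key.
            // Plain RSA only handles short inputs, which is fine for this sketch.
            byte[] cipher = recipientPublic.Encrypt(plain, RSAEncryptionPadding.OaepSHA256);
            byte[] signature = senderKeys.SignData(
                plain, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

            // Recipient: decrypt with own private key, verify with the sender's public key.
            byte[] decrypted = recipientKeys.Decrypt(cipher, RSAEncryptionPadding.OaepSHA256);
            bool signatureOk = senderPublic.VerifyData(
                decrypted, signature, HashAlgorithmName.SHA256, RSASignaturePadding.Pkcs1);

            return signatureOk && Encoding.UTF8.GetString(decrypted) == message;
        }
    }
}
```

Note that neither side ever needs the other side’s private key: the sender works with recipientPublic and senderKeys, the recipient with recipientKeys and senderPublic.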

If this still sounds like gibberish, I found this illustration on the LinomaSoftware site to be a good visual explanation. (I have never used their product, I just searched Google for a PGP image.)

With that out of the way, how hard can it be to encrypt and sign a file? Not very hard, but far too much code to write. We found a few samples online, but nothing I felt comfortable using in our codebase. Credits to John Opincar who published a post on single pass encryption and signing. We used his blog post, the Bouncy test suites and some trial and error to get it working.

One of the issues with all the sample code out there, is that there are so many responsibilities squeezed together that unless you know what the code is doing beforehand, it is hard to grasp. It was to me at least. That might be partially related to me having no significant background in cryptology or PGP.

Let’s see some code. No matter if I’m doing TDD or not, I always try to write the client code before the API. That way I shape the API from the point of view of the consuming code and avoid surprising and clunky interfaces later. I wanted the calling code to look like this.

private static void EncryptAndSign()
{
    PgpEncryptionKeys encryptionKeys = new PgpEncryptionKeys(
        PublicKeyFileName, PrivateKeyFileName, "PasswordOfMyPrivateKey");

    PgpEncrypt encrypter = new PgpEncrypt(encryptionKeys);

    using (Stream outputStream = File.Create(EncryptedFileName))
    {
        encrypter.EncryptAndSign(outputStream, new FileInfo(FileToEncrypt));
    }
}

From the sample code above you can see that we have separated key management code from the actual encryption code. The PgpEncryptionKeys class instantiates and deals with the intricacies of key management. The PgpEncrypt class does the actual encryption. There were two reasons for this separation. The first is that key management is a separate concern conceptually. The second is that while we currently point to the location of the key files, we might want to change that in the future. I want to be able to change the way we instantiate the keys without touching the encryption code. No efforts were made at this point to create interfaces and/or abstract classes for evolution or extensibility. We’ll do that when/if we need it.

Next we will have a look at the actual implementation. I will not walk through and explain all the code. We tried to make the code as self-explanatory as possible. However, if you have no other background related to encryption and PGP besides this blog post, you should probably spend a few hours reading up on that before considering using this code. Treat this code as-is, with no commitment on my side to keep it up-to-date with bug fixes and improvements.

using System;
using System.IO;
using System.Linq;
using Org.BouncyCastle.Bcpg.OpenPgp;

namespace Renaissance.Common.Encryption
{
    public class PgpEncryptionKeys
    {
        public PgpPublicKey PublicKey { get; private set; }
        public PgpPrivateKey PrivateKey { get; private set; }
        public PgpSecretKey SecretKey { get; private set; }

        /// <summary>
        /// Initializes a new instance of the PgpEncryptionKeys class.
        /// Two keys are required to encrypt and sign data. Your private key and the recipients public key.
        /// The data is encrypted with the recipients public key and signed with your private key.
        /// </summary>
        /// <param name="publicKeyPath">The key used to encrypt the data</param>
        /// <param name="privateKeyPath">The key used to sign the data.</param>
        /// <param name="passPhrase">The (your) password required to access the private key</param>
        /// <exception cref="ArgumentException">Public key not found. Private key not found. Missing password</exception>
        public PgpEncryptionKeys(string publicKeyPath, string privateKeyPath, string passPhrase)
        {
            if (!File.Exists(publicKeyPath))
                throw new ArgumentException("Public key file not found", "publicKeyPath");
            if (!File.Exists(privateKeyPath))
                throw new ArgumentException("Private key file not found", "privateKeyPath");
            if (String.IsNullOrEmpty(passPhrase))
                throw new ArgumentException("passPhrase is null or empty.", "passPhrase");

            PublicKey = ReadPublicKey(publicKeyPath);
            SecretKey = ReadSecretKey(privateKeyPath);
            PrivateKey = ReadPrivateKey(passPhrase);
        }

        #region Secret Key

        private PgpSecretKey ReadSecretKey(string privateKeyPath)
        {
            using (Stream keyIn = File.OpenRead(privateKeyPath))
            using (Stream inputStream = PgpUtilities.GetDecoderStream(keyIn))
            {
                PgpSecretKeyRingBundle secretKeyRingBundle = new PgpSecretKeyRingBundle(inputStream);
                PgpSecretKey foundKey = GetFirstSecretKey(secretKeyRingBundle);
                if (foundKey != null)
                    return foundKey;
            }
            throw new ArgumentException("Can't find signing key in key ring.");
        }

        /// <summary>
        /// Return the first key we can use to sign.
        /// Note: A file can contain multiple keys (stored in "key rings")
        /// </summary>
        private PgpSecretKey GetFirstSecretKey(PgpSecretKeyRingBundle secretKeyRingBundle)
        {
            foreach (PgpSecretKeyRing kRing in secretKeyRingBundle.GetKeyRings())
            {
                PgpSecretKey key = kRing.GetSecretKeys()
                    .Cast<PgpSecretKey>()
                    .Where(k => k.IsSigningKey)
                    .FirstOrDefault();
                if (key != null)
                    return key;
            }
            return null;
        }

        #endregion

        #region Public Key

        private PgpPublicKey ReadPublicKey(string publicKeyPath)
        {
            using (Stream keyIn = File.OpenRead(publicKeyPath))
            using (Stream inputStream = PgpUtilities.GetDecoderStream(keyIn))
            {
                PgpPublicKeyRingBundle publicKeyRingBundle = new PgpPublicKeyRingBundle(inputStream);
                PgpPublicKey foundKey = GetFirstPublicKey(publicKeyRingBundle);
                if (foundKey != null)
                    return foundKey;
            }
            throw new ArgumentException("No encryption key found in public key ring.");
        }

        /// <summary>
        /// Return the first key we can use to encrypt.
        /// </summary>
        private PgpPublicKey GetFirstPublicKey(PgpPublicKeyRingBundle publicKeyRingBundle)
        {
            foreach (PgpPublicKeyRing kRing in publicKeyRingBundle.GetKeyRings())
            {
                PgpPublicKey key = kRing.GetPublicKeys()
                    .Cast<PgpPublicKey>()
                    .Where(k => k.IsEncryptionKey)
                    .FirstOrDefault();
                if (key != null)
                    return key;
            }
            return null;
        }

        #endregion

        #region Private Key

        private PgpPrivateKey ReadPrivateKey(string passPhrase)
        {
            PgpPrivateKey privateKey = SecretKey.ExtractPrivateKey(passPhrase.ToCharArray());
            if (privateKey != null)
                return privateKey;
            throw new ArgumentException("No private key found in secret key.");
        }

        #endregion
    }
}

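As you can see from the code and comments, PGP has a concept of key rings. In other words, a key file can contain many keys. We assume a single key per file and simply take the first signing key from the secret key ring and the first encryption key from the public key ring.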
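
The "first usable key across all rings" lookup can also be written as a single LINQ query. The sketch below uses a made-up DemoKey type and plain generic collections so that it runs on its own. With the Bouncy Castle collections, the Cast calls from the code above would presumably still be needed, since they are not generic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a key stored in a key ring.
public sealed class DemoKey
{
    public string Id { get; set; }
    public bool IsSigningKey { get; set; }
}

public static class KeyRingSketch
{
    // Flattens the rings in order and returns the first key that satisfies
    // the predicate, or null when no ring contains a matching key.
    public static DemoKey FirstKey(
        IEnumerable<IEnumerable<DemoKey>> rings,
        Func<DemoKey, bool> predicate)
    {
        return rings.SelectMany(ring => ring).FirstOrDefault(predicate);
    }
}
```

SelectMany is lazy, so the search stops at the first match just like the foreach/return version does.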

Now to the PGP encryption class:

using System;
using System.IO;
using Org.BouncyCastle.Bcpg;
using Org.BouncyCastle.Bcpg.OpenPgp;
using Org.BouncyCastle.Security;

namespace Renaissance.Common.Encryption
{
    /// <summary>
    /// Wrapper around Bouncy Castle OpenPGP library.
    /// Bouncy documentation can be found here: http://www.bouncycastle.org/docs/pgdocs1.6/index.html
    /// </summary>
    public class PgpEncrypt
    {
        private PgpEncryptionKeys m_encryptionKeys;
        private const int BufferSize = 0x10000; // should always be power of 2 

        /// <summary>
        /// Instantiate a new PgpEncrypt class with initialized PgpEncryptionKeys.
        /// </summary>
        /// <param name="encryptionKeys"></param>
        /// <exception cref="ArgumentNullException">encryptionKeys is null</exception>
        public PgpEncrypt(PgpEncryptionKeys encryptionKeys)
        {
            if (encryptionKeys == null)
                throw new ArgumentNullException("encryptionKeys", "encryptionKeys is null.");
            m_encryptionKeys = encryptionKeys;
        }

        /// <summary>
        /// Encrypt and sign the file pointed to by unencryptedFileInfo and
        /// write the encrypted content to outputStream.
        /// </summary>
        /// <param name="outputStream">The stream that will contain the
        /// encrypted data when this method returns.</param>
        /// <param name="unencryptedFileInfo">FileInfo of the file to encrypt</param>
        public void EncryptAndSign(Stream outputStream, FileInfo unencryptedFileInfo)
        {
            if (outputStream == null)
                throw new ArgumentNullException("outputStream", "outputStream is null.");
            if (unencryptedFileInfo == null)
                throw new ArgumentNullException("unencryptedFileInfo", "unencryptedFileInfo is null.");
            if (!File.Exists(unencryptedFileInfo.FullName))
                throw new ArgumentException("File to encrypt not found.");

            using (Stream encryptedOut = ChainEncryptedOut(outputStream))
            using (Stream compressedOut = ChainCompressedOut(encryptedOut))
            {
                PgpSignatureGenerator signatureGenerator = InitSignatureGenerator(compressedOut);
                using (Stream literalOut = ChainLiteralOut(compressedOut, unencryptedFileInfo))
                using (FileStream inputFile = unencryptedFileInfo.OpenRead())
                {
                    WriteOutputAndSign(compressedOut, literalOut, inputFile, signatureGenerator);
                }
            }
        }

        private static void WriteOutputAndSign(Stream compressedOut,
            Stream literalOut,
            FileStream inputFile,
            PgpSignatureGenerator signatureGenerator)
        {
            int length = 0;
            byte[] buf = new byte[BufferSize];
            while ((length = inputFile.Read(buf, 0, buf.Length)) > 0)
            {
                literalOut.Write(buf, 0, length);
                signatureGenerator.Update(buf, 0, length);
            }
            signatureGenerator.Generate().Encode(compressedOut);
        }

        private Stream ChainEncryptedOut(Stream outputStream)
        {
            PgpEncryptedDataGenerator encryptedDataGenerator;
            encryptedDataGenerator =
                new PgpEncryptedDataGenerator(SymmetricKeyAlgorithmTag.TripleDes,
                                              new SecureRandom());
            encryptedDataGenerator.AddMethod(m_encryptionKeys.PublicKey);
            return encryptedDataGenerator.Open(outputStream, new byte[BufferSize]);
        }

        private static Stream ChainCompressedOut(Stream encryptedOut)
        {
            PgpCompressedDataGenerator compressedDataGenerator =
                new PgpCompressedDataGenerator(CompressionAlgorithmTag.Zip);
            return compressedDataGenerator.Open(encryptedOut);
        }

        private static Stream ChainLiteralOut(Stream compressedOut, FileInfo file)
        {
            PgpLiteralDataGenerator pgpLiteralDataGenerator = new PgpLiteralDataGenerator();
            return pgpLiteralDataGenerator.Open(compressedOut, PgpLiteralData.Binary, file);
        }

        private PgpSignatureGenerator InitSignatureGenerator(Stream compressedOut)
        {
            const bool IsCritical = false;
            const bool IsNested = false;

            PublicKeyAlgorithmTag tag = m_encryptionKeys.SecretKey.PublicKey.Algorithm;
            PgpSignatureGenerator pgpSignatureGenerator =
                new PgpSignatureGenerator(tag, HashAlgorithmTag.Sha1);
            pgpSignatureGenerator.InitSign(PgpSignature.BinaryDocument, m_encryptionKeys.PrivateKey);

            foreach (string userId in m_encryptionKeys.SecretKey.PublicKey.GetUserIds())
            {
                PgpSignatureSubpacketGenerator subPacketGenerator =
                   new PgpSignatureSubpacketGenerator();
                subPacketGenerator.SetSignerUserId(IsCritical, userId);
                pgpSignatureGenerator.SetHashedSubpackets(subPacketGenerator.Generate());
                // Just the first one!
                break;
            }

            pgpSignatureGenerator.GenerateOnePassVersion(IsNested).Encode(compressedOut);
            return pgpSignatureGenerator;
        }
    }
}

It should be clear from the code above, but one concept that helped me understand the implementation of the Bouncy classes was that they basically just create a pipeline of streams. We expressed these as XXX ChainXXX(innerStream){}, where each ChainXXX method takes the stream to wrap and returns the wrapping stream. Encapsulating this concept into small ChainXXX methods made the resulting code much more readable IMHO.
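
The same chaining idea can be tried with nothing but framework streams. In this sketch (all names are made up for the illustration) the Bouncy generators are replaced by a GZipStream and a Base64-encoding CryptoStream, so data written at the innermost stream is compressed first and Base64-encoded second on its way to the sink. It is not PGP, it only demonstrates the shape of the pipeline.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;
using System.Text;

public static class StreamPipelineSketch
{
    // Each ChainXXX method takes the stream to wrap and returns the wrapping stream.
    private static Stream ChainBase64Out(Stream output)
    {
        return new CryptoStream(output, new ToBase64Transform(), CryptoStreamMode.Write);
    }

    private static Stream ChainCompressedOut(Stream output)
    {
        return new GZipStream(output, CompressionMode.Compress);
    }

    public static string Encode(string text)
    {
        var sink = new MemoryStream();
        using (Stream base64Out = ChainBase64Out(sink))
        using (Stream compressedOut = ChainCompressedOut(base64Out))
        {
            byte[] bytes = Encoding.UTF8.GetBytes(text);
            compressedOut.Write(bytes, 0, bytes.Length);
        }
        // Disposing the chain flushes every stage in order.
        // ToArray still works on a closed MemoryStream.
        return Encoding.ASCII.GetString(sink.ToArray());
    }

    // Reverses the pipeline: Base64-decode first, then decompress.
    public static string Decode(string encoded)
    {
        byte[] compressed = Convert.FromBase64String(encoded);
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

As in EncryptAndSign, the using blocks are nested outermost-first, so the innermost stream is disposed first and pushes its final bytes down the chain before the outer stages close.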

Comments, corrections and improvements are welcome as always…

Posted: Jan 23 2009, 12:14 PM by Kim | with 20 comment(s)
Building a Lean Development Machine

The time to rebuild my development machine had arrived again. I’ve been running Vista 32 bit for about a year. I was basically happy with Vista as a development machine, but something else had bothered me over the last couple of months: all the mundane crap that was clogging my system. For example, Skype, Trillian, iTunes, Hamachi, PDF readers, MS Office … The list goes on and on…

I wanted a super lean installation of Windows 2008 Server x64, Visual Studio 2008, SQL Server 2008 x64, TortoiseSVN, VisualSVN, CodeRush and WinMerge.

The choice of Windows 2008 Server is to match our production servers which are running Windows 2008 Server.

So that’s what I did. I installed the very short list of software above. I used windows-server-2008-workstation-converter to make the server more suitable as a workstation. Visual Studio is incredibly fast on the new installation. I have decent hardware, but it is still a notebook. (Some disagree with me that the Dell XPS 1730 is a notebook, but I’ll leave that as a dispute on definition of a notebook :-) )

For a notebook the hardware specs are pretty good.

image

In addition it has 2 X 200GB 7200rpm HDD running in a RAID0 configuration.

What about all the other software packages? Everything else I installed in a virtual machine running Windows XP Pro. XP running in a VM on Win2k8 x64 on this machine is so fast you don’t even notice it’s running in a virtual machine. Sweet!

 
