September 2008 - Posts - IronShay

September 2008 - Posts

The GetHashCode Method

I'm currently studying for a test I have this week in my Data Structures course. I was reading about hash tables when I realized that .Net has this "GetHashCode" method that I had always ignored. Maybe I can study for my test and learn some C# at the same time!

I immediately opened VS and created the following class:

public class Student
{
    public string Name { get; set; }

    public override int GetHashCode()
    {
        return 1;
    }
}

As you can see, GetHashCode will return the same value for every instance of the class. This is bad, very bad actually. Hash functions are all about spreading the data in a uniform way. This really isn't the case here... Let's see how such a thing can affect performance.

I used HashSet<T>, the collection newly added in .Net 3.5. I guessed that my bad hash function would shine the most when using a hash-based collection. HashSet should be very efficient - Add, Remove and Contains should run in O(1) time on average... this is based, though, on the uniformity assumption - which does not hold in this example.

I ran the following code:

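// Requires: using System; using System.Collections.Generic; using System.Diagnostics;
static void Main(string[] args)
{
    HashSet<Student> t = new HashSet<Student>();

    Stopwatch s = new Stopwatch();
    s.Start();

    // Every Student has the same hash code, so every Add collides
    for (int i = 0; i < 10000; i++)
    {
        t.Add(new Student());
    }

    s.Stop();
    Console.WriteLine("It took {0} ms", s.ElapsedMilliseconds);
}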

And what I got was this horrifying result:

It took 3901 ms

Almost 4 seconds to insert only 10,000 records! That's A LOT!

I decided to fix it so I changed the Student class:

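public class Student
{
    public string Name { get; set; }
    public int ID { get; set; }

    // Running counter shared by all instances
    public static int id = 0;

    public Student()
    {
        // Give each new instance the next ID: 1, 2, 3, ...
        ID = ++id;
    }

    public override int GetHashCode()
    {
        // A different value for every instance
        return ID;
    }
}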

I ran the testing code again and got the following result:

It took 12 ms

Wow! The improved GetHashCode method made the test run about 325 times faster (3901 ms down to 12 ms)!

Why did it happen?

In short, a hash collection uses an array to index its members. Each array item contains a pointer to the actual item. The index of the array item is calculated using the hash function (in C# - the GetHashCode method). When the hash function returns a unique index for each item, there's no problem and the array index can be used to point to the unique instance. The problem starts when the same hash code exists for 2 different instances. This is when the fun begins!
When something like this happens, the process of finding a place for the second item is called "collision resolution" and it has several possible implementations (I don't know which one is used in the HashSet implementation). The faster the collision is resolved, the faster the code will run.

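My GetHashCode implementation in the first example wasn't uniform at all, so collisions couldn't be resolved with ease. What I actually did was push all the good things about HashSet to their worst case: every item lands on the same index, so each Add has to be checked against all the items already stored there, and inserting n items takes on the order of n^2 steps instead of n.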
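To make the collision story a bit more concrete, here is a minimal sketch of a toy hash set that resolves collisions by chaining (each array slot holds a list of items). This is only an assumption made for illustration, not a description of the real HashSet code; ToyHashSet and CollisionDemo are made-up names. Instead of measuring time, it counts how many stored items Add has to look at.

```csharp
using System;
using System.Collections.Generic;

// A toy hash set using chaining, written only to illustrate collisions.
// It is NOT meant to mirror how HashSet<T> works internally.
public class ToyHashSet
{
    private readonly List<int>[] buckets;

    // How many already-stored items the Add calls had to examine so far.
    public long Comparisons { get; private set; }

    public ToyHashSet(int bucketCount)
    {
        buckets = new List<int>[bucketCount];
        for (int i = 0; i < bucketCount; i++)
        {
            buckets[i] = new List<int>();
        }
    }

    // Adds an item, using the supplied hash function to pick a bucket.
    // Returns false if the item is already in the set.
    public bool Add(int item, Func<int, int> hash)
    {
        // Mask off the sign bit so the index is never negative.
        int index = (hash(item) & 0x7FFFFFFF) % buckets.Length;
        List<int> chain = buckets[index];

        // Every item that shares the bucket must be checked for equality.
        foreach (int existing in chain)
        {
            Comparisons++;
            if (existing == item)
            {
                return false;
            }
        }

        chain.Add(item);
        return true;
    }
}

public static class CollisionDemo
{
    // Inserts the numbers 0..count-1 and returns the total number of comparisons.
    public static long CountComparisons(int count, Func<int, int> hash)
    {
        ToyHashSet set = new ToyHashSet(count);
        for (int i = 0; i < count; i++)
        {
            set.Add(i, hash);
        }
        return set.Comparisons;
    }
}
```

Calling CollisionDemo.CountComparisons(10000, x => 1) puts everything in one bucket and ends up doing 49,995,000 comparisons (0 + 1 + ... + 9,999), while CollisionDemo.CountComparisons(10000, x => x) gives every number its own bucket and does none.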

This was a really short explanation of hashing... If I write something like that in my test tomorrow I'll probably fail... If you want to read more about hashing and collisions, Wikipedia is a good place to start.

In conclusion

The GetHashCode method can affect performance dramatically. It's better to leave it alone and let the .Net framework calculate it for you. If you are really into writing your own, choose a really good implementation that will keep your indexes uniform enough.
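If you do write your own, one common pattern is to combine the fields that define equality and to override Equals together with GetHashCode, so that equal objects always produce equal hash codes. The class below is a hypothetical variation of Student that treats ID and Name as the identity; the constants 17 and 31 are just a widely used convention, not something the framework requires.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variation of Student: two students are equal
// when both their ID and their Name match.
public class Student
{
    public int ID { get; set; }
    public string Name { get; set; }

    public override bool Equals(object obj)
    {
        Student other = obj as Student;
        return other != null && ID == other.ID && Name == other.Name;
    }

    public override int GetHashCode()
    {
        // unchecked: let the arithmetic wrap around instead of throwing on overflow
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + ID;
            hash = hash * 31 + (Name == null ? 0 : Name.GetHashCode());
            return hash;
        }
    }
}
```

With this in place, a HashSet<Student> treats two instances with the same ID and Name as one item. Keep in mind that changing ID or Name while the object sits inside a hash-based collection changes its hash code, and the collection may no longer be able to find it.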

All the best,
Shay

Posted by shayf | 4 comment(s)

.Net Tip: Convert a String to Title Case

If you have a string like "hello world" and you want to use it in a title, you'll probably need to convert this string to "Hello World". The .Net framework gives us this ability out-of-the-box using the TextInfo.ToTitleCase method:

string helloWorld = "hello world";

Console.WriteLine(System.Threading.Thread.CurrentThread.CurrentCulture.TextInfo.ToTitleCase(helloWorld));

This will end up printing "Hello World" on the screen.
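One thing to watch for: ToTitleCase leaves words that are entirely in uppercase as they are, because it treats them as acronyms. A small sketch of a workaround is to lowercase the string first. TitleCaseHelper is a made-up helper name, and the invariant culture is used here only to keep the example independent of the machine settings.

```csharp
using System;
using System.Globalization;

public static class TitleCaseHelper
{
    // Hypothetical helper: lowercases the text first so that all-uppercase
    // words are not skipped as acronyms by ToTitleCase.
    public static string ToTitle(string text, CultureInfo culture)
    {
        TextInfo textInfo = culture.TextInfo;
        return textInfo.ToTitleCase(textInfo.ToLower(text));
    }
}
```

For example, TitleCaseHelper.ToTitle("HELLO world", CultureInfo.InvariantCulture) returns "Hello World". The price is that real acronyms get lowercased too, so use it only when that's acceptable.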

Enjoy,
Shay.

Posted by shayf | 2 comment(s)

Office Tip: How To Show Internal Application Dialogs

If you want to show an application dialog (for example, the "Save As" dialog) from your own code, all you have to do is (the following is a Word object model example):

object timeOut = Type.Missing; // Type.Missing is read-only, so it can't be passed by ref directly
Application.Dialogs[WdWordDialog.wdDialogFileSaveAs].Show(ref timeOut);

The WdWordDialog enum has, I guess, all of the available dialogs one can use in Word... Use it generously!

By the way, this will work with any Office application you're playing with. The differences will be the dialog enum name (and values of course) and the parameters you need to send to the Show method.

All the best,
Shay.

Posted by shayf | with no comments

Dynamic Languages and the .Net Framework

Before we start, I want to get you into the right mood...

Think about Che Guevara,
 

Think about a revolution,
 

Think about a whole new world

Now that you're in the right mood...

Dynamic languages on top of the .Net framework are the new, fresh breeze from Microsoft. The DLR, the Dynamic Language Runtime, is written on top of the CLR and provides centralized support for the dynamic languages that are written on top of it.

What is in it for me??

Ohhhhhhh a lot! Let's explain a bit more:

For scripting guys - think about using the features that the .Net framework gives to static languages - System.Windows.Forms, WCF, System.Diagnostics and more...

For web developers (Silverlight too!) - think about writing JavaScript for both the client side code and the server side code. Master only one language in order to write web applications! Do you feel that breeze on your face?

For all developers - writing tests is not the most fun task on earth, so wouldn't it be much easier to use a dynamic language for that? We will have the dynamic features while testing our classes that are written in a static language. Ohhh yea!

Well, there is a lot to come, a lot of new opportunities and great implementations! Get ready!

Want to hear more and see some very very cool demos? Vote for my session at the Dev Academy conference - "Dynamic Languages and the .Net Framework" on the sessions survey.

Viva La Revolucion!
Shay.

Posted by shayf | with no comments

Presentation, code and resources from my session at the Office User Group

First I'd like to thank everyone who attended my session at the Office User Group. It was great fun!

I'm going to write some more posts soon that will each target a specific subject from the session. So stay tuned!

The files from the session:

Some Q&A from the session

Q: Can I add building blocks programmatically?
A: Yes you can. It can be done using the Add method on the BuildingBlockEntries collection. Read about it here.

Q: Must I install .Net Framework 3.5 with my VSTO 3.0 solution?
A: Yes. VSTO 3.0 is written in .Net 3.5 and it is a prerequisite for VSTO 3.0 solutions.

Q: Does the deployment feature of VSTO 3.0 create a setup.exe?
A: Yes it does. The setup uses ClickOnce though. In order to create an msi file, you'll have to create a setup project, which is not a complicated task either. Read about it here.

More resources

Office Development Center
OBA Developer Portal
VSTO Blog
VSTO Forum - a great place to look for answers!

So again, thanks for attending,
If you have any questions regarding the session, the samples or any other thing you're more than welcome to contact me and I'll try to do my best,
Shay.

Posted by shayf | with no comments