May 2009 - Posts
After learning about the performance characteristics of .NET Reflection, you might be tempted to say that “Reflection is bad for performance”. Roughly the same argument applies to string concatenation, out-of-process communication, context switching, virtual function calls…
If you follow this path blindly, you will end up saying that programming is bad for performance.
And as we all know, it is, but that doesn’t really lead us anywhere. What I’m saying is that common sense needs to be applied, as always, even to statements like “XYZ is bad for performance”. Examples, good and bad:
A: We’re using .NET Remoting in our application to send the hourly report (consisting of over 100KB of data!) to a centralized server.
B: The performance cost of .NET Remoting is too high – there is serialization, data transfer and deserialization involved. You need to roll out a custom solution over TCP to transfer the data every hour, and preferably compress it.
Nonsense, of course (and there’s also a severe cost-effectiveness problem lurking in that statement, even if it were somewhat true). Awesome. This one was easy. Now, what about this one:
A: All the methods of my Vector and Matrix classes are virtual to maximize testability. Even the properties are virtual so that I can mock them away.
B: Virtual methods and properties that are potentially called in tight loops are hazardous – especially if they are very short and could be inlined otherwise. Make the critical operations non-virtual.
Hmm. This could be right, it mainly depends on the scenario. If I were writing a Vector class of my own, then I would probably expect it to be called in tight loops – that’s what these vectors are for, after all… So in this context, virtual methods might be bad for performance. All righty then, next:
A: I’m using Reflection to implement custom logging between the UI and BL layers of my application. Whenever a user initiates an action in the UI, my generic code inspects the MethodBase.GetCurrentMethod() return value and prints out the name of the method called to a log file.
B: Reflection is very bad for your health! The cost of retrieving the current method is overwhelming, and whenever you say Reflection you must forget about performance.
I call the B and S words on this one. That’s what you’re worried about? Reflection? You have a custom logging framework writing data to a file, and Reflection is your concern? And hey, how many times is the user going to click a button?! Reflection for an operation that happens maybe once a second is perfectly reasonable.
I would be the last person to advocate against designing an application for performance and testing the results afterwards. I’m just saying that if all you’ve got is the blind “this is bad for performance” hammer, then every programming problem becomes a nail. And that’s hardly a way to design or implement anything.
Another title I’ve considered for this post was:
When measuring something, make sure you’re really measuring it.
Micro-benchmarking is the art of measuring tiny operations, and as always when measuring something tiny – there’s the problem of making sure that you’re actually measuring it. Let’s take a couple of “trivial” examples, because usually trivial examples end up being more convoluted than the difficult ones.
Example 1 – Measuring the time of the getpid() system call
Assume that we want to measure the time it takes to trap into the OS kernel, and do so by measuring the time it takes to call the getpid() function lots of times in a loop. Depending on the system you try this on (I tried it on Ubuntu running in VMware Server) you might find that getpid() takes no more than 10-20 clock cycles on average.
Does this mean that a system call costs 10-20 clock cycles? This does not align with our prior knowledge of what a system call consists of, but on the other hand – how can you argue with empirical measurements? (Read on to find out how.)
Example 2 – Measuring the time to copy memory using memcpy()
Assume that we want to measure memory bandwidth and latency by using the memcpy() function from the CRT library. We measure the time it takes to copy various block sizes and try to deduce an expression that describes memory bandwidth as a function of block size. If you do this naively, and again – depending on the system you try this on – you might find that the results are a scatter of {block size, time} pairs with no apparent order or meaning. Or you might even find that memory latency and bandwidth are exponential in block size.
Again, this does not align with our understanding of memory bandwidth and latency. The time to access a memory location (say, a system word) should not be dependent on the size of the block accessed. How do we argue with this measurement?
Hmm?
Well, apparently there’s an easy way out of the argument in both cases. In the latter case, it’s pretty obvious that what we’re measuring is cache access times and cache bandwidth, not memory bandwidth. And if you think about it, it’s actually not that easy to avoid the cache effects. And even if you avoid them somehow, there are also TLB effects to take into consideration. But one way or another, writing a loop to call memcpy() on the same source and destination locations a million times is not a good way to measure memory bandwidth. There’s a difference between measuring the time it takes to call the memcpy() function – which is an accurate measurement – and measuring the underlying mechanism, which is not necessarily reflected by the time it takes to call memcpy().
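To make the distinction concrete, here is a minimal sketch in C, assuming a POSIX system with clock_gettime(). The function name copy_gb_per_sec and its parameters are mine, purely for illustration. It copies the same total number of bytes with memcpy() while cycling through a working set of a chosen size, so a small working set mostly exercises the cache and a large one is more likely to reach main memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Seconds from a monotonic clock (POSIX clock_gettime). */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Copies at least total_bytes in chunks of block_size, cycling through
 * buffers of working_set bytes. Returns the observed rate in GB/s,
 * or -1.0 if the buffers could not be allocated. */
double copy_gb_per_sec(size_t working_set, size_t block_size, size_t total_bytes) {
    assert(block_size > 0 && block_size <= working_set);

    char *src = malloc(working_set);
    char *dst = malloc(working_set);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    /* Touch every page up front so page faults are not part of the timing. */
    memset(src, 1, working_set);
    memset(dst, 2, working_set);

    size_t offset = 0;
    size_t copied = 0;
    double start = now_seconds();
    while (copied < total_bytes) {
        memcpy(dst + offset, src + offset, block_size);
        copied += block_size;
        offset += block_size;
        if (offset + block_size > working_set) {
            offset = 0; /* wrap around to the start of the working set */
        }
    }
    double elapsed = now_seconds() - start;

    /* Read back a byte so the copies have an observable use. */
    volatile char sink = dst[0];
    (void)sink;

    free(src);
    free(dst);
    return ((double)copied / elapsed) / 1e9;
}
```

Comparing, say, copy_gb_per_sec(16 * 1024, 4096, 1 << 28) with copy_gb_per_sec((size_t)256 << 20, 4096, 1 << 28) will typically give quite different figures on the same machine. Neither number is "the" memory bandwidth; each one describes memcpy() under one particular access pattern, and TLB effects are still mixed in.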
In the former case, it’s apparent that the getpid() system call either (1) does not really occur (i.e. is optimized somehow) or (2) does not involve a system call. Indeed, in some libc implementations the process id is cached after the first call to getpid(), so it doesn’t actually involve a system call except for the first time it’s called. This again means that we’re not measuring the cost of a system call – we’re measuring the cost of calling getpid() in a loop.
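One way to check which of the two you are measuring is to time the libc wrapper next to an explicit system call. The sketch below assumes Linux, where syscall(SYS_getpid) requests the call by number and bypasses any caching in the wrapper. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Nanoseconds from a monotonic clock. */
static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average nanoseconds per call of the libc getpid() wrapper.
 * Depending on the libc, this may or may not enter the kernel. */
double ns_per_getpid_wrapper(long iterations) {
    assert(iterations > 0);
    volatile pid_t sink = 0; /* volatile so the calls are not discarded */
    double start = now_ns();
    for (long i = 0; i < iterations; i++) {
        sink = getpid();
    }
    double elapsed = now_ns() - start;
    (void)sink;
    return elapsed / (double)iterations;
}

/* Average nanoseconds per explicit getpid system call (Linux-specific). */
double ns_per_getpid_syscall(long iterations) {
    assert(iterations > 0);
    volatile pid_t sink = 0;
    double start = now_ns();
    for (long i = 0; i < iterations; i++) {
        sink = (pid_t)syscall(SYS_getpid);
    }
    double elapsed = now_ns() - start;
    (void)sink;
    return elapsed / (double)iterations;
}
```

If the wrapper comes out far cheaper than the explicit call, the wrapper is most likely answering from a cached value. If the two figures are close, the libc in use probably does not cache the process id (I believe newer glibc releases no longer do), and the loop really does trap into the kernel each time. Even then, part of each figure is the overhead of the loop and of the timing calls.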
Armed with the conclusions of this discussion, what do you say about the following results (slightly altered to protect the innocent):
I wanted to measure the time it takes to perform a cast from a string to an object. So I wrote a loop and executed it 1,000,000,000 times. In the loop body I assigned a string variable to an object reference.
It took 550ms to execute the loop. I conclude that it takes 0.55ns (nanoseconds) to perform a single cast of a string to an object. (In the Debug build, it takes over 5 seconds. Bizarre.)
The giveaway that this measurement is flawed is that the result is obviously way too low. On a 2GHz processor a clock cycle takes 0.5ns, so 0.55ns is barely more than one clock cycle. The overhead of actually running the measurement loop for each iteration is almost certainly at least that much, so it’s impossible for the overall measurement to be so fast unless…
- There’s no measurement taking place. It’s possible that the entire measurement loop is optimized away because it does, well, nothing? (Assigning the same string variable to the same object reference a billion times is considered nothing, indeed.) – or –
- There’s nothing inside the loop and the only thing measured was the loop overhead. (Now this is a tricky one, because due to compiler imperfections it might actually be possible that if you remove the loop body altogether, then the entire loop will be optimized away; but if the loop body stays, then the loop body is optimized away and the loop itself is not optimized away. This is why micro-benchmarking is hard.)
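The same trap is easy to reproduce outside of .NET. Below is a rough C analogue of the measurement described above, assuming an optimizing compiler and a POSIX clock; all names are hypothetical. The first function assigns a string pointer to a generic pointer in a loop with no observable effect per iteration, so the compiler is free to collapse the loop. The second forces every store to happen by making the target volatile.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef const void *(*assign_fn)(const char *, long);

/* Only the final value of obj is observable, so an optimizing compiler
 * may replace the whole loop with a single assignment. */
const void *assign_plain(const char *s, long iterations) {
    const void *obj = NULL;
    for (long i = 0; i < iterations; i++) {
        obj = s;
    }
    return obj;
}

/* obj is volatile, so every store in the loop must actually be performed. */
const void *assign_volatile(const char *s, long iterations) {
    const void *volatile obj = NULL;
    for (long i = 0; i < iterations; i++) {
        obj = s;
    }
    return obj;
}

/* Nanoseconds from a monotonic clock. */
static double now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average nanoseconds per loop iteration of the given function. */
double ns_per_iteration(assign_fn fn, const char *s, long iterations) {
    assert(iterations > 0);
    double start = now_ns();
    const void *volatile sink = fn(s, iterations); /* keep the result alive */
    double elapsed = now_ns() - start;
    (void)sink;
    return elapsed / (double)iterations;
}
```

With optimizations enabled, ns_per_iteration(assign_plain, s, 1000000000) may well report a figure close to zero, because there is no loop left to time, while the volatile variant reports roughly the cost of one store plus the loop overhead. Neither figure tells you what a "cast" costs. Without optimizations both loops run in full, which is also a plausible explanation for the much slower Debug build.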
I will not do any finger-pointing in this post, but if you’re reading this and are considering performing a micro-benchmark, bear in mind that the first and most important thing to do is to make sure that you’re actually measuring something. The next step is to make sure that this something you’re measuring is actually the same thing that you want to measure. The analysis of the results – well, this is the subject of an entirely different post.
I dare say that you’ve probably already heard of the “XP Mode” lurking within the Windows 7 RC. Don’t get it wrong – it doesn’t mean that the Windows 7 OS can run in Windows XP mode (although there are compatibility shims for applications that can emulate whatever version of Windows you want). It means that the new Virtual PC Beta, installed on top of Windows 7, can provide a seamless experience for installing and executing applications on a Windows XP virtual machine. The seamless experience is achieved by showing the application window “outside” the virtual machine – similar to the UI that Parallels and VMware Fusion (Unity) provide for the Mac.

I wouldn’t be telling you this rehashed piece of news if I had nothing to add. My significant other’s PC, running the Windows 7 RC as of a few days ago, has a (fairly) expensive scanner attached to it – an HP Scanjet 5590.
Unfortunately, our attempts to install the custom scanner software on Windows 7 were doomed to failure. I tried the Vista compatibility mode, the XP compatibility mode – nothing seemed to help. The funny thing is that the drivers automatically found by Windows Update were perfectly good – but the custom scanner software has the ultimate “Scan to PDF” option, which streamlines the process of scanning several double-sided pages to a single PDF without having to combine individual image files by hand.
So what I did was install the Virtual PC Beta on this machine and launch the preinstalled Windows XP virtual machine. It booted up, configured with 256MB of RAM and a network connection, and then all I had to do was attach the scanner (connected to the physical host via USB):
… and install the custom HP scanner software. Subsequently, I shut down the virtual machine and voila – the Start Menu of the Windows 7 host shows the custom scanner application installed within the virtual machine:
Next, I launched it, and after a few seconds the XP application was showing a window on the Windows 7 desktop and working perfectly, scanning pages and pages of text as if it were running natively on the physical host.
The meaning of this for application compatibility is probably best described as a silver bullet. While I find it hard to imagine that home PC users will find it convenient to install a custom virtualization solution, install Windows XP within that solution and then install applications on that copy of Windows XP – I find it perfectly easy to believe that home PC users will be able to install applications in the preinstalled virtual Windows XP environment and use them directly from the Windows 7 Start Menu. Wow.
The title might imply that this is a question I’ve been pondering, but in fact I made up my mind a long time ago. However, I’ve been asked this question multiple times by colleagues (who are usually developers) and friends (who are not necessarily developers). Therefore, I decided to write down my answer once and for all.
I currently have 6 physical machines at home running various Windows 7 builds, as well as multiple virtual machines for testing and other purposes. All in all, I have builds of Windows 7 ranging from the M3 PDC build (6801) through the RC build that was released a couple of days ago.
My primary work laptop has Windows 7 installed in a dual boot configuration with Windows Vista. However, the main reason I still have Windows Vista installed on it is that I’m way too lazy to clean that partition, not because I’m still using it.
Yes, there’s probably some hardware out there that doesn’t work quite well on Windows 7 (yet). I personally didn’t have any driver problems with Windows 7, and I still keep seeing updates to video, network and various other drivers pouring from Windows Update.
Yes, there are also some applications out there that don’t work so well on Windows 7. I’ve had my share of problems trying to install and run Virtual PC on a 64-bit Windows 7 Beta; however, the RC build and the new Virtual PC Beta seem to be getting along peacefully. Office, Visual Studio and the other handful of applications that I need on a daily basis work seamlessly on Windows 7, and the bunch of new features, reliability improvements and performance work make the transition fully worthwhile for me.
Another thing to bear in mind when you’re making your decision is that the upgrade path from Beta to RC or from RC to RTM is neither recommended nor supported. There is a workaround that makes it possible to perform the upgrade from earlier Windows 7 builds, but if you encounter any problems in the process there won’t necessarily be a way to fix them.
These are my two cents; I obviously don’t assume any responsibility for your time or efforts installing and trying out Windows 7. However, I can tell you with absolute honesty that I will never want to install Windows Vista again. :-)
After reading Mike Taulty’s post “Metadata Classes – A Force for Good or Evil?”, I realized that this is something that I was highly annoyed with in the past, and never got a chance to write anything about.
If you haven’t seen them yet, “metadata classes” as Mike refers to them are a way to extend the metadata of code that doesn’t belong to you. For example, in ASP.NET Dynamic Data you get a set of tool-generated types and can’t decorate them with attributes directly because these attributes will be deleted when the code is re-generated. So what do you do? You write another class, echo the properties of the original class, and decorate them with the attributes you wanted to put on the original class’ properties.
In my humble opinion, this is a coding horror. (Not in the popular blog sense, in the original Steve McConnell sense.)
It’s perfectly fine with me that the class metadata is not specified on the class itself, because it’s auto-generated (although in scenarios like these I would often copy and maintain the auto-generated code myself – I hate to see auto-generated code in my project that I don’t fully control). My problem is with the fact that you have to write another class which has only one purpose – to be a metadata container for someone else. How are developers supposed to learn object-oriented programming principles if they are forced to write a class that has absolutely no meaning in the domain model, and serves only as a container of attributes?
Don’t get me wrong – this isn’t a trivial problem to solve. Alon Fliess and I have had the pleasure of developing an AOP framework in which metadata on types could be expressed as .NET attributes or as an XML configuration file, which contains serialized attributes mapped to the type/method/property which they decorate. The mechanics of combining this metadata, propagating it across classes and interfaces and other challenges were at the core of this framework, and enabled us to build exciting features.
I’m not suggesting that you should implement a framework for combining attributes and configuration every time you need metadata in your code. Regardless, the idea of having a class which serves as a metadata container rings the coding horror bell for me. If you have other ideas on how this knot can be untied, please feel free to use the comments or write another post. :-)
Simian (Similarity Analyzer) detects duplicate code in almost every human readable text format.
The primary case brought forward by the product webpage is that you can correct a bug in one place in the program, without knowing that the same buggy code was copy-pasted to another location in the program.
But that’s just a tiny fraction of the scenarios where “reuse by copy-paste” detection can improve the quality of your code. Additionally, I can think of even more esoteric but interesting scenarios, such as plagiarism detection.
Finally, you could treat the output of Simian as a code quality metric, and even consider it a build-breaking metric. Some amusing figures:
- From the Simian webpage: Running against a large source base such as the entire 390,309 LOC […] in 4,242 files of the JDK 1.5.0_13 source, identified 66,375 duplicate LOC in 1,260 files […]
- When I ran Simian on the sources of the CLR Profiler 2.0, it found 2,928 duplicate lines out of a total of 17,714 lines.
- When I ran Simian on a small part of the sources of the latest version of ReactOS, it found 8,138 duplicate lines out of a total of 55,951 lines.
Why don’t you try running Simian on your own source code?