In the previous installment we saw how tinkering with the prefetch settings on Windows XP improved the cold startup performance of a client application. Unfortunately, the improvement wasn’t significant enough, and we were called in for another round of thinking.
After looking at the virtual memory address space of the process, we found that many of its DLLs had been relocated. This is easy to determine with Process Explorer, which shows the image base address (the preferred address recorded in the file) and the actual base address for each DLL loaded into the process; when the two differ, the loader had to relocate the DLL. Lots of relocations cost you some startup time, and seeing that there were ~300 DLLs involved, there was room for improvement.
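To make the comparison concrete, here is a minimal sketch of reading the preferred image base straight out of a DLL's PE header and comparing it with an actual load address. This is not what Process Explorer does internally, and it is not something we used at the time; the function names are made up for illustration, and the actual base address is assumed to come from another tool.

```python
import struct

def preferred_image_base(pe_bytes: bytes) -> int:
    """Return the preferred ImageBase stored in a PE file's optional header."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # The DOS header field e_lfanew (offset 0x3C) points at the PE signature.
    (pe_offset,) = struct.unpack_from("<I", pe_bytes, 0x3C)
    if pe_bytes[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The optional header follows the 4-byte signature and 20-byte COFF header.
    opt = pe_offset + 4 + 20
    (magic,) = struct.unpack_from("<H", pe_bytes, opt)
    if magic == 0x10B:    # PE32: 4-byte ImageBase at offset 28
        (base,) = struct.unpack_from("<I", pe_bytes, opt + 28)
    elif magic == 0x20B:  # PE32+: 8-byte ImageBase at offset 24
        (base,) = struct.unpack_from("<Q", pe_bytes, opt + 24)
    else:
        raise ValueError("unknown optional header magic")
    return base

def was_relocated(preferred_base: int, actual_base: int) -> bool:
    """A DLL was relocated if it did not load at its preferred base."""
    return preferred_base != actual_base
```

Feeding it the contents of a DLL (for example, `open(path, "rb").read()`) together with the actual base reported by a tool such as Process Explorer tells you whether that DLL was relocated.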
So we rebased every single DLL that the application was using (with the rebase.exe utility from the SDK) and measured again. Unfortunately, the results were inconclusive. It was also hard to tell whether any improvement we did observe was due to the rebase operation itself, or due to the fact that after rebasing the binaries were no longer signed (rebasing changes the file contents).
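The idea behind rebasing is to give every DLL its own non-overlapping slice of the address space, so that each one can load at its preferred base. The sketch below illustrates that layout problem only; it is not how rebase.exe is implemented, and the starting address and DLL sizes are made-up values. The one hard fact it relies on is that Windows image bases must be a multiple of 64 KB.

```python
ALLOCATION_GRANULARITY = 0x10000  # image bases must be 64 KB aligned

def round_up(value: int, granularity: int = ALLOCATION_GRANULARITY) -> int:
    """Round value up to the next multiple of granularity."""
    return (value + granularity - 1) // granularity * granularity

def assign_bases(image_sizes: dict, start: int = 0x60000000) -> dict:
    """Assign consecutive, non-overlapping base addresses.

    image_sizes maps a DLL name to its in-memory image size in bytes.
    The start address is an arbitrary illustrative choice.
    """
    bases = {}
    next_base = round_up(start)
    for name in sorted(image_sizes):
        bases[name] = next_base
        # Reserve the whole aligned range so the next DLL cannot overlap.
        next_base += round_up(image_sizes[name])
    return bases

# Hypothetical DLL names and sizes, for illustration only.
layout = assign_bases({"a.dll": 0x12345, "b.dll": 0x1000})
for name, base in layout.items():
    print(f"{name}: 0x{base:08X}")
```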
Another thing we tried, with more success, was an open-source executable compressor (packer) called UPX. It’s fairly trivial to use, and in this case it cut the size of the application’s binaries by approximately half. UPX doesn’t support managed assemblies yet, but it doesn’t fail when it encounters them.
As a result of using a packer, we managed to trade some of the cold startup I/O work for CPU work: less data has to be read from disk, but it then has to be decompressed in memory. Seeing that the CPU was not fully utilized during startup, this shaved off another couple of seconds in a fairly consistent fashion, and with hardly any effort.
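The trade-off can be put into a back-of-the-envelope model. The sketch below is only that: the disk throughput and decompression rate are made-up illustrative numbers, not measurements from this application, and it assumes the read and the decompression do not overlap.

```python
def cold_load_seconds(size_mb: float, disk_mb_per_s: float,
                      packed_ratio: float = 1.0,
                      unpack_mb_per_s: float = 0.0) -> float:
    """Estimate the time to bring binaries into memory on a cold start.

    packed_ratio is the on-disk size divided by the original size (1.0 means
    not packed). unpack_mb_per_s is the decompression rate, measured against
    the unpacked size; it is ignored when the binaries are not packed.
    """
    read_time = size_mb * packed_ratio / disk_mb_per_s
    unpack_time = size_mb / unpack_mb_per_s if packed_ratio < 1.0 else 0.0
    return read_time + unpack_time

# Hypothetical figures: 100 MB of binaries, 20 MB/s effective cold read
# rate, packed to half size, unpacked at 200 MB/s.
plain = cold_load_seconds(100, 20)
packed = cold_load_seconds(100, 20, packed_ratio=0.5, unpack_mb_per_s=200)
print(f"unpacked: {plain:.1f} s, packed: {packed:.1f} s")
```

In this model the packer only pays off while decompression is much faster than the disk, which matches the observation that the CPU had headroom during startup.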