XPerf – Windows Performance Toolkit

March 15, 2008


Event Tracing for Windows has been with us since Windows 2000.  It is an infrastructure for raising events from various system components, and has only been used by a small number of kernel-mode entities.  In Windows XP, MOF files (familiar from WMI provider metadata) were used to describe events.  Finally, in Windows Vista and Windows Server 2008 events were described by XML manifests, an investment was made in popularizing ETW, and hundreds of new event providers were added.

What kind of information is generated by all these providers?  Well, first of all, there’s the Windows Event Log which consumes some of the information generated by ETW providers (but not all).  So we get all kind of diagnostic messages on things happening in the system.  Another provider is the Performance Monitor, which features the ability to query a collection set of ETW events.  Integrating these various sources of information is not an easy task, especially if you are alternating between analyzing a system as a whole and analyzing a set of specific applications within the same trace.

It is this integration that has led to the birth of the Windows Performance Toolkit.  It features the data collection and integration tools necessary to interpret and utilize ETW output correctly.  It is specifically constructed so that lots of information can be viewed in a coherent fashion, and so that a system-wide image of what’s going on can be obtained.  Additionally, through the use of the kernel sampling interrupt, a global sampling profiler is available (including call stack analysis).  And best of all?  It’s completely free!

So let’s take a look at some of the abilities of this new toolkit.  First of all, you need to install it from WHDC.  Note that the installation is only supported on Windows Vista SP1 and Windows Server 2008 (I got it to be almost fully functional without Vista SP1, but caveat emptor).  Data collection can be performed on a Windows XP SP2 or Windows Server 2003 system (you only need xperf.exe and perfctrl.dll for that), but the trace decoding can only be performed on NT 6.0 and higher.

Let’s start with a sample global data collection.  (By the way, if you’re not into my walkthroughs, the toolkit comes with an extensive set of documentation – online and offline.  The starter document is 65 pages, which gives you an idea of how big this thing really is.)

I went to an administrative command line prompt on my Vista, navigated to the toolkit’s installation directory (C:\Program Files\Microsoft Windows Performance Toolkit by default) and typed:

xperf -on Base

This enables a set of ETW providers to publish their events (the exact set of providers can be seen by typing xperf -providers).  These events are then collected and written to temporary buffers.  Note that these temporary buffers might grow very large, so if you plan to perform data collection across a long period of time, you better have some free disk space and lots of memory available on the box.

I then proceeded to write a few lines in this post, open a couple of programs, and then went back to the command line and typed:

xperf -d result.etl

This disables the selected ETW data collection and writes the raw results to the result.etl file (note that this command might take quite some time to complete).  These results are, well, raw, so they need to be analyzed soon.  We can accomplish this via the command line by launching a set of actions on the output file, or using the GUI Performance Analyzer tool (xperfview) which is part of the toolkit.

An example of what you can accomplish on the command line can be demonstrated by typing the following action sequence:

xperf -i result.etl -o proclist.txt -a process

This generates a list of processes active during the trace into the proclist.txt file.  Here’s some of the sample content from my box:

Start Time,   End Time, Process,            DataPtr,     Process Name ( PID),  ParentPID,  SessionID,          UniqueKey
       MIN,        MAX, Process, 0X0000000000168EE0,             Idle (   0),          0,          0, 0x0000f80002150400
       MIN,        MAX, Process, 0X00000000001693C0,           System (   4),          0,          0, 0x0000fa8001897040
       MIN,        MAX, Process, 0X00000000001A17A0,      svchost.exe ( 356),        708,          0, 0x0000fa8004ce4040
       MIN,        MAX, Process, 0X0000000000171080,         smss.exe ( 520),          4,          0, 0x0000fa800455f610
       MIN,        MAX, Process, 0X00000000001712B0,        csrss.exe ( 616),        604,          0, 0x0000fa8004804c10
       MIN,        MAX, Process, 0X00000000001904A0,      svchost.exe ( 620),        708,          0, 0x0000fa8004d2ac10

Another example of command line info generation would be:

xperf -i result.etl -o diskio.txt -a diskio

This gives you some disk I/O statistics for the duration of the trace.  Here’s the sample from my box:

Start Time,  End Time,    Read,   Write, Usage %
         0,   1000000,      18,      30,   51.37
   1000000,   2000000,       0,      11,    2.44
   2000000,   3000000,       0,       2,    1.05
   3000000,   4000000,       1,       4,    4.26
   4000000,   5000000,       1,      15,   20.17
   5000000,   6000000,       0,       4,    1.26
   6000000,   7000000,       0,       7,    2.19
   7000000,   8000000,       1,      30,    2.02
   8000000,   9000000,       9,       3,   13.47

Finally, if we’re interested in a graphical interpretation of everything, just go ahead and launch the Performance Analyzer (xperfview) and then navigate to the trace file, or just type:

xperf result.etl

…from the command line.  Here’s what the initial output looks like on my system:


That sure looks like a lot of useful data!  The first graph is CPU utilization by process, the second one is disk utilization by process.  You can filter some processes out, of course:


…and you can zoom in, select, make the graphs overlay so you can see the information side by side:



And then there’s the summary table feature, if you just right click on the graph you’re interested in and choose “Summary Table”:

image  image

…and if you look at the Disk I/O summary and right click for a Detail Graph, you get the breakdown of specific I/O requests and can even visually see fragmented file access if present (!):


Note how you can see the volume on the right (C and D in my case), and if you hover on each individual dot you see the information on the process making the request and the actual file path.  On the left you see the disk location information (offset in bytes from the disk start), so that if there are jumps across the disk all the time, we have some fragmented non-sequential access going on.

And then there’s the page fault details, to the level of the file being requested, the size of the I/O request, the average time spent servicing the request, the total I/O time, etc.:


Bear in mind: we have seen nothing yet.  There are some preconfigured profiles for analyzing standby/resumption, hibernation, boot, application startup, network activity, and many other scenarios.  You can get events generated on virtual memory allocation, power management events, registry access, driver events, system calls, interrupts, DPCs/APCs, context switches, what not.  (Friendly reminder once more: there’s documentation available both online and offline, go and read it right away! :-))

However, there’s another piece of coolness to see if you’re still with me and interested in what else the tool is capable of.  Using the kernel sample profile interrupt, the ETW system can capture the instruction pointer and the stack during trace execution.  This data is logged to an ETL file, and can then be analyzed to see what kind of call stacks your application encountered.  (This also gives away the information on performance bottlenecks – i.e. where your application spends the most of its time.)

So what I did to demonstrate this was write a very simple application which displays prime numbers in a given range.  It’s written in a very inefficient way so that it takes quite some time to execute.  To profile what’s going on inside, I went to the administrative command prompt again and typed:

xperf -on SysProf

Note for advanced users: “Base” would also have worked, because it also enables profiling.  If I’m not interested in collecting disk I/O and memory statistics, however, then I’m better off with “SysProf” (which resolves to PROC_THREAD+LOADER+PROFILE).  Without PROC_THREAD and LOADER you won’t get reliable results.

Then all I had to do is launch my application, let it run to termination, generate a results file as before (xperf -d …) and launch the Performance Analyzer to see what we got.

The first and foremost thing that you have to do at this point is have debugging symbols configured properly.  You could spend hours on trying to understand what’s wrong with what you have done, only to discover it’s due to symbols improperly configured.  To get symbols for Windows code lined up properly all you need to do is set up the _NT_SYMBOL_PATH system environment variable (My Computer -> Properties -> Advanced System Settings -> Environment Variables) to the following string:


You can replace C:\Symbols with any other downstream store on your disk that you prefer (or you could omit this section entirely).  What this tells the symbols engine is that whenever symbols for Windows are needed, they are automagically downloaded from Microsoft and cached at the downstream store (C:\Symbols).  Yes, this requires a connection to the Internet.  Yes, the symbols will be re-downloaded if Windows updates are installed or a service pack is deployed.  Yes, if you’re offline, you could also download the entire cache and install it, but whenever something is updated you risk getting out of sync.

For your own code, you can prepend the path to the PDB files to the above string, for example:

set _NT_SYMBOL_PATH = C:\Code\MyApp\release;%NT_SYMBOL_PATH%
''' OR '''
set _NT_SYMBOL_PATH = C:\Code\MyApp\release;SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

When you finally launch the tool, before you go ahead and open any graphs, go to the Trace menu and click Load Symbols.  This might take some time now or later, depending on what you’re trying to do.  Essentially, it might be downloading symbols for the entire system for the first time, which could take several minutes or even more on a slow connection.  On my system, there were 142 modules for which symbols were downloaded, occupying a total of 93MB disk space.

After you’ve got symbols loaded, choose a region on the graph or the entire graph, right click and select “Summary Table”.  Use the column chooser on the left to group by Process and Stack, and show you the Weight and the Weight%.  Note that you can do this for any process, not just for your code – all you need is symbols (which you can have on debug and release builds nowadays with no problem at all).  For my prime numbers application on my system I get the following output:


So it seems that MyApp.exe!IsPrime took 38.26% of the weight in this profiling session.  printf, on the other hand, took 0.71% of the weight.

Note we can’t deduce the execution time from this output; we can only see the relative weight a function had (i.e. the number of samples where this function was on the stack compared to the total number of samples taken by the profiler).

After optimizing the code a little bit (so that prime numbers are calculated more efficiently) and running it again under the profiler, here are the results:


The code was so optimized that there are no samples for the IsPrime function anymore (it was not inlined – we’re talking about a debug build), and printf is responsible for 2.92% of the weight.

Yes, there is some room for improvement here because commercial tools like Microsoft’s own Visual Studio Profiler give you so much more, including the ability to compare performance reports, analyze managed code, use a convenient API to determine when profiling should start and when it should end, profile memory allocations etc. – but this one is for free and it’s part of an integrated suite of lots and lots of additional functionality.  Additionally, if you look closely into the output, you can see that the grouping performed by this profiler can be done by the stack trace and not the function – so you can see how many times your function or sequence of functions was called in a particular order and manner.  This is something even commercial profilers make a difficult objective to achieve.

Just one final note before you go ahead and try it: stack profiling works only on the 32-bit editions of Windows Vista or Windows Server 2008.  To be more accurate, I couldn’t figure out how to make it work on a 64-bit system and gave up (and the following blog post gives me some hope that one day we’ll figure it out).  All I could get on a 64-bit system is just a list of functions getting called with their sample times and counts (which is great), but not the detailed elegant call stacks you can get on a 32-bit machine.

So get your hands on the Windows Performance Toolkit and try it out!  Let me know what you discovered.

Add comment
facebook linkedin twitter email

Leave a Reply

Your email address will not be published.

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>



  1. Michael MilirudDecember 27, 2008 ื‘ 4:59 AM

    Glad to see you are enjoying xperf ๐Ÿ™‚

    x64 ETW stackwalking is only supported on Vista SP1, Windows Server 2008 and above, and requires setting a certain registry value to 1 and rebooting the machine (so that the kernel picks it up).

    Here are the necessary control scripts:


    —-8<---- @REG QUERY "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive ----8<---- TurnOnStackwalk64.cmd: ----8<---- @REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x1 -t REG_DWORD -f @IF NOT %ERRORLEVEL% == 0 echo error: Could not configure system for 64-bit stackwalking. Please run this script from an elevated administrator console. ----8<---- TurnOffStackwalk64.cmd: ----8<---- @REG ADD "HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management" -v DisablePagingExecutive -d 0x0 -t REG_DWORD -f @IF NOT %ERRORLEVEL% == 0 echo error: Could not remove 64-bit stackwalking configuration. Please run this script from an elevated administrator console. ----8<---- Have fun tracing, ๐Ÿ™‚ Michael

  2. hector yuenJuly 9, 2009 ื‘ 1:36 AM

    question: is it possible to add log filters? for example, I want to log just DiskReads, and I don’t care about the rest of the events.

  3. DanApril 14, 2010 ื‘ 7:13 PM

    Xperf is really great, I have used it to debug some tough resource issues. I am moving onto designing some new components and want to instrument them with ETW. Is there a way once this is done to allow Xperf to log the information from my components so that I can view them in the viewer alongside the other data? For example, I’d like to place some marks in the Generic Events using ETW, plus have some extra statistics that are applicable to my application available as well. Is this custom integration with Xperf viewer possible (i.e., through a plug-in)? I’ve been looking and I can’t find this information. Thanks!


  4. Jaroslav KarlikMay 25, 2011 ื‘ 1:44 PM

    Guys, can somebody post a link to download xperf.exe and xperfviewer.exe? I just downloaded WPT 4.7 and it just not contains xperf.exe anymore. Instead of this i can find there WpfPerf.exe which looks and behaves differently…

    thanks for help

  5. Sasha GoldshteinMay 29, 2011 ื‘ 4:38 PM

    @Jaroslav: You need to download and install the Windows 7 SDK and then are installers for x86, x64, and ia64 in the Bin directory. If you use the Windows SDK web installer, I’m pretty sure you don’t have to download more than a few dozen megabytes.

  6. Alexander SafronovJune 3, 2011 ื‘ 11:10 PM

    Didn’t you have to run it as “xperf -on SysProf -StackWalk Profile” so that it includes stack traces in the output? without “-StackWalk Profile” i think it would show you only the functions, without who called who.

  7. NPJanuary 12, 2015 ื‘ 9:05 PM

    I am struggling to buffer the trace under winxp, however the parameters work fine for win7.

    xperf.exe -on Diag -BufferSize 1024 -MinBuffers 32 -MaxBuffers 32 -ClockType PerfCounter -Buffering

    Is buffering mode not supported under winxp?

    Here the error I get under winXP SP3.

    xperf: error: NT Kernel Logger: The specified path is invalid. (0xa1).


    1. Sasha Goldshtein
      Sasha GoldshteinJanuary 15, 2015 ื‘ 6:11 PM

      TBH, I don’t know. But the parameters obviously are the same on all OSs, so if it doesn’t work, it might be unsupported.