All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

March 2008 - Posts

TechEd 2008: Next Generation Production Debugging

My TechEd Eilat 2008 session titled Next Generation Production Debugging (webcast promo link) will be held on Monday, April 7, at 17:30-18:45 in the Hilton hotel.

This is going to be a hardcore session where I will strive to show you as many cool tools as possible to make sure you're going to enjoy your production debugging like never before!  We will be looking at a client-server application with dozens of different issues, and learn to:

  • Leverage some non-debugging tools before we dive in and start debugging like there's no tomorrow;
  • Take a dump of a production process and analyze it on your development machine without interfering with the production machine's activity;
  • Disentangle memory leaks from various perspectives using at least 3 different kinds of tools;
  • Take deadlocks apart by using built-in OS mechanisms and CLR deadlock detectors;
  • Use automatic verifications against our running code to catch inconsistencies and errors that we haven't been aware of;
  • Take a bird's-eye view of the performance characteristics of our application and drill down into the most intricate of details using a completely free toolset.

If time doesn't permit us going into each of these subjects in detail, here's a promise: After the session, I will post a detailed walkthrough of each and every one of the demos, including code, screenshots and resources.  I will also link to every tool I used.

If I haven't whetted your appetite yet, take a look at my DevAcademy2 session recording and my post on debugging and investigation tools to get a brief approximation of just how much fun we are all going to have.

Other sessions by Sela's lecturers that I will certainly be looking into are:

  • Alon Fliess will make sure we finally understand how ORPC is different from SOA and what kind of bridges we have between these approaches;
  • Manu Cohen-Yashar will guide us through SOA concepts with ESB guidance, workflows, WCF, communications security, identity management and more;
  • Noam King will show us some sexy web applications written using ASP.NET MVC framework and Dynamic Data Controls;
  • Tomer Shamam will teach us about data binding in WPF and just how powerful this approach can be with regard to building cleaner code;
  • Alex Golesh (and Tamir Khason) will excite us all with WPF, Silverlight and... XNA!  They are promising a cool game built on stage and I believe them.

See you in Eilat!

Snooping the Contents of a Password Edit Control

image

Did you ever get a chance to blankly stare at a screen similar to the above, trying to recollect what your password really was?  Security is great, and so is "Save password"; you try snooping for the application's configuration file or the registry where the password might be stored, only to find the application is storing it encrypted.

If you're determined enough, you could start searching the process memory for strings to see if the password is stored somewhere in plaintext form.  Or, if you're really determined, you could set a breakpoint in the window's window procedure, click OK and debug from there to get to the point where the password is actually being used (without symbols, of course, this is not going to be very easy).

So what's the alternative?  Well, if we could write some code...

Clearly the application in question has access to the contents of the password box.  How does it get access?  Well, eventually it all boils down to the WM_GETTEXT window message being sent to the edit control.  (Note: this isn't the case for every single edit control - for example, in WPF there's no window in the Windows sense of the word behind the control; in Internet Explorer the password input field is not a window either.)

So why don't we send that message to the window and get the password out as a result?  (Preemptive disclaimer first: this is not a full-blown solution, just an attempt on my behalf to show a nice aspect of Win32.)

First of all, we'll need the window handle (HWND), which is easily obtained using a tool like Spy++ (part of the Visual Studio tools).  Just press Alt+F3 from within Spy++ and navigate to the window:

image

So now we could theoretically write a program that sends the WM_GETTEXT message and gets the password, right?  (Dramatic suspense.)

Wrong.  It would be too easy for malware to intercept passwords if this were the case (it's still easy, but at least not that easy), and therefore the password edit control will not give us the text if we're sending the message from another process.  Here's what happens when we use SendMessage to send the edit control the WM_GETTEXT message:

image

So this clearly isn't going to work.  But why doesn't it work?  Because we're trying to send WM_GETTEXT across processes.  What if we could somehow send WM_GETTEXT from within the target process?

That would be a totally different game.  But how can we do it?  Windows gives us the facilities to create a thread within another process, provided that we have the appropriate permissions to access that process.  That other thread can send the WM_GETTEXT message within the context of the target process and retrieve the password for us.

So here's the code for the thread I'd like executed in another process:

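DWORD WINAPI ExtractPassword(LPVOID fromWindow)
{
    HWND hwndPassword = (HWND)fromWindow;
    //Wide buffer: in a Unicode build SendMessage is SendMessageW,
    //and WM_GETTEXT's WPARAM counts characters, not bytes
    wchar_t* lpszBuffer = new wchar_t[1024];
    SendMessage(hwndPassword, WM_GETTEXT,
        (WPARAM)1024, (LPARAM)lpszBuffer);
    //32-bit only: the pointer must fit in the thread's exit code
    return (DWORD)lpszBuffer;
}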

With the obvious limitation of not working on 64-bit (casting a pointer to a DWORD), it seems all right.  (I do care deeply about 64-bit; it's just that returning the pointer from the thread is the simplest alternative which will prove reusable later on.  So this is for illustrative purposes only.)

However, this still isn't that simple.  When we're off creating a new thread in the target process, what would be that thread's start routine?  You might be tempted to say, this "ExtractPassword" function I've just written earlier.  But take a look at CreateRemoteThread's parameters:

image

It's fairly clear that we are passing a pointer to a thread routine.  But that pointer is in another process' address space!  So if ExtractPassword happens to be mapped at some virtual address (call it X) in my address space, it doesn't mean that it's mapped at address X in the other process, or that it's mapped there at all!

Attentive readers, you're right!  The same can be argued about the use of the SendMessage function.  It just happens to be part of USER32.DLL, which is very likely to be mapped in the target process, and even very likely to be mapped at the same address as in our process.  (But this is not always true; for example, on Vista the Address Space Layout Randomization feature might render this assumption invalid.  Again, for illustrative purposes we will forget about this limitation for now.)

So what we need to think about next is getting the code for ExtractPassword into the remote process.  This can be accomplished in multiple ways, one of them being just writing the function's code into the process' memory.  This will require us to know the function's size in advance (not very difficult, but requires some work).  Another alternative is simply putting the function in a DLL and getting that DLL injected (loaded) into the target process.  If we were worried about attack detection, we could unload that DLL later on (but of course, we aren't).

So, first things first, we need to put the function in a DLL and get the DLL loaded in the target process.  I changed the function signature a little bit so that it's exported from the DLL and so that its name is not mangled:

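extern "C" DWORD __declspec(dllexport) DoWork(LPVOID lParam)
{
    HWND hwndPassword = (HWND)lParam;
    //Wide buffer: in a Unicode build SendMessage is SendMessageW,
    //and WM_GETTEXT's WPARAM counts characters, not bytes
    wchar_t* lpszBuffer = new wchar_t[1024];
    SendMessage(hwndPassword, WM_GETTEXT,
        (WPARAM)1024, (LPARAM)lpszBuffer);
    //32-bit only: the pointer must fit in the thread's exit code
    return (DWORD)lpszBuffer;
}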

Once we have this in place, we just need to load the DLL into the target process by using a combination of CreateRemoteThread and LoadLibrary (again, note that LoadLibrary is part of KERNEL32.DLL and therefore highly likely to be present at the same load address as in our process, except with ASLR).  Note that LoadLibrary takes as a parameter the DLL name, so we have to write it into the target process' memory first.

//Open a handle to the process
HANDLE hInjecteeProcess = OpenProcess(PROCESS_ALL_ACCESS,
    FALSE, dwInjecteeProcessId);

//Allocate memory for the DLL name
LPVOID lpszDllName = VirtualAllocEx(hInjecteeProcess,
    NULL, 4096, MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);

//Write the DLL name into the process,
//including the terminating null character
const wchar_t InjectedDllName[] = L"InjectedDll.dll";
SIZE_T dwBytesWritten;
WriteProcessMemory(hInjecteeProcess, lpszDllName,
    (LPCVOID)InjectedDllName, sizeof(InjectedDllName),
    &dwBytesWritten);

//Create a remote thread to load the DLL
HANDLE hRemoteThread = CreateRemoteThread(hInjecteeProcess,
    NULL, 0, (LPTHREAD_START_ROUTINE)LoadLibrary,
    lpszDllName, 0, NULL);

//Wait for thread completion and get DLL base
WaitForSingleObject(hRemoteThread, INFINITE);
DWORD dwExitCode;
GetExitCodeThread(hRemoteThread, &dwExitCode);
LPVOID lpDllBase = (LPVOID)dwExitCode;

//Close the handle to the remote thread
CloseHandle(hRemoteThread);

Note the steps here: first of all, we open a handle to the process requesting PROCESS_ALL_ACCESS rights (we need the process id for that, but Process Explorer or Task Manager can help us discover it easily).  Then we allocate memory within the target process to store the DLL name, which we proceed to write into that memory.  Finally, we create a remote thread executing the LoadLibrary function for us (note that coincidentally, the signatures for LPTHREAD_START_ROUTINE and LoadLibrary are the same on 32-bit, even though the parameter types don't seem to be the same), and take that thread's exit code.  The thread's exit code is the return value from LoadLibrary, which is an HMODULE.  It's not really a handle to anything, it's just the base address at which the DLL was loaded in the target process.

So at this point we have the DLL loaded; now we need to create another thread which will execute the DoWork function (previously called ExtractPassword) for us.  But this is another dangerous call, because DoWork in our process might be loaded at a completely different address than in the target process.  However, we know where the DLL was loaded in the target process, and we can discover the offset of the DoWork function from the DLL's base address, so we can calculate the appropriate address within the target process, like so:

//Load the DLL in our process
HMODULE hDll = LoadLibrary(L"InjectedDll.dll");

//Get DoWork's address and calculate offset
FARPROC lpMyFunc = GetProcAddress(hDll, "DoWork");
DWORD_PTR dwOffset =
    (DWORD_PTR)lpMyFunc - (DWORD_PTR)hDll;

//Calculate address in remote process
LPVOID lpRemoteProc =
    (LPVOID)((DWORD_PTR)lpDllBase + dwOffset);

//Create thread with that address
hRemoteThread = CreateRemoteThread(hInjecteeProcess,
    NULL, 0, (LPTHREAD_START_ROUTINE)lpRemoteProc,
    (LPVOID)hwnd, 0, NULL);
WaitForSingleObject(hRemoteThread, INFINITE);
GetExitCodeThread(hRemoteThread, &dwExitCode);
CloseHandle(hRemoteThread);

//This time, the exit code is the address of the password
//buffer, so read the password from the target
//(the first 1024 bytes, i.e. up to 512 wide characters;
//the extra element keeps the string null-terminated)
wchar_t lpszPassword[512 + 1] = {0};
SIZE_T dwBytesRead;
ReadProcessMemory(hInjecteeProcess,
    (LPCVOID)dwExitCode, lpszPassword, 1024,
    &dwBytesRead);

//And print it (%S is a wide string in Microsoft's printf)
printf("Password is: %S\n", lpszPassword);

Voila, we have the password!

image

If you'd like to, you can download the complete code for this post as a Visual Studio 2008 solution.  Friendly reminder again: this is not production-ready code; several pitfalls were outlined along the way, the most significant ones being 64-bit compliance and ASLR.
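To make the two pitfalls a bit more concrete, here is a minimal, portable sketch of the address arithmetic used above, plus a check for whether a pointer-sized value survives a trip through a 32-bit thread exit code.  The names RebaseAddress and SurvivesExitCode are my own for illustration and are not part of the downloadable solution; the sketch uses plain integers instead of Win32 types so that it compiles anywhere.

```cpp
#include <cstdint>

// Translate the address of a function inside a locally loaded module
// into the address it should have in another process, given the load
// address of the same module in both processes.
std::uintptr_t RebaseAddress(std::uintptr_t localBase,
                             std::uintptr_t localAddress,
                             std::uintptr_t remoteBase)
{
    // Offset of the function from the start of the module
    std::uintptr_t offset = localAddress - localBase;
    return remoteBase + offset;
}

// A thread exit code is only 32 bits wide. Returns true only if the
// value comes back unchanged after being squeezed through one.
bool SurvivesExitCode(std::uint64_t value)
{
    std::uint32_t exitCode = static_cast<std::uint32_t>(value);
    return static_cast<std::uint64_t>(exitCode) == value;
}
```

Any 32-bit address passes the SurvivesExitCode check, while a typical 64-bit heap or module address does not, which is why a 64-bit-safe version would have to hand the buffer address back some other way.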

XPerf - Windows Performance Toolkit

Event Tracing for Windows has been with us since Windows 2000.  It is an infrastructure for raising events from various system components, and was initially used by only a small number of kernel-mode entities.  In Windows XP, MOF files (familiar from WMI provider metadata) were used to describe events.  Finally, in Windows Vista and Windows Server 2008 events were described by XML manifests, an investment was made in popularizing ETW, and hundreds of new event providers were added.

What kind of information is generated by all these providers?  Well, first of all, there's the Windows Event Log which consumes some of the information generated by ETW providers (but not all).  So we get all kinds of diagnostic messages on things happening in the system.  Another consumer is the Performance Monitor, which features the ability to query a collection set of ETW events.  Integrating these various sources of information is not an easy task, especially if you are alternating between analyzing a system as a whole and analyzing a set of specific applications within the same trace.

It is this integration that has led to the birth of the Windows Performance Toolkit.  It features the data collection and integration tools necessary to interpret and utilize ETW output correctly.  It is specifically constructed so that lots of information can be viewed in a coherent fashion, and so that a system-wide image of what's going on can be obtained.  Additionally, through the use of the kernel sampling interrupt, a global sampling profiler is available (including call stack analysis).  And best of all?  It's completely free!

So let's take a look at some of the abilities of this new toolkit.  First of all, you need to install it from WHDC.  Note that the installation is only supported on Windows Vista SP1 and Windows Server 2008 (I got it to be almost fully functional without Vista SP1, but caveat emptor).  Data collection can be performed on a Windows XP SP2 or Windows Server 2003 system (you only need xperf.exe and perfctrl.dll for that), but the trace decoding can only be performed on NT 6.0 and higher.

Let's start with a sample global data collection.  (By the way, if you're not into my walkthroughs, the toolkit comes with an extensive set of documentation - online and offline.  The starter document is 65 pages, which gives you an idea of how big this thing really is.)

I went to an administrative command line prompt on my Vista, navigated to the toolkit's installation directory (C:\Program Files\Microsoft Windows Performance Toolkit by default) and typed:

xperf -on Base

This enables a set of ETW providers to publish their events (the exact set of providers can be seen by typing xperf -providers).  These events are then collected and written to temporary buffers.  Note that these temporary buffers might grow very large, so if you plan to perform data collection across a long period of time, you'd better have some free disk space and lots of memory available on the box.

I then proceeded to write a few lines in this post, open a couple of programs, and then went back to the command line and typed:

xperf -d result.etl

This disables the selected ETW data collection and writes the raw results to the result.etl file (note that this command might take quite some time to complete).  These results are, well, raw, so they still need to be analyzed.  We can accomplish this via the command line by launching a set of actions on the output file, or using the GUI Performance Analyzer tool (xperfview) which is part of the toolkit.

An example of what you can accomplish on the command line can be demonstrated by typing the following action sequence:

xperf -i result.etl -o proclist.txt -a process

This generates a list of processes active during the trace into the proclist.txt file.  Here's some of the sample content from my box:

Start Time,   End Time, Process,            DataPtr,     Process Name ( PID),  ParentPID,  SessionID,          UniqueKey
       MIN,        MAX, Process, 0X0000000000168EE0,             Idle (   0),          0,          0, 0x0000f80002150400
       MIN,        MAX, Process, 0X00000000001693C0,           System (   4),          0,          0, 0x0000fa8001897040
       MIN,        MAX, Process, 0X00000000001A17A0,      svchost.exe ( 356),        708,          0, 0x0000fa8004ce4040
       MIN,        MAX, Process, 0X0000000000171080,         smss.exe ( 520),          4,          0, 0x0000fa800455f610
       MIN,        MAX, Process, 0X00000000001712B0,        csrss.exe ( 616),        604,          0, 0x0000fa8004804c10
       MIN,        MAX, Process, 0X00000000001904A0,      svchost.exe ( 620),        708,          0, 0x0000fa8004d2ac10

Another example of command line info generation would be:

xperf -i result.etl -o diskio.txt -a diskio

This gives you some disk I/O statistics for the duration of the trace.  Here's the sample from my box:

Start Time,  End Time,    Read,   Write, Usage %
         0,   1000000,      18,      30,   51.37
   1000000,   2000000,       0,      11,    2.44
   2000000,   3000000,       0,       2,    1.05
   3000000,   4000000,       1,       4,    4.26
   4000000,   5000000,       1,      15,   20.17
   5000000,   6000000,       0,       4,    1.26
   6000000,   7000000,       0,       7,    2.19
   7000000,   8000000,       1,      30,    2.02
   8000000,   9000000,       9,       3,   13.47

Finally, if we're interested in a graphical interpretation of everything, just go ahead and launch the Performance Analyzer (xperfview) and then navigate to the trace file, or just type:

xperf result.etl

...from the command line.  Here's what the initial output looks like on my system:

image

That sure looks like a lot of useful data!  The first graph is CPU utilization by process, the second one is disk utilization by process.  You can filter some processes out, of course:

image

...and you can zoom in, select, make the graphs overlay so you can see the information side by side:

image

image

And then there's the summary table feature, if you just right click on the graph you're interested in and choose "Summary Table":

image  image

...and if you look at the Disk I/O summary and right click for a Detail Graph, you get the breakdown of specific I/O requests and can even visually see fragmented file access if present (!):

image

Note how you can see the volume on the right (C and D in my case), and if you hover on each individual dot you see the information on the process making the request and the actual file path.  On the left you see the disk location information (offset in bytes from the disk start), so that if there are jumps across the disk all the time, we have some fragmented non-sequential access going on.

And then there are the page fault details, down to the level of the file being requested, the size of the I/O request, the average time spent servicing the request, the total I/O time, etc.:

image

Bear in mind: we have seen nothing yet.  There are some preconfigured profiles for analyzing standby/resumption, hibernation, boot, application startup, network activity, and many other scenarios.  You can get events generated on virtual memory allocation, power management events, registry access, driver events, system calls, interrupts, DPCs/APCs, context switches, what not.  (Friendly reminder once more: there's documentation available both online and offline, go and read it right away! :-))

However, there's another piece of coolness to see if you're still with me and interested in what else the tool is capable of.  Using the kernel sample profile interrupt, the ETW system can capture the instruction pointer and the stack during trace execution.  This data is logged to an ETL file, and can then be analyzed to see what kind of call stacks your application encountered.  (This also gives away the information on performance bottlenecks - i.e. where your application spends most of its time.)

So what I did to demonstrate this was write a very simple application which displays prime numbers in a given range.  It's written in a very inefficient way so that it takes quite some time to execute.  To profile what's going on inside, I went to the administrative command prompt again and typed:

xperf -on SysProf

Note for advanced users: "Base" would also have worked, because it also enables profiling.  If I'm not interested in collecting disk I/O and memory statistics, however, then I'm better off with "SysProf" (which resolves to PROC_THREAD+LOADER+PROFILE).  Without PROC_THREAD and LOADER you won't get reliable results.

Then all I had to do was launch my application, let it run to termination, generate a results file as before (xperf -d ...) and launch the Performance Analyzer to see what we got.

The first and foremost thing that you have to do at this point is have debugging symbols configured properly.  You could spend hours trying to understand what's wrong with what you have done, only to discover it's due to improperly configured symbols.  To get symbols for Windows code lined up properly all you need to do is set up the _NT_SYMBOL_PATH system environment variable (My Computer -> Properties -> Advanced System Settings -> Environment Variables) to the following string:

SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols

You can replace C:\Symbols with any other downstream store on your disk that you prefer (or you could omit this section entirely).  What this tells the symbols engine is that whenever symbols for Windows are needed, they are automagically downloaded from Microsoft and cached at the downstream store (C:\Symbols).  Yes, this requires a connection to the Internet.  Yes, the symbols will be re-downloaded if Windows updates are installed or a service pack is deployed.  Yes, if you're offline, you could also download the entire cache and install it, but whenever something is updated you risk getting out of sync.

For your own code, you can prepend the path to the PDB files to the above string, for example:

set _NT_SYMBOL_PATH=C:\Code\MyApp\release;%_NT_SYMBOL_PATH%
rem ...or, spelled out in full:
set _NT_SYMBOL_PATH=C:\Code\MyApp\release;SRV*C:\Symbols*http://msdl.microsoft.com/download/symbols
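If you end up composing this string from a build script or a tool of your own, the layout is simple enough to put into a helper.  The following is only a sketch: BuildSymbolPath is a name I made up, and it assumes nothing more than the format shown above (local PDB directories first, separated by semicolons, followed by one SRV*cache*server entry).

```cpp
#include <string>
#include <vector>

// Builds a value for _NT_SYMBOL_PATH: local PDB directories first,
// then a symbol server entry of the form SRV*<cache>*<server>.
std::string BuildSymbolPath(const std::vector<std::string>& pdbDirs,
                            const std::string& cacheDir,
                            const std::string& serverUrl)
{
    std::string path;
    for (const std::string& dir : pdbDirs)
    {
        path += dir;
        path += ';';   // entries are separated by semicolons
    }
    path += "SRV*" + cacheDir + "*" + serverUrl;
    return path;
}
```

Called with C:\Code\MyApp\release as the only PDB directory, C:\Symbols as the cache and the Microsoft symbol server URL, it produces the second string shown above.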

When you finally launch the tool, before you go ahead and open any graphs, go to the Trace menu and click Load Symbols.  This might take some time now or later, depending on what you're trying to do.  Essentially, it might be downloading symbols for the entire system for the first time, which could take several minutes or even more on a slow connection.  On my system, there were 142 modules for which symbols were downloaded, occupying a total of 93MB disk space.

After you've got symbols loaded, choose a region on the graph or the entire graph, right click and select "Summary Table".  Use the column chooser on the left to group by Process and Stack, and to show the Weight and Weight% columns.  Note that you can do this for any process, not just for your code - all you need is symbols (which you can have on debug and release builds nowadays with no problem at all).  For my prime numbers application on my system I get the following output:

StackTrace1

So it seems that MyApp.exe!IsPrime took 38.26% of the weight in this profiling session.  printf, on the other hand, took 0.71% of the weight.

Note we can't deduce the execution time from this output; we can only see the relative weight a function had (i.e. the number of samples where this function was on the stack compared to the total number of samples taken by the profiler).
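To spell out what "weight" means under that reading, here is a small sketch that computes it from a list of sampled call stacks.  The Stack alias and the WeightPercent function are my own illustration of the description above, not the way the Performance Analyzer actually stores or aggregates its data.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One sample: the function names found on the call stack at the
// moment the profiling interrupt fired.
using Stack = std::vector<std::string>;

// Percentage of samples in which `function` appears anywhere on the
// stack. A function that recurses still counts once per sample.
double WeightPercent(const std::vector<Stack>& samples,
                     const std::string& function)
{
    if (samples.empty())
        return 0.0;

    std::size_t hits = 0;
    for (const Stack& stack : samples)
    {
        if (std::find(stack.begin(), stack.end(), function) != stack.end())
            ++hits;
    }
    return 100.0 * static_cast<double>(hits)
                 / static_cast<double>(samples.size());
}
```

Nothing in this calculation involves wall-clock time, which is exactly why the summary table can tell you where the samples landed but not how long a function ran.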

After optimizing the code a little bit (so that prime numbers are calculated more efficiently) and running it again under the profiler, here are the results:

StackTrace2

The code was so optimized that there are no samples for the IsPrime function anymore (it was not inlined - we're talking about a debug build), and printf is responsible for 2.92% of the weight.
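For readers who want to reproduce the experiment, the kind of change I'm describing looks roughly like the sketch below.  This is a hypothetical reconstruction, not the actual source of the profiled application: IsPrimeNaive tries every smaller number as a divisor, while IsPrime stops at the square root and skips even candidates.

```cpp
// Naive version: try every candidate divisor below n.
bool IsPrimeNaive(unsigned int n)
{
    if (n < 2) return false;
    for (unsigned int d = 2; d < n; ++d)
        if (n % d == 0) return false;
    return true;
}

// Faster version: handle even numbers up front, then try only odd
// divisors up to the square root of n. The loop condition is written
// as d <= n / d so that d * d can never overflow.
bool IsPrime(unsigned int n)
{
    if (n < 2) return false;
    if (n % 2 == 0) return n == 2;
    for (unsigned int d = 3; d <= n / d; d += 2)
        if (n % d == 0) return false;
    return true;
}
```

Both functions give the same answers; the second one simply does far less work per number, which is the sort of difference that makes a function drop out of a sampling profile.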

Yes, there is some room for improvement here: commercial tools like Microsoft's own Visual Studio Profiler give you so much more, including the ability to compare performance reports, analyze managed code, use a convenient API to determine when profiling should start and when it should end, profile memory allocations, etc.  But this one is free, and it's part of an integrated suite of lots and lots of additional functionality.  Additionally, if you look closely at the output, you can see that this profiler can group by stack trace and not just by function, so you can see how often your function or sequence of functions showed up in a particular order and manner.  This is something that is difficult to achieve even with commercial profilers.

Just one final note before you go ahead and try it: stack profiling works only on the 32-bit editions of Windows Vista or Windows Server 2008.  To be more accurate, I couldn't figure out how to make it work on a 64-bit system and gave up (and the following blog post gives me some hope that one day we'll figure it out).  All I could get on a 64-bit system is just a list of functions getting called with their sample times and counts (which is great), but not the detailed elegant call stacks you can get on a 32-bit machine.

So get your hands on the Windows Performance Toolkit and try it out!  Let me know what you discovered.

WCF Router and Publish/Subscribe Sample Implementation

A WCF intermediary router is available on MSDN as a sample.  The sample demonstrates what you would do to implement routing logic from a client to a destination service.  It also builds the groundwork for implementing other SOAP intermediaries, such as those that cache message responses, validate incoming messages, load-balance requests across multiple servers, and several additional scenarios.  However, it is slightly complicated if all you need a router to do is forward requests from one place to another.

Additionally, Juval Lowy's "WCF Essentials" article in the October 2006 issue of MSDN Magazine provides the foundations of building a WCF publish/subscribe architecture.  However, it lacks in one area - services need to explicitly implement subscription and publishing logic for the specific event contracts they are interested in.  I was pursuing a generic solution where a single service could provide subscription and publishing services for a generic contract.

So let's look into both of these subjects.  Router first.  All we need to do is provide a service which has an untyped contract, accepting the Message class.  As the SOAP "Action" header we will specify "*", meaning that the router is completely contract-independent.

[ServiceContract]
public interface IRouter
{
    [OperationContract(Action = "*", ReplyAction = "*")]
    Message Action(Message msg);
}

The router implementing this contract will simply create a channel to the actual destination and forward the message there:

[ServiceBehavior(InstanceContextMode=InstanceContextMode.Single,
    ConcurrencyMode=ConcurrencyMode.Multiple)]
class Router : IRouter
{
    ChannelFactory<IRouter> _forwardCF;

    public Router(Binding binding, Uri forwardTo)
    {
        _forwardCF = new ChannelFactory<IRouter>
            (binding, new EndpointAddress(forwardTo));
        _forwardCF.Endpoint.Behaviors.Add(new MustUnderstandBehavior(false));
    }

    public Message Action(Message msg)
    {
        IRouter target = _forwardCF.CreateChannel();
        try
        {
            return target.Action(msg);
        }
        finally
        {
            ((ICommunicationObject)target).Close();
        }
    }
}

Note that there's just a single instance of the router, and its concurrency mode allows multiple clients to enter it.  The router itself is completely stateless and therefore we don't mind multiple threads - if the target requires synchronization then it's the target's responsibility, not the router's.  Another detail is the MustUnderstandBehavior set to disable validation - the router doesn't understand the request and reply messages, and if we omit the behavior we will get a ProtocolException.

This is just the basic skeleton: If we want dynamic routing, we can implement a routing table; if we want the client to provide the target URI, we can take it from the incoming message headers; if we want load balancing, . . .  You get the idea.

Publish/subscribe next.  What we need to do here is refine the router so that there's a publish/subscribe mechanism built into the routing service.  Whenever a published message arrives, it will be inspected and forwarded to the registered subscribers.  So this is what the contract should look like:

[ServiceContract]
public interface ISubscribe
{
    [OperationContract]
    void Subscribe(string action, string ea);
    [OperationContract]
    void Unsubscribe(string action, string ea);
}

[ServiceContract]
public interface IPublish
{
    [OperationContract(Action = "*", IsOneWay = true)]
    void Publish(Message msg);
}

The service itself is fairly easy, because all we need to do is distribute requests to all registered endpoints in the generic way we've seen earlier.  So as far as we're concerned, we're treating the endpoints as IPublish and forwarding the same message we received.

[ServiceBehavior(InstanceContextMode=InstanceContextMode.Single,
    ConcurrencyMode=ConcurrencyMode.Multiple)]
class PubSubService : ISubscribe, IPublish
{
    Dictionary<string, List<EndpointAddress>> _subscribers =
        new Dictionary<string, List<EndpointAddress>>();

    public void Subscribe(string action, string ea)
    {
        lock (_subscribers)
        {
            List<EndpointAddress> ealist;
            if (!_subscribers.TryGetValue(action, out ealist))
            {
                ealist = new List<EndpointAddress>();
                _subscribers.Add(action, ealist);
            }
            ealist.Add(new EndpointAddress(ea));
        }
    }

    public void Unsubscribe(string action, string ea)
    {
        lock (_subscribers)
        {
            //Add error handling
            _subscribers[action].Remove(new EndpointAddress(ea));
        }
    }

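    public void Publish(Message msg)
    {
        List<EndpointAddress> targets;
        lock (_subscribers)
        {
            //Make a copy of the collection
            //(empty if nobody subscribed to this action)
            List<EndpointAddress> ealist;
            targets = _subscribers.TryGetValue(
                OperationContext.Current.
                    IncomingMessageHeaders.Action, out ealist)
                ? new List<EndpointAddress>(ealist)
                : new List<EndpointAddress>();
        }

        //A Message can be sent only once, so buffer it and
        //hand each subscriber its own copy of the same content
        MessageBuffer buffer = msg.CreateBufferedCopy(int.MaxValue);
        foreach (EndpointAddress ea in targets)
        {
            IPublish pub = ChannelFactory<IPublish>.CreateChannel(
                new NetTcpBinding(), ea);
            pub.Publish(buffer.CreateMessage());
            ((ICommunicationObject)pub).Close();
        }
    }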
}

Again, the service is a singleton, but access by multiple threads is allowed so that the publishing operation can scale.  However, a lock must be applied whenever modifying the registration information.  (Since this should happen less frequently than publishing, it's not a big overhead.)

Clearly, there's lots to add here - we need to handle distribution failures and transparently remove subscriptions or provide a removal policy, there's room for caching the distribution channels, the solution is bound to NetTcpBinding only - but again, we have the skeleton.

An important requirement is persisting the subscriptions.  In a long-running system, the pub/sub infrastructure is a lasting component which must survive restarts.  Therefore, storing the state in a Dictionary<> is not the best option.  We could either provide serialization to a durable store (like a DB) on each operation, or use .NET 3.5 Durable Services to accomplish the same goal.  I will explore making the pub/sub a durable service in a future post.

Wrapping it up - you can download the sample code from my SkyDrive.  It's really simple to create a truly scalable enterprise-level system using out-of-the-box WCF services.  However, we can expect more from the future - ESB, ISB and of course Oslo are supposed to make extinct the kind of infrastructure code we have just seen.

Windows Server 2008 Open House Presentation and Demos

Alon Fliess and I have presented at three Open House sessions at Microsoft on the subject of the upcoming Windows Server 2008.  My last session was February 21, several days after the RTM but still a few days before the Heroes Happen {Here} launch event.

Several participants asked for the slides and demos (in past sessions as well), so I decided to upload everything to my SkyDrive for everyone's convenience.  The subjects covered in the latest presentation follow:

Please note that the demos have been tested with Windows Vista RTM (no SP1) and Windows Server 2008 RC0 (September 2007); however, there is no apparent reason for them to fail on Vista SP1 or the Server 2008 RTM.

C++ Developers Just Got Lambdas?

Well, no lambdas yet (unless you look at the proposals for the upcoming C++0x standard and be your own judge), but a significant set of additions to the C++ toolset.  I'm talking about the Visual C++ 2008 Libraries Feature Pack Beta 1, commonly referred to as TR1 (licensed from Dinkumware) even though it comes with an incredibly cool MFC update as well, licensed by Microsoft from BCGSoft.

First of all, if you're a hardcore C++ developer who is interested in what kind of progress the standards committee has been making, you simply have to download this "Feature Pack" and play around with it.  If you're really hardcore, you might be into reading the actual document ("Technical Report 1"), but I'd pass if I were you.

What kind of goodies does the Feature Pack have in store for us?

  • MFC Update - Office Ribbon and Vista interface with Visual Studio-style docking toolbars and windows, advanced MDI tabs, new GUI controls (such as the paint surface) and many other welcome additions;
  • TR1 Implementation - Smart pointers, function binders, unordered containers, almost every RNG implementation out there, some handy math functions, and additional, smaller pieces of functionality.

Passing functions around?  Currying?  Functors?  We haven't landed in Scheme-land (or even C# 3.0) yet, but TR1 surely gives us some of the library support for producing very powerful things by binding some of a function's parameters and getting a new function back.  Consider the following C# 3.0 segment:

//Convert a two-parameter function to a specific one-parameter:
Func<int, int> CurryAdd(Func<int, int, int> func, int a) {
    return x => func(a, x);
}

Func<int, int, int> Add = (a,b) => a+b;
//Add can take two numbers and add them
Func<int, int> Add5To = CurryAdd(Add, 5);
//Add5To can take a number and add 5 to it
Console.WriteLine(Add5To(3));

It's surely elegant how we can manipulate lambda expressions and functions all around.  But now in TR1 we can simulate a similar effect for C++:

int Add(int a, int b) { return a+b; }
//Add can take two numbers and add them

function<int(int)> Add5To = bind(Add, 5, _1);
//Add5To can take a number and add 5 to it
cout << Add5To(3) << endl;

bind comes from <functional> and stuff like _1 requires you to have the std::tr1::placeholders namespace in scope.  These placeholders give you the nice ability to do stuff like:

void PrintCoords(int x, int y) {
    cout << "x = " << x << ", y = " << y << endl;
}

function<void(int,int)> printSwappedCoords =
    bind(PrintCoords, _2, _1);    //Reverse the order
printSwappedCoords(5, 3);
//Prints: x = 3, y = 5

Additional classes around the framework (like reference_wrapper) give you the ability to simulate C# outer variables by capturing functor parameters in the functor itself.  Compare C# 3.0 first:

Action<int> f = x => Console.WriteLine(++x);
int i = 15;
Action f2 = () => f(i);    //i is bound now
f2();    //Prints: 16
++i;
f2();    //Prints: 17

And now compare TR1:

void PrintAndIncrement(int x) { cout << ++x << endl; }

int i = 15;
function<void()> f2 = bind(PrintAndIncrement, cref(i));
f2();    //Prints: 16
++i;
f2();    //Prints: 17
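
To see what cref actually buys us, here is a minimal sketch that binds the same variable twice, once by value and once through a reference wrapper.  It is written against the C++11 std:: names that these TR1 components later became (with the Feature Pack the same names live in std::tr1), and PlusOne and BoundByValueVsReference are hypothetical names I made up for the illustration:

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Returns its argument plus one, so the caller can tell which value was passed in.
int PlusOne(int x) { return x + 1; }

// Binds the same variable by value and through cref, changes the variable,
// and then calls both functors. Returns (by-value result, by-reference result).
std::pair<int, int> BoundByValueVsReference() {
    int i = 15;
    std::function<int()> byValue = std::bind(PlusOne, i);            // copies 15 into the functor
    std::function<int()> byRef   = std::bind(PlusOne, std::cref(i)); // stores a reference to i
    ++i;                                                             // i is now 16

    int copied = byValue();  // still works on the old copy: 15 + 1
    int live   = byRef();    // sees the updated variable:  16 + 1
    assert(copied == 16 && live == 17);
    return std::make_pair(copied, live);
}
```

Without the wrapper, bind silently copies its arguments, which is usually the safer default because a stored reference must not outlive the variable it refers to.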

Thanks to the new mem_fn support, all of this functor coolness is also available on member functions, with almost no limitations.  Compare C# first:

public static IEnumerable<T> Transform<T>(this IEnumerable<T> coll, Func<T, T> transform)
{
    foreach (T elem in coll)
        yield return transform(elem);
}
public static void ForEach<T>(this IEnumerable<T> coll, Action<T> act)
{
    foreach (T elem in coll)
        act(elem);
}

var initials = new string[]{"Sasha","Masha","Grisha"}.Transform(s => s.Substring(0,1));
initials.ForEach(Console.WriteLine);

Yes, we can do that with TR1, and look at that amazingly beautiful code (transform comes from <algorithm>):

vector<string> names;
names.push_back("Sasha");
names.push_back("Masha");
names.push_back("Grisha");

vector<string> initials;
transform(names.begin(), names.end(), back_inserter(initials),
    bind(&string::substr, _1, 0, 1));

copy(initials.begin(), initials.end(),
    ostream_iterator<string>(cout,"\n"));

Internally, bind uses mem_fn here to give us a functor based on string::substr, which, unlike most of the things we used in previous examples, is a member function (and one that takes more arguments than transform supplies).  Note that this is the truly elegant solution, while what we did in C# was just work around the problem by using a lambda that takes the three-parameter call and cuts it down to one parameter: s => s.Substring(0,1).
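
If you want the member-function adapter without the argument binding, you can call mem_fn yourself.  The following is only a sketch under a few assumptions: it uses the C++11 std:: spellings (std::tr1 in the Feature Pack), and the Person type and both helper functions are hypothetical names invented for the example:

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical type, used only to have a member function to point at.
struct Person {
    std::string name;
    std::string Initial() const { return name.substr(0, 1); }
};

// Calls Person::Initial on every element by wrapping the member function
// pointer in a functor with mem_fn.
std::vector<std::string> Initials(const std::vector<Person>& people) {
    std::vector<std::string> result;
    std::transform(people.begin(), people.end(), std::back_inserter(result),
                   std::mem_fn(&Person::Initial));
    return result;
}

// The same wrapper accepts a pointer to the object as well as a reference.
std::string InitialThroughPointer(const Person* p) {
    assert(p != nullptr);
    return std::mem_fn(&Person::Initial)(p);
}
```

The object the member function is invoked on simply becomes the functor's first argument, which is why the result plugs straight into algorithms like transform.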

Let's leave the lambdas alone - there's so much other coolness lying around.  For example, consider the full-blown regular expression support, including non-greedy quantifiers, capture groups, and more:

regex r("([A-Za-z0-9]+)@([A-Za-z0-9\\.]+)");    //A-z would also match [ \ ] ^ _ and `
for (string s; getline(cin, s); )
{
    smatch matches;
    if (regex_match(s, matches, r))
    {
        cout << "User: " << matches[1]
             << ", domain: " << matches[2] << endl;
    }
}
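
The non-greedy quantifiers deserve a tiny illustration of their own.  This sketch assumes the C++11 std::regex names (std::tr1 in the Feature Pack) and the default ECMAScript grammar; FirstGroup and CompareGreedyAndLazy are hypothetical helpers, not part of the library:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first capture group of the first match found anywhere in the
// text, or an empty string when the pattern does not match.
std::string FirstGroup(const std::string& text, const std::string& pattern) {
    std::regex re(pattern);
    std::smatch m;
    if (std::regex_search(text, m, re) && m.size() > 1) {
        return m[1].str();
    }
    return std::string();
}

// Runs a greedy and a non-greedy pattern over the same input.
void CompareGreedyAndLazy() {
    const std::string html = "<b>one</b><b>two</b>";
    assert(FirstGroup(html, "<b>(.*)</b>") == "one</b><b>two");  // greedy: runs to the last </b>
    assert(FirstGroup(html, "<b>(.*?)</b>") == "one");           // non-greedy: stops at the first </b>
}
```

Note that the sketch uses regex_search, which looks for a match anywhere in the input, whereas regex_match succeeds only if the entire input matches the pattern.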

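Mathematicians will surely love to discover the contents of the <random> header file (cryptographers, beware: the pseudo-random engines in there, such as the Mersenne Twister, are general-purpose generators rather than cryptographically secure ones):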

[Screenshot: the contents of the <random> header file]
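
Here is a minimal sketch of what using <random> looks like: an engine produces raw bits and a distribution shapes them into the range you want.  It assumes the C++11 names; TR1 spelled some of them differently (the distribution below was called uniform_int there), so treat the exact identifiers as assumptions if you are compiling against the Feature Pack.  RollDice is a hypothetical helper:

```cpp
#include <cassert>
#include <random>
#include <vector>

// Rolls a six-sided die `count` times using a Mersenne Twister engine
// seeded with `seed`. The same seed always yields the same sequence.
std::vector<int> RollDice(unsigned seed, int count) {
    std::mt19937 engine(seed);                     // the pseudo-random engine
    std::uniform_int_distribution<int> die(1, 6);  // maps engine output to 1..6, inclusive
    std::vector<int> rolls;
    for (int i = 0; i < count; ++i) {
        int roll = die(engine);
        assert(roll >= 1 && roll <= 6);
        rolls.push_back(roll);
    }
    return rolls;
}
```

Seeding explicitly makes a run reproducible, which is handy for simulations and tests.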

...and everyone else will rejoice at the availability of unordered containers based on a hash function, similar to .NET's Dictionary<> and HashSet<>.  BTW, the values aren't even required to be unique: we have an unordered_set<> and an unordered_multiset<>; an unordered_map<> and an unordered_multimap<>, to suit everyone's taste.
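
A quick sketch of the unordered containers in action, again assuming the C++11 std:: names (std::tr1 in the Feature Pack).  CountWords, CountCopies and UnorderedDemo are hypothetical helpers written for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Counts how many times each word occurs, using a hash-based map.
std::unordered_map<std::string, int> CountWords(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (std::size_t i = 0; i < words.size(); ++i) {
        ++counts[words[i]];  // operator[] inserts a zero the first time a key is seen
    }
    return counts;
}

// An unordered_multiset keeps duplicates, so count() can return more than 1.
std::size_t CountCopies(const std::vector<std::string>& words, const std::string& word) {
    std::unordered_multiset<std::string> bag(words.begin(), words.end());
    return bag.count(word);
}

void UnorderedDemo() {
    std::vector<std::string> words;
    words.push_back("leak");
    words.push_back("dump");
    words.push_back("leak");

    std::unordered_map<std::string, int> counts = CountWords(words);
    assert(counts.size() == 2);               // two distinct keys
    assert(counts["leak"] == 2);
    assert(CountCopies(words, "leak") == 2);  // both copies are kept
    assert(CountCopies(words, "lock") == 0);
}
```

As the name says, iteration order is unspecified, so don't rely on elements coming back in insertion or sorted order.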

Finally, reference-counting smart pointers such as shared_ptr<> and weak_ptr<> will free us from scrupulously thinking about where objects might get destroyed under our nose.  Reference counting doesn't replace garbage collection: cycles (just like in COM) will still result in a leak, so if you have object A referencing object B and B referencing A back, these two objects will never be deleted without external interference.  BTW, shared_ptr gives you polymorphic conversions both ways: derived-to-base implicitly, and base-to-derived with static_pointer_cast and dynamic_pointer_cast.  Also, shared_ptr works perfectly with containers, so it's a great wrapper if you need to take a noncopyable or polymorphic type and put it in a vector<>.  If you add weak_ptr to the mix, you've got everything you need for C++ resource management:

template <typename T>
class Timer {
private:
    weak_ptr<T> _observable;    //Stored by value: a reference member would dangle
    int _times, _period;
public:
    Timer(const weak_ptr<T>& o, int times, int period) :
      _observable(o), _times(times), _period(period) { EnableTimer(); }
private:
    void EnableTimer(bool enable=true) //...
    void FireTimer() {
        shared_ptr<T> realPtr = _observable.lock();
        if (realPtr) { realPtr->OnTimer(); }
        else { EnableTimer(false); }
    }
};
struct Timerable {
    void OnTimer() //...
};

//And now somewhere in code:
Timer<Timerable>* p;
{
    shared_ptr<Timerable> t(new Timerable());
    p = new Timer<Timerable>(t, 10, 10);
    //when t runs out of scope, the timer will stop
    //when it tries to invoke the weak_ptr's function
}

The above code gives you a timer which keeps a weak_ptr to its target, and therefore will automatically stop whenever the target object is released.  (This is very much like .NET's WeakReference, which can be used in a similar way.)
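
The cycle problem and the pointer casts mentioned earlier can be sketched in a few lines.  This assumes the C++11 std:: names (std::tr1 in the Feature Pack); every type and function below is hypothetical and exists only for the illustration:

```cpp
#include <cassert>
#include <memory>

struct Cyclic { std::shared_ptr<Cyclic> other; };

struct Node {
    std::shared_ptr<Node> next;  // owning forward link
    std::weak_ptr<Node> prev;    // non-owning back link
};

struct Base { virtual ~Base() {} };
struct Derived : Base { int value; Derived() : value(42) {} };

// Two objects that own each other stay alive after their outside owners are gone.
bool CycleLeaks() {
    std::weak_ptr<Cyclic> watch;
    {
        std::shared_ptr<Cyclic> a(new Cyclic());
        std::shared_ptr<Cyclic> b(new Cyclic());
        a->other = b;
        b->other = a;
        watch = a;
    }
    bool stillAlive = !watch.expired();
    // "External interference": break the cycle by hand so the sketch itself does not leak.
    if (std::shared_ptr<Cyclic> a = watch.lock()) { a->other.reset(); }
    return stillAlive;
}

// Making the back link a weak_ptr removes the cycle of owners.
bool BackLinkDoesNotLeak() {
    std::weak_ptr<Node> watchA, watchB;
    {
        std::shared_ptr<Node> a(new Node());
        std::shared_ptr<Node> b(new Node());
        a->next = b;
        b->prev = a;
        watchA = a;
        watchB = b;
    }
    return watchA.expired() && watchB.expired();
}

// Derived-to-base is implicit; base-to-derived goes through a checked cast.
int DowncastValue() {
    std::shared_ptr<Base> base(new Derived());
    std::shared_ptr<Derived> derived = std::dynamic_pointer_cast<Derived>(base);
    return derived ? derived->value : -1;
}

void SmartPointerDemo() {
    assert(CycleLeaks());
    assert(BackLinkDoesNotLeak());
    assert(DowncastValue() == 42);
}
```

The rule of thumb this suggests: let ownership flow in one direction with shared_ptr, and use weak_ptr for every link that points back.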

Oh, and one minor thing: TR1 specifies compiler support for type traits, the ability to ask questions about facilities provided by a type during compile time.  For example:

cout << std::tr1::has_trivial_copy<MyType>::value;
cout << std::tr1::is_base_of<Base, Derived>::value;
cout << std::tr1::is_convertible<int, float>::value;
//...and many more

By the way, these facilities have been available in Microsoft's compiler for quite some time, as intrinsics of the form __has_XXX or __is_XXX (see Compiler Support for Type Traits).
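
Printing a trait's value is the least interesting thing you can do with it; the real payoff is making decisions at compile time.  Here is a minimal sketch, assuming the C++11 <type_traits> names (std::tr1 in the Feature Pack) and C++11's static_assert, which Visual C++ 2008 does not have.  All the types and functions are hypothetical:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct Base {};
struct Derived : Base {};
struct Unrelated {};

// Compile-time checks: if one of these is wrong, the build fails, not the run.
static_assert(std::is_base_of<Base, Derived>::value, "Derived must derive from Base");
static_assert(!std::is_base_of<Base, Unrelated>::value, "Unrelated is not a Base");
static_assert(std::is_convertible<int, float>::value, "int converts to float");

// Tag dispatch: the trait selects one of two overloads at compile time.
template <typename T>
const char* DescribeImpl(std::true_type) { return "integral"; }

template <typename T>
const char* DescribeImpl(std::false_type) { return "not integral"; }

template <typename T>
const char* Describe() { return DescribeImpl<T>(std::is_integral<T>()); }

void TraitsDemo() {
    assert(std::strcmp(Describe<int>(), "integral") == 0);
    assert(std::strcmp(Describe<double>(), "not integral") == 0);
}
```

The same dispatch trick is how a library can pick, say, a fast memcpy-style path for trivially copyable types and an element-by-element path for everything else.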

I've focused on TR1 in this post, but some day Alon or I will surely drill into the depths of the new MFC update.  Still, to whet your appetite and encourage you into the download (which comes with lots and lots of cool samples!), here are some screenshots of what you can get out of the box without any customization, custom controls, coding work etc.:

[Screenshot: out-of-the-box wizard-generated dialog, using the Office Blue theme]
[Screenshot: the same dialog, with the Office Black theme]
[Screenshot: CMFCImageEditorDialog editing an image (the entire dialog is out of the box)]
[Screenshot: Check Mnemonics, and the Property window showing properties, events and messages (including registration) - MFC feels more like WinForms?]

Minor installation quirks:

  • It says so on the MSDN Download Center page, but I'll repeat it here: during installation you need to have the original Visual Studio 2008 installation media handy.  If you installed from a network share/drive, that same share/drive must be mapped.  If you installed from a DVD, the DVD must be in the drive.  And so on.
  • The Feature Pack Beta 1 is currently only supported for Visual Studio 2008 Professional and Visual Studio 2008 Team Edition, in the English version only.

Microsoft Performance Open House Presentation and Demos

On March 10, Alik Levin and I presented at the Microsoft Performance Open House in Raanana.  Alik's presentation focused on PDLC (Performance Development Life Cycle) and addressed various tools and techniques for performance measurement and analysis.  My session featured an in-depth overview of some performance killers across the .NET Framework ("not being friends with the GC" among other issues), as well as detailed demos of exposing Windows performance counters from .NET, using the CLR Profiler to analyze memory allocations, and using the Visual Studio 2008 Profiler to analyze CPU consumption and compare profiler reports.

Here's the list of topics you'll see in the presentation slides and the attached demos:

  • Exposing performance counters from .NET and logging them using Performance Monitor;
  • Analyzing an application's memory allocations and garbage collection patterns using CLR Profiler;
  • Analyzing an algorithm's CPU-wise performance using Visual Studio 2008 Profiler, improving the code and comparing profiler runs;
  • How value types should be implemented correctly and what are the costs of not doing so;
  • How the garbage collector can be your best friend or worst enemy and what kind of things you would be looking for;
  • Why CPU cache matters even in the abstract .NET world.

The presentation slides and the demo projects I've shown during the session can be downloaded from my SkyDrive.  Additionally, here's a non-exhaustive set of resources I've mentioned in my presentation:

Thanks to everyone who attended this event, and see you around at TechEd Eilat!  DEV444 - Next Generation Production Debugging (webcast promo link), on Monday, April 7, at 17:30-18:45 (Hilton hotel).