December 2007 - Posts - All Your Base Are Belong To Us

All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

CLR and Unhandled Exception Filters

I was recently asked how the CLR and Win32 unhandled exception filters interoperate.  Specifically, the original question was (edited for clarity):

We’ve registered to the current AppDomain's UnhandledException event handler.  From unmanaged code, we’ve called the SetUnhandledExceptionFilter Win32 API function.  In the face of an exception, all unhandled exceptions are caught by the unmanaged exception handler, and the call stack of managed exceptions is shown corrupted. Removing the unmanaged registration catches managed exceptions correctly, but then unmanaged exceptions are lost.

From looking at the SSCLI implementation and running a couple of tests I was able to conclude the following:

  • The CLR relies on the SEH unhandled exception filter mechanism to catch unhandled exceptions.
  • The CLR's exception filter does the following:
    • Invokes the previous exception filter in the chain;
    • Invokes the delegates registered to AppDomain.UnhandledException in the AppDomain;
    • Migrates the thread to the default AppDomain and again invokes delegates registered to AppDomain.UnhandledException.

So if you install your own unhandled exception filter, there are two options:

  • The CLR has installed its exception filter before you have installed your exception filter.  In this case, the CLR will invoke your exception filter before performing its own exception processing (because it's a good citizen of Win32);
  • The CLR has installed its exception filter after you have installed your exception filter.  In this case, you should invoke the CLR's exception filter before performing your processing (because you are a good citizen of Win32).

This shows that one way or another, you get a chance to process the unhandled exception before the CLR performs its processing.  If you want to specifically filter out exceptions that are raised by the CLR itself, you can pass to the CLR any exceptions with the exception code 0xE0434F4D (0xE0000000 | 'COM'), which is the SEH exception code for CLR exceptions.  However, this is not enough (because not all CLR-handled exceptions have the CLR SEH exception code), as the following program demonstrates.

(If all you were interested in is just getting your unmanaged exception filter and the CLR exception filter working in unison, then you can stop reading right now.  However, if you're curious, read on...)
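As a rough illustration of the good-citizen chaining described above, here is a minimal native C++ sketch.  It is not the demo code from this post: the names ClassifyExceptionCode, ChainingFilter and InstallChainingFilter are hypothetical, and the Win32-only part is wrapped in #ifdef _WIN32 so that the classification helper compiles anywhere.  The filter remembers whatever SetUnhandledExceptionFilter returns and forwards to it after doing its own work.

```cpp
#include <cstdint>

// SEH exception codes mentioned in this post.
constexpr std::uint32_t kAccessViolation = 0xC0000005u;  // native access violation
constexpr std::uint32_t kClrException    = 0xE0434F4Du;  // raised by the CLR

// The CLR code is simply 0xE0 followed by the ASCII bytes 'C', 'O', 'M'.
static_assert(kClrException == (0xE0000000u |
                                (static_cast<std::uint32_t>('C') << 16) |
                                (static_cast<std::uint32_t>('O') << 8) |
                                 static_cast<std::uint32_t>('M')),
              "CLR exception code should spell COM");

enum class ExceptionOrigin { Clr, AccessViolation, OtherNative };

// Hypothetical helper: decide how to label an exception code in a log.
// An access violation may still end up as a managed NullReferenceException,
// so the code alone does not tell you whether the CLR will handle it.
inline ExceptionOrigin ClassifyExceptionCode(std::uint32_t code) {
    if (code == kClrException)    return ExceptionOrigin::Clr;
    if (code == kAccessViolation) return ExceptionOrigin::AccessViolation;
    return ExceptionOrigin::OtherNative;
}

#ifdef _WIN32
#include <windows.h>
#include <cstdio>

// Whatever filter was installed before ours (possibly the CLR's).
static LPTOP_LEVEL_EXCEPTION_FILTER g_previousFilter = nullptr;

static LONG WINAPI ChainingFilter(EXCEPTION_POINTERS* info) {
    const DWORD code = info->ExceptionRecord->ExceptionCode;
    std::fprintf(stderr, "Unhandled exception 0x%08lX (origin %d)\n", code,
                 static_cast<int>(ClassifyExceptionCode(code)));
    // Be a good citizen: give the previously installed filter its turn.
    return g_previousFilter ? g_previousFilter(info)
                            : EXCEPTION_CONTINUE_SEARCH;
}

void InstallChainingFilter() {
    g_previousFilter = SetUnhandledExceptionFilter(ChainingFilter);
}
#endif
```

Calling InstallChainingFilter() once at startup is enough in this sketch; whether the remembered filter is the CLR's depends on who registered first, as discussed above.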

The demo consists of two assemblies, a C# console application and a C++/CLI class library used to register an unmanaged exception filter and raise an unmanaged exception (this could be done on either side; fully in C++/CLI by throwing a managed exception, or fully on the C# side by using unsafe code and dereferencing an invalid pointer).  In the C# application:

image

In the C++/CLI class library:

image

What happens now depends on which kind of exception was thrown.  If an unmanaged access violation occurred (SEHInstaller.CauseUnmanagedException), then the output looks as follows - note that the native exception code is 0xC0000005 which is the exception code for an access violation.  The unmanaged exception filter was invoked first.

image

If a managed exception occurred (ThrowApplicationException), then the output looks as follows - note that the exception code is 0xE0434F4D which is the exception code for exceptions raised by the CLR.  The unmanaged exception filter was invoked first.

image

Finally, if a managed exception manifests as dereferencing a null reference (p.ToString), then the output looks as follows - note that the exception code is 0xC0000005 again which means it's not a CLR-induced exception - it's a native exception that the CLR itself swallows and turns into a NullReferenceException by the time the AppDomain's unhandled exception handler executes!

image

This does make sense, because here's the sequence of assembly instructions for calling the Program.ToString method:

image

The second instruction dereferences the ECX register, because that is necessary in order to find the virtual method slot for the ToString method.  This dereference is not guarded by an SEH try block, so it manifests as an unhandled exception and is later translated by the CLR, as we have seen.

(You can find the demo code here - 9KB, Visual Studio 2008 solution.)

On Measuring Performance

To rephrase this post (or rant) in a nutshell: Measuring performance is not as simple as people think it is.  I have seen all kinds of information in books, on the web, in hallway conversations etc. that is based on some kind of simple performance measurement; these performance measurements almost never reflect the actual state of affairs.  To measure performance, it isn’t enough to wrap a block of code with two Environment.TickCount samples; it isn’t enough to use a Stopwatch, either, although even a Stopwatch is a rare sight.  Measuring performance correctly, and more importantly – deciding what to measure before you slap up the actual implementation – is not nearly as trivial as that.

First and foremost, performance is not always about execution time.  This is something you learn in college or at university – algorithms have a run-time complexity as well as a space complexity (and there are some interesting attempts to bring the formal concepts of complexity analysis, which mostly work for theoretical algorithms and data structures written on paper, to the world of real class libraries).  But it is not always about memory usage either; measuring performance as the amount of time it takes to perform some action completely disregards other aspects such as latency vs. throughput, performance under load, the scalability of a particular solution, the effect a solution has on overall system performance, and many other aspects of that ilk.  An algorithm that processes 10 requests per second (which suggests 100ms per request on average) might nonetheless have an average latency of 200ms or even more; the same algorithm might be consuming twice as much memory, which will go unnoticed on a powerful development machine but will cause page thrashing and cache thrashing on the actual production system; the same algorithm might be perfect for the not-so-mighty dual-core system it is being written on, but will grind to a screeching halt on a massively parallel, 16-way system due to high contention, locking that’s too granular or too coarse, lock convoys, and a dozen other issues.
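The gap between 10 requests per second and a 200ms latency is easier to see with a small calculation.  The sketch below uses Little's law (requests in flight = throughput × average latency), which is brought in here purely for illustration and is not part of the original argument; the function names are made up.

```cpp
// Little's law: average number of requests in flight equals
// throughput (requests per second) times average latency (seconds).
constexpr double RequestsInFlight(double throughputPerSecond,
                                  double averageLatencySeconds) {
    return throughputPerSecond * averageLatencySeconds;
}

// The same relation solved for latency.
constexpr double AverageLatencySeconds(double requestsInFlight,
                                       double throughputPerSecond) {
    return requestsInFlight / throughputPerSecond;
}
```

With one request in flight at a time, 10 requests per second does mean 100ms each.  With two requests overlapping, the same throughput corresponds to 200ms per request, which is why a throughput figure alone says little about latency.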

This is a topic of such great importance that I find it necessary to restate the most obvious and apparent truths of what you might be interested in measuring.  (This is not a comprehensive drill-down into the subject; I’ll reserve the right to do that in a separate post.)

Run-time is important.  Do measure the CPU time that your algorithm takes to complete.  But while you’re at it, don’t just measure CPU time.  You also care about context switches.  You also care about the actual CPU cycles you consumed (before Vista, that is not necessarily a function of CPU time).  You also care whether any interrupts were charged to you although they were in fact serviced on behalf of an entirely different part of the system.  It doesn’t help if you say 100ms without providing any additional data.  And by the way, this is one of the reasons that I often don’t care about the time your algorithm takes; instead, I care how good it is compared to other algorithms.  After all, if I should choose your algorithm over someone else’s, I am not choosing between 100ms and 200ms, which is going to be different on the production machine anyway; I’m choosing between something that’s potentially good and something that’s potentially twice as good.

Memory utilization is important.  And memory utilization is a subject wide enough to deserve a blog post (if not a book) on its own.  So you’re not just looking at the number of bytes your application occupies in system memory (BTW, are you looking at physical memory or virtual memory?).  You need to care about the working set size.  You need to care about private (non-shared) memory as opposed to shared memory – this has an impact on overall system performance.  You need to focus on page faults and see if they’re soft page faults or hard page faults – the difference is vast, and ignoring it is inexcusable.  In a managed environment, you need to understand if your memory is a single contiguous chunk or if it’s really a million different objects, because it has a grave effect on garbage collection timing.  Oh, and you're using garbage collection?  That's cool, what kind of garbage collector are you using?  Does concurrent GC make a difference for your application?  In .NET, would your algorithm run faster under the multiple-heaps, multiple GC-threads server GC or the single-heap, single GC thread workstation GC?  You need to make sure that your memory usage patterns are consistent, and that you are not fragmenting system memory or your language-of-choice memory allocator’s internal data structures; besides fragmentation, there are dozens of best practices and strategies for allocating memory to be friendly to the environment you’re using – learn them, and use them.  Obvious as it sounds, you must make sure that your memory usage doesn’t creep up over time and suddenly reveal a memory leak.  And that’s just memory.

I/O utilization is important.  Oftentimes, page faults are ignored from this perspective – but a page fault is just another kind of I/O.  Do you have lots of them?  Can your file system accesses be satisfied from the system cache?   Are you using every hint there is to let the system optimize, pre-fetch and cache your I/O patterns?  Are you using asynchronous (a.k.a. overlapped) I/O, completion ports, I/O priorities, bandwidth reservations, QoS, handle priority hints, every single mechanism there is to ensure that this is really the best the system can do for you?

Degree of concurrency is important.  You are not running on a single-core machine anymore.  You are not.  This is a nice dream that has come to an abrupt end with the frightening popularity of multi-core machines in every kitchen.  How do you scale to 4 processors?  How do you scale to more?  How many parts of your application can run in parallel, and how many are inherently synchronized?  How much contention do you have per each lock you hold, per each thread you create, per request, per client, per process?  And what about lock convoys, starvation, priority inversion, NUMA systems, cache collision – is there a chance you’ll be seeing those in the near future?  Have you considered lock-free algorithms and data structures?  Have you considered functional languages?  Should you?
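One way to put a number on "how many parts can run in parallel" is Amdahl's law, which is named here as an added illustration and is not something the discussion above relies on.  The following is a small sketch with a hypothetical function name; it assumes the parallel part scales perfectly and ignores contention entirely, so real results will be worse.

```cpp
#include <cassert>

// Amdahl's law: the best possible speedup on `processors` cores when only
// `parallelFraction` of the work (0.0 to 1.0) can run in parallel.
inline double AmdahlSpeedup(double parallelFraction, unsigned processors) {
    assert(parallelFraction >= 0.0 && parallelFraction <= 1.0);
    assert(processors > 0);
    const double serialFraction = 1.0 - parallelFraction;
    return 1.0 / (serialFraction + parallelFraction / processors);
}
```

For a workload that is 90% parallel, this gives a speedup of roughly 3.1 on 4 processors and 6.4 on 16, so quadrupling the hardware only about doubles the speedup even before any lock contention is taken into account.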

Performance under load is important.  There’s a nice graph you can try sketching for your current system, whatever it’s supposed to do – the X axis would be the average throughput per request, and the Y axis would be the number of requests being added to the system every second.  You might see some surprising results – have you considered the fact that the more threads you spin off for servicing requests, the more locking and waiting you might induce into the system?  While you have load in mind, do consider whether latency or throughput or both are important to you; do consider where you want to invest the majority of your optimization time; do invest in making the decision and writing the specification as to how many concurrent requests you are required to support.  Don’t let that specification and that consideration stem from the system as you implemented it.

Finally, when you decide what to measure, spend some time on thinking what parts of your system you are willing to optimize.  If you don’t know your hot path (and even if you think you know your hot path), give the profiler a chance.  It isn’t biased to thinking that lookup in a Dictionary<,> is an O(1) operation which can be discarded.  It isn’t biased to thinking that there’s nothing to optimize if you’re just moving a piece of memory from one place to the next.  You can yell at it later when it gives you the unexpected results; and there will always be unexpected results, no matter the size of your application. (And this applies to any kind of tool, not just profilers; the authors of these tools have often already made the mistakes you are going to make for the first time – isn’t it best to learn from their experience? Is there really no alternative to writing a console application that calls a function in a poorly written loop, guarded by a sloppy Stopwatch?)

Another common pitfall is incorrectly choosing the input size and the number of iterations for testing the code in question.  Assume you’re measuring your own home-bred algorithm for searching a substring in a sequence of characters.  Measuring your algorithm with a constant input size is just as bad as measuring it with a constant number of iterations.  The most common case I’ve seen is measuring just one iteration and then happily concluding something.

Nothing can bring you farther from the truth than the fallacy of the single iteration.  If you’re writing managed code, your method must be JIT-compiled when it’s first invoked; the compilation cost is often much higher than the cost of the algorithm itself.  For any kind of code, the first time it is executed, the processor doesn’t have it in its instruction cache yet; bringing the instruction sequence to the instruction cache can add significant delays to the overall result.  Finally, your code is probably accessing data (few algorithms run without data); this data must be brought in from secondary storage, must be paged into memory, must be brought into the processor’s data cache.  If you think that’s cheap, or if you think your language-of-choice is giving you a good enough abstraction, think again.  We are all running on roughly the same kind of architecture; we are all enjoying the same leaky abstraction.  Programming in Ruby, Boo, Python or JScript gives you no advantage over the plain-old-assembler; in fact, it probably gives you a disadvantage because assembly programmers who don’t know their way around the CPU cache are rare to find.

The opposite of the single iteration fallacy is choosing poor input sizes; searching a substring in a sequence of 5 characters is not something that lends itself to optimization easily.  Does your home-bred search algorithm consider small input sizes?  Does your home-bred sort algorithm use a fall-back insertion sort when the inputs are small?  How is your home-bred garbage collector dealing with page faults, optimizing memory access, making correct use of the cache?  Oh, it isn’t?
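To make the last few paragraphs concrete, here is a minimal sketch of a measurement loop that avoids the single-iteration fallacy and the fixed-input-size trap.  It is only a sketch under stated assumptions: Measurement, Measure and MeasureSizes are hypothetical names, the input is a synthetic string of repeated characters, and the median is used as the summary simply because it is less sensitive to a stray slow sample.  It says nothing about memory, I/O or concurrency, which need their own tools.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct Measurement {
    std::size_t inputSize = 0;
    std::vector<double> samplesMs;  // one entry per timed iteration
    double medianMs = 0.0;          // upper median when the count is even
};

// Runs `work` on a synthetic input of `inputSize` characters.  The warm-up
// iterations are executed but not timed, so first-call costs (code and data
// not yet cached, lazy initialization) stay out of the samples.
inline Measurement Measure(std::size_t inputSize,
                           const std::function<void(const std::string&)>& work,
                           int warmupIterations,
                           int timedIterations) {
    assert(timedIterations > 0);
    const std::string input(inputSize, 'a');  // assumption: synthetic data

    for (int i = 0; i < warmupIterations; ++i) work(input);

    Measurement result;
    result.inputSize = inputSize;
    for (int i = 0; i < timedIterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work(input);
        const auto stop = std::chrono::steady_clock::now();
        result.samplesMs.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }

    std::vector<double> sorted = result.samplesMs;
    std::sort(sorted.begin(), sorted.end());
    result.medianMs = sorted[sorted.size() / 2];
    return result;
}

// Repeats the measurement for several input sizes, small and large.
inline std::vector<Measurement> MeasureSizes(
        const std::vector<std::size_t>& sizes,
        const std::function<void(const std::string&)>& work,
        int warmupIterations,
        int timedIterations) {
    std::vector<Measurement> results;
    for (std::size_t size : sizes)
        results.push_back(Measure(size, work, warmupIterations, timedIterations));
    return results;
}
```

A call such as MeasureSizes({5, 500, 50000}, search, 3, 20), where search is your own substring routine, yields one Measurement per size; keeping every sample rather than a single average makes it possible to look at the spread afterwards.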

Surprising as it may seem, it’s necessary to thoroughly understand your execution environment before you can measure anything.  There aren’t many guides to creating a clean system that can be used for performance tests; but there are too many systems used for these tests which are entirely unsuitable for the task.  For example, I wouldn’t trust any performance measurement from the average bloatware-infested laptop in a large enterprise; I wouldn’t trust any performance measurement from my parents’ spyware-infested desktop at home; I wouldn’t trust any performance measurement from a single-core system, especially today.  The point I’m trying to make is that you really need to understand the kind of environment you need; even if all you care about is CPU time, don’t discard the fact that you have a 5,400RPM hard drive – the first page fault you are going to take when the drive has spun off to a state of sleep will cost you so much more than on an ultra-fast 10,000RPM beast with embedded NVRAM.  After spending quite some time writing performance tests and measuring performance of simple and complex algorithms alike, I can certainly conclude that a clean environment is not easy to find and not easy to construct even if you have all the sufficient resources.  But if you can’t make it perfectly clean, at least do strive to come close; and it is easy to see that the environment is affecting your measurements, if, for example, two successive samples give you wildly different results.  Oh yes, there are non-deterministic algorithms that genetically evolve and so there’s no way of predicting their run-time.  Yours probably isn’t one of them.
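The "wildly different results" check can be automated in a few lines.  The sketch below is one possible way to do it, not an established rule: RelativeSpread and LooksNoisy are hypothetical names, and the 10% default threshold is an arbitrary assumption that you would tune for your own environment.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Relative spread of a set of timing samples: (max - min) / min.
// Samples must be non-empty and strictly positive.
inline double RelativeSpread(const std::vector<double>& samples) {
    assert(!samples.empty());
    const auto range = std::minmax_element(samples.begin(), samples.end());
    const double lowest = *range.first;
    const double highest = *range.second;
    assert(lowest > 0.0);
    return (highest - lowest) / lowest;
}

// Flags a run whose samples disagree by more than `threshold`
// (0.10 means 10%; the default is an assumption, not a standard).
inline bool LooksNoisy(const std::vector<double>& samples,
                       double threshold = 0.10) {
    return RelativeSpread(samples) > threshold;
}
```

Samples of 100ms, 102ms and 101ms have a spread of 2% and pass; samples of 100ms and 250ms have a spread of 150% and suggest that something in the environment, rather than the algorithm, is being measured.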

I don’t believe I have to write this, but there’s another trick that will save you lots of embarrassment should you decide to make your performance measurements public.  Please, oh please, make sure that you understand the domain in question before you run around trying to measure it.  Do you understand degrees of transactional isolation?  If you don’t, then you shouldn’t be comparing ReadCommitted isolation to Serializable isolation and posting a bunch of useless code on the web.  Do you understand the premises of concurrent execution?  If you don’t, why try to coerce an existing framework into parallelizing a piece of entirely non-parallelizable code?  Are you familiar with the scenarios that motivated the development of various collection classes?  If you aren’t, how is it useful if you compare the cost of looking up an element in an unsorted bag as opposed to a hash-indexed dictionary?  If you don’t understand how a virtual method is different from a non-virtual method (sans the fact it can be overridden in a derived class), don’t try to come up with a clever way of optimizing virtual method calls.  I am only writing this because your time is important, and my time is important.

Last but not least, performance testing is not a one-time activity. It is becoming clear today, with the advent of unit testing frameworks, evangelists, methodologies – the entire Agile development industry – that testing a piece of software for correctness is not something you leave to the QA phase of development. But that’s testing correctness; testing performance is an entirely different beast. What’s most bewildering is the fact that some of these unit-testing principles are so involved with proving code correctness that they sacrifice performance for the very sake of being able to test correctness (e.g. arguing that all methods be virtual by default because it makes it so much easier to mock methods on objects and provide stubs for these objects). It makes no sense for anyone, myself included, to argue that performance should be preferred over correctness. But why are we forced to choose? Why are people trying to convince us that correctness is something that comes at the expense of performance, citing the ever-abused “Optimization is the root of all evil” quote with frothing mouths?

Back to topic, performance testing is not just running a profiler over your code a couple of times to make sure you know what’s going on. Performance testing is about setting up performance goals and performance specifications; it’s about executing performance tests on a daily basis and comparing their results to ensure that no regression has occurred; it’s about rigorously making sure that there isn’t a single performance datum that you can define as perplexing; it’s all about understanding that your code is not “just correct” – that there is an entirely different set of criteria, an evolving set of criteria, that is relevant if performance is important to you (and trust me, it is). We have to cope with an enormous gap in methodology, tools and processes for performance testing; but I think we can give the correctness people a decent fight.

I know this post is riddled with questions; I also know that I’m not giving you any easy answers.  Performance is important; it’s crucial; it’s often overlooked when Correctness raises its ugly head.  It’s crucial when designing a system; it’s crucial when developing it; it’s crucial when testing it and when deploying it to production.  I think there hasn’t been a subject of such importance so unexplored in computer software; so neglected and forgotten and washed away by the Big Methodology Wave; so inexcusably coerced and distorted by people, tools and processes alike. We’re just making our baby steps in the quest of exploring it today, and there is still so much that lies further ahead.

DEP, NXCOMPAT, .NET 2.0 SP1

Yesterday I had an interesting case that I thought of sharing, even though there's nothing very new.

In the customer's scenario, a .NET 2.0 GUI application that had been functioning perfectly on XP stopped working when she moved it to Vista.  Namely, it was getting an access violation in a piece of native code, and terminating unexpectedly.  After much grief and looking through some forums on the web, she discovered that a similar problem could be caused by Data Execution Prevention, an important security feature that modern hardware and software work in tandem to implement.  (Very briefly, DEP is designed to protect memory pages from being executed, i.e. making it impossible for an application to execute code that is dynamically emitted to a normal data page.  This is a security feature because an attacker might emit malicious code into a normal data page and then cause a branch to that page - which is prevented if the page is protected as a no-execute page.)

From this point on, the customer worked hard on disabling DEP for the native ActiveX control that was causing the problem, since it was an ATL control compiled with Visual Studio 6 (old ATL used to generate dynamic thunks into no-execute memory, and then execute them).  A much better option would be to recompile the control with Visual Studio 2003/2005/2008 - this would make the problem go away; unfortunately, I repeatedly see organizations that are afraid of porting old, rotting code to the new development tools, sentencing themselves to eternally having a Visual Studio 6 installation on a rusty desktop in some desolate office corner.

Surprisingly, the customer's problems didn't end with disabling DEP for the application, for the simple reason that she wasn't even able to disable DEP for the application.  Even when DEP was set up for Opt-In (meaning that only Windows system binaries are covered by DEP, and other applications must explicitly be specified), the exception was still there.  Setting DEP to Opt-Out (meaning that applications can opt-out from using DEP) still didn't let us specify that application as an exception.  The only workaround was to turn DEP off altogether, which wasn't really an option since the application was a client application, to be installed on workstations with various configurations in an uncontrolled environment (read: people's homes).  By the way, we also tried enabling the DisableNX compatibility fix using Application Compatibility Toolkit (check out the KB explanation).

This was starting to seem really fishy, so I asked the following innocent question (seeing that the customer had a very organized build process): "Did you have .NET 2.0 SP1 installed on your build server?"

The answer was a puzzled yes; and then I said, "aha," and we had the solution.  Just a couple of days earlier, I had read a post documenting a change in the .NET 2.0 SP1 C# compiler (csc.exe), specifying that it turns on the NXCOMPAT flag in the PE image header.  What's the meaning of that flag, anyway?

Well, if your executable doesn't have the NXCOMPAT flag, it means that it doesn't work with DEP, wasn't tested to work with DEP, or at least you're not 100% sure that you know what kind of relationship you and DEP are going to have.  In that scenario, if your application crashes in one of the well-known paths (such as old ATL components), there's an emulation compatibility feature that will get you back up and running.

However, if your executable has the NXCOMPAT flag in its header, it means that you're 100% certain that it works with DEP.  Which is a good point of critique towards that barely-documented change in the SP1 C# compiler: if the compiler specifies that default for me, it doesn't mean that I tested my program to work with DEP!  99% of developers and 100% of customers don't even know what DEP or NXCOMPAT is, and they don't know it for a good reason - they shouldn't care.  So this has the full potential of a barely-documented breaking change, and I personally feel that the explanation (albeit not an official Microsoft one) is not very convincing:

"Obviously this is not ideal, but aggressively building a computing ecosystem filled with DEP-enabled applications and their accompanying security benefits is very beneficial to Windows users." (from Ed Maurer's post)

So what we had to do was modify the build environment so that the final binary doesn't carry the flag.  Since there's no way to prevent the compiler from emitting it, you must use the EditBin utility to remove the NXCOMPAT flag from the compiled binary.  Since that binary used to be signed, you will need to manually re-sign it using SN.  Both things have to be done as a post-build step; and it clearly stinks.
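If you want to verify what the post-build step actually did, the flag is easy to inspect.  The following is a minimal sketch of such a check, not a replacement for EditBin: NxStatus and QueryNxCompat are hypothetical names, and the function only reads the DllCharacteristics field of an image that has already been loaded into a byte buffer.  The offsets follow the published PE format, in which that field sits 70 bytes into the optional header for both PE32 and PE32+ images, and the NXCOMPAT bit is 0x0100.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// IMAGE_DLLCHARACTERISTICS_NX_COMPAT bit in the DllCharacteristics field.
constexpr std::uint16_t kNxCompatFlag = 0x0100;

enum class NxStatus { NotPe, Clear, Set };

// Reads a little-endian integer of `count` bytes (at most 4).
inline std::uint32_t ReadLittleEndian(const std::vector<std::uint8_t>& bytes,
                                      std::size_t offset, int count) {
    std::uint32_t value = 0;
    for (int i = 0; i < count; ++i)
        value |= static_cast<std::uint32_t>(bytes[offset + i]) << (8 * i);
    return value;
}

// Reports whether the PE image in `image` has the NXCOMPAT bit set.
inline NxStatus QueryNxCompat(const std::vector<std::uint8_t>& image) {
    // DOS header: "MZ" signature, with the PE header offset stored at 0x3C.
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return NxStatus::NotPe;
    const std::size_t pe = ReadLittleEndian(image, 0x3C, 4);

    // Need: signature (4) + file header (20) + optional header through
    // DllCharacteristics (70 + 2).
    const std::size_t needed = 4 + 20 + 72;
    if (pe > image.size() || image.size() - pe < needed)
        return NxStatus::NotPe;
    if (image[pe] != 'P' || image[pe + 1] != 'E' ||
        image[pe + 2] != 0 || image[pe + 3] != 0)
        return NxStatus::NotPe;

    // Optional header magic: 0x10B for PE32, 0x20B for PE32+.
    const std::uint32_t magic = ReadLittleEndian(image, pe + 24, 2);
    if (magic != 0x10B && magic != 0x20B)
        return NxStatus::NotPe;

    const std::uint32_t flags = ReadLittleEndian(image, pe + 24 + 70, 2);
    return (flags & kNxCompatFlag) ? NxStatus::Set : NxStatus::Clear;
}
```

The post-build step itself would typically look something like editbin /NXCOMPAT:NO MyApp.exe followed by sn -R MyApp.exe MyKey.snk, where both file names are placeholders for your own assembly and key file.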

Just to clarify things a little bit, this has nothing to do with Vista.  Coincidentally, the customer started testing the application on Vista simultaneously with deploying .NET 2.0 SP1 to the build server, so she was testing a build with NXCOMPAT.  We didn't check, but the exact same thing was supposed to happen on XP.  Finally, this also explains why the application couldn't be added to the list of DEP exceptions on the system, or why ACT didn't work - if the executable is marked as NXCOMPAT, it takes precedence over any setting other than turning DEP off altogether.

Video Recording of My Developers Academy Lecture (DEV409: Psychic Performance and Debugging 101)

(This is going to be my last post about the DevAcademy II; I promise.)

The video recording of my Developers Academy II lecture (DEV409: Psychic Performance and Debugging 101, download the slides and demos too) is now available.  If you tried accessing it during the last two days, there was something wrong with the streaming, but it works now (I'm watching it as I'm writing this post).  There are several funny quotes, such as "The last thing I prepared for you is an application crash; it's a good way to end the day - a crash."

By the way, thank you all one last time for attending, and thanks for choosing to read this and watch my video.

A brief summary of what you'll see in the video (each supported by a practical demonstration, the tools and process to diagnose the problem, the necessary theory to understand the issue, and a practical solution):

  1. Diagnosing a .NET memory leak
  2. Diagnosing GC trouble which causes the application to spend too much time in GC
  3. Diagnosing an application hang caused by an orphaned lock
  4. Explaining a cache collision scenario introduced by a single field in a data structure
  5. Diagnosing excessive contention and using a lock-free queue to solve it
  6. Diagnosing an application crash using WinDbg and production-debugging breakpoints

I also recommend you take a look at the matrix of all lecture recordings.  I will certainly be watching most of them in the following couple of days.

The recording quality is terrible (the video is bad; audio is all right), but on LCD screens it's quite possible to see what's going on (especially since the WinDbg sessions are black on a white background, even though the fonts are smeared a little bit).

image

Using the .foreach debugger command to display the GC roots of all objects of a particular type.

image

Setting a breakpoint in the CLR before GC occurs, to display the contents of generation 0 (using the community debugging extension SOSEX).

image

Explaining the evolution of locking in computer algorithms, as an introduction to wait-free data structures (comparing a lock-free queue to a queue that uses a .NET monitor for synchronization).

image

Using the Windows Performance Monitor to monitor the demo application for contention, memory usage, CPU usage and other parameters.  Note the DevAcademy.OrderProcessing performance counters - these are custom counters exposed by the application.

(By the way, it's in Hebrew.  You will still be able to see the demonstrations and the various WinDbg commands if you don't understand Hebrew, but there are lots of audio explanations as to what's going on ...)

.NET File Access, Process Monitor, and Continental Airlines

(Surprisingly, this is not a post about associative memory and how apparently disconnected things appear to be inter-related after all.)  I'm writing this from my plane back home from the US, where I had the pleasure of teaching a Windows Internals course on Sela's behalf.  As part of the course, I demonstrate various debugging and investigation tools that can be used in the Windows ecosystem, including Sysinternals' excellent Process Monitor.  For those of you unfamiliar with the tool, I strongly recommend it - it's the Swiss Army knife for analyzing file system and registry activity on a machine.  Combined with a powerful set of filters, it can tell you what each individual process is doing from these aspects.  Combined with additional tools such as Logger or APIMon, you have the entire picture of what's going on inside the process, without the burden of a debugger attach and in terms that an IT admin can understand.

What I was trying to do was create a simple scenario for students to practice the usage of this kind of tool.  So what I needed was a simple application that performed some invalid file system or registry access.  I wrote a "Notepad Replacement" application which simulated a Notepad, but its Save function would randomly fail by trying to write to an invalid path, an invalid volume or otherwise distorting the actual save activity.  Since it was a simple UI, I wrote it using .NET WinForms, and regretted every second of it later.  Students were simply unable to find the error in Process Monitor!  No matter what kind of error, it was being swallowed by .NET's file and file security abstraction before it could ever reach Process Monitor's driver.  The following table summarizes what happens from Process Monitor's perspective with every kind of injected fault that I was trying to simulate, in a .NET application as opposed to a native Win32 application:

Fault Type / Environment (Win32 vs. .NET)

  • Accessing an invalid path (e.g. "C:\?.txt")
    • Win32: The access fails and goes through Process Monitor on the failure path (see Fig. 1)
    • .NET: The access fails with a .NET exception, doesn't go through Process Monitor
  • Accessing an invalid network path (e.g. "\\someserver\someshare\1.txt")
    • Win32: The access fails and goes through Process Monitor on the failure path (see Fig. 2)
    • .NET: The access fails with a .NET exception, (almost) doesn't go through Process Monitor (see Fig. 4)
  • Accessing an invalid redirected path (e.g. "Z:\1.txt")
    • Win32: The access fails and doesn't go through Process Monitor
    • .NET: The access fails with a .NET exception, and doesn't go through Process Monitor
  • Accessing an invalid local redirected path (e.g. "Y:\1.txt" where Y: is a local subst folder)
    • Win32: The access fails and goes through Process Monitor on the failure path, but doesn't show the original path requested (see Fig. 3)
    • .NET: Same as Win32
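One plausible reading of the first row is that the path is validated in user mode and rejected before any file system call is issued, so a filter driver never sees a request.  The sketch below illustrates that general pattern only; it is not taken from the .NET source, and HasIllegalPathChars, OpenChecked and the exact character set are assumptions made for the example.

```cpp
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical check: wildcard and reserved characters that a wrapper
// might refuse to pass on to the operating system.
inline bool HasIllegalPathChars(const std::string& path) {
    return path.find_first_of("?*<>|\"") != std::string::npos;
}

// Validates the path first and only then calls `osOpen`, which stands in
// for the real system call.  A rejected path throws without `osOpen` ever
// being invoked, so nothing below this layer observes the attempt.
inline int OpenChecked(const std::string& path,
                       const std::function<int(const std::string&)>& osOpen) {
    if (HasIllegalPathChars(path))
        throw std::invalid_argument("Illegal characters in path: " + path);
    return osOpen(path);
}
```

Passing "C:\?.txt" to OpenChecked throws before osOpen runs, which from the point of view of a tool that watches file system requests looks exactly like nothing having happened at all.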

Figure 1 - accessing an invalid file (Win32)

image

Figure 2 - accessing an invalid network path (Win32)

image

Figure 3 - accessing an invalid local redirected path (Win32)

image 

In my case, I had Y: redirected to D:\Temp, so Process Monitor dutifully shows D:\Temp.  This does make sense, because it operates at a lower level than the symbolic link parsing subst creates.

Figure 4 - accessing an invalid network path (.NET)

image

Now, what does all of this have to do with Continental Airlines anyway?  Well, my flight to the US (Continental from Tel-Aviv to Newark, and then jetBlue from JFK to Sacramento) went as smooth as a flight can go.  Furthermore, the jetBlue flight back was also splendid (strongly recommended; they were also the only airline to offer a non-stop flight from NYC to Sacramento and back).  The problems began when I got to Newark at 10:30AM, with my flight at 3:50PM, assured that I have plenty of time and everything is certainly going to be OK.

The first minor annoyance was that baggage check-in for the flight only started at 12:30PM even though they said it would be at 12.  Furthermore, there was just one lady trying to take care of several dozen early birds like myself who wanted to check in.  For me, the first big warning light went on when she mumbled something along the lines of "I can't get the computer to issue you a seat, the flight is full so you'll have to try it at the gate later".  Considering that the flight was booked more than a week before, and considering that I was at the airport checking in my baggage over 3 hours before the flight, I wasn't much concerned with that warning light.  Little did I know!

When I went through airport security and got to the gate, a nice lady (different one, of course) explained to me how the system works - you know, that usual story they give you about the poor poor airline companies not being able to make 100% profit, so they overbook flights.  This one was particularly overbooked, but she told me time and again that I will have a seat, no problem.

A couple of minutes later, several security guards approach the gate area and ask me to leave because they are creating a restricted security area for the boarding process.  I'm kind of used to that with flights going to Israel, so again no problem.  At 2:15PM they start admitting people inside "the zone", and that's when the real problems began.

When the security guys saw that I don't really have a seat, but just some kind of recommendation-only boarding pass begging to pass me through security, they assumed I'm on standby and pushed me aside, telling me to wait until all "legit" passengers are boarded.  Needless to say, explaining that I am not on standby and that I should have a seat and that it's not my problem that they overbook flights - didn't really help.

At about 3:15PM, things are really getting nasty - there are about 15 people sitting in the standby area, no customer representative has come to talk to us at any point, and finally - there are two BusinessFirst travelers sitting there, looking very annoyed because allegedly they paid $4000 to sit and wait outside the security area.  How very comforting that we are all treated the same (for now).

Several minutes later, after much yelling and people getting really nervous, comes yet-another-nice-lady (YANL from now on) and tries to remedy the situation.  She can't really do anything, of course - she doesn't know if they have any seats available, and they haven't started that beautiful volunteer scheme yet.  The BusinessFirst passengers are ready to eat her alive as she tells them that she is not in charge of BusinessFirst, and that the BusinessFirst concierge is on his/her/their way (they only use these kinds of words for business passengers anyway :-)).

At about 3:50PM, when the aircraft was actually supposed to be in the air, comes the good news for some of us: the BusinessFirst concierge gets the BusinessFirst guys on board, and somehow another couple of seats emerge for some other people waiting.  At no point did I really understand why they were giving those seats to certain people and not to others, and whether the algorithm was totally random or there still was some bizarre reverse sort order (because I should really have got the first seat, being at the airport 5 bloody hours before the flight).

The subsequent hour was one of the most aggravating experiences I've had in an airport.  We're just sitting there, every once in a while comes a YANL bearing no news.  They did call for volunteers, but nobody knows if there were any volunteers.  The entire gate security area looks like a big human zoo, appropriately, and they aren't boarding anyone even though it's over half an hour after the flight departure time.

To make a long (and getting longer) story short, at about 4:40PM a YANL emerged and started giving out seats to some people, me not included.  I got so annoyed I approached her and said that I don't understand the distribution logic she was following, but that if she was giving seats to people on the standby list, she should certainly be giving me a ticket.  After appearing to be shocked by the fact that I have a reservation and paid for the ticket but they still couldn't find me a seat (right, very shocking), she went away and got me the last seat on the plane.  Incidentally, when I got on the plane there was a nice gentleman sitting in my seat, but he moved away never to be seen later.

Also incidentally, we took off at 5:15PM or so, almost an hour and a half after what was planned, but the flight duration magically changed from 10 hours and 25 minutes to 9 hours and 35 minutes.  Why not make it 9 hours in the first place?!

This all goes along very well with what I personally experienced in the past with nearly every airline company out there (e.g. when I was frantically calling El-Al in Heathrow Airport explaining that I'm on my way to the flight, going to be there ahead of time, and asking time and again that they don't give my ticket away - but they still did and I got stuck in Heathrow for 24 hours at my expense), and with what people are saying about Continental in particular.  It's not like I can give out some warning which will actually help anyone.  Just trying to let off some steam.

.NET Parallel Extensions December CTP

Introduction

November 29: Soma has announced a CTP release of the Parallel Extensions to the .NET framework, available to download as a full product (approx. 1.7MB) or just CHM documentation.  For more info, look at the Parallel Programming blog and the Parallel Computing developer center on MSDN.

Software consultants, architects, decision-makers and community leaders have been blowing the "free lunch is over" horn for several years now.  With the advent of multi-core machines, Moore's law has been rephrased: the individual core's speed is not dramatically increasing, but the number of cores is expected to double every X months.  As Herb Sutter says in very simple terms:

"Concurrency is the next major revolution in how we write software."

Curiously, I noticed a completely different but related phenomenon a few years ago, when I started teaching: there are three concepts to grasp on your way to becoming a decent developer.  The first concept is recursion, which is a great filter but probably most CS students can come to terms with it.  The second concept is pointers (and the underlying memory model), which seems to be going extinct among modern programmers, with the high-level abstractions we have today.  Finally, the third concept is parallelism, concurrency and multi-threading - and while there are countless programmers I know who can recite a recursive Fibonacci algorithm, there are perhaps only half a dozen who can wisely opine on concurrency-related issues.

The advent of libraries such as the Parallel Extensions for .NET marks a very distinctive point in the evolution of developers: concurrency, parallelism, multi-threaded algorithms must enter our arsenal if we want to stay in business.  It's as simple as that.  Or at least that's what I like to hope for.

So what's in the Parallel Extensions CTP for you?  It has been summarized in several places, and there's an extensive CHM documentation file distributed with the download, but I figured that a few brief and concise examples will drive the point home even better.  Everything demonstrated here comes from the System.Threading.dll assembly, from the CTP release.

Parallel LINQ (PLINQ)

With PLINQ, it's now possible to execute any (well, almost any) LINQ query in parallel.  This is achieved by using the ParallelEnumerable static class, whose extension methods mimic those of LINQ's Enumerable class.  For example, the following query to select prime numbers from a range is executed in parallel:

IEnumerable<int> range = ...;   // obtain range from somewhere
var primeNumbers = from n in range.AsParallel()
           where IsPrime(n)
           select n;

For more information on PLINQ, consult the CTP documentation or take a look at this PLINQ October 2007 MSDN Magazine article.
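For readers who want something they can compile, here is a minimal sketch of the same query.  It is written against PLINQ as it later shipped in System.Linq (AsParallel on ParallelEnumerable), not against the CTP bits, and the IsPrime helper, the PrimesUpTo method and the range 0..30 are assumptions made up for the demo.

```csharp
using System;
using System.Linq;

static class PlinqPrimes
{
    // Hypothetical predicate: plain trial division, good enough for a demo.
    public static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; ++d)
            if (n % d == 0) return false;
        return true;
    }

    // Filters 0..max in parallel and returns the primes in ascending order.
    public static int[] PrimesUpTo(int max)
    {
        return Enumerable.Range(0, max + 1)
                         .AsParallel()
                         .Where(n => IsPrime(n))
                         .OrderBy(n => n)   // a parallel query is unordered unless asked
                         .ToArray();
    }

    static void Main()
    {
        // prints: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29
        Console.WriteLine(string.Join(", ", PrimesUpTo(30)));
    }
}
```

The explicit OrderBy is there because a parallel query makes no promise about the order in which results come back; drop it if the order doesn't matter.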

Parallel Library

If you're not after parallelizing data access, but rather you have some loops or tasks to perform concurrently, you have the high-level Parallel static class at your service.  For example, to parallelize a loop, you could write:

for (int i = 0; i < 10000; ++i) ComputeSomething(i); // single-threaded version

Parallel.For(0, 10000, i => ComputeSomething(i));    // parallelized version

This will not create 10,000 threads running your work in parallel; the parallel library determines the reasonable level of concurrency itself.

For more information on the Parallel Library, take a look at this Parallel FX October 2007 MSDN Magazine article.
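A complete program built around the loop above might look like the following sketch.  It assumes the Parallel class as it later shipped in System.Threading.Tasks; the ParallelLoopDemo class and the Squares method are hypothetical stand-ins for ComputeSomething.  The one design point worth noting is that every iteration writes only to its own array slot, so the iterations share no mutable state and need no locks.

```csharp
using System;
using System.Threading.Tasks;

static class ParallelLoopDemo
{
    // Hypothetical stand-in for ComputeSomething: fills results[i] with i squared.
    public static long[] Squares(int count)
    {
        var results = new long[count];

        // Iterations may run on different threads and in any order;
        // each one touches only results[i], so no synchronization is needed.
        Parallel.For(0, count, i => { results[i] = (long)i * i; });

        return results;
    }

    static void Main()
    {
        long[] squares = Squares(10000);
        Console.WriteLine(squares[9999]);   // prints 99980001
    }
}
```

If the loop body had to update a shared total instead, it would need Interlocked, a lock, or one of the overloads with thread-local state.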

Task Library

Creating tasks is similar to creating thread pool work items, but a task provides you a higher level of control.  For example, you can wait for tasks to complete:

Task t = Task.Create(() => ComputeSomething());
//... do some work
t.Wait();    // wait for the task to complete

Or you can cancel tasks in the middle without resorting to manual synchronization or ugly aborts:

Task t = Task.Create(() =>
    {
        while (!Task.Current.IsCanceled)
            DoSomething();
    });
//... do some work
t.Cancel();    // does not Thread.Abort, but sets IsCanceled property
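The snippets above use the CTP's API.  For comparison, here is a sketch of the same wait-and-cancel pattern using the task API as it later shipped in .NET 4, where cancellation is requested through a CancellationTokenSource rather than a Cancel method on the task.  The CooperativeCancellationDemo class, the RunUntilCanceled method and the counter that stands in for DoSomething are all assumptions made up for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class CooperativeCancellationDemo
{
    // Starts a worker, lets it run for a while, asks it to stop,
    // waits for it, and returns how many iterations it managed.
    public static int RunUntilCanceled(int millisecondsBeforeCancel)
    {
        int iterations = 0;

        using (var cts = new CancellationTokenSource())
        {
            CancellationToken token = cts.Token;

            Task worker = Task.Factory.StartNew(() =>
            {
                // The worker polls the token; nobody aborts its thread.
                while (!token.IsCancellationRequested)
                {
                    iterations++;       // stand-in for DoSomething()
                    Thread.Sleep(1);
                }
            });

            Thread.Sleep(millisecondsBeforeCancel);   // ... do some work
            cts.Cancel();                             // request cancellation
            worker.Wait();                            // wait for the task to finish
        }

        return iterations;
    }

    static void Main()
    {
        Console.WriteLine("Iterations before cancel: {0}", RunUntilCanceled(100));
    }
}
```

The iteration count varies from run to run; what matters is that the loop exits on its own shortly after Cancel is called.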

Futures Library

Futures are tasks with an associated value.  Normally, a future is created with a computation that calculates that value.  For example:

Future<int> fi = Future.Create(() => ComputeNumber());
//... at a later point:
Console.WriteLine(fi.Value);    // waits if necessary

Alternatively, you can create a future without an associated method (delegate or lambda), in which case you have to set its value manually:

Future<string> fs = Future.Create<string>();    // no delegate
// main thread:
Console.WriteLine(fs.Value);    // waits
// other thread:
fs.Value = "Computed String";    // releases main thread
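Both flavors of future can be tried out with the API that later shipped in .NET 4, where the role of Future<T> is played by Task<TResult> and a future without a delegate maps onto TaskCompletionSource<TResult>.  The sketch below makes that substitution explicitly; the FutureDemo class, its method names and the constant 6 * 7 standing in for ComputeNumber are assumptions for the demo.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class FutureDemo
{
    // A future with an associated computation.
    public static int ComputedFuture()
    {
        Task<int> fi = Task.Factory.StartNew<int>(() => 6 * 7);
        return fi.Result;                 // waits if necessary
    }

    // A future with no delegate: another thread supplies the value.
    public static string ManuallyCompletedFuture()
    {
        var fs = new TaskCompletionSource<string>();

        // Other thread: sets the value, which releases the waiting thread.
        var producer = new Thread(() => fs.SetResult("Computed String"));
        producer.Start();

        string value = fs.Task.Result;    // waits until SetResult is called
        producer.Join();
        return value;
    }

    static void Main()
    {
        Console.WriteLine(ComputedFuture());            // prints 42
        Console.WriteLine(ManuallyCompletedFuture());   // prints Computed String
    }
}
```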

Summary

Hopefully, this short tour of the Parallel Extensions library was useful to you.  If these concepts seem new and appealing - go ahead and grab the CTP right now to play with the bits.  Change is underway; be part of it.  If you think this has nothing to do with the way you are developing software - think again.  You are getting surrounded.