Concurrency testing is hard, and the main difficulty is that behavior is non-deterministic and observing the problem might make it disappear and become untraceable.
Two different approaches to the problem are presented by Madan Musuvathi: Cuzz, a tool for concurrency fuzzing – which can be attached to existing tests, randomly provokes scheduling and makes existing concurrency tests much better; and FeatherLite, a data race detection tool which has a fairly low runtime overhead compared to other tools (usually <50%). [See the Concurrency Analysis Platform page at Microsoft Research.]
Cuzz detected a bug in the Dryad environment which only occurs once in >200,000 executions. (Note that Cuzz doesn’t know right from wrong – it schedules executions and it’s up to you to determine that the program’s behavior is faulty.) Attaching Cuzz is easy – using the Detours tool that injects the Cuzz DLL into the application binaries (there’s no need for instrumentation or recompiling). When the program runs, Cuzz determines how the threads are scheduled and there is fancy randomization that determines how to interleave them.
Cuzz guarantees mathematically the probability that it will find a bug. Running the application enough times with Cuzz brings the probability of finding the bug arbitrarily close to 100%.
Programs have a very large number of possible thread scheduling paths; how can Cuzz guarantee that every interleaving occurs, or how can a stress test guarantee that all paths are valid and pass the test? It wants to maximize the probability of finding the bug on every run. The idea is that concurrency bugs are not random and are not adversarial – they usually require more than one thing to go wrong, and they can go wrong in multiple scheduling paths and not just one.
Hopefully this happens in the vicinity of the violating instruction or instructions. They are looking at it as ordering constraints – the number of instructions that need to be scheduled in particular order to exhibit the bug The depth of a bug is the number of ordering constraints sufficient to find the concurrency bug, and most bugs have depth 1 or 2. Finding depth-1 and depth-2 bugs is probably going enough. (Finding a depth-1 bug is usually a matter of 1/n probability where n is the number of threads; a depth-2 bug is usually 1/nk where k is the number of synchronization instructions involved; and then 1/nk^2 and so forth.)
The way Cuzz works internally is essentially by injecting variable-length sleeps at random in the program, where Win32 synchronization mechanisms are invoked. [Adapting Cuzz to a different platform (e.g. managed code) means another abstraction layer must be developed to suit the slightly different synchronization mechanisms and semantics used by managed code.]
FeatherLite is a data race detection tool that doesn’t cause a great amount of overhead. (Existing tools often introduce an overhead of more than an order of magnitude due to complex instrumentation.)
FeatherLite works by instrumenting the application and then running the instrumented program, feeding the synchronization requests and data accesses to a data-race detection algorithm. Considering that there is typically one memory access per five instructions or less, it’s a huge amount of data – and the optimization idea is to have a smart sampling algorithm that discards most of the data accesses and passes only some data accesses to the data-race detection algorithm.
The sampling hypothesis is that data races occur in cold paths more often than in hot paths, and therefore the sampler discards hot paths more often than cold ones. The results are astonishing – finding more than 70% of the data races by looking at less than 2% of memory accesses. Considering the runtime overhead of less than 50%, interactive applications that depend on user input are not significantly affected.
There’s even a GUI front-end that shows how the data race occurred, at source-level!