PDC 2009 Day 3: Power Tools for Debugging
The last session at the PDC that I’m attending is about incubation tools for debugging, from Microsoft Research. Debugging is hard and the process of finding the root cause is manual and therefore tedious and long.
The formal debugging process: ask an expert, check the bug database, check the version history, reproduce the bug, trace in a debugger. [Some existing tools that help along the way are Visual Studio Test Impact Analysis, Visual Studio Test Elements and Visual Studio IntelliTrace (new in Visual Studio 2010).]
Can we automatically debug the code and find the root cause?
- Holmes – a statistical debugging tool that uses large test suites to diagnose failures. (available for download right now and integrates with Visual Studio)
- Darwin – a tool for debugging regressions using a stable version to diagnose failures.
- Debug Advisor – a recommendation system for bugs that mines software repositories for information related to a bug.
Holmes
Can we use test suites to help find the cause of the failures and not just the failures themselves? Statistical debugging is about collecting instrumented data (all acyclic path fragments within a method) from a large set of passing and failing test cases. Holmes then looks for code paths that strongly correlate with failure: paths that, when exercised, almost always coincide with a failing test and, when not exercised, almost always coincide with a passing test.
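To make the correlation idea concrete, here is a minimal sketch in Python. It is not Holmes’s actual algorithm, which the session did not spell out. It assumes each test run is recorded as a set of path identifiers plus a pass/fail flag, and scores each path by how much more often tests fail when the path is exercised than when it is not. The function name `score_paths` and the path names are hypothetical.

```python
def score_paths(runs):
    """Score each path by failure rate when exercised minus failure rate when not.

    runs: list of (paths_exercised, passed) tuples, where paths_exercised is a
    set of path identifiers and passed is a bool. Hypothetical data shape.
    """
    all_paths = set().union(*(paths for paths, _ in runs))
    scores = {}
    for path in all_paths:
        with_path = [passed for paths, passed in runs if path in paths]
        without_path = [passed for paths, passed in runs if path not in paths]
        fail_with = with_path.count(False) / len(with_path) if with_path else 0.0
        fail_without = (
            without_path.count(False) / len(without_path) if without_path else 0.0
        )
        scores[path] = fail_with - fail_without
    return scores


# Hypothetical test runs: path "C" is exercised only by failing tests.
runs = [
    ({"A", "B"}, True),
    ({"A", "C"}, False),
    ({"A", "B"}, True),
    ({"A", "C", "D"}, False),
    ({"A", "D"}, True),
]
scores = score_paths(runs)
suspect = max(scores, key=scores.get)
print(suspect)  # C
```

A path that every test exercises (like "A" here) gets a middling score, because exercising it tells you nothing about whether the test fails.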
Holmes collects path coverage information (it could also collect IntelliTrace information), and once that data is available it comes up with a set of potential root causes.
After injecting a bug, the presenter ran a suite of unit tests and saw some test failures. The actual test failures don’t contain sufficient information for full diagnostics. Next, you load the Holmes package for Visual Studio, select the failing test run and Holmes provides two possible root causes. Double-clicking on one of the results brings you to the actual code with the highlighted code path.
Holmes also supports the external Visual Studio Test Manager so that the developer can open a test run from a TFS server later as long as the Holmes coverage data was collected during the test run.
Because Holmes collects path coverage (instead of full coverage), the extra overhead it introduces is about 10% – 30%. As for the statistical analysis, it completes quite quickly. The analysis involves measuring correlation between paths and test failures, but it’s not just a simple correlation: a path that correlates with failure doesn’t necessarily cause the failure (e.g. exception handling, error recovery etc. are associated with failures but don’t cause them). The smarter analysis idea used in Holmes is that you’re looking for paths that correlate with failure, but only inside methods/loops/try-catch statements that do not themselves correlate with failure. This is a very effective heuristic in practice.
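The scoping heuristic can be sketched as a filter over already-computed scores. This is only an illustration under assumptions: the scores, the threshold, the scope names and the `enclosing` mapping from a path to its enclosing method/loop/try-catch are all hypothetical, and Holmes’s real analysis is likely more involved.

```python
def filter_candidates(path_scores, scope_scores, enclosing, threshold=0.8):
    """Keep paths that correlate with failure while their enclosing scope does not.

    path_scores / scope_scores: hypothetical failure-correlation scores in [0, 1].
    enclosing: maps each path to the scope (method, loop, try-catch) containing it.
    """
    candidates = []
    for path, score in path_scores.items():
        scope_score = scope_scores.get(enclosing.get(path), 0.0)
        if score >= threshold and scope_score < threshold:
            candidates.append(path)
    return sorted(candidates)


# Hypothetical scores: the logging path sits inside a catch block that itself
# correlates with failure, so it is treated as a symptom rather than a cause.
path_scores = {"parse:branch_3": 0.95, "handler:log_error": 0.9, "parse:branch_1": 0.1}
scope_scores = {"parse": 0.3, "catch_block": 0.9}
enclosing = {
    "parse:branch_3": "parse",
    "handler:log_error": "catch_block",
    "parse:branch_1": "parse",
}
candidates = filter_candidates(path_scores, scope_scores, enclosing)
print(candidates)  # ['parse:branch_3']
```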
Holmes relies on many test cases – if you have lots of test cases, Holmes can help with a good correlation; but for a small number of tests, there’s hardly anything that can be done with the output of a test run. The presenter’s recommendation is about 100 tests with approximately 10-20 failing test cases.
Darwin
Darwin is useful when you have a stable, working build of your application and then a change introduces a regression. Manual debugging would involve comparing the old version with the new version… (This is tedious for large changes and doesn’t work for bugs that were already present in the previous version but were only unmasked in this one.)
Comparing test cases is another approach: you compare a trace of the failing test case with a trace of a similar, passing test case and look for the place where the two diverge. That divergence point is where the cause of the bug is likely to be. The problem is that coming up with a similar passing test is not easy, and that’s the problem Darwin tries to solve.
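Finding the divergence point is the easy part once both traces exist. A minimal sketch, assuming a trace is simply a list of location strings (the format and the names below are hypothetical):

```python
from itertools import zip_longest


def first_divergence(passing_trace, failing_trace):
    """Return (index, passing_step, failing_step) at the first difference, or None.

    If one trace is a strict prefix of the other, the missing step is None.
    """
    pairs = zip_longest(passing_trace, failing_trace)
    for index, (passing_step, failing_step) in enumerate(pairs):
        if passing_step != failing_step:
            return index, passing_step, failing_step
    return None


# Hypothetical traces of "method:line" locations.
passing = ["main:1", "parse:4", "parse:5", "render:2"]
failing = ["main:1", "parse:4", "parse:9", "error:1"]
divergence = first_divergence(passing, failing)
print(divergence)  # (2, 'parse:5', 'parse:9')
```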
Darwin defines similarity between test cases and is capable of generating similar, passing tests given a failing test case. To do that, Darwin uses the previous (stable) version of the application. Two tests are considered similar if they follow the same code path in the stable version but different code paths in the new version.
Finding similar tests is a constraint solving problem, and techniques similar to Pex can be used to solve it. The tool is currently in prototype but it has already been used to detect a bug in a web server as well as an image processing application.
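As a toy illustration of the search for a similar test, the sketch below swaps the constraint solver for a brute-force scan over candidate inputs; it is not how Darwin works internally. The two versions are modeled by hypothetical functions `old_path` and `new_path` that return a label for the path an input takes, and `passes` is a hypothetical oracle for the new version.

```python
def old_path(x):
    # Stable version: branches only on the sign of the input.
    return "neg" if x < 0 else "nonneg"


def new_path(x):
    # New version: an extra branch for zero, which is where the regression lives.
    if x < 0:
        return "neg"
    if x == 0:
        return "zero"
    return "nonneg"


def find_similar_test(failing_input, candidates, old_path, new_path, passes):
    """Find a passing input that takes the failing input's path in the old
    version but a different path in the new version."""
    for candidate in candidates:
        if candidate == failing_input or not passes(candidate):
            continue
        same_in_old = old_path(candidate) == old_path(failing_input)
        differs_in_new = new_path(candidate) != new_path(failing_input)
        if same_in_old and differs_in_new:
            return candidate
    return None


# Assume the new version fails only for input 0.
similar = find_similar_test(0, range(-3, 4), old_path, new_path, lambda x: x != 0)
print(similar)  # 1
```

A real tool cannot enumerate inputs like this, which is why the search is framed as a constraint solving problem over path conditions.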
Debug Advisor
Debug Advisor is a recommendation system for bugs. When a bug report is received, Debug Advisor helps answer these questions: Has someone else looked at this bug before, or fixed it before? What do we know about this category of bugs? Who should I ask for help? Where should I start looking?
What you do with the tool is take all the information you know about the bug (in text form) and put it in a big search box. Debug Advisor shows you similar bugs, as well as lots of related information about the bug: people related to it (e.g. working on the same part of the project, or who encountered a similar bug before), source files that might be related to this bug, and a list of relevant binaries.
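The session did not describe how Debug Advisor matches text against the repository, so the following is only a stand-in to show the shape of the "similar bugs" query: a bag-of-words cosine similarity over bug descriptions. The bug IDs, descriptions and function names are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_similar(query, bugs):
    """Return IDs of bugs sharing vocabulary with the query, most similar first."""
    query_tokens = tokenize(query)
    scored = [(cosine(query_tokens, tokenize(text)), bug_id) for bug_id, text in bugs.items()]
    return [bug_id for score, bug_id in sorted(scored, reverse=True) if score > 0]


# Hypothetical bug database.
bugs = {
    "BUG-1": "NullReferenceException in ImageLoader when file is missing",
    "BUG-2": "Crash in web server on malformed HTTP header",
    "BUG-3": "Typo in settings dialog label",
}
matches = rank_similar("ImageLoader throws NullReferenceException for missing file", bugs)
print(matches)  # ['BUG-1']
```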