Similarity Analyzer
Simian (Similarity Analyzer) detects duplicate code in almost every human-readable text format.
The primary use case cited on the product webpage is this: you fix a bug in one place in a program, without knowing that the same buggy code was copy-pasted to another location in the program.
But that’s just a tiny fraction of the scenarios where detecting “reuse by copy-paste” can improve the quality of your code. Beyond that, I can think of more esoteric but interesting scenarios, such as plagiarism detection.
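To make the idea of duplicate detection concrete, here is a minimal sketch of one simple approach. It is not Simian's actual algorithm. It slides a window of `min_lines` consecutive lines over each file, normalizes away blank lines and surrounding whitespace, and reports any window that appears more than once. The function names and the `min_lines` default are hypothetical and chosen only for illustration; they do not reflect Simian's options or defaults.

```python
from collections import defaultdict


def normalized_lines(text):
    """Yield (line_number, content) pairs, skipping blank lines and
    ignoring leading/trailing whitespace (a deliberately simple normalization)."""
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped:
            yield number, stripped


def find_duplicates(files, min_lines=3):
    """Return {window: [(filename, line_numbers), ...]} for every run of
    `min_lines` consecutive normalized lines that occurs more than once.
    `min_lines` is a hypothetical parameter, not a Simian setting."""
    index = defaultdict(list)
    for name, text in files.items():
        lines = list(normalized_lines(text))
        for start in range(len(lines) - min_lines + 1):
            chunk = lines[start:start + min_lines]
            window = tuple(content for _, content in chunk)
            numbers = tuple(number for number, _ in chunk)
            index[window].append((name, numbers))
    # Keep only windows seen in more than one place (same file or different files).
    return {window: locs for window, locs in index.items() if len(locs) > 1}


def count_duplicate_lines(files, min_lines=3):
    """Count distinct (file, line) positions covered by any duplicated window."""
    covered = set()
    for locations in find_duplicates(files, min_lines).values():
        for name, numbers in locations:
            covered.update((name, number) for number in numbers)
    return len(covered)


sample = {
    "a.py": "x = 1\ny = 2\nz = x + y\nprint(z)\n",
    "b.py": "x = 1\n  y = 2\nz = x + y\nprint(z * 2)\n",
}
print(count_duplicate_lines(sample))  # 6
```

In the sample, the first three lines of both files match once indentation is ignored, so six lines are reported as duplicates. Overlapping windows from a longer copied block are merged by the `covered` set, so each line is counted only once.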
Finally, you could treat the output of Simian as a code quality metric, and even make it a build-breaking one. Some amusing figures:
- From the Simian webpage: “Running against a large source base such as the entire 390,309 LOC […] in 4,242 files of the JDK 1.5.0_13 source, identified 66,375 duplicate LOC in 1,260 files […]”
- When I ran Simian on the sources of the CLR Profiler 2.0, it found 2,928 duplicate lines out of a total of 17,714 lines.
- When I ran Simian on a small part of the sources of the latest version of ReactOS, it found 8,138 duplicate lines out of a total of 55,951 lines.
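The figures above can be turned into the kind of metric mentioned earlier. The sketch below computes the percentage of duplicated lines for each codebase and applies a hypothetical build gate. The function names and the 10% threshold are assumptions made for illustration only, not something Simian provides.

```python
# Figures quoted above: (duplicate lines, total lines).
FIGURES = {
    "JDK 1.5.0_13": (66_375, 390_309),
    "CLR Profiler 2.0": (2_928, 17_714),
    "ReactOS (partial)": (8_138, 55_951),
}


def duplication_percent(duplicate, total):
    """Share of lines that are duplicated, as a percentage."""
    return 100.0 * duplicate / total


def check_build(duplicate, total, max_percent=10.0):
    """Hypothetical gate: pass only if duplication stays at or below max_percent."""
    return duplication_percent(duplicate, total) <= max_percent


for name, (dup, total) in FIGURES.items():
    pct = duplication_percent(dup, total)
    status = "pass" if check_build(dup, total) else "FAIL"
    print(f"{name}: {pct:.1f}% duplicated -> {status}")
```

This prints roughly 17.0%, 16.5% and 14.5% respectively, so all three would fail a 10% gate. In a real build, the failing check would typically make the build step exit with a nonzero status.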
Why don’t you try running Simian on your own source code?