All Your Base Are Belong To Us

Mostly .NET internals and other kinds of gory details

Similarity Analyzer

Simian (Similarity Analyzer) detects duplicate code in almost any human-readable text format.

The primary case made on the product webpage is this: you fix a bug in one place in the program, without knowing that the same buggy code was copy-pasted to another location, where the bug lives on.

But that’s just one of many scenarios where detecting “reuse by copy-paste” can improve the quality of your code.  I can also think of more esoteric but interesting uses, such as plagiarism detection.
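To make the idea of copy-paste detection concrete, here is a minimal sketch of a naive duplicate-block finder. This is not how Simian works internally (I don't know its algorithm); it simply slides a fixed-size window over the non-blank lines of each file and reports any window that shows up more than once. The function name `find_duplicate_blocks` and the `min_lines` parameter are my own inventions for illustration.

```python
from collections import defaultdict


def find_duplicate_blocks(files, min_lines=3):
    """Return {block: [(file_name, start_line), ...]} for repeated blocks.

    `files` maps a file name to its source text. A block is `min_lines`
    consecutive non-blank lines, compared after stripping whitespace.
    This is a naive illustration, not Simian's actual algorithm.
    """
    seen = defaultdict(list)
    for name, text in files.items():
        # Drop blank lines but remember each line's original 1-based number.
        lines = [
            (number, line.strip())
            for number, line in enumerate(text.splitlines(), 1)
            if line.strip()
        ]
        # Slide a window of `min_lines` lines over the file.
        for k in range(len(lines) - min_lines + 1):
            window = lines[k:k + min_lines]
            block = tuple(content for _, content in window)
            seen[block].append((name, window[0][0]))
    # Keep only the blocks that occur in more than one place.
    return {block: locs for block, locs in seen.items() if len(locs) > 1}
```

Fed two files that share the same three-line snippet, it returns that snippet along with the file name and starting line of each copy, which is exactly the information you need when a bug fix has to be applied in more than one place.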

Finally, you could treat the output of Simian as a code quality metric, and even consider making it a build-breaking one.  Some amusing figures:

  • From the Simian webpage: Running against a large source base such as the entire 390,309 LOC […] in 4,242 files of the JDK 1.5.0_13 source, identified 66,375 duplicate LOC in 1,260 files […]
  • When I ran Simian on the sources of the CLR Profiler 2.0, it found 2,928 duplicate lines out of a total of 17,714 lines.
  • When I ran Simian on a small part of the sources of the latest version of ReactOS, it found 8,138 duplicate lines out of a total of 55,951 lines.
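To put the figures above on a common scale, and to show what a build-breaking check might look like, here is a small sketch. The line counts are the ones quoted above; the 10% threshold and the helper names `duplication_ratio` and `build_passes` are assumptions of mine, not anything Simian provides. You would have to feed in the counts yourself from Simian's report.

```python
def duplication_ratio(duplicate_lines, total_lines):
    """Fraction of lines reported as duplicates."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return duplicate_lines / total_lines


def build_passes(duplicate_lines, total_lines, max_ratio=0.10):
    """Hypothetical build gate: pass only if duplication is within max_ratio.

    The 10% default is an arbitrary example threshold.
    """
    return duplication_ratio(duplicate_lines, total_lines) <= max_ratio


# (duplicate lines, total lines) as quoted in the list above.
FIGURES = {
    "JDK 1.5.0_13": (66_375, 390_309),
    "CLR Profiler 2.0": (2_928, 17_714),
    "ReactOS (partial)": (8_138, 55_951),
}

if __name__ == "__main__":
    for name, (dup, total) in FIGURES.items():
        verdict = "pass" if build_passes(dup, total) else "FAIL"
        print(f"{name}: {duplication_ratio(dup, total):.1%} duplicated ({verdict})")
```

That works out to roughly 17.0%, 16.5% and 14.5% respectively, so all three code bases would break a build gated at 10%.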

Why don’t you try running Simian on your own source code?

Comments

ripper234 said:

I've been using TeamCity's duplicate code analyzer for the same purpose. It's not a stand alone product, but comes with the whole TeamCity test server, but if you've already got TeamCity it's really convenient. It also tracks statistics and past trends.

# May 2, 2009 6:43 PM