code analysis – Schneide Blog

Bear up against static code analysis

If you ever had the urge to switch off a rule in your static code analysis tool, this article tries to convince you not to do it. By accepting challenges presented by your tools, you become a better developer and clean up your code on the run.

One of the first things we do when we join a team on a new (or existing) project is to set up a whole barrage of static code analysis tools, like Findbugs, Checkstyle or PMD for java (or any other for virtually every language around). Most of these tools spit out tremendous amounts of numbers and violated rules, totally overwhelming the team. But the amount of violations, (nearly) regardless how high it might be, is not the problem. It’s the trend of the violation curve that shows the problem and its solution. If 2000 findbugs violations didn’t kill your project yet, they most likely won’t do it in the future, too. But if for every week of development there are another 50 violations added to the codebase, it will become a major problem, sooner or later.

Visibility is key

So the first step is always to gain visibility, no matter how painful the numbers are. After the initial shock, most teams accept the challenge and begin to resolve issues in their codebase as soon as they appear and slowly decrease the violation count by spending extra minutes with fixing old code. This is the most valuable phase of static code analysis tools: It enables developers to learn from their mistakes (or goofs) without being embarrassed by a colleague. The analysis tool acts like a very strict and nit-picking code review partner, revealing every flaw in the code. A developer that embraces the changes implied by static analysis tools will greatly accelerate his learning.

But then, after the euphoric initial challenges that improve the code without much hassle, there are some violations that seem hard, if not impossible to solve. The developer already sought out his journey to master the tool, he cannot turn around and just leave these violations in the code. Surely, the tool has flaws itself! The analysis brought up a false positive here! This isn’t faulty code at all, it’s just an overly pedantic algorithm without taste for style that doesn’t see the whole picture! Come to think about it, we have to turn off this rule!

Leave your comfort zone

When this stage is reached, the developers have a deep look into the tool’s configuration and adjust every nut and bolt to their immediate skill level. There’s nothing wrong with this approach if you want to stay on your skill level. But you’ll miss a chance to greatly improve your coding skills by allowing the ruleset to be harder than you can cope with now. Over time, you will come up with solutions you now thought are impossible. It’s like fitness training for your coding skills, you should raise the bar every now and then. Unlike fitness training, nobody gets hurt if the numbers of your code analysis show more violations than you can fix up right now. The violations are in the code, if you let them count or not.

Once, a fellow developer complained really loud about a specific rule in a code analysis tool. He was convinced that the rule was pointless and should be switched off. I asked about a specific example where this rule was violated in his code. When reviewing the code, I thought that applying the rule would improve the code’s internal structure (it was a rule dealing with collapsible conditional statements). In the discussion on how to implement the code block without violating the rule, the real problem showed up – my colleague couldn’t think about a solution to the challenge. So we proceeded to implement the code block in a dozen variations, each without breaking the rule. After the initial few attempts that I had to lead program for him, he suddenly came up with even more solutions. It was as if a switch snapped in his head, from “I’m unable to resolve this stupid rule” to “Hey, if we do it this way, we even can get rid of this local variable”.

Embrace challenges

Don’t trick yourself into thinking that just because your analysis tool doesn’t bring up these esoteric violations anymore after you switched off the rules, they are gone. They are still in your code, just hidden and without your awareness. Bear up against your analysis tool and fix every violation it brings you, one after the other. The tools aren’t there to annoy you, they want to help you stay clear of trouble by pointing out the flaws in a clear and precise manner. Once you meet the challenges the tool presents you with, your skill level will increase automatically. And as a side effect, your code becomes cleaner.

Beyond clean code

Even if every analysis tool approves your code as being clean, it can still be improved. You might have a look at Object Calisthenics or similar code training rulesets. They work the same way as the analysis tools, but without the automatic enforcement (yet). The goal is always cleaner code and higher skilled developers.

Combine cobertura with the awesomeness of crap4j

Want the awesomeness of crap4j without running your tests twice in your build? Just combine it with your cobertura data using crapertura.

You may have heard of crap4j when it was still actively developed. Crap4j is a software metric that points you to “crappy” methods in your projects by combining cyclomatic complexity numbers with test coverage data. The rationale is that overly complex code can only be tamed by rigorous testing or it will quickly reduce to an unmaintainable mess – the feared “rotten code” or “crappy code”, as Alberto Savoia and Bob Evans, the creators of crap4j would put it. The crap4j metric soon became our most important number for every project. It’s highly significant, yet easy to grasp and mandates a healthy coding style.

Some enhancements to crap4j

Crap4j got even better when we developed our own custom enhancements to it, like the CrapMap or the crap4j hudson plugin. We have a tool that formats the crap4j data like cobertura’s report, too.

A minor imperfection

The only thing that always bugged me when using crap4j inside our continuous integration build cycle was that at least half the data was already gathered. Cobertura calculates the code coverage of our tests right before crap4j does the same again. Wouldn’t it be great if the result of the first analysis could be re-used for the crap metric to save effort and time?

Different types of coverage

Soon, I learnt that crap4j uses the “path coverage” to combine it with the complexity of a method. This is perfectly reasonable given that the complexity determines the number of different pathes through the method. Cobertura only determines the “line coverage” and “branch coverage”. As it stands, you can’t use the cobertura data for crap4j because they represent different approaches to measure coverage. That’s still true and probably will be for a long time. But the allurement of the shortcut approach was too high for me to resist. I just tried it out one day to see the real difference.

A different metric

So, here it is, our new metric, heavily inspired by crap4j. I just took the line and branch coverage for every method and multiplied them. If you happen to have a perfect coverage (1.0 on both numbers), it stays perfect. If you only have 75% coverage on both numbers, it will result in a “crapertura coverage” of 56,25%. Then I fed this new coverage data into crap4j and compared the result with the original data. Well, it works on my project.

Presenting crapertura

Encouraged by this result, I wrote a complete ant task that acts similar to the original crap4j ant task. You can nearly use it as a drop-in replacement, given that the cobertura XML report file is already present. Here is an example ant call:


<crapertura
coberturaReportFile="/path/to/cobertura/coverage.xml"
targetDirectory="/where/to/place/the/crap4j/report"
classesDirectory="/your/unarchived/project/class/files"
/>

It will output the usual crap4j report files to the given target directory. Please note that even if it looks like crap4j data, it’s a different metric and should be treated as such. Therefore, online comparison of numbers is disabled.

The whole project is published on github. Feel free to browse the code and compile it for yourself. If you want a binary release, you might grab the latest jar from our download server.

The complete usage guide can be found on the github page or inside the project. If you have questions or issues, please use the comment section here.

Conclusion

If crapertura is able to give you nearly the numbers that crap4j gave you is up to your project, really. Our test project contained over 20k methods, but very little crap. The difference between crap4j and crapertura was negligible. Both metrics basically identified the same methods as being crappy. Your mileage may vary, though. If that’s the case, let us know. If your experience is like ours, you’ve just saved some time in your build cycle without sacrificing quality.

A guide through the swamp – The CrapMap

Locate your crappy methods quickly with this treemap visualization tool for Crap4J report data.

One of the most useful metrics to us in the Softwareschneiderei is “CRAP”. For java, it is calculated by the Crap4J tool and provided as an HTML report. The report gives you a rough idea whats going on in your project, but to really know what’s up, you need to look closer.

A closer look on crap

The Crap4J tool spits out lots of numbers, especially for larger projects. But from these numbers, you can’t easily tell some important questions like:

Are there regions (packages, classes) with lots more crap than others?
What are those regions?

So we thought about the problem and found it to be solvable by data visualization.

Enter CrapMap

If you need to use advanced data visualization techniques, there is a very helpful project called prefuse (which has a successor named flare for web applications). It provides an exhaustive API to visualize nearly everything the way you want to. We wanted our crap statistics drawn in a treemap. A treemap is a bunch of boxes, crammed together by a clever layouting strategy, each one representing data, for example by its size or color.

The CrapMap is a treemap where every box represents a method. The size gives you a hint of the method’s complexity, the color indicates its crappyness. Method boxes reside inside their classes’ boxes which reside in package boxes. That way, the treemap represents your code structure.

A picture worth a thousand numbers

This is a screenshot of the CrapMap in action. You see a medium sized project with few crap methods (less than one percent). Each red rectangle is a crappy method, each green one is an acceptable method regarding its complexity.

Adding interaction

You can quickly identify your biggest problem (in terms of complexity) by selecting it with your mouse. All necessary data about this method is shown in the bottom section of the window. The overall data of the project is shown in the top section.

If you want to answer some more obscure questions about your methods, try the search box in the lower right corner. The CrapMap comes with a search engine using your methods’ names.

Using CrapMap on your project

CrapMap is a java swing application, meant for desktop usage. To visualize your own project, you need the report.xml data file of it from Crap4J. Start the CrapMap application and load the report.xml using the “open file” dialog that shows up. That’s all.

In the near future, CrapMap will be hosted on dev.java.net (crapmap.dev.java.net). Right now, it’s only available as a binary executable from our download server (1MB download size). When you unzip the archive, double-click the crapmap.jar to start the application. CrapMap requires Java6 to be installed.

Show your project

We would be pleased to see your CrapMap. Make a screenshot, upload it and leave a comment containing the link to the image.

Easy code inspection using QDox

Spend five minutes and inspect your code for the aspect you always wanted to know using the QDox project.

So, you’ve inspected your Java code in any possible way, using Findbugs, Checkstyle, PMD, Crap4J and many other tools. You know every number by heart and keep a sharp eye on its trend. But what about some simple questions you might ask yourself about your project, like:

How many instance variables aren’t final?
Are there any setXYZ()-methods without any parameter?
Which classes have more than one constructor?

Each of this question isn’t of much relevance to the project, but its answer might be crucial in one specific situation.

Using QDox for throw-away tooling

QDox is a fine little project making steady progress in being a very intuitive Java code structure inspection API. It’s got a footprint of just one JAR (less than 200k) you need to add to your project and one class you need to remember as a starting point. Everything else can be learnt on the fly, using the code completion feature of your favorite IDE.

Let’s answer the first question of our list by printing out all the names of all instance variables that aren’t final. I’m assuming you call this class in your project’s root directory.

public class NonFinalFinder {
    public static void main(String[] args) {
         File sourceFolder = new File(".");
         JavaDocBuilder parser = new JavaDocBuilder();
         builder.addSourceTree(sourceFolder);
         JavaClass[] javaClasses = parser.getClasses();
         for (JavaClass javaClass : javaClasses) {
             JavaField[] fields = javaClass.getFields();
             for (JavaField javaField : fields) {
                 if (!javaField.isFinal()) {
                     System.out.println("Field "
                       + javaField.getName()
                       + " of class "
                       + javaClass.getFullyQualifiedName()
                       + " is not final.");
                }
            }
        }
    }
}

The QDox parser is called JavaDocBuilder for historical reasons. It takes a directory through addSourceTree() and parses all the java files it finds in there recursively. That’s all you need to program to gain access to your code structure.

In our example, we descend into the code hierarchy using the parser.getClasses() method. From the JavaClass objects, we retrieve their JavaFields and ask each one if it’s final, printing out its name otherwise.

Praising QDox

The code needed to answer our example question is seven lines in essence. Once you navigate through your code structure, the QDox API is self-explanatory. You only need to remember the first two lines of code to get started.

The QDox project had a long quiet period in the past while making the jump to the Java 5 language spec. Today, it’s a very active project published under the Apache 2.0 license. The developers add features nearly every day, making it a perfect choice for your next five-minute throw-away tool.

What’s your tool idea?

Tell me about your code specific aspect you always wanted to know. What would an implementation using QDox look like?

	Impressions of Our C… on Computing gets fuzzy again (AI…
	Computing gets fuzzy… on Impressions of Our Current AI…
	mariuselvert on Calculating the Number of Segm…
	Anonymous on Calculating the Number of Segm…
	Anonymous on Avoiding Code Style Discu…