Depth-first programmers

Depth-first programmers are always busy creating horribly complicated solutions that are somehow off the mark. Here’s why and what to do about it.

Just as there are at least two fundamentally different approaches for searching, namely depth-first and breadth-first search, there are also different types of programmers. The depth-first programmer is a dangerous type, as he is prone to yak shaving and reinvention of the wheel.

The depth-first programmer

Let me try to define the term “depth-first programmer” with a little (true) story. A novice Java programmer was asked to make some changes to existing code. To secure his work, he was supposed to, and wanted to, write unit tests in JUnit. He started the work, and soon enough the first results could be seen. But when he started to write his tests, the progress notifications stopped. The programmer worked frantically for hours and then days to write what appeared to be some simple data-driven tests.

Finally, the novice Java programmer reported success and showed his results. He had written his tests and “had to extend JUnit a bit to do it right”. Wait, what? Well, in JUnit, the test methods cannot have parameters, but the programmer’s tests needed to be parameterized. So he replaced the part of JUnit that calls the test methods by reflection with an “improved” algorithm that could also inject parameters. His implementation relied on obscure data structures that provided the actual parameter values and only really worked for his needs. The whole mess was nearly incomprehensible, badly bloated and had consumed most of the development time for the unit tests.

And it was totally unnecessary once you learn about “Parameterized” JUnit4 tests or build light-weight data drivers instead of changing the signature of the test method itself. But this programmer dove deep into JUnit to adjust the framework itself to his needs. When asked about it, he stated that “he needed to pass the parameters somehow”. That’s right, but he chose the most expensive way to do so.
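To make the “light-weight data driver” idea concrete, here is a minimal sketch in plain Java. The original tests are not shown in this story, so PriceCalculator, its method and the data rows are purely hypothetical stand-ins. The point is only that the test method keeps its parameterless signature and loops over a table of inputs and expected results.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical code under test; it is not taken from the original project.
final class PriceCalculator {
    static int totalInCents(int unitPriceInCents, int quantity) {
        return unitPriceInCents * quantity;
    }
}

// A light-weight data driver: no framework changes, just a table and a loop.
final class PriceCalculatorDataDrivenCheck {
    // Each row holds: unit price, quantity, expected total.
    private static final int[][] CASES = {
        {100, 1, 100},
        {250, 4, 1000},
        {999, 0, 0},
    };

    // In JUnit, this would be the body of an ordinary test method without
    // parameters that fails if the returned list is not empty.
    static List<String> runAllCases() {
        List<String> failures = new ArrayList<>();
        for (int[] row : CASES) {
            int actual = PriceCalculator.totalInCents(row[0], row[1]);
            if (actual != row[2]) {
                failures.add("price=" + row[0] + ", quantity=" + row[1]
                        + ": expected " + row[2] + " but was " + actual);
            }
        }
        return failures;
    }
}
```

Collecting all failures before reporting is a design choice of this sketch: one broken row does not hide the others.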

He exhibited the general behaviour of a depth-first programmer: whenever you face a problem, take the first possible solution to a problem you can come up with and work on it without evaluation against other possibilities. And continue on the path without looking back, no matter how long it takes.

Stuck in activism

The problem with this approach should be common sense. The obvious option isn’t always the best or even a good one. You have to evaluate the different possible solutions by their advantages and drawbacks. A less obvious solution might be far better in every aspect but obviousness. Another problem with this approach is the absence of internal warning signs.

Getting stuck is an internal warning sign every programmer understands. You’ve worked your way in a certain direction and suddenly, you cannot advance further. Something blocks your anticipated way of solving the problem and you cannot think of an acceptable way past it. A depth-first programmer never gets stuck this way. No matter how expensive, he will pursue the first thing that brings him closer to the target. A depth-first programmer always churns out code at full speed. Most of it isn’t needed on second thought and can be plain harmful when left in the project. The depth-first programmer will always report progress even when he needs days for a task of minutes. He is stuck in activism.

Progress without guidance

This isn’t a rant about incompetent programmers. Every good programmer knows the situation when you suddenly realize that you’re shaving a yak when all you wanted to do is to add a feature to the code base. This is your self-guidance system regaining consciousness after a period of auto-piloting in depth-first mode. Every programmer behaves depth-first sometimes.

This can be explained with the Dreyfus Model of Skill Acquisition. At the first stage, called “Novice”, you are simply not capable of proper self-evaluation. You cannot distinguish between good and not so good approaches beforehand or even afterwards. Your expertise in the narrow field of the problem at hand isn’t broad enough to recognize an error even when you have been working on the error yourself for a prolonged time.

In the Dreyfus Model, a beginner needs external guidance. Somebody with more experience has to point out errors for you and formulate alternatives as clearly and specifically as possible. Without external guidance, a beginner will become a depth-first programmer. We’ve all been there.

Be a guide

The real failure in the story above was mine. Instead of interacting with the novice Java programmer after a few hours, when I thought he should be done by now, I let him “advance”. I could have avoided the resulting mess by providing guidance and a few alternative solutions for the immediate problem. I should have given an overview of the problem’s context and some hints about the general direction in which this task should be solved.

Every depth-first programmer works in a suboptimal environment. The programmer tries his best; it’s really the environment that could do better.

So, the next time you see somebody working frantically on a problem that should be rather easy to solve, lend him a hand. Be gentle and empathetic about his attempt and work with proposals, not with instructions. You might spare yourself a mess like an unnecessarily extended JUnit library, and spare the depth-first programmer the frustration of seeing several days of hard work silently discarded.

A review of the year 2011 at Softwareschneiderei

This is a review of the year 2011 for the Softwareschneiderei, a software development company from Karlsruhe, Germany.

The current year 2011 is coming to an end. This is the traditional time to pause and reflect on what has happened. This blogpost tries to sum up our year in software development at Softwareschneiderei. It was an interesting, entertaining and successful year for us, that’s for sure.

The official parts

Our developer blog was alive throughout the hardest times of the year, when everyone was under full project load. Every week, one of our developers shares a little post with the world. The blog is still managed by token only. Looking at the visitor statistics, we fully appreciate your attention. The first blog post of this year looked at the remains of a failed project. “A tale of scrap metal code” was a detailed vivisection in three parts. Over the course of the year, we wrote about bogus error messages, Groovy, Grails, GORM and some confessions about coding style and multithreading. If you have the time, spend a few minutes to browse our blog post archive for this year.

Our “official” company blog, written in German, had no activity this year. We can certainly do better than this and will put it on the list for 2012.

The company homepage, written in German, had continuous updates and extensions this year. We coupled the “Open Source Love Day” (OSLD) with our “Homepage Committee”, where every employee has to improve the homepage in some aspect and present the change to the “committee”. Unfortunately, this somehow led to fewer OSLDs this year.

The ongoing Dev Brunch sessions thinned out a bit in the second half of the year. This was a concession to the ever-growing workload. We strive to establish a tighter schedule with accompanying blog posts in the next year.

The internal parts

We were under heavy development load this year. This isn’t a bad thing, but it impacts internal communication and the team building process. We tried to cope a bit by restructuring the Open Source Love Day into a “Team Day”, when the whole team meets and works on various internal or hobby projects. Some of these days were spent on Code Camps and other training events. This means less love for the open source community, but crucial together time for us.

We picked up several “new” programming languages this year. You can tell by the blog posts that we worked with Python, Ruby, Flex/ActionScript and even VisualBasic on real projects. The VisualBasic experience was a little epiphany that it’s really the developer and not the language that leads to shitty software.

One method of continuous improvement is our “creativity budget”. Basically, it’s money every developer can spend to improve his workplace. This budget wasn’t used at all this year, as the workplaces seem to be optimal. This cannot be true, so we bought brand new computers or a big RAM upgrade for everyone. And every computer has a big enough SSD now. We took our own advice seriously and invested in our productivity.

Our developer crew grew again this year. We are beginning to think about the remaining space in the new office again. But as usual, we grow slowly and deliberately. There’s nothing worse than a team of strangers.

Conclusion

The year 2011 was great! We’re looking forward to the year 2012, with our motto of Christmas 2011: “cheery and spry” (the original motto is the German “froh und munter”; I hope the translation caught the original spirit).

Have a great turn of the year, everyone. We’d love to see you again next year.

The Story of a Multithreading Sin

The story of a bug that was caused by a common multithreading pitfall, the dreaded liquid lock.

In my last blog entry, I wrote about multithreading pitfalls (in Java), and ironically, this was the week when we got a strange bug report from one of our customers. This blog entry tells the story of the bug and adds another multithreading pitfall to the five I’ve already listed in my blog entry “When it comes to multithreading, better be safe than sorry”.

The premise

We developed software that runs on several geographically distant, independent “stations” that collect a multitude of environmental measurement data. This data is preprocessed and stuffed into data packages, which are periodically transferred to a control center. The software of this control center, also developed by us, receives the data packages, stores them on disk and in a huge database and extracts the overall state of the measurement network from the raw data. If you describe the main task of the network on this level, it sounds nearly trivial. But the real functionality requirements are manifold and the project grew large.

We kept the whole system as modular as necessary to maintain an overall grasp of what is going on where in the system and installed a sufficient automatic test coverage for the most important parts. The system is still under active development, but the main parts of the network are in production usage without real changes for years now.

The symptoms

This might explain why we were very surprised when our customer told us that the control center had lost some data packages. Very soon, it turned out that the control center would randomly enter a state of “denial”. In this state, it would still accept data packages from the stations and even acknowledge their arrival (so the stations wouldn’t retry the transmission), but only write parts of the package or nothing at all to the disk and database. When the control center entered this state, it would never recover from it. But when we restarted the software manually, everything would run perfectly fine for several days and then revert to denial without apparent trigger.

We monitored the control center with every means at our disposal, but its memory consumption, CPU footprint and threading behaviour showed no noticeable problems even when the instance was in its degraded state. There was no exception or unusual entry in the log files. As the symptom happened randomly, without external cause and with no chance of reversal once it happened, we soon suspected some kind of threading issue.

The bug

The problem with a threading issue is that you can’t just reproduce the bug with a unit or system test. We performed several code reviews until we finally had a trace. When a data package arrives, a global data processing lock is acquired (so that no two data packages can be processed in parallel) and the content of the package is inspected. This might trigger several network status changes. These change events are propagated through the system with classic observer/listener structures, using synchronous calls (normal delegation). The overall status of the network is translated into a human-readable status message and again forwarded to a group of status message listeners. This is a synchronous call again. One of the status message listeners was the software driver for an LED ticker display. This module was a recent addition to the control center’s hardware outfit and used to display the status message prominently to the operators. Inside this LED software driver, some bytes are written to a socket stream and then the driver awaits an answer from the hardware device. To avoid the situation that two messages are sent to the device at the same time, a lock is acquired just before the message is sent. This code attracted our attention. Let’s have a look at it:

private Message lastMessage = new Message();

public void show(Message message) {
    synchronized (this.lastMessage) {
        writeCommandAndWaitForResponse(Command.SHOW_TEXT, message.asBytes());
        this.lastMessage = message;
    }
}

The main problem here is the object the lock is acquired upon: the reference of lastMessage is mutable! We call this a liquid lock, because the lock isn’t as solid as it should be. It’s one of the more hideous multithreading pitfalls as it looks like everything’s fine at first glance. But this lock doesn’t have a complete “locking” effect because each caller may acquire the lock of a different instance. And a lock with a flawed locking behaviour is guaranteed to fail (in production). The liquid lock is like the bigger brother of the local lock. It isn’t local, but its mutability causes the same problems.

The bug finally turned out to be caused by the liquid lock in the LED display driver, which was notified of system message changes whenever a data package arrived. The system failed to return from the write attempt only in two circumstances: when multiple messages were sent to the device at once, so that some of the necessary answers were discarded, or when the connection to the LED hardware failed in the midst of a transmission. If one thread didn’t return to the data package processor, the global data processing lock was never released (read the start of this chapter again, this is the most important lock in the system!). And while the data processing lock was still held, all other data packages were received, but their threads piled up waiting to obtain the lock. But the lock was never released by the thread waiting for an answer from a hardware device that had no intention of sending another answer. This was when the control center appeared to be healthy but didn’t process any data packages anymore.

The conclusion

If you want to avoid the category of liquid lock multithreading bugs, make sure that all your lock instance references are immutable. Being final is an important property of lock instance references. Avoid retrieving your locks from notoriously mutable data structures like collections or arrays. The best thing you can do to avoid liquid locks is to “freeze” all your lock instances.
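A “frozen” lock for the driver from this story might look roughly like the following sketch. It is not the fix we actually shipped: LedDisplayDriver is a hypothetical stand-in, the message is simplified to a String and the device communication is replaced by a counter. What matters is that the lock reference is final and separate from the mutable lastMessage field.

```java
// Hypothetical stand-in for the LED driver; the real device I/O is omitted.
final class LedDisplayDriver {
    // The lock reference is final and never handed out, so every caller
    // synchronizes on the same object for the driver's whole lifetime.
    private final Object sendLock = new Object();

    private String lastMessage = "";
    private int sentCount = 0;

    void show(String message) {
        synchronized (sendLock) {
            // Placeholder for writeCommandAndWaitForResponse(...).
            sentCount++;
            // Reassigning this field no longer changes which lock is used.
            lastMessage = message;
        }
    }

    String lastMessage() {
        synchronized (sendLock) {
            return lastMessage;
        }
    }

    int sentCount() {
        synchronized (sendLock) {
            return sentCount;
        }
    }
}
```

Note that this only solidifies the lock. A device that never answers would still block the calling thread, which is a separate problem.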

Another insight from this story is that software modules have to be separated threadwise, too. It was a major design flaw to let the data processing thread, while holding the main processing lock, descend into the depths of the LED driver, eventually getting stuck there forever. Some simple mechanisms like asynchronous listener notification or producer/consumer queues for pending transmission requests would have helped to confine the effects of the liquid lock bug inside the LED module. Without proper thread separation, it took down the whole software instance.
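Asynchronous listener notification could be sketched like this, assuming Java 8 or later. AsyncStatusNotifier and its method names are invented for illustration. A single worker thread with its own queue delivers the status messages, so the data processing thread only enqueues a message and returns while still holding its lock.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical notifier that decouples the publisher from a slow listener.
final class AsyncStatusNotifier {
    // One worker thread delivers messages in the order they were published.
    // Simplification: the executor's internal queue is unbounded.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> listener;

    AsyncStatusNotifier(Consumer<String> listener) {
        this.listener = listener;
    }

    // Called by the data processing thread; does not wait for the listener.
    void publish(String statusMessage) {
        worker.execute(() -> listener.accept(statusMessage));
    }

    // Stops accepting new messages and waits for the pending ones.
    // Returns false if delivery did not finish within the timeout.
    boolean shutdownAndAwait(long timeoutMillis) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this separation, a listener that hangs only stalls its own worker thread. Messages would still pile up in the queue, so a production version would probably want a bounded queue and a policy for dropping outdated status messages.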

When it comes to multithreading, better be safe than sorry

Writing multithreaded applications in Java is hard. Here are five problems and how to avoid them without much effort (mostly).

Recently, I attended a code review of the core parts of a web application, written in Java. The application is used by a large customer base and occasionally, there are error reports and exceptions in the log files. Some of these exceptions are the dreaded ConcurrentModificationExceptions, indicating conflicting read/write access on an unsynchronized collection data structure. In the code review, we found several threading flaws, but only after an exhaustive reading of the whole module. Here, I want to present the flaws and give some advice on how to avoid them:

The public lock

In some parts of the code, methods were defined as synchronized through the method declaration keyword:

public synchronized String getLastReservation() { [...]

While there is nothing wrong with this approach in itself, it can be highly dangerous in combination with synchronized blocks. The code above effectively wraps a synchronized block using the object instance (this) as a lock. No information about an object is more publicly visible than the object reference (this), so you have to check whether any direct or indirect clients of this object synchronize on this instance, too. If they do, you have chained two code blocks together, probably without properly documenting this fact. The least harmful defect will be performance losses because your code isn’t locked as fine grained as it could be.

The easiest way to avoid these situations is to always hide the locks. Try not to share one object’s locks with other objects. If you choose publicly accessible locks, you can never be sure about that.
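The difference between a public and a hidden lock can be made visible with Thread.holdsLock, as in the following small sketch. Both classes are hypothetical and only exist for this demonstration.

```java
// A synchronized method uses the instance itself as the lock, so any
// client holding a reference can synchronize on the very same monitor.
final class PublicLockExample {
    synchronized boolean holdsOwnMonitorInside() {
        return Thread.holdsLock(this);
    }
}

// A private final lock object is invisible to clients. The monitor of
// the instance itself stays untouched.
final class HiddenLockExample {
    private final Object lock = new Object();

    boolean holdsOwnMonitorInside() {
        synchronized (lock) {
            return Thread.holdsLock(this);
        }
    }
}
```

Called from a thread that doesn’t already synchronize on the instance, the first method reports that it holds the instance’s monitor and the second reports that it doesn’t. A client that synchronizes on a HiddenLockExample instance therefore cannot get chained to its internal locking by accident.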

The subtle lock change

In one class, there were both instance and class (static) methods, using the synchronized keyword:

public synchronized String getOrderNumberOf(String customerID) { [...]
public static synchronized int getTotalPendingOrders() { [...]

And while they were both accessing the same collection data structure (a static hashmap), they were using different locks. The lock of the instance method is the instance itself, while the lock of the static method is the class object of the type. This is very dangerous, as it can be easily missed when writing or altering the code.

The best way to prevent this problem is to avoid the synchronized modifier for methods completely. State your locks explicitly, all the time.
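Stated explicitly, the two methods could share one lock like in this sketch. OrderRegistry, the register method and the map layout are assumptions, as the reviewed class isn’t shown here in full.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class mirroring the two method signatures from above.
final class OrderRegistry {
    // One explicit, private, final lock guards the static map, no matter
    // whether it is accessed from an instance method or a static method.
    private static final Object ORDERS_LOCK = new Object();
    private static final Map<String, String> ORDER_NUMBER_BY_CUSTOMER = new HashMap<>();

    // Assumed helper to fill the map; not part of the reviewed code.
    void register(String customerID, String orderNumber) {
        synchronized (ORDERS_LOCK) {
            ORDER_NUMBER_BY_CUSTOMER.put(customerID, orderNumber);
        }
    }

    String getOrderNumberOf(String customerID) {
        synchronized (ORDERS_LOCK) {
            return ORDER_NUMBER_BY_CUSTOMER.get(customerID);
        }
    }

    static int getTotalPendingOrders() {
        synchronized (ORDERS_LOCK) {
            return ORDER_NUMBER_BY_CUSTOMER.size();
        }
    }
}
```

Because the lock has a name and appears in every method body, a reader can verify at a glance that all accesses to the map use the same one.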

Partial locking

In a few classes, collection datatypes like lists were indeed synchronized by internal synchronized-blocks in the methods, using the private collection instance as lock. The synchronized blocks were applied to the altering methods like putX(), removeX() and getX(). But the toString() method, building a comma-separated list of the textual list entries, wasn’t synchronized to the list. The method contained the following code:

public String toString() {
    StringBuilder result = new StringBuilder();
    for (String entry : this.list) {
        result.append(entry);
        result.append(",");
    }
    [...]
    return result.toString();
}

I’ve left out some details and special cases, as they aren’t relevant here. The problem with the foreach loop is that an anonymous Iterator over the list is used. It checks the list for structural changes on every step and throws a ConcurrentModificationException as soon as it notices that one of the properly synchronized sections has changed the list. The toString() method was used to store the list to a session dependent data storage. Every once in a while, the foreach loop threw an exception and failed to properly persist the list data, resulting in data loss.

The most straight-forward solution to this problem might be to add the missing synchronization block in the toString() method. If you don’t want to block the user session while writing to disk, you might traverse the list without an Iterator (and be careful with your assumptions about valid indices) or work on a copy of the list, given that an in-memory copy of the list would be cheap. In an ACID system scenario, you should probably choose to complete your synchronized block guards.
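The “work on a copy” variant might look like the following sketch, assuming Java 8 for String.join. EntryList is a made-up name and the special cases left out above are still left out. The lock is only held while the snapshot is taken, so the potentially slow part works on data that no other thread can modify.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list wrapper using the private list instance as its lock,
// like the reviewed classes did.
final class EntryList {
    private final List<String> list = new ArrayList<>();

    void add(String entry) {
        synchronized (list) {
            list.add(entry);
        }
    }

    @Override
    public String toString() {
        List<String> snapshot;
        // Hold the lock only long enough to copy the entries.
        synchronized (list) {
            snapshot = new ArrayList<>(list);
        }
        // The snapshot is local to this call, so iterating it is safe.
        return String.join(",", snapshot);
    }
}
```

The price is one in-memory copy per call, which is only acceptable if the list is reasonably small.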

Locking loophole

Another problem was a collection that was synchronized internally, but could be accessed through a getter method. No client could safely modify or traverse the collection, because they had the collection, but not the lock object (that happened to be the collection, too, but who can really be sure about that in the future?). It would be ridiculous to also provide a getter for the lock object (always hide your locks, remember?), the better solution is to refactor the client code to a “tell, don’t ask” style.

To prevent a scenario where a client can access a data structure but not its lock, clients shouldn’t gain access to the data structure at all, but pass “command objects” to it instead. This is a perfect use case for closures. Effectively, you’ll end up with something like Function or Operation instances that are applied to every element of the collection within a synchronized block and perform your functionality on them. Have a look at op4j for inspirational syntax.
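A minimal “tell, don’t ask” sketch could look like this, using java.util.function.Consumer from Java 8 as the command object type. GuardedEntries and forEachEntry are names chosen for this example only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical collection owner that never hands out its list.
final class GuardedEntries {
    private final Object lock = new Object();
    private final List<String> entries = new ArrayList<>();

    void add(String entry) {
        synchronized (lock) {
            entries.add(entry);
        }
    }

    // Clients pass in what they want done. The traversal itself happens
    // here, inside the synchronized block.
    void forEachEntry(Consumer<String> operation) {
        synchronized (lock) {
            for (String entry : entries) {
                operation.accept(entry);
            }
        }
    }
}
```

The operation runs while the lock is held, so it should be short and must not call add() on the same instance, as that would modify the list in the middle of the traversal.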

Local locking

This was the worst of all problems and the final reason for this blog entry: In some methods, the lock objects were local variables. In summary, these methods looked like this:

public String getData() {
    Object lock = new Object();
    synchronized (lock) {
        [...]
    }
}

Of course, it wasn’t that obvious. The lock objects were propagated to other methods, stored in data structures, removed from them, etc. But in the end, each caller of the method got his own lock and could henceforth wreak havoc in code that appeared very well synchronized at first glance. The error in its clarity is too stupid to be widespread. The problem was the obfuscation around it. It took us some time to really understand what was going on and where all those lock objects really came from.

My final advice is: If you have to deal with multithreading, don’t outsmart yourself and the next fellow programmer by building complex code structures or implicit relationships. Be as concise and explicit as you can be. Less clutter is more when dealing with threads. The core problem is the all-or-none law of thread synchronization: Either you’ve got it all right or you’ve got it all wrong – you just don’t know yet.

Hide your locks, name your locks explicitly, reduce the scope of necessary locking so that you can survey it easily, never hand out your locked data, and, most important, remove all clutter around your locking structures. This might make the difference between “just works” and endless ominous bug reports.

Separate your code domains

You can improve your code reusability by separating the technical domain code from the business domain code. This article tries to explain how to start.

When you develop software, you most likely have to think in two target domains at the same time. One domain will be the world of your stakeholder. He might talk about business rules and business processes and business everything, so let’s call it the business domain. The other domain is the world you own exclusively with your colleagues: the world of computers, programming languages and coding standards. Let’s call it the technical domain. It’s the world where your stakeholders will never follow you.

Mixing the domains

Whenever you create source code, you probably try to solve problems in the business domain with the means of your technical domain – e.g. the programming language you’ve chosen on the hardware platform you anticipate the software to run on. Inevitably, you’ll mix parts of the business domain with parts of the technical domain. The main question is – will it blend? Most of the time, the answer is yes. Like milk in coffee, the parts of two domains will blend into an inseparable mixture. Which isn’t necessarily a bad thing – your solution works just fine.

The hard part comes when you want to reuse your code. It’s like reusing the milk in your coffee, but without the coffee. You’ve probably done it, too (reusing domain-blended code, not extracting the milk from your coffee) and it wasn’t the easy “just copy it over here and everything’s fine” reusability you’ve dreamt of.

Separating the domains

One solution for this task begins by realizing which code belongs to which domain. There isn’t a clear set of rules that you can just check and be sure, but we’ve found two rules of thumb helpful for this decision:

  • If you have a strong business domain data type model in your code (that is, you’ve modelled many classes to directly represent concepts and items from your stakeholder’s world), you can look at a line of code and scan for words from the business domain. If there aren’t any, chances are that you’ve found a line belonging to the technical domain. If you prefer to model your data structures with lists and hashmaps containing strings and integers, you’re mostly out of luck here. Hopefully, you’ve chosen explicit names for your variables, so you don’t end with a line stating map.get(key), when in fact, you’re looking up orders.getFor(orderNumber).
  • For every line of code, you can ask yourself “do I want to write it or do I have to write it?”. This question assumes that you really want to solve the problems of the business domain. Every line of code you just have to write because otherwise, the compiler, the QA department, your colleagues or, ultimately, your coder idol of choice would be disappointed is a line from the technical domain. Every line of code that would only disappoint your stakeholder if it were missing is a line from the business domain. Most likely, everything that your business-driven tests assert is code from the business domain.
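To illustrate the first rule of thumb, here is a small sketch of a business domain type that wraps a map. Order and Orders are hypothetical, only the call orders.getFor(orderNumber) is taken from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical business domain types; field names are assumptions.
final class Order {
    final String orderNumber;
    final String customer;

    Order(String orderNumber, String customer) {
        this.orderNumber = orderNumber;
        this.customer = customer;
    }
}

final class Orders {
    // Technical domain: how the orders are stored.
    private final Map<String, Order> byNumber = new HashMap<>();

    void add(Order order) {
        byNumber.put(order.orderNumber, order);
    }

    // Business domain vocabulary on the outside, map.get(key) on the inside.
    Order getFor(String orderNumber) {
        return byNumber.get(orderNumber);
    }
}
```

A line like orders.getFor(orderNumber) is easy to spot as business domain code, while the map lookup stays inside the one class that needs to know about it.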

Once you have categorized your lines of code by their associated domain, you can increase the reusability of your code by separating these lines of code. Effectively, you try to avoid the blending of the parts, much like in a good latte macchiato. If you achieve a clear separation of the different code parts, chances are that you have come a long way to the anticipated “copy and paste” reusability.

Example one: Local separation

Well, all theory is nice and shiny, but what about the real (coding) life? Here are two examples that show the mechanics of the separation process.

In the first example, we’re given a compressed zip file archive as an InputStream. Our task is to write the archive entries to disk, given that certain rules apply:

public void extractEntriesFrom(InputStream in) {
    ZipInputStream zipStream = new ZipInputStream(in);
    try {
         ZipEntry entry = null;
         while ((entry = zipStream.getNextEntry()) != null) {
             if (rulesApplyFor(entry)) {
                 File newFile = new File(entry.getName());
                 writeEntry(zipStream,
                      getOutputStream(basePath(), newFile));
             }
             zipStream.closeEntry();
         }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        IOHandler.close(zipStream);
    }
}

This is fairly common code, nothing to be proud of (we can argue that the method signature isn’t as explicit as it should be, the exceptions are poorly handled, etc.), but that’s not the point of this example. Try to focus your attention on the domain of each code line. Is it from the business or the technical domain? Let me refactor the example to a form where the code from both domains is separated, without changing the additional flaws of the code:

public void extractEntriesFrom(InputStream in) {
    ZipInputStream zipStream = new ZipInputStream(in);
    try {
         ZipEntry entry = null;
         while ((entry = zipStream.getNextEntry()) != null) {
             handleEntry(entry, zipStream);
             zipStream.closeEntry();
         }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        IOHandler.close(zipStream);
    }
}

protected void handleEntry(ZipEntry entry,
        ZipInputStream zipStream) throws IOException {
    if (rulesApplyFor(entry)) {
        File newFile = new File(entry.getName());
        writeEntry(zipStream,
            getOutputStream(basePath(), newFile));
    }
}

In this version of the same code, the method extractEntriesFrom(…) doesn’t know anything about rules or how to write an entry to the disk. Everything that’s left in the method is part of the technical domain – code you have to write in order to perform something useful within the business domain. The new method handleEntry(…) is nearly free of technical domain stuff. Every line in this method depends on the specific use case, given by your business domain.

Example two: Full separation

Technically, the first example only consisted of a simple refactoring (Extract Method). But by separating the code domains, we’ve done the first step of a journey towards code reusability. It begins with a simple refactoring and ends with separated classes in separated packages from two separated project parts, named something like “application” and “framework”. Even if you only find a class named “Tools” or “Utils” in your project, you’ve done intermediate steps towards the goal: Separating your technical domain code from your business domain code in order to reuse the former (because no two businesses are alike).

The next example shows a full separation in action:

WriteTo.file(target).using(new Writing() {
    @Override
    public void writeTo(PrintWriter writer) {
        writer.println("Hello world!");
        writer.println("Hello second line.");
        // more business domain code here
    }
});

Everything other than the first line (and the necessary Java boilerplate) is business domain code. In the first line, only the specified target file isn’t technical. Everything related to opening the file output stream, handling exceptions, closing all resources and all the other fancy stuff you want to do when writing to a file is encapsulated in the WriteTo class. The equivalent to the handleEntry(…) method from the first example is the writeTo(…) method of the Writing interface. Everything within this method is purely business domain related code. The best thing is: you can nearly forget about the technical domain when filling out the method, as it is embedded in a reusable “code clamp” providing the proper context.
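The WriteTo class itself isn’t shown in this article. One possible shape of such a “code clamp” is sketched below, assuming Java 8. The UTF-8 encoding and the translation of I/O errors into UncheckedIOException are choices of this sketch, not a description of our actual implementation.

```java
import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;

// The business domain side: what should be written.
interface Writing {
    void writeTo(PrintWriter writer);
}

// The technical domain side: opening, error handling and closing.
final class WriteTo {
    private final File target;

    private WriteTo(File target) {
        this.target = target;
    }

    static WriteTo file(File target) {
        return new WriteTo(target);
    }

    void using(Writing writing) {
        // try-with-resources closes the writer in every case.
        try (PrintWriter writer = new PrintWriter(target, "UTF-8")) {
            writing.writeTo(writer);
            // PrintWriter swallows I/O errors, so ask for them explicitly.
            if (writer.checkError()) {
                throw new UncheckedIOException(
                        new IOException("Could not write to " + target));
            }
        } catch (IOException e) {
            // Thrown when the target file cannot be opened.
            throw new UncheckedIOException(e);
        }
    }
}
```

Since Writing has a single method, callers on Java 8 could also pass a lambda such as writer -> writer.println("Hello world!") instead of the anonymous class.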

Conclusion

If you want to write reusable code, consider separating your two major code domains: the technical domain and the business domain. The first step is to be aware of the domains and distinguish between them. Your separation process then can start with simple extractions and finally lead to a purely technical framework where you “only” have to fill in the business domain code. By the way, it’s a variation of the classic “separation of concerns” principle, if you want to read more.

A VisualBasic.NET cheat sheet for Java developers

If you want to learn VisualBasic.NET coming from a Java perspective, we’ve prepared a little cheat sheet to ease the transition.

Sometimes, we cannot choose what language to implement a project in. Be it because of environmental restrictions (everything else is programmed in language X) or just because there’s an existing code base that needs to be extended and improved. This is when our polyglot programming mindset will be challenged. In a recent project, we picked up the current incarnation of VisualBasic, a language most of us willfully forgot after brief exposure in the late nineties, more than 10 years ago.

Spaceward Ho!

So we ventured into the land of VisualEverything, installing VisualStudio (without ReSharper at first) and finding out about the changes in VisualBasic.NET compared to VisualBasic 6, the language version we used back in the day. Being heavily trained in Java and “javaesque” languages, we were pleasantly surprised to find a modern, object-oriented language with a state-of-the-art platform SDK (the .NET framework) and only a few reminders of the old days. Microsoft did a great job in modernizing the language, cutting out maybe a bit too much language-specific stuff. VisualBasic.NET feels like C# with an uninspired syntax.

Making the transition

To ease our exploration of the language features of VisualBasic.NET, one of our student workers made a comparison table between Java and VisualBasic.NET. This cheat sheet helped us tremendously to wrap our heads around the syntax and the language. The platform SDK is very similar to the Java API, as you can see in the corresponding sections of the table. And because it helped us, it might also help you to gain a quick overview of VisualBasic.NET when you are coming from Java.

I have to thank Frederik Zipp a lot for his work. My only contribution to this cheat sheet is the translation from German to English. I can only try to imagine his effort of putting everything together. And while you might read the whole comparison in about 21 minutes (as stated in the title), it’s worth several hours of searching.

The downloads

And without much further ado, here are the download links for the HTML and PDF versions of the “Java vs. VisualBasic.NET cheat sheet”:

You may use and modify the documents as you see fit. If you redistribute them, please adhere to the Creative Commons Attribution-ShareAlike license. Thank you.

Summary of the Schneide Dev Brunch at 2011-07-17

A summary of our Dev Brunch on Sunday 2011-07-17. You’ll read about conferences, the GRASP principles and some cool projects to know about, mostly.

Last Sunday, the 17th of July 2011, we held another Dev Brunch at our company.

A Dev Brunch is an event that brings three main ingredients together: developers, food and software industry related topics. Given enough time (there is never enough time!), we chat, eat, learn and laugh the whole evening through. Most of the stories and chitchat cannot be summarized and have little value outside their context. But most participants bring a little topic alongside their food bag, something of interest they can talk about for roughly 10 minutes. This blog post summarizes at least the official topics and gives links to additional resources.

Conference review of the Java Forum Stuttgart 2011

The Java Forum Stuttgart is an annual conference held by the Java User Group Stuttgart. It’s the biggest regional Java event and always worth a visit (as long as you understand German). This year, the talks stagnated a bit around topics that are mostly well-known.

The best talk was given by Michael Wiedeking from MATHEMA Software GmbH in Erlangen. The talk was titled “The next big (Java) thing”, but mostly addressed the history and current state of Java in an entertaining and thought-provoking way. The premise was that you have to know the past and present to anticipate the future. The slides don’t represent the talk well enough, but here’s a link anyway.

Another session introduced the PatternTesting toolkit, a collection of helper classes and useful features that enrich the development of unit testing. Alongside the other spice you can add to unit tests, this project might be worth a look. My favorite was the @Broken annotation that ignores a test case until a given date. It’s like an @Ignore with a best-before date.

There were the usual introductory talks, for example about CouchDB and git/Egit. They were well-executed, but lacked a certain thrill if you heard about the projects before.

As a personal summary, the Java world lacks the “next big thing” a bit. Two buzz products for the next year might be Eclipse Jubula (for UI testing) and Griffon (for desktop application development).

Conference review of the Karlsruhe Entwicklertag (developer day) 2011

The Karlsruhe Entwicklertag is another annual conference, spanning several days and presenting top-notch talks and sessions. It’s the first address for software developers in Karlsruhe that want to stay up to date with current topics and products.

Some topics were presented nearly identically to the Java Forum Stuttgart (but half a year earlier if that matters), while other tracks (like the Pecha Kucha talks) can only be found here.

The buzz product for the next year might be Gerrit (for code review) and Eclipse Jubula again (for UI testing).

As a personal summary, even this conference lacked a certain drive towards really new “big picture” topics. But maybe that’s just all right, given all the hype of the last years.

The GRASP principles

This topic contained hands-on software development knowledge about the nine principles named “GRASP” or General Responsibility Assignment Software Patterns/Principles. There is nothing really new about the GRASP principles, they will only give you common names for otherwise mostly unnamed best practices or fundamental design paradigms and patterns.

We even went through some educational slides that summarize the principles. Most of the discussion revolved around the name “Pure Fabrication” for classes without a relation to the problem domain.

If you are a software developer with average experience, spend a few minutes and scan the GRASP principles so you can associate each name with its specific content.
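For readers who haven’t met the term yet, here is a small illustration of what a “Pure Fabrication” might look like. The Sale and SaleRepository classes are made up for this sketch and were not part of the brunch slides.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Domain concept: it exists in the problem domain (a shop sells things).
final class Sale {
    private final String item;
    private final int cents;

    Sale(String item, int cents) {
        this.item = item;
        this.cents = cents;
    }

    String item() {
        return item;
    }

    int cents() {
        return cents;
    }
}

// Pure Fabrication: no shopkeeper ever talks about a "repository".
// The class is invented to keep storage concerns out of Sale.
final class SaleRepository {
    private final List<Sale> stored = new ArrayList<>();

    void save(Sale sale) {
        stored.add(sale);
    }

    List<Sale> findAll() {
        return Collections.unmodifiableList(stored);
    }
}
```

The fabricated class has no counterpart in the real world, which is exactly what makes the name sound odd at first.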

First-hand experiences of combining work and children

We are well within the best age to raise children. So this topic gets a lot of attention, specifically the practical tips for surviving the first two years with kids and for interacting with the different administrative bodies. Germany is a welfare state, but nobody claimed that welfare should be easy or logical. We’ve learned a lot about different reference dates and unusual time partitioning.

Another insight was that working less than 40 percent isn’t really worth the hassle. You are mostly inefficient and aware of it.

That’s all, folks

As always, we shared a lot more information and anecdotes. If you want to participate in one of our Dev Brunches, let us know. We are open to guests and really interested in your topics.

How I met my coding style

One of my students recently asked me where I got my coding style from. This blog post tries to answer that really good question.

One of my students recently asked a question that really stuck in my brain: “Where did you get your coding style from?”. The best part of the question was that I didn’t have an answer, until now. I care about coding style a lot and try to talk about it, too. Here are some coding style related blog posts to give you some examples:

Code squiggles were a crazy idea that really provided readability value even after the initial excitement was gone.

Readable code means that you can read the code out loud and it makes sense to (nearly) everyone.

The student wanted to know how to gather the experience to write readable, elegant or just plain crazy code. The answers he anticipated were books or “trial and error”. I’ve come to the conclusion that, while both sources provided enormous amounts of inspiration and knowledge, there’s another single vital ingredient that rounds out the mix: care.

The simple answer to the question is that I cared enough about coding style to gather some knowledge in this field. The more adequate answer is that a lot of factors helped me to improve my coding style.

Books with code examples

Early in my career, I was always looking for programming books with lots of code examples. This is a typical pattern for the novice and advanced beginner stages in the Dreyfus model of skill acquisition. I felt confident when I understood a piece of code and knew that I could produce something similar, given the proper amount of time.

It took a few years of exposure to actual real-life programming to discover that most code examples in books are rubbish. The code exists only as a “textbook example” for the given problem; it isn’t meant to be actually used outside the (often narrow) context. If you stick to copy&paste programming, your code will have the appeal of patchwork clothing made out of rags. It might fulfill the requirement, but nothing more. You can’t learn style from programming books alone. This isn’t the fault of the books, by the way. Most programming books aren’t about style, but about programming or solving programmers’ problems.

There are a few precious exceptions to the general trend of bad code snippets. For example, Robert C. Martin’s “Clean Code” is a recent book with mostly well-done code listings. The best way to separate ugly code from nice code in books is to re-read them a few years later and try to improve the printed source code.

I encourage you to read as many books about programming as you can possibly handle. Even the bad ones will have an impact and inspire you in some way or the other. Books are like mentors, whereas good books are good mentors. Reading a programming book again after some time will show you how much knowledge you gained in the meantime and how much there’s still to be discovered.

Other people’s source code

The large amount of open source software available to be read in the original version is a gift to every programmer. You can dissect the source code of rock star programmers, scroll over the vast textual deserts of “enterprise code” or digest little gems of pure genius inside otherwise rather dull projects, all without additional cost except the time and attention you’re willing to invest.

When you reach a level of code reading when other people’s code “opens up” to you and you start to see the “deeper motives” or the “bigger picture”, whatever you call it, that’s a magical moment. Suddenly, you don’t read other people’s code, but other people’s minds as they lay out the fundamentals of their software. You’ll begin to read and understand code in larger and larger blocks and see structures that were right there before, but unbeknownst to you.

I doubt you can gain this ability by working with textbook examples, as they are restricted in scope by the very nature of limited print space and a specific topic. The one book that accomplishes something equivalent is “Growing Object-Oriented Software, Guided by Tests” by Steve Freeman and Nat Pryce. It’s a rare gem of truly holistic code examples mixed with the essence of many years of experience.

Other people

As beneficial as reading other people’s source code is, talking with them about it raises the whole experience to another level. If you can get in sync well enough to get past the initial buzzword bombing to show off your leetness, you can exchange coding experience worth many weeks in just a few minutes. And while you are talking with them, why not grab a notebook and type away your ideas?

Most programmers I’ve met, regardless of their skill level, could teach me something. And I’ve had several enlightening moments of novice programmers pointing out something or asking a question in a way that really inspired me. You’ll have to listen in order to learn, as painful or overwhelming as it may be sometimes.

Most people are very shy and insecure about their abilities. Encourage them to tell you more and show your interest in their work. Several noteworthy elements of my coding style were adopted late at night at a bar, when a fellow programmer finally had enough beer to brag about his code.

Own achievements

This one will hurt. Remember the times when you shouted out loud “what the f**k?” over some idiot’s source code? Let this idiot be yourself some months or a year ago. Read your own code. Refactor your own code. Rewrite your own code. It’s the same process our brain performs when we dream at night: It rehashes old memories and experiences and lives through them again, in time-lapse mode. As long as you don’t dream about programming (I’ve started to code in my dreams very early and it keeps getting more abstract over time), the only way to rehash your old code is to live through it again by working with it again.

You’ll put many past achievements into perspective, remembering how proud you were and being embarrassed now. That’s part of the learning process and really shows your progress. Just be aware that today’s triumph will undergo the same transformation sooner or later.

Practice is a big part of gaining experience. And practicing means playing around, making errors, trying crazy new things and generally questioning everything you’ve learnt so far. Try to allocate as much practicing time as you can get (besides reading those books!) and really hone your skills. This works exceptionally well with other people, at a code camp, a code retreat, a hackathon, whatever you call it.

The right mindset

This is the secret ingredient to all the things listed above. Without the right mindset, everything in addition to your day-to-day work will appear like a chore. This doesn’t mean that you’ll have to sacrifice all your spare time to improve your coding style. It just means that you won’t mind reading a good book about programming at the weekend if you’re really enjoying it. You cannot learn this attitude from books or even blog posts, it’s a passion that you’ll have to develop on your own.

I can try to describe a big part of my passion: Nothing is good enough. There is never a “good enough”. I’m not falling into despair over it, it’s just an endless challenge for me. Tomorrow, I will be a little bit better than today. But even tomorrow, I’m not “good enough” and can still improve. Sometimes, my improvement rate is negligible for a long time, until suddenly an inspiration completely rearranges my (programming) world.

I made my last giant leaps in coding style when I deliberately avoided parts of my usual habits during programming and tried to focus on what I really wanted to do right now and how to express it in the best way possible (for me). The result was astonishing and humbling at the same time: there’s so much knowledge to gain, there’s so little I know. But yet, I keep getting better and that really makes me complete.

Old code: The StringChunker

This is a little story about a single piece of (java) code: Why it got written, how it got used, what happened after the initial usage and where it is today. At the end, you’ll get the full source code and a brainteaser.

This will be a little story about a single piece of code: Why it got written, how it got used, what happened after the initial usage and where it is today. At the end, you’ll get the full source code and a brainteaser.

Prelude

In the year 2004, a long-term customer asked us to develop a little data charting software for the web. The task wasn’t very complicated, but there were two hidden challenges. The first challenge was the data source itself that could have outages for various reasons that each needed to be addressed differently. The second, more subtle challenge was a “message from the operator” that should be displayed, but without the comments. Failing to meet any of these challenges would put the project at risk of usability.

On a side note, when the project was finished, the greatest risk to its usability wasn’t these challenges, but some assumptions made by the developers that turned out wrong, without proper test coverage or documentation. But that’s fodder for another blog post.

Why it got written

When addressing the functionality of the “message from the operator”, we developed it in a test-first manner, as the specification was quite clear: Everything after the first comment sign (“#”) must never be displayed on the web. Soon, we discovered a serious flaw (let’s call it a bug) in the java.util.StringTokenizer class we used to break down the string. Whenever the comment sign was the first character of the string, it just got ignored. This behaviour is still present with today’s JDK and will not be fixed, as StringTokenizer is a legacy class now:

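import static org.junit.Assert.assertEquals;
import java.util.StringTokenizer;
import org.junit.Test;

public class LeadingDelimiterBug {
    // Fails on purpose: StringTokenizer skips the leading "#", so the first
    // token is already the payload and there is no second token.
    @Test
    public void ignoresLeadingDelimiter() throws Exception {
        StringTokenizer tokenizer = new StringTokenizer("#thisShouldn'tBeShown", "#");
        assertEquals("", tokenizer.nextToken());
        assertEquals("thisShouldn'tBeShown", tokenizer.nextToken());
    }
}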

String.split() wasn’t available to us in 2004 (it only exists since Java 1.4), so we had to develop our own string partitioning functionality. It was my task and I named the class StringChunker. The class was born on a Monday, 21.06.2004, coincidentally also the longest day of the year. I remember coding it until late in the night.

How it got used

The StringChunker class was developed test-first and suffered from feature creep early on. As it was planned as a utility class, I didn’t focus on the requirements at hand, but thought of “possibly needed functionality” and implemented that, too. The class soon had 9 member variables and over 250 lines of code. You could toggle between four different tokenizing modes like “ignore leading/trailing delimiters”, which ironically is exactly what the StringTokenizer does. The code was secured with tests that covered assumed use cases.

Despite the Swiss army knife of string tokenizing that I created, the class only served to pick the comment apart from the payload of the operator’s message. If the special case of a leading comment sign had been declared impossible (or ruled out beforehand), the StringTokenizer would have done the job just as well. Today, there is String.split() that handles the job decently:

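import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class LeadingDelimiterBug {
    // Passes: split() keeps the empty token in front of the leading "#".
    @Test
    public void ignoresLeadingDelimiterWithSplit() throws Exception {
        String[] tokens = "#thisShouldn'tBeShown".split("\\#");
        assertEquals("", tokens[0]);
        assertEquals("thisShouldn'tBeShown", tokens[1]);
    }
}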

But the StringChunker in summer 2004 was the shiny new utility class for the job. It got included in the project and known to the developers.
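In hindsight, the actual requirement (never display anything after the first comment sign) needs no tokenizer at all. The following is a minimal sketch of that narrow use case. The class OperatorMessage and the method withoutComment(…) are made-up names for this illustration and not part of the StringChunker.

```java
final class OperatorMessage {
    private OperatorMessage() {
    }

    // Returns the part of the message before the first '#',
    // or the whole message if it contains no comment sign.
    static String withoutComment(String message) {
        int commentStart = message.indexOf('#');
        return (commentStart < 0) ? message : message.substring(0, commentStart);
    }
}
```

A leading comment sign simply yields an empty string here, which is the very case the StringTokenizer got wrong.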

What happened afterwards

The StringChunker was a success in the project and was soon adopted by virtually every other project in our company. Several bugs and quirks were found (despite the unit tests, there were edge cases) and fixed. This led to a multitude of slightly different implementations over the years. If you want to know what version of the class you’re using, you need to look at the test that covers all bugfixes (or lacks them).

Whenever one of our developers had to chop a string, he instantly imported the StringChunker to the project. Not long after, the class got promoted to be part of our base library of classes that serves as the foundation for every new project. Now the StringChunker was available like every class of java.lang or java.util and got used like a commodity.

Where it is today

When you compare the initial implementation with today’s code, there really isn’t much difference. Some methods got rewritten to conform to our recent taste of style, but the core of the class still is a hopeless mess of 25-line methods and a mind-boggling amount of member variables and conditional statements. I’m still a little bit ashamed to be the creator of such a beast, even if it’s not the worst code I’ve ever written (or will write).

The test coverage of the class never reached 100%, it’s at 95% with some lines lacking a test. This will be the topic of the challenge at the end of this blog post. The test code never got enough love to be readable. It’s only a wall of text in its current state. We can do better than that now.

The class is so ubiquitous in our code base that more than a dozen other foundation classes rely on it. If you would delete the class in a project of ours, it would definitely fall apart somewhere crucial. This will be the most important point in the conclusion.

The source

If you want to have a look at the complete source of the StringChunker, you can download the zip archive containing the compileable sources from our download server. Please bear in mind that we give out the code for educational purpose only. You are free to adapt the work to suit your needs, though.

An open question

When you look at the test coverage, you’ll notice that some lines aren’t tested. We have had an internal challenge for several years now: can somebody come up with a test that covers these lines? It might be that these lines aren’t logically reachable and should be deleted. Or our test harness still has holes. The really annoying aspect about this is that we cannot just delete the lines and see what happens. Most of our ancient projects lack extensive test coverage, and even if they are tested, there could be a critical test missing, allowing the project to pass the tests but fail in production. It’s just too dangerous a risk to take.

So the challenge to you is: Can you provide test cases that cover the remaining lines, thus pushing the test coverage to 100%? I’m very eager to see your solution.

Conclusion

The StringChunker class is a very important class in our toolset. It’s versatile and well tried. But it suffered from feature creep from the very first implementation. There are too many different operation modes combined in one class, violating the Single Responsibility Principle and agglomerating complexity. The test coverage isn’t perfect, leaving little but enough room for speculative functionality (behaviour you might employ, presumably unaware of the fact that it isn’t guaranteed by tests). And while the StringChunker code got micro-refactored (and improved) several times over the years, the test code has a bad case of code rot, leaving it in a state of paralysis. Before the production code is changed in any manner, the test code needs to be overhauled to be readable again.

If I had to weigh the advantages provided by this class against the disadvantages and risks, I would consider the StringChunker a legacy risk. It might even be a technical debt, now that String.split() is available. The major pain point is that this class is used way too often given its poor code quality. With every new usage, the direct or assumed cost of code change rises. And the code has to change to comply with our current quality standards.

Finale

This was my confession about “old code” in a blog post series that was started by Volker with his blog post “Old Code”. As a personal statement: I’m embarrassed. I can vividly remember the feeling of satisfaction when this beast was completed. I’m guilty of promoting the code as a solution to every use case that could easily be implemented with a StringTokenizer or a String.split(), just because it is available, too, and because it contains my genius. After reviewing the code, I hope the bigger genius lies within avoiding the class in the future.

Bear up against static code analysis

If you ever had the urge to switch off a rule in your static code analysis tool, this article tries to convince you not to do it. By accepting challenges presented by your tools, you become a better developer and clean up your code on the run.

One of the first things we do when we join a team on a new (or existing) project is to set up a whole barrage of static code analysis tools, like Findbugs, Checkstyle or PMD for Java (or their counterparts for virtually every other language around). Most of these tools spit out tremendous amounts of numbers and violated rules, totally overwhelming the team. But the amount of violations, (nearly) regardless of how high it might be, is not the problem. It’s the trend of the violation curve that shows the problem and its solution. If 2000 Findbugs violations didn’t kill your project yet, they most likely won’t do it in the future, either. But if every week of development adds another 50 violations to the codebase, it will become a major problem, sooner or later.

Visibility is key

So the first step is always to gain visibility, no matter how painful the numbers are. After the initial shock, most teams accept the challenge and begin to resolve issues in their codebase as soon as they appear and slowly decrease the violation count by spending extra minutes with fixing old code. This is the most valuable phase of static code analysis tools: It enables developers to learn from their mistakes (or goofs) without being embarrassed by a colleague. The analysis tool acts like a very strict and nit-picking code review partner, revealing every flaw in the code. A developer that embraces the changes implied by static analysis tools will greatly accelerate his learning.

But then, after the euphoric initial challenges that improve the code without much hassle, there are some violations that seem hard, if not impossible, to solve. The developer has already set out on his journey to master the tool, so he cannot turn around and just leave these violations in the code. Surely, the tool has flaws itself! The analysis brought up a false positive here! This isn’t faulty code at all, it’s just an overly pedantic algorithm without taste for style that doesn’t see the whole picture! Come to think about it, we have to turn off this rule!

Leave your comfort zone

When this stage is reached, the developers have a deep look into the tool’s configuration and adjust every nut and bolt to their immediate skill level. There’s nothing wrong with this approach if you want to stay on your skill level. But you’ll miss a chance to greatly improve your coding skills by allowing the ruleset to be harder than you can cope with now. Over time, you will come up with solutions you now think are impossible. It’s like fitness training for your coding skills, you should raise the bar every now and then. Unlike fitness training, nobody gets hurt if the numbers of your code analysis show more violations than you can fix up right now. The violations are in the code, whether you let them count or not.

Once, a fellow developer complained really loudly about a specific rule in a code analysis tool. He was convinced that the rule was pointless and should be switched off. I asked for a specific example where this rule was violated in his code. When reviewing the code, I thought that applying the rule would improve the code’s internal structure (it was a rule dealing with collapsible conditional statements). In the discussion on how to implement the code block without violating the rule, the real problem showed up: my colleague couldn’t think of a solution to the challenge. So we proceeded to implement the code block in a dozen variations, each without breaking the rule. After the first few attempts, where I had to take the lead, he suddenly came up with even more solutions. It was as if a switch flipped in his head, from “I’m unable to resolve this stupid rule” to “Hey, if we do it this way, we even can get rid of this local variable”.
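The colleague’s code isn’t mine to show, so here is a made-up example of what a rule about collapsible conditional statements typically complains about (PMD, for instance, calls its variant CollapsibleIfStatements). The class Discounts and both methods are hypothetical and only serve to illustrate the before and after.

```java
final class Discounts {
    private Discounts() {
    }

    // Before: two nested conditions and a local variable.
    // A "collapsible if" rule would flag the inner statement.
    static boolean qualifiesNested(boolean isMember, int orderCount) {
        boolean qualifies = false;
        if (isMember) {
            if (orderCount > 10) {
                qualifies = true;
            }
        }
        return qualifies;
    }

    // After: the conditions are combined and the local variable is gone.
    static boolean qualifies(boolean isMember, int orderCount) {
        return isMember && orderCount > 10;
    }
}
```

Both methods return the same result for every input. The second one just says it in one line, which is the kind of variation that turned up once the switch had flipped.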

Embrace challenges

Don’t trick yourself into thinking that just because your analysis tool doesn’t bring up these esoteric violations anymore after you switched off the rules, they are gone. They are still in your code, just hidden and without your awareness. Bear up against your analysis tool and fix every violation it brings you, one after the other. The tools aren’t there to annoy you, they want to help you stay clear of trouble by pointing out the flaws in a clear and precise manner. Once you meet the challenges the tool presents you with, your skill level will increase automatically. And as a side effect, your code becomes cleaner.

Beyond clean code

Even if every analysis tool approves your code as being clean, it can still be improved. You might have a look at Object Calisthenics or similar code training rulesets. They work the same way as the analysis tools, but without the automatic enforcement (yet). The goal is always cleaner code and higher skilled developers.