Old code: The StringChunker

This is a little story about a single piece of (java) code: Why it got written, how it got used, what happened after the initial usage and where it is today. At the end, you’ll get the full source code and a brainteaser.

This will be a little story about a single piece of code: Why it got written, how it got used, what happened after the initial usage and where it is today. At the end, you’ll get the full source code and a brainteaser.

Prelude

In the year 2004, a long-term customer asked us to develop a little data charting software for the web. The task wasn’t very complicated, but there were two hidden challenges. The first challenge was the data source itself that could have outages for various reasons that each needed to be addressed differently. The second, more subtle challenge was a “message from the operator” that should be displayed, but without the comments. Failing to meet any of these challenges would put the project at risk of usability.

On a side note, when the project was finished, the greatest risk to its usability wasn’t these challenges, but some assumptions made by the developers that turned out wrong, without proper test coverage or documentation. But that’s fodder for another blog post.

Why it got written

When addressing the functionality of the “message from the operator”, we developed it in a test-first manner, as the specification was quite clear: Everything after the first comment sign (“#”) must never be displayed on the web. Soon, we discovered a serious flaw (let’s call it a bug) in the java.util.StringTokenizer class we used to break down the string. Whenever the comment sign was the first character of the string, it just got ignored. This behaviour is still present with today’s JDK and will not be fixed, as StringTokenizer is a legacy class now:

public class LeadingDelimiterBug {
@Test
public void ignoresLeadingDelimiter() throws Exception {
StringTokenizer tokenizer = new StringTokenizer("#thisShouldn'tBeShown", "#");
assertEquals("", tokenizer.nextToken());
assertEquals("thisShouldn'tBeShown", tokenizer.nextToken());
}

String.split() wasn’t available in 2004, so we had to develop our own string partitioning functionality. It was my task and I named the class StringChunker. The class was born on a monday, 21.06.2004, coincidentally also the longest day of the year. I remember coding it until late in the night.

How it got used

The StringChunker class was developed test-first and suffered from feature creep early on. As it was planned as an utility class, I didn’t focus on the requirements at hand, but thought of “possibly needed functionality” and implemented those, too. The class soon had 9 member variables and over 250 lines of code. You could toggle between four different tokenizing modes like “ignore leading/trailing delimiters”, which ironically is exactly what the StringTokenizer does. The code was secured with tests that covered assumed use cases.

Despite the swiss army knife of string tokenizing that I created, the class only served to pick the comment apart from the payload of the operator’s message. If the special case of a leading comment sign would have been declared impossible (or ruled out beforehands), the StringTokenizer would have done the job just as good. Today, there is String.split() that handles the job decently:

public class LeadingDelimiterBug {
@Test
public void ignoresLeadingDelimiterWithSplit() throws Exception {
String[] tokens = "#thisShouldn'tBeShown".split("\\#");
assertEquals("", tokens[0]);
assertEquals("thisShouldn'tBeShown", tokens[1]);
}
 

But the StringChunker in summer 2004 was the shiny new utility class for the job. It got included in the project and known to the developers.

What happened afterwards

The StringChunker was a success in the project and soon was adopted to virtually every other project in our company. Several bugs and quirks were found (despite the unit tests, there were edge cases) and fixed. This lead to a multitude of slightly different implementations over the years. If you want to know what version of the class you’re using, you need to look at the test that covers all bugfixes (or lacks them).

Whenever one of our developers had to chop a string, he instantly imported the StringChunker to the project. Not long after, the class got promoted to be part of our base library of classes that serves as the foundation for every new project. Now the StringChunker was available like every class of java.lang or java.util and got used like a commodity.

Where it is today

When you compare the initial implementation with today’s code, there really isn’t much difference. Some methods got rewritten to conform to our recent taste of style, but the core of the class still is a hopeless mess of 25-lines-methods and a mind-boggling amount of member variables and conditional statements. I’m still a little bit ashamed to be the creator of such a beast, even if it’s not the worst code I’ve ever written (or will write).

The test coverage of the class never reached 100%, it’s at 95% with some lines lacking a test. This will be the topic of the challenge at the end of this blog post. The test code never got enough love to be readable. It’s only a wall of text in its current state. We can do better than that now.

The class is so ubiquitous in our code base that more than a dozen other foundation classes rely on it. If you would delete the class in a project of ours, it would definitely fall apart somewhere crucial. This will be the most important point in the conclusion.

The source

If you want to have a look at the complete source of the StringChunker, you can download the zip archive containing the compileable sources from our download server. Please bear in mind that we give out the code for educational purpose only. You are free to adapt the work to suit your needs, though.

An open question

When you look at the test coverage, you’ll notice that some lines aren’t tested. We have an internal challenge for several years now if somebody is able to come up with a test that covers these lines. It might be possible that these lines aren’t logically reachable and should be deleted. Or our test harness still has holes. The really annoying aspect about this is that we cannot just delete the lines and see what happens. Most of our ancient projects lack extensive test coverages, and even if they are tested, there could be a critical test missing, allowing the project to pass the tests but fail in production. It’s just too dangerous a risk to take.

So the challenge to you is: Can you provide test cases that cover the remaining lines, thus pushing the test coverage to 100%? I’m very eager to see your solution.

Conclusion

The StringChunker class is a very important class in our toolset. It’s versatile and well tried. But it suffered from feature creep from the very first implementation. There are too many different operation modes combined in one class, violating the Single Responsibility Principle and agglomerating complexity. The test coverage isn’t perfect, leaving little but enough room for speculative functionality (behaviour you might employ, presumably unaware of the fact that it isn’t guaranteed by tests). And while the StringChunker code got micro-refactored (and improved) several times over the years, the test code has a bad case of code rot, leaving it in a state of paralysis. Before the production code is changed in any manner, the test code needs to be overhauled to be readable again.

If I should weight the advantages provided by this class to the disadvantages and risks, I would consider the StringChunker a legacy risk. It might even be a technical debt, now that String.split() is available. The major pain point is that this class is used way too often given its poor code quality. With every new usage, the direct or assumed cost of code change rises. And the code has to change to comply to our current quality standards.

Finale

This was my confession about “old code” in a blog post series that was started by Volker with his blog post “Old Code”. As a personal statement: I’m embarrassed. I can vividly remember the feeling of satisfaction when this beast was completed. I’m guilty of promoting the code as a solution to every use case that could easily be implemented with a StringTokenizer or a String.split(), just because it is available, too and it contains my genius. After reviewing the code, I hope the bigger genius lies within avoiding the class in the future.

One thought on “Old code: The StringChunker”

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.