Test your migrations

Do you trust your database migrations?

An evolving project that changes its persistent data structure can require a transformation of already existing content into the new form. To achieve this in our Grails projects we use the Grails database migration plugin. This plugin allows us to apply changesets to the database and keep track of its current state.

The syntax of the DSL for Groovy database migrations is easy to read. This can trick you into the assumption that everything that looks good, compiles and runs without errors is OK. Of course it is not. Here is an example:

changeSet(author: 'vasili', id: 'copies messages to archive') {
  grailsChange {
    change {
      sql.eachRow("SELECT MESSAGES.ID, MESSAGES.CONTENT, "
                + "MESSAGES.DATE_SENT FROM MESSAGES WHERE "
                + "MESSAGES.DATE_SENT > to_date('2011-01-01 00:00', "
                + "'YYYY-MM-DD HH24:MI:SS')") { row ->
        sql.execute("INSERT INTO MESSAGES_ARCHIVE(ID, CONTENT, DATE_SENT) "
                  + "VALUES(${row.id}, ${row.content}, ${row.date_sent})")
      }
    }
    change {
      sql.eachRow("SELECT MESSAGES.ID, MESSAGES.CONTENT, "
                + "MESSAGES.DATE_SENT FROM MESSAGES WHERE "
                + "MESSAGES.DATE_SENT > to_date('2012-01-01 00:00', "
                + "'YYYY-MM-DD HH24:MI:SS')") { row ->
        sql.execute("INSERT INTO MESSAGES_ARCHIVE(ID, CONTENT, DATE_SENT) "
                + "VALUES(${row.id}, ${row.content}, ${row.date_sent})")
      }
    }
  }
}

Here you see two change closures that differ only by the year in the SQL where clause. What do you think will happen to your database when this migration is applied? The answer is: only changes from the year 2012 will be found in the destination table. The assumption that, because one change closure works inside a grailsChange block, two change closures will work as well is wrong, even though it compiles and runs. Looking at the documentation you will see that it shows only one change block in the example code. When you divide the migration into multiple parts, each of them working on its own single change, everything works as expected.
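A sketch of the corrected migration, split into two changeSets with one change closure each (the changeSet ids are made up for illustration):

changeSet(author: 'vasili', id: 'copies 2011 messages to archive') {
  grailsChange {
    change {
      sql.eachRow("SELECT MESSAGES.ID, MESSAGES.CONTENT, "
                + "MESSAGES.DATE_SENT FROM MESSAGES WHERE "
                + "MESSAGES.DATE_SENT > to_date('2011-01-01 00:00', "
                + "'YYYY-MM-DD HH24:MI:SS')") { row ->
        sql.execute("INSERT INTO MESSAGES_ARCHIVE(ID, CONTENT, DATE_SENT) "
                  + "VALUES(${row.id}, ${row.content}, ${row.date_sent})")
      }
    }
  }
}

changeSet(author: 'vasili', id: 'copies 2012 messages to archive') {
  grailsChange {
    change {
      // same statement as above, only with the year 2012 in the where clause
    }
  }
}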

Currently there is no safety net like unit tests for database migrations. Every assumption you make must be tested manually with some dummy test data.

Ugly problems, ugly solutions?

Do workarounds have to be worse than the problems they solve?

One type of project we do is integrating devices into our customers' infrastructure. The tasks then mostly consist of writing bridging code for the third-party libs of the hardware vendor. The most fun part is when the libs do not have some needed capability or feature.

The situation

In my case I was building a device driver with the following requirements:

  • asynchronous execution of long running tasks.
  • ability to cancel long running tasks.
  • whenever it is asked for its current status, it has to provide it.

The device is accompanied by a DLL with the following interface (simplified):

  • doWork(), a blocking function that returns after a configurable amount of time that can range from milliseconds to hours.
  • abortWork(), which is supposed to cancel the process triggered by doWork() and make doWork() return earlier.

First impressions

I was able to fulfill two of the requirements pretty fast. The status is more or less a simple getter and the doWork function was called in a separate thread. Only the cancellation didn't work. More precisely, it didn't work as expected. Between a call to doWork() and the moment it returned, the process always used 100% of one CPU core. After that it always dropped to nearly zero. Now, what happened when I called abortWork()? One of two things: either doWork() returned, but the CPU utilization stayed at 100% for an indefinite amount of time, or the call was ignored completely. Especially funny was the first case, where the API seemed to work until the process ran out of cores and the system practically ground to a halt.

The “Solution”

Banging my head against the desk didn't help, so my first thought was to forget abortWork() and kill the thread myself. Microsoft provides a nice function called TerminateThread for that purpose. Everyone who looks at the documentation will see that the list of side effects is quite impressive, memory leaks being the least bad ones. I couldn't guarantee that the application would work afterwards, so I decided against it. What would be the alternative? Process shutdown. When you stop the process, all blocked threads should be gone. Being too soft and trying to unload the DLL is a bad idea: you get a deadlock when DllMain waits for the worker thread to finish. My last resort was to let the process kill itself!

Now I was able to abort a running task, but my app was no longer available all the time. Every attempt to get the current status between the start of a shutdown and a completed startup failed. So a semi-persistent storage containing the last status of the living application was needed. To achieve this, I created an application with the same interface as the real device driver and a proxy that delegated all requests to it, caching the status responses. That way the polling application still assumed that the last action was still running until the restarted app was fully available again.

In the end the solution consisted of two device drivers: one for caching the state and the other for doing the work. When cancelling the task was required, the latter died and restarted itself.

Final thoughts

I hope there is a more elegant way to do this and I just overlooked some facts. It is unbelievable that you can lose all control over your app through a simple call into a third-party library and that the only escape is death.

TDD: avoid getting stuck or what’s the next test?

One central point of practicing TDD is determining what the next test should be. Choosing the wrong path can lead you into the infamous impasse: to make the next test pass you need giant steps instead of baby steps. Some time ago Uncle Bob introduced a principle called the transformation priority premise. To make a test pass you need to change the implementation. These changes are transformations. There are at least the following transformations (taken from his blog post):

  • ({}->nil) no code at all -> code that employs nil
  • (nil->constant)
  • (constant->constant+) a simple constant to a more complex constant
  • (constant->scalar) replacing a constant with a variable or an argument
  • (statement->statements) adding more unconditional statements.
  • (unconditional->if) splitting the execution path
  • (scalar->array)
  • (array->container)
  • (statement->recursion)
  • (if->while)
  • (expression->function) replacing an expression with a function or algorithm
  • (variable->assignment) replacing the value of a variable.

To determine what the next test should be, you look at the possible next tests and the changes in the implementation necessary to make each of them pass. The required transformations should be as high in the list as possible. If you always choose the test whose required transformations are highest in the list, you avoid getting stuck in the impasse.
This seems to work but I think this is pretty complicated and expensive. Shouldn’t there be an easier way?
Let’s take a look at his case study: the word wrap kata. Word wrap is a function which takes two parameters: a string, and a column number. It returns the string, but with line breaks inserted at just the right places to make sure that no line is longer than the column number. You try to break lines at word boundaries.
The first three tests (nil, empty string and one word which is shorter than the wrap position) are obvious and easy but the next test can lead to an impasse:

@Test
public void twoWordsLongerThanLimitShouldWrap() throws Exception {
  assertThat(wrap("word word", 6), is("word\nword"));
}

With the transformation priority premise you can “calculate” that this is the wrong test and that another one is simpler, i.e. needs transformations higher in the list. But let me introduce another concept: the facets or dimensions of tests.
Each test in a TDD session tests another facet of your problem, and only one more. What a facet is is determined by the problem domain, so you need some domain knowledge, but you usually need that to solve the problem anyway. Back to the word wrap example: what is a facet? The first test tests the nil input, it changes one facet. The empty input test changes another facet. Then comes one word shorter than the wrap position (one facet changed again), and the fourth test uses two words longer than the wrap position. See it? The fourth test introduces changes in two facets: from one word to two words, and from shorter to longer. So what can you do instead? Just change one facet. According to this, the next test would be one word longer than the wrap position (facet: longer), which is exactly what is proposed as the solution. Or you could use two words shorter than the wrap position (facet: word count), but this test would just pass without modifications to the implementation code. So facets of the word wrap kata could be: word count, shorter/longer, number of breaks, break position.
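For illustration, the single-facet test proposed above (one word longer than the wrap position) could look like this, assuming the kata's rule of breaking an over-long word at the column; the test name and values are made up:

@Test
public void oneWordLongerThanLimitShouldWrap() throws Exception {
  assertThat(wrap("longword", 4), is("long\nword"));
}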
I know this is a very informal way of finding the next tests. It leans on your experience and domain knowledge. But I think it is less expensive than the transformations. And even better it can be combined with the transformation priority premise to check and verify your decisions.
What are your experiences with getting stuck in TDD? Do you think the proposed facets could be of help? Is it too informal? Too vague?

Building Visual C++ Projects with CMake

In a previous post my colleague showed how to create RPM packages with CMake. Being a really versatile tool, CMake is also able to create and build Visual Studio projects on Windows. This makes it very valuable when you want to integrate your project into a CI cycle (in our case Jenkins).

Prerequisites:

To be able to compile anything, the following packages need to be installed beforehand:

  •  CMake. It is helpful to put it in the PATH environment variable so that absolute paths aren’t needed.
  • Microsoft Windows SDK for Windows 7 and .NET Framework 4 (the web installer or the ISOs). The part “.NET Framework 4” is very important: if the SDK for the .NET Framework 3.5 is installed instead, you will get the following parse error for your *.vcxproj files:

    error MSB4066: The attribute “Label” in element is unrecognized

    at the following position:

    <ItemGroup Label="ProjectConfigurations">

    Probably equally important is the bitness of the installed SDK. The x64 ISO differs only in one letter from the x86 one. Look for the X if you want 64 bit.

  • .NET Framework 4, necessary to make msbuild run

It is possible that you encounter the following message during your SDK setup:

A problem occurred while installing selected Windows SDK components. Installation of the “Microsoft Windows SDK for Windows 7” product has reported the following error: Please refer to Samples\Setup\HTML\ConfigDetails.htm document for further information. Please attempt to resolve the problem and then start Windows SDK setup again. If you continue to have problems with this issue, please visit the SDK team support page at http://go.microsoft.com/fwlink/?LinkId=130245. Click the View Log button to review the installation log. To exit, click Finish.

The reason behind this wordy and not very informative error message was the Visual C++ Redistributables installed on the system. As suggested by a Microsoft KB article, removing them all helped.

Makefiles:

For CMake to build anything you need to have a CMakeLists.txt file in your project. For a tutorial on how to use CMake, look at this page. Here is a simple CMakeLists.txt to get you started:

project(MyProject)
cmake_minimum_required(VERSION 2.6)

set(source_files
  main.cpp
)

include_directories(
  ${CMAKE_CURRENT_SOURCE_DIR}
)

add_executable(MyProject ${source_files})

Building:

To build a project a few steps are necessary. You can enter them in your CI directly or put them in a batch file (a complete batch file is shown after the individual steps below).

call "%ProgramFiles%\Microsoft SDKs\Windows\v7.1\Bin\SetEnv.cmd" /Release /x86

With this call all necessary environment variables are set. Be careful on 64 bit platforms: the Jenkins slave executes this call in a 32 bit context, so “%ProgramFiles%” resolves to “Program Files (x86)”, where the SDK does not reside.

del CMakeCache.txt

This command is not strictly necessary, but it prevents you from working with outdated generated files when you change your configuration.

cmake -G "Visual Studio 10" .

Generates a Visual Studio 2010 Solution. Every change to the solution and the project files will be gone when you call it, so make sure you track all necessary files in the CMakeLists.txt.

cmake --build . --target ALL_BUILD --config Release

The final step. It will net you the MyProject.exe binary. The target parameter is equal to the name of the project in the solution and the config parameter is one of the solution configurations.
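Put together, the batch file mentioned above consists of exactly these commands:

call "%ProgramFiles%\Microsoft SDKs\Windows\v7.1\Bin\SetEnv.cmd" /Release /x86
del CMakeCache.txt
cmake -G "Visual Studio 10" .
cmake --build . --target ALL_BUILD --config Release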

Final words:

The hardest and most time consuming part was the setup of the prerequisites. Generic, uninformative error messages are the worst thing you can do to a clueless customer. But… when you are done with it, you are only two small steps away from an automatically built executable.

Build a RPM package using CMake

A while ago I presented a way to package projects using different build systems as RPM packages. If you are using CMake for your projects you can use CPack to build RPM packages (in addition to tarballs, NSIS installers, deb packages and so on). This is a really nice option for the deployment of your own projects because installation and updates can easily be done by the users with familiar package management tools like zypper, yum and yast2.

Your first CPack RPM

It is really easy to add RPM packaging using CPack to your existing project. Just set the mandatory CPack variables and include CPack below the variable definitions, usually as one of the last steps:

project (my_project)
cmake_minimum_required (VERSION 2.8)

set(VERSION "1.0.1")
<----snip your usual build instructions snip--->
set(CPACK_PACKAGE_VERSION ${VERSION})
set(CPACK_GENERATOR "RPM")
set(CPACK_PACKAGE_NAME "my_project")
set(CPACK_PACKAGE_RELEASE 1)
set(CPACK_PACKAGE_CONTACT "John Explainer")
set(CPACK_PACKAGE_VENDOR "My Company")
set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
set(CPACK_PACKAGE_FILE_NAME "${CPACK_PACKAGE_NAME}-${CPACK_PACKAGE_VERSION}-${CPACK_PACKAGE_RELEASE}.${CMAKE_SYSTEM_PROCESSOR}")
include(CPack)

These few lines should be enough to get you going. After that you can execute the make package command and should obtain the RPM package.
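For example, assuming an out-of-source build directory, the whole cycle might look like this (the resulting file name follows the CPACK_PACKAGE_FILE_NAME pattern above, e.g. my_project-1.0.1-1.x86_64.rpm on an x86_64 machine):

mkdir build && cd build
cmake ..
make package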

Spicing up the package

RPM packages can contain much more metadata, especially package dependencies and a version changelog. Most of this can be specified using CPACK variables. We sometimes prefer to use a SPEC file template that is filled and used by CPack, because it then contains most of the RPM-specific stuff in a familiar manner instead of polluting the CMakeLists.txt itself:

project (my_project)
<----snip your usual CMake stuff snip--->
<----snip your additional CPack variables snip--->
configure_file("${CMAKE_CURRENT_SOURCE_DIR}/my_project.spec.in" "${CMAKE_CURRENT_BINARY_DIR}/my_project.spec" @ONLY IMMEDIATE)
set(CPACK_RPM_USER_BINARY_SPECFILE "${CMAKE_CURRENT_BINARY_DIR}/my_project.spec")
include(CPack)

The variables in the RPM SPEC file will be replaced by the values provided in the CMakeLists.txt and then be used for the RPM package. It looks very similar to a standard SPEC file but you can omit the usual build instructions boiling down to something like this:

Buildroot: @CMAKE_CURRENT_BINARY_DIR@/_CPack_Packages/Linux/RPM/@CPACK_PACKAGE_FILE_NAME@
Summary:        My very cool Project
Name:           @CPACK_PACKAGE_NAME@
Version:        @CPACK_PACKAGE_VERSION@
Release:        @CPACK_PACKAGE_RELEASE@
License:        GPL
Group:          Development/Tools/Other
Vendor:         @CPACK_PACKAGE_VENDOR@
Prefix:         @CPACK_PACKAGING_INSTALL_PREFIX@
Requires:       opencv >= 2.4

%define _rpmdir @CMAKE_CURRENT_BINARY_DIR@/_CPack_Packages/Linux/RPM
%define _rpmfilename @CPACK_PACKAGE_FILE_NAME@.rpm
%define _unpackaged_files_terminate_build 0
%define _topdir @CMAKE_CURRENT_BINARY_DIR@/_CPack_Packages/Linux/RPM

%description
Cool project solving the problems of many colleagues.

# This is a shortcutted spec file generated by CMake RPM generator
# we skip _install step because CPack does that for us.
# We do only save CPack installed tree in _prepr
# and then restore it in build.
%prep
mv $RPM_BUILD_ROOT @CMAKE_CURRENT_BINARY_DIR@/_CPack_Packages/Linux/RPM/tmpBBroot

%install
if [ -e $RPM_BUILD_ROOT ];
then
  rm -Rf $RPM_BUILD_ROOT
fi
mv "@CMAKE_CURRENT_BINARY_DIR@/_CPack_Packages/Linux/RPM/tmpBBroot" $RPM_BUILD_ROOT

%files
%defattr(-,root,root,-)
@CPACK_PACKAGING_INSTALL_PREFIX@/@LIB_INSTALL_DIR@/*
@CPACK_PACKAGING_INSTALL_PREFIX@/bin/my_project

%changelog
* Tue Jan 29 2013 John Explainer <john@mycompany.com> 1.0.1-3
- use correct maintainer address
* Tue Jan 29 2013 John Explainer <john@mycompany.com> 1.0.1-2
- fix something about the package
* Thu Jan 24 2013 John Explainer <john@mycompany.com> 1.0.1-1
- important bugfixes
* Fri Nov 16 2012 John Explainer <john@mycompany.com> 1.0.0-1
- first release

Conclusion

Integrating RPM (or other package formats) into your CMake-based build is not as hard as it seems and quite flexible. You do not need to rely on the tools provided by your OS vendor and can still deliver your software in a way your users are accustomed to. This makes CPack very continuous integration (CI) friendly, too!

Java Generics: the Klingonian Cast

Struck by the odd type erasure behaviour of Java generics again? You can circumvent the missing upcast feature by using the Klingonian Cast.

Ever since Generics were included in Java, they’ve been a great help and source of despair at once. One thing that most newcomers will stumble upon sooner or later is “Type Erasure” and its effects. You may read about it in the Java Tutorial and never quite understand it, until you encounter it in the wild (in your code) and it just laughs at your carefully crafted type system construct. This is the time when you venture into the deep end of the Java language specification and aren’t seen for days or weeks. And when you finally reappear, you are a broken man – or a strong warrior, even stronger than before, charged with the wisdom of the ancients.

The problem

If my introduction was too mystic for your taste – bear with me. The rest of this blog post is rather technical and bleak. It won’t go into the nitty-gritty details of Java generics or type erasure, but describe a real-world problem and one possible solution. The problem can be described by a few lines of code:


List<Integer> integers = new ArrayList<Integer>();
Iterable<Integer> iterable = integers;
Iterable<Number> numbers = integers; // Damn!

The last line won’t compile. Let’s examine it step by step:

  • We create a list of Integers
  • The list can be (up-)cast into an Iterable of Integers. Lists are/behave like Iterables.
  • But the list cannot be cast into an Iterable of Number, even though Integers are/behave like Numbers.

The compiler error message isn’t particularly helpful here:

Type mismatch: cannot convert from List<Integer> to Iterable<Number>

This is when we remember one thing about Java Generics: They aren’t exactly variant. While they have “use-site variance”, we are in need of “declaration-site variance” here, which Java Generics lack entirely. Don’t despair, this was all the theoretical discussion about the topic for today. If you want to know more, just ask in the comment section. Perhaps we can provide another blog post discussing just the theory.

The workaround

In short, our problem is that Java is unable to see the relationship between the types Integer and Number when given as generic parameter. But we can make it see:


List<Integer> integers = new ArrayList<Integer>();
List<Number> numberList = new ArrayList<Number>();
numberList.addAll(integers);
Iterable<Number> numbers = numberList;

This will compile and work. I've split the creation and filling of the second List into two steps to make it clearer what's happening: by explicitly creating a new collection and (up-)casting every element of the List individually, we can show the compiler that everything's OK.

The Klingonian Cast

Well, if the compiler wants to see every element of our initial collection to be sure about upcasting, we should show it. But why create a new List and swap the elements by hand every time, when we can just use the “Klingonian Cast”? OK, I made the name up. But how else would you call a structure that is essentially an upcast, but uses two generic parameters and a dozen lines of code, if not something very manly and bold? But enough mystery, let's look at the code:


List<Integer> integers = new ArrayList<Integer>();
Iterable<Number> numbers = MakeIterable.<Number>outOf(integers);

The good thing about the Klingonian cast is that it has a very thin footprint at runtime. Your HotSpot compiler might even eliminate it completely. But you probably don't want to hear about its characteristics, but rather see the implementation:


import java.util.Iterator;

public class MakeIterable {
  public static <T> Iterable<T> outOf(final Iterable<? extends T> iterable) {
    return new Iterable<T>() {
      @Override
      public Iterator<T> iterator() {
        return iteratorOutOf(iterable.iterator());
      }
    };
  }

  protected static <T> Iterator<T> iteratorOutOf(final Iterator<? extends T> iterator) {
    return new Iterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }
      @Override
      public T next() {
        return iterator.next();
      }
      @Override
      public void remove() {
        iterator.remove();
      }
    };
  }
}

That’s it. A “simple” upcast for Java Generics, ready to use it for your own convenience. Enjoy!

FTP integrated

When developing a feature containing unknown technology or hardware, I prefer a spike followed by integration tests. Sometimes it helps a lot.

How it all began
One of our customers employs a NAS for data storage, accessing it via FTP. Some features like copying and moving files around were already implemented by us using Apache's FTPClient. The next feature on the list was “cleanup after x days”: deletion of files or, more importantly, directories. FTP, being a pretty basic protocol, does not allow recursive deletion of directories. The only way to do it is to delete the deepest elements first, go up one level and repeat, or in other words, to implement the recursion yourself. This was too much complexity for our simple feature, so the decision was made to hide it behind a VirtualFile, an interface already existing in our framework.
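A minimal sketch of such a recursion with Apache's FTPClient might look like the following; this is illustrative only, not the actual code behind our VirtualFile implementation, and it uses listFiles instead of the NLST name listing mentioned later:

import java.io.IOException;

import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPFile;

public class RecursiveFtpDelete {

  // FTP has no recursive delete, so we remove the deepest entries first
  // and the directory itself last.
  public static void deleteDirectory(FTPClient ftp, String path) throws IOException {
    for (FTPFile entry : ftp.listFiles(path)) {
      String name = entry.getName();
      if (".".equals(name) || "..".equals(name)) {
        continue; // some servers list the pseudo entries . and ..
      }
      String childPath = path + "/" + name;
      if (entry.isDirectory()) {
        deleteDirectory(ftp, childPath);
      } else {
        ftp.deleteFile(childPath);
      }
    }
    ftp.removeDirectory(path);
  }
}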

Being a novice at speaking FTP, I was happy to hear that we had already acquired exactly the same type of NAS the customer has. To see how the system behaves (or misbehaves) and to document it at the same time, I decided to implement the interface integration-test-first.

Fun
As the number of tests and file operations started to grow, so did the round trip time of my test/make-test-pass/refactor cycle, and my patience dwindled. I switched from the NAS FTP server to a local FileZilla FTP server. It worked like a charm and all necessary features were implemented really fast.

The next step was to run the app using the new feature with a realistic amount of data, a real directory structure and our NAS. It failed miserably. And randomly. The app suffered from closed connections while trying to open a data connection. After some searching the reason was found: the FTPClient we use has active mode enabled by default. That means that to transfer data the server tries to connect to the client, and the client's firewall did not like that. After setting the connection mode to passive the problem was solved.
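The fix itself is a single call on Apache's FTPClient, made after connecting and before any data transfer (host and credentials are placeholders):

FTPClient ftpClient = new FTPClient();
ftpClient.connect("nas.example.local");   // placeholder host
ftpClient.login("user", "secret");        // placeholder credentials
ftpClient.enterLocalPassiveMode();        // let the client open the data connection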

The tests ran fine, but they ran slowly. And they introduced a dependency on an external system: if that system broke or was disabled for any other reason, our CI would report a failure without any changes in the code. Both points could be addressed by using an embedded FTP server, so we chose Apache's FtpServer. Changing the tests was easy, since the only thing to do was to set up the server before each test and shut it down afterwards. Surprisingly, some tests failed. Apache's server handled some cases differently:

  • it allowed opening output streams to directories without any exception
  • it forbade deleting the current working directory
  • the name listing of a directory (NLST) returned by the NAS contained absolute paths to the files, while Apache's server returned simple names.

After another code change the code worked correctly with all three servers.

Lessons learned
While implementing the interface I learned a lot about how to create and test bridging functionality:

  • A specification cannot replace tests. Searching for the FTP commands to use, I looked at several websites that described the commands. None of them said whether NLST returns absolute paths or only filenames. There are always holes in the spec that will be interpreted differently by vendors, or the vendors do not always obey it.
  • Unit tests are great, but they are limited to your own code. When it comes to communication between system components, especially communication with foreign systems, an integration test is a must.
  • Working with a test setup that mimics the production environment as closely as possible is great. Without the NAS, the app would simply have failed in the best case. In the worst case it would have deleted the wrong files. Neither of these makes a customer happy.

Aspects done right: Concerns

The idea of encapsulating cross-cutting concerns resonated with me from the beginning, but the implementation, namely aspects, lacked clarity in my opinion. With aspects you cannot see (without sophisticated IDE support) which class has which aspects and which aspects are woven into the class when looking at its source. Here concerns (also called mixins or traits) come to the rescue. I know that aspects were invented to hide away the details of which code is included where, but I find it confusing and hard to trace without tool support.

Take a look at an example in Ruby:

module Versionable
  extend ActiveSupport::Concern

  included do
    attr_accessor :version
  end
end

class Document
  include Versionable
end

Now Document has a field version and is_a?(Versionable) returns true. For clients it looks like the field version is in Document itself. So for clients of this class it is the same as:

class Document
  attr_accessor :version
end

Furthermore you can easily use the Versionable concern in another class. This sounds like a great implementation of the separation of concerns principle, so why isn't everyone using it (besides it being a standard part of the upcoming Rails 4)? Well, some people are concerned with concerns (excuse the pun). As with every powerful feature you can shoot yourself in the foot. Let's take a look at each problem.

  • Diamond problem aka multiple inheritance: Ruby has no multiple inheritance. Even when you include more than one module, the modules act like superclasses in the method resolution order. Every include creates a new “superclass” above the including class, so the last include takes precedence (see the snippet below).
  • Dependencies between concerns: One concern may need another concern. ActiveSupport::Concern handles these dependencies automatically.
  • Unforeseeable results: One last big problem with concerns is side effects from combining two concerns. Take for example two concerns which add a method with the same name. Including both of them renders one concern unusable. This cannot be solved technically, but I also think this problem points to an underlying, more important cause: it could be poor naming, or you did not separate the two concerns enough. As always, tests can help to isolate and spot the problem. Concerns should be tested both in isolation and in integration.
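To illustrate the method resolution order from the first point, a short irb session with plain modules is enough (the Commentable module is made up for this example):

module Versionable; end
module Commentable; end

class Document
  include Versionable
  include Commentable
end

Document.ancestors
# => [Document, Commentable, Versionable, Object, Kernel, BasicObject]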

A small test saves the day

You think a method is too trivial to write a test for? Think again if the method is mission-critical!

Just recently, I had to write a connection between an existing application and a new hardware unit. This is a fairly common job for our company, even under the circumstances that I had never even seen the hardware, let alone been able to connect to it. The hardware unit itself was rather big and it was installed in a security sensitive area with restricted access. So I only got a specification of the protocol to use and a description of the hardware's features.

Our common procedure to include hardware dependent modules in an application is to write two implementations of the module: One implementation is the real deal and interacts with the hardware over ethernet, USB, serial port or whatever proprietary communication device is used. This version of the module can only work as intended if the hardware is present. The other implementation acts as an emulation of the hardware, without any dependencies; if you are familiar with unit tests, think of a big test mock. The emulation version is used during development to test and run the application without any requirements on the hardware. There are a lot of subtle pitfalls to consider and avoid, but on a bird's-eye level of abstraction, these interchangeable implementations of a module enable us to develop software with hardware dependencies without needing the actual hardware.

The first piece of code of a module that gets used is a factory/builder class that chooses between the available implementations, based on some configuration entry (or hardware availability, etc.). A typical implementation of the responsible method might look like this:


public HardwareModule createFor(ModuleConfiguration configuration) {
  if (configuration.isHardwarePresent()) {
    new RealHardwareModule();
  }
  return new EmulatedHardwareModule();
}

If the configuration object says that the hardware is present, the real implementation is used, subsequently opening a connection to the hardware and talking the client side of the given protocol. Otherwise, the emulation is created and returned, maybe opening a debug GUI window to display certain internal states and values and providing controls to mess with the application during development.

The method itself looks very innocent and meager. There is not much going on, so what could possibly go wrong?

I’m not the most eager test-driven developer in the world, I have to admit. But I see the value of tests (and unit tests in particular) and adhere to the A-TRIP rules defined by Andy Hunt and (pragmatic) Dave Thomas:

  • Automatic
  • Thorough
  • Repeatable
  • Independent
  • Professional

For a complete definition of the rules, read the linked blog entry or, even better, buy the book. It’s small and cheap, but contains a lot of profound basic knowledge about unit testing.

The “Thorough” rule is more of a rule of thumb than a hard scientific formula for good unit tests: Always write a test if you’ve found a bug or if the code you’re writing is mission-critical. This was when my gut feeling told me that while the method above might seem trivial, it is definitely essential for the hardware module. So I wrote a test:

  @Test
  public void providesEmulationIfUnspecified() {
    HardwareModuleFactory factory = new HardwareModuleFactory();
    HardwareModule hardware = factory.createFor(configuration(""));
    assertEquals("not the hardware emulation", EmulatedHardwareModule.class, hardware.getClass());
  }

  @Test
  public void providesEmulationIfHardwareAbsent() {
    HardwareModuleFactory factory = new HardwareModuleFactory();
    HardwareModule hardware = factory.createFor(configuration("hardware.present=false"));
    assertEquals("not the hardware emulation", EmulatedHardwareModule.class, hardware.getClass());
  }

  @Test
  public void providesRealImplementationIfHardwarePresent() {
    HardwareModuleFactory factory = new HardwareModuleFactory();
    HardwareModule hardware = factory.createFor(configuration("hardware.present=true"));
    assertEquals("not the real hardware implementation", RealHardwareModule.class, hardware.getClass());
  }

To my surprise, the test immediately went red for the third test method. After double-checking the test code, I was certain that the test was correct. The test had discovered a bug in the production code. And being a mostly independent unit test, it pointed to the problematic lines right away: the method implementation above. The helper method named configuration(), omitted from the code sample, was very unlikely to contain a bug.

After a short moment of reading the code again, I corrected it (note the added return statement in line 3):


public HardwareModule createFor(ModuleConfiguration configuration) {
  if (configuration.isHardwarePresent()) {
    return new RealHardwareModule();
  }
  return new EmulatedHardwareModule();
}

This might not seem like the most disastrous bug ever, but it would have made for a nasty start when I finally got to try the application with the real hardware. There is nothing more valuable than being able to keep your cool “in the wild” and work on the real problems like faulty protocol specifications or unexpected/undocumented hardware behaviour. So, my gut feeling (and the Thorough rule) were right, and my brain, which told me “skip this petty test” longer than I like to admit, was wrong. A small test for a small method paid off immediately and saved the day, at least for me.

Antipatterns: Convenience Constructors

Lately I have been stumbling a lot upon code I wrote 4 or more years ago. In the light of introducing new features, the code gets tested for its quality. One antipattern I found, which I had used in the past but which is really hard to extend, is convenience constructors. Take the constructors of a command object for example:

    public SetProperty(String filename, String key, String value) {
        this(filename, key, value, null);
    }

    public SetProperty(String filename,
            String key, String value, String comment) {
        this(filename, ReferenceTo.key(key), value, comment);
    }

    public SetProperty(String filename,
            String sectionType, String sectionName,
            String key, String value) {
        this(filename, sectionType, sectionName, key, value, null);
    }

    public SetProperty(String filename,
            String sectionType, String sectionName,
            String key, String value, String comment) {
        this(filename, ReferenceTo.sectionAndKey(sectionType, sectionName, key), value, comment);
    }

    public SetProperty(String filename,
            AdvancedPropertyReference propertyReference,
            String value) {
        this(filename, propertyReference, value, null);
    }

    public SetProperty(String filename,
            AdvancedPropertyReference propertyReference,
            String value, String comment) {
        super(filename);
        this.propertyReference = propertyReference;
        this.value = value;
        this.comment = comment;
    }

We need to add a new feature which enables us to append properties, not just set and replace them. One way would be to extend the class, but this is overkill. Just adding a new parameter flag should suffice. But this would blow up the number of constructors, because you would need to include a version with and without the new parameter for each (used) constructor. Here an old friend comes to the rescue: design patterns. A look at the GoF book shows a good solution to the problem: the builder pattern.

public class SetPropertyBuilder {
    private final String filename;
    private String sectionType;
    private String sectionName;
    private String referenceKey;
    private String value;
    private String comment;
    private boolean append;

    public SetPropertyBuilder(String filename) {
        super();
        this.filename = filename;
    }

    public SetPropertyBuilder set(String key, String newValue) {
        this.referenceKey = key;
        this.value = newValue;
        return this;
    }

    public SetPropertyBuilder append(String key, String additionalValue) {
        set(key, additionalValue);
        this.append = true;
        return this;
    }

    public SetPropertyBuilder inSection(String type, String name) {
        this.sectionType = type;
        this.sectionName = name;
        return this;
    }

    public SetProperty build() {
        AdvancedPropertyReference reference = ReferenceTo.key(this.referenceKey);
        if (this.sectionType != null && this.sectionName != null) {
            reference = ReferenceTo.sectionAndKey(this.sectionType, this.sectionName, this.referenceKey);
        }
        return new SetProperty(this.filename, reference, this.value, this.comment, this.append);
    }
}

Now we can eliminate all but one constructor from the SetProperty command. Adding a new option now yields just one new method in the builder.
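For completeness, a typical use of the builder might read like this (file name, section and values are made up):

SetProperty command = new SetPropertyBuilder("settings.ini")
        .inSection("server", "main")
        .append("hosts", "example.org")
        .build();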