Building Visual C++ Projects with CMake

In a previous post my colleague showed how to create RPM packages with CMake. Being a really versatile tool, CMake is also able to create and build Visual Studio projects on Windows. This makes it very valuable when you want to integrate your project into a CI cycle (in our case Jenkins).

Prerequisites:

To be able to compile anything, the following packages need to be installed beforehand:

  • CMake. It is helpful to put it in the PATH environment variable so that absolute paths aren’t needed.
  • Microsoft Windows SDK for Windows 7 and .NET Framework 4 (the web installer or the ISOs). The part “.NET Framework 4” is very important, since with the SDK for the .NET Framework 3.5 installed you will get the following parse error for your *.vcxproj files:

    error MSB4066: The attribute "Label" in element <ItemGroup> is unrecognized

    at the following position:

    <ItemGroup Label="ProjectConfigurations">

    Probably equally important is the bitness of the installed SDK. The x64 ISO differs only in one letter from the x86 one. Look for the X if you want 64 bit.

  • .NET Framework 4, necessary to make msbuild run

It is possible that you encounter the following message during the SDK setup:

A problem occurred while installing selected Windows SDK components. Installation of the “Microsoft Windows SDK for Windows 7” product has reported the following error: Please refer to Samples\Setup\HTML\ConfigDetails.htm document for further information. Please attempt to resolve the problem and then start Windows SDK setup again. If you continue to have problems with this issue, please visit the SDK team support page at http://go.microsoft.com/fwlink/?LinkId=130245. Click the View Log button to review the installation log. To exit, click Finish.

The reason behind this wordy but uninformative error message was the set of Visual C++ Redistributables already installed on the system. As suggested by a Microsoft KB article, removing them all helped.

Makefiles:

For CMake to build anything you need a CMakeLists.txt file in your project. For a tutorial on how to use CMake, have a look at the CMake documentation. Here is a simple CMakeLists.txt to get you started:

cmake_minimum_required(VERSION 2.6)
project(MyProject)

set(source_files
  main.cpp
)

include_directories(
  ${CMAKE_CURRENT_SOURCE_DIR}
)

add_executable(MyProject ${source_files})

Building:

To build a project, a few steps are necessary. You can enter them in your CI directly or put them in a batch file.

call "%ProgramFiles%\Microsoft SDKs\Windows\v7.1\Bin\SetEnv.cmd" /Release /x86

With this call, all necessary environment variables are set. Be careful on 64-bit platforms: the Jenkins slave executes this call in a 32-bit context, so “%ProgramFiles%” resolves to “C:\Program Files (x86)”, which is not where the SDK is installed.

del CMakeCache.txt

This command is not strictly necessary, but it prevents you from working with outdated generated files when you change your configuration.

cmake -G "Visual Studio 10" .

This generates a Visual Studio 2010 solution. Every manual change to the solution and the project files will be gone when you call it again, so make sure you track all necessary files in the CMakeLists.txt.

cmake --build . --target ALL_BUILD --config Release

The final step: it nets you the MyProject.exe binary. The target parameter matches the name of a project in the solution and the config parameter is one of the solution configurations.

Final words:

The hardest and most time-consuming part was the setup of the prerequisites. Generic, uninformative error messages are the worst you can do to a clueless customer. But… when you are done with it, you are only two small steps away from an automatically built executable.

Build an RPM package using CMake

A while ago I presented a way to package projects using different build systems as RPM packages. If you are using CMake for your projects, you can use CPack to build RPM packages (in addition to tarballs, NSIS installers, deb packages and so on). This is a really nice option for deploying your own projects because installation and updates can easily be done by the users with familiar package management tools like zypper, yum and yast2.

Your first CPack RPM

It is really easy to add RPM packaging to your existing project using CPack. Just set the mandatory CPack variables and include CPack below the variable definitions, usually as one of the last steps:

cmake_minimum_required (VERSION 2.8)
project (my_project)

set(VERSION "1.0.1")
<----snip your usual build instructions snip--->
set(CPACK_PACKAGE_VERSION ${VERSION})
set(CPACK_GENERATOR "RPM")
set(CPACK_PACKAGE_NAME "my_project")
set(CPACK_PACKAGE_RELEASE 1)
set(CPACK_PACKAGE_CONTACT "John Explainer")
set(CPACK_PACKAGE_VENDOR "My Company")
set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
set(CPACK_PACKAGE_FILE_NAME "${CPACK_PACKAGE_NAME}-${CPACK_PACKAGE_VERSION}-${CPACK_PACKAGE_RELEASE}.${CMAKE_SYSTEM_PROCESSOR}")
include(CPack)

These few lines should be enough to get you going. After that you can execute a make package command and should obtain the RPM package.

Spicing up the package

RPM packages can contain much more metadata, most notably package dependencies and a version changelog. Most of it can be specified using CPACK variables. We sometimes prefer to use a SPEC file template that is filled and used by CPack, because it keeps most of the RPM-specific stuff in a familiar format instead of polluting the CMakeLists.txt itself:

project (my_project)
<----snip your usual CMake stuff snip--->
<----snip your additional CPack variables snip--->
configure_file("${CMAKE_CURRENT_SOURCE_DIR}/my_project.spec.in" "${CMAKE_CURRENT_BINARY_DIR}/my_project.spec" @ONLY IMMEDIATE)
set(CPACK_RPM_USER_BINARY_SPECFILE "${CMAKE_CURRENT_BINARY_DIR}/my_project.spec")
include(CPack)

The variables in the RPM SPEC file will be replaced by the values provided in the CMakeLists.txt and then be used for the RPM package. The template looks very similar to a standard SPEC file, but you can omit the usual build instructions, boiling it down to something like this:

Buildroot: @CMAKE_CURRENT_BINARY_DIR@/_CPack_Packages/Linux/RPM/@CPACK_PACKAGE_FILE_NAME@
Summary:        My very cool Project
Name:           @CPACK_PACKAGE_NAME@
Version:        @CPACK_PACKAGE_VERSION@
Release:        @CPACK_PACKAGE_RELEASE@
License:        GPL
Group:          Development/Tools/Other
Vendor:         @CPACK_PACKAGE_VENDOR@
Prefix:         @CPACK_PACKAGING_INSTALL_PREFIX@
Requires:       opencv >= 2.4

%define _rpmdir @CMAKE_CURRENT_BINARY_DIR@/_CPack_Packages/Linux/RPM
%define _rpmfilename @CPACK_PACKAGE_FILE_NAME@.rpm
%define _unpackaged_files_terminate_build 0
%define _topdir @CMAKE_CURRENT_BINARY_DIR@/_CPack_Packages/Linux/RPM

%description
Cool project solving the problems of many colleagues.

# This is a shortcutted spec file generated by CMake RPM generator
# we skip _install step because CPack does that for us.
# We do only save CPack installed tree in _prepr
# and then restore it in build.
%prep
mv $RPM_BUILD_ROOT @CMAKE_CURRENT_BINARY_DIR@/_CPack_Packages/Linux/RPM/tmpBBroot

%install
if [ -e $RPM_BUILD_ROOT ];
then
  rm -Rf $RPM_BUILD_ROOT
fi
mv "@CMAKE_CURRENT_BINARY_DIR@/_CPack_Packages/Linux/RPM/tmpBBroot" $RPM_BUILD_ROOT

%files
%defattr(-,root,root,-)
@CPACK_PACKAGING_INSTALL_PREFIX@/@LIB_INSTALL_DIR@/*
@CPACK_PACKAGING_INSTALL_PREFIX@/bin/my_project

%changelog
* Tue Jan 29 2013 John Explainer <john@mycompany.com> 1.0.1-3
- use correct maintainer address
* Tue Jan 29 2013 John Explainer <john@mycompany.com> 1.0.1-2
- fix something about the package
* Thu Jan 24 2013 John Explainer <john@mycompany.com> 1.0.1-1
- important bugfixes
* Fri Nov 16 2012 John Explainer <john@mycompany.com> 1.0.0-1
- first release

Conclusion

Integrating RPM (or other package formats) into your CMake-based build is not as hard as it seems and quite flexible. You do not need to rely on the tools provided by your OS vendor and can still deliver your software in a way your users are accustomed to. This makes CPack very continuous integration (CI) friendly, too!

Java Generics: the Klingonian Cast

Struck by Java generics’ odd type erasure behaviour again? You can circumvent the missing upcast feature by using the Klingonian Cast.

Ever since Generics were included in Java, they’ve been a great help and source of despair at once. One thing that most newcomers will stumble upon sooner or later is “Type Erasure” and its effects. You may read about it in the Java Tutorial and never quite understand it, until you encounter it in the wild (in your code) and it just laughs at your carefully crafted type system construct. This is the time when you venture into the deep end of the Java language specification and aren’t seen for days or weeks. And when you finally reappear, you are a broken man – or a strong warrior, even stronger than before, charged with the wisdom of the ancients.

The problem

If my introduction was too mystic for your taste – bear with me. The rest of this blog post is rather technical and bleak. It won’t go into the nitty-gritty details of Java generics or type erasure, but describe a real-world problem and one possible solution. The problem can be described by a few lines of code:


List<Integer> integers = new ArrayList<Integer>();
Iterable<Integer> iterable = integers;
Iterable<Number> numbers = integers; // Damn!

The last line won’t compile. Let’s examine it step by step:

  • We create a list of Integers
  • The list can be (up-)cast into an Iterable of Integers. Lists are/behave like Iterables.
  • But the list cannot be cast into an Iterable of Numbers, even though Integers are/behave like Numbers.

The compiler error message isn’t particularly helpful here:

Type mismatch: cannot convert from List<Integer> to Iterable<Number>

This is when we remember one thing about Java Generics: they are invariant. While Java offers “use-site variance”, what we would need here is “declaration-site variance”, which Java Generics lack entirely. Don’t despair, this was all the theoretical discussion about the topic for today. If you want to know more, just ask in the comment section. Perhaps we can provide another blog post discussing just the theory.
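
For completeness, Java’s use-site variance is the bounded wildcard: if the receiving code can be declared to accept Iterable<? extends Number>, no trick is needed at all. The workaround below is for the cases where the target type has to be exactly Iterable<Number>; internally it relies on the same wildcard. This small snippet (not from the original post) shows the wildcard version that already compiles:

List<Integer> integers = new ArrayList<Integer>();
Iterable<? extends Number> numbers = integers; // compiles thanks to the use-site wildcard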

The workaround

In short, our problem is that Java is unable to see the relationship between the types Integer and Number when they are given as generic parameters. But we can make it see:


List<Integer> integers = new ArrayList<Integer>();
List<Number> numberList = new ArrayList<Number>();
numberList.addAll(integers);
Iterable<Number> numbers = numberList;

This will compile and work. I’ve split the creation and filling of the second List into two steps to make it clearer what’s happening: by explicitly creating a new collection and (up-)casting every element of the List individually, we can show the compiler that everything’s ok.

The Klingonian Cast

Well, if the compiler wants to see every element of our initial collection to be sure about upcasting, we should show it every element. But why create a new List and copy the elements by hand every time, when we can just use the “Klingonian Cast”? Ok, I’ve made the name up. But what else would you call a structure that’s essentially an upcast, yet needs two generic parameters and a dozen lines of code, if not something very manly and bold? But enough mystery, let’s look at the code:


List<Integer> integers = new ArrayList<Integer>();
Iterable<Number> numbers = MakeIterable.<Number>outOf(integers);

The good thing about the Klingonian cast is that it has a very thin footprint at runtime. Your hotspot compiler might even eliminate it completely. But you probably don’t want to hear about its characteristics, you want to see the implementation:


import java.util.Iterator;

public class MakeIterable {
  public static <T> Iterable<T> outOf(final Iterable<? extends T> iterable) {
    return new Iterable<T>() {
      @Override
      public Iterator<T> iterator() {
        return iteratorOutOf(iterable.iterator());
      }
    };
  }

  protected static <T> Iterator<T> iteratorOutOf(final Iterator<? extends T> iterator) {
    return new Iterator<T>() {
      @Override
      public boolean hasNext() {
        return iterator.hasNext();
      }
      @Override
      public T next() {
        return iterator.next();
      }
      @Override
      public void remove() {
        iterator.remove();
      }
    };
  }
}

That’s it. A “simple” upcast for Java Generics, ready to use for your own convenience. Enjoy!
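
To round things off, here is a small, self-contained usage sketch. The sum() method and the class name are made up to illustrate an API that insists on exactly Iterable<Number>; only MakeIterable itself is taken from above:

import java.util.Arrays;
import java.util.List;

public class KlingonianCastDemo {
  // An API we cannot change: it demands Iterable<Number>, not Iterable<? extends Number>.
  static double sum(Iterable<Number> numbers) {
    double total = 0.0;
    for (Number number : numbers) {
      total += number.doubleValue();
    }
    return total;
  }

  public static void main(String[] args) {
    List<Integer> integers = Arrays.asList(1, 2, 3);
    // sum(integers) would not compile; the Klingonian cast bridges the gap.
    System.out.println(sum(MakeIterable.<Number>outOf(integers)));
  }
}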

FTP integrated

When developing a feature involving unknown technology or hardware, I prefer a spike followed by integration tests. Sometimes it helps a lot.

How it all began
One of our customers employs a NAS for data storage, accessing it via FTP. Some of the features like copying and moving files around were already implemented by us using Apache’s FTPClient. The next feature on the list was “cleanup after x days” – deletion of files, or more importantly: directories. FTP, being a pretty basic protocol, does not allow recursive deletion of directories. The only way to do it is to delete the deepest elements first, go up one level and repeat – or in other words – to implement the recursion yourself. This was too much for our simple feature, so the decision was made to hide the complexity behind a VirtualFile, an interface already existing in our framework.
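
For illustration, the manual recursion boils down to something like this sketch with Apache Commons Net’s FTPClient; the helper name is made up, and in our code this logic lives behind the VirtualFile interface:

import java.io.IOException;

import org.apache.commons.net.ftp.FTPClient;
import org.apache.commons.net.ftp.FTPFile;

public class FtpRecursiveDelete {

  // FTP can only remove empty directories, so we delete depth-first:
  // children first, then the directory itself.
  public static void deleteRecursively(FTPClient ftp, String path) throws IOException {
    for (FTPFile entry : ftp.listFiles(path)) {
      String name = entry.getName();
      if (".".equals(name) || "..".equals(name)) {
        continue; // some servers list the current and parent directory
      }
      String childPath = path + "/" + name;
      if (entry.isDirectory()) {
        deleteRecursively(ftp, childPath);
      } else {
        ftp.deleteFile(childPath);
      }
    }
    ftp.removeDirectory(path);
  }
}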

Being a novice at speaking FTP, I was happy to hear that we had already acquired exactly the same type of NAS the customer uses. To see how the system behaves (or not) and to document it at the same time, I decided to write the integration tests for the interface first.

Fun
As the number of tests and file operations started to grow, so did the round-trip time of my test/make-test-pass/refactor cycle, and my patience dwindled. I switched from the NAS FTP server to a local FileZilla FTP server. It worked like a charm and all necessary features were implemented really fast.

The next step was to run the app using the new feature with a realistic amount of data, the real directory structure and our NAS. It failed miserably. And randomly. The app suffered from closed connections while trying to open a data connection. After some searching the reason was found: the FTPClient we use has active mode enabled by default. That means that to transfer data the server tries to connect to the client, and the client’s firewall did not like that. After setting the connection mode to passive, the problem was solved.
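
The fix itself is a single call on the Commons Net FTPClient (here called ftpClient, a connected instance), made before any data transfer:

// switch from the default active mode to passive mode:
// the client opens the data connection itself, which client-side firewalls tolerate
ftpClient.enterLocalPassiveMode();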

The tests ran fine, but they ran slowly. And they introduced a dependency on an external system: if that system broke or was disabled for any other reason, our CI would report a failure without any change in the code. Both points could be addressed by using an embedded FTP server. We chose Apache’s FtpServer. Changing the tests was easy, since the only thing to do was to set up the server before each test and shut it down afterwards (a sketch of this setup follows below). Surprisingly, some tests failed. Apache’s server handled some cases differently:

  • it allowed opening output streams to directories without any exception
  • it forbade deleting the current working directory
  • the name listing of a directory (NLST) returned by the NAS contained absolute paths to the files, while Apache’s server returned simple names.

After another code change the code worked correctly with all three servers.
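
For reference, the embedded-server setup and teardown mentioned above boils down to something like this rough sketch using Apache FtpServer’s API; port, user name and home directory are made up, and details like write permissions are left out:

import org.apache.ftpserver.FtpServer;
import org.apache.ftpserver.FtpServerFactory;
import org.apache.ftpserver.ftplet.UserManager;
import org.apache.ftpserver.listener.ListenerFactory;
import org.apache.ftpserver.usermanager.PropertiesUserManagerFactory;
import org.apache.ftpserver.usermanager.impl.BaseUser;
import org.junit.After;
import org.junit.Before;

public class FtpVirtualFileIntegrationTest {

  private FtpServer server;

  @Before
  public void startEmbeddedFtpServer() throws Exception {
    FtpServerFactory serverFactory = new FtpServerFactory();

    ListenerFactory listenerFactory = new ListenerFactory();
    listenerFactory.setPort(2221); // an arbitrary free port for the tests
    serverFactory.addListener("default", listenerFactory.createListener());

    // one test user whose home directory acts as the FTP root;
    // for write/delete tests the user additionally needs a WritePermission authority
    UserManager userManager = new PropertiesUserManagerFactory().createUserManager();
    BaseUser user = new BaseUser();
    user.setName("test");
    user.setPassword("test");
    user.setHomeDirectory(System.getProperty("java.io.tmpdir"));
    userManager.save(user);
    serverFactory.setUserManager(userManager);

    server = serverFactory.createServer();
    server.start();
  }

  @After
  public void stopEmbeddedFtpServer() {
    server.stop();
  }

  // the actual interface tests connect to localhost:2221 with user "test"
}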

Lessons learned
While implementing the interface I learned much about how to create and test bridging functionality:

  • A specification cannot replace tests. Searching for the FTP commands to use, I looked at several websites that described the commands. None of them said whether NLST returns absolute paths or only filenames. There are always holes in the spec that will be interpreted differently by vendors, or the vendors do not obey it at all.
  • Unit tests are great, but they are limited to your code only. When it comes to communication between system components, especially communication with foreign systems, an integration test is a must.
  • Working with a test setup that mimics the production environment as closely as possible is great. Without the NAS, the app would have simply failed in the best case. In the worst case it would have deleted the wrong files. Neither of these makes a customer happy.

Aspects done right: Concerns

With aspects you cannot see (without sophisticated IDE support) which class has which aspects and which aspects are woven into the class when looking at its source. Here concerns (also called mixins or traits) come to the rescue.

The idea of encapsulating cross-cutting concerns struck me from the beginning, but the implementation, namely aspects, lacked clarity in my opinion. I know that aspects were invented to hide away the details about which code is included where, but I find this confusing and hard to trace without tool support.

Take a look at an example in Ruby:

module Versionable
  extend ActiveSupport::Concern

  included do
    attr_accessor :version
  end
end

class Document
  include Versionable
end

Now Document has a version attribute and is_a?(Versionable) returns true. For clients it looks as if the version attribute were defined in Document itself. So for clients of this class it is the same as:

class Document
  attr_accessor :version
end

Furthermore you can easily reuse the Versionable concern in other classes. This sounds like a great implementation of the separation of concerns principle, so why isn’t everyone using it (besides it being a standard in the upcoming Rails 4)? Well, some people are concerned about concerns (excuse the pun). As with every powerful feature, you can shoot yourself in the foot. Let’s take a look at each problem.

  • Diamond problem aka multiple inheritance: Ruby has no multiple inheritance. Even when you include more than one module, the modules act like superclasses in the method resolution order. Every include creates a new “superclass” above the including class, so the last include takes precedence.

  • Dependencies between concerns: one concern may need another concern. ActiveSupport::Concern handles these dependencies automatically.

  • Unforeseeable results: one last big problem with concerns is side effects from combining two concerns. Take for example two concerns which each add a method with the same name: including both of them renders one of them unusable. This cannot be solved technically, but I think this problem also points to a more important underlying cause: it could be poor naming, or the two concerns are not separated cleanly enough. As always, tests help to isolate and spot the problem. Concerns should be tested both in isolation and in integration.

A small test saves the day

You think a method is too trivial to write a test for it? Think again if the method is mission-critical!

Just recently, I had to write a connection between an existing application and a new hardware unit. This is a fairly common job for our company, even considering the circumstance that I had never even seen the hardware, let alone been able to connect to it. The hardware unit itself was rather big and installed in a security-sensitive area with restricted access. So I only got a specification of the protocol to use and a description of the hardware’s features.

Our common procedure to include hardware-dependent modules into an application is to write two implementations of the module: one implementation is the real deal and interacts with the hardware over ethernet, USB, serial port or whatever proprietary communication device is used. This version of the module can only work as intended if the hardware is present. The other implementation acts as an emulation of the hardware, without any dependencies. If you are familiar with unit tests, think of a big test mock. The emulation version is used during development to test and run the application without the hardware being required. There are a lot of subtle pitfalls to consider and avoid, but on a bird’s-eye level of abstraction, these interchangeable implementations of a module enable us to develop software with hardware dependencies without the need for the actual hardware.
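
A stripped-down sketch of such a module pair might look like this; the connect() and disconnect() methods are just placeholders for illustration, the real interface of course mirrors the hardware’s features (each type would live in its own file):

public interface HardwareModule {
  void connect();
  void disconnect();
}

public class RealHardwareModule implements HardwareModule {
  @Override
  public void connect() {
    // open the ethernet/USB/serial connection and speak the client side of the protocol
  }

  @Override
  public void disconnect() {
    // close the connection to the hardware
  }
}

public class EmulatedHardwareModule implements HardwareModule {
  @Override
  public void connect() {
    // nothing to connect to; maybe open a debug GUI instead
  }

  @Override
  public void disconnect() {
    // nothing to tear down
  }
}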

The first piece of code of such a module that gets used is a factory/builder class that chooses between the available implementations, based on some configuration entry (or hardware availability, etc.). A typical implementation of the responsible method might look like this:


public HardwareModule createFor(ModuleConfiguration configuration) {
  if (configuration.isHardwarePresent()) {
    new RealHardwareModule();
  }
  return new EmulatedHardwareModule();
}

If the configuration object says that the hardware is present, the real implementation is used, subsequently opening a connection to the hardware and talking the client side of the given protocol. Otherwise, the emulation is created and returned, maybe opening a debug GUI window to display certain internal states and values and providing controls to mess with the application during development.

The method itself looks very innocent and meager. There is not much going on, so what could possibly go wrong?

I’m not the most eager test-driven developer in the world, I have to admit. But I see the value of tests (and unit tests in particular) and adhere to the A-TRIP rules defined by Andy Hunt and (pragmatic) Dave Thomas:

  • Automatic
  • Thorough
  • Repeatable
  • Independent
  • Professional

For a complete definition of the rules, read the linked blog entry or, even better, buy the book. It’s small and cheap, but contains a lot of profound basic knowledge about unit testing.

The “Thorough” rule is more of a rule of thumb than a hard scientific formula for good unit tests: Always write a test if you’ve found a bug or if the code you’re writing is mission-critical. This was when my gut feeling told me that while the method above might seem trivial, it is definitely essential for the hardware module. So I wrote a test:

  @Test
  public void providesEmulationIfUnspecified() {
    HardwareModuleFactory factory = new HardwareModuleFactory();
    HardwareModule hardware = factory.createFor(configuration(""));
    assertEquals("not the hardware emulation", EmulatedHardwareModule.class, hardware.getClass());
  }

  @Test
  public void providesEmulationIfHardwareAbsent() {
    HardwareModuleFactory factory = new HardwareModuleFactory();
    HardwareModule hardware = factory.createFor(configuration("hardware.present=false"));
    assertEquals("not the hardware emulation", EmulatedHardwareModule.class, hardware.getClass());
  }

  @Test
  public void providesRealImplementationIfHardwarePresent() {
    HardwareModuleFactory factory = new HardwareModuleFactory();
    HardwareModule hardware = factory.createFor(configuration("hardware.present=true"));
    assertEquals("not the real hardware implementation", RealHardwareModule.class, hardware.getClass());
  }

To my surprise, the test immediately went red for the third test method. After double-checking the test code, I was certain that the test was correct. The test discovered a bug in the production code. And being a mostly independent unit test, it pointed to the problematic lines right away: the method implementation above. The helper method named configuration(), omitted from the code sample, was very unlikely to contain a bug.

After a short moment of reading the code again, I corrected it (note the added return statement in line 3):


public HardwareModule createFor(ModuleConfiguration configuration) {
  if (configuration.isHardwarePresent()) {
    return new RealHardwareModule();
  }
  return new EmulatedHardwareModule();
}

This might not seem like the most disastrous bug ever, but it would have made for a nasty start when I finally tried the application with the real hardware. There is nothing more valuable than being able to keep your cool “in the wild” and work on the real problems like faulty protocol specifications or unexpected/undocumented hardware behaviour. So, my gut feeling (and the Thorough rule) were right and my brain, telling me “skip this petty test” longer than I like to admit, was wrong. A small test for a small method paid off immediately and saved the day, at least for me.

Antipatterns: Convenience Constructors

Lately I stumble a lot upon code I wrote four or more years ago. In the light of introducing new features, the code gets tested for its quality. One antipattern I found, which I had used in the past but which is really hard to extend, is convenience constructors.

As mentioned above, one antipattern that is really hard to extend is convenience constructors. Take the constructors of a command object as an example:

    public SetProperty(String filename, String key, String value) {
        this(filename, key, value, null);
    }

    public SetProperty(String filename,
            String key, String value, String comment) {
        this(filename, ReferenceTo.key(key), value, comment);
    }

    public SetProperty(String filename,
            String sectionType, String sectionName,
            String key, String value) {
        this(filename, sectionType, sectionName, key, value, null);
    }

    public SetProperty(String filename,
            String sectionType, String sectionName,
            String key, String value, String comment) {
        this(filename, ReferenceTo.sectionAndKey(sectionType, sectionName, key), value, comment);
    }

    public SetProperty(String filename,
            AdvancedPropertyReference propertyReference,
            String value, String comment) {
        super(filename);
        this.propertyReference = propertyReference;
        this.value = value;
        this.comment = comment;
    }

We need to add a new feature which enables us to append to properties, not just set and replace them. One way would be to extend the class, but this is overkill. Just adding a new parameter flag should suffice. However, this would blow up the number of constructors, because you need to include a version with and without the new parameter for each (used) constructor. Here an old friend comes to the rescue: design patterns. A look into the GoF book shows a good solution to the problem: the builder pattern.

public class SetPropertyBuilder {
    private final String filename;
    private String sectionType;
    private String sectionName;
    private String referenceKey;
    private String value;
    private String comment;
    private boolean append;

    public SetPropertyBuilder(String filename) {
        super();
        this.filename = filename;
    }

    public SetPropertyBuilder set(String key, String newValue) {
        this.referenceKey = key;
        this.value = newValue;
        return this;
    }

    public SetPropertyBuilder append(String key, String additionalValue) {
        set(key, additionalValue);
        this.append = true;
        return this;
    }

    public SetPropertyBuilder inSection(String type, String name) {
        this.sectionType = type;
        this.sectionName = name;
        return this;
    }

    public SetProperty build() {
        AdvancedPropertyReference reference = ReferenceTo.key(this.referenceKey);
        if (this.sectionType != null && this.sectionName != null) {
            reference = ReferenceTo.sectionAndKey(this.sectionType, this.sectionName, this.referenceKey);
        }
        return new SetProperty(this.filename, reference, this.value, this.comment, this.append);
    }
}

Now we can eliminate all but one constructor from the SetProperty command. And adding a new property to the command now results in just one new method in the builder.
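
A call site then reads fluently. A usage sketch (file name, section and keys are made up):

SetProperty appendHost = new SetPropertyBuilder("settings.ini")
    .inSection("server", "main")
    .append("hosts", "10.0.0.1")
    .build();

The single remaining SetProperty constructor is the one build() calls, now carrying the additional append flag.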

Checking preconditions in advance vs. on demand vs. exceptions

Usually, it is good practice to check certain preconditions before applying operations to input data. This is often referred to as defensive programming. Many people are used to lines like:

public void performOn(String foo) {
  if (!myMap.containsKey(foo)) {
    // handle it correctly
    return;
  }
  // do something with the entry
  myMap.get(foo).performOperation();
}

While there is nothing wrong with this kind of “in advance checking”, it may have performance implications – especially when IO is involved.

We had a problem some time ago when working with several thousand wrappers for File objects. The wrappers checked in the constructor whether the given File object actually is a file, using the innocent-looking isFile() method, which caused a hard disk access each time. So building our collection of wrapped files took quite some time (dozens of seconds) and our client complained (rightfully so!) about the performance. Once the collection was built, the operations were fast because no checking was needed anymore.

Our first optimization step was to defer the check to the point where the file was actually used. This sped up the creation of the wrappers so much that it was barely noticeable, but processing a bunch of elements took longer because of the additional disk accesses. Even though this approach may work in a plethora of situations, for our typical use cases the effect of this optimization was not enough.

So we looked at our problem from another perspective: the vast majority of the file handles were actually existing and readable files and directories, and foreign/unknown files were the exception. Because of this we chose to simply leave out any kind of check and handle the exceptions instead! Exception handling is often referred to as slow, but if exceptions are rare it can make a difference of several orders of magnitude. Our speed-up using this approach was enormous and the client was happy about sub-second responsiveness for his typical operations. In addition, we think that the code now expresses more clearly that irregular files really are the exception and not the rule for this particular code.
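
A sketch of the exception-based variant with illustrative names (the real wrapper from the project is not shown in the post):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class LazyFileWrapper {
  private final File file;

  // No isFile() check here: constructing thousands of wrappers stays cheap
  // because the disk is not touched at all.
  public LazyFileWrapper(File file) {
    this.file = file;
  }

  // The irregular file is treated as the exception it is:
  // only the rare non-readable entry pays the exception cost.
  public String firstLine(String fallback) {
    try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
      String line = reader.readLine();
      return line != null ? line : fallback;
    } catch (FileNotFoundException e) {
      return fallback; // not a regular, readable file
    } catch (IOException e) {
      return fallback; // reading failed for some other reason
    }
  }
}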

Conclusion

There are different approaches to handling parameters and input data. Depending on the cost of the check and the frequency of special input, different strategies may prove beneficial, both in expressing your intent and in the perceived performance of your application.

Solutions to common Java enum problems

More readable solutions to using enums with attributes for categorization or representation.

Say, you have an enum representing a state:

enum State {
  A, B, C, D;
}

And you want to know if a state is a final state. In our example C and D should be final.
An initial attempt might be to use a simple method:

public boolean isFinal() {
	return State.C == this || State.D == this;
}

When there are only two states this might seem reasonable, but adding more states to this condition makes it unreadable pretty fast.
So why not let the enum constants carry the information themselves?

A(false), B(false), C(true), D(true);

private boolean isFinal;

private State(boolean isFinal) {
  this.isFinal = isFinal;
}

public boolean isFinal() {
  return isFinal;
}

This was and still is a good approach in some cases, but it gets cumbersome if you have more than one attribute in your constructor.
Another attempt I’ve seen:

public boolean isFinal() {
  for (State finalState : State.getFinalStates()) {
    if (this == finalState) {
      return true;
    }
  }
  return false;
}

public static List<State> getFinalStates() {
  List<State> finalStates = new ArrayList<State>();
  finalStates.add(State.C);
  finalStates.add(State.D);
  return finalStates;
}

This code gets one thing right: the separation of the final attribute from the states. But it can be written in a clearer way:

private static final List<State> FINAL_STATES = Arrays.asList(C, D);

public boolean isFinal() {
  return FINAL_STATES.contains(this);
}

Another common problem with enums is constructing them via an external representation, e.g. a text.
The classic dispatch looks like this:

    public static State createFrom(String text) {
        if ("A".equals(text) || "FIRST".equals(text)) {
            return State.A;
        } else if ("B".equals(text)) {
            return State.B;
        } else if ("C".equals(text)) {
            return State.C;
        } else if ("D".equals(text) || "LAST".equals(text)) {
            return State.D;
        } else {
            throw new IllegalArgumentException("Invalid state: " + text);
        }
    }

Readers of Refactoring sense a code smell here and promptly want to refactor towards a dispatch using the enum itself.

A("A", "FIRST"),
B("B"),
C("C"),
D("D", "LAST");

private List<String> representations;

private State(String... representations) {
  this.representations = Arrays.asList(representations);
}

public static State createFrom(String text) {
  for (State state : values()) {
    if (state.representations.contains(text)) {
      return state;
    }
  }
  throw new IllegalArgumentException("Invalid state: " + text);
}

Much better.
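
For reference, both improvements combined into one self-contained enum (nothing new here, just the snippets from above merged):

import java.util.Arrays;
import java.util.List;

public enum State {
  A("A", "FIRST"),
  B("B"),
  C("C"),
  D("D", "LAST");

  private static final List<State> FINAL_STATES = Arrays.asList(C, D);

  private final List<String> representations;

  private State(String... representations) {
    this.representations = Arrays.asList(representations);
  }

  public boolean isFinal() {
    return FINAL_STATES.contains(this);
  }

  public static State createFrom(String text) {
    for (State state : values()) {
      if (state.representations.contains(text)) {
        return state;
      }
    }
    throw new IllegalArgumentException("Invalid state: " + text);
  }
}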

Class names with verbs enforce the Single Responsibility Principle (SRP)

When using fluent code and fluent interfaces, I noticed an increased flexibility in the code. On closer inspection, this is the effect of a well-known principle that is inherently enforced by the coding style.

I’ve been experimenting with fluent code for a while now. Fluent code is code that everybody can read out loud and understand immediately. I’ve blogged on this topic already and it’s not big news, but I’ve just recently had a revelation about why this particular style of programming works so well in terms of code design.

The basics

I don’t expect you to read all my old blog entries on fluent code or to know anything about fluent interfaces, so I’m giving you a little introduction.

Let’s assume that you want to find all invoice documents inside a given directory tree. A fluent line of code reads like this:


Iterable<Invoice> invoices = FindLetters.ofType(
    AllInvoices.ofYear("2012")).beneath(
        Directory.at("/data/documents"));

While this is very readable, it’s also a bit unusual for a programmer without prior exposure to this style. But if you are used to it, the style works wonders. Let’s see: the implementation of the FindLetters class looks like this (don’t mind all the generic stuff going on, concentrate on the methods!):

public final class FindLetters<L extends Letter> {
  private final LetterType<L> parser;

  private FindLetters(LetterType<L> type) {
    this.parser = type;
  }

  public static <L extends Letter> FindLetters<L> ofType(LetterType<L> type) {
    return new FindLetters<L>(type);
  }

  public Iterable<L> beneath(Directory directory) {
    ...
  }
}

Note: If you are familiar with fluent interfaces, then you will immediately notice that this isn’t even a full-fledged one. It’s more of a (class-level) factory method and a single instance method.

If you can get used to typing what you want to do as the class name first (and forget about constructors for a while), the code completion functionality of your IDE will guide you through the rest: the only public static method available in the FindLetters class is ofType(), which happens to return an instance of FindLetters, where again the only method available is the beneath() method. One thing leads to another and you’ll end up with exactly the Iterable of Invoices you wanted to find.

To assemble all parts in the example, you’ll need to know that Invoice is a subtype of Letter and AllInvoices is a subtype of LetterType<Invoice>.
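
To make those relationships explicit, minimal declarations could look like this; the member sets are assumptions for illustration, the real classes certainly do more (each type in its own file):

public interface Letter {
}

public class Invoice implements Letter {
}

// A LetterType knows how to recognize and parse letters of its kind.
public interface LetterType<L extends Letter> {
}

public class AllInvoices implements LetterType<Invoice> {
  private final String year;

  private AllInvoices(String year) {
    this.year = year;
  }

  public static AllInvoices ofYear(String year) {
    return new AllInvoices(year);
  }
}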

The magical part

One thing that always surprised me when programming in this style is how everything seems to find its place in a natural manner. The different parts fit together really well, especially when the fluent line of code is written first. Of course they do, because you design your classes to make everything fit. And that’s when I had the revelation. In hindsight, it seems rather obvious to me (a common occurrence with revelations) and you’ve probably already seen it yourself.

The revelation

It struck me that all the pieces that you assemble a fluent line of code with are small and single-purposed (other descriptions would be “focused”, “opinionated” or “determined”). Well, if you obey the Single Responsibility Principle (SRP), every class should only have one responsibility and therefore only limited purposes. But now I know how these two things are related: you can only cram so much purpose (and responsibility) into a class named FindLetters. When the class name contains the action (verb) and the subject (noun), the purpose is very much set. The only thing that can be adjusted is the context of the action on the subject, a task at which fluent interfaces excel. The main reason to use a fluent interface is to change distinct aspects of the context of an object without losing track of the object itself.

The conclusion

If the action+subject class names enforce the Single Responsibility Principle, then it’s no wonder that the resulting code is very flexible in terms of changing requirements. The flexibility isn’t a result of the fluency or the style itself (as I initially thought), but an effect predicted and caused by the SRP. Realizing that doesn’t invalidate the other positive effects of fluent code for me, but makes it a bit less magical. Which isn’t a bad thing.