Structuring CppUnit Tests

How to structure CppUnit tests in non-trivial software systems so that they can easily be executed selectively during the code-compile-test cycle and, at the same time, are easy to execute as a whole by your continuous integration system.

While unit testing in Java is dominated by JUnit, C++ developers can choose between a variety of frameworks. See here for a comprehensive list and here for a nice comparison of the biggest players in the game.

Being probably one of the oldest frameworks around, CppUnit sure has some usability issues, but it is still widely used. It is criticised mostly because you have to do a lot of boilerplate typing to add new tests. In the following I will not repeat how tests can be written in CppUnit, as this is already described exhaustively (e.g. here or here). Instead I will concentrate on how to structure CppUnit tests in bigger projects. “Bigger” in this case means at least a few architecturally independent parts which are compiled independently, i.e. into different libraries.

Having independently compiled parts in your project means that you want to compile their unit tests independently, too. The goal is then to structure the tests so that they can easily be executed selectively during development time by the programmer and at the same time are easy to execute as a whole during CI time (CI meaning Continuous Integration, of course).

As C++ has no reflection or other meta-programming facilities like Java’s annotations, automatic test discovery and the registration of new tests become a topic of their own. See the CppUnit cookbook for how to do that with CppUnit. In my projects I only use the TestFactoryRegistry approach because it provides the most automation in this regard.
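
In practice, registration with the TestFactoryRegistry boils down to one macro per test class. A minimal sketch (class and test names are made up for illustration):

#include <cppunit/extensions/HelperMacros.h>

class MyClassTest : public CppUnit::TestFixture
{
  CPPUNIT_TEST_SUITE(MyClassTest);
  CPPUNIT_TEST(testSomething);
  CPPUNIT_TEST_SUITE_END();
public:
  void testSomething()
  {
    CPPUNIT_ASSERT_EQUAL(4, 2 + 2);
  }
};

// registers the suite with the global TestFactoryRegistry
// at static initialization time
CPPUNIT_TEST_SUITE_REGISTRATION(MyClassTest);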

Let’s begin with the simplest setup, the Link-Time Trap (see example source code): test runner and result reporter are set up in the “main” function that is compiled into an executable. The actual unit tests are compiled into separate libraries which are all linked to the executable that contains the main function. While this solution works well for small projects, it does not scale. This is simply because every time you change something during the code-compile-test cycle, the unit test executable has to be relinked, which takes a considerable and ever-growing amount of time as the project gets bigger. You fall into the Link-Time Trap!

The solution I use in many projects is as follows: like in the simple approach, there is one test main function which is compiled into a test executable. All unit tests are compiled into libraries according to their place in the system architecture. To avoid the Link-Time Trap, they are not linked to the test executable but instead are automatically discovered and loaded during test execution.

1. Automatic Discovery

Applying a little convention-over-configuration, all testing libraries end with the suffix “_tests.so”. The testing main function can then simply walk over the directory tree of the project and find all shared libraries that contain unit test classes.

2. Loading

If a “…_tests.so” library has been found, it simply gets loaded using dlopen (under Unix/Linux). When the library is loaded, its unit tests automatically register themselves with the TestFactoryRegistry.

3. Execution

After all unit test libraries have been found and loaded, test execution is the same as in the simple approach above.

Here is my enhanced testmain.cpp (see example source code):

#include ... 

using namespace boost::filesystem; 
using namespace std; 

void loadPlugins(const std::string& rootPath) 
{
  directory_iterator end_itr; 
  for (directory_iterator itr(rootPath); itr != end_itr; ++itr) { 
    if (is_directory(*itr)) {
      // descend into subdirectories, skipping hidden ones
      string leaf = (*itr).leaf(); 
      if (leaf[0] != '.') { 
        loadPlugins((*itr).string()); 
      } 
      continue; 
    } 
    const string fileName = (*itr).string();
    // only shared libraries following the naming convention are loaded
    if (fileName.find("_tests.so") == string::npos) { 
      continue;
    }
    void * handle = 
      dlopen (fileName.c_str(), RTLD_NOW | RTLD_GLOBAL); 
    cout << "Opening : " << fileName.c_str() << endl; 
    if (!handle) { 
      cout << "Error: " << dlerror() << endl; 
      exit (1); 
    } 
  } 
} 

int main ( int argc, char ** argv ) { 
  string rootPath = "./"; 
  if (argc > 1) { 
    rootPath = argv[1]; 
  } 
  cout << "Loading all test libs under " << rootPath << endl; 
  string runArg = std::string ( "All Tests" ); 
  // get registry 
  CppUnit::TestFactoryRegistry& registry = 
    CppUnit::TestFactoryRegistry::getRegistry();
  
  loadPlugins(rootPath); 
  // Create the event manager and test controller 
  CppUnit::TestResult controller; 

  // Add a listener that collects test result 
  CppUnit::TestResultCollector result; 
  controller.addListener ( &result ); 
  CppUnit::TextUi::TestRunner runner; 

  std::ofstream xmlout ( "testresultout.xml" ); 
  CppUnit::XmlOutputter xmlOutputter ( &result, xmlout ); 
  CppUnit::TextOutputter consoleOutputter ( &result, std::cout ); 

  runner.addTest ( registry.makeTest() ); 
  runner.run ( controller, runArg.c_str() ); 

  xmlOutputter.write(); 
  consoleOutputter.write(); 

  return result.wasSuccessful() ? 0 : 1; 
}

As you can see, the loadPlugins function uses the Boost.Filesystem library to walk over the directory tree.

It also takes a rootPath argument which you can pass as a parameter when you call the test main executable. This achieves the goal stated above: when you want to execute unit tests selectively during development, you simply give the path of the corresponding testing library as parameter. Like so:

./testmain path/to/specific/testing/library

In your CI environment on the other hand you can execute all tests at once by giving the root path of the project, or the path where all testing libraries have been installed to.

./testmain project/root

CMake Builder Plugin for Hudson

Update: Check out my post introducing the newest version of the plugin.

Today I’m pleased to announce the first version of the cmakebuilder plugin for Hudson. It can be used to build CMake-based projects without having to write a shell script (see my previous blog post). Using the scratch-my-own-itch approach, I started out implementing only those features that I needed for my own CMake projects, which are mostly Linux/g++ based so far.

Let’s do a quick walk through the configuration:

1. CMake Path:
If the cmake executable is not in your $PATH, you can set its location on the global Hudson configuration page.

2. Build Configuration:

To use the cmake builder in your Free-style project, just add “CMake Build” to your build steps. The configuration is pretty straightforward: you just have to set some basic directories and the build type.

cmakebuilder demo config

The demo config above results in the following behavior (shell pseudocode):

if $WORKSPACE/build_dir does not exist
   mkdir $WORKSPACE/build_dir
end if

cd $WORKSPACE/build_dir
cmake $WORKSPACE/src -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=$WORKSPACE/install_dir
make
make install

That’s it. Feedback is very much appreciated!!

Originally the plan was to have the plugin downloadable from the Hudson plugins site by now, but I still have some publishing problems to overcome. So if you are interested, make sure to check the plugins site again in a few days. I will also post an update here as soon as the plugin can be downloaded.

Update: After fixing some maven settings I was finally able to publish the plugin. Check it out!

Paging with different DBs

Sometimes you cannot or do not want to use an object-relational mapping tool. When not using an OR mapper like Hibernate or Oracle TopLink, you have to deal with database specifics. One common case, especially for web applications, is limiting the result set to a number of items that fit nicely on a web page. You then often want to allow the users to navigate between these “pages” of items, aka “paging”.

This type of functionality became part of SQL only with SQL:2008, in the following form:
SELECT * FROM t WHERE ... ORDER BY c OFFSET start ROWS FETCH NEXT count ROWS ONLY

Since most popular database management systems (DBMSes) do not yet implement this syntax, you have to implement paging in proprietary ways.

My experience with an Oracle DBMS, and the frustratingly long time it took to find the correct™ solution, inspired me to write this post. I want to present the syntax for some widely used DBMSes that we encounter frequently in our projects.

  • MySQL, H2 and PostgreSQL (before version 8.4, which will also implement the SQL:2008 standard) use the same syntax:
    SELECT * FROM t WHERE ... ORDER BY c LIMIT count OFFSET start
  • Oracle is where the fun begins. There is actually no easy and correct way of doing this, so you will end up with a nested mess like the following (see the worked example after this list):
    SELECT c FROM (SELECT c, ROWNUM r FROM (SELECT c FROM t WHERE ... ORDER BY c)) WHERE r BETWEEN start AND end
  • DB2, AFAIK, uses the syntax proposed in SQL:2008, but correct me if I am wrong because we do not yet work with DB2 databases.
  • As we have not needed paging with MS SQL Server so far, I did not look for a solution there yet. Hints are very welcome.
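
To make the start/count arithmetic concrete, here is a sketch of fetching the third page at 25 rows per page (i.e. rows 51 to 75), with t and c as placeholders like above:

-- MySQL, H2, PostgreSQL before 8.4
SELECT * FROM t ORDER BY c LIMIT 25 OFFSET 50;

-- Oracle, via the nested ROWNUM query
SELECT c FROM (
  SELECT c, ROWNUM r FROM (
    SELECT c FROM t ORDER BY c
  )
) WHERE r BETWEEN 51 AND 75;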

With all solutions the ORDER BY clause is critical, because SQL does not guarantee the order of the returned rows otherwise; without it, consecutive page queries may see the rows in different orders.

Wikipedia delivers some additional and special-case information but does not really explain the general, real-world case for the specific DBMSes.

I hope that I raised some awareness about database specifics and perhaps saved you some time trying to find a solution to the problem using your favorite DBMS.

Visualizations with Flare/Prefuse

Recently we were in need of visualizing lots of data in networks. We started with a Javascript-based version using RaphaelJS. RaphaelJS uses SVG/VML as the underlying graphics API. Drawing lots of nodes and edges in different shapes and colors worked like a breeze. But we quickly ran into performance problems when animating the transitions between different states of the network. So we had to look for an alternative technology that is performant enough and pragmatic to use. Having read a lot about Flex and its promising animation capabilities, we gave it a try.
Shortly after, we stumbled upon Flare/Prefuse, and the stunning demos convinced us. It is very easy to use and gives remarkable results in a short time. We integrated it into our web app, but suddenly the visualizations weren’t drawn upon startup. Everything went fine when the visualization was ready before the Flex app had fully loaded, but when the data was visualized after that point, the edges and nodes weren’t shown. Debugging our app and reading the Flex Sprite API wasn’t much help. But one comment in a superclass of all the Flare sprites solved our problem:

Internally, the DirtySprite class maintains a static list of all "dirty" sprites, and redraws each sprite in the list when a Event.RENDER event is issued. Typically, this process is performed automatically. In a few cases, erratic behavior has been observed due to a Flash Player bug that results in RENDER events not being properly issued. As a fallback, the static renderDirty() method can be invoked to manually force each dirty sprite to be redrawn.
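
For us that meant one additional call after feeding new data to the visualization. A minimal sketch (the package path follows the Flare sources):

import flare.display.DirtySprite;

// force all dirty sprites to redraw, working around the missing RENDER event
DirtySprite.renderDirty();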

These few cases were the default behaviour of our app, so invoking renderDirty solved all our drawing issues. I found no blog entry or hint in the docs about this, and the demos run perfectly since all data is there before the app is shown. So the lessons here are:

  • retest your assumptions (we thought the Flare sprites were all DataSprites whose next superclass is the Flex API Sprite, and forgot about DirtySprite)
  • besides the demos and tutorials/docs, read the API docs carefully; often there are implementation- or technology-specific hints

Software Craftsman Project Priority Survey

Answers to a question of project priorities from the upcoming book “Apprenticeship Patterns”.

There is an upcoming and very promising book written by Dave Hoover and Adewale Oshineye called “Apprenticeship Patterns: Guidance for the Aspiring Software Craftsman”. It will cover all the basic rules you’ll need to become a Software Craftsman. This is a rather new term to describe professional software developers, eventually leading to the Software Craftsmanship Manifesto. The Manifesto itself reads like an addition to the Agile Manifesto:

As aspiring Software Craftsmen we are raising the bar of professional software development by practicing it and helping others learn the craft. Through this work we have come to value:

  • Not only working software, but also well-crafted software
  • Not only responding to change, but also steadily adding value
  • Not only individuals and interactions, but also a community of professionals
  • Not only customer collaboration, but also productive partnerships

That is, in pursuit of the items on the left we have found the items on the right to be indispensable.

© 2009, the undersigned. this statement may be freely copied in any form, but only in its entirety through this notice.

A very good question

When I read the blog of “Apprenticeship Patterns”, I noticed a very good question about project priorities:

Rank the following 3 project attributes in order of importance and explain why.

  • Test Coverage
  • Timely Delivery
  • Code Quality

This question really got me hooked, because there is no single valid answer, only personal statements about values.

An informal survey

I’m in the lucky position of meeting a lot of senior developers and a great number of software engineering students. So I instantly decided to perform a survey on this question and watch out for emerging answer patterns.

I gave each project attribute a unique letter: C for “Test Coverage”, D for “Timely Delivery” and Q for “Code Quality”. There are six possible answers; here are their rates in the survey (58 persons gave their answers):

  • CDQ: 7 percent
  • CQD: 9 percent
  • DCQ: 5 percent
  • DQC: 7 percent
  • QCD: 41 percent
  • QDC: 31 percent

The vast majority of developers stated Code Quality as their highest goal. This isn’t very surprising to me, as most developers take pride in writing high quality code.

Comparing the answers

But what about the answers of only the senior developers? Let’s have a look at the numbers without the student answers:

  • CDQ: 7 percent
  • CQD: 14 percent
  • DCQ: 7 percent
  • DQC: 14 percent
  • QCD: 21 percent
  • QDC: 36 percent

The big pattern still applies: Code Quality first. It’s amazing to see the other attributes gaining importance, though. To me, that’s a sign that code-centric thinking is one pattern of apprenticeship.

What’s not in the numbers

When I held the survey, the relevant group of people was gathered together, so a discussion of the results arose every time. But the discussions followed different patterns:

  • The teams (of senior developers) gave very distinct answers while working on the same project. The answers were driven by personal conviction rather than project necessities.
  • The courses (of students) gave more similar answers while having a wide variety of backgrounds. The answers were mostly explained with current project necessities (like security-critical systems as reason for Test Coverage being most important).

When I have to compare the two groups, I tend to say that younger developers are more driven by extrinsic demands while more experienced developers act on their own internal values.

Our duty as Software Craftsmen

In conclusion, I see a duty for experienced developers: to share their experience. Leading a discussion about “Team Values” at your current project is the least you can do. Helping others to develop their own set of internal values, even if it isn’t yours, seems crucial to me.

The upcoming “Apprenticeship Patterns” book and the brand new “97 Things Every Software Architect Should Know” are perfect starting points for this.

Make friends with your compiler

Suppose you are a C++ programmer on a project and you have the best intentions to write really good code. The one fellow you had better make friends with is your compiler. He will support you in your efforts whenever he can, unless you don’t let him. One sure way to reject his help is to switch off all compiler warnings. It should be well known by now that compiling at high warning levels is something to do always and everywhere, but it seems that many people just don’t do it.
Taking g++ as an example, high warning levels do not mean just having “-Wall” switched on. Even if its name suggests otherwise, “-Wall” is just the minimum there. If you spend five minutes or so looking at the man page of g++, you will find many more helpful and reasonable -W… switches. For example (g++ 4.3.2):


-Wctor-dtor-privacy: Warn when a class seems unusable because all the constructors or destructors in that class are private, and it has neither friends nor public static member functions.

Cool stuff! Let’s see what else is there:


-Woverloaded-virtual: Warn when a function declaration hides virtual functions from a base class. Example:

class Base
{
public:
  virtual void myFunction();
};

class Subclass : public Base
{
public:
  // differs in const-ness, so it does not override Base::myFunction,
  // it hides it; exactly what -Woverloaded-virtual warns about
  void myFunction() const;
};

I would certainly like to be warned about that, but maybe that’s just me.


-Weffc++: Warn about violations of style guidelines from Scott Meyers’ Effective C++ book.

This is certainly one of the most “effective” weapons in your fight for bug-free software. It causes the compiler to spit out warnings about code issues that can lead to subtle and hard-to-find bugs, but also about violations of what is considered good programming practice.
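
Put together, a reasonable compile line could then look like this (a sketch; which switches are reasonable depends on your project, and all of them are documented in the g++ man page):

g++ -Wall -Wextra -Wctor-dtor-privacy -Woverloaded-virtual -Weffc++ -c myclass.cpp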

So suppose you read the g++ man page, enabled all warning switches in addition to “-Wall” that seemed reasonable to you, and plan to compile your project cleanly, without warnings. Unfortunately, chances are quite high that your effort will be instantly thwarted by the third-party libraries that your code includes. Because even if your code is clean and shiny, the header files of a third-party library may not be. Especially with “-Weffc++” this can result in a massive amount of warning messages in code that you have no control over. Even with the otherwise powerful, easy-to-use and supposedly mature Qt library you run into that problem. Compiling code that includes Qt headers like <QComboBox> with “-Weffc++” switched on is just unbearable.

Leaving aside the fact that my confidence in Qt has declined considerably since I noticed this, the question remains what to do about the shortcomings of other people’s code. With GCC you can, for example, add pragmas around the offending includes as described here. Or you can create wrapper headers for third-party includes that contain


#pragma GCC system_header
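
Such a wrapper header could look like this (a sketch; the file name is made up, QComboBox is just the example from above):

// QComboBox_wrapper.h: include this instead of <QComboBox>;
// the pragma makes GCC treat the rest of the file as system code
// and suppresses all warnings coming from it
#pragma GCC system_header
#include <QComboBox>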

AFAIK Microsoft’s compilers have similar pragmas:


#pragma warning(push, 1)
// the offending third-party includes go here, for example:
#include <some_thirdparty_header.h>
#include <another_thirdparty_header.h>
#pragma warning(pop)

Warning switches are a powerful tool to increase your code quality. You just have to use them!

Drawing your background with Javascript

Having complex backgrounds (like radial gradients) or even dynamic backgrounds in a web page is not an easy task using only images. Javascript helps here, and the SVG/VML support in modern browsers makes drawing a reality. Looking around in the Javascript library world, there are several libraries with APIs for cross-browser drawing or charting. But if you are looking for a small and easy-to-learn one, you have to search. We found RaphaelJS, a project which started as a 20%er at Atlassian. It has an API which is close to SVG but also supports non-SVG browsers like IE. How to make animations and complex drawings is easy to learn, and the examples give a hint of what is possible.
We needed a radial gradient for a smooth background; looking at the docs and the SVG spec, it was a four-liner:

// create the drawing area inside the element with id "canvas"
var paper = Raphael("canvas", width, height);
// a circle big enough to cover the whole area...
var b = paper.circle(width / 2, height / 2, Math.max(width, height));
// ...filled with a radial gradient from white to light grey
var gradient = {type: "radial", dots: [{color: "#FFFFFF"}, {color: "#D5D5D5"}], vector: [0, 0, 0, "100%"]};
b.attr({"gradient": gradient});

About String Concatenation in Java or “don’t fear the +”

When it comes to string concatenation in Java, many people have almost religious views about performance and style. Sadly, there are some misconceptions and misinformation, especially about the performance bits. Many people think that concatenating many strings using + means expensive string copying each time and is thus slow as hell, which is mostly wrong.

Justin Lee has a nice writeup of the most prominent concatenation options. But IMHO he misses out on some things, and his benchmark is a bit oversimplified, although it does tell a true story. I assume that he followed at least the basic rules for performance measurement, as his results suggest.

Now I want to try to clarify some points I think he missed and I find important:

  • Concatenation using + in one statement is actually compiled to the use of StringBuilder (at least for Sun Java6 compilers, where I checked it in the debugger, try it yourself!). So it’s no surprise that there is no difference between these two options in Justin’s benchmark.
  • It should be clear that the format variants have some overhead because they actually do more than just concatenate strings. There is at least some string parsing and copying involved so that these methods should be used for the cases where for example parameter reordering (think I18N) is needed or readability suffers using normal concatenation.
  • You have to pay attention when using + concatenation over the course of multiple statements, because then it does involve string copying. Consider concatenating inside a loop, as in the sketch after this list. Here it really does make a difference which option you choose: the StringBuilder will perform far better for higher loop counts. We had a real-world issue with this some time back, when we used the Simple web framework for serving directory listings of several thousand files. The HTML code was generated using a concatenatePlus()-style method and took something like 40(!) seconds. After changing the code to the StringBuilder variant, the page was served in sub-second time.
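
A minimal sketch of the critical pattern (loop count and markup are made up for illustration):

public class ConcatDemo {
    public static void main(String[] args) {
        // + across many statements: every iteration copies the whole
        // intermediate string into a fresh StringBuilder, giving O(n^2) runtime
        String html = "";
        for (int i = 0; i < 10000; i++) {
            html += "<li>entry " + i + "</li>";
        }

        // one StringBuilder reused across the loop: amortized O(1) per append
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            builder.append("<li>entry ").append(i).append("</li>");
        }
        String html2 = builder.toString();

        System.out.println(html.equals(html2)); // prints true
    }
}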

Whether you use + or StringBuilder is mostly a matter of taste and readability in many cases. When your string concatenation gets more complex you should really consider using StringBuilder as it is the safe bet.

On teaching software engineering

Be sure to include current topics in your lectures when teaching software engineering. Here are some hints.

In my rare spare time, I hold lectures on software engineering at the University of Cooperative Education in Karlsruhe. The topics range from evergreens like UML to modern subjects like aspect-oriented programming (AOP) or Test-Driven Development (TDD).

One thing I observe is that students don’t have difficulty separating the old topics from the current ones, even if they hear both of them for the first time. It seems that subject matter ages by itself, just like source code does. So, I’m constantly searching for new topics to include in the lectures, replacing the oldest ones.

Three things to include in your lectures

Some months ago, I read a very good blog post written by Alan Skorkin, titled “3 Things They Should Have Taught In My Computer Science Degree”. Alan covers three points:

  1. Open Source Development
  2. An Agile Process (e.g. XP, Scrum)
  3. Corporate Politics/Building Relationships

The idea that I might be missing opportunities to teach some fundamentals to my students struck me. I compared my presentations to the list and found the first two topics covered to a great extent. The last one, corporate politics, is a bit off-topic for a technical lecture. But nevertheless, it’s too important to omit completely, so I had already included some Tom DeMarco lessons in my presentations. Perhaps I can build this part up a bit in the future.

What they should have taught me

Soon afterwards, I thought about things my lecturers missed during my own studies. Here’s the list, with only two points in addition to Alan’s:

  • Age and “maturity” of a topic: When I was a student, I quickly identified old topics, just like my students do nowadays. What I couldn’t tell was whether a topic was mature (a classic) or just deprecated. It would have helped to announce that a topic was necessary but of little actual relevance in modern software development craftsmanship. Or that a topic was academic news, not yet heard of in the industry and lacking widespread acceptance. Both extremes were blended together in the presentations, creating a unique mixture of antiquated and futuristic approaches. This is a common problem of Advanced Beginners in the Dreyfus model.
  • Ergonomics and effectiveness: I still can’t believe I didn’t hear a word about proper workplace setup from my teachers. I had courses teaching me how to learn, but not a single presentation that taught me how to work. This topic ranges from the right chair through lighting to screen size and quantity, and could be skimmed over in less than an hour. But it doesn’t stop with the hardware. Entire books like Neal Ford’s “The Productive Programmer” cover the software side of an effective workplace setup. And even further, the minimal set of tools a software developer should use (e.g. IDE, SCM, CI, issue tracker, wiki) wasn’t even mentioned.

I hope to provide all these topics and information to my students in a recognizable (and rememberable) manner. They deserve to learn about the latest achievements in software engineering. Otherwise they aren’t prepared to work in an industry that changes fundamentally every five to ten years. Of course, hearing about the classic stuff is necessary, too.

Give me feedback. What are your missed topics during apprenticeship, study or even work?

Update: In case you can’t visit my lectures but want to know a bit about ergonomics, I’ve written two blog articles on this topic:

Lightweight dependency management

Managing project dependencies without Maven or Ant Ivy, using a custom Ant task to ensure classpath orthogonality.

Java’s classpath is a powerful concept, when used appropriately. As your project grows larger in terms of code and people, it gets harder to ensure that your classpath is correct. A great danger arises from JAR files containing different versions of the same resource. You might end up running different code than you think, leading to strange effects. If you build your classpath using wildcards, you can’t even control the order in which your JAR files are loaded.

Managing dependencies

To avoid the issues mentioned above, you need to manage your project dependencies. It’s a common practice to implement the build process of the project using Maven or Ant Ivy. Both tools provide dependency management by declaration, but at a high cost. Especially Maven has received some malice lately, with critics citing its steep learning curve and complexity.

Scratching the biggest itch

We decided to try a different approach to dependency management, tackling only our biggest concern: the duplication of classpath resources. We take care of the scope of a third-party library ourselves, put the required JARs in the repository (to us, third-party binary artifacts are part of the project source) and update manually. The one thing we cannot assure manually is that every resource is unique. Sometimes the same class is included in different JARs, as seems to be common practice among Java web frameworks.

Ant to the rescue

Thus, I wrote a custom Ant task that, given the classpath, checks for duplicate entries. If it finds one, it lists the culprits and optionally aborts the build process. Included in our continuous integration system, it gets run every time somebody makes a change. Now you can’t forget to delete an old version of a library, or check in the same library twice, without breaking the build.

Our ClasspathCollisionCheckTask

I provide this task here, without any warranty. The source code is included in the JAR alongside the classes, if you want to know what it does exactly.
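
The gist of the check is small. Here is a minimal sketch of the underlying idea (not the actual task source; written as a standalone program instead of an Ant task):

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class CollisionCheckSketch {
    public static void main(String[] args) throws IOException {
        // map each entry name to all JARs (given as arguments) that contain it
        Map<String, List<String>> owners = new HashMap<String, List<String>>();
        for (String path : args) {
            JarFile jar = new JarFile(new File(path));
            try {
                for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements();) {
                    JarEntry entry = e.nextElement();
                    if (entry.isDirectory()) {
                        continue;
                    }
                    List<String> jars = owners.get(entry.getName());
                    if (jars == null) {
                        jars = new ArrayList<String>();
                        owners.put(entry.getName(), jars);
                    }
                    jars.add(path);
                }
            } finally {
                jar.close();
            }
        }
        // report every resource that occurs in more than one JAR
        for (Map.Entry<String, List<String>> entry : owners.entrySet()) {
            if (entry.getValue().size() > 1) {
                System.out.println("Collision: " + entry.getKey()
                        + " found in " + entry.getValue());
            }
        }
    }
}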

Assuming you already know how to use custom tasks within an Ant build script, here’s only a short usage description.

Import the custom task:

<taskdef
    name="check.collision"
    classname="com.schneide.internal.anttask.ClasspathCollisionCheckTask"
    classpath="${customtasks.library.directory}/schneidetasks.jar"
/>

Next, use it on your classpath:

<check.collision verbose="true" failOnCollision="true">
    <path>
        <fileset dir="${classpath.library.directory}">
            <include name="**/*.jar"/>
        </fileset>
        <fileset dir="${internal.library.directory}">
            <include name="**/*.jar"/>
        </fileset>
    </path>
</check.collision>

The task scans the whole path you give it and reports any collision it detects. You will see the warnings in your build log.

If the failOnCollision parameter is set to true (optional, defaults to false), the build will abort after a collision. If you want to have debug information, set the verbose parameter to true (optional, defaults to false).

Conclusion

If you manage your project dependencies manually, you might find our custom Ant task useful. If you use Maven or Ant Ivy, you already have this functionality in your build process.

Feedback

I’m very interested in hearing your opinion on the task or about your way of handling dependencies. Leave us a comment.