Structuring CppUnit Tests

How to structure cppunit tests in non-trivial software systems so that they can be easily executed selectively during code-compile-test cycle and at the same time are easy to execute as a whole by your continuous integration system.

While unit testing in Java is dominated by JUnit, C++ developers can choose between a variety of frameworks. See here for a comprehensive list. Here you can find a nice comparison of the biggest players in the game.

Being probably one of the oldest frameworks CppUnit sure has some usability issues but is still widely used. It is criticised mostly because you have to do a lot of boilerplate typing to add new tests. In the following I will not repeat how tests can be written in CppUnit as this is described already exhaustively (e.g. here or here). Instead I will concentrate on the task of how to structure CppUnit tests in bigger projects. “Bigger” in this case means at least a few architectually independent parts which are compiled independently, i.e. into different libraries.

Having independently compiled parts in your project means that you want to compile their unit tests independently, too. The goal is then to structure the tests so that they can easily be executed selectively during development time by the programmer and at the same time are easy to execute as a whole during CI time (CI meaning Continuous Integration, of course).

As C++ has no reflection or other meta programming elements like the Java Annotations, things like automatic test discovery and how to add new tests become a whole topic of its own. See the CppUnit cookbook for how to do that with CppUnit . In my projects I only use the TestFactoryRegistry approach because it provides the most automatics in this regard.

Let’s begin with a simplest setup, the Link-Time Trap (see example source code): Test runner and result reporter are setup in the “main” function that is compiled into an executable. The actual unit tests are compiled in separate libraries and are all linked to the executable that contains the main function. While this solution works well for small projects it does not scale. This is simply because every time you change something during the code-compile-test cycle the unit test executable has to be relinked, which can take a considerable amount of time the bigger the project gets. You fall into the Link Time Trap!

The solution I use in many projects is as follows: Like in the simple approach, there is one test main function which is compiled into a test executable. All unit tests are compiled into libraries according to their place in the system architecture. To avoid the Link-Time-Trap, they are not linked to the test executable but instead are automatically discovered and loaded during test execution.

1. Automatic Discovery

Applying a little convention-over-configuration all testing libraries end with the suffix “_tests.so”. The testing main function can then simply walk over the directory tree of the project and find all shared libraries that contain unit test classes.

2. Loading

If a “.._test.so” library has been found, it simply gets loaded using dlopen (under Unix/Linux). When the library is loaded the unit tests are automatically registered with the TestFactoryRegistry.

3. Execution

After all unit test libraries has been found and loaded text execution is the same as in the simple approach above.

Here my enhanced testmain.cpp (see example source code).

#include ... 

using namespace boost::filesystem; 
using namespace std; 

void loadPlugins(const std::string& rootPath) 
{
  directory_iterator end_itr; 
  for (directory_iterator itr(rootPath); itr != end_itr; ++itr) { 
    if (is_directory(*itr)) {
      string leaf = (*itr).leaf(); 
      if (leaf[0] != '.') { 
        loadPlugins((*itr).string()); 
      } 
      continue; 
    } 
    const string fileName = (*itr).string();
    if (fileName.find("_tests.so") == string::npos) { 
      continue;
    }
    void * handle = 
      dlopen (fileName.c_str(), RTLD_NOW | RTLD_GLOBAL); 
    cout << "Opening : " << fileName.c_str() << endl; 
    if (!handle) { 
      cout << "Error: " << dlerror() << endl; 
      exit (1); 
    } 
  } 
} 

int main ( int argc, char ** argv ) { 
  string rootPath = "./"; 
  if (argc > 1) { 
    rootPath = static_cast<const char*>(argv[1]); 
  } 
  cout << "Loading all test libs under " << rootPath << endl; 
  string runArg = std::string ( "All Tests" ); 
  // get registry 
  CppUnit::TestFactoryRegistry& registry = 
    CppUnit::TestFactoryRegistry::getRegistry();
  
  loadPlugins(rootPath); 
  // Create the event manager and test controller 
  CppUnit::TestResult controller; 

  // Add a listener that collects test result 
  CppUnit::TestResultCollector result; 
  controller.addListener ( &result ); 
  CppUnit::TextUi::TestRunner *runner = 
    new CppUnit::TextUi::TestRunner; 

  std::ofstream xmlout ( "testresultout.xml" ); 
  CppUnit::XmlOutputter xmlOutputter ( &result, xmlout ); 
  CppUnit::TextOutputter consoleOutputter ( &result, std::cout ); 

  runner->addTest ( registry.makeTest() ); 
  runner->run ( controller, runArg.c_str() ); 

  xmlOutputter.write(); 
  consoleOutputter.write(); 

  return result.wasSuccessful() ? 0 : 1; 
}

As you can see the loadPlugins function uses the Boost.Filesystem library to walk over the directory tree.

It also takes a rootPath argument which you can give as parameter when you call the test main executable. This solves our goal stated above. When you want to execute unit tests selectively during development you can give the path of the corresponding testing library as parameter. Like so:

./testmain path/to/specific/testing/library

In your CI environment on the other hand you can execute all tests at once by giving the root path of the project, or the path where all testing libraries have been installed to.

./testmain project/root

CMake Builder Plugin for Hudson

Update: Check out my post introducing the newest version of the plugin.

Today I’m pleased to announce the first version of the cmakebuilder plugin for Hudson. It can be used to build cmake based projects without having to write a shell script (see my previous blog post). Using the scratch-my-own-itch approach I started out implementing only those features that I needed for my cmake projects which are mostly Linux/g++ based so far.

Let’s do a quick walk through the configuration:

1. CMake Path:
If the cmake executable is not in your $PATH variable you can set its path in the global Hudson configuration page.

2. Build Configuration:

To use the cmake builder in your Free-style project, just add “CMake Build” to your build steps. The configuration is pretty straight forward. You just have to set some basic directories and the build type.

cmakebuilder demo config
cmakebuilder demo config

The demo config above results in the following behavior (shell pseudocode):

if $WORKSPACE/build_dir does not exist
   mkdir $WORKSPACE/build_dir
end if

cd $WORKSPACE/build_dir
cmake $WORKSPACE/src -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=$WORKSPACE/install_dir
make
make install

That’s it. Feedback is very much appreciated!!

Originally the plan was to have the plugin downloadable from the hudson plugins site by now but I still have some publishing problems to overcome. So if you are interested, make sure to check out the plugins site again in a few days. I will also post an update here as soon as the plugin can be downloaded.

Update: After fixing some maven settings I was finally able to publish the plugin. Check it out!

Paging with different DBs

Sometimes you cannot or do not want to use an object-relational mapping tool. When not using an OR-mapper like Hibernate or Oracle Toplink you have to deal with database specifics. One common case especially for web applications is limiting the result set to a number of items that fit nicely on a web. You then often want to allow the users to navigate between these “pages” of items aka “paging”.

This type of functionality became part of SQL only as of SQL2008 in the following form:
SELECT * FROM t WHERE ... ORDER BY c OFFSET start_row FETCH count ONLY

Since most popular database management systems (DBMSes) do not yet implement this syntax you have to implement paging in propriatory ways.

My experience with an Oracle DBMS and the frustrating and comparatively long time it took to find the correct™ solution inspired me to write this post. Now I want to present you the syntax for some widely used DBMSes which we encounter frequently in our projects.

  • MySQL, H2 and PostgreSQL (< 8.4 which will also implement the SQL2008 standard) use the same syntax:
    SELECT * FROM t WHERE ... ORDER BY c LIMIT count OFFSET start
  • Oracle is where the fun begins. There is actually no easy and correct way of doing this. So you will end up with a mess like:
    SELECT col1 FROM (SELECT col1, ROWNUM r FROM (SELECT col1 FROM table ORDER BY col1)) WHERE r BETWEEN start AND end
  • DB2 AFAIK uses the syntax proposed in SQL2008 but correct me if I am wrong because we do not yet work with DB2 databases.
  • As we did not need paging with MS SQLServer as of now I did not bother to look for a solution yet. Hints are very welcome.

With all solutions the ORDER BY clause is critical because SQL does not guarantee the order of the returned rows.

Wikipedia delivers some additional and special case information but does not really explain the general, real world case the specific DBMSes.

I hope that I raised some awareness about database specifics and perhaps saved you some time trying to find a solution the problem using your favorite DBMS.

Software Craftsman Project Priority Survey

Answers to a question of project priorities from the upcoming book “Apprenticeship Patterns”.

apprenticeship-patters-coverThere is an upcoming and very promising book title written by Dave Hoover and Adewale Oshineye called “Apprenticeship Patterns: Guidance For The Aspiring Software Craftsman”.  It will cover all the basic rules you’ll need to become a Software Craftsman. This is a rather new term to describe professional software developers, eventually leading to the Software Craftsmanship Manifesto. The Manifesto itself reads like an addition to the Agile Manifesto:

As aspiring Software Craftsmen we are raising the bar of professional software development by practicing it and helping others learn the craft. Through this work we have come to value:

  • Not only working software, but also well-crafted software
  • Not only responding to change, but also steadily adding value
  • Not only individuals and interactions, but also a community of professionals
  • Not only customer collaboration,but also productive partnerships

That is, in pursuit of the items on the left we have found the items on the right to be indispensable.

© 2009, the undersigned. this statement may be freely copied in any form, but only in its entirety through this notice.

A very good question

When i read the blog of “Apprenticeship Patterns“, i noticed a very good question about project priorities:

Rank the following 3 project attributes in order of importance and explain why.

  • Test Coverage
  • Timely Delivery
  • Code Quality

This question really got me hooked, because there is no single valid answer, only personal statements about values.

An informal survey

I’m in the lucky position of meeting a lot of senior developers and a great number of software engineering students. So I instantly decided to perform a survey on this question and watch out for emerging answer patterns.

I gave each project attribute an unique letter, C for “Test Coverage”, D for “Timely Delivery” and Q for “Code Quality”. There are six possible answers, here are their rates in the survey (when 58 persons gave their answers):stats-all1

  • CDQ: 7 percent
  • CQD: 9 percent
  • DCQ: 5 percent
  • DQC: 7 percent
  • QCD: 41 percent
  • QDC: 31 percent

The vast majority of developers stated Code Quality as their highest goal. This isn’t very surprising to me, as most developers take pride in writing high quality code.

Comparing the answers

But what about the answers of only senior developers? Lets have a look at the numbers without student answers:stats-senior1

  • CDQ: 7 percent
  • CQD: 14 percent
  • DCQ: 7 percent
  • DQC: 14 percent
  • QCD: 21 percent
  • QDC: 36 percent

The big pattern still applies: Code Quality first. It’s amazing to see the other attributes gaining importance, though. To me, that’s a sign that code-centric thinking is one pattern of apprenticeship.

What’s not in the numbers

When i held the survey, the relevant group of people was gathered together, so a discussion of the results arose every time.  But the discussions followed different patterns:

  • The teams (of senior developers) gave very distinct answers while working on the same project. The answers were driven by personal conviction rather than project necessities.
  • The courses (of students) gave more similar answers while having a wide variety of backgrounds. The answers were mostly explained with current project necessities (like security-critical systems as reason for Test Coverage being most important).

When I have to compare the two groups, I tend to say that younger developers are more driven by extrinsic demands while more experienced developers act on their own internal values.

Our duty as Software Craftsman

In conclusion, I see a duty for experienced developers: to share their experience. Leading a discussion about “Team Values” at your current project is the least you can do. Helping others to develop their own set of internal values, even if it isn’t yours, seems crucial to me.

The upcoming “Apprenticeship Patterns” book and the brand new “97 Things Every Software Architect Should Know” are perfect starting points for this.

Make friends with your compiler

Suppose you are a C++ programmer on a project and you have the best intentions to write really good code. The one fellow that you better make friends with is your compiler. He will support you in your efforts whenever he can. Unless you don’t let him. One sure way to reject his help is to switch off all compiler warnings. I know it should be well-known by now that compiling at high warning levels is something to do always and anytime but it seems that many people just don’t do it.
Taking g++ as example, high warning levels do not mean just having “-Wall” switched on. Even if its name suggests otherwise, “-Wall” is just the minimum there. If you just spend like 5 minutes or so to look at the man page of g++ you find many many more helpful and reasonable -W… switches. For example (g++-4.3.2):


-Wctor-dtor-privacy: Warn when a class seems unusable because all the constructors or destructors in that class are private, and it has neither friends nor public static member functions.

Cool stuff! Let’s what else is there:


-Woverloaded-virtual: Warn when a function declaration hides virtual functions from a base class. Example:

class Base
{
public:
virtual void myFunction();
};

class Subclass : public Base
{
public:
void myFunction() const;
};

I would certainly like to be warned about that, but may be that’s just me.


-Weffc++: Warn about violations of the following style guidelines from Scott Meyers’ Effective C++ book

This is certainly one of the most “effective” weapons in your fight for bug-free software. It causes the compiler to spit out warnings about code issues that can lead to subtle and hard-to-find bugs but also about things that are considered good programming practice.

So suppose you read the g++ man page, you enabled all warning switches additional to “-Wall” that seem reasonable to you and you plan to compile your project cleanly without warnings. Unfortunately, chances are quite high that your effort will be instantly thwarted by third-party libraries that your code includes. Because even if your code is clean and shiny, header files of third-pary library may be not. Especially with “-Weffc++” this could result in a massive amount of warning messages in code that you have no control of. Even with the otherwise powerful, easy-to-use and supposedly mature Qt library you run into that problem. Compiling code that includes Qt headers like <QComboBox>with “-Weffc++” switched on is just unbearable.

Leaving aside the fact that my confidence in Qt has declined considerably since I noticed this, the question remains what to do ignore the shortcomings of other peoples code. With GCC you can for example add pragmas around offending includes as desribed here. Or you can create wrapper headers for third-party includes that contain


#pragma GCC system_header

AFAIK Microsoft’s compilers have similar pragmas:


#pragma warning(push, 1)
#include
#include
#pragma warning(pop)

Warning switches are a powerful tool to increase your code quality. You just have to use them!

About String Concatenation in Java or “don’t fear the +”

When it comes to string concatenation in Java many people have almost religious views about performance and style. Sadly, there are some misconceptions and misinformation especially about the performance bits. Many people think that concatenating many strings using + means expensive string copying each time and is thus slow as hell which is mostly wrong.

Justin Lee has a nice writeup of the most prominent concatenation options. But imho he misses out some things and his benchmark is a bit oversimplified although it does tell a true story. I assume that he followed at least the basic rules for performance measurement as his results suggest.

Now I want to try to clarify some points I think he missed and I find important:

  • Concatenation using + in one statement is actually compiled to the use of StringBuilder (at least for Sun Java6 compilers, where I checked it in the debugger, try it yourself!). So it’s no surprise that there is no difference between these two options in Justin’s benchmark.
  • It should be clear that the format variants have some overhead because they actually do more than just concatenate strings. There is at least some string parsing and copying involved so that these methods should be used for the cases where for example parameter reordering (think I18N) is needed or readability suffers using normal concatenation.
  • You have to pay attention when using + concatenation over the course of multiple statements because it then involves string copying. Consider the following code: Critical String Concatenation Here it really does make a difference which option you choose. The StringBuilder will perform far better for higher loop counts. We had a real world issue back some time with that when we used the Simple web framework for serving directory listing of several thousand files. The HTML-code was generated using a concatenatePlus()-style method and took like 40(!) seconds. After changing the code to the StringBuilder variant the page was served in sub-second time.

Whether you use + or StringBuilder is mostly a matter of taste and readability in many cases. When your string concatenation gets more complex you should really consider using StringBuilder as it is the safe bet.

Lightweight dependency management

Managing project dependencies without maven or ant ivy, using a custom ant task to ensure classpath orthogonality.

Java’s classpath is a powerful concept – when used appropriate. As your project grows larger in terms of code and people, it gets harder to ensure that your classpath is correct. A great danger arises from JAR files containing different versions of the same resource. You might end up running different code than you think, leading to strange effects. If you build your classpath using wildcards, you can’t even control the order your JAR files are loaded.

Managing dependencies

To avoid the issues mentioned above, you need to manage your project dependencies. It’s a common practice to implement the build process of the project using maven or ant ivy. Both tools provide dependency mangement by declaration. But at a high cost. Especially maven has received some malice lately, criticizing its steep learning curve and complexity.

Scratching the biggest itch

We decided to try a different approach to dependency management, tackling only our biggest concern: The duplication of classpath resources. We take care of the scope of a third-party library, put required JARs in the repository (to us, third party binary artifacts are part of the project source) and update manually. The one thing we cannot assure manually is that every resource is unique. Sometimes, the same class is included in different JARs, as it seems to be common practice among java web frameworks.

Ant to the rescue

Thus, I wrote a custom ant task that, given the classpath, checks for duplicate entries. If it finds one, it lists the culprits and optionally aborts the build process. Included in our continuous integration system, it gets run every time somebody performs a change. You can’t forget to delete an old version of a library or check in the same library twice without breaking the build now.

Our ClasspathCollisionCheckTask

I provide this task here, without any warranty. The source code is included in the JAR alongside the classes, if you want to know what it does exactly.

Assuming you already know how to use custom tasks within an ant build script, here’s only a short usage description.

Import the custom task:

<taskdef
    name="check.collision"
    classname="com.schneide.internal.anttask.ClasspathCollisionCheckTask"
    classpath="${customtasks.library.directory}/schneidetasks.jar"
/>

Next, use it on your classpath:

<check.collision verbose="true" failOnCollision="true">
    <path>
        <fileset dir="${classpath.library.directory}">
            <include name="**/*.jar"/>
        </fileset>
        <fileset dir="${internal.library.directory}">
            <include name="**/*.jar"/>
        </fileset>
    </path>
</check.collision>

The task scans the whole path you give it and reports any collision it detects. You will see the warnings in your build log.

If the failOnCollision parameter is set to true (optional, defaults to false), the build will abort after a collision. If you want to have debug information, set the verbose parameter to true (optional, defaults to false).

Conclusion

If you manage your project dependencies manually, you might find our custom ant task useful. If you use maven or ant ivy, you already have this functionality in your build process.

Feedback

I’m very interested in hearing your opinion on the task or about your way of handling dependencies. Leave us a comment.

Industry Standard C++

The other day I was browsing through the C++ API code of a third-party library. I was not much surprised to see stuff like

#define MAX(a, b) ( (a) >= (b) ? (a) : (b))
#define MIN(a, b) ( (a) <= (b) ? (a) : (b))

because despite the fact that std::min, std::max together with the rest of the C++ standard library is around for quite a while now, you still come across old fashioned code like above frequently. But things got worse:

#define FALSE 0
#define TRUE 1

and later:

...
bool someVariable = TRUE;

As if they learned only half the story about the bool type in C++. But there was more to come:

class ListItem
{
   ListItem* next;
   ListItem* previous;
   ...
};

class List : private ListItem
{
...
};

Yes, that’s right, the API guys created their own linked-list implementation. And a pretty weird one, too, mixing templates with void* pointers to hold the contents. Now, why on earth would you do that when you could just use std::list or std::vector? Makes you wonder about the quality of the rest of the code. Especially with C++ where there are so many little pitfalls and details which can burn you. Hey, if you have no clue about the very basics of a language, leave it alone!

Unfortunately, the above example is not exceptional in industry software. It seems that the C++ world these days is actually split into two worlds. In one, people like Andrei Alexandrescu write great books about Modern C++ design, Scott Meyers gives talks about Effective C++ and the boost guys introduce the next library using even more creative operator overloading that in the spirit library (which is pretty cool stuff, btw).

In the other world, you could easily call it industry reality, people barely know the STL, don’t use templates at all, or fall for misleading and dangerous c++ features like the throw() clause in method signatures. Or they ban certain c++ features because they are supposedly not easy to understand for the new guy on the project or are less readable in general. Take for example the Google C++ Style Guide. They don’t even allow exceptions, or the use of std::auto_ptr. Their take on the boost library is that “some of the libraries encourage … an excessively “functional” style of programming”. What exactly is bad about piece of functional programming used as the right tool in the right place? And what communicates ownership issues better than e.g. returning a heap allocated object using a std::auto_ptr?

The no-exceptions rule is also only partly understandable. Sure enough, exceptions increase code complexity in C++ more than in other languages (read Items 18 and 19 of Herb Sutter’s Exceptional C++ as an eye-opener. Or look here). But IMHO their advantages still outweigh their downsides.

With the upcoming new C++0X standard my guess is that the situation will not get any better, to put it mildly. Most likely, things like type inference with the new auto keyword will sell big because they save typing effort. Same thing with the long overdue feature of constructor delegation. But why would people who find functional programming less readable start to use lambda functions? As little known as the explicit keyword is now, how many people will know about or actually use the new “= delete” keyword, let alone “= default“? Maybe I’m a little too pessimistic here but I will certainly put a mark in my calender on the day I encounter the first concept definition in some piece of industry C++ software.

Update: Concepts have been removed from C++0X so that mark in my calender will not come any time soon…

A DSL for deploying grails apps

Everytime I deploy my grails app I do the same steps over and over again:

  • get the latest build from our Hudson CI
  • extract the war file from the CI archive
  • scp the war to a gateway server
  • scp the war to the target server
  • run stop.sh to shutdown the jetty
  • run update.sh to update the web app in the jetty webapps dir
  • run start.sh to start the jetty

Reading the Productive Programmer I thought: “This should be automated”. Looking at the Rails world I found a tool named Capistrano which looked like a script library for deploying Rails apps. Using builders in groovy and JSch for SSH/scp I wrote a small script to do the tedious work using a self defined DSL for deploying grails apps:

Grapes grapes = new Grapes()
def script = grapes.script {
    set gateway: "gateway-server"
    set username: "schneide"
    set password: "************"
    set project: "my_ci_project"
    set ciType: "hudson"
    set target: "deploy_target.com"
    set ci_server: "hudson-schneide"
    set files: ["webapp.war"]

    task("deploy") {
        grab from: "ci"
        scp to: "target"
        ssh "stop.sh"
        ssh "update.sh"
        ssh "start.sh"
    }
}

script.tasks.deploy.execute()

This is far from being finished but a starting point and I think about open sourcing it. What do you think: may it help you? What are your experiences with deploying grails apps?

Analyzing Java Heap problems Part 2: Using Eclipse MAT

In part one we saw how to obtain the data to analyze, the heap dumps. Now we are looking into a nice plugin for the Eclipse IDE for analyzing the dumps.  Compared to the basic tools described in the previous article Memory Analyzer Tool (MAT) offers better usability, performance and some high level analysis and report tools.Eclipse MAT Overview After you open a hprof heap dump with MAT it will generate index files for faster access to all the data you are interested in and show an overview with nice charts.  From here you have access to other views and features:

  • The histogram is somewhat similar to what jHat offers.mat-pathtogcroot It allows you to browse, sort and filter the object instances in memory and shows you instance count and the shallow heap (memory used only by this object instance) and retained heap (memory used by this object instance including referenced objects). From the context menu you can choose “Merge Shortest Paths to GC roots” to see the reference chain of an object all the way up to the classloader. Here we can see that the JDateChooser registers itself at the MenuSelectionManager as a listener which can cause serious memory leaks as described in another post about Java memory handling.
  • The dominator tree allows you to quickly identify the biggest objects and what they reference. Again, using the context menu on an item in the list offers many options to dive deeper into the analysis.
  • The object inspector gives you detailed information about the selected objects like shallow and retained size, its fields and the class loader by whom it was loaded.
  • The leak suspects report tries to give you some high level hints about possible causes of memory problems of your application.
  • MAT Component ReportThe component report provides some very interesting statistics about Strings and collection usage which might be worth looking at if you are not hunting down leaks but trying to reduce overall memory usage. You can even get performance hints when many overfull HashMaps are detected or there are many empty collections which could be better lazily created.

I personally am using the histogram and the dominator tree the most because I am a technical guy and like to hunt down the problems in the code. Nevertheless the reports may show use other valuable aspects which you did not think of before. The MAT team are expanding the tools nicely on that side so that the benefit of these reports is ever increasing.

It is very likely that when you analyze large heap dumps you may need to increase the Java heap size for Eclipse by using the -vmargs -Xmx<memory size> parameter. That way you are able to analyze big heaps > 500M relatively fast and comfortable. For some live demo take a look at a webinar by some of SAPs Eclipse MAT committers.