How to accidentally kill your CI build time

At one of our customers I do C++ consulting in a mid-sized project which uses cmake as build system. A clean build on our Jenkins CI server takes about 40 minutes (including unit tests) which is way too long to be considered “fast feedback” in an agile kind of way.

Because of that, we do clean builds only 2 times a day – some time during the night and during lunch break. The rest of the day the CI server only does a “svn update” and a normal “make”, which takes about 3-10 minutes depending on what files have been changed.

With C++ there are lots of ways to unnecessarily lengthen your build time. The most important factor is, of course, #include dependencies. One has to be very (very) disciplined in adding #include directives in header files. Otherwise, the whole world suddenly gets rebuild when some small header file somewhere in a little corner of the code has been changed.

And I have to say, for the most part, this project is in pretty good shape with regard to #include dependencies.

So what the hell has suddenly increased our build time from 3-10 minutes to 20-25 minutes? was what I was thinking some time last week while waiting for the CI server to spit out new latest and greatest rpm packages. For some reason, our normal, rest-of-the-day build started to compile what felt like everything in our main package even on the slightest code change in a remote .cpp file.

What happened?

In order to have the build time available (e.g. to show in an “about” box), we use a preprocessor symbol like REVISION_DATE which gets filled in a CMakeLists.txt file. The whole thing looks like this:

...
EXEC_PROGRAM(date ARGS '+%F_%T' OUTPUT_VARIABLE REVISION_DATE)
...
ADD_DEFINITIONS(-DREVISION_DATE=\"${REVISION_DATE}\")
...

Since the beginning of the time these lines of CMake code lived in a small sub-sub-..-directory with little to no incomming dependencies. Then, at some point, it became necessary to have the REVISION_DATE symbol at some other place, too, which led to a move of the above code into the CMakeLists.txt file of the main package.

The value of command date +%F_%T changes every second which leads to a changed REVISION_DATE on every build – which is what we initially intended. What changes, too, of course, is the value of the ADD_DEFINITIONS directive. And as CMake is very strict with the slightest change in this value, every make target below that line gets rebuild – which in our case was everything in the main package.

So there! Build time killing creatures are lurking everywhere in our C/C++ projects. Always be aware of them!

Open Source Love Day October 2010

On Friday two weeks ago, we held our Open Source Love Day for October 2010. This day was special in several ways. We strayed very far from the usual schedule for this day, there were several internal tasks that couldn’t be delayed and we introduced a “fun practice” event. But we eventually produced something valuable this day.

The Open Source Love Day

We introduced a monthly Open Source Love Day (OSLD) to show our appreciation to the Open Source software ecosystem and to donate back. We heavily rely on Open Source software for our projects. We would be honored if you find our contributions useful. Check out our first OSLD blog posting for details on the event itself.

The distractions

  • A regular project needed an urgent cost estimation by the whole team. This was the last opportunity because of an upcoming parental leave to have the team together for a long time.
  • Another regular project needed an urgent problem solved. This turned out to be so obscure that one of our developers had to be on-site. You can read about it in this blog entry now.
  • We received several shipments of office furniture and computer parts. They had to be checked and placed in.
  • We had a fun practice event. We discovered the online “game” typeracer and practiced our raw typing skill against each other for some time. Pro tip for beginners: don’t look at the highscores!

On this OSLD, we accomplished the following tasks:

  • A new version 1.8 of the cmakebuilder hudson plugin implements several feature requests. You can now choose to NOT clean your workspace before building and set different paths for the cmake installation for every job or node (hudson slaves). The latter option can be applied using an environment variable.

On this OSLD, we also tried the following tasks:

  • We keep an eye on Scala and its associated web framework Lift as a promising technology. One issue with Lift that bugs us is the use of “sun bastard format” properties for internationalization. We tried to teach Lift to accept UTF-8 encoded property files. After a lot of “downloading the internet” (you can always tell which project uses maven by their initial setup delay), we quickly implemented our own ResourceBundle.Control. But the Lift framework itself could not be built: “Error occurred during initialization of VM: Could not reserve enough space for object heap”. We ran out of time and will investigate in this issue on the next OSLD.
  • Grails is another web framework we use in projects. There are some bugs that really annoy us, and the OSLD is the perfect time to fix them. One of these bugs is GRAILS-6475, which we tried to reproduce with the latest code base. After writing a test case that would go green unexpectedly, we tried to provoke the error by setting up a sample project. The bug didn’t show up there, too. We left a comment in the issuetracker and ceased development.

What were our lessons learnt today?

  • You can’t tear off massive amounts of time from the OSLD and expect it to still be working. An OSLD doesn’t scale down apparently.
  • Most issues that can’t be done fail with the project’s build. The build process of a foreign project is the most crucial phase in your decision on commitment. If it fails, your participation in the project is at risk. We’ve seen many brittle, undocumented and incredible complex build processes now. And we can state one thing: It doesn’t stop with throwing maven at a project, you still have to “think the build”.

Retrospective of the OSLD

This OSLD was special in the amount of non-OSLD work done. The remaining efforts weren’t as successful as we wished. This has been an ongoing issue with our OSLD for the last months now and we are looking forward to adapt our workstyle to yield better results in the future. The distraction by typeracer was fun, though.

Open Source Love Day September 2010

On wednesday last week, we held our Open Source Love Day for September 2010. Our day started with the usual Homepage Comittee meeting and very soon, we were up and working. This time, our success rate wasn’t as high as we wanted, mostly because we worked on internal tools that didn’t work out quite as well as expected. But, we managed to produce something valuable this day.

The Open Source Love Day

We introduced a monthly Open Source Love Day (OSLD) to show our appreciation to the Open Source software ecosystem and to donate back. We heavily rely on Open Source software for our projects. We would be honored if you find our contributions useful. Check out our first OSLD blog posting for details on the event itself.

On this OSLD, we accomplished the following tasks:

  • A new version 1.6.1 of the cmakebuilder hudson plugin was published. This version consisted of bugfixes only and right now, it still seems flawed. We are working on the issue, expect a new version 1.6.2 soon.
  • Our internal time-tracking tool got love at several issues. One issue required the use of triangle-shaping CSS, as described in this blog post from Jon Rohan. Our issue weren’t finished because Javascript code can rot into a big pile of crap really quick.
  • We managed to make a long hatched dream come true at this OSLD. As you might be aware, we are big fanboys of crap4j, a metric tool that associates test coverage with code complexity. Thus, we wrote the crap4j hudson plugin, release the CrapMap and use some internal improvements, too. The main disadvantage of crap4j is the strong dependency on a specific test coverage tool. Our goal was to use the test coverage data we already collect using Cobertura. We achieved this goal and got the whole thing working. It will be released in the next few weeks, with a detailed blog post here. Stay tuned for this new tool (it already has a name: Crapertura).

What were our lessons learnt today?

  • No matter how clever you are, Javascript outsmarts you every time it appears in superior numbers. Refactoring is the key here, but difficult and tedious to apply.
  • When you dissect a foreign API or code base, you just need to find the right grip. I cannot decribe it more precise right now, but this grip is all you need to open up the code. When playing around with the crap4j code base, as soon as we held the grip, everything else followed naturally. Perhaps “the grip” can be translated with “catching the author’s intent”. These are always magical moments.

Retrospective of the OSLD

This OSLD wasn’t as successful as we wished, partly because of missing manpower (honeymoon holiday!) and because of our inability to tame a Javascript code base. We have to work on our expertise here and we are glad that we’ve found out at an OSLD, not a time- and mission-critical customer project.

Open Source Love Day March 2010

Yesterday, we held our first Open Source Love Day (OSLD) for this year. The last OSLD was at December 2009. Then, we reassigned a day in January and February each to perform our relocation to the new (and much bigger) office. But now we are back to regular duty and had the time to donate some work back to the Open Source ecosystem.

The Open Source Love Day

We introduced a monthly Open Source Love Day to show our appreciation to the Open Source software ecosystem and to donate back. We heavily rely on Open Source software for our projects. We would be honored if you find our contributions useful. Check out our first OSLD blog posting for details on the event itself.

Participate at our OSLD by using the features we’ve built today:

  • Grails still has some bugs. Instead of only complaining about them, we try to fix them. There is a bug with checkboxes and nested boolean properties that bugged us in a customer project. It’s filed unter GRAILS-3299 and has a proposed patch now.
  • In previous OSLDs, we produced the cmake hudson plugin. In the corresponding blog entry, comments with bug reports began to pile up. They addressed issues with hudson master/slave setups. So we implemented a hudson master/slave test environment, using VirtualBox virtual machines to perform as slaves. This setup quickly revealed the problems that were typical enough to devote a complete blog entry about this topic soon. Fixing the problems resulted in the new cmake hudson plugin version 1.2 to be released yesterday.
  • We are using the RXTX project to perform serial (RS232) communication in several projects. We are really glad the project exists, because the “official” communications API from Sun/Oracle is nothing but a mess. With RXTX, we only had a problem with emulated COM ports. Emulated COM ports exist when you use a USB->Serial or Ethernet->Serial converter, which is what our customer chose to do. If you unplug the converter during operation, the corresponding COM port disappears. This causes RXTX to crash, bringing the JVM down, too. We wrote a test application and used it with every converter we own (and we own quite a lot of them!). Then we began tracing the RXTX source code (at C code level), altering it to “only” throw an IOException when the virtual COM port disappears. The corresponding patch will be proposed to the RXTX project soon.
  • Another API we use a lot is the tiny winp project, written by Kohsuke Kawaguchi, the creator of hudson. We kill Windows processes with it, within a project that runs on Windows 2000, Windows XP and Windows 7. The latest Windows version seemed incompatible with winp, even the 32bit edition. We didn’t find the cause for this, but developed a workaround that will be proposed to the winp project soon.

What were our lessons learnt today?

  • If you face OutOfMemoryErrors on a 64bit Java6 JVM, try to switch back to a 32bit Java5 JVM. It helped us with our Grails bugfixing (during the test phase).
  • Hudson Master/Slave support for plugins isn’t particularly hard. It’s just that you need to be aware of the topic and replace some types like java.io.File. We gathered the same experience twice with our Crap4J plugin and the cmake plugin. It’s time to tell the world about it. Stay tuned!
  • The good old error return code is an error prone coding paradigm, because all too often, users of a function/method just forget to check the returned result. This was the case with a call to WaitForSingleObject in RXTX.
  • If you don’t understand an implementation well enough to fix the cause, you might at least be able to produce a workaround. It’ll work for you and provide guidance for the original author about where the bug might hide. This is why we count our winp efforts as success, too.
  • Your project either is mavenized or it isn’t. Everything in between is half-assed.

This OSLD was a bit short, as we had some guests in the evening, but nevertheless, it was fun. Well, to be precise, it was this special software engineer’s type of fun: The whole company was remarkably quiet most of the day, with everyone working totally focussed. We scratched our own itches, enhanced our customer projects and contributed to the open source community. A very good day!

Stay tuned if you want to know more about the specifics of the hudson plugin development or the to-be-proposed patches. We will publish them here.