Breakpad and Your CI – A Strong Team

If your C++ software has to run 24/7 on some server rack at your customer’s data center, it has to meet not only all the user requirements, but also requirements that come from you as developer. When your customer calls you about some “problems”, “strange behaviours”, or even crashes, you must be able to detect what went wrong. Fast!

One means to this end is of course logging. But if your application crashes, nothing beats a decent stacktrace 🙂

Google’s breakpad library comes in very handy here because it provides very easy crash reporting. Even if your process has 2 gigs of virtual memory, breakpad shrinks that ‘core dump’ down to a couple of megs.

Breakpad pulls that trick off by using so-called symbol files that you have to generate for each compiled binary (executable or shared library). These symbol files together with the breakpad dump file that is created at crash time are then used to recreate the stacktrace.

Because every compilation creates different binaries, dump file and symbol files need to be ‘based on’ exactly the same binaries.

This is where you can let your CI system do some work for you. At one of our customers we use Jenkins not only for the usual automatic builds and tests after each check-in but also for release builds that go into production.

At the end of each build, breakpad’s symbol dumper runs over all compiled executables and libraries and generates the symbol files. These are then archived together with the compiled binaries.

Now we are prepared. Whenever some customer sends us a dump file, we can just easily pull out the symbol files corresponding to the software version that runs at this customer and let breakpad do its magic…

 

How to accidentally kill your CI build time

At one of our customers I do C++ consulting in a mid-sized project which uses cmake as build system. A clean build on our Jenkins CI server takes about 40 minutes (including unit tests) which is way too long to be considered “fast feedback” in an agile kind of way.

Because of that, we do clean builds only 2 times a day – some time during the night and during lunch break. The rest of the day the CI server only does a “svn update” and a normal “make”, which takes about 3-10 minutes depending on what files have been changed.

With C++ there are lots of ways to unnecessarily lengthen your build time. The most important factor is, of course, #include dependencies. One has to be very (very) disciplined in adding #include directives in header files. Otherwise, the whole world suddenly gets rebuild when some small header file somewhere in a little corner of the code has been changed.

And I have to say, for the most part, this project is in pretty good shape with regard to #include dependencies.

So what the hell has suddenly increased our build time from 3-10 minutes to 20-25 minutes? was what I was thinking some time last week while waiting for the CI server to spit out new latest and greatest rpm packages. For some reason, our normal, rest-of-the-day build started to compile what felt like everything in our main package even on the slightest code change in a remote .cpp file.

What happened?

In order to have the build time available (e.g. to show in an “about” box), we use a preprocessor symbol like REVISION_DATE which gets filled in a CMakeLists.txt file. The whole thing looks like this:

...
EXEC_PROGRAM(date ARGS '+%F_%T' OUTPUT_VARIABLE REVISION_DATE)
...
ADD_DEFINITIONS(-DREVISION_DATE=\"${REVISION_DATE}\")
...

Since the beginning of the time these lines of CMake code lived in a small sub-sub-..-directory with little to no incomming dependencies. Then, at some point, it became necessary to have the REVISION_DATE symbol at some other place, too, which led to a move of the above code into the CMakeLists.txt file of the main package.

The value of command date +%F_%T changes every second which leads to a changed REVISION_DATE on every build – which is what we initially intended. What changes, too, of course, is the value of the ADD_DEFINITIONS directive. And as CMake is very strict with the slightest change in this value, every make target below that line gets rebuild – which in our case was everything in the main package.

So there! Build time killing creatures are lurking everywhere in our C/C++ projects. Always be aware of them!

CMakeBuilder Version 1.9

Today, I want to announce version 1.9 of the CMakeBuilder plugin for Jenkins (formerly known as Hudson). Concluding from the user feedback, there are no major missing features – at least for the moment.

So for this version, I implemented only one visible enhancement: It is now possible to use environment variables in every configuration setting. Even settings like “Preload Script” “Make Command” or “Install Command” can now be configured with the support of environment variables.

The major invisible change I did was the migration to the Jenkins development infrastructure using this very helpful guide. Moving the whole thing to git will be next.

Check it out!

Open Source Love Day July 2010

This friday , we held our Open Source Love Day for July 2010.  We began with several internal meetings and discussion (like the Homepage Comittee meeting) and dived right in our work afterwards. Everybody had a little backlog of issues that we wanted to get done on this day. Nearly everybody succeeded (well, the author had a minor delay – read about it below). The day went by in a very fast pace, but it felt right.

The Open Source Love Day

We introduced a monthly Open Source Love Day (OSLD) to show our appreciation to the Open Source software ecosystem and to donate back. We heavily rely on Open Source software for our projects. We would be honored if you find our contributions useful. Check out our first OSLD blog posting for details on the event itself.

On this OSLD, we accomplished the following tasks:

  • There are really cool new features in the latest JUnit versions and Rules are one of them. What hurt our aesthetic sense was that the field that hold the Rule instance has to be public. Checkstyle was on our side, so we tweaked JUnit to allow all kinds of visibility. You can read about the change needed here: http://github.com/KentBeck/junit/issues#issue/31. The fix is almost trivial and will hopefully be incorporated in the next versions of JUnit, so we do not publish our altered version.
  • We constantly receive requests and remarks about our cmake plugin for Hudson. This lead to a new version of the plugin fixing two issues with matrix builds and custom build types. Head over to the plugin homepage and grab the new version 1.6. The issues were in detail:
    • The plugin can be used with matrix builds now
    • Custom build types can be defined now
  • RXTX is our choice for serial port communication with Java. We fixed some issues during the last few OSLDs, with one issue left for today: When you flush your stream while using a special type of usb-to-rs232 converter, you got an exception. The corresponding issue is #102 in the RXTX issue tracker. We proposed a patch that fixes the problem.
  • Another hudson plugin is our crap4j reporter. It lacked some love for months now and finally broke when used with the latest hudson versions. Fixing the problem was a lot harder than we thought, basically because the plugin needed adjustments to recent API changes and we couldn’t figure out exactly what adjustments are necessary. You might have a look at the developer mailing list thread for this question. Finally, we got it resolved (on sunday, with a sudden stroke of insight) and a new version 0.8 is published.
  • We use an internal time tracking tool for our projects. This tool isn’t specifically open source yet, but continues to grow in terms of features and usability. The work invested in this tool helps us to continue with the OSLD, so it’s beneficial work nonetheless.
  • During the last OSLD, we had plans for a new hudson plugin and even produced a prototype. This time, we looked around the hudson plugin zoo (it’s getting a bit difficult to keep track of all of them) for inspiration and found a wonderful piece of art: The Groovy Postbuild Plugin. Using this plugin with a small groovy script served our needs exactly. No need for a full-blown plugin when you can scratch your itch with a simple script. Thanks to Serban Iordache for his great work!

What were our lessons learnt today?

  • If you need to setup a fresh workspace for an open source project, consider to prepare it over the night before, or the download delay will kill your precious work time. There is nothing more frustrating than staring at a “downloading…” progress bar while being eager to start programming.
  • Always look around what others have done before. We wanted to build a full hudson plugin from scratch when all we needed was a little groovy script placed inside another plugin. Sweet!
  • Do not hesitate to privately fix open source issues that won’t get done in time for you. Just make sure to have a management process in place to track those changes and be able to re-apply them to future versions. More important though, be able to tell exactly when NOT to re-apply them because the original project has fixed the issue.

Retrospective of the OSLD

The OSLD went smooth and was productive. We tend to work on backlogs instead of searching for random issues now, but that’s just a sign that our approach has matured and we depend on the OSLD to get work done.

Last wednesday, we held our Open Source Love Day for June 2010. This one was productive despite the heat that had us sweating the whole day long (as a sidenote: it got even warmer the days afterwards). Some features were finished and will help at least us in our projects. We still look forward for the right way to release them. Another release was even more problematic, you will read about it below.The Open Source Love Day

We introduced a monthly Open Source Love Day (OSLD) to show our appreciation to the Open Source software ecosystem and to donate back. We heavily rely on Open Source software for our projects. We would be honored if you find our contributions useful. Check out our first OSLD blog posting for details on the event itself.

On this OSLD, we accomplished the following tasks:

Follow-up to our Dev Brunch July 2010

Last Saturday, we held our Dev Brunch for July 2010. The setting of this brunch was unusual, as we didn’t brunch, but cooked spaghetti (to be exact: had spaghetti cooked while we ranted about different workplaces). We also didn’t start in the late morning, but in the early afternoon. Later on, a LAN computer game party was held in our office, limiting our time-frame a bit. Due to rainy weather, we stayed inside and discussed the topics listed below.

The Dev Brunch

If you want to know more about the meaning of the term “Dev Brunch” or how we implement it, have a look at the follow-up posting of the brunch in October 2009. We continue to allow presence over topics. Our topics for the brunch were:

  • Your own Java ResourceBundle implementation – Since Java 6, there is the new possibility to add your own ResourceBundle formats under the generic API using ResourceBundle.Control. We discussed several possible use cases and had an example case mocked up in source code. The API enables you to do what was impossible beforehands but isn’t as polished as it could be. Worth a closer look if you want to combine ResourceBundle with your i18n database, for example.
  • Thoughts on “Team Rooms” – Lately, there was a very good blog entry about team rooms and how they are introduced by Martin Fowler. The article is titled “The rise of the cattle office” and has some valid points. But nearly every attendee of the brunch likes working in a team room. We had a great discussion that can’t be summarized in a single sentence, but one advice: Mr. Fowler, please put up some nicer teaser image in your bliki!
  • Retrospective of the Java Forum Stuttgart 2010 – The Java Forum Stuttgart 2010 is a local conference dedicated to Java. It grew into a 1k+ developer’s meeting for southwest germany. You cannot avoid to meet former colleagues and chat non-stop in the pauses. The presentations are mostly very professional and worthwhile. We learnt a bit about long-term serialization issues (put a version in your XML namespace!), better JUnit (Rules are cool!), some Dependency Injection myths (though this presentation could have been snappier) and got introduced to Apache Hadoop (Map/Reduce at its best). Embedded Java still is the hell we remembered it to be. But the best presentation of the day clearly was Dr. Simon Wiest talking about Hudson and advanced techniques to speed up your build.

Retrospection of the brunch

The group of attendees was small again, with several firsttime guests. This helped the disgression factor a lot, we talked a lot about all kinds of topics that didn’t make it on the list above. The time and setup was a bit unusual, but the brunch itself was fun and insightful as always.

Open Source Love Day June 2010

Last wednesday, we held our Open Source Love Day for June 2010. This one was productive despite the heat that had us sweating the whole day long (as a sidenote: it got even warmer the days afterwards). Some features were finished and will help at least us in our projects. We still look forward for the right way to release them. Another release was even more problematic, you will read about it below.

The Open Source Love Day

We introduced a monthly Open Source Love Day (OSLD) to show our appreciation to the Open Source software ecosystem and to donate back. We heavily rely on Open Source software for our projects. We would be honored if you find our contributions useful. Check out our first OSLD blog posting for details on the event itself.

On this OSLD, we accomplished the following tasks:

  • Launch4j is a java application launcher for Windows, handling all the stuff a startup script would do, too. At the last OSLD, we added the ability to restart the application in case of a crash or other unplanned exit. To utilize this feature for automatic update routines, we needed to add the additional feature of starting another command instead of the original one. If the program fills a special file with the command needed, Launch4j will execute it after the program’s exit. This patch builds onto the previous patch and we are still investigating how to publish this functionality without breaking backward compatibility. We are looking forward to release it on the next OSLD.
  • We use RXTX to perform the serial (RS232) communication on all our java projects. We worked on an issue with serial converters over the course of several OSLDs now and released the patch to the issue tracker of RXTX after a longterm stability test. See the reworked patch for issue #144. There is another issue with the flush() method that seems to affect not only virtual RS232 ports that we currently investigate. But we aren’t yet able to come up with a complete issue description or a fix, so this will be suspended until the next OSLD.
  • We have written the Campfire Hudson Plugin as part of previous OSLDs. When issues emerged, we got patches from the community here. Thank you guys! We included the changes to the code and prepared a new release, when maven failed. This is not an issue, except when it fails repeatedly and messes up the workspace and the repository. After a long time of helpless fiddling with the parameters, we decided to start over and increase the version number to 2.1 (instead of 1.2). All of a sudden, everything worked out fine. Maven is a mysterious beast.
  • The initial work for a New Hudson Plugin was made. One tradition of the OSLD has always been to scratch our own itches. While there are many useful hudson plugins, we have the immediate need for another one that doesn’t exist yet. Without going into details here (we save this for the next OSLD), we produced a proof of concept and a first iteration of the code. Stay tuned for details on the next OSLD.

What were our lessons learnt today?

  • If you don’t succeed with maven’s automatic processes, do not try to sort out things manually. You’ll just end up with a gigantic mess that won’t work either way. The best way to deal with maven failures is to revert everything and try again with different parameters.
  • The best approach to develop hudson plugins is to adapt the old “monkey see, monkey do” process. There are so many plugins already, chances are good your immediate question was already answered somewhere. Just check the found solution for accidental complexity. Sometimes, the first solution isn’t the easiest.
  • When dealing with the legacy Win32 API, combined knowledge scraping is king. We had discussions throughout the day that only consisted of little parts of recollections about knowledge that seemed long forgotten. But finally, we put the pieces together and solved the problem. It should be called teamthink, i guess.

Retrospective on the OSLD

The weather at this OSLD was way too hot to operate at normal speed. But we got some nice results and a cliffhanger for the next OSLD. We left soaked with sweat but happy that evening.

Improved Version of CMake Builder for Hudson

Today I just want to give a small round-up of the improvements made on the cmake builder plugin since my last blog post. Back then, version 1.2 was released to support master/slave configurations. As of yesterday, we are at version 1.5 which contains the following improvements/bug-fixes:

  • Bug: The drop-down box for selecting the build type didn’t remember its value. This was fixed with a patch by Atte Timonen.
  • Improvement: Also included in Atte’s patch was the propagation of environment variables to the cmake command which now allows to do parameterized builds. A big thank’s to Atte!
  • Improvement: The install command gets only executed when install directory and install command is given. Before, the build was either broken or $WORKSPACE was used automatically as install directory. Thanks to Dat Chu for his feedback.
  • Improvement: The one-line ‘Other CMake Arguments’ field can get full pretty quickly, so it was changed to a multi-line text-area.

Thank’s again for the feedback, and have fun with the new version!

CMake Builder Plugin in Master/Slave Setups

The first versions of the cmake builder plugin were developed more or less only driven by our own needs. As people began to use it an issue came up that we hadn’t considered yet: distributed builds, a.k.a master/slave mode. So on our first OSLD in 2010 I looked into the plugin and began to rectify the situation.

My test setup consisted of a hudson master on WindowsXP box which was connected via SSH to a slave node in a Ubuntu virtual machine. The first errors were easy to find. The plugin tried to find all configured paths on the windows host and not on the ubuntu slave.

Experience from our previous Crap4J plugin development and a quick read here brought me on the right track. It’s not a good idea to use just java.io.File if you want your plugin to be master/slave capable – use hudson.FilePath instead.

So after replacing all java.io.File occurrences with hudson.FilePath the situation was much better. The plugin handled all paths correctly but still produced errors when calling cmake. I quickly discovered that java.lang.Process and java.lang.ProcessBuilder were used to call “cmake -version”. Again, not a good idea – hudson.Launcher is your friend here.

After replacing Process with Launcher I had only one strange error left. The following launcher call using a nice fluent interface wouldn’t execute on the remote machine but insisted to execute locally.

launcher.launch().cmds(cmakeCall).envs(environmentVars)
   .stdout(listener).pwd(workDir).join();

When I changed it to the seemingly equal statement

launcher.launch(cmakeCall, environmentVars,
    listener.getLogger(), workDir).join();

it worked like a charm.

After all those changes I proudly present the newest version of CMake Builder Plugin which is now ready to be used in distributed environments.

Only one little unpleasantness still exists, though: when configuring the make and install commands the plugin tries to find the executables on the PATH of host machine. For now, you can just ignore the error message. I try to look into it, soon. Apart from that, have fun with the new version.

Open Source Love Day March 2010

Yesterday, we held our first Open Source Love Day (OSLD) for this year. The last OSLD was at December 2009. Then, we reassigned a day in January and February each to perform our relocation to the new (and much bigger) office. But now we are back to regular duty and had the time to donate some work back to the Open Source ecosystem.

The Open Source Love Day

We introduced a monthly Open Source Love Day to show our appreciation to the Open Source software ecosystem and to donate back. We heavily rely on Open Source software for our projects. We would be honored if you find our contributions useful. Check out our first OSLD blog posting for details on the event itself.

Participate at our OSLD by using the features we’ve built today:

  • Grails still has some bugs. Instead of only complaining about them, we try to fix them. There is a bug with checkboxes and nested boolean properties that bugged us in a customer project. It’s filed unter GRAILS-3299 and has a proposed patch now.
  • In previous OSLDs, we produced the cmake hudson plugin. In the corresponding blog entry, comments with bug reports began to pile up. They addressed issues with hudson master/slave setups. So we implemented a hudson master/slave test environment, using VirtualBox virtual machines to perform as slaves. This setup quickly revealed the problems that were typical enough to devote a complete blog entry about this topic soon. Fixing the problems resulted in the new cmake hudson plugin version 1.2 to be released yesterday.
  • We are using the RXTX project to perform serial (RS232) communication in several projects. We are really glad the project exists, because the “official” communications API from Sun/Oracle is nothing but a mess. With RXTX, we only had a problem with emulated COM ports. Emulated COM ports exist when you use a USB->Serial or Ethernet->Serial converter, which is what our customer chose to do. If you unplug the converter during operation, the corresponding COM port disappears. This causes RXTX to crash, bringing the JVM down, too. We wrote a test application and used it with every converter we own (and we own quite a lot of them!). Then we began tracing the RXTX source code (at C code level), altering it to “only” throw an IOException when the virtual COM port disappears. The corresponding patch will be proposed to the RXTX project soon.
  • Another API we use a lot is the tiny winp project, written by Kohsuke Kawaguchi, the creator of hudson. We kill Windows processes with it, within a project that runs on Windows 2000, Windows XP and Windows 7. The latest Windows version seemed incompatible with winp, even the 32bit edition. We didn’t find the cause for this, but developed a workaround that will be proposed to the winp project soon.

What were our lessons learnt today?

  • If you face OutOfMemoryErrors on a 64bit Java6 JVM, try to switch back to a 32bit Java5 JVM. It helped us with our Grails bugfixing (during the test phase).
  • Hudson Master/Slave support for plugins isn’t particularly hard. It’s just that you need to be aware of the topic and replace some types like java.io.File. We gathered the same experience twice with our Crap4J plugin and the cmake plugin. It’s time to tell the world about it. Stay tuned!
  • The good old error return code is an error prone coding paradigm, because all too often, users of a function/method just forget to check the returned result. This was the case with a call to WaitForSingleObject in RXTX.
  • If you don’t understand an implementation well enough to fix the cause, you might at least be able to produce a workaround. It’ll work for you and provide guidance for the original author about where the bug might hide. This is why we count our winp efforts as success, too.
  • Your project either is mavenized or it isn’t. Everything in between is half-assed.

This OSLD was a bit short, as we had some guests in the evening, but nevertheless, it was fun. Well, to be precise, it was this special software engineer’s type of fun: The whole company was remarkably quiet most of the day, with everyone working totally focussed. We scratched our own itches, enhanced our customer projects and contributed to the open source community. A very good day!

Stay tuned if you want to know more about the specifics of the hudson plugin development or the to-be-proposed patches. We will publish them here.

Speed up your buildbox, Part IV: Beyond the box

© Friedberg - Fotolia.comIn the first three parts of our effort to speed up our buildbox, we replaced the harddisk with a RAM disk, upgraded the CPU to the top-notch model and installed plenty of fast RAM. This brought the build time down from 03:30 minutes to around 02:00 minutes. The CPU frequency was the biggest time saving factor in our case study. Two minutes is as fast as the build can get for our project without fiddling with the actual build process. It’s sufficient for our case, but it may not for yours.

Even top speed is too slow

Lets assume we maxed out the hardware and still have a build duration far beyond the magical ten minutes mark. What can we do now? There are two viable options at hand if you can exclude the possibility that your build process is really inefficient and needs optimization. In the latter case, it would be better to revise the process instead of the build infrastructure.

Two ways to speed up your build infrastructure

You can go down one or both of two general paths to speed up your build process. To understand the examples, lets assume the build takes 20 minutes to run on your top-notch build box.

  • Add more build boxes. This is the classical “parallelize it!” approach. It won’t speed up the individual build process, but enable more processes to run at the same time. This approach wont change anything if your team does seldom check-ins, which in itself is an anti-pattern to continuous integration. But if your team commits changes every ten minutes, having at least two build boxes will prevent the second committer from waiting 30 minutes on the CI results. Instead, the results will always be there after 20 minutes. You haven’t exactly sped up your build process, but the maximal waiting time of your committers. For details on the implementation, see below at “Growing a build park“.
  • Chop up your build process. This is known as “staging” or “pipelining” your build. This won’t speed up the individual build process, either, but deliver certain partial results of your build earlier. Lets assume you can split your build process into four distinct stages: compile, unit test, integration test, package. Whenever a stage yields a result, the comitter gets feedback immediately. In our example, this might be every 5 minutes. This has several disadvantages, as for example discussed in the article “The pipeline of doom” by Julian Simpson, but can lower the waiting time for specific aspects of your build drastically. You haven’t exactly sped up your build process, but the response time for partial results and therefore the average waiting time of your committer. For details on the implementation, see below at “Installing a build pipeline“.

Growing a build park

If you want to reduce the initial waiting delay of a build before it gets processed or increase the throughput of builds, the build farm pattern is your way to go. By adding slave build machines to your build master, you can distribute the workload on more shoulders. The best way to set up your infrastructure is to introduce a dedicated master box that only delegates actual builds to its slaves. The master box handles the archivation of build artifacts and deals with the web server requests, while the slaves only perform build tasks. The master box can be of average power, with increased storage size, while the slaves should be ultra-fast, without the need of big disks. Solid state disks or even RAM disks of the slaves can be tuned to actual workspace sizes, as it is all that needs to be stored there.

Distributed builds with Hudson

The Hudson continuous integration server has a strength in setting up these master/slave scenarios. It’s ridiculously easy to set up a build slave. You basically only need to click on a link to start the slave process. If you happen to have a standard build, everything needed gets downloaded automatically. If you want your slaves to operate automatically, you can install a windows daemon, provide a SSH account or write your own script. Usually, slaves are set up in a matter of minutes without hassle. A great idea is to turn powerful collegue boxes into build slaves (aka CI zombies) by booting an USB stick. The best way to start with master/slave builds is to turn your current PC into a hudson slave right now by using the Java Web Start method.

Installing a build pipeline

If you are interested in early but incomplete feedback from your build box, staging your build will help you out. If partitioned right, you’ll receive a series of answers on specific questions from your build process. The questions may be like:

  1. Will it compile?
  2. Will it pass the unit tests?
  3. Will it function (pass the integration tests)?
  4. Will it blend?

Ok, the last question is rather unlikely to be answered by your build box. The overall build process will not be any faster, but basic safety test results are reported earlier. If you combine this approach with distributed builds, there is the possibility to designate specifically tuned machines to different stages. The Hudson continuous integration server has the ability to tag a slave with different labels. You can then configure your build to only run on slaves with the desired label assigned.

Staged builds with Hudson

Staging with the Hudson continuous integration server isn’t as easy as the master/slave feature, but there are some plugins that allow for more complex setups. You might experience some functionality that’s still under development, but basic staging is possible even today. In combination with specialized slave build boxes, this approach can lower your build duration. It is a a complex endeavour, though.

Conclusion

Once your single build box is maxed out but still not fast enough, you enter a different realm of continuous integration infrastructure setups. Speeding up a build process beyond the single box isn’t as easy as installing more RAM. But with a fair amount of planning, you have a fair chance to improve the situation. Note that you won’t primarily lower build duration, but increase throughput and utilize partitioning and specialization. These are different measures and might not affect the wall clock time of your build. The combination of staging and distribution is the most powerful setup, but will result in the most complex infrastructure to maintain. Before entering this realm, be sure to apply any possible optimization to your build process. Because you’ll not leave that realm again soon.

What’s your story on build optimization beyond the box? Drop us a comment.