Breakpad and Your CI – A Strong Team

Google’s breakpad together with your CI system can prepare you for the worst.

If your C++ software has to run 24/7 on some server rack at your customer’s data center, it has to meet not only all the user requirements, but also requirements that come from you as a developer. When your customer calls you about some “problems”, “strange behaviours”, or even crashes, you must be able to find out what went wrong. Fast!

One means to this end is of course logging. But if your application crashes, nothing beats a decent stacktrace 🙂

Google’s breakpad library comes in very handy here because it provides very easy crash reporting. Even if your process has 2 gigs of virtual memory, breakpad shrinks that ‘core dump’ down to a couple of megs.

Breakpad pulls that trick off by using so-called symbol files that you have to generate for each compiled binary (executable or shared library). These symbol files together with the breakpad dump file that is created at crash time are then used to recreate the stacktrace.

Because every compilation creates different binaries, the dump file and the symbol files need to be ‘based on’ exactly the same binaries.

This is where you can let your CI system do some work for you. At one of our customers we use Jenkins not only for the usual automatic builds and tests after each check-in but also for release builds that go into production.

At the end of each build, breakpad’s symbol dumper runs over all compiled executables and libraries and generates the symbol files. These are then archived together with the compiled binaries.
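Where the archived symbol files end up matters. As far as I know, breakpad’s minidump_stackwalk looks up a symbol file as <module name>/<debug identifier>/<module name>.sym below the symbol directory, and both values appear in the first line of every symbol file (MODULE <os> <arch> <id> <name>). The following is a minimal sketch of a hypothetical helper (symbolStorePath is not part of breakpad) that a CI archiving step could use to compute that path:

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of breakpad): computes the relative path
// <name>/<id>/<name>.sym from the first line of a breakpad symbol file,
// which has the form "MODULE <os> <arch> <id> <name>".
std::string symbolStorePath(const std::string& moduleLine)
{
    std::istringstream in(moduleLine);
    std::string keyword, os, arch, id, name;
    if (!(in >> keyword >> os >> arch >> id) || keyword != "MODULE") {
        throw std::invalid_argument("not a breakpad MODULE line: " + moduleLine);
    }
    // The module name is the rest of the line and may contain spaces.
    std::getline(in >> std::ws, name);
    if (name.empty()) {
        throw std::invalid_argument("MODULE line without a name: " + moduleLine);
    }
    return name + "/" + id + "/" + name + ".sym";
}
```

A CI job could read the first line of each generated .sym file, copy the file to symbols/ plus the returned path, and archive that directory tree together with the binaries.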

Now we are prepared. Whenever a customer sends us a dump file, we can simply pull out the symbol files corresponding to the software version running at this customer and let breakpad do its magic…

Debug Output

Crafting debug output from std::istream data can be dangerous!

Writing a blog post can sometimes be a useful way to get a face-palm kind of programming error out of one’s system.

Putting such an error into written words then serves a couple of purposes:

  • it helps me remember
  • it helps readers avoid making the same mistake
  • it serves as error log for future reference

So here it comes:

In one project we use JSON to serialize objects in order to send them over HTTP (we use the very nice JSON Spirit library, btw).

For each object we have serialize/deserialize methods which do the heavy lifting. After having developed a new deserialize method I wanted to test it together with the HTTP request handling. Using curl for this I issued a command like this:

curl -X PUT http://localhost:30222/some/url -d @datafile

This command issues a PUT request to the given URL and uses data in ./datafile, which contains the JSON, as request data.

The request came through but the deserializer wouldn’t do its work. WTF? Let’s see what goes on – let’s put some debug output in:

MyObject MyObjectSerializer::deserialize(std::istream& jsonIn)
{
   // debug output starts here
   std::string stringToDeserialize;
   Poco::StreamCopier::copyToString(jsonIn, stringToDeserialize);
   std::cout << "The String: " << stringToDeserialize << std::endl;
   // debug output ends here

   json_spirit::Value value;
   json_spirit::read(jsonIn, value);
   ...
}

I’ll give you some time to spot the bug… 3..2..1.. got it? Please check the Poco::StreamCopier documentation if you are not familiar with the POCO libraries.
What’s particularly misleading is the “Copier” part of the name StreamCopier, because it does not leave the bytes in the stream: it reads them out of the stream into the string, so in effect it moves them. This means that after the debug output code, the istream is exhausted and json_spirit::read has nothing left to parse.
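One way to keep such debug output harmless is to read the stream exactly once and hand the parser a fresh stream over the same bytes. The sketch below uses only the standard library; readAll and readAndLog are hypothetical helper names, and the JSON Spirit call appears only in a comment:

```cpp
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>

// Reads everything that is left in `in`. Like Poco::StreamCopier::copyToString,
// this consumes the data: afterwards `in` has nothing left to give.
std::string readAll(std::istream& in)
{
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical debug helper: reads the request body once, logs it, and
// returns the bytes so the caller can parse them from a fresh stream.
std::string readAndLog(std::istream& in, std::ostream& log)
{
    std::string payload = readAll(in);
    log << "The String: " << payload << '\n';
    return payload;
}

// Usage inside deserialize() (JSON Spirit call shown as a comment only):
//   std::string body = readAndLog(jsonIn, std::cout);
//   std::istringstream replay(body);
//   json_spirit::read(replay, value);  // parses the logged bytes, not an empty stream
```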

Unfortunately, I did not immediately notice the change in the error output of the JSON parser, which might have pointed me to the real problem. Instead, I spent the next half hour searching for errors in the JSON I was sending.

When I finally realized it …

Readability of Boolean Expressions

Readability of boolean expressions lies in the eyes of the beholder.

Following up on various previous posts on code readability and style I want to provide two more examples today – this time under the common theme of “handling of boolean values”.

Consider this (1a):

bool someMethod()
{
  if (expression) {
    return true;
  } else {
    return false;
  }
}

Yes, there are people who consider this more readable than (1b)

bool someMethod()
{
  return (expression);
}

Another example is this (2a):

  if (someExpression() == true)
    ...

versus my preferred version (2b):

  if (someExpression())
    ...

So what could be the reason for these different viewpoints? One explanation I thought of is the following: let’s say you have a background in C and are therefore used to doing something like:

#define FALSE (0)
#define TRUE (!FALSE)

In other words, you may not see boolean as a type of its own, like int and double, with a well-defined value range. Instead, you see it more like an enumerated type, which makes an expression == true comparison feel very natural.

At the same time, it does not feel natural to treat the result of a boolean expression as a value of type bool with all the consequences, e.g. being able to return it directly as in the first example.
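The C habit also has a practical catch that using the value directly avoids. A minimal sketch, with a hypothetical hasWriteFlag function standing in for C-style code that signals “true” as any non-zero int: comparing against TRUE (which is 1) misses a set flag, while letting the int convert to bool, as in 1b and 2b, does not.

```cpp
#define FALSE (0)
#define TRUE (!FALSE)

// Hypothetical C-style predicate: like many C library functions (isdigit,
// for instance), it reports "true" as some non-zero int, not necessarily 1.
int hasWriteFlag(unsigned mode)
{
    return static_cast<int>(mode & 0x4u);
}

// 2a style: compares against TRUE, i.e. only succeeds if the value is exactly 1.
bool writableByComparison(unsigned mode) { return hasWriteFlag(mode) == TRUE; }

// 1b/2b style: the int converts to bool, so any non-zero value counts as true.
bool writableByCondition(unsigned mode) { return hasWriteFlag(mode); }
```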

Another explanation is that 1a and 2a are as verbose as they can be: you don’t have to make any mental effort to understand what the code does.

While these may be possible explanations, my guess is that most of you, like me, still see 1a and 2a as unnecessary visual clutter and consider 1b and 2b far more readable.

The Great Divide

There is a great divide in the C++ developer community between “normal” developers that use only basic language features and very savvy ones that know every little corner of the language. The upcoming C++ standard deepens this divide even more.

Recently, I had two very contrasting conversations about C++ which illustrate the great divide in the C++ developer community very well.

The first was with the technical lead of a team that writes and maintains drivers and control software for a scientific institution. These systems run 24/7 and have to be very stable and reliable.

I had discovered that they use a self-written toolbox library containing classes like SharedPtr<T> and Thread, and I immediately suspected a classic case of NIH syndrome. I asked him about it and why they don’t use well-established libraries like boost. He told me that they indeed use only the standard library and their own toolbox.

The reason he gave was that, despite boost being the most elegant C++ library out there, it required very good knowledge of the most advanced C++ mechanisms, and that his team was not at this level … I should probably mention here that his team does a very good job of running their systems. So, apparently, they get along very well using only basic C++ features and no “fancy” boost stuff.

The other conversation was with a friend of mine with whom I regularly chat about all sorts of programming-related stuff. This time the topic was the upcoming C++ standard and all its exciting new stuff. He has lots of experience with C++ and knows the language very well. But even someone like him had a hard time really understanding what rvalue references are all about. I had not yet looked at them in detail, so he tried to explain them to me. During our discussion I wondered whether teams like the one introduced before will ever use rvalue references or other C++0X stuff in their production code, other than maybe the auto keyword for type inference, or constructor delegation.

Honestly, I don’t think stuff like rvalue refs will become a feature that is often used by “standard industry” teams, because it adds a lot of complexity to an already complex language. Even easy-to-get stuff like the new keyword constexpr, the new override and final specifiers, or additional means of initialization like std::initializer_list<T> will take a long time to be used regularly by most C++ teams.
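For readers who have not seen them yet, here is a small C++11 sketch of those “easy-to-get” features. Sensor, ThermoSensor, square and sumOf are made-up names for illustration only:

```cpp
#include <initializer_list>
#include <numeric>

// constexpr: can be evaluated at compile time, e.g. inside static_assert
constexpr int square(int x) { return x * x; }
static_assert(square(4) == 16, "evaluated by the compiler");

struct Sensor {
    virtual ~Sensor() {}
    virtual int channelCount() const = 0;
};

// final: no class may derive from ThermoSensor
struct ThermoSensor final : Sensor {
    // override: the compiler rejects this if it does not override a base-class virtual
    int channelCount() const override { return 2; }
};

// std::initializer_list: lets callers pass a braced list, e.g. sumOf({1, 2, 3})
int sumOf(std::initializer_list<int> values)
{
    return std::accumulate(values.begin(), values.end(), 0);
}
```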

Instead, most of C++0X will greatly increase the divide between “normal” C++ developers who get along well with using only basic language features, and experts that know every little corner of the language. And this is simply because there is so much more to know with C++0X.

But let’s not paint too black a picture. I, for one, am looking forward to the new standard, and I will certainly spread the word about the new possibilities and features in every C++ team I work with.

Bogus Error Messages with Qt .ui Files

Name your Qt Forms correctly and you will save lots of debugging time.

Bogus errors and their messages can have a large number of causes, full hard drives being one of the classics. When it comes to programming, and especially C++, the possibilities for cryptic, meaningless and misleading error messages are endless.

A nice one bit us at one of our customers the other day. The message was something like

QLayout can only have instances of QWidget as parent

and it appeared on standard error during program start-up. Needless to say, the whole thing crashed with a segmentation fault after that. The only change that had been made was a header file that was added to the Qt files list in the CMakeLists.txt file. The Qt class in this header file was just in its beginnings and did not yet contain any QLayouts or QWidgets. Even the standard C++ measure of cleaning and recompiling everything didn’t help.

So how is it possible that an additional Qt header file that has no references to QLayout and QWidget can cause such an error message?

As all of you experienced C/C++ developers know, for the compiler a code file is not only the stuff that it contains directly but also everything that is #included! The offending header file included a generated ui description file, which you get when you design your windows (or Forms in Qt terminology) with the Qt designer and use the compile-time form processing approach to incorporate the form into the code base.

But how can that affect anything?

The Qt designer saves the forms into .ui files. From these, the so-called User Interface Compiler (uic) generates a header file containing a C++ class together with inlined code that creates the form. Form components like line edits or push buttons are generated as instance attributes. The name of the class is generated from the name of the form. You can even use namespaces: by naming the form e.g. myproject::BestFormEverDesigned, the generated class is named BestFormEverDesigned and is put into namespace myproject.

So far, so nice, handy and easy to use.

When you create a new form in Qt designer, the default name is Form. Maybe you can already guess where this leads…

Two forms whose respective developers forgot to set a proper name existed in the same sub-project and had been compiled and linked into the same shared library. The compiler has no chance to detect this, because it sees only one

class Form
{

at a time. The linker happily links all of this together: the generated code is inline, so it silently keeps one definition as if all Forms were created equal. And then at run-time … Boom!

I will have to look into a little Jenkins helper that breaks the build when a form named Form is checked in…
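Such a helper could start out as simple as the following sketch. usesDefaultFormName is a hypothetical name (not a Qt or Jenkins API), and the check assumes that Qt Designer stores the form name in the <class> element of the .ui XML, so a fresh, unnamed form contains <class>Form</class>:

```cpp
#include <string>

// Hypothetical CI check: returns true if a .ui file still carries Qt Designer's
// default form name. Assumes the form name lives in the <class> element of the
// .ui XML, which is what uic derives the generated class name from.
bool usesDefaultFormName(const std::string& uiXml)
{
    return uiXml.find("<class>Form</class>") != std::string::npos;
}
```

A CI step would run this over every checked-in .ui file and fail the build on the first match; a properly named form such as myproject::BestFormEverDesigned does not match.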

How to accidentally kill your CI build time

At one of our customers I do C++ consulting in a mid-sized project which uses cmake as build system. A clean build on our Jenkins CI server takes about 40 minutes (including unit tests) which is way too long to be considered “fast feedback” in an agile kind of way.

Because of that, we do clean builds only twice a day: once during the night and once during lunch break. The rest of the day, the CI server only does an “svn update” and a normal “make”, which takes about 3-10 minutes depending on which files have changed.

With C++ there are lots of ways to unnecessarily lengthen your build time. The most important factor is, of course, #include dependencies. One has to be very (very) disciplined in adding #include directives to header files. Otherwise, the whole world suddenly gets rebuilt when some small header file in a little corner of the code has been changed.

And I have to say, for the most part, this project is in pretty good shape with regard to #include dependencies.

“So what the hell has suddenly increased our build time from 3-10 minutes to 20-25 minutes?” was what I was thinking some time last week while waiting for the CI server to spit out the latest and greatest rpm packages. For some reason, our normal, rest-of-the-day build had started to compile what felt like everything in our main package, even on the slightest change in a remote .cpp file.

What happened?

In order to have the build time available (e.g. to show it in an “about” box), we use a preprocessor symbol like REVISION_DATE which is set in a CMakeLists.txt file. The whole thing looks like this:

...
EXEC_PROGRAM(date ARGS '+%F_%T' OUTPUT_VARIABLE REVISION_DATE)
...
ADD_DEFINITIONS(-DREVISION_DATE=\"${REVISION_DATE}\")
...

Since the beginning of time, these lines of CMake code had lived in a small sub-sub-..-directory with few or no incoming dependencies. Then, at some point, it became necessary to have the REVISION_DATE symbol in another place, too, which led to moving the above code into the CMakeLists.txt file of the main package.

The output of the command date +%F_%T changes every second, so REVISION_DATE changes on every build, which is what we initially intended. But of course the value passed to the ADD_DEFINITIONS directive changes, too. And as CMake is very strict about the slightest change in this value, every make target below that line gets rebuilt, which in our case was everything in the main package.

So there! Build time killing creatures are lurking everywhere in our C/C++ projects. Always be aware of them!

Looping in C++

What is “the best” way to loop over collections in C++?

One recurring discussion point in the C++ project team at one of our customers is the following:

What is “the best” way to loop over collections?

In a typical scenario there is a standard container like std::list, or some equivalent collection, and the task is to do something with every element in the collection. The straightforward way would be this:

std::list<std::string> mylist;
for (std::list<std::string>::iterator iter = mylist.begin(); iter != mylist.end(); iter++)
{
   ...
}

This code is correct and readable. But my guess is that most of you instantly see at least two possible improvements:

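  1. mylist.end() is called on every iteration; for standard containers this is a constant-time call, but for other collections, or in unoptimized builds, it may not be free
  2. iter++ creates an unnecessary temporary copy of the iterator, because post-increment has to return the old value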

So this

for (std::list<std::string>::iterator iter = mylist.begin(), end = mylist.end(); iter != end; ++iter)
{
   ...
}

would be much better but can already be seen as a little less readable.

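Using BOOST_FOREACH can save you much of this still tedious code, but it has one nasty pitfall when it comes to std::maps: since BOOST_FOREACH is a macro, the comma in an element type like std::pair<const std::string, int> is taken as a macro argument separator, so you need a typedef for the element type.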

In some places of the code base, std::for_each is used together with a function or function object. The downside of this is that the function/function object code is not located where the loop occurs. However, this can be made “readable enough” when the function or function object does only one thing and has a telling name.
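A small sketch of that style, with made-up names: the function object does exactly one thing, and its name tells the reader what the loop does without having to look up its definition.

```cpp
#include <algorithm>
#include <list>
#include <string>

// Hypothetical function object that does exactly one thing and says so in its name.
class AddLengthTo {
public:
    explicit AddLengthTo(std::string::size_type& total) : total_(total) {}
    void operator()(const std::string& s) const { total_ += s.size(); }
private:
    std::string::size_type& total_; // refers to the caller's accumulator
};

std::string::size_type totalLength(const std::list<std::string>& names)
{
    std::string::size_type total = 0;
    std::for_each(names.begin(), names.end(), AddLengthTo(total));
    return total;
}
```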

Looping is sometimes done to create another collection with one new object per element. What to do there? Define the new collection and fill it in a for loop or with BOOST_FOREACH as above, or use std::transform with the same downside as std::for_each?
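For comparison, the std::transform variant for building a new collection might look like this (toUpper and upperCased are hypothetical names). std::back_inserter lets the algorithm append to the new container, while toUpper again lives away from the call site:

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical transformation: one output element per input element.
std::string toUpper(std::string s)
{
    for (std::string::iterator it = s.begin(), end = s.end(); it != end; ++it) {
        *it = static_cast<char>(std::toupper(static_cast<unsigned char>(*it)));
    }
    return s;
}

std::vector<std::string> upperCased(const std::list<std::string>& names)
{
    std::vector<std::string> result;
    result.reserve(names.size());
    std::transform(names.begin(), names.end(), std::back_inserter(result), toUpper);
    return result;
}
```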

The other day one team member suggested using boost::lambda expressions in loops. The initial usage examples were very promising, but let me tell you: readability can drop dramatically very fast if you are not careful. It is very easy to get carried away with boost’s lambdas. It happened that we found ourselves having spent the last hour carving out a super crisp lambda expression that would take anybody else another hour to read.

So the initial question remains undecided and will most likely stay like that. As for everything else in programming, there doesn’t seem to be a silver bullet for this task.

How do you go about looping in C++? Do you have some kind of coding style in place? Do you use std::for_each, BOOST_FOREACH, or some other means?

Looking forward to some feedback.

CMakeBuilder Version 1.9

Introducing CMakeBuilder plugin version 1.9.

Today, I want to announce version 1.9 of the CMakeBuilder plugin for Jenkins (formerly known as Hudson). Judging from the user feedback, there are no major missing features, at least for the moment.

So for this version, I implemented only one visible enhancement: it is now possible to use environment variables in every configuration setting. Even settings like “Preload Script”, “Make Command” or “Install Command” can now use environment variables.

The major invisible change I did was the migration to the Jenkins development infrastructure using this very helpful guide. Moving the whole thing to git will be next.

Check it out!

Podcasts

Podcasts are a very good means to shorten your commute, to keep you entertained during otherwise boring house-keeping activities, or, if you’re into sports, during your training sessions. Here is a list of some of my favourite shows.

This Developer’s Life

Rob Conery and Scott Hanselman interview developers and other IT professionals who share their stories. Very interesting, very well edited and flavoured with some nice pieces of music.

TechZing

Basically, TechZing is two guys, Jason Roberts and Justin Vincent, who discuss different topics concerning their lives as freelance web developers and startup bootstrappers. They very much enjoy just talking to each other, which is already very entertaining. The occasional interview and panel shows are the icing on the cake.

It’s impossible to give a clear range of topics, since they include technical stuff like ‘how to store images in web applications’, SEO, NoSQL, JavaScript and iPhone development, but also non-IT stuff like Pioneer One, geological challenges, and the Luck-Surface-Area. Edutainment at its best! Highly recommended!

Software Engineering Radio

This is purely an interview show which addresses all sorts of topics of interest for professional software developers: languages, platforms, technologies, methodologies, etc. Very informative, high profile guests and very competent hosts. Unfortunately, the output rate has gone down a lot in the last year.

Software ArchitekTOUR Podcast

This German-speaking podcast (with little bits of Swabian) is mostly concerned with topics around software architecture, as the name already suggests. DSLs, NoSQL databases and REST have been some of the latest topics.

FLOSS Weekly

Randal Schwartz (mostly) and other hosts are talking about Free Libre Open Source Software projects, ranging from whole OSes like CentOS to smaller niche projects like Ledger. Great show if you want to know what’s going on in the Open Source world.

Security Now

Steve Gibson and Leo Laporte talk about everything related to IT security. This will keep you informed about the latest browser vulnerabilities, Adobe Flash updates and Windows patches. But you will also learn e.g. how SSL works, the details of Stuxnet and everything about Bitcoin. Don’t miss the all-time favourite episode 248: The Portable Dog Killer.

What are your favourite shows?

Old Code

Why bother buying Stephen King’s horror books, just take a look at your old code.

There is a saying that if you are not embarrassed by code you wrote six months ago, you haven’t learned anything. Recently, I stumbled upon a C/C++ project that dates back to the very early days of my programming career (this was many * six months ago), and I can tell you, I was very embarrassed.

I had just “learned” C++ and object-orientation at that time and, of course, wanted to program that way. The result was terrible. The only small piece of object-orientation was the use of the keyword class. There were public fields all over the place, no interfaces or abstractions of any kind, switches over type ids, and so on.

Another highlight was the vast number of literals scattered all over the code. For example, as it was a curses-based application, I had to read and display user input using curses functions like

int mvwgetch(WINDOW *win, int y, int x);

and

int mvwaddch(WINDOW *win, int y, int x, const chtype ch);

And what did I do? I hard-coded the y and x positions in every call of those functions. So it would often happen that I changed, say, the y position in one place and … well, you guessed it already.
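What I should have done is give every position a name and define it in exactly one place. A minimal sketch in today’s C++; ScreenPos, rightOf and the two layout constants are made-up names, and the curses calls appear only in comments:

```cpp
// Hypothetical screen layout: every position is defined exactly once, and
// related positions are derived from each other instead of repeating literals.
struct ScreenPos {
    int y;
    int x;
};

constexpr ScreenPos rightOf(ScreenPos p, int columns)
{
    return ScreenPos{p.y, p.x + columns};
}

constexpr ScreenPos kNameLabel{2, 4};
constexpr ScreenPos kNameInput = rightOf(kNameLabel, 8); // moves along with the label

// With curses this would be used as:
//   mvwaddch(win, kNameInput.y, kNameInput.x, ch);
//   int c = mvwgetch(win, kNameInput.y, kNameInput.x);
```

Moving the label now moves the input field with it, which is exactly the kind of change that used to break half of my screens.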

Naming of variables was also a big issue. Boolean values would often be called “flag”, and a name longer than four characters was considered way too long.

But there was also progress. In later parts of the software I started to use “advanced” things like auto_ptrs, std::list, and std::map. Hooray!

The only positive thing about this project was that, since I made every possible mistake one can imagine, I learned quite a lot about programming. And I remember that at the end of the project, I was already very embarrassed about the whole thing…

So if you like reading horror stories, try digging up your old code 😉 And share if you like.