Breakpad and Your CI – A Strong Team

Google’s breakpad together with your CI system can prepare you for the worst.

If your C++ software has to run 24/7 on some server rack at your customer’s data center, it has to meet not only all the user requirements, but also the requirements that come from you as the developer. When your customer calls you about some “problems”, “strange behaviours”, or even crashes, you must be able to find out what went wrong. Fast!

One means to this end is of course logging. But if your application crashes, nothing beats a decent stacktrace 🙂

Google’s breakpad library comes in very handy here because it provides very easy crash reporting. Even if your process has 2 gigs of virtual memory, breakpad shrinks that ‘core dump’ down to a couple of megs.

Breakpad pulls that trick off by using so-called symbol files that you have to generate for each compiled binary (executable or shared library). These symbol files together with the breakpad dump file that is created at crash time are then used to recreate the stacktrace.

Because every compilation creates different binaries, dump file and symbol files need to be ‘based on’ exactly the same binaries.

This is where you can let your CI system do some work for you. At one of our customers we use Jenkins not only for the usual automatic builds and tests after each check-in but also for release builds that go into production.

At the end of each build, breakpad’s symbol dumper runs over all compiled executables and libraries and generates the symbol files. These are then archived together with the compiled binaries.

Now we are prepared. Whenever a customer sends us a dump file, we can easily pull out the symbol files that match the software version running at that customer’s site and let breakpad do its magic…

 

Deployment with the Play! framework

Play! is a great framework for Java-based development of modern web applications. Unfortunately, the documentation about deployment options is not really that extensive in certain details. I want to describe a way to automatically build a self-contained zip archive without the source code. The documentation does state that using the standalone web server is preferred, so we will use that option.

Our goal is:

  • an artifact with the executable application
  • no sources in the artifact
  • startup scripts for different platforms and environments
  • CI integration with execution of the tests

Fortunately, the Play! framework makes most of this quite easy if you know some small tricks.

The first very important step towards our goal is embedding the whole Play! framework somewhere in your project directory. I like to put it into lib/play-x.y.z (x.y.z being the framework version). That way you can perform all necessary calls to the play scripts using relative paths and provide a self-contained artifact which developers or clients may download and execute on their machine. You can also be sure everyone is using the correct (read “same”) framework version.

The next important thing is to write some small start scripts so you can demo the software easily on any machine with Java installed. Your clients may try it out themselves if the project policy is open enough. Here are small examples for Linux

#!/bin/sh
python lib/play-1.2.3/play run --%demo -Dprecompiled=true

and Windows

REM start our app in the "demo" environment
lib\play-1.2.3\play run --%%demo -Dprecompiled=true

The last ingredient to a great deployment and demoing experience is the build script which builds, tests and packages the software. We do not want to include the sources in the artifact, so there is a bit of work to do. We perform the following steps in the script:

  1. delete old artifacts to ensure a clean build
  2. call play to precompile our application
  3. call play to execute all our automatic tests
  4. copy all needed files into our distribution directory ready to be packed together
  5. pack the artifacts into a zip archive

Our sample build script is written for the Linux shell, but you can easily translate it to the scripting environment of your choice, be it Apache Ant, Gradle or Windows batch, depending on your needs and preferences:

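#!/bin/sh
# stop at the first failing command so a broken build is never packaged
set -e

# 1. delete old artifacts (-f: no error if they do not exist yet)
rm -rf dist test-result precompiled
# 2. precompile the application
python lib/play-1.2.3/play precompile
# 3. execute all automatic tests
python lib/play-1.2.3/play auto-test
# 4. copy all needed files into the distribution directory
TARGET=dist/my_project
mkdir -p "$TARGET/app"
cp -r app/views "$TARGET/app"
cp -r conf lib modules precompiled public "$TARGET"
cp programs/my_project* "$TARGET"
# 5. pack the artifacts into a zip archive
cd dist && zip -r my_project.zip my_project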

Now we can hook the project into a continuous integration server like Jenkins and let it archive the build artifact containing an executable installation of our web application. You could grant your client direct access to the artifact, use it for demos and further deployment steps like triggered upload to a staging server or the like.

HTTP Get: The problem with Percent Encoded Parameters

Encoding problems are commonplace in software development, but sometimes you get them in unexpected places.

About the setup: we have a web application written in Grails (though the choice of framework doesn’t really matter here) running on Tomcat. A Flash application sends an HTTP GET request to this web application.
As you might know, parameters in GET requests are encoded in the URL with the so-called percent encoding, for example %20 for a space. But how are the underlying bytes encoded? UTF-8?
Looking at our Tomcat configuration, all GET parameters are decoded as UTF-8. Great. But looking at the output of what the Flash app sends us, we see scrambled umlauts. Hmmm, clearly the Flash app does not use UTF-8. But wait! There’s another option in Tomcat for decoding GET parameters: look into the header and use the encoding specified there. A restart later, nothing changed. So Flash does not send its encoding in the HTTP header. Well, let’s take a look at the URI standard (RFC 3986), which defines percent encoding:

If a reserved character is found in a URI component and no delimiting role
is known for that character, then it must be interpreted as representing the
data octet corresponding to that character's encoding in US-ASCII.

Ah.. US-ASCII. And what about non-ASCII characters? Wikipedia states:

For a non-ASCII character, it is typically converted to its byte sequence
in UTF-8, and then each byte value is represented as above.

Typically? Not in our case, so we tried ISO-8859-1 and finally the umlauts are correct! But currency signs like the euro are again garbage. So which encoding is similar to Latin-1 but not quite the same?
Yes, guess what: cp1252, the Windows native encoding.
And we tested all this on a Mac?!
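To make the difference between the candidate encodings concrete, here is a minimal Java sketch. It is not taken from the original application: the class and method names are made up for illustration, and it assumes that the JRE ships the windows-1252 charset (mainstream JDKs do, but the Java specification does not guarantee it). It decodes the same percent-encoded bytes with java.net.URLDecoder under each of the charsets discussed above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class PercentDecodingDemo {

    // Decodes a percent-encoded value using the given charset name.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Charset not available: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // %FC is the single byte for "ü" in both Latin-1 and cp1252,
        // %80 is the euro sign in cp1252 only.
        String fromFlash = "%FC%80";
        for (String charset : new String[] {"UTF-8", "ISO-8859-1", "windows-1252"}) {
            System.out.println(charset + " -> " + decode(fromFlash, charset));
        }
    }
}
```

With ISO-8859-1 the byte 0x80 turns into an invisible control character, which matches the "umlauts fine, euro sign garbage" symptom. Only windows-1252 maps it to the euro sign.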

The Story of a Multithreading Sin

The story of a bug that was caused by a common multithreading pitfall, the dreaded liquid lock.

In my last blog entry, I wrote about multithreading pitfalls (in Java), and ironically, this was the week when we got a strange bug report from one of our customers. This blog entry tells the story of the bug and adds another multithreading pitfall to the five I’ve already listed in my blog entry “When it comes to multithreading, better be safe than sorry”.

The premise

We developed software that runs on several geographically distant, independent “stations” that collect a multitude of environmental measurement data. This data is preprocessed and stuffed into data packages, which are periodically transferred to a control center. The software of this control center, also developed by us, receives the data packages, stores them on disk and in a huge database and extracts the overall state of the measurement network from the raw data. If you describe the main task of the network on this level, it sounds nearly trivial. But the real functionality requirements are manifold and the project grew large.

We kept the whole system as modular as necessary to maintain an overall grasp of what is going on where in the system and established sufficient automated test coverage for the most important parts. The system is still under active development, but the main parts of the network have been in production use without real changes for years now.

The symptoms

This might explain why we were very surprised when our customer told us that the control center had lost some data packages. Very soon, it turned out that the control center would randomly enter a state of “denial”. In this state, it would still accept data packages from the stations and even acknowledge their arrival (so the stations wouldn’t retry the transmission), but only write parts of the package or nothing at all to the disk and database. Once the control center had entered this state, it would never recover from it. But when we restarted the software manually, everything would run perfectly fine for several days and then fall back into denial without apparent trigger.

We monitored the control center with every means at our disposal, but its memory consumption, CPU footprint and threading behaviour showed no noticeable problems even when the instance was in its degraded state. There was no exception or unusual entry in the log files. As the symptom happened randomly, without external cause and with no chance of reversal once it happened, we soon suspected some kind of threading issue.

The bug

The problem with a threading issue is that you can’t just reproduce the bug with a unit or system test. We performed several code reviews until we finally had a trace. When a data package arrives, a global data processing lock is acquired (so that no two data packages can be processed in parallel) and the content of the package is inspected. This might trigger several network status changes. These change events are propagated through the system with classic observer/listener structures, using synchronous calls (normal delegation). The overall status of the network is translated into a human-readable status message and again forwarded to a group of status message listeners. This is a synchronous call again.

One of the status message listeners was the software driver for an LED ticker display. This module was a recent addition to the control center’s hardware outfit and was used to display the status message prominently to the operators. Inside this LED software driver, some bytes are written to a socket stream and then the driver awaits an answer from the hardware device. To avoid the situation that two messages are sent to the device at the same time, a lock is acquired just before the message is sent. This code attracted our attention. Let’s have a look at it:

private Message lastMessage = new Message();

public void show(Message message) {
    synchronized (this.lastMessage) {
        writeCommandAndWaitForResponse(Command.SHOW_TEXT, message.asBytes());
        this.lastMessage = message;
    }
}

The main problem here is the object the lock is acquired upon: the reference of lastMessage is mutable! We call this a liquid lock, because the lock isn’t as solid as it should be. It’s one of the more hideous multithreading pitfalls, as it looks like everything’s fine at first glance. But this lock doesn’t have a complete “locking” effect because each caller may acquire the lock of a different instance. And a lock with a flawed locking behaviour is guaranteed to fail (in production). The liquid lock is like the bigger brother of the local lock. It isn’t local, but its mutability causes the same problems.

The bug finally turned out to be caused by the liquid lock in the LED display driver that got notified of system message changes when a data package arrived. It only showed under two conditions: if multiple messages were sent to the device at once, so that some of the necessary answers were discarded, or if the connection to the LED hardware failed in the midst of a transmission. In both cases, the system would not return from the write attempt. If one thread didn’t return to the data package processor, the global data processing lock would not be released (read the start of this chapter again, this is the most important lock in the system!). While the data processing lock was still held, all other data packages would still be received, but their processing threads would pile up waiting to obtain the lock. The lock, however, would never be released by the thread waiting for an answer from a hardware device that had no intention of sending another one. This was when the control center appeared to be healthy but didn’t process any data packages anymore.

The conclusion

If you want to avoid the category of liquid lock multithreading bugs, make sure that all your lock instance references are immutable. Being final is an important property of lock instance references. Avoid retrieving your locks from notoriously mutable data structures like collections or arrays. The best thing you can do to avoid liquid locks is to “freeze” all your lock instances.
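A minimal sketch of such a “frozen” lock could look like the following. This is not the actual driver code: the class name is invented, the Message type is replaced by a plain String to keep the example self-contained, and the counter fields exist only to demonstrate that no two senders ever overlap.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SolidLockDisplayDriver {

    // The lock reference is private and final: every caller synchronizes
    // on the same instance for the whole lifetime of the driver.
    private final Object sendLock = new Object();

    // The mutable state is kept apart from the lock.
    private String lastMessage = "";

    // Bookkeeping for the demonstration only: detects overlapping senders.
    private final AtomicInteger activeSenders = new AtomicInteger();
    private volatile boolean overlapDetected = false;

    public void show(String message) {
        synchronized (sendLock) {
            if (activeSenders.incrementAndGet() > 1) {
                overlapDetected = true;
            }
            // A real driver would write to the device and wait for its answer here.
            lastMessage = message;
            activeSenders.decrementAndGet();
        }
    }

    public String lastMessage() {
        synchronized (sendLock) {
            return lastMessage;
        }
    }

    public boolean overlapDetected() {
        return overlapDetected;
    }
}
```

Because sendLock can never be reassigned, replacing lastMessage inside the synchronized block no longer changes which object the next caller locks on.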

Another insight from this story is that software modules have to be separated threadwise, too. It was a major design flaw to let the data processing thread, while holding the main processing lock, descend down into the deep ends of the LED driver, eventually getting stuck there for infinity. Some simple mechanisms like asynchronous listener notification or producer/consumer queues for pending transmission requests would have helped to confine the effects of the liquid lock bug inside the LED module. Without proper thread separation, it took down the whole software instance.
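The producer/consumer idea can be sketched as follows. This is an illustration under assumptions, not the design we shipped: the class name, the bounded queue and the Consumer standing in for the slow LED device are all hypothetical. The data processing thread only enqueues the status message and returns immediately, while a dedicated worker thread talks to the device.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

class AsyncStatusNotifier {

    private final BlockingQueue<String> pending;
    private final Thread worker;

    AsyncStatusNotifier(int capacity, Consumer<String> slowDevice) {
        this.pending = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(() -> {
            try {
                while (true) {
                    // Only this worker thread can get stuck inside the device call.
                    slowDevice.accept(pending.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "status-display-worker");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    // Called by the data processing thread. Never blocks: if the queue is
    // full, the message is dropped and false is returned.
    public boolean statusChanged(String message) {
        return pending.offer(message);
    }

    public void shutdown() {
        worker.interrupt();
    }
}
```

If the device hangs, the worker hangs with it and messages start to be dropped once the queue is full, but the thread holding the global data processing lock is never affected.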

Debug Output

Crafting debug output from std::istream data can be dangerous!

Writing a blog post sometimes can be useful to get some face-palm kind of programming error out of one’s system.

Putting such an error into written words then serves a couple of purposes:

  • it helps oneself remember
  • it helps others who read it not to do the same thing
  • it serves as error log for future reference

So here it comes:

In one project we use JSON to serialize objects in order to send them over HTTP (we use the very nice JSON Spirit library, btw).

For each object we have serialize/deserialize methods which do the heavy lifting. After having developed a new deserialize method I wanted to test it together with the HTTP request handling. Using curl for this I issued a command like this:

curl -X PUT http://localhost:30222/some/url -d @datafile

This command issues a PUT request to the given URL and uses data in ./datafile, which contains the JSON, as request data.

The request came through but the deserializer wouldn’t do its work. WTF? Let’s see what goes on – let’s put some debug output in:

MyObject MyObjectSerializer::deserialize(std::istream& jsonIn)
{
   // debug output starts here
   std::string stringToDeserialize;
   Poco::StreamCopier::copyToString(jsonIn, stringToDeserialize);
   std::cout << "The String: " << stringToDeserialize << std::endl;
   // debug output ends here

   json_spirit::Value value;
   json_spirit::read(jsonIn, value);
   ...
}

I’ll give you some time to spot the bug…. 3..2..1..got it? Please check Poco::StreamCopier documentation if you are not familiar with POCO libraries.
What’s particularly misleading is the “Copier” part of the name StreamCopier, because it does not exactly copy the bytes from the stream into the string – it moves them. This means that after the debug output code, the istream is empty.
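The same pitfall exists outside of C++ and POCO. As an analogy only (this is not the fix used in the project, and the class and method names are made up), here is how the situation looks with a java.io.InputStream: the debug helper reads the stream exactly once into a buffer and hands back a fresh stream over the same bytes, so the real consumer still sees the complete data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class DebugOutputWithoutDraining {

    // Reads the stream once, prints its content, and returns a new stream
    // over the same bytes for the real consumer (e.g. a JSON parser).
    static InputStream logAndRewind(InputStream in) throws IOException {
        byte[] content = in.readAllBytes(); // Java 9+; drains the original stream
        System.out.println("The String: " + new String(content, StandardCharsets.UTF_8));
        return new ByteArrayInputStream(content);
    }

    public static void main(String[] args) throws IOException {
        InputStream request = new ByteArrayInputStream(
                "{\"answer\": 42}".getBytes(StandardCharsets.UTF_8));
        InputStream forParser = logAndRewind(request);
        // The original stream is empty now, the returned one is complete.
        System.out.println("left in original: " + request.available());
        System.out.println("left in copy: " + forParser.available());
    }
}
```

The design choice is simply to treat a stream as something that can be read only once, and to make every debug statement give back what it took.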

Unfortunately, I did not immediately recognize the change in the error outputs of the JSON parser. This might have given me a hint to the real problem. Instead, during the next half hour I searched for errors in the JSON I was sending.

When I finally realized it …

Readable Code Needs Time and Care

A few weeks ago I was about to write an acceptance test involving socket communication. Since I was only interested in a particular sequence of exchanged data, I needed to wait for the start command and ignore all information sent prior to that command. In this blog post I’d like to present the process of enhancing the readability of the tiny piece of code responsible for this task.

The first version, written without thinking much about readability looked something like the following:

private void waitForStartCommand(DataInputStream inputStream) throws IOException {
  String content = inputStream.readUTF();
  while (!START_COMMAND.equals(content)) {
    content = inputStream.readUTF();
  }
}

The aspect that disturbed me most about this solution was calling inputStream.readUTF() twice (Remember: DRY). So I refactored and came up with:

private void waitForStartCommand(DataInputStream inputStream) throws IOException {
  String content = null;
  do {
    content = inputStream.readUTF();
  } while (!START_COMMAND.equals(content));
}

In this version the need to declare and initialize a variable grants far too much meaning to an unimportant detail. So, a little refactoring resulted in the final version:

private void waitForStartCommand(DataInputStream inputStream) throws IOException {
  while (startCommandIsNotReadOn(inputStream)) {
    continue;
  }
}

private boolean startCommandIsNotReadOn(DataInputStream inputStream) throws IOException {
  return !START_COMMAND.equals(inputStream.readUTF());
}

This example shows pretty well how even rather simple code may need to be refactored several times in order to be highly readable and understandable. Especially code that handles more or less unimportant side aspects should be as easy to understand as possible, in order to avoid conveying the impression of being of major importance.

Hibernate Objects in HTTP Sessions

While discussing common problems and pitfalls in web development, we stumbled on the pitfall of storing Hibernate objects in an HTTP session.

I do not recommend storing Hibernate objects in an HTTP session, mainly because of the common pitfall with detached objects.
Usually the lifecycle of a Hibernate session is bound to a request/response cycle. So after the end of the response, the Hibernate object is detached. Accessing unloaded associations of a detached object causes a LazyInitializationException. Even if you try to update or save the detached object, you will get an exception if the object isn’t in the Hibernate session already. You could re-attach or merge the detached object, but this writes the changes made on the detached object to the database.
A better way is to store only the identifier of the Hibernate object in the HTTP session, so you can load the object from the database when you need it.
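A rough sketch of the “identifier only” approach, kept free of any Hibernate or servlet dependencies so it stays self-contained: the Map stands in for the HTTP session, the Function stands in for a lookup through the current Hibernate session, and the Customer class and all names are hypothetical.

```java
import java.util.Map;
import java.util.function.Function;

class SessionIdentifierSketch {

    // Hypothetical entity; in a real application this would be a Hibernate-mapped class.
    static final class Customer {
        final long id;
        final String name;

        Customer(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    static final String CUSTOMER_ID_KEY = "customerId";

    // Stores only the identifier in the (stand-in) HTTP session, never the entity.
    static void remember(Map<String, Object> httpSession, Customer customer) {
        httpSession.put(CUSTOMER_ID_KEY, customer.id);
    }

    // Reloads the entity for the current request. 'loadById' stands in for a
    // lookup through the Hibernate session that belongs to this request.
    static Customer current(Map<String, Object> httpSession, Function<Long, Customer> loadById) {
        Long id = (Long) httpSession.get(CUSTOMER_ID_KEY);
        return id == null ? null : loadById.apply(id);
    }
}
```

The object returned by current() always belongs to the request that asked for it, so there is no detached instance left to trip over.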

When it comes to multithreading, better be safe than sorry

Writing multithreaded applications in Java is hard. Here are five problems and how to avoid them without much effort (mostly).

Recently, I attended a code review of the core parts of a web application, written in Java. The application is used by a large customer base and occasionally, there are error reports and exceptions in the log files. Some of these exceptions are the dreaded ConcurrentModificationExceptions, indicating conflicting read/write access on an unsynchronized collection data structure. In the code review, we found several threading flaws, but only after an exhaustive reading of the whole module. Here, I want to present the flaws and give some advice on how to avoid them:

The public lock

In some parts of the code, methods were defined as synchronized through the method declaration keyword:

public synchronized String getLastReservation() { [...]

While there is nothing wrong with this approach in itself, it can be highly dangerous in combination with synchronized blocks. The code above effectively wraps a synchronized block using the object instance (this) as a lock. No information of an object is more publicly visible than the object reference (this), so you have to check all direct or indirect clients of this object if they synchronize on this instance, too. If they do, you have chained two code blocks together, probably without properly documenting this fact. The least harmful defect will be performance losses because your code isn’t locked as fine-grained as it could be.

The easiest way to avoid these situations is to always hide the locks. Try not to share one object’s locks with other objects. If you choose publicly accessible locks, you can never be sure about that.
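As a minimal sketch of a hidden lock (the class and the reserve() method are invented for illustration, only getLastReservation() echoes the method name from above): the synchronized modifier is replaced by a private final lock object that no client can ever get hold of.

```java
class ReservationBook {

    // Private and final: no client can synchronize on this lock, by accident or on purpose.
    private final Object lock = new Object();
    private String lastReservation = "";

    public String getLastReservation() {
        synchronized (lock) {
            return lastReservation;
        }
    }

    public void reserve(String reservation) {
        synchronized (lock) {
            lastReservation = reservation;
        }
    }
}
```

A client that synchronizes on a ReservationBook instance now only locks the instance, not the internals: other threads can still call reserve() and getLastReservation() without being chained to that client’s code block.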

The subtle lock change

In one class, there were both instance and class (static) methods, using the synchronized keyword:

public synchronized String getOrderNumberOf(String customerID) { [...]
public static synchronized int getTotalPendingOrders() { [...]

And while they were both accessing the same collection data structure (a static hashmap), they were using different locks. The lock of the instance method is the instance itself, while the lock of the static method is the class object of the type. This is very dangerous, as it can be easily missed when writing or altering the code.

The best way to prevent this problem is to avoid the synchronized modifier for methods completely. State your locks explicitly, all the time.
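Stated explicitly, the two methods from above could share one lock like in the following sketch. The class name, the field names and the addOrder() method are assumptions made for the example. The important part is that the shared static map gets exactly one static lock, and both the instance and the static method name it.

```java
import java.util.HashMap;
import java.util.Map;

class OrderRegistry {

    // The shared data is static, so its lock is static as well, and every
    // method that touches the map names this lock explicitly.
    private static final Object ORDERS_LOCK = new Object();
    private static final Map<String, String> PENDING_ORDERS = new HashMap<>();

    public void addOrder(String customerID, String orderNumber) {
        synchronized (ORDERS_LOCK) {
            PENDING_ORDERS.put(customerID, orderNumber);
        }
    }

    public String getOrderNumberOf(String customerID) {
        synchronized (ORDERS_LOCK) {
            return PENDING_ORDERS.get(customerID);
        }
    }

    public static int getTotalPendingOrders() {
        synchronized (ORDERS_LOCK) {
            return PENDING_ORDERS.size();
        }
    }
}
```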

Partial locking

In a few classes, collection datatypes like lists were indeed synchronized by internal synchronized blocks in the methods, using the private collection instance as lock. The synchronized blocks were applied to accessing methods like putX(), removeX() and getX(). But the toString() method, building a comma-separated list of the textual list entries, wasn’t synchronized on the list. The method contained the following code:

public String toString() {
    StringBuilder result = new StringBuilder();
    for (String entry : this.list) {
        result.append(entry);
        result.append(",");
    }
    [...]
    return result.toString();
}

I’ve left out some details and special cases, as they aren’t relevant here. The problem with the foreach loop is that an anonymous Iterator over the list is used and it will relentlessly monitor the list for any changes and throw a ConcurrentModificationException as soon as one of the properly synchronized sections changes it. The toString() method was used to store the list to a session-dependent data storage. Every once in a while, the foreach loop threw an exception and failed to properly persist the list data, resulting in data loss.

The most straightforward solution to this problem might be to add the missing synchronization block in the toString() method. If you don’t want to block the user session while writing to disk, you might traverse the list without an Iterator (and be careful with your assumptions about valid indices) or work on a copy of the list, given that an in-memory copy of the list would be cheap. In an ACID system scenario, you should probably choose to complete your synchronized block guards.
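The “work on a copy” variant could be sketched like this. The class and method names are invented, and the sketch uses String.join instead of the original loop, so it leaves out the trailing comma and the special cases elided above. The copy is taken under the same lock as the modifying methods (the private list instance), and the iteration then runs on the private snapshot without holding the lock.

```java
import java.util.ArrayList;
import java.util.List;

class EntryList {

    // The private, final list instance doubles as the lock, as in the reviewed code.
    private final List<String> list = new ArrayList<>();

    public void putEntry(String entry) {
        synchronized (list) {
            list.add(entry);
        }
    }

    @Override
    public String toString() {
        // Copy under the lock, then iterate over the snapshot outside of it.
        final List<String> snapshot;
        synchronized (list) {
            snapshot = new ArrayList<>(list);
        }
        return String.join(",", snapshot);
    }
}
```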

Locking loophole

Another problem was a collection that was synchronized internally, but could be accessed through a getter method. No client could safely modify or traverse the collection, because they had the collection, but not the lock object (that happened to be the collection, too, but who can really be sure about that in the future?). It would be ridiculous to also provide a getter for the lock object (always hide your locks, remember?); the better solution is to refactor the client code to a “tell, don’t ask” style.

To prevent a scenario in which a client can access a data structure but not its lock, clients shouldn’t be able to gain access to the data structure at all, but should pass “command objects” to it instead. This is a perfect use case for closures. Effectively, you’ll end up with something like Function or Operation instances that are applied to every element of the collection within a synchronized block and perform your functionality on them. Have a look at op4j for inspirational syntax.
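With the lambdas and java.util.function.Consumer of newer Java versions, a “tell, don’t ask” collection might be sketched as follows. GuardedList is a made-up name and not an existing library class. The list and its lock never leave the class, and clients hand in the operation they want to have applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class GuardedList<T> {

    private final Object lock = new Object();
    private final List<T> elements = new ArrayList<>();

    public void add(T element) {
        synchronized (lock) {
            elements.add(element);
        }
    }

    // There is no getter for the list: the caller passes a command object
    // that is applied to every element while the lock is held.
    public void forEach(Consumer<? super T> operation) {
        synchronized (lock) {
            for (T element : elements) {
                operation.accept(element);
            }
        }
    }
}
```

A call like numbers.forEach(n -> sum[0] += n) then traverses the list safely. Note that the operation runs while the lock is held, so it should be short and must not call back into add() on the same list.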

Local locking

This was the worst of all problems and the final reason for this blog entry: In some methods, the lock objects were local variables. In summary, these methods looked like this:

public String getData() {
    Object lock = new Object();
    synchronized (lock) {
        [...]
    }
}

Of course, it wasn’t that obvious. The lock objects were propagated to other methods, stored in data structures, removed from them, etc. But in the end, each caller of the method got its own lock and could henceforth wreak havoc in code that appeared very well synchronized at first look. The error in its clarity is too stupid to be widespread. The problem was the obfuscation around it. It took us some time to really understand what was going on and where all those lock objects really came from.

My final advice is: If you have to deal with multithreading, don’t outsmart yourself and the next fellow programmer by building complex code structures or implicit relationships. Be as concise and explicit as you can be. Less clutter is more when dealing with threads. The core problem is the all-or-none law of thread synchronization: Either you’ve got it all right or you’ve got it all wrong – you just don’t know yet.

Hide your locks, name your locks explicitly, reduce the scope of necessary locking so that you can survey it easily, never hand out your locked data, and, most important, remove all clutter around your locking structures. This might make the difference between “just works” and endless ominous bug reports.

Embedding Python into C++

In one of our projects the requirement to run small user-defined Python scripts inside a C++ application arose. Thanks to the Python/C API of CPython (the reference implementation of Python), embedding (really) simple scripts is pretty straightforward:

Py_Initialize();
const char* pythonScript = "print 'Hello, world!'\n";
int result = PyRun_SimpleString(pythonScript);
Py_Finalize();

Yet, this approach neither allows running extensive scripts, nor does it provide a way to exchange data between the application and the script. The result of this operation merely indicates whether the script was executed properly: it is 0 on success and -1 otherwise, e.g. if an exception was raised. To overcome these limitations, the C API offers another, more versatile way to execute scripts:

PyObject* PyRun_String(const char* pythonScript, int startToken, PyObject* globalDictionary, PyObject* localDictionary)

Besides the actual script, this function requires a start token, which should be set to Py_file_input for larger scripts, and two dictionaries containing the exchanged data:

PyObject* main = PyImport_AddModule("__main__");
PyObject* globalDictionary = PyModule_GetDict(main);
PyObject* localDictionary = PyDict_New();
PyObject* result = PyRun_String(pythonScript, Py_file_input, globalDictionary, localDictionary);

Communication between the application and the script is done by inserting entries to one of the dictionaries prior to running the script:

PyObject* value = PyString_FromString("some value");
PyDict_SetItemString(localDictionary, "someKey", value);

Doing so makes the variable “someKey” and its value available inside the Python script. Accessing the produced data after running the Python script is just as easy:

char* result = PyString_AsString(PyDict_GetItemString(localDictionary, "someKey"));

If a variable is created inside the Python script, this variable also becomes accessible from the application through PyDict_GetItemString (or PyDict_GetItem), even if it was not entered into the dictionary beforehand.

The following example shows the complete process of defining variables as dictionary entries, running a small script and retrieving the produced result in the C++ application:

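Py_Initialize();
//create the dictionaries as shown above
const char* pythonScript = "result = multiplicand * multiplier\n";
//PyDict_SetItemString does not take over our references, so we release them afterwards
PyObject* multiplicand = PyInt_FromLong(2);
PyObject* multiplier = PyInt_FromLong(5);
PyDict_SetItemString(localDictionary, "multiplicand", multiplicand);
PyDict_SetItemString(localDictionary, "multiplier", multiplier);
Py_DECREF(multiplicand);
Py_DECREF(multiplier);
PyObject* runResult = PyRun_String(pythonScript, Py_file_input, globalDictionary, localDictionary);
Py_XDECREF(runResult); //NULL if the script raised an exception
//PyDict_GetItemString returns a borrowed reference, nothing to release here
long result = PyInt_AsLong(PyDict_GetItemString(localDictionary, "result"));
cout << result << endl;
Py_DECREF(localDictionary);
Py_Finalize();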

Readability of Boolean Expressions

Readability of boolean expressions lies in the eyes of the beholder.

Following up on various previous posts on code readability and style I want to provide two more examples today – this time under the common theme of “handling of boolean values”.

Consider this (1a):

bool someMethod()
{
  if (expression) {
    return true;
  } else {
    return false;
  }
}

Yes, there are people who consider this more readable than (1b):

bool someMethod()
{
  return (expression);
}

Another example is this (2a):

  if (someExpression() == true)
    ...

versus my preferred version (2b):

  if (someExpression())
    ...

So what could be the reason for these different viewpoints? One explanation I thought of is as follows: Let’s say you have a background in C and you are therefore used to doing something like:

#define FALSE (0)
#define TRUE (!FALSE)

In other words, you may not see boolean as a type of its own, like int and double, with a well-defined value range. Instead you see it more like an enumerated type, which makes it feel very natural to do an expression == true comparison.

At the same time it feels not very natural to see the result of a boolean expression as being of type bool with all the consequences – e.g. to be able to return it immediately as in the first example.

Another explanation is that 1a and 2a are as verbose as it can be. You don’t have to make any mental efforts to understand what the code does.

While these may be possible explanations, my guess is that most of you, like me, still see 1a and 2a as unnecessary visual clutter and consider 1b and 2b far more readable.