Use Boost’s Multi Index Container!

Boost’s multi index container is a very cool and useful piece of code. Make it a part of your toolbox. You can start slowly by replacing uses of std::set and std::multiset with simple boost::multi_index_containers.

Sometimes, after you have used a special library or other special programming tool for a job, you forget about it because you don’t have that specific use case anymore. Boost’s multi_index container could fall in this category because you don’t have to hold data in memory with the need to access it by different keys all the time.

Therefore, this post is intended to be a reminder for c++ programmers that there exists this pretty cool thing called boost::multi_index_container and that you can use it in more situations than you would think at first.

(If you’re already using it on a regular basis you may stop here, jump directly to the comments and tell us about your typical use cases.)

I remember when I discovered boost::multi_index_container I found it quite intimidating at first sight. All those templates that are used in sometimes weird ways can trigger that feeling if you are not a template metaprogramming specialist (i.e. haven’t yet read Andrei Alexandrescu’s book “Modern C++ Design” ).

But if you look at it after you fought your way through the documentation and after your unit test is green that tests your first example, it doesn’t look that complicated anymore.

My latest use case for boost::multi_index_container was data objects that should be sorted by two different date-times. (For dates and times we use boost::date_time, of course). At first, the requirement was to store the objects sorted by one date time. I used a std::set for that with a custom comparator. Everything was fine.

With changing requirements it became necessary to retrieve objects by another date time, too. I started to use another std::set with a different comparator but then I remembered that there was some cool container somewhere in boost for which you can define multiple indices ….

After I had set it up with the two date time indices, the code also looked much cleaner because in order to update one object with a new time stamp I could just call container->replace(…) instead of fiddling around with the std::set.

Furthermore, I noticed that setting up a boost::multi_index_container with a specific key makes it much clearer what you intend with this data structure than using a std::set with a custom comparator. It is not that much more typing effort, and you can practice template metaprogramming a little bit 🙂

Let’s compare the two implementations:

#include <boost/shared_ptr.hpp>
#include <boost/date_time/posix_time/posix_time.hpp>
using boost::posix_time::ptime;

// objects of this class should be stored
class MyDataClass
{
  public:
    const ptime& getUpdateTime() const;
    const ptime& getDataChangedTime() const;

  private:
    ptime _updateTimestamp;
    ptime _dataChangedTimestamp;
};
typedef boost::shared_ptr<MyDataClass> MyDataClassPtr;

Now the definition of a multi index container:

#include <boost/multi_index_container.hpp>
#include <boost/multi_index/ordered_index.hpp>
#include <boost/multi_index/mem_fun.hpp>
using namespace boost::multi_index;

typedef multi_index_container
<
  MyDataClassPtr,
  indexed_by
  <
    ordered_non_unique
    <
      const_mem_fun<MyDataClass, 
        const ptime&, 
        &MyDataClass::getUpdateTime>
    >
  >
> MyDataClassContainer;

compared to std::set:

#include <set>

// we need a comparator first
struct MyDataClassComparatorByUpdateTime
{
  bool operator() (const MyDataClassPtr& lhs, 
                   const MyDataClassPtr& rhs) const
  {
    return lhs->getUpdateTime() < rhs->getUpdateTime();
  }
};
typedef std::multiset<MyDataClassPtr, 
                      MyDataClassComparatorByUpdateTime> 
   MyDataClassSetByUpdateTime;

What I like is that the typedef for the multi index container reads almost like a sentence. Besides, it is purely declarative (as long as you get away without custom key extractors), whereas with std::multiset you have to implement the comparator.

In addition to being a reminder, I hope this post also serves as motivation to get to know boost::multi_index_container and to make it a part of your toolbox. If you still have fears of contact, start small by replacing usages of std::set/multiset.

Don’t mix C++ smart pointers with references

This post will teach by example that mixing smart pointers with references in c++ is not a particularly good idea.

As I did in the past, I will use this post as means to remember and to push the following principle deeper in my head – and hopefully in yours as a reader and C++ programmer:

Do not mix smart pointers with references in your C++ programms.

Of course I knew that before I created this little helper library, that was supposed to make it easier to send data asynchronous over an existing connection. Here is the situation (simplified):

class A
{
  ...
  void doStuff();

  private:
     // a private shared_ptr to B
    boost::shared_ptr<B> _bPointer;
};

class C
{
  public:
    C(B& b) : _b(b)
    {}

    ~C()
    {
      _bRef.resetSomeValueToDefault();
    }

  private:
     // a private reference to B which is set in the ctor
    B& _bRef;
};

void A::doStuff()
{
  createBpointerIfNotExisting();
  C myC(*_bPointer);
  myC.someMethodThatDoesSomethingWithB();
  if (someCondition) {
    // Delete this B instance.
    // A new instance will be created next time
    _bPointer.reset();
  }
}

So class A has a shared pointer of B which is given as a reference to an instance of class C in method A::doStuff. Class C stores the B instance as reference and interacts with it during its lifetime, which ends at the end of A::doStuff.

The last interaction occurrs at the very end of its life – in the destructor.

I highlighted the most important facts, but I’ll give you a few more moments …

The following happens (in A::doStuff):

  • line 29: if no instance of B exists (i.e. _bPointer is null), a new B instance is created and held in _bPointer
  • line 30: instance myC of C is created on the stack. A reference of B is given as ctor parameter
  • line 32-35: if “someCondition” is true, _bPointer is reseted which means that the B instance gets deleted
  • line 37: A::doStuff() ends and myC goes out of scope
  • line 19: the destructor of C is called and _bRef is accessed
  • since the B instance does not exist any more … memory corruption!!!

The most annoying thing with this kind of errors is that the program crashes somewhere, but almost never where the error actually occurred. This means, that you get stack traces pointing you right into some rock-solid 3rd party library which had never failed since you know and use it, or to some completely unrelated part in your code that had worked without any problems before and hasn’t been changed in years.

I even had these classes unit tested before I integrated them. But for some strange reason – maybe because everything gets reset after each test method – the bug never occurred in the tests.

So always be very cautious when you mix smart pointers with references, and when you do, make sure you have your object lifetimes completely under control!

The Great Divide

There is a great divide in the C++ developer community between “normal” developers that use only basic language features and very savvy ones that know every little corner of the language. The upcoming C++ standard deepens this divide even more.

Recently, I had two very contrary conversations about C++ which show very good the great divide in C++ developer community.

The first was with the technical lead of a team that writes and maintains drivers and control software for a scientific institution. These systems run 24/7 and have to be very stable and reliable.

I had discovered that they use a self-written toolbox library containing classes like SharedPtr<T>, and Thread and suspected immediately a classical NIH-syndrome. I asked him about it and why they don’t use well established libraries like boost. He told me that they indeed are only using the standard library and their own toolbox.

The reason he gave was that despite boost being most elegant C++ library out there, it required very good knowledge about the most advanced C++ mechanisms, and that his team was not on this level … I should probably mention here that his team does a very good job in running their systems. So, apparently, they get along very well with using only basic  C++ features and no “fancy” boost stuff.

The other conversation was with a friend of mine with whom I chat regularly about all sorts of programming related stuff. This time the topic was the upcoming  C++ standard and all its  exciting new stuff. He has lot’s of experience with C++ and knows the language very well. But even someone like him had a hard time to really understand what rvalue references are all about. I had not looked at them in detail, yet,  so he tried to explain them to me. During our discussion I was thinking about if teams like the one introduced before will ever use rvalue references, or other C++0X stuff in their production code, other than maybe the auto keyword for type inference, or constructor delegation.

Honestly, I don’t think stuff like  rvalue refs will become a feature that is often used by “standard industry” teams, because it adds a lot of complexity to an already complex language. Even easy-to-get stuff like the new keywords override, constexpr and final, or additional initialization means like std::initializer_list<T> will take a lot of time to get used regularly by most C++ teams.

Instead, most of C++0X will greatly increase the divide between “normal” C++ developers who get along well with using only basic language features, and experts that know every little corner of the language. And this is simply because there is so much more to know with C++0X.

But don’t let us paint this picture overly black. I, for one, am looking forward to the new standard and I will certainly spread the word about the new possibilities and features in every C++ team I work with.

Bug hunting fun with std::sort

Small errors in custom comparison functions used with std::sort can lead to hard-to-find bugs.

The other day I came across a nice little C++-shoot-yourself-in-the-foot at one of our customers. Let’s see how fast you can spot the problem. The following code crashes with segmentation fault sometime, somewhere in the sort call (line 31).

#include <iostream>
#include <vector>
#include <boost/shared_ptr.hpp>
#include <boost/bind.hpp>

using namespace std;
using namespace boost;

enum SORT_ORDER
{
  SORT_ORDER_ASCENDING,
  SORT_ORDER_DESCENDING
};

bool compareValues(const std::string& valueLeft,
                   const std::string& valueRight,
                   SORT_ORDER order)
{
  const bool compareResult = (valueLeft < valueRight);
  if (order == SORT_ORDER_DESCENDING) {
    return !compareResult;
  }
  return compareResult;
}

int main(int argc, char *argv[])
{
  std::vector<std::string> strValues(300);
  std::fill(strValues.begin(), strValues.end(),
            "Hallo");
  std::sort(strValues.begin(), strValues.end(),
            bind(compareValues, _1, _2, SORT_ORDER_DESCENDING));
  return EXIT_SUCCESS;
}

Any ideas? The tricky thing about this bug is that the stacktrace output in the debugger gives absolutely no hint at all about its cause. And this is a simplified version of the real code which has to sort boost::shared_ptrs instead of strings. Believe me, you don’t want to see that stacktrace. Because of the use of boost::bind together with boost::shared_ptrs it looks, well, let’s say intimidating.

Still no idea?

I’ll give you a hint. If the SORT_ORDER is set to SORT_ORDER_ASCENDING everything is fine. …

Ok, the problem is that std::sort algorithm must be given a comparison function (object) that defines a strict weak ordering on the elements that are to be sorted. In other words the comparison function object must implement the ‘<‘ (less than) relationship on the elements.

Unfortunately, lines 20 to 22 break this ordering when SORT_ORDER_DESCENDING is given. The initial idea of this code was that, well, if compareResult gets returned on ascending sort order, lets just return the negation of it when the “negation” of acscending order is requested. This, of course, destroys the strict weak ordering requirement because whenever valueLeft == valueRight, the function returns true, meaning instead that valueLeft < valueRight. And this somehow wreaks havoc inside std::sort.

A better version of the function could be:

...
bool compareValues(const std::string& valueLeft,
                   const std::string& valueRight,
                   SORT_ORDER order)
{
  // solution: return false independent of sort order
  // whenever valueLeft == valueRight
  if (valueLeft == valueRight) {
    return false;
  }
  const bool compareResult = (valueLeft < valueRight);
  if (order == SORT_ORDER_DESCENDING) {
    return !compareResult;
  }
  return compareResult;
}
...

The really annoying thing about this whole issue is that std::sort just randomly crashes with a stack trace that shows nothing but some weird memory corruption going on. After the initial shock, this sends you down the complete wrong bug hunting road where you start looking for spots where memory could be overwritten or the like.

So beware of custom comparison functions or function objects. They might look innocent and easy, but they can give you lot’s of headaches.

How much boost does a C++ newbie need?

The other day, I talked to a C++ developer, who is relatively new in the language, about the C++ training they just had at his company. The training topics were already somewhat advanced and contained e.g. STL containers and their peculiarities, STL algorithms and some boost stuff like binders and smart pointers. That got me thinking about how much of STL and boost does a C++ developer just has to know in order to survive their C++ projects.

There is also another angle to this. There are certain corners of the C++ language, e.g. template metaprogramming, which are just hard to get, even for more experienced developers. And because of that, in my opinion, they have no place in a standard industry C++ project. But where do you draw the line? With template meta-programming it is obvious that it probably will never be in every day usage by Joe Developer. But what about e.g. boost’s multi-index container or their functional programming stuff? One could say that it depends on the skills of team whether more advanced stuff can be used or not. But suppose your team consist largely of C++ beginners and does not have much experience in the language, would you want to pass on using Boost.Spirit when you had to do some serious parsing? Or would you want to use error codes instead of decent exceptions, because they add a lot more potentially “invisible” code paths? Probably not, but those are certainly no easy decisions.

One of the problems with STL and boost for a C++ beginner can be illustrated with the following easy problem: How do you convert an int into a std::string and back? Having already internalized the stream classes the beginner might come up with something like this:

 int i = 5;
 std::ostringstream out;
 out << i;
 std::string i_string = out.str();  

 int j=0;
 std::istringstream in(i_string);
 in >> j;
 assert(i == j);

But if he just had learned a little boost he would know that, in fact, it is as easy as this:

 int i=5;
 std::string i_string = boost::lexical_cast<std::string>(i);

 int j = boost::lexical_cast<int>(i_string);

So you just have to know some basic boost stuff in order to write fairly decent C++ code. Besides boost::lexical_cast, which is part of the Boost Conversion Library, here is my personal list of mandatory boost knowledge:

Boost.Assign: Why still bother with std::map::push_back and the likes, if there is a much easier and concise syntax to initialize containers?

Boost.Bind (If you use functional programming): No one should be forced to wade through the mud of STL binders any longer. Boost::bind is just so much easier.

Boost.Foreach: Every for-loop becomes a code-smell after your first use of BOOST_FOREACH.

Boost.Member Function: see Boost.Bind

Boost.Smart Pointers: No comment is needed on that one.

As you can see, these are only the most basic libraries. Other extremely useful things for day-to-day programming are e.g. Boost.FileSystem, Boost.DateTime, Boost.Exceptions, Boost.Format, Boost.Unordered and Boost.Utilities.

Of course, you don’t have to memorize every part of the boost libraries, but boost.org should in any case be the first address to look for a solution to your daily  C++ challenges.

Observer/Listener structures in C++ with boost’s smart pointers

Whenever you are developing sufficiently large complex programs in languages like C++ or Java you have to deal with memory issues. This holds true especially when your program is supposed to run 24/7 or close to that. Because these kinds of issues can be hard to get right Java has this nice little helper, the garbage collector. But as Java solves all memory problems, or maybe not? points out, you can still easily shoot yourself in foot or even blow your whole leg away.  One of the problems stated there is that memory leaks can easily occur due to incorrect listener relations. Whenever a listener is not removed properly, which is either a large object itself or has references to such objects,  it’s only a matter of time until your program dies with “OutOfMemoryError” as its last words.  One of the proposed solutions is to use Java weak pointers for listener management.  Let’s see how this translates to C++.

Observer/listener management in C++ is often done using pointers to listener objects. Pointers are pretty weak by default. They can be :

  • null
  • pointing to a valid object
  • pointing to an invalid memory address

In listener relationships especially the latter can be a problem. For example, simple listener management could look like this:

   class SimpleListenerManagement
   {
   public:
      void addListener(MyListener* listener);
      void removeListener(MyListener* listener);
      void notifyListeners();
   private:
      std::list<MyListener*> listeners_;
   };

   void SimpleListenerManagement::notifyListeners()
   {
      // call notify on all listeners
      for (std::list<MyListener*>::iterator iter = listeners_.begin();
          iter != listeners_.end();
          ++iter)
      {
         (*iter)->notify(); // may be a bad idea!
      }
   }

In notifyListeners(), the pointer is used trusting that it still points to a valid object. But if it doesn’t, for instance because the object was deleted but the client forgot to removed it from the listener management, well, too bad.

Obviously, the situation would be much better if we didn’t use raw pointers but some kind of wrapper objects instead.  A first improvement would be to use boost::shared_ptr in the listener management:

   typedef boost::shared_ptr<MyListener> MyListenerPtr;

   class SimpleListenerManagement
   {
   public:
      void addListener(MyListenerPtr listener);
      void removeListener(MyListenerPtr listener);
      void notifyListeners();
   private:
      std::list<MyListenerPtr> listeners_;
   };

Provided that the given MyListenerPtr instance was created correctly by the client we can be sure now that all listeners exist when we call notify() on them.  Seems much better now. But wait! Using boost::shared_ptr, we now hold  strong references in our listeners list and are therefore kind of in the same situation as described in the post mentioned above. If the client forgets to remove its MyListenerPtr instance it never gets deleted and may be in a invalid state next time notify() is called.

A solution that works well in most cases is to use boost::weak_ptr to hold the listeners. If you see boost::shared_ptr on a level with normal Java references, boost::weak_ptrs are roughly the same as Java’ s weak references. Our listener management class would then look like this:

   typedef boost::shared_ptr<MyListener> MyListenerPtr;
   typedef boost::weak_ptr<MyListener> MyListenerWeakPtr;

   class SimpleListenerManagement
   {
   public:
      void addListener(MyListenerPtr listener);
      void removeListener(MyListenerPtr listener);
      void notifyListeners();
   private:
      std::list<MyListenerWeakPtr> listeners_; // using weak_ptr
   };

Note that addListener and removeListener still use MyListenerPtr as parameter. This ensures that the client provides valid listener objects.  The interesting stuff happens in notifyListeners():

   void SimpleListenerManagement::notifyListeners()
   {
      std::list<MyListenerWeakPtr>::iterator iter = listeners_.begin();
      while(iter != listeners_.end())
      {
         if ((*iter).expired())
         {
            iter = listeners_.erase(iter);
         }
         else
         {
            MyListenerPtr listener = (*iter).lock(); // create a shared_ptr from the weak_ptr
            listener->notify();
            ++iter;
         }
      }
   }

Each weak_ptr can now be checked if its object still exists before using it. If the weak_ptr is expired, it can simply be removed from the listeners list. With this implementation the removeListener method becomes optional and can as well be omitted. The client only has to make sure that the shared_ptr holding the listener gets deleted somehow.