Shrink your dependency list with POCO

POCO is a nice set of C++ libraries which provides elegant solutions for day-to-day tasks.

When you write C++ applications of any sort, you are very likely to need support libraries in addition to what comes with C++ (which is not much, btw). Of course, this holds true for any other language, but with Java and its rich JDK, for example, the need is not so pressing.

Starting at the very beginning, let’s see how fast the need for support arises.

int main(int argc, char** argv)
{
// parsing command line arguments
...

How do you parse those command line arguments in a simple and easy way? How about a little help output when the program is called with -h or --help? OK, we have boost::program_options for this.

Further along in your program you may want some sort of logging capability. Unfortunately, as of Boost version 1.45 there is no logging library to be found there. So you add a nice logging library.

And so on.

But wait! You don’t want to depend on too many 3rd party libraries because, among other things, they add deployment complexity.

Not even Qt, one of the major players in the C++ framework world, provides solutions for the two previous examples: as of version 4.7 there is no logging and not much support for command line arguments. And you end up having to use QString, one of many non-std::string classes in C++ frameworks, which can get annoying at times (of course there are reasons why those exist).

I could go on with the list of smaller or larger concerns for which you either roll your own implementation or include yet another library in your project.

Instead I would like to point you to POCO, a nice set of C++ libraries which provide easy solutions for many basic and/or advanced day-to-day tasks. From their website:

Modern, powerful open source C++ class libraries and frameworks for building network- and internet-based applications that run on desktop, server and embedded systems

Besides very basic stuff like logging, date/time handling, threads, memory management, UTF-8, etc., they also provide lots of higher-level classes for things like SMTP, POP3, SQL database access and HTTP. They even have a so-called C++ Server Page Compiler, which is basically something like JSP or Active Server Pages.

And they don’t have their own string class! Yay! Instead they provide lots of functions, classes and streams to do string manipulation on good old std::string.

What I like most about POCO, though, is its clean, well-documented and apparently very high-quality code. Although it is not overly functional or template-heavy, as you often see in Boost, it still provides elegant solutions.

Check it out and shrink your dependency list.

Diving into Hibernate’s Query Cache behaviour

Hibernate is a very sophisticated OR mapper and as such has some overhead for certain usage patterns or raw queries. Through proper use of its caches (Hibernate features an L1 cache, an L2 cache and a query cache) you can get both performance and convenience if everything fits together. When trying to get more out of our persistence layer, we performed some tests with the query cache to decide whether it was worth using for us. We were puzzled by the behaviour in our test case: despite everything being configured properly, we never got any hits in our query cache using the following query sequence:

  1. Transaction start
  2. Execute query
  3. Update a table touched by query
  4. Execute query
  5. Execute query
  6. Transaction end

We expected step 5 to be a cache hit, but in our case it was not. So we dived into the source of the Hibernate release we used (3.3.1, bundled with Grails 1.3.5) and browsed the Hibernate issue tracker. We found the problem and correlated it with issues HHH-3339 and HHH-5210. Since the fix was simpler than upgrading Grails to a newer Hibernate release, we applied it ourselves and replaced the jar in our environment. So far, so good, but in our test step 5 still refused to produce a cache hit. Strangely enough, when we used the debugger to analyze the state of the cache, we did get a cache hit. After some more brooding and a few println() and sleep() calls, we found the reason for the observed behaviour in the UpdateTimestampsCache (yes, yet another cache!):

	public synchronized void preinvalidate(Serializable[] spaces) throws CacheException {
		//TODO: to handle concurrent writes correctly, this should return a Lock to the client
		Long ts = new Long( region.nextTimestamp() + region.getTimeout() );
		for ( int i=0; i<spaces.length; i++ ) {
			if ( log.isDebugEnabled() ) {
				log.debug( "Pre-invalidating space [" + spaces[i] + "]" );
			}
			//put() has nowait semantics, is this really appropriate?
			//note that it needs to be async replication, never local or sync
			region.put( spaces[i], ts );
		}
		//TODO: return new Lock(ts);
	}

The innocent-looking expression region.nextTimestamp() + region.getTimeout() essentially means that the query cache for a certain “region” (e.g. a table in simple cases) is “invalid” (read: disabled) for the “timeout” period or until the end of the transaction. This period defaults to 60 seconds (yes, one minute!) and renders the query cache useless within a transaction. For many use cases this may not be a problem, but our write-heavy application really suffers: it works on very few different tables, so query caching has no effect. We are still discussing ways to leverage Hibernate’s caches to improve the performance of our app.
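To make the effect concrete, here is a deliberately simplified model of the timestamp check in C++. It is not Hibernate’s code: the class name UpdateTimestampsModel and its methods are hypothetical, time is plain milliseconds, and it assumes that each cached query result carries the time it was stored and is treated as stale whenever a table it depends on has a recorded update time at or after that stamp.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model of an update-timestamps cache (not Hibernate's API).
// All times are plain milliseconds.
class UpdateTimestampsModel {
public:
    explicit UpdateTimestampsModel(std::int64_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Before a write: stamp the table "timeout" milliseconds into the future.
    void preinvalidate(const std::string& table, std::int64_t nowMs) {
        lastUpdate_[table] = nowMs + timeoutMs_;
    }

    // After the transaction completes: stamp the table with the real time.
    void invalidate(const std::string& table, std::int64_t nowMs) {
        lastUpdate_[table] = nowMs;
    }

    // A cached result stamped at resultMs is usable only if no table it
    // depends on has a recorded update at or after that stamp.
    bool isUpToDate(const std::vector<std::string>& tables, std::int64_t resultMs) const {
        for (const std::string& table : tables) {
            auto it = lastUpdate_.find(table);
            if (it != lastUpdate_.end() && it->second >= resultMs) {
                return false;
            }
        }
        return true;
    }

private:
    std::int64_t timeoutMs_;
    std::map<std::string, std::int64_t> lastUpdate_;
};
```

In this model a result cached two seconds after the update is rejected, while one cached after a pause longer than the timeout is accepted, which would be consistent with the cache hit we only saw while sitting in the debugger.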

Bug hunting fun with std::sort

Small errors in custom comparison functions used with std::sort can lead to hard-to-find bugs.

The other day I came across a nice little C++ shoot-yourself-in-the-foot at one of our customers. Let’s see how fast you can spot the problem. The following code sometimes crashes with a segmentation fault somewhere in the sort call (line 31).

#include <algorithm>
#include <cstdlib>
#include <string>
#include <vector>
#include <boost/bind.hpp>
using namespace std;
using namespace boost;

enum SORT_ORDER
{
  SORT_ORDER_ASCENDING,
  SORT_ORDER_DESCENDING
};

bool compareValues(const std::string& valueLeft,
                   const std::string& valueRight,
                   SORT_ORDER order)
{
  const bool compareResult = (valueLeft < valueRight);
  if (order == SORT_ORDER_DESCENDING) {
    return !compareResult;
  }
  return compareResult;
}

int main(int argc, char *argv[])
{
  std::vector<std::string> strValues(300);
  std::fill(strValues.begin(), strValues.end(),
            "Hallo");
  std::sort(strValues.begin(), strValues.end(),
            boost::bind(compareValues, _1, _2, SORT_ORDER_DESCENDING));
  return EXIT_SUCCESS;
}

Any ideas? The tricky thing about this bug is that the stack trace in the debugger gives absolutely no hint about its cause. And this is a simplified version of the real code, which has to sort boost::shared_ptrs instead of strings. Believe me, you don’t want to see that stack trace. Because of the use of boost::bind together with boost::shared_ptrs, it looks, well, let’s say intimidating.

Still no idea?

I’ll give you a hint. If the SORT_ORDER is set to SORT_ORDER_ASCENDING everything is fine. …

OK, the problem is that the std::sort algorithm must be given a comparison function (object) that defines a strict weak ordering on the elements to be sorted. In other words, the comparison function object must behave like the ‘<’ (less than) relation on the elements; in particular, comparing an element with itself, or with an equal element, must yield false.

Unfortunately, lines 20 to 22 break this ordering when SORT_ORDER_DESCENDING is given. The initial idea behind this code was: if compareResult gets returned for ascending sort order, let’s just return its negation when the “negation” of ascending order is requested. This, of course, destroys the strict weak ordering, because whenever valueLeft == valueRight the function returns true, claiming that valueLeft < valueRight and, with the arguments swapped, also that valueRight < valueLeft. With 300 identical strings this happens on every single comparison. Since a sort implementation may rely on a valid ordering to know when to stop scanning, it can run past the end of the range, which is undefined behavior and wreaks havoc inside std::sort.
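A cheap way to catch this class of bug before std::sort does is to test the comparator against two necessary conditions of a strict weak ordering: comp(a, a) must be false, and comp(a, b) and comp(b, a) must never both be true. The sketch below repeats the broken compareValues so it compiles on its own; the helper name passesOrderingSanityCheck is hypothetical, and passing the check does not prove that a comparator is correct.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum SORT_ORDER { SORT_ORDER_ASCENDING, SORT_ORDER_DESCENDING };

// The broken comparator from the example, repeated so this sketch is self-contained.
bool compareValues(const std::string& valueLeft,
                   const std::string& valueRight,
                   SORT_ORDER order)
{
  const bool compareResult = (valueLeft < valueRight);
  if (order == SORT_ORDER_DESCENDING) {
    return !compareResult;
  }
  return compareResult;
}

// Hypothetical debugging helper: checks two necessary conditions of a
// strict weak ordering on a sample of values.
//  - irreflexivity: comp(a, a) is false
//  - asymmetry:     comp(a, b) and comp(b, a) are never both true
// Failing proves the comparator is unusable with std::sort;
// passing does not prove it is correct.
bool passesOrderingSanityCheck(const std::vector<std::string>& sample,
                               SORT_ORDER order)
{
  for (std::size_t i = 0; i < sample.size(); ++i) {
    if (compareValues(sample[i], sample[i], order)) {
      return false;  // claims a value comes strictly before itself
    }
    for (std::size_t j = i + 1; j < sample.size(); ++j) {
      if (compareValues(sample[i], sample[j], order) &&
          compareValues(sample[j], sample[i], order)) {
        return false;  // claims both a < b and b < a
      }
    }
  }
  return true;
}
```

Run against a small sample such as {"Hallo", "Welt", "Hallo"}, the check passes for SORT_ORDER_ASCENDING and fails for SORT_ORDER_DESCENDING, because comparing "Hallo" with itself returns true.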

A better version of the function could be:

...
bool compareValues(const std::string& valueLeft,
                   const std::string& valueRight,
                   SORT_ORDER order)
{
  // solution: return false independent of sort order
  // whenever valueLeft == valueRight
  if (valueLeft == valueRight) {
    return false;
  }
  const bool compareResult = (valueLeft < valueRight);
  if (order == SORT_ORDER_DESCENDING) {
    return !compareResult;
  }
  return compareResult;
}
...
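A common alternative to the explicit equality check is to express descending order by swapping the arguments, so the comparator only ever uses ‘<’ and equal values always compare false. A minimal sketch, assuming a C++11 compiler so that a lambda can replace boost::bind; sortValues is a hypothetical helper name:

```cpp
#include <algorithm>
#include <string>
#include <vector>

enum SORT_ORDER { SORT_ORDER_ASCENDING, SORT_ORDER_DESCENDING };

// Descending order is ascending order with the arguments swapped, so equal
// values always compare false and the strict weak ordering is preserved.
bool compareValues(const std::string& valueLeft,
                   const std::string& valueRight,
                   SORT_ORDER order)
{
  if (order == SORT_ORDER_DESCENDING) {
    return valueRight < valueLeft;
  }
  return valueLeft < valueRight;
}

// Sorts in place with the comparator above; a lambda replaces boost::bind.
void sortValues(std::vector<std::string>& values, SORT_ORDER order)
{
  std::sort(values.begin(), values.end(),
            [order](const std::string& left, const std::string& right) {
              return compareValues(left, right, order);
            });
}
```

For the plain descending case, std::sort(v.begin(), v.end(), std::greater<std::string>()) from <functional> achieves the same.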

The really annoying thing about this whole issue is that std::sort just randomly crashes with a stack trace that shows nothing but some weird memory corruption going on. After the initial shock, this sends you down a completely wrong bug-hunting road where you start looking for spots where memory could be overwritten or the like.

So beware of custom comparison functions or function objects. They might look innocent and easy, but they can give you lots of headaches.

Statement against public fields in Java

Every once in a while I talk to people about coding style, and sooner or later there is a discussion about public fields and getters/setters in Java. In addition to other, quite well-balanced articles, I would like to present my opinion on this issue to a broader audience.

First I want to differentiate properties of a class from other fields/member variables. Properties are fields whose values are useful and important to clients of the class. We consciously decide to break encapsulation here and provide this data to our clients. The size of a collection may serve as a nice example. Fields, on the other hand, store state or dependencies our class needs to be fully operational. Data structures like arrays, data access objects (DAOs) or some kind of notification service may serve as examples here.

The internal implementation of both properties and fields should never be exposed, because this truly breaks encapsulation and takes away the class implementor’s freedom to change the implementation. At a later time you may decide to compute a value or read it from a database instead of storing it directly. Properties themselves, on the other hand, may well be public and belong to the API of our class.
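The idea does not depend on Java. As a minimal sketch in C++ (class names hypothetical), here a size property is exposed only through an accessor, so its implementation can change from a stored counter to a computed value without touching any client:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Version 1: the size property is backed by a stored counter.
class ShoppingCartV1 {
public:
  void add(const std::string& item) {
    items_.push_back(item);
    ++count_;
  }
  std::size_t size() const { return count_; }  // property accessor

private:
  std::vector<std::string> items_;  // internal field, no accessor
  std::size_t count_ = 0;           // internal storage of the property
};

// Version 2: the counter is gone and the property is computed on demand.
// Clients calling size() do not notice the change.
class ShoppingCartV2 {
public:
  void add(const std::string& item) { items_.push_back(item); }
  std::size_t size() const { return items_.size(); }  // same accessor

private:
  std::vector<std::string> items_;
};

// A client written against the accessor works with either version.
template <typename Cart>
std::size_t addTwoAndCount(Cart& cart) {
  cart.add("apple");
  cart.add("pear");
  return cart.size();
}
```

Had the first version exposed its counter as a public field, every client reading it would break once the second version removes it.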

Now on to Java. There is no native property support in the Java language, as it does not support the uniform access principle through language constructs. In other languages like Python, Ruby, Groovy or Scala you can change from direct field access to accessor methods without changing the clients, so it is no problem to expose fields (or, more precisely, properties) and thus make them public or protected. To gain the same degree of freedom in Java, you have to emulate properties using the getter/setter convention of JavaBeans. You have to trade the conciseness of public fields for this freedom, and you really should. An IDE can generate the accessors and fold the methods away from your sight. The cost of getters/setters is really negligible.

Now we can derive the conclusions for Java programmers. With each member variable you introduce, you have to decide whether it is a property or just an internal field. For properties you may provide getters and/or setters with appropriate visibility when needed. That means you should not provide accessor methods for all of your fields. In general you should never expose fields directly, and all instance variables should be private. Not doing so removes the freedom to change class internals without affecting clients. Once a class with exposed internals is published as part of an API, it is almost impossible to change internal design decisions.

Developing Grails Apps – Some Dark Sides

Most of the time, developing Grails apps is a nice experience. But there are also dark sides, one of which is bugs that appear or do not appear depending on how you start your app.

Usually I try to avoid it, but this time a disclaimer is in order: this is not a Grails rant. Most of the time developing Grails projects is fast and smooth. Using Grails brought us many advantages. But there are also dark sides…

My main criticism is that Grails’ abstractions are more than leaky! Grails should top every list of examples for the term Leaky Abstraction. As soon as you leave the tutorial/scaffolding/hello-world level, you have to know a lot about the underlying stack. And with Hibernate and Spring, none of the words small, easy or lightweight applies.

GORM, too, is only easy to use at first sight. The very informative blog series on GORM gotchas should absolutely become part of the user guide or the reference docs.

And there are times when it gets really unpleasant, e.g. when a bug appears in your Grails application running in a servlet container (packaged in a .war) but does not appear when the application is started from within the IDE. Our most recent one of those was a naming conflict in a .gsp file. The controller handed a model like this to the .gsp:

...
return [fieldValue: 'THE_VALUE', ...]

The model entry ‘fieldValue’ was used in the .gsp to set the value of a combo box. Unfortunately, ‘fieldValue’ is also the name of a built-in Grails tag.

Admittedly, ‘fieldValue’ was not the wisest choice of name, and I would certainly expect to get scolded loudly by Grails for that, ideally with a nice descriptive exception. But what happened instead led to a loud scolding of Grails from us. And to some big question marks: what is the difference between executing Grails from the IDE and within a servlet container with respect to name resolution? Why is there a difference at all?

We had a hard time figuring out this one, not least because the error message was not very telling. And since this was not the first of those works-in-the-IDE-but-not-in-a-real-environment bugs, there is always this slightly uneasy feeling…

As I said in the beginning, most of the time developing Grails applications is nice and shiny. I would not support their slogan, though. My personal search for the best web development tool is definitely not over.

How about your search?

Combine cobertura with the awesomeness of crap4j

Want the awesomeness of crap4j without running your tests twice in your build? Just combine it with your cobertura data using crapertura.

You may have heard of crap4j when it was still actively developed. Crap4j computes a software metric that points you to “crappy” methods in your projects by combining cyclomatic complexity numbers with test coverage data. The rationale is that overly complex code can only be tamed by rigorous testing; otherwise it will quickly degrade into an unmaintainable mess, the feared “rotten code” or “crappy code”, as Alberto Savoia and Bob Evans, the creators of crap4j, would put it. The crap4j metric soon became our most important number for every project. It’s highly significant, yet easy to grasp, and promotes a healthy coding style.

Some enhancements to crap4j

Crap4j got even better when we developed our own custom enhancements to it, like the CrapMap or the crap4j hudson plugin. We have a tool that formats the crap4j data like cobertura’s report, too.

A minor imperfection

The only thing that always bugged me when using crap4j inside our continuous integration build cycle was that at least half the data had already been gathered: cobertura calculates the code coverage of our tests right before crap4j does the same again. Wouldn’t it be great if the result of the first analysis could be re-used for the crap metric to save effort and time?

Different types of coverage

Soon I learned that crap4j combines the “path coverage” with the complexity of a method. This is perfectly reasonable, given that the complexity determines the number of different paths through the method. Cobertura only determines “line coverage” and “branch coverage”. As it stands, you can’t use the cobertura data for crap4j, because they represent different approaches to measuring coverage. That’s still true and probably will be for a long time. But the allure of the shortcut was too strong for me to resist. I just tried it out one day to see the real difference.

A different metric

So, here it is, our new metric, heavily inspired by crap4j. I simply took the line and branch coverage for every method and multiplied them. If you happen to have perfect coverage (1.0 on both numbers), it stays perfect. If you only have 75% coverage on both numbers, the result is a “crapertura coverage” of 56.25%. Then I fed this new coverage data into crap4j and compared the result with the original data. Well, it works on my project.
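The arithmetic behind this can be sketched as follows. The crapertura coverage is the product described above; the score uses the CRAP formula published by the crap4j authors, comp² · (1 − cov)³ + comp, with cyclomatic complexity comp and coverage cov as a fraction. The function names are hypothetical and this is not the crapertura source code.

```cpp
#include <cmath>

// Crapertura-style coverage: line coverage multiplied by branch coverage,
// both given as fractions in [0, 1].
double craperturaCoverage(double lineCoverage, double branchCoverage) {
  return lineCoverage * branchCoverage;
}

// CRAP score: complexity^2 * (1 - coverage)^3 + complexity,
// with coverage as a fraction in [0, 1].
double crapScore(int complexity, double coverage) {
  const double uncovered = 1.0 - coverage;
  const double c = static_cast<double>(complexity);
  return c * c * std::pow(uncovered, 3) + c;
}
```

For a method with complexity 5, the 75%/75% case yields a crapertura coverage of 0.5625 and a score of 25 · 0.4375³ + 5 ≈ 7.09, while the same method with no coverage at all scores 30.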

Presenting crapertura

Encouraged by this result, I wrote a complete ant task that behaves similarly to the original crap4j ant task. You can nearly use it as a drop-in replacement, provided the cobertura XML report file is already present. Here is an example ant call:


<crapertura
coberturaReportFile="/path/to/cobertura/coverage.xml"
targetDirectory="/where/to/place/the/crap4j/report"
classesDirectory="/your/unarchived/project/class/files"
/>

It will output the usual crap4j report files to the given target directory. Please note that even though it looks like crap4j data, it’s a different metric and should be treated as such. Therefore, online comparison of numbers is disabled.

The whole project is published on github. Feel free to browse the code and compile it for yourself. If you want a binary release, you might grab the latest jar from our download server.

The complete usage guide can be found on the github page or inside the project. If you have questions or issues, please use the comment section here.

Conclusion

Whether crapertura gives you nearly the same numbers as crap4j really depends on your project. Our test project contained over 20k methods, but very little crap. The difference between crap4j and crapertura was negligible: both metrics identified basically the same methods as crappy. Your mileage may vary, though. If that’s the case, let us know. If your experience is like ours, you’ve just saved some time in your build cycle without sacrificing quality.

GORM Gotchas: Validation and hasMany

Validating the elements on the many side of a hasMany association yields unexpected results.

The excellent GORM Gotchas series inspired me to write about a gotcha I found recently.
Say you have a domain class Container which contains elements:

class Container {
  static hasMany = [elements:Element]

  List<Element> elements
}

and the element has a constraint:

class Element {
  static belongsTo = [container:Container]

  static constraints = {
    email(email: true)
  }

  String email
}

When you try to save a container with more than one element that fails validation, only the first error appears:

Container c = new Container()
c.addToElements(new Element(email: "a"))
c.addToElements(new Element(email: "b"))
c.save()
assertEquals(2, c.errors.allErrors.size()) // fails, only one error is recorded!

The solution described in the docs, introduced with “In some situations (unusual situations)”, is to use a custom validator in the container class:

class Container {

  static constraints = {
      elements(validator: { List val, Container obj ->
          val.each {Element element ->
            if(! element.validate()) {
              element.errors.allErrors.each { error->
                obj.errors.rejectValue(
                      'elements',
                      error.getCode(),
                      error.getArguments(),
                      error.getDefaultMessage()
                )
              }
            }
          }
          return true
      })
  }

  static hasMany = [elements:Element]

  List<Element> elements
}

Responsive Qt GUIs – Threading with Qt

Qt4 used to have only primitive threading support. Starting with version 4.4, new classes and functions make your threading life a lot easier. So if you haven’t gotten around to looking at those features yet, do it now! It’s worth it.

If you have used Qt4 for some time now, specifically since pre-4.4 versions, you may or may not be aware of the latest developments in the threading part of the library. This post is meant as a reminder in case you didn’t follow the versions in detail or just didn’t get around to taking a closer look and/or updating.

In pre-4.4 versions, the only way to do threading was to use class QThread: subclass QThread, implement the run method, and there you had your thread. This looks fine at first, but, taking the title of the post as an example, it can get annoying very fast. Sometimes you have just a few lines of code you want to keep away from the GUI thread because, e.g., they could potentially block on some communication socket. Subclassing QThread for every little work package is not something you want to do, so I guess many users just wrote their own thread pool or the like.

Starting with version 4.4, Qt gained two major threading features which, IMHO, the Qt people do not advertise very well. The first is QThreadPool together with QRunnable. All Java programmers, who have been using java.lang.Runnable from the beginning, may have their laugh now, I’ll wait…

The second new threading feature is the QtConcurrent namespace (from the Qt documentation):

The QtConcurrent namespace provides high-level APIs that make it possible to write multi-threaded programs without using low-level threading primitives such as mutexes, read-write locks, wait conditions, or semaphores

Sounds great! What else?

QtConcurrent includes functional programming style APIs for parallel list processing, including a MapReduce and FilterReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications.

This is really great stuff. Functions like QtConcurrent::run together with QFuture<T> and QFutureWatcher<T> can simplify your threading code significantly. So, if you haven’t gotten around to looking at those new classes yet, I can only advise you to do it immediately. Allocate a refactoring slot in your next sprint to replace all those old QThread subclasses with shiny new QRunnables or QtConcurrent functions. It’s worth it!
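QtConcurrent::run needs Qt, but the pattern it enables (hand a function to another thread, get a future back, collect the result later) can be sketched with the C++11 standard library. This is an analog using std::async and std::future, not Qt’s API; slowChecksum and startChecksum are hypothetical names. In Qt you would call QtConcurrent::run, receive a QFuture<T>, and typically let a QFutureWatcher<T> tell the GUI thread via its finished() signal when the result is ready.

```cpp
#include <future>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for work that should stay off the GUI thread,
// e.g. something that may block on a socket.
long slowChecksum(std::vector<int> data) {
  return std::accumulate(data.begin(), data.end(), 0L);
}

// Starts the work and returns immediately. std::launch::async runs the
// function as if on a new thread instead of deferring it until get().
std::future<long> startChecksum(std::vector<int> data) {
  return std::async(std::launch::async, slowChecksum, std::move(data));
}
```

Calling get() on the returned future blocks until the result is available, so a GUI thread should not call it directly; in Qt that waiting is exactly what QFutureWatcher takes off your hands.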

Let’s get back to the responsive GUIs example. In his Qt Quarterly article, Witold Wysota describes in detail every technical possibility for keeping your GUI responsive. It’s a very good article that provides a lot of insight. He starts with manual event processing and mentions the QtConcurrent features only at the very end. I suggest the following order of threading solutions to consider:

  1. QtConcurrent functions
  2. QThreadPool + QRunnable
  3. everything else

Stay responsive!

Scala: Easier to read (and write), harder to understand?

For some weeks now there has been a lively discussion on the web about Scala’s complexity, including a response from Martin Odersky. I want to throw in my 2¢ together with some hopefully new aspects.
Scala’s liberal syntax rules and compiler magic like type inference and implicit conversions allow nicely written APIs and DSLs that read almost like prose. Take a look at APIs like scalatest and imagine the Java/JUnit equivalent:

@Test def demonstrateScalaTest() {
  val sb = new StringBuilder("Scala")
  sb.append(" is cool!")
  sb.toString should be ("Scala is cool!")
  evaluating { "concise".charAt(-1) } should produce [StringIndexOutOfBoundsException]
}

There are really nice features that reduce day-to-day programming tasks to keywords or one-liners. Here are some examples:

// singletons have their own keyword (object), static does not exist!
object MySingleton {
  def printMessage {
    println("I am the only one")
  }
}

// lazy initialization/evaluation
lazy val complexResult = computeForHours()

// bean-style data container with two scala properties and one java-bean property with getter+setter
class Data(val readOnly: String, var readWrite: Int, @BeanProperty var javaProperty: String)

// tuples as return values or quick data transfer objects (DTO) for methods yielding multiple data objects
def correctCoords(x: Double, y: Double) = (x + 12, y * 0.5)
val (correctedX, correctedY) = correctCoords(0.37, 34.2)
println("corrected: " + correctedX + ", " + correctedY)

On the other hand, there are so many built-in features that code can be really hard to understand if you are not a Scala programmer with some experience. I like the distinction between application and library code that Martin Odersky himself makes in Programming Scala. The Scala frameworks I have tried so far (Lift, scalatest and scala-swing) make your life very easy as long as you just use them. It is really a breeze and much more fun than using most APIs in Java, for example. But when something goes wrong, or you really want or have to understand what is going on, you can have a hard time. This is true at least for a Scala beginner, and sometimes perhaps for a pro, too.

Final Thoughts
In my opinion Scala is a very nice language that successfully combines clean object-oriented programming with functional features. You can migrate from a pure OO style to a nice hybrid “Scala style”, just as many programmers did with Java, which they first used in a mostly procedural style, with classes serving only as namespaces for their static methods. I am quite sure that a Scala coding style and best practices still have to develop. Programmers will need time to dive into the language and use it to their benefit. I hope Scala prospers and gains attention in the industry, because I personally think it is a nice step forward compared to Java (which is increasingly turning into a mess where you need profound knowledge to fight your problems).

Regarding the complexity, which certainly exists in Scala, I only want to raise some questions which may be answered sometime in the future:

  • Maybe the tooling is just not there (yet)?
  • Maybe sometimes you just don’t have to understand everything that’s happening underneath?
  • Maybe Scala makes debugging much rarer but harder when something does not work out?
  • Maybe the features and power of Scala are worth learning?
  • Maybe certain features will just be banned by teams, as sometimes happens in Java teams (think of switch-case, the ?: operator or autoboxing)?