Special upgrade notes for Grails 1.3.x to 2.2.x

Usually there are quite extensive upgrade notes that should take you from one Grails release to another. Every now and then there are subtle changes in behaviour that may break your application without being mentioned in the notes. We are maintaining some Grails applications started years ago in the Grails 1.0.x era and a bucket full of experience upgrading between major releases.

Here are our special upgrade notes for 1.3.x to 2.2.x:

  • domain constructors with default parameters lead to DuplicateMethodErrors. The easy fix is to change code like
    public MyDomain(def number = 0) {
        ...
    }
    

    to

    public MyDomain() {
        this(0)
    }
    
    public MyDomain(def number) {
        ...
    }
    
  • private static classes are disallowed in controllers. So in general avoid visibility modifiers for multiple classes in one file.
  • If you use Apache Shiro with the Grails Shiro Plugin for authentication, you will have to do some work for existing accounts to stay working because the default CredentialMatcher changed from SHA1 to SHA256. To get the old behaviour add the following to conf/spring/resources.groovy:
    import org.apache.shiro.authc.credential.Sha1CredentialsMatcher
    
    beans = {
        ...
        credentialMatcher(Sha1CredentialsMatcher) {
            storedCredentialsHexEncoded = true
        }
        ...
    }
    
    
  • A domain class property or even a domain class with the name “environment” clashes(d) with a spring bean (GRAILS-7851) and leads to unexpected effects. Renaming the property or class is a viable workaround.
  • Namespacing in tag libs is broken so that you cannot name a local variable “properties”:
        def myTag = { attrs, body ->
            String properties = 'some string'
    

    leads to a bogus error
    [groovyc] TagLib.groovy: -1: The return type of java.lang.String getProperties() in TagLib$_closure24_closure87 is incompatible with java.util.Map getProperties() in groovy.lang.Closure.Simply renaming the variable to something like props fixes the problem.

  • Migrations need package statements if you organize them in subdirectories.

In addition to the changes mentioned in the official release notes solving the issues above made our application work again with the latest and greatest Grails release.

Scaling your web app: Cache me if you can

Invalidation and transaction aware caching using memcached with Grails as an example

One of the biggest problems of caches is how and when do I invalidate my cache content? When you read outdated data from the cache you are toast.
For example we have a list of children elements inside a parent. Normally you would cache the children under the parent’s id:

cache[parent.id] = children

But how do you know if your cache content is still valid? When one child or the list of children changes you write the new content into the cache

cache[parent.id] = newChildren

But when do you update the cache? If you place the update code where the list of children is modified the cache is updated before transaction has ended. You break the isolation. Another point would be after the transaction has been committed but then you have to track all changes. There is a better way: use a timestamp from the database which is also visible to other transactions when it is committed. It should also be in the parent object because you need this object for the cache key nonetheless. You could use lastUpdated or another timestamp for this which is updated when the children collection changes. The cache key is now:

cache[parent.id + '_' + parent.lastUpdated]

Now other transactions read the parent object and get the old timestamp and so the old cache content before the transaction is committed. The transaction itself gets the new content. In Grails if you change the collection lastUpdated is automatically updated and in Rails with belongs_to and touch even a change in a child updates the lastUpdate of the parent – no manual invalidation needed.

Excourse: using memcached with Grails

If you want to use memcached from the JVM there is a good library which wraps common calls: spymemcached. If you want to use spymemcached from Grails you drop the jar into your lib folder and wrap it in a Service:

class MemcachedService implements InitializingBean {
  static final Object NULL = "NULL"
  def MemcachedClient memcachedClient

  def void afterPropertiesSet() {
    memcachedClient = new MemcachedClient(
      new ConnectionFactoryBuilder().setTranscoder(new CustomSerializingTranscoder()).build(),
      AddrUtil.getAddresses("localhost:11211")
    )
  }

  def connected() {
    return !memcachedClient.availableServers.isEmpty()
  }

  def get(String key) {
    return memcachedClient.get(key)
  }

  def set(String key, Object value) {
    memcachedClient.set(key, 600, value)
  }

  def clear() {
    memcachedClient.flush()
  }
}

Spymemcached serializes your cache content so you need to make all your cached classes implement Serializable. Since Grails uses its own class loaders we had problems with deserializing and used a custom serializing transcoder to get the right class loader (taken from this issue):

public class CustomSerializingTranscoder extends SerializingTranscoder {

  @Override
  protected Object deserialize(byte[] bytes) {
    final ClassLoader currentClassLoader = Thread.currentThread().getContextClassLoader();
    ObjectInputStream in = null;
    try {
      ByteArrayInputStream bs = new ByteArrayInputStream(bytes);
      in = new ObjectInputStream(bs) {
        @Override
        protected Class<ObjectStreamClass> resolveClass(ObjectStreamClass objectStreamClass) throws IOException, ClassNotFoundException {
          try {
            return (Class<ObjectStreamClass>) currentClassLoader.loadClass(objectStreamClass.getName());
          } catch (Exception e) {
            return (Class<ObjectStreamClass>) super.resolveClass(objectStreamClass);
          }
        }
      };
      return in.readObject();
    } catch (Exception e) {
      e.printStackTrace();
      throw new RuntimeException(e);
    } finally {
      closeStream(in);
    }
  }

  private static void closeStream(Closeable c) {
    if (c != null) {
      try {
        c.close();
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
  }
}

With the connected method you can check if any memcached instances are available. Which is better than calling a method and waiting for the timeout.

def connected() {
  return !memcachedClient.availableServers.isEmpty()
}

Now you can inject your Service where you need to and cache along.

Cache the outermost layer

If you use Hibernate you get database based caching almost for free, so why bother using another cache? In one application we used Hibernate to fetch a large chunk of data from the database and even with caches it took 100 ms. Measuring the code showed that the processing of the data (conversion for the client) took by far the biggest chunk. Caching the processed data lead to 2 ms for the whole request. So one take away is here that caching the result of (user indepedent) calculations and conversions can speed up your request even further. When you got static resources you could also use HTTP directives.

Small cause – big effect (Story 1)

This is a story about a small programming error that caused a company-wide chaos.

Today’s modern software development is very different from the low-level programming you had to endure in the early days of computing. Computer science invented concepts like “high-level programming languages” that abstract a lot of the menial tasks away that are necessary to perform the desired functionality. Examples of these abstractions are register allocation, memory management and control structures. But even if we step into programming at a “high level”, we build whole cities of skyscrapers (metaphorically speaking) on top of those foundation layers. And still, if we make errors in the lowest building blocks of our buildings, they will come crashing down like a tower in a game of jenga.

Paris_Tuileries_Garden_Facepalm_statueThis story illustrates one such error in a small and low building block that has widespread and expensive effects. And while the problem might seem perfectly avoidable in isolation, in reality, there are no silver bullets that will solve those problems completely and forever.

The story is altered in some aspects to protect the innocent, but the core message wasn’t changed. I also want to point out that our company isn’t connected to this story in any way other than being able to retell it.

The setting

Back in the days of unreliable and slow dial-up internet connections, there was a big company-wide software system that should connect all clients, desktop computers and notebooks likewise, with a central server that should publish important news and documentation updates. The company developing and using the system had numerous field workers that connected from the hotel room somewhere near the next customer, let the updates run and disconnected again. They relied on extensive internal documentation that had to be up-to-date when working offline at the customer’s site.

Because the documentation was too big to be transferred completely after each change, but not partitioned enough to use a standard tool like rsync, the company developed a custom solution written in Java. A central server kept track of every client and all updates that needed to be delivered. This was done using a Set, more specifically a HashSet for each client. Imagine a HashMap (or Dictionary) with only a key, but no payload value. The Set’s entries were the update command objects itself. With this construct, whenever a client connected, the server could just iterate the corresponding Set and execute every object in it. After the execution had succeeded, the object was removed from the Set.

Going live

The system went live and worked instantly. All clients were served their updates and the computation power requirements for the central server were low because only few decisions needed to be made. The internet connection was the bottleneck, as expected. But it soon turned out to be too much of a bottleneck. More and more clients didn’t get all their updates. Some were provided with the most recent updates, but lacked older ones. Others only got the older ones.

The administrators asked for a bigger line and got it installed. The problems thinned out for a while, but soon returned as strong as ever. It wasn’t a problem of raw bandwidth apparently. The developers had a look at their data structure and discovered that a HashSet wouldn’t guarantee the order of traversal, so that old and new updates could easily get mixed up. But that shouldn’t be a problem because once the updates were delivered, they would be removed from the Set. And all updates had to be delivered anyways, regardless of age.

Going down

Then the central server instance stopped working with an OutOfMemoryError. The heap space of the Java virtual machine was used up by update command objects, sitting in their HashSets waiting to be executed and removed. It was clear that there were far too many update command objects to come up with a reasonable explanation. The part of the system that generated the update commands was reviewed and double-checked. No errors related to the problem at hand were found.

The next step was a review of the algorithm for iterating, executing and removing the update commands. And there, right in the update command class, the cause was found: The update command objects calculated their hashcode value based on their data fields, including the command’s execution date. Every time the update command was executed, this date field was updated to the most recent value. This caused the hashcode value of the object to change. And this had the effect that the update command object couldn’t be removed from the Set because the HashSet implementation relies on the hashcode value to find its objects. You could ask the Set if the object was still contained and it would answer “no”, but still include it into each loop over the content.

The cause

The Sets with update commands for the clients always grew in size, because once a update command object was executed, it couldn’t be removed but appeared absent. Whenever a client connected, it got served all update commands since the beginning, over and over again in semi-random order. This explained why sometimes, the most recent updates were delivered, but older ones were still missing. It also explained why the bandwidth was never enough and all clients lacked updates sooner or later.

The cost of this excessive update orgy was substantial: numerous clients had leeched all (useless) data they could get until the connection was cut, day for day, over expensive long-distance calls. Yet, they lacked crucial updates that caused additional harm and chaos. And all this damage could be tracked down to a simple programming error:

Never include mutable data into the calculation of a hashkey.

Performance considerations with network requests, database queries and other IO

Todays processors, memory and other sub systems are wicked fast. Nevertheless, many applications feel sluggish. In my experience this is true for client and server applications and not limited to specific scenarios. So the question is, why?

Many developer rush straight into optimizing their code to save CPU cycles. Most of the time thats not the real problem. The most important rule of performance optimisation stays true: Measure first!

Often times you will find your application waiting the greater part of its running time waiting for input/output (IO). Common sources for IO are database queries, network/http request and file system operations. Many developers are aware of these facts but we see this problem very often whether in inhouse or on-site customer projects.

Profile the unresponsive/slow parts of your application and check especially for hidden excess IO, here some Java examples:

  • The innocently looking method File.isFile() typically does a seek on the harddrive on each call. Using it an a loop over several dozens of files will slow you down massively.
  • The java.net.URL class does network requests for hashCode() and equals()! Never use it in collections, especially HashMaps. It is better to use the java.net.URI for managing the resource location and only convert to URL when needed.
  • Using an object-relational-mapping (ORM) tool like hibernate most people default to lazy loading. If your usage pattern requires to load the referenced objects all or most of the time you will get many additional database requests, at least one for each accessed association. In such cases it is most likely better to use eager fetching because the network and query overhead is reduced drastically and the data has to be loaded anyway.

So if you have performance and/or responsiveness problems, keep an eye on your IO pattern and optimize the algorithms to reduce IO. Usually it will help you much more than micro optimisation of your application code.

C/C++ pitfalls for Java developers

Java and C/C++ have concepts that are similar enough to get an inexperienced Java developer confused. Here I want to show you some mistakes I found or done myself.

Java and C/C++ have concepts that are similar enough to get an inexperienced Java developer confused. Here I want to show you some mistakes I found or done myself.

Type conversion rules

A well known and often used pattern is simultaneous assignment of an expression to a variable and its comparison with another value.

if((a = b) != c) {
  // do something
}

In both Java and C would this code would have the same behaviour. The problem arises when a parenthesis is misplaced, resulting in an assignment of a boolean expression to a:

if((a = b != c)) {
  // do something
}

Since a boolean expression can be converted to an integer and the assignment expression is contained in a parenthesis, the compiler may even not ensue a warning. For Java this code isn’t legal anymore while perfectly fine in C. The error strikes most hard when the result of the comparison, namely 0 or 1, is a valid value. A good example is a call to socket(), that may return 0 as a file descriptor for stdin. The probably simplest solution to this problem is separating the assignment from comparison – even at the cost of a temporary variable.

Memory management

The behaviour of standard containers is sometimes combined with incomplete/misunderstood behaviour of pointers. An example:

class A {}
class B
{
  public:
  void foo()
  {
    std::vector<A*> theContainer;
    for(int i = 0; i < 100; i++) {
      theContainer.push_back(new A());
    }
  }
}

Every call to foo() would result in a memory leak due to not deleted A’s. When the vector is destructed, a destructor of each contained item is called. For pointers and other scalar types this is a no-op, resulting in missing call to the destructor of pointed to class. A solution to this problem could be the use of smart pointers wrapping the actual pointers or an explicit destruction of pointed to objects before the vector goes out of scope.

Deterministic destruction

Coming from language with automatic memory management there is some uncertainty when it comes to the order of destruction when multiple objects leave the scope. Consider this example:

void foo()
{
  std::lock_guard<std::mutex> lock(mutex);
  std::ifstream input ....

  //some operations

  //??
}

In this case the stream is destructed before the lock, guaranteeing that the stream is destructed before the execution reaches the destructor of the lock. This pattern is exploited by the RAII.

Exception handling

This is my personal favourite. Here is a little quiz: what is printed to the screen?

try {
  throw new SomeException();
} catch (SomeException& e) {
  std::cout << "first" << std::endl;
} catch (...) {
  std::cout << "second" << std::endl;
}

As some may already have guessed from the question: the answer is “second”. To make the code work, the reference in the catch block has to be replaced by the pointer. Another, and probably better alternative is to create the exception on the stack. The reason behind this mistake is that in java any thrown object is constructed with new. Explicit hints or experience are required to avoid such flawed exception handling.

Grails / GORM performance tuning tips

Every situation and every code is different but here are some pitfalls that can cost performance and tips you can try to improve performance in Grails / GORM

First things first: never optimize without measuring. Even more so with Grails there are many layers involved when running code: the code itself, the compiler optimized version, the Grails library stack, hotspot, the Java VM, the operating system, the C libraries, the CPU… With this many layers and even more possibilities you shouldn’t guess where you can improve the performance.

Measuring the performance

So how do you measure code? If you have a profiler like JProfiler you can use it to measure different aspects of your code like CPU utilization, hotspots, JDBC query performance, Hibernate, etc. But even without a decent profiler some custom code snippets can go a long way. Sometimes we use dedicated methods for measuring the runtime:

class Measurement {
  public static void runs(String opertationName, Closure toMeasure) {
    long start = System.nanoTime()
    toMeasure.call()
    long end = System.nanoTime()
    println("Operation ${operationName} took ${(end - start) / 1E6} ms")
  }
}

or you can even show the growth in the Hibernate persistence context:

class Measurement {
  public static void grown(String opertationName, Closure toMeasure) {
    PersistenceContext pc = sessionFactory.currentSession.persistenceContext
    Map before = numberOfInstancesPerClass(pc)
    toMeasure.call()
    Map after = numberOfInstancesPerClass(pc)
    println "The operation ${operationName} has grown the persistence context: ${differenceOf(after, before)}"
  }
}

Improving the performance

So when you found your bad performing code, what can you do about it? Every situation and every code is different but here are some pitfalls that can cost performance and tips you can try to improve performance:

GORM hotspots

Performance problems with GORM can be in different areas. A good rule of thumb is to reduce the number of queries hitting the database. This can be achieved by combining results with outer join, eager fetching associations or improving caching. Another hotspot can be long running operations which you can improve via creating indices on the database but first analyze the query with database specific tools like ANALYZE.
Also a typical problem can be a large persistence context. Why is this a problem? The default flush mode in Hibernate and hence GORM is auto which means before any query the persistence context is flushed. Flushing means Hibernate checks every property of every instance if it has changed. The larger the persistence context the more work to do. One option would be to clear the session periodically after a flush but this could decrease the performance because once loaded and therefore cached instances need to be reloaded from the database.
Another option is to identify the parts of your code which only need read access on the instances. Here you can use a stateless session or in Grails you can use the Spring annotation @Transactional(readOnly = true). It can be beneficial for the performance to separate read only and write access to the database. You could also experiment with the flush mode but beware that this can lead to wrong query results.

The thin line: where to stop?

If you measure and improve you can get big and small improvements. The problem is to decide which of these small ones change the code in a good or minimal way. It is a trade off between performance and code design as some performance improvements can worsen the code quality. Another cup of tea left untouched in this discussion is scalability. Whereas performance concentrates of the actual data and the current situation, scalability looks on the performance of the system when the data increases. Some performance improvements can worsen scalability. As with performance: measure, measure, measure.

Testing Java with Grails 2.2

We have some projects that consist of both java and groovy classes. If you don’t pay attention, you can have a nice WTF moment.

Let us look at the following fictive example. You want to test a static method of a java class “BlogAction”. This method should tell you whether a user can trigger a delete action depending on a configuration property.

The classes under test:

public class BlogAction {
    public static boolean isDeletePossible() {
        return ConfigurationOption.allowsDeletion();
    }
}
class ConfigurationOption {
    static boolean allowsDeletion() {
        // code of the Option class omitted, here it always returns false
        return Option.isEnabled('userCanDelete');
    }
}

In our test we mock the method of ConfigurationOption to return some special value and test for it:

@TestMixin([GrailsUnitTestMixin])
class BlogActionTest {
    @Test
    void postCanBeDeletedWhenOptionIsSet() {
        def option = mockFor(ConfigurationOption)
        option.demand.static.allowsDeletion(0..(Integer.MAX_VALUE-1)) { -> true }

        assertTrue(BlogAction.isDeletePossible())
    }
}

As result, the test runner greets us with a nice message:

junit.framework.AssertionFailedError
    at junit.framework.Assert.fail(Assert.java:48)
    at junit.framework.Assert.assertTrue(Assert.java:20)
    at junit.framework.Assert.assertTrue(Assert.java:27)
    at junit.framework.Assert$assertTrue.callStatic(Unknown Source)
    ...
    at BlogActionTest.postCanBeDeletedWhenOptionIsSet(BlogActionTest.groovy:21)
    ...

Why? There is not much code to mock, what is missing? An additional assert statement adds clarity:

assertTrue(ConfigurationOption.allowsDeletion())

The static method still returns false! The metaclass “magic” provided by mockFor() is not used by my java class. BlogAction simply ignores it and calls the allowsDeletion method directly. But there is a solution: we can mock the call to the “Option” class.

@Test
void postCanBeDeletedWhenOptionIsSet() {
    def option = mockFor(Option)
    option.demand.static.isEnabled(0..(Integer.MAX_VALUE-1)) { String propertyName -> true }

    assertTrue(BlogAction.isDeletePossible())
}

Lessons learned: The more happens behind the scenes, the more important becomes the context of execution.

Impressions from Java Forum Stuttgart 2013

Java Forum Stuttgart(JFS) is a yearly java focused conference primarily visited by developers. The conference lasts for a day, offering 45 minute long talks plus some time in between for discussions. This was my second visit and I am happy to tell you about my impressions.

vert.x: Polyglot – modular – Asynchron

Speaker: Eberhard Wolff from adesso AG

This was my first stop. The topic seemed interesting, because at Softwareschneiderei we are using a mix of different languages and frameworks for our projects. To learn about a new Framework was a nice thing. vert.x runs on a Java VM and can be written in a mixable variety of languages like Java, JavaScript or Groovy. The main points of the presentation were the examples in Java and JavaScript showing the asynchronous features and communication between different components. Judging by the function set and the questions asked, this seems to be a framework that provides java developers a smooth transition from the synchronous world to the event based asynchronous world. Compared to NodeJS, vert.x is currently a small project containing only a handful of modules.

Java 8 innovations

Speaker: Michael Wiedeking from MATHEMA Software GmbH

This one is somewhat special. After thousands of blog entries, presentations and whatever there is only a marginal chance to get fresh news about java features. The speaker did know this and spiced the presentation up with some jokes, while showing ever increasing complex code samples. Exactly what I hoped for: reading code and having fun.

Statical code analysis as a quality measurement?

Speaker: Dr. Karl-Heinz Wichert from iteratec GmbH

We are using grails in some of our projects. As any other highly dynamic language, grails suffers from its strength: weak type system. Without acceptance test support it is hard to verify whether a given code piece is correct or not. My hope was to hear something about new trends in statical analysis that allow me to detect simple errors faster, without firing up the system. Biggest mismatch between imagination and reality that day. The speaker presented the reasons why to use statical code analysys and its current shortcomings like the inability to verify that comments match the code commented or the inability to detect complex implementations of a simple algorithm. An interesting statement was that statical analysis fails if not every aspect is checked, the reason beeing the developer trying to optimize the code against the measured criteria while neglecting other aspects. From my point of view this is not a shortcoming of a statical analysis, but of the way the people use it. It is measurable that a maintainable product has proportionally more readable variable names than an unmaintainable one, but is not necessarily true that your product gets maintainable when you rename all your variables. All in one: the speaker managed to motivate me to look for holes in his argumentation and thus to actively think about the topic.

Enterprise portals with grails. Does it work?

Speakers: Tobias Kraft and Manuel Breitfeld from exentio GmbH

Like the previous presentation, this one attracted me because of the grails context. Additionally, because of the title, I was hoping for a nice description on pitfalls they encountered while building their portals. One part of the presentation was the description of the portal they built and the requirements it has to fullfill. Another part was a description of the grails platform. They use grails to deliver snippets for their portal that is organized as a collection of such snippets. Very valuable was the part about the problems one can encounter when using grails, where they honestly admitted that the migration from Grails 1.3.7 to 2.x did cost them some time. To detect regressions during platform upgrades they recommended to put extra effort in tests.

Car2Car systems – Java and Peer2Peer move into the car

Speaker: Adam Kovacs from msg-systems

After the impressive lunch break my brain waves almost reached zero. The program brochure missed any interesting titles for the next round so I went for the least common topic. The presentation turned out a lucky find. The speaker managed to keep the right level of detail, without diving too deep or scratching the surface. He described how Chord, an implementation of a distributed hash table, can be used to share locally relevant traffic data like traffic jams or accidents. To increase the stability and the security of the network he introduced the use of existing transmitting stations and certificate authorities.

Kotlin

Speaker: Dr Ralph Guderlei from eXXcellent Solutions

There were two reasons I wanted to visit this presentation. The first was: My colleague already showed me some features of this language. The second: the language is developed by the company that also develops the IntelliJ IDEA. Good IDE support is practically built in, isn’t it? The presentation covered syntax, lambdas, type inference extension methods and how kotlin handles null references. It looks like kotlin is going to become something like an improved java. I hope for the best.

Enterprise Integration Patterns

Speaker: Alexander Heusingfeld from First Point IT Solutions

Another Presentation that gets selected because of the least boring sounding title and another success. In the first minutes I expected an endless enumeration of common well known patterns. This was only true for the first minutes. The topic quickly shifted to asynchronous messaging and increasingly complex patterns to handle it. As two frameworks with similar range of functions he presented Apache Camel and Spring  Integration

Bottom line

The event was as always fun. Unfortunately it was not possible to visit more presentations due to their “parallel execution”. Have you been an JFS too and want to share your impressions about same or other presentations too? Post a comment!

Communication Through Code

In a previous post my colleague described our experiment on our ability to transfer the intention of the code by tests. The tests describe how the code behaves when called from the outside. Additional approach is to communicate through code.

To understand the code, at least the following two questions have to be answered:

  • How does the code work?
  • What is the reason behind the way the code is implemented?

Challenge

As long as the code is readable, it is possible to deduce its meaning. Improving readability is a common technique to help the reader. This includes using descriptive names, reducing complexity or hiding implementation details until they are absolutely necessary to understand the problem.

On the other hand deducing the reason why exactly this implementation was chosen by somebody is an impossible task without the knowledge (or lack thereof) of all implementors combined. One of the missing parts are the assumptions. Our code is full of them. Consider the following example:

void print(char* text)
{
  printf("program says %s", text);
}

In this function the writer assumes that:

  • the text is a valid pointer
  • the text is zero terminated
  • this program can write to stdout, i.e. is a console app
  • the reader speaks english

Or something nastier:

void* allocateBuffer(size_t size)
{
  void* buffer = malloc(size);
  if (!buffer) {
    printf("expect a segmentation fault!");
  }
  return buffer;
}

Here the writer assumes that malloc always returns either NULL or a pointer to dereferenceable memory. It is not always the case:

If size is zero, the return value depends on the particular library implementation (it may or may not be a null pointer), but the returned pointer shall not be dereferenced.

Assumptions not explicitly defined in the code lead sooner or later to hard to discover bugs.

Solution approaches

Comments are the quick and dirty way of writing down assumptions. They are easiest to read, but are never enforced and tend to diverge from the code with every edit made to it. However it is better to read “should never come here” and hear the alarm bells ringing than seeing nothing but whitespace.

Some of the assumptions can be documented and verified through tests, with varying level of detail. Unit tests will be most efficient on assumptions with little or no context, like verifying that only non-NULL-pointers are passed to a function. For more global assumptions integration or acceptance tests can be used. Together they ensure that no changes to the codebase break the assumptions made earlier. The drawback of unit tests is that they are locally decoupled from the code tested, forcing the reader to gather the information by searching for direct or indirect references to it.

When new code is written, assertions help to document how the API is meant to be used. Since they are executed not only during the test phase, they can capture wrong assumptions the authors made about the runtime environment. Writing down every possible assumption can quickly clutter the code with repeated statements like “assume pointer x is not NULL”, reducing readability and usefulness of this technique.

Conclusion

All of the shown approaches are not new. Each one has an aspect it excels at, so to get the most information out of the code they all have to be used. Their domains overlap partially, so it is possible to choose the approach depending on the situation, i.e. replacing assertions with unit tests for time critical code. One niche currently not filled by any of them is the description of global assumptions like the cultural background of the users.

Composite comparators in Java

Some time ago a fellow developer wrote a really comprehensive blog post (unfortunately only available in german) about comparator implementations in Java. More specifically it is about composite comparators used to compare entities according to different attributes. The best solution for Java 7 he comes up with is a comparator

class FoobarComparator implements Comparator {
  @Override
  public int compare(Foobar lhs, Foobar rhs) {
    return compoundCompare(
      lhs.getLastName().compareTo(rhs.getLastName()),
      lhs.getFirstName().compareTo(rhs.getFirstName()),
      lhs.getPlaceOfBirth().compareTo(rhs.getPlaceOfBirth()),
      lhs.getDateOfBirth().compareTo(rhs.getDateOfBirth()));
  }
}

with a reusable compoundCompare()-method

// utility method used with static import
int compoundCompare(int... results) {
  for (int result : results) {
    if (result != 0) {
      return result;
    }
  }
  return 0;
}

While this solution is quite clean and a vast improvement over the critized implementations it has the flaw that it eagerly evaluates all attributes even though short-circuiting may be possible for many entities. This may lead to performance problems in some cases. So he goes on to explain how Java 8 will fix this problem with Lambdas or another solution he calls “KeyMethodComparator”.

Now I want to show you an implementation very similar to his approach above but without the performance penalty and possible in Java 7 using the composite pattern:

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class FoobarComparator implements Comparator<Foobar> {

  private List<Comparator<Foobar>> defaultFoobarComparison =
    Arrays.<Comparator<Foobar>>asList(
      new Comparator<Foobar>() {
        @Override
        public int compare(Foobar lhs, Foobar rhs) {
          return lhs.getLastName().compareTo(rhs.getLastName());
        }
      },
      new Comparator<Foobar>() {
        @Override
        public int compare(Foobar lhs, Foobar rhs) {
          return lhs.getFirstName().compareTo(rhs.getFirstName());
        }
      },
      new Comparator<Foobar>() {
        @Override
        public int compare(Foobar lhs, Foobar rhs) {
          return lhs.getPlaceOfBirth().compareTo(rhs.getPlaceOfBirth());
        }
      },
      new Comparator<Foobar>() {
        @Override
        public int compare(Foobar lhs, Foobar rhs) {
          return lhs.getDateOfBirth().compareTo(rhs.getDateOfBirth());
        }
      });

  @Override
  public int compare(Foobar lhs, Foobar rhs) {
    for (Comparator<Foobar> comp : defaultFoobarComparison) {
      int result = comp.compare(lhs, rhs);
      if (result != 0) {
        return result;
      }
    }
    return 0;
  }
}

It features the lazy evaluation demanded by my fellow for performance and allows flexible construction of different composite comparators if you, e.g. add a constructor accepting a list of comparators.
Imho, it is a quite elegant solution using standard object-oriented programming in Java today and not only in the future.