Consistency over magic, please

The Groovy programming language is a JVM based scripting language. It is used by the Grails web framework and the Gradle build automation system.

Groovy has a language feature called Named argument constructors. This means that given a class with properties, for example

class Example {
  String text
}

you can initialize the properties directly when calling the constructor:

def example = new Example(text: ' This is an example. ')
assert example.text == ' This is an example. '

This is basically a shortcut for initializing the properties via explicit assignment:

def example = new Example()
example.text = ' This is an example. '
assert example.text == ' This is an example. '

So far so good.

Enter Grails

We use the aforementioned Grails framework for some of our web application projects. It is advertised on its website as featuring “convention-over-configuration” and “sensible defaults”. Grails uses the Groovy programming language, and a simple domain class looks just like a plain old Groovy class, except that it lives under the grails-app/domain directory (this is one of the convention-over-configuration aspects):

class Example {
  String text
}

As expected, you can initialize the property via regular assignment:

def example = new Example()
example.text = ' This is an example. '
assert example.text == ' This is an example. '

So one might expect that you can initialize it via a named argument constructor call as well:

def example = new Example(text: ' This is an example. ')
assert example.text == ' This is an example. '

And indeed, you can. But what’s this? Our assertion fails:

assert example.text == ' This is an example. '
               |    |
               |    false
               This is an example.

It is not directly obvious from the assertion failure output, but the property value is indeed no longer equal to the expected text: the leading and trailing spaces got trimmed!

I was surprised, but after some research in Grails documentation it turned out that it’s not a bug, but a feature. In the section on Data Binding, you can find the following sentence:

The mass property binding mechanism will by default automatically trim all Strings at binding time. To disable this behavior set the grails.databinding.trimStrings property to false in grails-app/conf/application.groovy.

Groovy’s named argument constructor feature is used as a data binding mechanism by Grails to bind web request parameters to a domain object. For this the default behavior was modified, so that strings are automatically trimmed. I can only guess that this is considered to be an instance of the “sensible defaults” mentioned on the Grails homepage.

To me personally this kind of surprising behavior is not a sensible default, and I think it goes against the Principle of least astonishement. I prefer consistency over “magic”.

Gradle projects as Debian packages

Gradle is a great tool for setting up and building your Java projects. If you want to deliver them for Ubuntu or other debian-based distributions you should consider building .deb packages. Because of the quite steep learning curve of debian packaging I want to show you a step-by-step guide to get you up to speed.

Prerequisites

You have a project that can be built by gradle using gradle wrapper. In addition you have a debian-based system where you can install and use the packaging utilities used to create the package metadata and the final packages.

To prepare the debian system you have to install some packages:

sudo apt install dh-make debhelper javahelper

Generating packaging infrastructure

First we have to generate all the files necessary to build full fledged debian packages. Fortunately, there is a tool for that called dh_make. To correctly prefill the maintainer name and e-mail address we have to set 2 environment variables. Of course, you could change them later…

export DEBFULLNAME="John Doe"
export DEBEMAIL="john.doe@company.net"
cd $project_root
dh_make --native -p $project_name-$version

Choose “indep binary” (“i”) as type of package because Java is architecture indendepent. This will generate the debian directory containing all the files for creating .deb packages. You can safely ignore all of the files ending with .ex as they are examples features like manpage-generation, additional scripts pre- and post-installation and many other aspects.

We will concentrate on only two files that will allow us to build a nice basic package of our software:

  1. control
  2. rules

Adding metadata for our Java project

In the control file fill all the properties if relevant for your project. They will help your users understand what the package contains and whom to contact in case of problems. You should add the JRE to depends, e.g.:

Depends: openjdk-8-jre, ${misc:Depends}

If you have other dependencies that can be resolved by packages of the distribution add them there, too.

Define the rules for building our Java project

The most important file is the rules makefile which defines how our project is built and what the resulting package contents consist of. For this to work with gradle we use the javahelper dh_make extension and override some targets to tune the results. Key in all this is that the directory debian/$project_name/ contains a directory structure with all our files we want to install on the target machine. In our example we will put everything into the directory /opt/my_project.

#!/usr/bin/make -f
# -*- makefile -*-

# Uncomment this to turn on verbose mode.
#export DH_VERBOSE=1

%:
	dh $@ --with javahelper # use the javahelper extension

override_dh_auto_build:
	export GRADLE_USER_HOME="`pwd`/gradle"; \
	export GRADLE_OPTS="-Dorg.gradle.daemon=false -Xmx512m"; \
	./gradlew assemble; \
	./gradlew test

override_dh_auto_install:
	dh_auto_install
# here we can install additional files like an upstart configuration
	export UPSTART_TARGET_DIR=debian/my_project/etc/init/; \
	mkdir -p $${UPSTART_TARGET_DIR}; \
	install -m 644 debian/my_project.conf $${UPSTART_TARGET_DIR};

# additional install target of javahelper
override_jh_installlibs:
	LIB_DIR="debian/my_project/opt/my_project/lib"; \
	mkdir -p $${LIB_DIR}; \
	install lib/*.jar $${LIB_DIR}; \
	install build/libs/*.jar $${LIB_DIR};
	BIN_DIR="debian/my_project/opt/my_project/bin"; \
	mkdir -p $${BIN_DIR}; \
	install build/scripts/my_project_start_script.sh $${BIN_DIR}; \

Most of the above should be self-explanatory. Here some things that cost me some time and I found noteworthy:

  • Newer Gradle version use a lot memory and try to start a daemon which does not help you on your build slaves (if using a continous integration system)
  • The rules file is in GNU make syntax and executes each command separately. So you have to make sure everything is on “one line” if you want to access environment variables for example. This is achieved by \ as continuation character.
  • You have to escape the $ to use shell variables.

Summary

Debian packaging can be daunting at first but using and understanding the tools you can build new packages of your projects in a few minutes. I hope this guide helps you to find a starting point for your gradle-based projects.

Functional tests for Grails with Geb and geckodriver

Previously we had many functional tests using the selenium-rc plugin for Grails. Many were initially recorded using Selenium IDE, then refactored to be more maintainable. These refactorings introduced “driver” objects used to interact with common elements on the pages and runners which improved the API for walking through a multipage process.

Selenium-rc got deprecated quite a while ago and support for Firefox broke every once in a while. Finally we were forced to migrate to the current state-of-the-art in Grails functional testing: Geb.

Generally I can say it is really a major improvement over the old selenium API. The page concept is similar to our own drivers with some nice features:

  • At-Checkers provide a standardized way of checking if we are at the expected page
  • Default and custom per page timeouts using atCheckWaiting
  • Specification of relevant content elements using a JQuery-like syntax and support for CSS-selectors
  • The so-called modules ease the interaction with form elements and the like
  • Much better error messages

While Geb is a real improvement over selenium it comes with some quirks. Here are some advice that may help you in successfully using geb in the context of your (grails) webapplication.

Cross-plattform testing in Grails

Geb (or more specifically the underlying webdriver component) requires a geckodriver-binary to work correctly with Firefox. This binary is naturally platform-dependent. We have a setup with mostly Windows machines for the developers and Linux build slaves and target systems. So we need binaries for all required platforms and have to configure them accordingly. We have simply put them into a folder in our project and added following configuration to the test-environment in Config.groovy:

environments {
  test {
    def basedir = new File(new File('.', 'infrastructure'), 'testing')
    def geckodriver = 'geckodriver'
    if (System.properties['os.name'].toLowerCase().contains('windows')) {
      geckodriver += '.exe'
    }
    System.setProperty('webdriver.gecko.driver', new File(basedir, geckodriver).canonicalPath)
  }
}

Problems with File-Uploads

If you are plagued with file uploads not working it may be a Problem with certain Firefox versions. Even though the fix has landed in Firefox 56 I want to add the workaround if you still experience problems. Add The following to your GebConfig.grooy:

driver = {
  FirefoxProfile profile = new FirefoxProfile()
  // Workaround for issue https://github.com/mozilla/geckodriver/issues/858
  profile.setPreference('dom.file.createInChild', true)
  new FirefoxDriver(profile)
}

Minor drawbacks

While the Geb-DSL is quite readable and allows concise tests the IDE-support is not there. You do not get much code assist when writing the tests and calling functions of the page objects like in our own, code based solution.

Conclusion

After taking the first few hurdles writing new functional tests with Geb really feels good and is a breeze compared to our old selenium tests. Converting them will be a lot work and only happen on a case-by-case basis but our coverage with Geb is ever increasing.

Explicit types – and when to use them

Many modern programming languages offer a way declare variables without an explicit type if the type can be inferred, either dynamically or statically. Many also allow for variables to be explicitly defined with a type. For example, Scala and C# let you omit the explicit variable type via the var keyword, but both also allow defining variables with explicit types. I’m coming from the C++ world, where “auto” is available for this purpose since the relatively recent C++11. However, people are still debating whether you should actually use it.

Pros

Herb Sutter popularised the almost-always-auto style. He advocates that using more type inference is good because it is roughly equivalent to programming against interfaces instead of implementations. He says that “Overcommitting to explicit types makes code less generic and more interdependent, and therefore more brittle and limited.” However, he also mentions that you might sometimes want to use explicit types.

Now what exactly is overcommiting here? When is the right time to use explicit types?

Cons

Opponents to implicit typing, many of them experienced veterans, often state that they want the actual type visible in the source code. They don’t want to rely on type inference being right. They want the code to explicitly state what’s going on.

At first, I figured that was just conservatism in the face of a new “scary” feature that they did not fully understand. After all, IDEs can usually infer the type on-the-fly and you can hover on a variable to let it show you the type.

For C++, the function signature is a natural boundary where you often insert explicit types, unless you want to commit to the compile time and physical dependency cost that comes with templates. Other languages, such as Groovy, do not have this trade-off and let you skip explicit types almost everywhere. After working with Groovy/Grails for a while, where the dominant style seems to be to omit types whereever possible, it dawned on me that the opponents of implicit typing have a point. Not only does the IDE often fail to show me the inferred type (even though it still works way more often than I would have anticipated), but I also found it harder to follow and modify code that did not mention explicit types. Seemingly contrary to Herb Sutter’s argument, that code felt more brittle than I had liked.

Middle-ground

As usual, the truth seems to be somewhere in the middle. I propose the following rule for when to use explicit types:

  • Explicit typing for domain-types
  • Implicit typing everywhere else

Code using types from the problem domain should be as specific as possible. There’s no need for it to be generic – it’s actually counter-productive, as otherwise the code model would be inconsistent with model of the problem domain. This is also the most important aspect to grok when reading code, so it should be explicit. The type is as important as the action on it.

On the other hand, for pure-fabrication types that do not respresent a concept in the domain, the action is important, while the type is merely a means to achieve this action. Typically, most of the elements from a language’s standard library fall into this category. All your containers, iterators, callables. Their types are merely implementation details: an associative container could be an array, or a hash-map or a tree structure. Exchanging it rarely changes the meaning of the code in the problem domain – it just changes its performance characteristics.

Containers will occasionally contain domain-types in their type. What do you do about those? I think they belong in the “everywhere else” catergory, but you should be take extra care to name the contained type when working with it – for example when declaring the variable of the for-each loop on it, or when inserting something into it. This way, the “collection of domain-type” aspect will become clear, but the specific container implementation will stay implicit – like it should.

What do you think? Is this a useful proposition for your code?

How to speed up your ORM queries of n x m associations

What causes a speedup like this? (all numbers are in ms)

Disclaimer: the absolute benchmark numbers are for illustration purposes, the relationship and the speedup between the different approaches are important (just for the curious: I measured 500 entries per table in a PostgreSQL database with both Rails 4.1.0 and Grails 2.3.8 running on Java 7 on a recent MBP running OSX 10.8)

Say you have the model classes Book and (Book)Writer which are connected via a n x m table named Authorship:

A typical query would be to list all books with its authors like:

Fowler, Martin: Refactoring

A straight forward way is to query all authorships:

In Rails:

# 1500 ms
Authorship.all.map {|authorship| "#{authorship.writer.lastname}, #{authorship.writer.firstname}: #{authorship.book.title}"}

In Grails:

// 585 ms
Authorship.list().collect {"${it.writer.lastname}, ${it.writer.firstname}: ${it.book.title}"}

This is unsurprisingly not very fast. The problem with this approach is that it causes the famous n+1 select problem. The first option we have is to use eager fetching. In Rails we can use ‘includes’ or ‘joins’. ‘Includes’ loads the associated objects via additional queries, one for authorship, one for writer and one for book.

# 2300 ms
Authorship.includes(:book, :writer).all

‘Joins’ uses SQL inner joins to load the associated objects.

# 1000 ms
Authorship.joins(:book, :writer).all
# returns only the first element
Authorship.joins(:book, :writer).includes(:book, :writer).all

Additional queries with ‘includes’ in our case slows down the whole request but with joins we can more than halve our time. The combination of both directives causes Rails to return just one record and is therefore ruled out.

In Grails using ‘belongsTo’ on the associations speeds up the request considerably.

class Authorship {
    static belongsTo = [book:Book, writer:BookWriter]

    Book book
    BookWriter writer
}

// 430 ms
Authorship.list()

Also we can implement eager loading with specifying ‘lazy: false’ in our mapping which boosts a mild performance increase.

class Authorship {
    static mapping = {
        book lazy: false
        writer lazy: false
    }
}

// 416 ms
Authorship.list()

Can we do better? The normal approach is to use ‘has_many’ associations and query from one side of the n x m association. Since we use more properties from the writer we start from here.

class Writer < ActiveRecord::Base
  has_many :authors
  has_many :books, through: :authors
end

Testing the different combinations of ‘includes’ and ‘joins’ yields interesting results.

# 1525 ms
Writer.all.joins(:books)

# 2300 ms
Writer.all.includes(:books)

# 196 ms
Writer.all.joins(:books).includes(:books)

With both options our request is now faster than ever (196 ms), a speedup of 7.
What about Grails? Adding ‘hasMany’ and the authorship table as a join table is easy.

class BookWriter {
    static mapping = {
        books joinTable:[name: 'authorships', key: 'writer_id']
    }

    static hasMany = [books:Book]
}
// 313 ms, adding lazy: false results in 295 ms
BookWriter.list().collect {"${it.lastname}, ${it.firstname}: ${it.books*.title}"}

The result is rather disappointing. Only a mild speedup (2x) and even slower than Rails.

Is this the most we can get out of our queries?
Looking at the benchmark results and the detailed numbers Rails shows us hints that the query per se is not the problem anymore but the deserialization. What if we try to limit our created object graph and use a model class backed by a database view? We can create a view containing all the attributes we need even with associations to the books and writers.

create view author_views as (SELECT "authorships"."writer_id" AS writer_id, "authorships"."book_id" AS book_id, "books"."title" AS book_title, "writers"."firstname" AS writer_firstname, "writers"."lastname" AS writer_lastname FROM "authorships" INNER JOIN "books" ON "books"."id" = "authorships"."book_id" INNER JOIN "writers" ON "writers"."id" = "authorships"."writer_id")

Let’s take a look at our request time:

# 15 ms
 AuthorView.select(:writer_lastname, :writer_firstname, :book_title).all.map { |author| "#{author.writer_lastname}, #{author.writer_firstname}: #{author.book_title}" }
// 13 ms
AuthorView.list().collect {"${it.writerLastname}, ${it.writerFirstname}: ${it.bookTitle}"}

13 ms and 15 ms. This surprised me a lot. Seeing this in comparison shows how much this impacts performance of our request.

The lesson here is that sometimes the performance can be improved outside of our code and that mapping database results to objects is a costly operation.

Grails Update from 2.2 to 2.3

An update in the minor version does not seem like a big step but this is one brought a lot of changes, so here a step by step guide which highlights some pitfalls.

First update the version of Grails in your application properties:

app.grails.version=2.3.8

The tomcat and hibernate plugins now have versions of their respective frameworks and not the version number of Grails:

plugins.tomcat=7.0.52.1
plugins.hibernate=3.6.10.13

Grails 2.3 has a new databinding mechanism. To use the old one, especially if you use custom property editors you have to add this option to your Config.groovy:

grails.databinding.useSpringBinder = true

But even with the old databinding something changed. The field id is not bound in command objects you need to bind id explicitly:

def action = { MyCommand command ->
  command.id = params['id']?.toLong()
}

Besides the databinding mechanism also the dependency resolving changed. But you can use the old ivy mechanism by including this in BuildConfig.groovy:

grails.project.dependency.resolver="ivy"

Nonetheless all dependencies must be declared in application.properties or BuildConfig.groovy. If you have a lib directory with local jars in your application you need to add this to your repositories as a local directory:

grails.project.dependency.resolution = {
    repositories {
        flatDir name:'myRepo', dirs:'lib'
    }
}

When you have all dependencies declared your application should start.

Tests

Grails 2.3 features a new test mode: forking. This causes some problems and is better to be deactivated in BuildConfig.groovy:

grails.project.fork = [
        test: false,
]

With the new version only JUnit4 style tests are supported. This means that you don’t extend GroovyTestCase or GrailsUnitTestCase. All rules must be public and non static. All tests methods need to be annotated with @Test. Set up methods are annotated with @Before and must be public. The tearDown methods must also be annotated with @After and be public. A bug in Grails prevents you from naming the set up and tear down methods freely: the names must be setUp and tearDown. All test methods must be public void, the old def declaration is not supported anymore. Now without extending GroovyTestCase you lose the assertion methods and need to add a static import:

import static groovy.util.GroovyTestCase.*

Unit Tests

All tests should be annotated with @TestMixin([GrailsUnitTestMixin]). If you need to mock domain classes you change mockDomain to @Mock:

class MyTest {
	public void testThis() {
      mockDomain(MyDomainClass, [mdc])
  }
}
@Mock([Proposal])
class MyTest {
	public void testThis() {
      mdc.save()
  }
}

Configuration is now already mocked and your properties can be added easily:

config.my.property.value.is='123'

Integration tests

As mentioned before setUp method naming has a bug: you have to name them setUp otherwise the changes to your database aren’t rollbacked.

Acceptance Tests with Selenium

You need to patch the Remote Control Plugin because of a ClassNotFoundException. Add an additional constructor to RemoteControl.groovy to support setting the classloader:

RemoteControl(ClassLoader loader) {
  super(new HttpTransport(getFunctionalTestReceiverAddress(), loader), loader)
} 

In your tests you call this new constructor with the classloader of your class:

new RemoteControl(getClass().classLoader)

Using a Groovy Mixin in your application does not work in your tests and need to be replaced with grails.util.Mixin. But this only works in one way: the target class can access the mixin but the mixin not the target class. For this to work you need to let your mixin implement MixinTargetAware and declare a field named target:

class MyMixin implements MixinTargetAware {
	def target
}

Subtle changes and pitfalls

If you have a classname with a Controller suffix and a corresponding test but which isn’t a Grails controller Grails nevertheless tries to mock the class in your unit tests. If you rename the test to something without controller everything works fine.

In our pre 2.3 project we had some select tags in our views and used fieldValue for the selection:

<g:select value="${fieldValue(bean: object, field: 'value')}">

But now the select tag uses equals which fails if the values aren’t Strings. Just use the unescaped value:

<g:select value="${object?.value}">

I hope this guide and hints help others to avoid the headaches when upgrading your Grails application.

Scaling your web app: Cache me if you can

One of the biggest problems of caches is how and when do I invalidate my cache content? When you read outdated data from the cache you are toast.
For example we have a list of children elements inside a parent. Normally you would cache the children under the parent’s id:

cache[parent.id] = children

But how do you know if your cache content is still valid? When one child or the list of children changes you write the new content into the cache

cache[parent.id] = newChildren

But when do you update the cache? If you place the update code where the list of children is modified the cache is updated before transaction has ended. You break the isolation. Another point would be after the transaction has been committed but then you have to track all changes. There is a better way: use a timestamp from the database which is also visible to other transactions when it is committed. It should also be in the parent object because you need this object for the cache key nonetheless. You could use lastUpdated or another timestamp for this which is updated when the children collection changes. The cache key is now:

cache[parent.id + '_' + parent.lastUpdated]

Now other transactions read the parent object and get the old timestamp and so the old cache content before the transaction is committed. The transaction itself gets the new content. In Grails if you change the collection lastUpdated is automatically updated and in Rails with belongs_to and touch even a change in a child updates the lastUpdate of the parent – no manual invalidation needed.

Excourse: using memcached with Grails

If you want to use memcached from the JVM there is a good library which wraps common calls: spymemcached. If you want to use spymemcached from Grails you drop the jar into your lib folder and wrap it in a Service:

class MemcachedService implements InitializingBean {
  static final Object NULL = "NULL"
  def MemcachedClient memcachedClient

  def void afterPropertiesSet() {
    memcachedClient = new MemcachedClient(
      new ConnectionFactoryBuilder().setTranscoder(new CustomSerializingTranscoder()).build(),
      AddrUtil.getAddresses("localhost:11211")
    )
  }

  def connected() {
    return !memcachedClient.availableServers.isEmpty()
  }

  def get(String key) {
    return memcachedClient.get(key)
  }

  def set(String key, Object value) {
    memcachedClient.set(key, 600, value)
  }

  def clear() {
    memcachedClient.flush()
  }
}

Spymemcached serializes your cache content so you need to make all your cached classes implement Serializable. Since Grails uses its own class loaders we had problems with deserializing and used a custom serializing transcoder to get the right class loader (taken from this issue):

public class CustomSerializingTranscoder extends SerializingTranscoder {

  @Override
  protected Object deserialize(byte[] bytes) {
    final ClassLoader currentClassLoader = Thread.currentThread().getContextClassLoader();
    ObjectInputStream in = null;
    try {
      ByteArrayInputStream bs = new ByteArrayInputStream(bytes);
      in = new ObjectInputStream(bs) {
        @Override
        protected Class<ObjectStreamClass> resolveClass(ObjectStreamClass objectStreamClass) throws IOException, ClassNotFoundException {
          try {
            return (Class<ObjectStreamClass>) currentClassLoader.loadClass(objectStreamClass.getName());
          } catch (Exception e) {
            return (Class<ObjectStreamClass>) super.resolveClass(objectStreamClass);
          }
        }
      };
      return in.readObject();
    } catch (Exception e) {
      e.printStackTrace();
      throw new RuntimeException(e);
    } finally {
      closeStream(in);
    }
  }

  private static void closeStream(Closeable c) {
    if (c != null) {
      try {
        c.close();
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
  }
}

With the connected method you can check if any memcached instances are available. Which is better than calling a method and waiting for the timeout.

def connected() {
  return !memcachedClient.availableServers.isEmpty()
}

Now you can inject your Service where you need to and cache along.

Cache the outermost layer

If you use Hibernate you get database based caching almost for free, so why bother using another cache? In one application we used Hibernate to fetch a large chunk of data from the database and even with caches it took 100 ms. Measuring the code showed that the processing of the data (conversion for the client) took by far the biggest chunk. Caching the processed data lead to 2 ms for the whole request. So one take away is here that caching the result of (user indepedent) calculations and conversions can speed up your request even further. When you got static resources you could also use HTTP directives.