Speeding up your HQL

Using an object-relational-mapper (ORM) to persist your entities, manage their state and query subsets for lists or reports is a wide-spread practice and may speed up your development.

If not used correctly, it may introduce unexpected performance problems because of unefficient default queries and the overhead this mapping introduces as most of the time table rows are converted to domain objects. Often this results in many queries and the n+1 query problem.

Nevertheless, the benefits of using an ORM may outweigh the problems and most problems can be mitigated by features and a correct usage of the tool.

Today I want to present a performance problem we had using GORM/Hibernate and how we easily fixed it without major code restructuring or workarounds.

The Problem

We used a HQL-query to load quite a lot of entities which took about 3 seconds. This was acceptable for our customer. If the user however tried to narrow down the results using a filter loading a smaller amount of the same entities took over 1 minute. Obviously, this was totally unacceptable and counter-intuitive.

The Analysis

Further analysis revealed, that a particular part of the WHERE-clause was responsible for the observed slowdown:

FROM Report r
WHERE r.project.proposal.id = p.id

So we did filter the root entity Report on an entity called Proposal but needed to load an associated Project entity for all reports to consider. So even if we are just using entity-ids to filter the innocently looking path r.project.proposal.id leads to loading and mapping of hundreds of Project entities.

The Solution

In our example we can fortunately do a lot better without big changes to our domain model, the application code or the query.

The relevant part of the schema looks like below:

In the above schema we can see, that both, a Report and a Proposal are associated with a certain project. Remember, that in Hibernate your entities contain only the id of their one-to-one mapped sub-entities by default. This means that if we change the filter clause to

WHERE r.project.id = p.project.id

we skip loading and mapping of all the Project entities and only load the needed reports and proposals. Since they both contain the project id we can use that in our filter. This resulted in more than a 10x speedup with such a simple and non-invasive change.

General Takeaway

ORMs can be a great tool but it is very easy to shoot yourself into the foot. With enough care you can achieve both simple code and good performance but you may run into non-obvious problems every now and then.

Object slicing with Grails and GORM

Some may know the problem called object slicing when passing or assigning polymorphic objects by value in C++. The issue is not limited to C++ as we experienced recently in one of our web application based on Grails. If you are curious just stay awhile and listen…

Our setting

Some of our domain entities use inheritance and their containing entities determine what to do using some properties. You may call that bad design but for now let us take it as it is and show some code to clarify the situation:

@Entity
class Container {
  private A a

  def doSomething() {
    if (hasActuallyB()) {
      return a.bMethod()
    }
    return a.something()
  }
}

@Entity
class A {

  def something() {
    return 'Something A does'
  }
}

@Entity
class B extends A {

  def bMethod() {
    return 'Something only B can do'
  }
}

class ContainerController {

  def save = {
    new Container(b: new B()).save()
  }

  def show = {
    def container = Container.get(params.id)
    [result: container.doSomething()]
  }
}

Such code worked for us without problems in until we upgraded to Grails 3. Suddenly we got exceptions like:

2019-02-18 17:03:43.370 ERROR --- [nio-8080-exec-1] o.g.web.errors.GrailsExceptionResolver   : MissingMethodException occurred when processing request: [GET] /container/show
No signature of method: A.bMethod() is applicable for argument types: () values: []. Stacktrace follows:

Caused by: groovy.lang.MissingMethodException: No signature of method: A.bMethod() is applicable for argument types: () values: []
at Container.doSomething(Container.groovy:123)

Debugging showed our assumptions and checks were still true and the Container member was saved correctly as a B. Still the groovy method call using duck typing did not work…

What is happening here?

Since the domain entities are persistent objects mapped by GORM and (in our case) Hibernate they do not always behave like your average POGO (plain old groovy object). They may in reality be Javassist proxy instances when fetched from the database. These proxies are set up to respond to the declared type and not the actual type of the member! Clearly, an A does not respond to the bMethod().

A workaround

Ok, the class hierarchy is not that great but we cannot rewrite everything. So what now?

Fortunately there is a workaround: You can explicitly unwrap the proxy object using GrailsHibernateUtil.unwrapIfProxy() and you have a real instance of B and your groovy duck typing and polymorphic calls work as expected again.

Grails and the query cache

The principle of least astonishment can be violated in the unusual places like using the query cache on a Grails domain class.

Look at the following code:

class Node {
  Node parent
  String name
  Tree tree
}

Tree tree = new Tree()
Node root = new Node(name: 'Root', tree: tree)
root.save()
new Node(name: 'Child', parent: root, tree: tree).save()

What happens when I query all nodes by tree?

List allNodesOfTree = Node.findAllByTree(tree, [cache: true])

Of course you get 2 nodes, but what is the result of:

allNodesOfTree.contains(Node.get(rootId))

It should be true but it isn’t all the time. If you didn’t implement equals and hashCode you get an instance equals that is the same as ==.
Hibernate guarantees that you get the same instance out of a session for the same domain object. (Node.get(rootId) == Node.get(rootId))

But the query cache plays a crucial role here, it saves the ids of the result and calls Node.load(id). There is an important difference between Node.get and Node.load. Node.get always returns an instance of Node which is a real node not a proxy. For this it queries the session context and hits the database when necessary. Node.load on the other hand never hits the database. It returns a proxy and only when the session contains the domain object it returns a real domain object.

So allNodesOfTree returns

  • two proxies when no element is in the session
  • a proxy and a real object when you call Node.get(childId) beforehand
  • two real objects when you call get on both elements first

Deactivating the query cache globally or for this query only, returns two real objects.

Grails: Beware of the second level cache

Know your caches!

Recently we were hunting a strange bug. Take the following domain model:

class Computer {
  Coder coder
}

class Coder {
  static hasMany = [projects:Project]
}

Querying the computer and iterating over the respective coder and projects sometimes resulted in strange number of projects: 1. Looking into the underlying database we quickly found out that the number of 1 was not correct. It got even more strange: getting the coder in question via Coder.get in the loop yielded the correct results. What was the problem?
After some code reading and debugging another query which was called after the first one but before accessing the coder in the loop gave some insight:

  Coder.withCriteria {
    projects {
      idEq(projectId)
    }
  }

This second query also queried the Coder but constrained the projects to a specific one. These coders were populated into the second level cache and when we called computer.coder the second level cache returned the before queried coder. But this coder had only one project!
Since we only needed the number of coders with this project we changed the second
query to using count, so no instances of Coder are returned and thus saved in the second level cache. Bug fixed.

GORM-Performance with Collections

The other day I was looking to improve the performance of specific parts of our Grails application. I quickly found the typical bottleneck in database centric Grails apps: Too many queries were executed because GORM hides away database queries by its built-in persistence methods for domain objects and the extremely nice dynamic finders. In search for improvements and places to use GORM/Hibernate caching I stumbled upon a very good and helpful presentation on GORM-performance in general and especially collection usage. Burt Beckwith presents some common problems and good patterns to overcome them in his SpringOne 2GX talk. I highly recommend having a thorough look at his presentation.

Nevertheless, I want to summarize his bottom line here: GORM does provide a nice abstraction from relational databases but this abstraction is leaky at times. So you have to know exactly how the stuff in your domain classes is mapped. Be especially careful it collections tend to become “large” because performance will suffer extremely. We already observed a significant performance degradation for some dozen elements; your mileage may vary. For many simple modifications on a collection all its elements have to be loaded from the database!
Instead of using hasMany/belongsTo just add a back reference to the domain object your object belongs to. With the collection you lose cascading delete and some GORM functionality but you can still use dynamic finders and put the functionality to manage associations yourself into respective classes. This may be a large gain in specific cases!

The Grails performance switch: flush.mode=commit

Some default configuration options of Grails are not optimal for all projects.

— Disclaimer —
This optimization requires more manual work and is error prone but isn’t this with most (big) performance improvements?
For it to really work you have to structure your code accordingly and flush explicitly.

Recently in our performance measurements of a medium sized Grails project we noticed a strange behavior: every time we executed the same query the time it took increased. It started with 40ms and every time it took 1 ms more. The query was simple like Child.findAllByParent(parent)
The first thought: indexes! We looked at the database (a postgresql db) and we had indexes on the parent column.
Next: maybe the session cache got too large. But session.flush() and session.clear() did not solve that problem.
Another post suggested using a HQL query. Changing to

Child.executeQuery("select new Child(c.name, c.parent) from Child c where parent=:parent", [parent: parent])

had no effect.
Finally after countless more attempts we tried:

session.setFlushMode(FlushMode.COMMIT)

And not even the query executed in constant time it was also 10x faster?!
Hmmm…why?
The default flush mode in Grails is set to AUTO
Which means that before every query made the session is flushed. Every query regardless of the classes effected. The problem is known for hibernate but after 4! years it is still unresolved.
So my question here is: why did Grails chose AUTO as default?

Diving into Hibernate’s Query Cache behaviour

Hibernate is a very sophisticated OR-Mapper and as such has some overhead for certain usage patterns or raw queries. Through proper usage of caches (hibernates featured a L1, L2 cache and a query cache) you can get both performance and convenience if everything fits together. When trying to get more of our persistence layer we performed some tests with the query cache to be able to decide if it is worth using for us. We were puzzled by the behaviour in our test case: Despite everything configured properly we never had any cache hits into our query cache using the following query-sequence:

  1. Transaction start
  2. Execute query
  3. Update a table touched by query
  4. Execute query
  5. Execute query
  6. Transaction end

We would expect that step 5 would be a cache hit but in our case it was not. So we dived into the source of the used hibernate release (the 3.3.1 bundled with grails 1.3.5) and browsed the hibernate issue tracker. We found the issue and correlated it to the issues HHH-3339 and HHH-5210. Since the fix was simpler than upgrading grails to a new hibernate release we fixed the issue and replaced the jar in our environment. So far, so good, but in our test step 5 still refused to produce a cache hit. Using the debugger strangely enough provided us a cache hit when analyzing the state of the cache and everything. After some more brooding and some println()'s and sleep()‘s we found the reason for the observed behaviour in the UpdateTimestampsCache (yes, yet another cache!):

	public synchronized void preinvalidate(Serializable[] spaces) throws CacheException {
		//TODO: to handle concurrent writes correctly, this should return a Lock to the client
		Long ts = new Long( region.nextTimestamp() + region.getTimeout() );
		for ( int i=0; i
			if ( log.isDebugEnabled() ) {
				log.debug( "Pre-invalidating space [" + spaces[i] + "]" );
			}
			//put() has nowait semantics, is this really appropriate?
			//note that it needs to be async replication, never local or sync
			region.put( spaces[i], ts );
		}
		//TODO: return new Lock(ts);
	}

The innocently looking statement region.nextTimestamp() + region.getTimeout() essentially means that the query cache for a certain “region” (e.g. a table in simple cases) is “invalid” (read: disabled) for some “timeout” period or until the end of the transaction. This period defaults to 60 seconds (yes, one minute!) and renders the query cache useless within a transaction. For many use cases this may not be a problem but our write heavy application really suffers because it works on very few different tables and thus query caching has no effect. We are still discussing ways to leverage hibernates caches to improve the performance of our app.

Small gaps in the grails docs

Just for reference, if you come across one of the following problems:

Validation only datasource

Looking at the options of dbCreate in Datasource.groovy I only found 3 values: create-drop, create or update. But there is a fourth one: validate!
This one helps a lot when you use schema generation with Autobase or doing your schema updates external.

Redirect

Controller.redirect has two options for passing an id to the action id and params, but if you specify both which one will be used?

controller.redirect(id:1, params:[id:2])

Trying this out I found the id supersedes the params.id.

Update:
Thanks to Burt and Alvaro for their hints. I submitted a JIRA issue