One of the biggest problems with caches is deciding how and when to invalidate their content. If you read outdated data from the cache, you are toast.
For example, suppose we have a list of child elements inside a parent. Normally you would cache the children under the parent's id:
cache[parent.id] = children
But how do you know whether your cache content is still valid? When one child or the list of children changes, you write the new content into the cache:
cache[parent.id] = newChildren
But when do you update the cache? If you place the update code where the list of children is modified, the cache is updated before the transaction has ended, and you break isolation: other transactions see the new children although the change is not yet committed. Updating after the transaction has been committed would work, but then you have to track all changes yourself. There is a better way: use a timestamp from the database that becomes visible to other transactions only when the transaction commits. It should live in the parent object, since you need that object for the cache key anyway. A lastUpdated timestamp (or a similar one) that is updated whenever the children collection changes works well. The cache key is now:
cache[parent.id + '_' + parent.lastUpdated]
Other transactions still read the parent object with the old timestamp, so they get the old cache content until the transaction is committed; the transaction itself gets the new content. In Grails, lastUpdated is automatically updated when you change the collection, and in Rails, belongs_to with touch: true even propagates a change in a child to the parent's timestamp. No manual invalidation is needed.
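For illustration, here is roughly what the domain classes look like in Grails; the names are of course only examples:

// Illustrative Grails domain classes: GORM maintains the lastUpdated
// property automatically whenever the parent (including its collection) changes.
class Parent {
    Date lastUpdated                     // auto-timestamped by GORM
    static hasMany = [children: Child]
}

class Child {
    String name
    static belongsTo = [parent: Parent]
}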
Excursus: using memcached with Grails
If you want to use memcached from the JVM, there is a good library that wraps the common calls: spymemcached. To use spymemcached from Grails, you drop the jar into your lib folder and wrap it in a service:
import net.spy.memcached.AddrUtil
import net.spy.memcached.ConnectionFactoryBuilder
import net.spy.memcached.MemcachedClient
import org.springframework.beans.factory.InitializingBean

class MemcachedService implements InitializingBean {

    static final Object NULL = "NULL"

    MemcachedClient memcachedClient

    void afterPropertiesSet() {
        memcachedClient = new MemcachedClient(
            new ConnectionFactoryBuilder().setTranscoder(new CustomSerializingTranscoder()).build(),
            AddrUtil.getAddresses("localhost:11211"))
    }

    def connected() {
        return !memcachedClient.availableServers.isEmpty()
    }

    def get(String key) {
        return memcachedClient.get(key)
    }

    def set(String key, Object value) {
        memcachedClient.set(key, 600, value)   // expire after 600 seconds
    }

    def clear() {
        memcachedClient.flush()
    }
}
Spymemcached serializes your cache content, so all your cached classes need to implement Serializable. Since Grails uses its own class loaders, we had problems with deserialization and used a custom serializing transcoder to get hold of the right class loader (taken from this issue):
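For example, a DTO you want to cache only needs the marker interface (ChildDto is a made-up name):

// Anything that goes into memcached must be serializable.
class ChildDto implements Serializable {
    Long id
    String name
}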
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectStreamClass;

import net.spy.memcached.transcoders.SerializingTranscoder;

public class CustomSerializingTranscoder extends SerializingTranscoder {

    @Override
    protected Object deserialize(byte[] bytes) {
        final ClassLoader currentClassLoader = Thread.currentThread().getContextClassLoader();
        ObjectInputStream in = null;
        try {
            ByteArrayInputStream bs = new ByteArrayInputStream(bytes);
            in = new ObjectInputStream(bs) {
                @Override
                protected Class<?> resolveClass(ObjectStreamClass objectStreamClass)
                        throws IOException, ClassNotFoundException {
                    try {
                        // resolve classes with the thread context class loader,
                        // i.e. the Grails class loader that knows the application classes
                        return currentClassLoader.loadClass(objectStreamClass.getName());
                    } catch (Exception e) {
                        return super.resolveClass(objectStreamClass);
                    }
                }
            };
            return in.readObject();
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        } finally {
            closeStream(in);
        }
    }

    private static void closeStream(Closeable c) {
        if (c != null) {
            try {
                c.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
The connected method lets you check whether any memcached instances are available, which is better than calling a method and waiting for the timeout:
def connected() { return !memcachedClient.availableServers.isEmpty() }
Now you can inject the service wherever you need it and cache away.
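To give an idea of how the pieces fit together, here is a sketch of such a service combining the timestamped cache key with the memcached wrapper; ChildService and the Child domain class are made-up names:

// Sketch only: combines the timestamped key with MemcachedService.
class ChildService {

    def memcachedService   // injected by Grails

    def childrenFor(parent) {
        String key = "${parent.id}_${parent.lastUpdated}"
        def children = memcachedService.connected() ? memcachedService.get(key) : null
        if (children == null) {
            children = Child.findAllByParent(parent)   // hypothetical GORM finder
            if (memcachedService.connected()) {
                memcachedService.set(key, children)
            }
        }
        return children
    }
}

Note that nothing ever deletes the stale entries: they simply stop being read once the timestamp moves on, and memcached's expiry (600 seconds in the service above) reclaims the memory.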
Cache the outermost layer
If you use Hibernate you get database caching almost for free, so why bother with another cache? In one application we used Hibernate to fetch a large chunk of data from the database, and even with caches the request took 100 ms. Measuring the code showed that processing the data (converting it for the client) took by far the biggest share of that time. Caching the processed data brought the whole request down to 2 ms. The takeaway: caching the results of (user-independent) calculations and conversions can speed up your requests even further. For static resources you can also use HTTP caching directives.
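As a sketch, assuming a hypothetical convertForClient() that does the expensive transformation, the caching moves from the entities to the finished result:

// Sketch: cache the converted, client-ready result instead of the raw entities.
def renderChildren(parent) {
    String key = "converted_${parent.id}_${parent.lastUpdated}"
    def json = memcachedService.get(key)
    if (json == null) {
        def children = Child.findAllByParent(parent)   // fast, thanks to Hibernate's caches
        json = convertForClient(children)              // hypothetical, expensive conversion
        memcachedService.set(key, json)
    }
    return json
}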