Simple code, subtle bugs: write unit tests!

We developed an applet some years ago that was supposed to run 24/7. Basically, it fetches some values periodically and plots them in a chart. It always shows the last 24 hours. Everything seemed to work fine but every few weeks it crashed. The problems were clouded by hardware instabilities and the use of some obscure JVM. After quite a bit of analysis it became clear that it was a bug in our applet: a memory leak. Here is the troublemaker code:

for (int i = 0; i < dataSeries.getItemCount(); i++) {
    XYDataItem dataItem = dataSeries.getDataItem(i);
    if (dataItem.getX().longValue() < startDate.getTime().getTime()) {
        dataSeries.remove(i);
    } else {
        break;
    }
}
dataSeries.add(newDataPoint);

This simple and innocently looking piece of code essentially removes old items from a list by index. Many of you may already have spotted the main problem leading to the leak. Removing items by index means that following items shift one place towards the head of the list. As the index is incremented one element is skipped by the next iteration. This added up over time and lead to an OutOfMemoryError after some weeks.

Now, even if the code is not great it does relatively look straightforward on first sight, yet it is not and is contains errors. Making things worse, the code was buried somewhere in some tangled logic. This leads me to the point of my post:

Write unit tests even for relatively simple code.

Most likely writing a unit test for this cleanup work would have led to a class that manages the dataSeries and nothing more. Quite easy to understand and test in isolation. The problem would never have slipped into production and caused months of  investigating different stability problems.

The programmer may not have been aware of iterators but he should have made sure that this block works as intended. The best way to do this is automated unit tests. They make sure that your building blocks are as solid as you expect them to be. Use acceptance testing to make sure you have put your building blocks together the right way. Together unit and acceptance tests will save countless hours hunting regressions.

Reversing an array in Java

Reversing an array is a popular interview question especially in languages like C. Some days ago I faced the problem in a Java legacy project I was maintaining. As a Java guy I did not want to fiddle with a for-loop and indexes. So I looked for another solution. Besides showing the one-liner I want to give some insights which some might not be aware of. Here is my solution:

public static <T> T[] reverse(T[] array) {
    Collections.reverse(Arrays.asList(array));
    return array;
}
  1. This approach works, because the ArrayList implementation Arrays.asList() returns a specially tailored ArrayList which writes the changes right through to the backing array. It is not a java.util.ArrayList!
  2. The above means that the array is changed in-place and no array copying is involved. The approach thus has good performance it that matters.
  3. My solution directly changes the given array. To make it free of side-effects you could create a new array and copy the original one into it before converting to a list and reversing.

Side note regarding Arrays.asList()

Arrays.asList() is nice for bridging to old code and APIs. It has to be used with caution though: If you read the API documentation and keep 1.) in mind you will notice, that the returned list is fixed size. So calls to add() and remove() will fail with an UnsupportedOperationException! Passing such lists around in your system using the java.util.List interface may lead to unexpected behaviour since most people expect lists to be of variable sized in Java. In our use case we do not return the resulting list to the caller but only use it temporarily and locally. So this is no problem here.

Conclusion

Reversing an array can be a nice interview question for Java candidates too as you can discuss about in-place reversing, memory and time complexity and different approaches. It may show their knowledge of the collection API and the utility classes. If the above approach is discovered there is also something to discuss as depicted in this article.

Grails upgrade – lessons learned

Grails is a great framework for rapidly developing web applications on top the proven java/servlet platform. Especially smaller, short-lived projects can be a real breeze with all the scaffolding, GORM and convention-over-configuration built into grails.

We happen to use it for a quite complex web application project for almost 3 years now. Half a year ago we upgrade from grails 1.0.x to 1.3.4. That makes 3 major versions in one upgrade step and produced obviously a lot of work and many small bugs. I do not want to put the blame on the grails guys here, because most of the stuff was mentioned in the release notes and it was a big step we decided to take when the decision came to continue the project for several years to come.

Our upgrade policy changed due to that experience and we try to stay a lot more current to be able to adapt our software to framework changes more incrementally. Some weeks ago we upgraded from 1.3.4 to 1.3.7 and this experience was not pleasant. Even though we skimmed through the changelogs and release notes and thought the update should be uncritical for us grails behaviour changed in two aspects which broke things for us:

  1. An API-change where GroovyPagesException was changed to GrailsTagException
  2. Behavioural change where no application context and injections are available in functional tests anymore

Item 1 was easy to fix but you need really good testing to spot it before it slips into production. Such subtle API changes should not happen in micro-version updates as that can easily break parts of the system whithout you knowing because of groovy/grails’ dynamic nature. No compiler saves you here.

Item 2 produced some amount of work for us because we build a quite extensive acceptance test suite using services and domain objects to setup the initial environment for each test. Luckily, there is the grails remote control plugin which you can use for things like that.

Lessons learned

  • You should have extensive automated test suites when developing a grails application over a longer period because things can break in unexpected ways without code changes on your side.
  • Try to plan upgrades some time ahead of releases and dedicate time to scanning the release notes and actually performing the upgrade. It may take you significantly longer than the smooth upgrade procedure itself suggests.

The grails team seems to be increasingly aware of backwards compatibility but they still have some way to go. We hope and expect to see fewer unexpected breakages to occur in the future.

Acceptance testing a grails app with selenium-rc

Having extensive acceptance tests is the basis of delivering high quality releases with very few regressions for long time projects. This is even more true when your environment uses dynamically typed languages and changing requirements. One of our Grails projects is running for several years now and continues to evolve and grow.  We are in dire need of more acceptance tests and especially their automated execution. Manual testing is not feasible and our coverage through unit and integration tests is not enough. We have a nice set of Selenium IDE acceptance tests already though. They were executed very infrequently which let some bugs slip through into production.

I want to describe our approach to automated and extensive acceptance testing below:

  1. We create the acceptance tests using capture & replay with selenium IDE. This is a fast way to exercise a new feature through a repeatable test.
  2. We think that maintaining the tests in code offers much more flexiblity and is easier to run in continuous integration (CI) than maintaining the selenium IDE html code. So we export the captured test to play nicely with the grails selenium-rc plugin. Kurt Harriger explains the setup and usage of the selenium-rc grails plugin. You need to make some changes to the exported code for everything to work nicely:
    1. Delete or change package declaration.
    2. Choose a grails functional test compatible file name like MyAcceptanceTests.groovy. We use the Junit4 export but Groovy export works also because the difference is only marginal and both must be adjusted in some places.
    3. Change the class name to match the file name without extension if they are not equal already.
    4. Change the exported test to extends from GroovyTestCase instead of *SeleneseTestCase.
    5. Add the @Mixin(SeleniumAware) annotation to the test class.
    6. Remove the setup and teardown methods.
    7. Replace verifyTrue() and friends with junit assertions.
  3. Each test has to setup it’s initial state. This leads to independent acceptance tests at the expense of some longer running time but is well worth the cost imho.
  4. The resulting selenium-rc tests can be run easily using grails test-app -functional and thus integration in the build process is pretty straighforward. We currently use ant to wrap the grails calls, but other ways may be more feasible depending on your infrastructure.

The end result is fast creation of acceptance tests and much flexibility setting up the test fixture and maintaining the tests. Using the grails plugin you gain easy execution of the tests on the developer machines and CI servers as well. With extensive automated acceptance tests the danger of regressions is greatly reduced. But be sure to not neglect unit and integration tests!

Is Groovy++ already doomed?

<disclaimer>I really like Groovy and other cool languages like Scala, Fantom, Gosu or Clojure targetting the JVM.</disclaimer>

I know the title is a bit provocative but I want to express a concern regarding Groovy++. In my perception most people think of Groovy++ as an extension for Groovy which trades dynamic dispatching for static typing and dispatching yielding performance. So you can just look for hot spots in your code and resolve them with some annotations. Sounds nice, doesn’t it?.

That seems to be the promise of Groovy++ but it isn’t. Alex Tkachman, the founder of the Groovy++ project states this clearly in this comment to an issue with Groovy++: “100% compatibility with regular Groovy is nice when possible and we do our best to keep it but it is not a must.”.

Imho the mentioned issue together with this statement reduces the target audience to a few people who think of Groovy++ as a better Java, not a faster and type-safe Groovy where needed. I do not think there are too many people thinking that way. I think wide adoption of such a Groovy++ will not happen given the alternatives mentioned in the disclaimer above and Groovy itself. I hope they will strive for 100% compatibility with Groovy…

Diving into Hibernate’s Query Cache behaviour

Hibernate is a very sophisticated OR-Mapper and as such has some overhead for certain usage patterns or raw queries. Through proper usage of caches (hibernates featured a L1, L2 cache and a query cache) you can get both performance and convenience if everything fits together. When trying to get more of our persistence layer we performed some tests with the query cache to be able to decide if it is worth using for us. We were puzzled by the behaviour in our test case: Despite everything configured properly we never had any cache hits into our query cache using the following query-sequence:

  1. Transaction start
  2. Execute query
  3. Update a table touched by query
  4. Execute query
  5. Execute query
  6. Transaction end

We would expect that step 5 would be a cache hit but in our case it was not. So we dived into the source of the used hibernate release (the 3.3.1 bundled with grails 1.3.5) and browsed the hibernate issue tracker. We found the issue and correlated it to the issues HHH-3339 and HHH-5210. Since the fix was simpler than upgrading grails to a new hibernate release we fixed the issue and replaced the jar in our environment. So far, so good, but in our test step 5 still refused to produce a cache hit. Using the debugger strangely enough provided us a cache hit when analyzing the state of the cache and everything. After some more brooding and some println()'s and sleep()‘s we found the reason for the observed behaviour in the UpdateTimestampsCache (yes, yet another cache!):

	public synchronized void preinvalidate(Serializable[] spaces) throws CacheException {
		//TODO: to handle concurrent writes correctly, this should return a Lock to the client
		Long ts = new Long( region.nextTimestamp() + region.getTimeout() );
		for ( int i=0; i
			if ( log.isDebugEnabled() ) {
				log.debug( "Pre-invalidating space [" + spaces[i] + "]" );
			}
			//put() has nowait semantics, is this really appropriate?
			//note that it needs to be async replication, never local or sync
			region.put( spaces[i], ts );
		}
		//TODO: return new Lock(ts);
	}

The innocently looking statement region.nextTimestamp() + region.getTimeout() essentially means that the query cache for a certain “region” (e.g. a table in simple cases) is “invalid” (read: disabled) for some “timeout” period or until the end of the transaction. This period defaults to 60 seconds (yes, one minute!) and renders the query cache useless within a transaction. For many use cases this may not be a problem but our write heavy application really suffers because it works on very few different tables and thus query caching has no effect. We are still discussing ways to leverage hibernates caches to improve the performance of our app.

Statement against public fields in Java

Every once in a while I talk to people about coding style and sooner or later there is discussion about public fields and getters/setters in Java. I would like to elaborate my opinion on this issue in addition to other quite well balanced articles to a broader audience.

First I want to differentiate properties of a class from other fields/member variables. Properties are fields, whose values are useful and important to clients of the class. We consciously decide to break encapsulation here and provide this data to our clients. The size of a collection may serve as a nice example. Fields on the other hand store state or dependencies our class needs to be fully operational. Datastructures like arrays, data access objects (DAOs) or some kind of notification service may serve as examples here.

The internal implementation of both, properties and fields, should never be exposed because this truely breaks encapsulation and takes away the freedom of the class implementor to change their implementation. At a later time you may decide to compute a value or read it from a database instead of storing it directly . On the other hand properties themselves may well be public and belong to the API of our class.

Now on to Java. There is no native property support in the Java language as it does not support the uniform access principle using language constructs. In other languages like Python, Ruby, Groovy or Scala you can change from direct field access to accessor methods without changing the clients, so it is no problem to expose fields (or more precisely properties) and thus make them public or protected. To gain the same degree of freedom in Java you have to emulate properties by using the getter/setter convention of Java Beans. You have to trade conciseness of public fields against this freedom and you really should do it. An IDE can generate the accessors and fold the methods away from your sight. The cost of getters/setters is really negligible.

Now we can derive the conclusions for Java programers. With each member variable you introduce you have to decide if it is a property or just some internal field. For properties you may provide getters and/or setters with appropriate visibility when needed. That means you should not provide accessor methods for all of your fields. In general you should never expose fields directly and all instance variables should be private. Not doing so will remove the freedom to change class internals without affecting the clients. Once a class with exposed internals is published as part of an API it is almost impossible to change internal design decisions.

Scala: Easier to read (and write), harder to understand?

There is a vivid discussion about Scala’s complexity going on for some weeks now on the web even with a response from Martin Odersky. I want to throw my 2¢ together with some hopefully new aspects into the discussion.
Scala’s liberal syntax rules and compiler magic like type inference and implicit conversions allow nicely written APIs and DSLs almost looking like prose texts. Take a look at APIs like scalatest and imagine the Java/Junit equivalent:

@Test def demonstrateScalaTest() {
  val sb = new StringBuilder("Scala")
  sb.append(" is cool!")
  sb.toString should be ("Scala is cool!")
  evaluating { "concise".charAt(-1) } should produce [StringIndexOutOfBoundsException]
}

There are really nice features that reduce day-to-day programming tasks to keywords or one-liners. Here are some examples:

// singletons have their own keyword (object), static does not exist!
object MySingleton {
  def printMessage {
    println("I am the only one")
 }
}

// lazy initialization/evaluation
lazy val complexResult = computeForHours()

// bean-style data container with two scala properties and one java-bean property with getter+setter
class Data(val readOnly: String, var readWrite: Int, @BeanProperty var javaProperty: String)

// tuples as return values or quick data transfer objects (DTO) for methods yielding multiple data objects
def correctCoords(x: Double, y: Double) = (x + 12, y * 0.5)
val (correctedX, correctedY) = correctCoords(0.37, 34.2)
println("corrected: " + correctedX + ", " + correctedY)

On the other hand there are so many features built-in that really make it hard to understand the code if you are not scala programmer with some experience. I like the differentiation between application and library code Martin Odersky himself makes in Programming Scala. The frameworks I have tried so far (Lift, scalatest and scala-swing) in Scala make your life very easy as long as you just use them. It is really a breeze and much more fun than using most APIs in Java for example. But when something goes wrong or you really want/have to understand what is going on you can have a hard time. This is true at least for a Scala beginner, sometimes perhaps for an pro, too.

Final Thoughts
In my opinion Scala is a very nice language that successfully combines clean object oriented programming with functional features. You can migrate from a pure OO-style to a nice hybrid “Scala-style” like many programmers did when they first used Java mostly with procedural style using classes only as namespaces for their static methods. I am quite sure that a Scala code style and best practices still have to develop. Programmers will need their time diving into the language and using it for their benefit. I hope Scala prospers and gains attention in the industry because I personally think it is a nice step forward compared to Java (which turns more and more into a mess where you need profound knowledge to fight your problems).

Regarding the complexity, which certainly exists in Scala, I only want to raise some questions which may be answered sometime in the future:

  • Maybe the tooling is just not there (yet)?
  • Maybe you sometimes just don’t have to understand everything what’s happening underneath?
  • Maybe Scala makes debugging much more seldom but harder, when something does not work out?
  • Maybe the features and power of Scala are worth learning?
  • Maybe certain features will just be banned by the teams like sometimes in Java teams (think of switch-case, the ?-operator, Autoboxing e.g.)?

Grails: Migrating enum mapping from 1.0 to 1.2 or newer

We have an ongoing long-term web application running on Grails. It started with Grails 1.0.3 and Grails moved on to 1.3.3 in the meantime. Due to time constraints and lack of resources we were not able to update to each new major version. Now, some years later, the time has finally come for us to benefit from all the new features, bugfixes and improvements of the platform. There were quite some changes in behaviour and one of the biggest is the change of how enums are mapped to the database.

In Grails 1.0.x and 1.1.x Java enums were mapped as int values in the database. Starting with Grails 1.2 they are mapped as varchars containing the enum name. Now you have the problem to migrate your existing data over to the new mapping style of the framework. One solution is to use autobase or liquibase migrations to port the enum values for the new mapping style.
Suppose we have the enum Coolness:

pulic enum Coolness {
    COOL, UNCOOL;
}

The SQL for migrating it on the PostgreSQL DBMS looks as simple like this:

alter table foo alter column bar type varchar
    using case when bar = 0 then 'COOL'
            when bar = 1 then 'UNCOOL'
            end

and as an autobase migration it becomes something like this:

changeSet(id: 'change foo bar from int to varchar', author: 'me') {
    preConditions(onFail: "CONTINUE") {
        dbms(type: "postgresql")
    }
    sql("""alter table foo alter column bar type varchar
            using case when bar = 0 then 'COOL'
                    when bar = 1 then 'UNCOOL'
                    end""")
}

We use the precondition to skip our non-persistent in-memory databases we use in development and only apply the change set with persistent test or production databases.

For Oracle and maybe other database systems which do not support altering the column type with the using-clause it may look like

alter table foo add bar_new varchar(255);
update foo set bar_new =
    (case when type = 0 then 'COOL'
            when type = 1 then 'UNCOOL'
      end);
alter table foo drop column bar;
alter table foo rename column bar_new to bar;

I hope this helps when you have to perfom similar migrations some time.

Stay tuned for other changes in API and behaviour between the different Grails framework versions.

An Oracle story: Null, empty or what?

One big argument for relational databases is SQL which as a standard minimizes the effort needed to switch your app between different DBMSes. This comes particularily handy when using in-memory databases (like HSQL or H2) for development and a “big” one (like PostgreSQL, MySQL, DB2, MS SqlServer or Oracle) in production. The pity is that there are subtle differences with regard to the interpretation of the SQL-standard when it comes to databases from different vendors.

Oracle is particularily picky and offers quite some interesting behaviours: Most databases (all that I know well) treat null and empty as different values when it comes to strings. So it is perfectly valid to store an empty string in a not-null column and retrieving the string from the column yields an empty string. Not so with Oracle 10g! Inserting null and retrieving the value yields unsurprisingly null, even using Oracle. Inserting an empty string and retrieving the value leaves you with null, too! Oracle does not differentiate between empty strings and null values like a Java developer would expect. In our environment this has led to surprised developers and locally unreproducible bug which clearly exist in production a couple of times.

[rant]Oracle has great features for big installations and enterprises that can afford the support, maintenance and hardware of a serious Oracle DBMS installation. But IMHO it is a shame that such a big player in the market does not really care about the shortcomings of their flagship product and standards in general (Oracle 10g only supports SQL92 entry level!). Oracle, please fix such issues and help us developers to get rid of special casing for your database product![/rant]

The lesson to be learnt here is that you need a clone of the production database for your integration tests or acceptance tests to be really effective. Quite some bugs have slipped into production because of subtle differences in behaviour of our inhouse databases and the ones in production at the customer site.