Scala: Easier to read (and write), harder to understand?

There has been a lively discussion about Scala’s complexity on the web for some weeks now, even with a response from Martin Odersky. I want to throw in my 2¢, together with some hopefully new aspects.
Scala’s liberal syntax rules and compiler magic like type inference and implicit conversions allow nicely written APIs and DSLs that read almost like prose. Take a look at APIs like ScalaTest and imagine the Java/JUnit equivalent:

@Test def demonstrateScalaTest() {
  val sb = new StringBuilder("Scala")
  sb.append(" is cool!")
  sb.toString should be ("Scala is cool!")
  evaluating { "concise".charAt(-1) } should produce [StringIndexOutOfBoundsException]
}
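For comparison, here is a rough sketch of the same two checks in plain Java. To stay self-contained it uses no test framework. With JUnit 4 you would write assertEquals(...) and @Test(expected = StringIndexOutOfBoundsException.class) instead. The class name PlainJavaEquivalent is made up for this illustration.

```java
final class PlainJavaEquivalent {
    private PlainJavaEquivalent() {}

    static void demonstrate() {
        StringBuilder sb = new StringBuilder("Scala");
        sb.append(" is cool!");
        // Equality check; JUnit would offer assertEquals for this.
        if (!"Scala is cool!".equals(sb.toString())) {
            throw new AssertionError("expected <Scala is cool!> but was <" + sb + ">");
        }
        // Expected exception: without framework support this needs an explicit try/catch.
        try {
            "concise".charAt(-1);
            throw new AssertionError("expected StringIndexOutOfBoundsException");
        } catch (StringIndexOutOfBoundsException expected) {
            // This is the outcome the check is looking for.
        }
    }
}
```

Even this tiny case needs hand-written failure messages and a try/catch dance. The ScalaTest DSL hides exactly that kind of boilerplate.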

There are really nice features that reduce day-to-day programming tasks to keywords or one-liners. Here are some examples:

// singletons have their own keyword (object), static does not exist!
object MySingleton {
  def printMessage(): Unit = {
    println("I am the only one")
  }
}

// lazy initialization/evaluation
lazy val complexResult = computeForHours()

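// bean-style data container with two Scala properties and one JavaBean property with getter+setter
// (BeanProperty lives in scala.beans since Scala 2.10; older versions had it in scala.reflect)
import scala.beans.BeanProperty
class Data(val readOnly: String, var readWrite: Int, @BeanProperty var javaProperty: String)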

// tuples as return values or quick data transfer objects (DTO) for methods yielding multiple data objects
def correctCoords(x: Double, y: Double) = (x + 12, y * 0.5)
val (correctedX, correctedY) = correctCoords(0.37, 34.2)
println("corrected: " + correctedX + ", " + correctedY)
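For contrast, here is a rough Java sketch of three of these one-liners. The JavaBean class is left out because its getters and setters alone would fill the page. The sketch uses common Java idioms and is not what the Scala compiler generates:

- an enum for the singleton;
- the initialization-on-demand holder idiom for the lazy value, which only covers static values (a lazy instance field would need explicit synchronization);
- a tiny immutable class instead of a tuple.

computeForHours() is a hypothetical stand-in that returns a dummy value, and LazyResult and Coords are made-up names.

```java
// Singleton: an enum with a single constant is the usual Java idiom.
enum MySingleton {
    INSTANCE;

    void printMessage() {
        System.out.println("I am the only one");
    }
}

// Lazy initialization: Holder is only initialized on the first call to get().
final class LazyResult {
    static int computations = 0; // counts how often the expensive work ran

    private LazyResult() {}

    private static final class Holder {
        static final long VALUE = computeForHours();
    }

    static long get() {
        return Holder.VALUE;
    }

    // Hypothetical stand-in for an expensive computation.
    private static long computeForHours() {
        computations++;
        return 42L;
    }
}

// Tuple replacement: a small immutable class for the two return values.
final class Coords {
    final double x;
    final double y;

    Coords(double x, double y) {
        this.x = x;
        this.y = y;
    }

    // Counterpart of the Scala correctCoords method that returns a tuple.
    static Coords correctCoords(double x, double y) {
        return new Coords(x + 12, y * 0.5);
    }
}
```

MySingleton.INSTANCE.printMessage() prints the message. The first LazyResult.get() triggers the computation, and later calls reuse the stored value.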

On the other hand, there are so many built-in features that code can be really hard to understand if you are not a Scala programmer with some experience. I like the distinction between application and library code that Martin Odersky himself makes in Programming in Scala. The Scala frameworks I have tried so far (Lift, ScalaTest and scala-swing) make your life very easy as long as you just use them. It is really a breeze and much more fun than using most Java APIs, for example. But when something goes wrong, or you really want or have to understand what is going on, you can have a hard time. This is true at least for a Scala beginner, and sometimes perhaps for a pro, too.

Final Thoughts
In my opinion, Scala is a very nice language that successfully combines clean object-oriented programming with functional features. You can migrate gradually from a pure OO style to a nice hybrid “Scala style”. Many programmers made a similar transition when they first used Java in a mostly procedural style, with classes serving only as namespaces for their static methods. I am quite sure that a Scala code style and best practices still have to develop. Programmers will need time to dive into the language and use it to their benefit. I hope Scala prospers and gains attention in the industry, because I personally think it is a nice step forward compared to Java (which is turning more and more into a mess where you need profound knowledge to fight your problems).

Regarding the complexity, which certainly exists in Scala, I only want to raise some questions which may be answered sometime in the future:

  • Maybe the tooling is just not there (yet)?
  • Maybe you sometimes just don’t have to understand everything that’s happening underneath?
  • Maybe Scala makes debugging much rarer but harder when something does not work out?
  • Maybe the features and power of Scala are worth learning?
  • Maybe certain features will just be banned by teams, as sometimes happens in Java teams (think of switch-case, the ?-operator or autoboxing)?

Grails: Migrating enum mapping from 1.0 to 1.2 or newer

We have a long-running web application built on Grails. It started with Grails 1.0.3, and Grails has moved on to 1.3.3 in the meantime. Due to time constraints and a lack of resources, we were not able to update to each new major version. Now, some years later, the time has finally come for us to benefit from all the new features, bug fixes and improvements of the platform. There were quite a few changes in behaviour, and one of the biggest is how enums are mapped to the database.

In Grails 1.0.x and 1.1.x, Java enums were mapped as int values in the database. Starting with Grails 1.2, they are mapped as varchars containing the enum name. Now you have to migrate your existing data to the framework's new mapping style. One solution is to use Autobase or Liquibase migrations to convert the enum values.
Suppose we have the enum Coolness:

public enum Coolness {
    COOL, UNCOOL;
}

On PostgreSQL, the SQL for migrating it is as simple as this:

alter table foo alter column bar type varchar
    using case when bar = 0 then 'COOL'
            when bar = 1 then 'UNCOOL'
            end
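The CASE branches above map 0 to COOL and 1 to UNCOOL, which is each constant's ordinal(). Assuming the old mapping stored exactly those ordinals, the branches can be generated from the enum instead of typed by hand. EnumMigrationSql is a hypothetical helper, not part of Grails or Autobase. It is only correct if the constants have not been reordered since the data was written.

```java
// Same constants, in the same order, as the enum above.
enum Coolness {
    COOL, UNCOOL;
}

final class EnumMigrationSql {
    private EnumMigrationSql() {}

    // Builds "case when <column> = <ordinal> then '<NAME>' ... end" for every constant.
    static <E extends Enum<E>> String caseExpression(String column, Class<E> enumType) {
        StringBuilder sql = new StringBuilder("case");
        for (E constant : enumType.getEnumConstants()) {
            sql.append(" when ").append(column)
               .append(" = ").append(constant.ordinal())
               .append(" then '").append(constant.name()).append('\'');
        }
        return sql.append(" end").toString();
    }

    // PostgreSQL variant of the statement shown above.
    static <E extends Enum<E>> String postgresAlter(String table, String column, Class<E> enumType) {
        return "alter table " + table + " alter column " + column
                + " type varchar using " + caseExpression(column, enumType);
    }
}
```

For Coolness, the generated statement matches the hand-written one above. Enums with many constants no longer need a long list of manually numbered branches.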

and as an autobase migration it becomes something like this:

changeSet(id: 'change foo bar from int to varchar', author: 'me') {
    preConditions(onFail: "CONTINUE") {
        dbms(type: "postgresql")
    }
    sql("""alter table foo alter column bar type varchar
            using case when bar = 0 then 'COOL'
                    when bar = 1 then 'UNCOOL'
                    end""")
}

We use the precondition to skip the non-persistent in-memory databases we use in development. That way, the change set is only applied to persistent test or production databases.

For Oracle, and maybe other database systems that do not support changing the column type with a using clause, it may look like this:

alter table foo add bar_new varchar(255);
update foo set bar_new =
    (case when bar = 0 then 'COOL'
            when bar = 1 then 'UNCOOL'
      end);
alter table foo drop column bar;
alter table foo rename column bar_new to bar;

I hope this helps when you have to perform similar migrations.

Stay tuned for other changes in API and behaviour between the different Grails framework versions.

Open Source Love Day July 2010

Our Open Source Love Day for July 2010 brought love for Hudson (especially the CMake and Crap4j plugins), RXTX and JUnit.

This Friday, we held our Open Source Love Day for July 2010. We began with several internal meetings and discussions (like the Homepage Committee meeting) and dived right into our work afterwards. Everybody had a small backlog of issues to get done on this day. Nearly everybody succeeded (well, the author had a minor delay – read about it below). The day went by at a very fast pace, but it felt right.

The Open Source Love Day

We introduced a monthly Open Source Love Day (OSLD) to show our appreciation to the Open Source software ecosystem and to donate back. We heavily rely on Open Source software for our projects. We would be honored if you find our contributions useful. Check out our first OSLD blog posting for details on the event itself.

On this OSLD, we accomplished the following tasks:

  • There are really cool new features in the latest JUnit versions, and Rules are one of them. What hurt our aesthetic sense was that the field holding the Rule instance has to be public. Checkstyle was on our side, so we tweaked JUnit to allow any visibility. You can read about the change needed here: http://github.com/KentBeck/junit/issues#issue/31. The fix is almost trivial and will hopefully be incorporated into the next versions of JUnit, so we do not publish our altered version.
  • We constantly receive requests and remarks about our CMake plugin for Hudson. This led to a new version of the plugin that fixes two issues with matrix builds and custom build types. Head over to the plugin homepage and grab the new version 1.6. The issues in detail:
    • The plugin can be used with matrix builds now
    • Custom build types can be defined now
  • RXTX is our choice for serial port communication with Java. We fixed some issues during the last few OSLDs, with one issue left for today: when you flushed your stream while using a special type of USB-to-RS232 converter, you got an exception. The corresponding issue is #102 in the RXTX issue tracker. We proposed a patch that fixes the problem.
  • Another Hudson plugin is our Crap4j reporter. It had lacked love for months and finally broke with the latest Hudson versions. Fixing the problem was a lot harder than we thought, mainly because the plugin needed adjustments to recent API changes and we couldn’t figure out exactly which adjustments were necessary. You might have a look at the developer mailing list thread for this question. Finally, we got it resolved (on Sunday, with a sudden stroke of insight) and published a new version 0.8.
  • We use an internal time tracking tool for our projects. This tool isn’t open source (yet), but it keeps growing in features and usability. The work invested in this tool helps us to continue with the OSLD, so it’s beneficial work nonetheless.
  • During the last OSLD, we had plans for a new Hudson plugin and even produced a prototype. This time, we looked around the Hudson plugin zoo (it’s getting a bit difficult to keep track of all of them) for inspiration and found a wonderful piece of art: the Groovy Postbuild Plugin. Using this plugin with a small Groovy script served our needs exactly. No need for a full-blown plugin when you can scratch your itch with a simple script. Thanks to Serban Iordache for his great work!

What were our lessons learnt today?

  • If you need to set up a fresh workspace for an open source project, consider preparing it the night before, or the download delay will eat up your precious work time. There is nothing more frustrating than staring at a “downloading…” progress bar while being eager to start programming.
  • Always look around at what others have done before. We wanted to build a full Hudson plugin from scratch when all we needed was a little Groovy script placed inside another plugin. Sweet!
  • Do not hesitate to privately fix open source issues that won’t get done in time for you. Just make sure to have a management process in place to track those changes and be able to re-apply them to future versions. More importantly, be able to tell exactly when NOT to re-apply them because the original project has fixed the issue.

Retrospective of the OSLD

The OSLD went smoothly and was productive. We now tend to work on backlogs instead of searching for random issues, but that’s just a sign that our approach has matured and that we depend on the OSLD to get work done.

Last Wednesday, we held our Open Source Love Day for June 2010. This one was productive despite the heat that had us sweating the whole day long (as a side note: it got even warmer in the days afterwards). Some features were finished and will help at least us in our projects. We are still looking for the right way to release them. Another release was even more problematic; you will read about it below.

The Open Source Love Day

We introduced a monthly Open Source Love Day (OSLD) to show our appreciation to the Open Source software ecosystem and to donate back. We heavily rely on Open Source software for our projects. We would be honored if you find our contributions useful. Check out our first OSLD blog posting for details on the event itself.

On this OSLD, we accomplished the following tasks:

An Oracle story: Null, empty or what?

One big argument for relational databases is SQL, which as a standard minimizes the effort needed to switch your app between different DBMSes. This comes in particularly handy when using in-memory databases (like HSQL or H2) for development and a “big” one (like PostgreSQL, MySQL, DB2, MS SQL Server or Oracle) in production. The pity is that databases from different vendors interpret the SQL standard with subtle differences.

Oracle is particularly picky and offers quite a few interesting behaviours. Most databases (all that I know well) treat null and the empty string as different values. So it is perfectly valid to store an empty string in a not-null column, and retrieving it yields an empty string. Not so with Oracle 10g! Inserting null and retrieving the value yields, unsurprisingly, null, even on Oracle. Inserting an empty string and retrieving the value leaves you with null, too! Oracle does not differentiate between empty strings and null values the way a Java developer would expect. As a consequence, inserting an empty string into a not-null column fails with ORA-01400 (cannot insert NULL). In our environment, this has a couple of times led to surprised developers and to bugs that could not be reproduced locally but clearly existed in production.

[rant]Oracle has great features for big installations and enterprises that can afford the support, maintenance and hardware of a serious Oracle DBMS installation. But IMHO it is a shame that such a big player in the market does not really care about the shortcomings of their flagship product and standards in general (Oracle 10g only supports SQL92 entry level!). Oracle, please fix such issues and help us developers to get rid of special casing for your database product![/rant]

The lesson to be learnt here is that you need a clone of the production database for your integration or acceptance tests to be really effective. Quite a few bugs have slipped into production because of subtle differences in behaviour between our in-house databases and the ones in production at the customer site.
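If part of your test suite still runs against H2 or HSQL, you can at least make the difference visible. One option is to pass string values through a helper that mimics what Oracle stores before comparing or persisting them. This is a minimal sketch under that assumption. OracleStrings and its methods are hypothetical names, and the helper only models VARCHAR2 columns, where a single space is still a non-null value. It does not replace testing against a production-like database.

```java
final class OracleStrings {
    private OracleStrings() {}

    // Oracle stores '' as NULL in VARCHAR2 columns; other values round-trip unchanged.
    static String asStoredByOracle(String value) {
        return (value == null || value.isEmpty()) ? null : value;
    }

    // True if Oracle would hand back something different from what was written.
    static boolean changesOnRoundTrip(String value) {
        return value != null && value.isEmpty();
    }
}
```

Running values through asStoredByOracle in a repository layer, or asserting !changesOnRoundTrip(value) in tests, surfaces the empty-versus-null assumption on any database.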

The C++ Shoot-yourself-in-the-Foot of the Week

I think we can all agree that C++, compared to other languages, provides quite a number of possibilities to screw up. Everybody working with the language at some point probably had problems with e.g. its exception system, operator overloading or automatic type conversions – to name just a few of the darker corners.

There are also many mitigation strategies, which all come down to banning certain aspects of the language and/or defining strict code conventions. If you follow Google’s style guide, for example, you cannot use exceptions, and you are restricted to a very small set of Boost libraries.

But developers, being human, often find creative and surprising ways to thwart even the best intentions. In an external project, the following conventions are in place:

  • Use const wherever you possibly can
  • Use boost::shared_ptr wherever it makes sense.
  • Define typedefs to shared_ptrs  in order to make code more readable.
  • typedefs to shared_ptrs are to be defined like this:
typedef boost::shared_ptr<MySuperDuperClass> MySuperDuperClassPtr;
  • typedefs to shared const pointers are to be defined like this:
typedef boost::shared_ptr<const MySuperDuperClass> MySuperDuperClassCPtr;

As you can see, the suffixes Ptr and CPtr mark shared_ptrs to mutable objects and shared_ptrs to const objects, respectively.

Last week, a compile error about some const to non-const conversion nearly made me pull my hair out. The types of the variables involved all had the CPtr suffix, but the code still didn’t compile. After a little digging, I found that one of the typedefs involved looked like this:

typedef boost::shared_ptr<  MySuperDuperClass> MySuperDuperClassCPtr;

Somebody had simply deleted the const modifier in front of MySuperDuperClass but left the CPtr name untouched. And because conversions from non-const to const are allowed, this went undetected for a couple of weeks. Nice going!

Any suggestions for a decent style checker for C++? Thanks in advance 😉

Looking left and right will improve you as a developer

After initial encounters with computers and programming, I pretty much settled on Java as my preferred language and especially platform. Occasional adventures into C/C++ or other languages and tools do happen, but not on a day-to-day basis. We are mostly a Java shop, so this seems natural. However, I strongly suggest looking left and right and trying new stuff, be it a programming language, an operating system, an IDE or a programming framework. Similarly to travelling around in the real world™, it will widen your view of your daily work and give you new ideas on how to improve it. You will try to adopt good new stuff from elsewhere and, on the other hand, appreciate the nice aspects of your current environment more.

Let me give some examples to support my point and motivate adventures outside your home turf. I have been playing around with Scala, and therefore functional programming, in the past months. One major benefit of these experiments was my new appreciation for immutable types and side-effect-free code. You can carry them over to many programming environments, including Java, often making your code easier to test and less error-prone. Object-oriented programming (OOP) relies heavily on objects with state and side effects, but there are many places where immutability reduces the effort of tracking objects and their state. A nice example in Java is the Joda Time library, which provides immutable DateTime classes in contrast to java.util.Date et al. Null handling in Scala using the Option type seems so interesting that some people try to carry it over to Java as an alternative to Null Objects or null checks. The rich collection classes and implicit conversions in Scala may encourage you to write your own utility classes and nice wrappers for Java collections, or to use alternative collection frameworks that make your life easier. In general, I find wrapping a nice, standard OO technique that is often underused outside of frameworks.
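A minimal Java sketch of both ideas, assuming Java 8 or later for java.util.Optional. Optional arrived well after this post; at the time, you would have rolled your own Option class. Temperature and EmailDirectory are hypothetical examples, not classes from Joda Time or any other library.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Immutable value type: all fields final, "changes" produce a new instance.
final class Temperature {
    private final double celsius;

    Temperature(double celsius) {
        this.celsius = celsius;
    }

    double celsius() {
        return celsius;
    }

    // Returns a new object; the receiver stays untouched, so it is safe to share.
    Temperature warmerBy(double delta) {
        return new Temperature(celsius + delta);
    }
}

// Lookup that signals "no value" with Optional instead of returning null.
final class EmailDirectory {
    private final Map<String, String> emailsByUser;

    EmailDirectory(Map<String, String> emailsByUser) {
        // Defensive, read-only copy keeps the directory immutable as well.
        this.emailsByUser = Collections.unmodifiableMap(new HashMap<>(emailsByUser));
    }

    Optional<String> emailOf(String user) {
        return Optional.ofNullable(emailsByUser.get(user));
    }
}
```

Callers of emailOf are forced to decide what happens when nobody is found, for example with orElse("unknown"), instead of discovering a forgotten null check at runtime.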

You do not always have to stray that far. Some frameworks, like Fest and EasyMock, show nice uses of fluent interfaces. Why not use this technique in your own code? I found fluent interfaces especially useful for implementing builders for complex or highly configurable objects. They can make your code read a lot like natural language, resulting in expressive and highly readable code.
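Here is a minimal sketch of such a fluent builder. ConnectionConfig and its methods are hypothetical names chosen for illustration, and the defaults (port 80, 30 second timeout, no TLS) are arbitrary.

```java
// Immutable configuration object, created only through its fluent Builder.
final class ConnectionConfig {
    private final String host;
    private final int port;
    private final int timeoutSeconds;
    private final boolean useTls;

    private ConnectionConfig(Builder builder) {
        this.host = builder.host;
        this.port = builder.port;
        this.timeoutSeconds = builder.timeoutSeconds;
        this.useTls = builder.useTls;
    }

    // Entry point of the fluent chain: the mandatory value comes first.
    static Builder forHost(String host) {
        return new Builder(host);
    }

    String host() { return host; }
    int port() { return port; }
    int timeoutSeconds() { return timeoutSeconds; }
    boolean usesTls() { return useTls; }

    static final class Builder {
        private final String host;
        private int port = 80;            // arbitrary defaults for this sketch
        private int timeoutSeconds = 30;
        private boolean useTls = false;

        private Builder(String host) {
            this.host = host;
        }

        // Each method returns the builder itself so calls can be chained.
        Builder onPort(int port) {
            this.port = port;
            return this;
        }

        Builder timingOutAfter(int seconds) {
            this.timeoutSeconds = seconds;
            return this;
        }

        Builder usingTls() {
            this.useTls = true;
            return this;
        }

        ConnectionConfig build() {
            return new ConnectionConfig(this);
        }
    }
}
```

A call like ConnectionConfig.forHost("example.org").onPort(443).usingTls().build() reads almost like a sentence, and the object it produces cannot change afterwards.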

Using Mac OS X with Spotlight™ and Time Machine™ may make you look for similar features on your Linux box (e.g. gDeskbar and BackInTime) or on Windows (built in or through third-party software, depending on the version). Using the multiple desktops of Linux (or another OS with an X11 window system) may motivate you to try them in your other working environments (they suck on OS X, though).

Trying different IDEs may increase your effectiveness depending on the language and frameworks used. Grails support may be better in IntelliJ whereas you may like Eclipse more for plain J2SE projects. Sometimes you will find some cool feature in one IDE you were missing before. I encourage you to go back to your environment and look for the same or similar features. Often you will find them in many advanced IDEs.

Without seeing what is possible, you may never miss it. But beware of thinking that everything new is automatically better. The grass is always greener on the other side. Try to reflect on the things you learn and encounter. Take the good parts to improve existing stuff whenever sensible. But also do not be afraid to move on when it is time.

For me and my colleagues, wandering around this rich software development world has proven very valuable for continually improving our style and increasing our productivity. These adventures in foreign waters, combined with reading books, dev brunches and attending talks on- and offline, keep our skills fresh and improving. Sometimes they even lead to our own ideas, APIs and tools.

Step-by-step tutorials and manuals are priceless

Writing documentation in and outside of code annoys most developers. I am no different, but after several years in the software development business I have come to value one type of documentation a lot: step-by-step tutorials and manuals. Surprisingly, they are often missing. I want to describe some common use cases where such documents have proven valuable many times:

when learning new tools and frameworks

Most people I know learn best by example. A tutorial with a step-by-step explanation of how to set up the tool, followed by some good examples, brings you up to speed quickly. The Lift framework has such a tutorial and gets you a webapp running in minutes. But when you dive deeper, you may find that a lot of documentation and how-tos are missing. Books may help, but most of the time it is faster and easier to look for examples on the web and in project wikis.

Also beware that bad examples may cause a lot of frustration (typos are poison for newbies) and teach bad style or obsolete techniques. Keep the tutorials up to date (I know, that’s one problem of documentation…) and as accurate as possible, and many people will have a better time working with the tools and frameworks.

The same is true for developers in your team: Many will look for examples in the existing code base and learn from them be they good or bad. So keep the code clean and full of good examples. Let your seniors spread them in the teams. Do not forget to provide good examples and best practices in your wiki or other documentation system.

when performing manual tasks

We often have to manage and deploy production systems for our customers. Even though many tasks are automated by scripts and other tools, you sometimes need to perform manual tasks. Especially in a stressful situation at the customer site, or when otherwise dealing with production systems, it is extremely helpful to have a simple and precise guide for the task at hand. There are enough unknowns and problems that may need brain power, so the basic tasks and procedures should be well documented. That way, you have one less thing to worry about and are ready to face problems as they come up. A good guide gives you much-needed confidence.

when trying to work on an open source project

We regularly (see our OSLDs) dive into some open source project to improve it. Nothing is more frustrating than spending more than half a day setting up the development environment. Excessive dependencies, works-on-my-machine-style build scripts and missing documentation and tutorials prevent a productive experience. That hurts the projects themselves by driving away potential contributors. We had several such experiences, but also many positive ones, like Hudson plugin development or EGit, where you can get up and running in minutes and perform your first monkey see – monkey do experiments.

Conclusion

Much of the documentation generated or written nowadays is not worth 2 cents. API docs that list the classes, methods and fields without any additional information or descriptions are of no use and just waste the time of people looking for information there. High-level blabla and theoretical dissertations are all nice, but they do not help much in getting the job done (at least not in the beginning). But small, to-the-point guides and tutorials really do make a difference, regardless of whether they are written for in-house work and development or for openly available projects. Choose the kind of documentation you write wisely.

Wrestling with Qt’s Model/View API – Filtering in Tree Models

As I described in one of my last posts, Qt4’s model/view API can be kind of a challenge sometimes. Well, prepare for an even harder fight when sorting and filtering come into play.

Let’s say you finally managed to get the data into your model and to provide correct implementations of the required methods, so that the attached views display it properly. One of your next assignments after that is very likely implementing some kind of sorting and filtering of the model data. Qt provides a simple-at-first-sight proxy architecture for this, with the API class QSortFilterProxyModel as its main ingredient.

Small preliminary side note: last time I checked, it was good OO practice for a class to have only one responsibility. And isn’t that even more important for good API design? Well, let’s not get distracted by such minor details.

With my model implementation, none of the standard filtering mechanisms, like setting a filter regexp, were applicable, so I had to override method

bool filterAcceptsRow ( int source_row, const QModelIndex& source_parent ) const

in order to make it work. Well, the rows disappeared as they should, but unfortunately so did all the columns except the first one. So what to do now? One small part of the documentation of QSortFilterProxyModel made me a little uneasy:

“… This simple proxy mechanism may need to be overridden for source models with more complex behavior; for example, if the source model provides a custom hasChildren() implementation you should also provide one in the proxy model.”

What on earth should I do with that? “… may need to be overridden“? “… for example.. hasChildren()…” Why can’t they just say clearly what methods must be overridden in which cases???

After a lot more trial and error I found that for whatever reason,

int columnCount ( const QModelIndex& parent ) const

had to be overridden in order for the columns to reappear. The implementation looks like what I had thought the proxy model would do already:

int MyFilter::columnCount ( const QModelIndex& parent ) const
{
   return sourceModel()->columnCount(parent);
}

So beware of QSortFilterProxyModel! It’s not as easy to use as it looks, and with that kind of fuzzy documentation it is even harder.

Database Versioning with Liquibase

In my experience software developers and database people do not fit too well together. The database guys like to think of their database as a solid piece and dislike changing the schema. In an ideal world the schema is fixed for all time.

Software developers, on the other hand, tend to think of everything as subject to change. This is even more true for agile teams embracing refactoring. Liquibase is a tool that makes database refactorings feasible and reversible. For the cost of only one additional jar file, you get a very flexible tool for migrating from one schema version to another.

Using Liquibase

  • You formulate the changes in XML, plain SQL or even as custom Java migration classes. If you are careful, and sometimes provide additional information, your changes can be rolled back, so that switching between schema revisions becomes a breeze.
  • To apply the changes, you simply run liquibase.jar as a standalone Java application. You can specify a tag to update or roll back to, or the number of changesets to apply. This allows you to put the database into an arbitrary state at changeset granularity.

Additional benefits

  • An important benefit of Liquibase is that you can easily put all your changesets under version control so that they are managed exactly the same as the rest of the application.
  • Liquibase stores the changelog directly in the database, in a table called databasechangelog. This lets the developer and the application check the schema revision of the database and thus find inconsistent states much more easily.
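To make the bookkeeping idea concrete, here is a toy Java model of how a changelog table lets a tool apply only the missing changesets and roll back by count. This is not Liquibase’s API or implementation. The real databasechangelog table also records things like the author, the changelog file and a checksum per changeset, and Liquibase executes the SQL against a real connection. ChangeSet and ToyMigrator are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// One change with its undo statement, loosely modelled on a changeset.
final class ChangeSet {
    final String id;
    final String applySql;
    final String rollbackSql;

    ChangeSet(String id, String applySql, String rollbackSql) {
        this.id = id;
        this.applySql = applySql;
        this.rollbackSql = rollbackSql;
    }
}

// Toy migrator: appliedIds plays the role of the databasechangelog table.
final class ToyMigrator {
    private final List<ChangeSet> changeLog;
    private final List<String> appliedIds = new ArrayList<>();
    private final List<String> executedSql = new ArrayList<>(); // what would be sent to the DB

    ToyMigrator(List<ChangeSet> changeLog) {
        this.changeLog = new ArrayList<>(changeLog);
    }

    // Applies every changeset not yet recorded, in changelog order.
    void update() {
        for (ChangeSet cs : changeLog) {
            if (!appliedIds.contains(cs.id)) {
                executedSql.add(cs.applySql);
                appliedIds.add(cs.id);
            }
        }
    }

    // Undoes the most recently applied changesets, newest first.
    void rollback(int count) {
        for (int i = 0; i < count && !appliedIds.isEmpty(); i++) {
            String lastId = appliedIds.remove(appliedIds.size() - 1);
            for (ChangeSet cs : changeLog) {
                if (cs.id.equals(lastId)) {
                    executedSql.add(cs.rollbackSql);
                }
            }
        }
    }

    List<String> appliedIds() {
        return Collections.unmodifiableList(appliedIds);
    }

    List<String> executedSql() {
        return Collections.unmodifiableList(executedSql);
    }
}
```

Running update() twice executes each changeset only once, because the recorded ids tell the migrator what the database already contains. That is the property that makes the tracking table so useful across many installations.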

Conclusion
All of the above is especially useful when multiple installations, or development and test databases running different versions of the software (and therefore of the database schema), have to be used at the same time. Tracking the changes to the database in the repository and having a small cross-platform tool to apply them is priceless in many situations.

Readability of Guard Clauses in Methods

A little story about two opinions on readability of methods containing if-clauses.

Browsing through the code base of one of our customers, I frequently stumbled over methods that were roughly structured like this:

void theMethod()
{
  if (some_expression)
  {
    // rest of the method body
    // ...
  }
  // no more code here!
}

And most of the time I was tempted to refactor the method using a guard clause, like so:

void theMethod()
{
  if (!some_expression)
  {
    return;
  }
  // rest of the method body
  // ...
}

because this is far more readable for me. When I noticed that the methods were all written by the same guy, I told him about my refactoring ideas, absolutely certain that he would agree with me. It came as quite a surprise when, in fact, he didn’t agree with me at all. Even something like this:

void theMethod()
{
  if (some_expression)
  {
    // some code
    // ...
    if (another_expression)
    {
      // some more code
      // ...
    }
    // no more code here ..
  }
  // ... and here
}

was, in his eyes, far more readable than the refactored version with guard clauses. His rationale was that guard clauses make it harder for him to see the program flow through the method, and that a nested if(…) structure like the one above is very well suited to expressing slightly more complicated flows.

All my talks about crappy methods and the downsides of highly indented code were not able to change his mind.

I admit that I can somewhat understand his point about the visibility of the program flow through the method.  And sure, the (nested) ifs increase indentation and the number of possible code paths but since there are no elses and no code after the if-blocks, does that really increase the overall complexity?
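Here is a small Java sketch of the nested example and its guard-clause version. It checks the intuition that both styles behave identically when there are no else branches and no code after the if-blocks. The method names and the trace list, which records which blocks ran, are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class GuardClauses {
    private GuardClauses() {}

    // Nested style: all work happens inside the if-blocks, nothing follows them.
    static List<String> nested(boolean someExpression, boolean anotherExpression) {
        List<String> trace = new ArrayList<>();
        if (someExpression) {
            trace.add("some code");
            if (anotherExpression) {
                trace.add("some more code");
            }
        }
        return trace;
    }

    // Guard-clause style: leave early, keep the main path at one indentation level.
    static List<String> guarded(boolean someExpression, boolean anotherExpression) {
        List<String> trace = new ArrayList<>();
        if (!someExpression) {
            return trace;
        }
        trace.add("some code");
        if (!anotherExpression) {
            return trace;
        }
        trace.add("some more code");
        return trace;
    }
}
```

Adding an else branch, or a statement after the outer if-block, would break this one-to-one mapping. For methods shaped like the ones above, the refactoring is purely mechanical and does not change the number of paths through the method.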

Well, I would still prefer smaller methods with guard clauses, but as you can see, readability lies to a great extent in the eye of the beholder.

What do you find readable?