Database Versioning with Liquibase

In my experience software developers and database people do not fit too well together. The database guys like to think of their database as a solid piece and dislike changing the schema. In an ideal world the schema is fixed for all time.

Software developers on the other hand tend to think about everything as a subject to change. This is even more true for agile teams embracing refactoring. Liquibase is a tool making database refactorings feasible and revertable. For the cost of only one additional jar-file you get a very flexible tool for migrating from one schema version to another.

Using Liquibase

  • You formulate the changes in XML, plain SQL or even as custom java migration classes. If you are careful and sometimes provide additional information your changes can be made rollbackable so that changing between schema revisions becomes a breeze.
  • To apply the changes you simply run the liquibase.jar as a standalone java application. You can specify tags to update or rollback to or the count of changesets to apply. This allows putting the database in an arbitrary state within changeset granularity.

Additional benefits

  • An important benefit of Liquibase is that you can easily put all your changesets under version control so that they are managed exactly the same as the rest of the application.
  • Liquibase stores the changelog directly in the database in a table called databasechangelog. This enables the developer and the application to check the schema revision of the database and thus find inkonsistent states much easier.

Conclusion
All of the above is especially useful when multiple installations or development/test databases with different verions of the software and therefore database have to be used at the same time. Tracking the changes to the database in the repository and having a small cross platform tool to apply them is priceless in many situations.

Don’t let the tools use you

In today’s software development we use many tools to help us doing our job. Many of them are indispensable. But don’t make yourself a slave of the tool.

In today’s software development we use many tools to help us doing our job. Many of them are indispensable. But don’t make yourself a slave of the tool.
When we create schedules based on our estimates we have a great way to communicate how much work we think needs to be done. But don’t let the schedule fool you: you made an estimate based on your knowledge before you began to work. Maybe you need to change the infrastructure, maybe the new change interferes with a past one, in short you gain knowledge on your way. This knowledge can lead you to adapt or even rework your schedule. Your usual approach for accounting unforeseen obstacles might be to include a risk “pad”. Why not use your new gained knowledge to communicate with your client? Maybe a small change to the planned feature is much easier to implement and the client doesn’t even need the extra flexibility. Or the feature is not that important or worth. Don’t try to hold the schedule at all costs. Estimates are a communication tool not a promise we need to fulfill. (related: read also a 37signals post on It’s not a promise it’s a guess)
Or take a look at your favorite editor or IDE: if you happen to use a new language, source code management system or another new tool, the chances are high that your IDE doesn’t support all features or any features at all. You may be tempted to adapt your way to what your IDE supports. If that suits you it’s definitely ok but don’t let the IDE determine how you make use of the features.
When you use software metrics to measure and control the quality of your code you get a lot of numbers and graphs and hints. Use them when you think they benefit you. Don’t just adhere to them, they are a guideline for you, not a law.

Gesture Touchscreens might render Paper Prototyping useless

With the advent of highly dynamic, gesture-controlled user interfaces for touchscreens, Paper Prototyping seems to lose its applicability.

Paper Prototyping is an highly effective tool to examine the usability of software, even before it is written. The basic idea of Paper Prototyping is that you perform real (software) tasks with real users, but replace everything technical with low-fi substitutes.

Adventures in Low-fi

A typical Paper Prototyping session looks like this:

  • The user gets his task description and has to perform it with the software
  • The computer screen is replaced by an arrangement of paper sheets on a desk
  • The graphical user interface is replaced by hand-drawn copies on paper
  • The computer itself is replaced by a human, mimicking the software responses to input
  • The user operates by finger-pointing or writing with a pen

Advantages of Paper Prototyping

The whole situation described above seems awkward on first look, but is really rewarding for a project in its early stages. The customer has to provide real end user tasks and enough details of the solution to make up a prototype. The team has to produce reasonable drafts of the software GUI and come up with enough understanding of the processes and tasks involved to survive the session without major outages.

The result of a Paper Prototyping session can be used in various ways:

  • Detailed specification of the GUI
  • Use Case or User Story (acted out already)
  • Mock screenshots for the user manual
  • Data for initial acceptance tests

Classical user interface vs. gestures

This approach worked very well as long as the user only had a mouse (pointing, clicking) and a keyboard (writing) at hands. Even then, advanced features like grabbing (drag&drop) or automatic scrolling challenged the creativity of the prototypers. But most GUIs were rather dull and static. The perfect playground for Paper Prototyping.

With the advent of touchscreens, we soon realized that pointing and clicking only needs one finger out of ten. Gestures were introduced to keep all our fingertips busy and to enrich the interaction between user and GUI. We instantly understand the “zoom in” or “scroll down” gestures because it resembles natural behaviour (at least for some of us).

In the wake of gestures, the GUI of our software gets more and more dynamic. The GUI has to be minimalistic so we can control it even with stubby fingers (the new handicap of our generation, compare cell phone key pads). Detailed information has to be provided on demand and only temporarily. Everything can be manipulated. The classical approach to tab through a form (by carefully designed tab orders) isn’t that suitable anymore.

Gestures vs. Paper Prototyping

When using a Paper Prototype, the throughput of scribbled paper is enormous even with the classical GUIs. The more dynamic some dialog is, the more different parts need to be prepared in various states and locations (depending on the fragmentation of the paper screen). With gestures on a touchscreen, the user needs to be able to express them on the screen. Most touchscreen interfaces heavily depend on the (simulated) physical interaction between the fingers and some drawn “objects” on the interface. This is the moment when Paper Prototyping falls short of resembling the real interaction. You just can’t fiddle that fast with all the paper shreds.

No solution yet

I observed this effect when performing a Paper Prototype workshop with my students. The interfaces with classical mouse/keyboard handling performed well in the sessions. Interfaces for touchscreens (iPhone apps were the big newcomer here) just didn’t work out well, especially when downsized to palm size. We weren’t able to come up with a viable solution to make Paper Prototyping perform again for touchscreens and gestures.

Any ideas out there?

Readability of Guard Clauses in Methods

A little story about two opinions on readability of methods containing if-clauses.

Browsing through the code base of one of our customers I frequently stumbled over methods that were roughtly structured like this:

void theMethod
{
  if (some_expression)
  {
    // rest of the method body
    // ...
  }
  // no more code here!
}

And most of the time I was tempted to refactor the method using a guard clause, like so:

void theMethod
{
  if (!some_expression)
  {
    return;
  }
  // rest of the method body
  // ...
}

because this is far more readable for me. When I noticed that the methods were written all by the same guy I told him about by refactoring ideas in absolute certainty that he would agree with me. It came as quite a surprise when, in fact, he didn’t agree with me, at all. Even something like this:

void theMethod
{
  if (some_expression)
  {
    // some code
    // ...
    if (another_expression)
    {
      // some more code
      // ...
    }
    // no more code here ..
  }
  // ... and here
}

was in his eyes far more readable than the refactored version with guard clauses. His rational was that guard clauses make it harder for to see the program flow through the method. And a nested if(…) structure like above was very suitable to express slightly more complicated flows.

All my talks about crappy methods and the downsides of highly indented code were not able to change his mind.

I admit that I can somewhat understand his point about the visibility of the program flow through the method.  And sure, the (nested) ifs increase indentation and the number of possible code paths but since there are no elses and no code after the if-blocks, does that really increase the overall complexity?

Well, I still would prefer smaller methods with guard clauses but as you can see, to a great extend readability lies in the eyes of the beholder.

What do you find readable?

Blog harvest, February 2010

Some noteworthy blog articles, harvested for February 2010. If you ever asked yourself about the personality of your web framework, you’ll find the link to the answer here.

After the move to the new office is nearly complete, work begins to normalize again. Here is the February blog harvest with a little more entries, as I wasn’t unable to read other blogs, but to write on our own blog. There are many fun articles this time that I found share-worthy, perhaps because they made me laugh even in harder times.

This was the more serious part of this harvesting. Let’s read some articles that share their message in a lighter way:

  • What kind of woman would your web framework be? – If you ever have to sell a new hot (web) framework to management, why not take this plausible approach? At least they could relate to what you are talking about.
  • It’s Not the Recession, You Just Suck – Ouch! That hurt. This is a wake-up call for everybody who likes to blame it on higher means. And it reminds me to hurry up with this blog entry and get back to work.
  • I test therefore I log bugs – Ever tried to explain “programming” to your grandparents? You’ll end in esoterics (“teaching machines to have dreams”) or in obviousnesses. This is a story about consensus on the latter.

This blog harvest closes with a video:

  • Uncle Bob on Software Craftsmanship – Much of what Bob Martin says has truth in it, but for me the last two minutes are the most explicit and rewarding. By the way, Uncle Bob looks good in the T-Shirt (I always feared it would be teared, regarding the sounds when he stretches), but needs to switch his cell phone off.

Start with the core

What’s the most important feature you cannot live without? Start with it, you can stop anytime because after that you have a minimal yet usuable system.

If you begin your new product, start from the core functionality. Not the core of the system or architecture.
Ask yourself: What’s the most important feature you cannot live without? Just name one, only one. Build this first and only. Eventually refine or redo if it doesn’t suit your needs. Then continue with the next. You always have a minimal but useable product and can stop anytime.
We once had to develop a web based system where users can file an application which can be reviewed, changed, rated and finally accepted or declined. We started with a web page where you could download a PDF and an email address to send it to when complete. Minimal? Yes. Useless? No. All the functionality which was needed was there. All other stuff could be done via email or phone. It was readily available and useable.
Start with the core.

Follow-up to our Dev Brunch February 2010

A follow-up to our February 2010 Dev Brunch, summarizing the talks and providing bonus material.

Today, we held our second Dev Brunch for 2010. It was the first one in the new office, with some packing cases still around. The brunch had some interesting topics, most of them small and focussed. We discussed if the topics should be announced beforehands to avoid collision, but defined these collisions as enrichments rather than duplications.

The Dev Brunch

If you want to know more about the meaning of the term “Dev Brunch” or how we implement it, have a look at the follow-up posting of the brunch in October 2009. This time, we didn’t urge all participants to bring their own topic. Presence is more important than topic.

  • Scrum adventure book review – There are lots of book on the Scrum project management process. But the one called “Geschichten vom Scrum” (sorry, it’s a german book!) will teach you all the basics and some advanced practical topics of Scrum while telling you the fairy tale of a kingdom haunted by dragons. By following a group of common fairy tale characters in their quest to build a dragon trap the Scrum way, you’ll learn a great share of real world Scrum and still be entertained. You might compare this book to Tom DeMarco’s “The Deadline”, a novel about general project management.
  • What is the Google Web Toolkit? – Based on the learning from the presentation of the Karlsruhe Java User Group (JUG-KA), we skipped through the slides to get to know the Google Web Toolkit (GWT) framework. Advanced topics were discussed in the next talk.
  • First hand experience with GWT – We talked about the sweet spots and pain points of Google Web Toolkit, based on the experiences in a real project. This was very helpful to sort out the marketing promises from the definite advantages. While the browser doesn’t affect the developer anymore, the separation of client (browser) and server will still leak through.
  • First impressions of the Lift framework – The way to go with web application development in Scala is Lift. It’s a framework borrowing the best from “Seaside, Rails, Django and Wicket” and combining it with Scala and the whole Java ecosystem. While this talk was just a teaser, it already looked promising.

As usual, the topics ranged from first-hand experiences to literature research or summaries of recently attended presentations. You can check out the comments for additional resources, but they may be in german language.

Retrospection of the brunch

It’s right to grant access to “non-topics”. This will lower the barrier for occasional guests while they are valuable for their experiences and insights. This brunch was enriched by yet another topic collision, which is the perfect situation for a more in-depth discussion.

Poor man’s TimeMachine

Some weeks ago I wrote about a easy and cheap backup solution for windows users. But what about Mac and Linux users? The Mac guys have a similar solution right at hand: TimeMachine. It is quite easy to backup the most important stuff regularily onto an external drive while working. The configuration and hardware investment is minimal.

Now what if I happen to use Linux as an operating system? I looked for solutions similar to the Seagate Replica or TimeMachine expecting less comfort. My first try was rsnapshot because a friend of mine recommended it. While it works nicely and has quite some features it requires manual editing of text configuration files. Nothing, that a casual user would like and even I was not quite satisfied. A little more research on the web brought me to Back in time.

Back in time was exactly what I wanted: simple install from the Ubuntu package repository, a GNOME gui (KDE version is available too) to configure and maintain everything and unobstrusive background operation. You can configure it even to run with root priviledges to backup files the logged in user cannot access. So you can keep system configuration files etc. backed up, too.

One hint for ubuntu users: You may need to install the “menu”-package to be able to use the root version.

Conclusion

With these backup solutions available for all major operating systems one can achive basic data security at virtually no cost. There is no compelling reason to risk many hours of work to a drive failure or a user delete without undelete possibilies (think rm -rf *). Of course one can improve that backup strategy further, but for me this is a baseline nobody should miss.

Verbosity is not Java’s fault

One of Java’s most heard flaws (verbosity) isn’t really tied to the language it is rooted in a more deeply source: it comes from the way we use it.

Quiz: Whats one of the most heard flaws of Java compared to other languages?

Bad Performance? That’s a long overhauled myth. Slow startup? OK, this can be improved… It’s verbosity, right? Right but wrong. Yes, it is one of the most mentioned flaws but is it really inherit to the language Java? Do you really think Closures, annotations or any other new introduced language feature will significantly reduce the clutter? Don’t get me wrong here: closures are a mighty construct and I like them a lot. But the source of the problem lies elsewhere: the APIs. What?! You will tell me Java has some great libraries. These are the ones that let Java stand out! I don’t talk about the functionality of the libraries here I mean the design of the API. Let me elaborate on this.

Example 1: HTML parsing/manipulation

Say you want to parse a HTML page and remove all infoboxes and add your link to a blog box:

        DOMFragmentParser parser = new DOMFragmentParser();
        parser.setFeature("http://xml.org/sax/features/namespaces", false); 
        parser.setFeature("http://cyberneko.org/html/features/balance-tags", false);
        parser.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment", true);
        parser.setFeature("http://cyberneko.org/html/features/scanner/ignore-specified-charset", true);
        parser.setFeature("http://cyberneko.org/html/features/balance-tags/ignore-outside-content", true);
        HTMLDocument document = new HTMLDocumentImpl();
        DocumentFragment fragment = document.createDocumentFragment();
        parser.parse(new InputSource(new StringReader(html)), fragment);
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        Node infobox = xpath.evaluate("//*/div[@class='infobox']", fragment, XPathConstants.NODE);
        infobox.getParentNode().removeChild(infobox);
        Node blog = xpath.evaluate("//*[@id='blog']", fragment, XPathConstants.NODE);
        NodeList children = blog.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            node.remove(children.item(i));
        }
        blog.appendChild(/*create Elementtree*/);

What you really want to say is:

HTMLDocument document = new HTMLDocument(url);
document.at("//*/div[@class='infobox']").remove();
document.at("//*[@id='blog']").setInnerHtml("<a href='blog_url'>Blog</a>");

Much more concise, easy to read and it communicates its purpose clearly. The functionality is the same but what you need to do is vastly different.

  The library behind the API should do the heavy lifting not the API's user.

Example 2: HTTP requests

Take this example of sending a post request to an URL:

HttpClient client = new HttpClient();
PostMethod post = new PostMethod(url);
for (Entry param : params.entrySet()) {
    post.setParameter(param.key, param.value);
}
try {
    return client.executeMethod(post);
} finally {
    post.releaseConnection();
}

and compare it with:

HttpClient client = new HttpClient();
client.post(url, params);

Yes, there are cases where you want to specify additional attributes or options but mostly you just want to send some params to an URL. This is the default functionality you want to use, so why not:

  Make the easy and most used cases easy,
    the difficult ones not impossible to achieve.

Example 3: Swing’s JTable

So what happens when you designed for one purpose but people usually use it for another one?
The following code displays a JTable filled with attachments showing their name and additional actions:
(Disclaimer: this one makes heavy use of our internal frameworks)

        JTable attachmentTable = new JTable();
        TableColumnBinder<FileAttachment> tableBinding = new TableColumnBinder<FileAttachment>();
        tableBinding.addColumnBinding(new StringColumnBinding<FileAttachment>("Attachments") {
            @Override
            public String getValueFor(FileAttachment element, int row) {
                return element.getName();
            }
        });
        tableBinding.addColumnBinding(new ActionColumnBinding<FileAttachment>("Actions") {
            @Override
            public IAction<?, ?>[] getValueFor(FileAttachment element, int row) {
                return getActionsFor(element);
            }
        });
        tableBinding.bindTo(attachmentTable, this.attachments);

Now think about you had to implement this using bare Swing. You need to create a TableModel which is unfortunately based on row and column indexes instead of elements, you need to write your own renderers and editors, not talking about the different listeners which need to map the passed indexes to the corresponding element.
JTable was designed as a spreadsheet like grid but most of the time people use it as a list of items. This change in use case needs a change in the API. Now indexes are not a good reference method for a cell, you want a list of elements and a column property. When the usage pattern changes you can write a new library or component or you can:

  Evolve your API.

Designed to be used

So why is one API design better than another? The better ones are designed to be used. They have a clearly defined purpose: to get the task done in a simple way. Just that. They don’t want to satisfy a standard or a specification. They don’t need to open up a huge new world of configuration options or preference tweaks.

Call to action

So I argue that to make Java (or your language of choice) a better language and environment we have to design better APIs. Better designed APIs help an environment more than just another new language feature. Don’t jump on the next super duper language band wagon because it has X or Y or any other cool language feature. Improve your (API) design skills! It will help you in every language/environment you use and will use. Learning new languages is good to give you new viewpoints but don’t just flee to them.

Speed up your buildbox, Part IV: Beyond the box

This is the fourth and last part of a series on how to boost your build box without much effort. This episode talks about possible measures to increase the build performance when a single box isn’t enough.

© Friedberg - Fotolia.comIn the first three parts of our effort to speed up our buildbox, we replaced the harddisk with a RAM disk, upgraded the CPU to the top-notch model and installed plenty of fast RAM. This brought the build time down from 03:30 minutes to around 02:00 minutes. The CPU frequency was the biggest time saving factor in our case study. Two minutes is as fast as the build can get for our project without fiddling with the actual build process. It’s sufficient for our case, but it may not for yours.

Even top speed is too slow

Lets assume we maxed out the hardware and still have a build duration far beyond the magical ten minutes mark. What can we do now? There are two viable options at hand if you can exclude the possibility that your build process is really inefficient and needs optimization. In the latter case, it would be better to revise the process instead of the build infrastructure.

Two ways to speed up your build infrastructure

You can go down one or both of two general paths to speed up your build process. To understand the examples, lets assume the build takes 20 minutes to run on your top-notch build box.

  • Add more build boxes. This is the classical “parallelize it!” approach. It won’t speed up the individual build process, but enable more processes to run at the same time. This approach wont change anything if your team does seldom check-ins, which in itself is an anti-pattern to continuous integration. But if your team commits changes every ten minutes, having at least two build boxes will prevent the second committer from waiting 30 minutes on the CI results. Instead, the results will always be there after 20 minutes. You haven’t exactly sped up your build process, but the maximal waiting time of your committers. For details on the implementation, see below at “Growing a build park“.
  • Chop up your build process. This is known as “staging” or “pipelining” your build. This won’t speed up the individual build process, either, but deliver certain partial results of your build earlier. Lets assume you can split your build process into four distinct stages: compile, unit test, integration test, package. Whenever a stage yields a result, the comitter gets feedback immediately. In our example, this might be every 5 minutes. This has several disadvantages, as for example discussed in the article “The pipeline of doom” by Julian Simpson, but can lower the waiting time for specific aspects of your build drastically. You haven’t exactly sped up your build process, but the response time for partial results and therefore the average waiting time of your committer. For details on the implementation, see below at “Installing a build pipeline“.

Growing a build park

If you want to reduce the initial waiting delay of a build before it gets processed or increase the throughput of builds, the build farm pattern is your way to go. By adding slave build machines to your build master, you can distribute the workload on more shoulders. The best way to set up your infrastructure is to introduce a dedicated master box that only delegates actual builds to its slaves. The master box handles the archivation of build artifacts and deals with the web server requests, while the slaves only perform build tasks. The master box can be of average power, with increased storage size, while the slaves should be ultra-fast, without the need of big disks. Solid state disks or even RAM disks of the slaves can be tuned to actual workspace sizes, as it is all that needs to be stored there.

Distributed builds with Hudson

The Hudson continuous integration server has a strength in setting up these master/slave scenarios. It’s ridiculously easy to set up a build slave. You basically only need to click on a link to start the slave process. If you happen to have a standard build, everything needed gets downloaded automatically. If you want your slaves to operate automatically, you can install a windows daemon, provide a SSH account or write your own script. Usually, slaves are set up in a matter of minutes without hassle. A great idea is to turn powerful collegue boxes into build slaves (aka CI zombies) by booting an USB stick. The best way to start with master/slave builds is to turn your current PC into a hudson slave right now by using the Java Web Start method.

Installing a build pipeline

If you are interested in early but incomplete feedback from your build box, staging your build will help you out. If partitioned right, you’ll receive a series of answers on specific questions from your build process. The questions may be like:

  1. Will it compile?
  2. Will it pass the unit tests?
  3. Will it function (pass the integration tests)?
  4. Will it blend?

Ok, the last question is rather unlikely to be answered by your build box. The overall build process will not be any faster, but basic safety test results are reported earlier. If you combine this approach with distributed builds, there is the possibility to designate specifically tuned machines to different stages. The Hudson continuous integration server has the ability to tag a slave with different labels. You can then configure your build to only run on slaves with the desired label assigned.

Staged builds with Hudson

Staging with the Hudson continuous integration server isn’t as easy as the master/slave feature, but there are some plugins that allow for more complex setups. You might experience some functionality that’s still under development, but basic staging is possible even today. In combination with specialized slave build boxes, this approach can lower your build duration. It is a a complex endeavour, though.

Conclusion

Once your single build box is maxed out but still not fast enough, you enter a different realm of continuous integration infrastructure setups. Speeding up a build process beyond the single box isn’t as easy as installing more RAM. But with a fair amount of planning, you have a fair chance to improve the situation. Note that you won’t primarily lower build duration, but increase throughput and utilize partitioning and specialization. These are different measures and might not affect the wall clock time of your build. The combination of staging and distribution is the most powerful setup, but will result in the most complex infrastructure to maintain. Before entering this realm, be sure to apply any possible optimization to your build process. Because you’ll not leave that realm again soon.

What’s your story on build optimization beyond the box? Drop us a comment.