Blog harvest, February 2010

Some noteworthy blog articles, harvested for February 2010. If you ever asked yourself about the personality of your web framework, you’ll find the link to the answer here.

Now that the move to the new office is nearly complete, work begins to normalize again. Here is the February blog harvest with a few more entries than usual, as I was still able to read other blogs, just not to write on our own. There are many fun articles this time that I found share-worthy, perhaps because they made me laugh even in harder times.

This was the more serious part of this harvest. Let’s read some articles that share their message in a lighter way:

  • What kind of woman would your web framework be? – If you ever have to sell a hot new (web) framework to management, why not take this plausible approach? At least they could relate to what you are talking about.
  • It’s Not the Recession, You Just Suck – Ouch! That hurt. This is a wake-up call for everybody who likes to blame their failures on higher powers. And it reminds me to hurry up with this blog entry and get back to work.
  • I test therefore I log bugs – Ever tried to explain “programming” to your grandparents? You’ll end up either in esoterics (“teaching machines to have dreams”) or in stating the obvious. This is a story about consensus on the latter.

This blog harvest closes with a video:

  • Uncle Bob on Software Craftsmanship – Much of what Bob Martin says has truth in it, but for me the last two minutes are the most explicit and rewarding. By the way, Uncle Bob looks good in the T-shirt (I always feared it would tear, given the sounds when he stretches), but he needs to switch his cell phone off.

Start with the core

What’s the most important feature you cannot live without? Start with it; you can stop anytime, because after that you have a minimal yet usable system.

If you begin a new product, start from the core functionality, not the core of the system or architecture.
Ask yourself: What’s the most important feature you cannot live without? Name just one, only one. Build this first, and only this. Refine or redo it later if it doesn’t suit your needs. Then continue with the next feature. You always have a minimal but usable product and can stop anytime.
We once had to develop a web-based system where users can file an application which can be reviewed, changed, rated and finally accepted or declined. We started with a web page where you could download a PDF and an email address to send it to when complete. Minimal? Yes. Useless? No. All the functionality that was needed was there; everything else could be done via email or phone. It was readily available and usable.
Start with the core.

Follow-up to our Dev Brunch February 2010

A follow-up to our February 2010 Dev Brunch, summarizing the talks and providing bonus material.

Today, we held our second Dev Brunch for 2010. It was the first one in the new office, with some packing cases still around. The brunch had some interesting topics, most of them small and focused. We discussed whether the topics should be announced beforehand to avoid collisions, but decided to regard these collisions as enrichment rather than duplication.

The Dev Brunch

If you want to know more about the meaning of the term “Dev Brunch” or how we implement it, have a look at the follow-up posting of the brunch in October 2009. This time, we didn’t urge all participants to bring their own topic. Presence is more important than topic.

  • Scrum adventure book review – There are lots of books on the Scrum project management process. But the one called “Geschichten vom Scrum” (sorry, it’s a German book!) will teach you all the basics and some advanced practical topics of Scrum while telling you the fairy tale of a kingdom haunted by dragons. By following a group of common fairy tale characters on their quest to build a dragon trap the Scrum way, you’ll learn a great share of real-world Scrum and still be entertained. You might compare this book to Tom DeMarco’s “The Deadline”, a novel about general project management.
  • What is the Google Web Toolkit? – Based on a presentation given at the Karlsruhe Java User Group (JUG-KA), we skimmed through the slides to get to know the Google Web Toolkit (GWT) framework. Advanced topics were discussed in the next talk.
  • First-hand experience with GWT – We talked about the sweet spots and pain points of the Google Web Toolkit, based on experiences from a real project. This was very helpful to sort out the marketing promises from the definite advantages. While browser differences don’t affect the developer anymore, the separation of client (browser) and server will still leak through.
  • First impressions of the Lift framework – The way to go with web application development in Scala is Lift. It’s a framework borrowing the best from “Seaside, Rails, Django and Wicket” and combining it with Scala and the whole Java ecosystem. While this talk was just a teaser, it already looked promising.

As usual, the topics ranged from first-hand experiences to literature research or summaries of recently attended presentations. You can check out the comments for additional resources, though they may be in German.

Retrospection of the brunch

It’s right to welcome participants without a topic of their own. This lowers the barrier for occasional guests, who are valuable for their experiences and insights. This brunch was enriched by yet another topic collision, which is the perfect situation for a more in-depth discussion.

Poor man’s TimeMachine

Some weeks ago I wrote about an easy and cheap backup solution for Windows users. But what about Mac and Linux users? The Mac guys have a similar solution right at hand: TimeMachine. It is quite easy to back up the most important stuff regularly onto an external drive while working. The configuration and hardware investment is minimal.

Now what if I happen to use Linux as my operating system? I looked for solutions similar to the Seagate Replica or TimeMachine, expecting less comfort. My first try was rsnapshot, because a friend of mine recommended it. While it works nicely and has quite some features, it requires manual editing of text configuration files. Nothing that a casual user would like, and even I was not quite satisfied. A little more research on the web brought me to Back In Time.

Back In Time was exactly what I wanted: a simple install from the Ubuntu package repository, a GNOME GUI (a KDE version is available, too) to configure and maintain everything, and unobtrusive background operation. You can even configure it to run with root privileges to back up files the logged-in user cannot access. So you can keep system configuration files etc. backed up, too.

One hint for Ubuntu users: you may need to install the “menu” package to be able to use the root version.

Conclusion

With these backup solutions available for all major operating systems, one can achieve basic data security at virtually no cost. There is no compelling reason to risk many hours of work on a drive failure or an accidental delete without undelete possibilities (think rm -rf *). Of course one can improve that backup strategy further, but for me this is a baseline nobody should miss.

Verbosity is not Java’s fault

One of Java’s most-cited flaws (verbosity) isn’t really tied to the language; it is rooted in a deeper source: the way we use it.

Quiz: What’s one of the most-cited flaws of Java compared to other languages?

Bad performance? That’s a long-debunked myth. Slow startup? OK, this can be improved… It’s verbosity, right? Right, but wrong. Yes, it is one of the most mentioned flaws, but is it really inherent in the Java language? Do you really think closures, annotations or any other newly introduced language feature will significantly reduce the clutter? Don’t get me wrong here: closures are a mighty construct and I like them a lot. But the source of the problem lies elsewhere: the APIs. What?! You will tell me that Java has some great libraries, and that these are the ones that let Java stand out! I’m not talking about the functionality of the libraries here, I mean the design of their APIs. Let me elaborate on this.

Example 1: HTML parsing/manipulation

Say you want to parse an HTML page, remove all infoboxes and add your link to a blog box:

        DOMFragmentParser parser = new DOMFragmentParser();
        parser.setFeature("http://xml.org/sax/features/namespaces", false); 
        parser.setFeature("http://cyberneko.org/html/features/balance-tags", false);
        parser.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment", true);
        parser.setFeature("http://cyberneko.org/html/features/scanner/ignore-specified-charset", true);
        parser.setFeature("http://cyberneko.org/html/features/balance-tags/ignore-outside-content", true);
        HTMLDocument document = new HTMLDocumentImpl();
        DocumentFragment fragment = document.createDocumentFragment();
        parser.parse(new InputSource(new StringReader(html)), fragment);
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        Node infobox = (Node) xpath.evaluate("//*/div[@class='infobox']", fragment, XPathConstants.NODE);
        infobox.getParentNode().removeChild(infobox);
        Node blog = (Node) xpath.evaluate("//*[@id='blog']", fragment, XPathConstants.NODE);
        while (blog.hasChildNodes()) {
            blog.removeChild(blog.getFirstChild());
        }
        blog.appendChild(/* create element tree */);

What you really want to say is:

HTMLDocument document = new HTMLDocument(url);
document.at("//*/div[@class='infobox']").remove();
document.at("//*[@id='blog']").setInnerHtml("<a href='blog_url'>Blog</a>");

Much more concise, easy to read, and it communicates its purpose clearly. The functionality is the same, but the code you have to write is vastly different.

  The library behind the API should do the heavy lifting, not the API's user.
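
Just as a thought experiment, such a wrapper is not hard to build, because the heavy lifting only has to be written once. Here is a minimal sketch (all class and method names are invented, this is not an existing library) that delegates to the same NekoHTML and XPath machinery shown above:

import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.apache.html.dom.HTMLDocumentImpl;
import org.cyberneko.html.parsers.DOMFragmentParser;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical convenience wrapper: the parser setup happens once, in here.
public class SimpleHtmlDocument {

    private final DocumentFragment fragment;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    public SimpleHtmlDocument(String html) throws Exception {
        DOMFragmentParser parser = new DOMFragmentParser();
        parser.setFeature("http://xml.org/sax/features/namespaces", false);
        this.fragment = new HTMLDocumentImpl().createDocumentFragment();
        parser.parse(new InputSource(new StringReader(html)), fragment);
    }

    // Returns a small handle so calls read fluently: document.at("...").remove();
    public Element at(String expression) throws Exception {
        return new Element((Node) xpath.evaluate(expression, fragment, XPathConstants.NODE));
    }

    public static class Element {
        private final Node node;

        Element(Node node) {
            this.node = node;
        }

        public void remove() {
            node.getParentNode().removeChild(node);
        }
    }
}

A setInnerHtml() method would work along the same lines: parse the snippet into a fragment and swap it in. The point is that the client code shrinks to the three lines above, while all the parser features and XPath plumbing live in one place.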

Example 2: HTTP requests

Take this example of sending a POST request to a URL:

HttpClient client = new HttpClient();
PostMethod post = new PostMethod(url);
for (Map.Entry<String, String> param : params.entrySet()) {
    post.setParameter(param.getKey(), param.getValue());
}
try {
    return client.executeMethod(post);
} finally {
    post.releaseConnection();
}

and compare it with:

HttpClient client = new HttpClient();
client.post(url, params);

Yes, there are cases where you want to specify additional attributes or options, but mostly you just want to send some params to a URL. This is the default functionality you want to use, so why not:

  Make the easy and most used cases easy,
    the difficult ones not impossible to achieve.
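
In this spirit, a minimal sketch of such a convenience layer on top of Commons HttpClient 3.x could look like this (the subclass and its post() method are invented for illustration, they are not part of the library):

import java.util.Map;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.PostMethod;

// Hypothetical convenience subclass: the common case becomes a one-liner,
// while all options of the underlying HttpClient remain available.
public class SimpleHttpClient extends HttpClient {

    public int post(String url, Map<String, String> params) throws Exception {
        PostMethod post = new PostMethod(url);
        for (Map.Entry<String, String> param : params.entrySet()) {
            post.setParameter(param.getKey(), param.getValue());
        }
        try {
            return executeMethod(post); // returns the HTTP status code
        } finally {
            post.releaseConnection();
        }
    }
}

The difficult cases stay possible: if you need custom headers or retry handling, you can still fall back to the full HttpClient API.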

Example 3: Swing’s JTable

So what happens when you designed for one purpose, but people usually use your component for another one?
The following code displays a JTable filled with attachments, showing their name and additional actions:
(Disclaimer: this one makes heavy use of our internal frameworks)

        JTable attachmentTable = new JTable();
        TableColumnBinder<FileAttachment> tableBinding = new TableColumnBinder<FileAttachment>();
        tableBinding.addColumnBinding(new StringColumnBinding<FileAttachment>("Attachments") {
            @Override
            public String getValueFor(FileAttachment element, int row) {
                return element.getName();
            }
        });
        tableBinding.addColumnBinding(new ActionColumnBinding<FileAttachment>("Actions") {
            @Override
            public IAction<?, ?>[] getValueFor(FileAttachment element, int row) {
                return getActionsFor(element);
            }
        });
        tableBinding.bindTo(attachmentTable, this.attachments);

Now imagine you had to implement this using bare Swing. You need to create a TableModel, which is unfortunately based on row and column indexes instead of elements; you need to write your own renderers and editors, not to mention the different listeners, which need to map the passed indexes to the corresponding element.
JTable was designed as a spreadsheet-like grid, but most of the time people use it as a list of items. This change in use case calls for a change in the API: indexes are no longer a good way to reference a cell; you want a list of elements and a column property. When the usage pattern changes, you can write a new library or component, or you can:

  Evolve your API.
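
As a hypothetical illustration of such an evolution (the class is invented for this post, it is neither part of Swing nor our internal framework), an element-based adapter over Swing’s index-based TableModel could look like this:

import java.util.List;

import javax.swing.table.AbstractTableModel;

// Hypothetical element-based table model: rows are elements of a list,
// so subclasses never have to translate raw row indexes themselves.
public abstract class ListTableModel<T> extends AbstractTableModel {

    private final List<T> elements;
    private final String[] columnNames;

    protected ListTableModel(List<T> elements, String... columnNames) {
        this.elements = elements;
        this.columnNames = columnNames;
    }

    // Subclasses answer in terms of the element, not the (row, column) pair.
    protected abstract Object getValueFor(T element, int column);

    public int getRowCount() {
        return elements.size();
    }

    public int getColumnCount() {
        return columnNames.length;
    }

    @Override
    public String getColumnName(int column) {
        return columnNames[column];
    }

    public Object getValueAt(int row, int column) {
        return getValueFor(elements.get(row), column);
    }
}

Renderers, editors and listeners can be layered on the same idea, which is roughly the direction our internal TableColumnBinder takes.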

Designed to be used

So why is one API design better than another? The better ones are designed to be used. They have a clearly defined purpose: to get the task done in a simple way. Just that. They don’t want to satisfy a standard or a specification. They don’t need to open up a huge new world of configuration options or preference tweaks.

Call to action

So I argue that to make Java (or your language of choice) a better language and environment, we have to design better APIs. Well-designed APIs help an environment more than just another new language feature. Don’t jump on the next super-duper language bandwagon because it has X or Y or any other cool language feature. Improve your (API) design skills! It will help you in every language and environment you use now and will use in the future. Learning new languages is good for gaining new viewpoints, but don’t just flee to them.

Speed up your buildbox, Part IV: Beyond the box

This is the fourth and last part of a series on how to boost your build box without much effort. This episode talks about possible measures to increase the build performance when a single box isn’t enough.

In the first three parts of our effort to speed up our buildbox, we replaced the hard disk with a RAM disk, upgraded the CPU to the top-notch model and installed plenty of fast RAM. This brought the build time down from 03:30 minutes to around 02:00 minutes. The CPU frequency was the biggest time-saving factor in our case study. Two minutes is as fast as the build can get for our project without fiddling with the actual build process. It’s sufficient for our case, but it may not be for yours.

Even top speed is too slow

Let’s assume we maxed out the hardware and still have a build duration far beyond the magical ten-minute mark. What can we do now? There are two viable options at hand, provided you can exclude the possibility that your build process is really inefficient and needs optimization. In that case, it would be better to revise the process instead of the build infrastructure.

Two ways to speed up your build infrastructure

You can go down one or both of two general paths to speed up your build process. To understand the examples, let’s assume the build takes 20 minutes to run on your top-notch build box.

  • Add more build boxes. This is the classical “parallelize it!” approach. It won’t speed up the individual build, but it enables more builds to run at the same time. This approach won’t change anything if your team checks in seldom, which in itself is an anti-pattern for continuous integration. But if your team commits changes every ten minutes, having at least two build boxes will prevent the second committer from waiting 30 minutes for the CI results. Instead, the results will always be there after 20 minutes. You haven’t exactly sped up your build process, but rather the maximum waiting time of your committers. For details on the implementation, see below at “Growing a build park”.
  • Chop up your build process. This is known as “staging” or “pipelining” your build. This won’t speed up the individual build process either, but it delivers certain partial results of your build earlier. Let’s assume you can split your build process into four distinct stages: compile, unit test, integration test, package. Whenever a stage yields a result, the committer gets feedback immediately. In our example, this might be every 5 minutes. This has several disadvantages, as discussed for example in the article “The pipeline of doom” by Julian Simpson, but it can drastically lower the waiting time for specific aspects of your build. You haven’t exactly sped up your build process, but rather the response time for partial results and therefore the average waiting time of your committers. For details on the implementation, see below at “Installing a build pipeline”.

Growing a build park

If you want to reduce the initial waiting delay before a build gets processed, or increase the throughput of builds, the build farm pattern is the way to go. By adding slave build machines to your build master, you can distribute the workload onto more shoulders. The best way to set up your infrastructure is to introduce a dedicated master box that only delegates actual builds to its slaves. The master box handles the archiving of build artifacts and deals with the web server requests, while the slaves only perform build tasks. The master box can be of average power, with increased storage size, while the slaves should be ultra-fast, without the need for big disks. Solid state disks or even RAM disks on the slaves can be sized to the actual workspaces, as that is all that needs to be stored there.

Distributed builds with Hudson

The Hudson continuous integration server has a particular strength in setting up these master/slave scenarios. It’s ridiculously easy to set up a build slave: you basically only need to click on a link to start the slave process. If you happen to have a standard build, everything needed gets downloaded automatically. If you want your slaves to operate autonomously, you can install a Windows daemon, provide an SSH account or write your own script. Usually, slaves are set up in a matter of minutes without hassle. A great idea is to turn powerful colleague boxes into build slaves (aka CI zombies) by booting from a USB stick. The best way to start with master/slave builds is to turn your current PC into a Hudson slave right now, using the Java Web Start method.

Installing a build pipeline

If you are interested in early but incomplete feedback from your build box, staging your build will help you out. If partitioned right, you’ll receive a series of answers to specific questions from your build process. The questions may be like:

  1. Will it compile?
  2. Will it pass the unit tests?
  3. Will it function (pass the integration tests)?
  4. Will it blend?

OK, the last question is rather unlikely to be answered by your build box. The overall build process will not be any faster, but basic safety test results are reported earlier. If you combine this approach with distributed builds, you can designate specifically tuned machines to different stages. The Hudson continuous integration server has the ability to tag a slave with different labels. You can then configure your build to run only on slaves with the desired label assigned.

Staged builds with Hudson

Staging with the Hudson continuous integration server isn’t as easy as the master/slave feature, but there are some plugins that allow for more complex setups. You might run into some functionality that’s still under development, but basic staging is possible even today. In combination with specialized slave build boxes, this approach can lower your build duration. It is a complex endeavour, though.

Conclusion

Once your single build box is maxed out but still not fast enough, you enter a different realm of continuous integration infrastructure setups. Speeding up a build process beyond the single box isn’t as easy as installing more RAM, but with a fair amount of planning, you have a good chance to improve the situation. Note that you won’t primarily lower build duration, but rather increase throughput and utilize partitioning and specialization. These are different measures and might not affect the wall clock time of your build. The combination of staging and distribution is the most powerful setup, but will result in the most complex infrastructure to maintain. Before entering this realm, be sure to apply every possible optimization to your build process, because you won’t leave that realm again soon.

What’s your story on build optimization beyond the box? Drop us a comment.

Aligning the Abstraction Level with constant booleans

Constant booleans can help to maintain a single level of abstraction in one method. They are less expensive than a separate method and a big improvement over a mere comment.

If you have ever done consulting, mentoring or teaching on programming techniques, I’m sure you have experienced joy as well as disappointment: joy when your “students” took your advice and followed it in their day-to-day work, disappointment when they just did what you said as long as you sat next to them but forgot all about it the next day. The disappointing behavior often comes from them not fully appreciating, or not being able to fully recognize, the advantages of your solution. (And as you are the mentor/consultant/teacher, the latter might also be your fault.)

One example of that is the principle to operate on only one level of abstraction within a method or function. See here for a detailed explanation. I have been applying this technique more or less unconsciously for a long time now and was reminded of it as the Single-Level-of-Abstraction-Principle in Robert C. Martin’s Clean Code.

I have been trying to put this principle into people’s minds for some time now, but often with little success. Sure, they often do see the advantage of arriving at much more readable code, but they tend to ignore the principle in their own code. Most of the time they just don’t see the necessity to create another method with a meaningful name, or they content themselves with putting a comment above some chunk of lower-abstraction code. (The resulting loud screams for an Extract-Method refactoring often remain unheard, too.)

Lately, I have had fairly good success with one little sub-technique of this principle: constant booleans for if-statements. That is, instead of (C++ code):

void someMethod()
{
   if (hard_to_read_boolean_expression_using_lower_abstractions)
   {
      // do stuff
   }
}

you write:

void someMethod()
{
   const bool expressive_name = 
      hard_to_read_boolean_expression_using_lower_abstractions;
   if (expressive_name)
   {
      // do stuff
   }
}

I guess the main reason for the success of this sub-technique is that it increases readability a lot, at a cost that is only a tiny bit greater than that of a simple comment.

True, in many cases it may be even more readable to put the whole if-statement in another method, but using a boolean constant like above is already a big improvement.

Follow-up to our Dev Brunch January 2010

A follow-up to our January 2010 Dev Brunch, summarizing the talks and providing bonus material.

Today was our first Dev Brunch for the new year 2010. We held a well-attended and very interesting session with lots of coffee. It was the last brunch in the old office, as we are currently moving to new rooms. The brunch ended with a sneak peek into the new office.

The Dev Brunch

If you want to know more about the meaning of the term “Dev Brunch” or how we realize it, have a look at the follow-up posting of the brunch in October 2009. We used notebooks throughout the sessions today.

The topics of this session were:

  • Agile done wrong – A project that was converted to be agile now tends to be even more conservative, after management lost faith in its developers. A rather sad first-hand story, with lots of Dilbert-style humor in it.
  • Implicits in Scala – Scala introduces a powerful feature of implicit (hence the name) type conversion that can be used to greatly simplify work with complex type systems. Or to totally disturb your understanding of it.
  • Follow-up on the local XP-Days – The XP Days Germany, organized by andrena objects ag, is a small yet powerful conference in Karlsruhe. We got a summary of the overall style and different presentations. Things like Pokens, Pecha Kucha (watch your pronunciation of it) and live code katas are all very promising stuff. Most of the presentation content itself was interesting, too.
  • Exception safety in Java – A classical topic of (not only) C++, ported to Java. This overview presentation highlighted the basics of exception safety and some insights for Java, mostly borrowed from Alan Griffiths.
  • Preview of an Eclipse-based product – We won’t go into much detail here, but we got a glance at an upcoming product that will greatly ease multi-site programming with Eclipse. The EclipseCon 2010 in March might be promising.

The topics ranged from first-hand experiences to literature research. We look forward to providing additional information, linked in the comments on this article, partially in German.

Retrospection of the brunch

It was very entertaining to meet everyone after the long holiday season. Lots of news and chatter and stuff. The topics were interesting and thought provoking. If you weren’t there, you’ve missed something. Check out the comments for compensation.

Blog harvest, January 2010

Some noteworthy blog articles, harvested for January 2010. Don’t miss the worst gadget gallery!

Welcome to the new year. We started back to work late this year and will collect some overtime in the next weeks, as we are in the process of moving to a new office. This might impact additional blog entries, so here is a quick blog harvest. Don’t miss the anti-gadget gallery at the end – but beware, it will make you laugh out loud, so don’t read it at work (unless you are supposed to have fun at work).

  • Reasons to Use Google Collections – Recently, the Google Collections library went gold. It’s a really nice collection of… collections. If you haven’t met it yet, do it now. You might read James Sugrue’s teaser alongside.
  • How Programming Books Promote Code Smells – Finally, somebody shares my opinion on “code examples crippled for brevity”. I got disgusted by some Java books because their code examples were so bad it hurt. The situation eases a lot with the availability of “real” code in open source projects that you can read as recompense.
  • Test-Driven Teaching! – Another problem with bad code examples is to avoid them in live examples (like in lectures). Here is an interesting article by Peter Karich presenting the idea of integrating test code in your lectures. I’ll see if my students like the idea.
  • A performance tuning story – The best stories are real stories. This story has all that’s needed: a mysterious effect at the beginning, fast-paced action in the middle and an open end. The X files weren’t as thrilling as this one.
  • Maven and Ant guys: you’ll never agree. On anything. Period. Deal with it! – What would we do if we couldn’t join an eternal flame war at surf time? You can join on one side (there are always two sides, seldom more!) or stand in between and pick on both. Lieven Doclo chose the Maven side of builds.
  • Maven Mythbusters – The most prominent Maven myths get busted by John Ferguson Smart, to end the flame war mentioned above once and for all (or at least take out some misinformation). I’ve linked to the first busted myth; you’ll have no trouble finding the other(s). Read the comments, too!
  • The Top 10 Posts of this Blog Over the Years – Stephan Schmidt’s blog “Code Monkeyism” is a regular link target in my harvests. When he harvested his own blog for the most popular entries, I couldn’t resist linking him once more. Maybe this is called meta-harvesting.

This was the article side of this harvest. Let’s continue to have fun by sliding through the gallery and then… sliding through your repository:

  • Gizmodo’s Worst Gadget Gallery – Oh yes, ten years full of technological crap, complete with pictures and a short description. If you spend 30 seconds on each gadget, you’re in for half an hour of pure fun. My favorites are the DeathStar for personal reference (we lost 7 of 9) and the Eye-Track for its funny description.
  • Mining your source code repository – Part 1 – Yes, all your legacy repositories are still there. Discover the archeologist in yourself and perform some data mining on them. This article may be the starting point for your new career.

FindBugs-driven bughunting in legacy projects

I have been working on a >100k-lines legacy project for a while now. We have to juggle customer requests, bug fixes and refactorings, so it is hard to improve the quality and employ new techniques or tools while keeping the software running and the clients happy. Initially there were no unit tests and most of the code had a gigantic cyclomatic complexity. Over the course of time we managed to put the system under continuous integration, wrote quite a few unit tests and analyzed code “hotspots” and our progress with crap4j.

Normally we get bug reports from our user base or have to test manually to find bugs. A few weeks ago I tried a new approach to bug hunting in legacy projects, using FindBugs. Many of you surely know this useful tool, so I just want to describe my experiences using it in that project. Many of the bugs it finds lurk in parts of the application which are seldom used, or only appear in hard-to-reproduce circumstances. First, a short list of what I encountered and how I dealt with it.

Interesting found bugs in the project

  • There was a calculation using an integer division but returning a double, so the actual computation result was wrong. Yet the error would have been hard to catch, because people rarely recheck a computer’s results (see the sketch after this list). When writing the test associated with the bugfix, I found a StackOverflowError, too!
  • There were quite some null dereferences found, often in constructs like

    if (s == null && s.length() == 0)

    instead of

    if (s == null || s.length() == 0)

    which could be simplified or rewritten anyway. Sometimes there were possible null dereferences on some paths, despite several null checks in the code.

  • Many performance bugs which may or may not have an effect on the overall performance of the system, like new String(), new Integer(12), string concatenation in loops, or inefficient usage of java.util.Map.keySet() instead of java.util.Map.entrySet() (also sketched below).
  • Some dead stores to local variables and statements without effect, which could be thrown away or corrected to do the intended thing.
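
To make two of these findings concrete, here is a small hypothetical reconstruction (all names and values are invented, this is not the actual project code) of the kind of code FindBugs flagged, next to the fixed versions:

import java.util.HashMap;
import java.util.Map;

// Hypothetical reconstructions of two flagged patterns.
public class FoundBugExamples {

    // Integer division: both operands are ints, so the division truncates
    // before the widening to double happens. brokenRatio(2, 3) yields 0.0.
    static double brokenRatio(int passed, int total) {
        return passed / total;
    }

    // Fix: widen one operand first, then divide. fixedRatio(2, 3) yields 0.66...
    static double fixedRatio(int passed, int total) {
        return (double) passed / total;
    }

    // Inefficient: every iteration performs an extra hash lookup via get().
    static void printAllSlow(Map<String, Integer> map) {
        for (String key : map.keySet()) {
            System.out.println(key + " = " + map.get(key));
        }
    }

    // Better: iterate over the entries and read key and value directly.
    static void printAllFast(Map<String, Integer> map) {
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }

    public static void main(String[] args) {
        System.out.println(brokenRatio(2, 3)); // prints 0.0
        System.out.println(fixedRatio(2, 3));  // prints 0.6666666666666666
        Map<String, Integer> attempts = new HashMap<String, Integer>();
        attempts.put("builds", 42);
        printAllSlow(attempts);
        printAllFast(attempts);
    }
}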

Things you may want to ignore

There are of course some bugs that you may ignore for now, because you know they stem from a common pattern in the team, and abuse, and thus errors, are extremely unlikely. I, for example, opted to ignore some dozens of “may expose internal representation” found bugs regarding arrays in interfaces or accessible via getters, because it is a common pattern in the team not to tamper with existing arrays, as they are regarded as immutable by the team members. It would have taken too much time to fix all those without much of a benefit.

You may opt to ignore the performance bugs, too, but they are usually easy to fix.

Tips

  • If you have many found bugs, fix the easy ones first to be able to see the important ones more easily.
  • Ignore certain bug categories for now and fix them later, when you stumble upon them.
  • Concentrate on the ones that lead to wrong behaviour and crashes of your application.
  • Try to reproduce the problem with a unit test and then fix the code whenever feasible! Tests are great to expose the bug and fix it without unwanted regressions!
  • Many bugs appear in places which need refactoring anyway so here is your chance to catch several flies at once.

Conclusion

With FindBugs you can find common programming errors sprinkled across the whole application, in places where you probably would not have looked for years. It can help you to understand some common patterns of your team members and help you all to improve your code quality. Sometimes it even finds hard-to-spot errors like the integer computation or the null dereferences on certain paths. This is even more true in entangled legacy projects without proper test coverage.