Follow-up to our Dev Brunch March 2010

A follow-up to our March 2010 Dev Brunch, summarizing the talks and providing bonus material.

Yesterday we held our Dev Brunch for this month. It was the second brunch in our new office, with some attendees visiting it for the first time. The reactions were the same: “I want to move in here!”. The topics were of different kinds, from live presentations to mere questions open for discussion.

The Dev Brunch

If you want to know more about the meaning of the term “Dev Brunch” or how we implement it, have a look at the follow-up posting of the brunch in October 2009. We continued to allow presence over topics. These topics were discussed this time:

  • Singleton vs. Monostate – We all know that Singletons are bad for your test coverage, look poor on your dependency chart and are generally seen as “evil”. We discussed the Monostate pattern and whether it could solve some of the problems Singletons inherently bring along. Based on the article by Uncle Bob, we concluded that Monostates are tricky at best and don’t help with the abovementioned problems.
  • What is “agile” for you? – This simple question provoked a lot of thoughts. You can always obey the Agile Manifesto word by word without understanding what the deeper motives are. The answer that fitted best was: “You can name it when you see it”. We concluded that it’s easy and common practice to label any given process “agile” just to sound modern.
  • News around Yoxos – If you are using Eclipse, you’ve certainly heard about Yoxos already. During EclipseCon 2010, good news was announced. We got a sneak peek at the new Yoxos Launcher and how it will help in managing your pack of Eclipse installations. We are looking forward to becoming beta testers because we can’t wait to use it.
  • Teaser talk for “Actors in Scala” – The actor paradigm for parallel programming is a promising alternative to threads. While threads are inevitably complex even for simple tasks, actors seem to offer a more natural approach to parallelism. This talk was only the teaser for a more in-depth talk next time, with hands-on code examples.
  • Properties in Scala – This talk had lots of code examples and hands-on discussion about the Properties feature of Scala. Properties are an elegant way to reduce your boilerplate code for simple objects and to sustain compatibility with Java frameworks that rely on the Java Beans semantics. We clearly understood the advantages, but ran into some strangeness related to the conjoint namespaces of fields and methods along the way. Scala isn’t Java, that’s for sure.
  • Introduction to Prezi – Prezi is a modern presentation tool in the tradition of the dreaded PowerPoint or Apple’s Keynote. It adds a twist to your presentation with two new concepts: laying out everything on a big single canvas (no slides!) and relying heavily on zooming effects. The online editor is surprisingly usable, yet simple and lightweight. If you want to meet Prezi, check out the introduction prezis and the showcase on their homepage.

As usual, the topics ranged from first-hand experiences to literature research. For additional information, check out the comment sections. Comments and resources might be in German.

Retrospection of the brunch

We keep getting better at timing our talks. We nearly kept within our time limit and didn’t have to hurry anything. For the next brunch, we are looking forward to using our new office roof garden to brunch and talk in the springtime sun.

Verbosity is not Java’s fault

One of Java’s most frequently cited flaws, verbosity, isn’t really tied to the language itself. It is rooted in a deeper source: the way we use it.

Quiz: What’s one of the most frequently mentioned flaws of Java compared to other languages?

Bad performance? That’s a long-debunked myth. Slow startup? OK, this can be improved… It’s verbosity, right? Right, but wrong. Yes, it is one of the most mentioned flaws, but is it really inherent to the Java language? Do you really think closures, annotations or any other newly introduced language feature will significantly reduce the clutter? Don’t get me wrong here: closures are a mighty construct and I like them a lot. But the source of the problem lies elsewhere: the APIs. What?! You will tell me Java has some great libraries. These are the ones that let Java stand out! I’m not talking about the functionality of the libraries here, I mean the design of their APIs. Let me elaborate on this.

Example 1: HTML parsing/manipulation

Say you want to parse an HTML page, remove all infoboxes and add your link to a blog box:

        DOMFragmentParser parser = new DOMFragmentParser();
        parser.setFeature("http://xml.org/sax/features/namespaces", false); 
        parser.setFeature("http://cyberneko.org/html/features/balance-tags", false);
        parser.setFeature("http://cyberneko.org/html/features/balance-tags/document-fragment", true);
        parser.setFeature("http://cyberneko.org/html/features/scanner/ignore-specified-charset", true);
        parser.setFeature("http://cyberneko.org/html/features/balance-tags/ignore-outside-content", true);
        HTMLDocument document = new HTMLDocumentImpl();
        DocumentFragment fragment = document.createDocumentFragment();
        parser.parse(new InputSource(new StringReader(html)), fragment);
        XPathFactory factory = XPathFactory.newInstance();
        XPath xpath = factory.newXPath();
        Node infobox = (Node) xpath.evaluate("//*/div[@class='infobox']", fragment, XPathConstants.NODE);
        infobox.getParentNode().removeChild(infobox);
        Node blog = (Node) xpath.evaluate("//*[@id='blog']", fragment, XPathConstants.NODE);
        while (blog.hasChildNodes()) {
            blog.removeChild(blog.getFirstChild());
        }
        blog.appendChild(/* create element tree */);

What you really want to say is:

HTMLDocument document = new HTMLDocument(url);
document.at("//*/div[@class='infobox']").remove();
document.at("//*[@id='blog']").setInnerHtml("<a href='blog_url'>Blog</a>");

Much more concise, easy to read, and it communicates its purpose clearly. The functionality is the same, but the effort required from the API user is vastly different.

  The library behind the API should do the heavy lifting, not the API's user.
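
For illustration, here is a minimal sketch of how such a facade could look, built with the standard javax.xml.xpath API on top of an already parsed DOM. The names SimpleHtmlDocument and ElementHandle are made up for this sketch, setInnerHtml is omitted for brevity, and a real library would of course also hide the NekoHTML parser setup shown above:

import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Node;

// Hypothetical facade: hides the XPath plumbing behind two small calls.
public class SimpleHtmlDocument {
    private final Node root;
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    public SimpleHtmlDocument(Node parsedFragment) {
        this.root = parsedFragment; // parsing itself would be hidden here as well
    }

    public ElementHandle at(String expression) throws XPathExpressionException {
        return new ElementHandle((Node) xpath.evaluate(expression, root, XPathConstants.NODE));
    }

    public static class ElementHandle {
        private final Node node;

        ElementHandle(Node node) {
            this.node = node;
        }

        public void remove() {
            node.getParentNode().removeChild(node);
        }
    }
}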

Example 2: HTTP requests

Take this example of sending a POST request to a URL:

HttpClient client = new HttpClient();
PostMethod post = new PostMethod(url);
for (Entry<String, String> param : params.entrySet()) {
    post.setParameter(param.getKey(), param.getValue());
}
try {
    return client.executeMethod(post);
} finally {
    post.releaseConnection();
}

and compare it with:

HttpClient client = new HttpClient();
client.post(url, params);

Yes, there are cases where you want to specify additional attributes or options, but mostly you just want to send some params to a URL. This is the default case, so why not:

  Make the easy and most used cases easy,
    the difficult ones not impossible to achieve.
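
Such a convenience method is easy to provide. A minimal sketch, assuming Commons HttpClient 3.x (the wrapper class SimpleHttpClient and its post method are made up for this example; a real API would expose more of the response than the status code):

import java.io.IOException;
import java.util.Map;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.PostMethod;

// Hypothetical wrapper: the library does the boilerplate, the common case stays a one-liner.
public class SimpleHttpClient {
    private final HttpClient client = new HttpClient();

    public int post(String url, Map<String, String> params) throws IOException {
        PostMethod post = new PostMethod(url);
        for (Map.Entry<String, String> param : params.entrySet()) {
            post.setParameter(param.getKey(), param.getValue());
        }
        try {
            return client.executeMethod(post); // returns the HTTP status code
        } finally {
            post.releaseConnection();
        }
    }
}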

Example 3: Swing’s JTable

So what happens when something was designed for one purpose but people usually use it for another?
The following code displays a JTable filled with attachments showing their name and additional actions:
(Disclaimer: this one makes heavy use of our internal frameworks)

        JTable attachmentTable = new JTable();
        TableColumnBinder<FileAttachment> tableBinding = new TableColumnBinder<FileAttachment>();
        tableBinding.addColumnBinding(new StringColumnBinding<FileAttachment>("Attachments") {
            @Override
            public String getValueFor(FileAttachment element, int row) {
                return element.getName();
            }
        });
        tableBinding.addColumnBinding(new ActionColumnBinding<FileAttachment>("Actions") {
            @Override
            public IAction<?, ?>[] getValueFor(FileAttachment element, int row) {
                return getActionsFor(element);
            }
        });
        tableBinding.bindTo(attachmentTable, this.attachments);

Now think about how you would implement this using bare Swing. You need to create a TableModel, which is unfortunately based on row and column indexes instead of elements, you need to write your own renderers and editors, not to mention the different listeners which need to map the passed indexes back to the corresponding element.
JTable was designed as a spreadsheet-like grid, but most of the time people use it as a list of items. This change in use case calls for a change in the API: indexes are no longer a good way to reference a cell; you want a list of elements and a column property. When the usage pattern changes, you can write a new library or component, or you can:

  Evolve your API.
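
One possible direction for such an evolution (a sketch of the idea, not our internal framework and not part of Swing): an element-based adapter on top of the index-based TableModel, so that client code thinks in elements and named columns while the index juggling happens in one place.

import java.util.List;
import javax.swing.table.AbstractTableModel;

// Hypothetical element-based model: rows are elements, columns are named properties.
public abstract class ElementTableModel<E> extends AbstractTableModel {
    private final List<E> elements;
    private final String[] columnNames;

    protected ElementTableModel(List<E> elements, String... columnNames) {
        this.elements = elements;
        this.columnNames = columnNames;
    }

    // Subclasses answer in terms of an element, never a row index.
    protected abstract Object getValueFor(E element, int column);

    public E getElementAt(int row) {
        return elements.get(row);
    }

    @Override
    public int getRowCount() {
        return elements.size();
    }

    @Override
    public int getColumnCount() {
        return columnNames.length;
    }

    @Override
    public String getColumnName(int column) {
        return columnNames[column];
    }

    @Override
    public Object getValueAt(int rowIndex, int columnIndex) {
        return getValueFor(elements.get(rowIndex), columnIndex);
    }
}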

Designed to be used

So why is one API design better than another? The better ones are designed to be used. They have a clearly defined purpose: to get the task done in a simple way. Just that. They don’t want to satisfy a standard or a specification. They don’t need to open up a huge new world of configuration options or preference tweaks.

Call to action

So I argue that to make Java (or your language of choice) a better language and environment, we have to design better APIs. Better designed APIs help an environment more than just another new language feature. Don’t jump on the next super duper language bandwagon because it has X or Y or any other cool language feature. Improve your (API) design skills! It will help you in every language and environment you use now and will use in the future. Learning new languages is good for gaining new viewpoints, but don’t just flee to them.

FindBugs-driven bughunting in legacy projects

I have been working on a >100k lines legacy project for a while now. We have to juggle customer requests, bug fixes and refactoring, so it is hard to improve the quality and adopt new techniques or tools while keeping the software running and the clients happy. Initially there were no unit tests and most of the code had a gigantic cyclomatic complexity. Over the course of time we managed to put the system under continuous integration, added quite a few unit tests and analyzed code “hotspots” and our progress with crap4j.

Normally we get bug reports from our userbase or have to test manually to find bugs. A few weeks ago I tried a new approach to bughunting in legacy projects using FindBugs. Many of you surely know this useful tool, so I just want to describe my experiences using it in that project. Many of the bugs it finds lurk in parts of the application which are seldom used or only appear in hard-to-reproduce circumstances. First, a short list of what I encountered and how I dealt with it.

Interesting bugs found in the project

  • There was a calculation using integer division but returning a double (see the snippet after this list). So the actual computation result was wrong, and yet the error would have been hard to catch, because people rarely recheck a computer’s results. When writing the test associated with the bugfix, I found a StackOverflowError too!
  • There were quite a few null dereferences found, often in constructs like

    if (s == null && s.length() == 0)

    instead of

    if (s == null || s.length() == 0)

    which could be simplified or rewritten anyway. Sometimes there were possible null dereferences on some paths despite several null checks in the code.

  • Many performance bugs which may or may not have an effect on the overall performance of the system, like new String(), new Integer(12), string concatenation inside loops, inefficient usage of java.util.Map.keySet() where java.util.Map.entrySet() would do, etc.
  • Some dead stores to local variables and statements without effect, which could be thrown away or corrected to do the intended thing.
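
The integer division bug from the first bullet is worth a minimal illustration (the variable names are made up): the division is performed on ints and truncates before the result is widened to double.

public class RatioExample {
    public static void main(String[] args) {
        int passed = 7;
        int total = 10;

        double wrong = passed / total;            // integer division truncates to 0 before the assignment
        double right = passed / (double) total;   // the cast forces floating-point division

        System.out.println(wrong); // prints 0.0
        System.out.println(right); // prints 0.7
    }
}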

Things you may want to ignore

There are of course some bugs that you may ignore for now because you know they stem from a common pattern in the team, so abuse and thus errors are extremely unlikely. I, for example, opted to ignore some dozens of “may expose internal representation” findings regarding arrays in interfaces or accessible via getters, because it is a common pattern on the team not to tamper with existing arrays as they are treated as immutable by the team members. It would have taken too much time to fix all of those without much of a benefit.

You may opt to ignore the performance bugs too but they are usually easy to fix.

Tips

  • If you have many findings, fix the easy ones first to be able to see the important ones more easily.
  • Ignore certain bug categories for now, fix them later, when you stumble upon them.
  • Concentrate on the ones that lead to wrong behaviour and crashes of your application.
  • Try to reproduce the problem with a unit test and then fix the code whenever feasible! Tests are great for exposing the bug and fixing it without unwanted regressions!
  • Many bugs appear in places which need refactoring anyway so here is your chance to catch several flies at once.

Conclusion

With FindBugs you can find common programming errors sprinkled across the whole application in places where you probably would not have looked for years. It can help you to understand some common patterns of your team members and help you all to improve your code quality. Sometimes it even finds some hard to spot errors like the integer computation or null dereferences on certain paths. This is even more true in entangled legacy projects without proper test coverage.

A more elegant way to equals in Java

Implementing equals and hashCode in Java is a basic part of your toolbox. Here I describe a cleaner and less error-prone way to do it in your code.

— Disclaimer: I know this is pretty basic stuff, but many, many programmers are still doing it wrong —
As a Java programmer you know how to implement equals and that hashCode has to be implemented as well. You use your favorite IDE to generate the necessary code, use common wisdom to code it by hand, or use annotations. But there is a fourth way: introducing EqualsBuilder (not the Apache Commons one, which has some drawbacks compared to this one), which implements the general rules for equals and hashCode:

import java.lang.reflect.Array;

public class EqualsBuilder {

  public static interface IComparable {
      public Object[] getValuesToCompare();
  }

  private EqualsBuilder() {
    super();
  }

  public static int getHashCode(IComparable one) {
    if (null == one) {
      return 0;
    }
    final int prime = 31;
    int result = 1;
    for (Object o : one.getValuesToCompare()) {
      result = prime * result
                + EqualsBuilder.calculateHashCode(o);
    }
    return result;
  }

  private static int calculateHashCode(Object o) {
    if (null == o) {
      return 0;
    }
    return o.hashCode();
  }

  public static boolean isEqual(IComparable one,
                                              Object two) {
    if (null == one || null == two) {
      return false;
    }
    if (one.getClass() != two.getClass()) {
      return false;
    }
    return compareTwoArrays(one.getValuesToCompare(),
              ((IComparable) two).getValuesToCompare());
  }

  private static boolean compareTwoArrays(Object arrayOne, Object arrayTwo) {
      if (Array.getLength(arrayOne) != Array.getLength(arrayTwo)) {
        return false;
      }
      for (int i = 0; i < Array.getLength(arrayOne); i++) {
        if (!EqualsBuilder.areEqual(Array.get(arrayOne, i), Array.get(arrayTwo, i))) {
          return false;
        }
      }
      return true;
  }

  private static boolean areEqual(Object objectOne, Object objectTwo) {
    if (null == objectOne) {
      return null == objectTwo;
    }
    if (null == objectTwo) {
      return false;
    }
    if (objectOne.getClass().isArray() && objectTwo.getClass().isArray()) {
        return compareTwoArrays(objectOne, objectTwo);
    }
    return objectOne.equals(objectTwo);
  }

}

The interface IComparable ensures that equals and hashCode are based on the same instance variables.
To use it, your class needs to implement the interface and call the appropriate methods from EqualsBuilder:

public class MyClass implements IComparable {
  private int count;
  private String name;

  public Object[] getValuesToCompare() {
    return new Object[] {Integer.valueOf(count), name};
  }

  @Override
  public int hashCode() {
    return EqualsBuilder.getHashCode(this);
  }

  @Override
  public boolean equals(Object obj) {
    return EqualsBuilder.isEqual(this, obj);
  }
} 

Update: If you want to use isEqual directly, one test should be added at the start:

  if (one == two) {
    return true;
  }

Thanks to Nyarla for this hint.

Update 2: Thanks to a hint by Alex I fixed a bug in areEqual: when an array (especially a primitive one) was passed, equals would return a wrong result.

Update 3: The newly added compareTwoArrays method had a bug: it returned true if arrayTwo was longer than arrayOne but started with the same elements. Thanks to Thierry for pointing that out.

Forced into switch/case – Qt’s Model/View API

During my life as a programmer I have more and more come to dislike switch/case statements. They tend to be hard to grasp and with languages like C/C++ they are often the source of hard-to-find errors. Compilers that have warnings about missing default statements or missing cases for enumerated values can help to mitigate the situation, but still, I try to avoid them whenever I can.

The same holds true for if-elseif cascades or lots of if-elses in one method. They are hard to read, hard to maintain, increase the Crap, etc.

If you share this kind of mindset, I invite you to implement some custom models with Qt4’s Model/View API. The design of the Model/View classes is derived from the well-known MVC pattern which separates data (model), presentation (view) and application logic (controller). In Qt’s case, view and controller are combined, supposedly making it simpler to use.

The basic idea of Qt’s implementation of its Model/View design is that views communicate with models using so-called model indexes. Using a table as an example, a row/column pair of (3,4) would be a model index pointing to the data element in row 3, column 4. When a view is to be displayed, it asks the attached model for all sorts of information about the data.

There are a few model implementations for standard tasks like simple string lists (QStringListModel) or file system manipulation (QDirModel < Qt4.4, QFileSystemModel >= Qt4.4). But usually you have to roll your own. For that, you have to subclass one of the abstract model classes that suits your needs best and implement some crucial methods.

For example, model methods rowCount and columnCount are called by the view to obtain the range of data it has to display. It then uses, among others, the data method to query all the stuff it needs to display the data items. The data method has the following signature:

QVariant data ( const QModelIndex& index, int role ) const

Seems easy to understand: parameter index determines the data item to display, and with QVariant as the return type it is possible to return a wide range of data types. Parameter role is used to query different aspects of the data items. Apart from Qt::DisplayRole, which usually triggers the model to return some text, there are quite a lot of other roles. Let’s look at a few examples:

  • Qt::ToolTipRole can be used to define a tool tip about the data item
  • Qt::FontRole can be used to define specific fonts
  • Qt::BackgroundRole and Qt::ForegroundRole can be used to set corresponding colors

So the views call data repeatedly with all the different roles, and your model implementation is supposed to handle those calls correctly. Say you implement a table model with some rows and columns. The design of the data method forces you into something like this …

QVariant data ( const QModelIndex& index, int role ) const {
   if (!index.isValid()) {
      return QVariant();
   }

   switch (role)
   {
      case Qt::DisplayRole:
         switch (index.column())
         {
            case 0:
               // return display data for column 0
               break;
            case 1:
               // return display data for column 1
               break;
            ...
         }
         break;

      case Qt::ToolTipRole:
         switch (index.column())
         {
            case 0:
               // return tool tip data for column 0
               break;
            case 1:
               // return tool tip data for column 1
               break;
            ...
         }
         break;
      ...
   }
}

… or equivalent if-else structures. What happens here? The design of the data method forces the implementation to “switch” over role and column in one method. But nested switch/case statements? AARGH!! With our mindset outlined in the beginning this is clearly unacceptable.

So what to do? Well, to tell the truth, I’m still working on the best™ solution to that but, anyway, here is a first easy improvement: handler methods. Define handler methods for each role you want to support and store them in a map. Like so:

#include <QAbstractTableModel>
#include <map>

class MyTableModel : public QAbstractTableModel
{
  Q_OBJECT

  typedef QVariant (MyTableModel::*RoleHandler) (const QModelIndex& idx) const;
  typedef std::map<int, RoleHandler> RoleHandlerMap;

  public:
    enum Columns {
      NAME_COLUMN = 0,
      ADDRESS_COLUMN
    };

    MyTableModel() {
      m_roleHandlerMap[Qt::DisplayRole] =
         &MyTableModel::displayRoleHandler;
      m_roleHandlerMap[Qt::ToolTipRole] =
         &MyTableModel::tooltipRoleHandler;
    }

    QVariant displayRoleHandler(const QModelIndex& idx) const {
      switch (idx.column()) {
        case NAME_COLUMN:
          // return name data
          break;

        case ADDRESS_COLUMN:
          // return address data
          break;

        default:
          Q_ASSERT(!"Invalid column");
          break;
      }
      return QVariant();
    }

    QVariant tooltipRoleHandler(const QModelIndex& idx) const {
      ...
    }

    QVariant data(const QModelIndex& idx, int role) const {
      // omitted: check for invalid model index

      if (m_roleHandlerMap.count(role) == 0) {
        return QVariant();
      }

      RoleHandler roleHandler =
        (*m_roleHandlerMap.find(role)).second;
      return (this->*roleHandler)(idx);
    }
  private:
    RoleHandlerMap m_roleHandlerMap;
};

The advantage of this approach is that the supported roles are very well communicated. We still have to switch over the columns, though.

I’m currently working on a better solution which splits the data calls up into more meaningful methods and kind of binds the columns to specific parts of the data items in order to get a more row-centric approach: one row = one element, columns = element attributes. I hope this will get me out of this switch/case/if/else nightmare.

What do you think about it? I mean, is it just me, or is an API that forces you into crappy code just not so well done?

How would you solve this?

Blog harvest: Metaprogramming in Ruby, Hudson builds iPhone apps, Git workflow, Podcasting Equipment and Marketing

Four blog posts:

  • Python decorators in Ruby – You can do amazing things in a language like Ruby or Lisp with a decent metaprogramming facility: here, a feature for annotating methods that needed a syntax change in Python is built inside Ruby without any change to the language spec.
  • How to automate your IPhone app builds with Hudson – Another domain in which the popular CI server Hudson helps: building your iPhone apps.
  • A Git workflow for agile teams – As distributed version control systems get more and more attention and are used by more and more teams, you have to think about how you use them.
  • Podcasting Equipment Guide – A bit off-topic but interesting nonetheless: if you want to do your own podcasts, which equipment is right for you?


A more elegant way to HTTP Requests in Java

The support for sending and processing HTTP requests was always very basic in the JDK. There are many, many frameworks out there for sending requests and handling or parsing the response. But IMHO two stand out: HTTPClient for sending and HTMLUnit for handling. And since HTMLUnit uses HTTPClient under the hood the two are a perfect match.

This is an example HTTP Post:

HttpClient client = new HttpClient();
PostMethod post = new PostMethod(url);
for (Entry<String, String> param : params.entrySet()) {
    post.setParameter(param.getKey(), param.getValue());
}
try {
    return client.executeMethod(post);
} finally {
    post.releaseConnection();
}

and HTTP Get:

WebClient webClient = new WebClient();
return (HtmlPage) webClient.getPage(url);

Accessing the returned HTML via XPath is also very straightforward:

List<HtmlElement> roomDivs = (List<HtmlElement>) page.getByXPath("//div[contains(@class, 'room')]");
for (HtmlElement div : roomDivs) {
  rooms.add(
    new Room(this,
      ((HtmlElement) div.getByXPath(".//h2/a").get(0)).getTextContent(),
      div.getId())
  );
}

One last issue remains: HTTPClient caches its cookies, but HTMLUnit creates an HTTPClient of its own. If you override HttpWebConnection and give it your HTTPClient, everything works smoothly:

public class HttpClientBackedWebConnection extends HttpWebConnection {
  private HttpClient client;

  public HttpClientBackedWebConnection(WebClient webClient,
      HttpClient client) {
    super(webClient);
    this.client = client;
  }

  @Override
  protected HttpClient getHttpClient() {
    return client;
  }
}

Just set your custom webconnection on your webclient:

webClient.setWebConnection(
  new HttpClientBackedWebConnection(webClient, client)
);

About breaking class contracts – fear clone()

Recently I had some discussions with fellow developers about copying objects in Java. They were overriding clone(), which I never felt necessary. Shortly after, I stumbled over a Checkstyle warning in our own code regarding clone(), where overriding it is strongly discouraged. Triggered by these two events, I decided to dig a bit deeper into the issue.

The bottom line is that Object.clone() has a defined contract which is very easy to break. This has to do with its interaction with the Cloneable interface, which does not define a clone() method, and with the nature of Object’s clone implementation, which is native. Joshua Bloch names some problems and pitfalls with overriding clone in his excellent book Effective Java (Item 11):

  • “If you override the clone method in a nonfinal class, you should return an object obtained by invoking super.clone()”. A problem here is that this is never enforced.
  • “In practice, a class that implements Cloneable is expected to provide a properly functioning public clone method”. Again this is enforced nowhere.
  • “In effect, the clone method functions as another constructor; you must ensure that it does no harm to the original object and that it properly establishes invariants on the clone.”. This means paying extreme attention to the issue of shallow and deep copies. Also be sure not to forget possible side effects your constructors may have like registering the object as a listener.
  • “The clone architecture is incompatible with normal use of final fields referring to mutable objects”. You are sacrificing freedom in your class design because of a flaw in the clone() concept.

He also provides better alternatives like copy constructors or copy factories if you really need object copying. I urge you to use one of these alternatives, because breaking class contracts is evil and your classes may not work as expected, and this particular contract is easy to break. If you absolutely must implement a clone() method because you are subclassing an unchangeable cloneable class, be sure to follow the rules. As a side note, also be aware of the contract that hashCode() and equals() define.
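
For illustration, a minimal sketch of the two alternatives (the Sheep class and its fields are made up for this example):

import java.util.ArrayList;
import java.util.List;

public class Sheep {
  private final String name;
  private final List<String> offspring;

  public Sheep(String name, List<String> offspring) {
    this.name = name;
    this.offspring = new ArrayList<String>(offspring); // explicit control over copy depth
  }

  // Copy constructor: works fine with final fields and has no hidden contract.
  public Sheep(Sheep original) {
    this(original.name, original.offspring);
  }

  // Copy factory: the same idea behind a static factory method.
  public static Sheep copyOf(Sheep original) {
    return new Sheep(original);
  }
}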

A Small XML Builder in Ruby

From a C++ point of view, i.e. the statically typed world with no “dynamic” features that deserve the name, I guess you would all agree that languages like Groovy or Ruby are truly something completely different. Having strong C++ roots myself, my first Grails project gave me lots of eye openers on some nice “dynamic” possibilities. One of the pretty cool things I encountered there was the MarkupBuilder. With it you can just write XML as if it were normal Groovy code. Simple and just downright awesome.

The other day, in yet another C++ project, I was again faced with the task of generating some XML from a text file. And, sure enough, my thoughts wandered to the good days in the Grails project where I could just instantiate the MarkupBuilder… But wait! I remembered that a colleague had already done some scripting stuff with Ruby, so the language was already kind of introduced into the project. And despite the fact that it was a new language for him, he did some heavy lifting with it in no time (that sure does not come as a big surprise to all you Ruby folks out there).

So if Ruby is such a cool language, there must be something like a markup builder in it, right? Yes there is, well, sort of. Unfortunately, it’s not part of the standard distribution, and you first have to install a thing called gems to even install the XML builder package. Being in a project with tight guidelines when it comes to external dependencies, and factoring in that we had no patience to first learn what Ruby gems even are, my colleague and I decided to hack our own small XML builder (and of course, just for the fun of it). I mean hey, it’s Ruby, everything is supposed to be easy in Ruby.

Damn right it is! Here is what we came up with in what was maybe an hour or so:

class XmlGen
   def initialize
      @xmlString = ""
      @indentStack = Array.new
   end

   def method_missing(tagId, attr = {})
      argList = attr.map { |key, value|
         "#{key}=\"#{value}\""
      }.reverse.join(' ')

      @xmlString << @indentStack.join('') 
      @xmlString << "<" << tagId.to_s << " " << argList
      if block_given?
         @xmlString << ">\n"
         @indentStack.push "\t"
         yield
         @indentStack.pop
         @xmlString << @indentStack.join('') << "</" << tagId.to_s << ">\n"
      else
         @xmlString << "/>\n"
      end
      self
   end

   def to_s
      @xmlString
   end
end

And here is how you can use it:

xml = XmlGen.new
xml.FirstXmlTag {
   xml.SubTagOne( {'attribute1' => 'value1'} ) {
      someCollection.each { |item|
         xml.CollectionTag( {'itemId' => item.id} )
      }
   }
}

It’s not perfect, it’s not optimized in any way and it may not even be the Ruby way. But hey, it served our needs perfectly, it was a pretty cool Ruby experience, and it sure is not the last piece of Ruby code in this project.

Always be aware of the charset encoding hell

Most developers have already struggled with textual data from some third-party system, getting garbage special characters and the like because of wrong character encodings. Some days ago we encountered an obscure problem: it was possible to log into one of our apps from the machine running the password database, but not from other machines using the same database. After diving into the problem, we found out that the SHA-1 hashes generated by our app were slightly different. Looking at the code revealed that the platform encoding was used, and that led to different results.

The apps were running on Windows XP and Windows 2k3 Server respectively, and you would expect that this would not make much of a difference, but in fact it did!
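
A minimal illustration of the pitfall (the password value is made up): as soon as the string contains non-ASCII characters, hashing it with the platform default encoding and with an explicit charset gives different digests on machines with different defaults.

import java.io.UnsupportedEncodingException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashEncodingExample {
  public static void main(String[] args) throws NoSuchAlgorithmException, UnsupportedEncodingException {
    String password = "geheimes-Paßwort"; // contains a non-ASCII character

    MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
    byte[] platformDependent = sha1.digest(password.getBytes());        // platform default encoding
    byte[] explicitUtf8      = sha1.digest(password.getBytes("UTF-8")); // same bytes on every machine

    // Both digests have the SHA-1 length, but their contents differ on any platform
    // whose default encoding is not UTF-8.
    System.out.println(MessageDigest.isEqual(platformDependent, explicitUtf8));
  }
}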

Lesson:

Always specify the encoding explicitly when exchanging character data with any other system. Here are some examples:

  • String.getBytes(“utf-8”), new PrintWriter(file, “ascii”) in Java
  • HTML-Forms with attribute accept-charset="ISO-8859-1"
  • In XML headers <?xml version="1.0" encoding="ISO-8859-15"?>
  • In your Database and/or JDBC driver
  • In your file format documentation
  • In LaTeX documents
  • everywhere where you can provide that info easily (e.g. as a comment in a config file)

Problems with character encodings seem to appear every once in a while, either as an end user when your umlauts get garbled, or as a programmer who has to deal with third-party input like web forms or text files.

The text file rant

After stumbling over an encoding problem *again*, I thought a bit about the whole issue, and some of my thoughts manifested in this rant about text files. I do not want to blame our computer science predecessors for inventing and using restricted charsets like ASCII or ISO 8859. Nobody foresaw the rapid development of computers, their worldwide adoption and use in everyday life, and thus the need for an extensible charset (think of the addition of new symbols like the €), let alone performance and memory considerations. The problem I see with text files is that there is no standard way to describe the used encoding. Most text files just leave it to the user to guess what the encoding might be, whereas almost all binary file formats feature some kind of defined header with metadata about the content, e.g. bit depth and compression method in image files. For text files you usually have to use heuristic tools, which work more or less well depending on the input.

A standardized header for text files right from the start would have helped to indicate the encoding, and possibly the language or encoding version of the text, and many problems we have today would not exist. The encoding attribute in the XML header or the byte order mark in UTF-8 are workarounds for the fundamental problem of a missing text file header.