Blog harvest: Metaprogramming in Ruby,Hudson builds IPhone apps, Git workflow, Podcasting Equipment and Marketing

harvest64
Four blog posts:

  • Python decorators in Ruby – You can do amazing things in a language like Ruby or Lisp with a decent meta programming facility, here a language feature to annotate methods which needed a syntax change in Python is build inside of Ruby without any change to the language spec.
  • How to automate your IPhone app builds with Hudson – Another domain in which the popular CI Hudson helps: building your IPhone apps.
  • A Git workflow for agile teams – As distributed version control systems get more and more attention and are used by more teams you have to think about your utilisation of them.
  • Podcasting Equipment Guide – A bit offtopic but interesting nonetheless: if you want to do your own podcasts which equipment is right for you.

and a video:

A more elegant way to HTTP Requests in Java

The support for sending and processing HTTP requests was always very basic in the JDK. There are many, many frameworks out there for sending requests and handling or parsing the response. But IMHO two stand out: HTTPClient for sending and HTMLUnit for handling. And since HTMLUnit uses HTTPClient under the hood the two are a perfect match.

This is an example HTTP Post:

HttpClient client = new HttpClient();
PostMethod post = new PostMethod(url);
for (Entry param : params.entrySet()) {
    post.setParameter(param.key, param.value);
}
try {
    return client.executeMethod(post);
} finally {
    post.releaseConnection();
}

and HTTP Get:

WebClient webClient = new WebClient();
return (HtmlPage) webClient.getPage(url);

Accessing the returned HTML via XPath is also very straightforward:

List roomDivs=(List)page.getByXPath("//div[contains(@class, 'room')]");
for (HtmlElement div:roomDivs) {
  rooms.add(
    new Room(this,
      ((HtmlElement) div.getByXPath(".//h2/a").get(0)).getTextContent(),
      div.getId())
  );
}

One last issue remains: HTTPClient caches its cookies but HTMLUnit creates a HTTPClient on its own. But if you override HttpWebConnection and give it your HTTPClient everything works smoothly:

public class HttpClientBackedWebConnection extends HttpWebConnection {
  private HttpClient client;

  public HttpClientBackedWebConnection(WebClient webClient,
      HttpClient client) {
    super(webClient);
    this.client = client;
  }

  @Override
  protected HttpClient getHttpClient() {
    return client;
  }
}

Just set your custom webconnection on your webclient:

webClient.setWebConnection(
  new HttpClientBackedWebConnection(webClient, client)
);

About breaking class contracts – fear clone()

Recently I had some discussions about copying of Objects in Java with some fellow developers. They were overriding clone() which I never felt neccessary. Shortly after I stumbled over a Checkstyle-Warning in our own code regarding clone() where overriding it is absolutely discouraged. Triggered by these two events I decided to dig a bit deeper into the issue.Climbing a Pile of Files

The bottom line is that Object.clone() has a defined contract which is very easy to break. This has to do with it’s interaction with the Cloneable interface which does not define a clone() method and the nature of Object’s clone implementation which is native.  Joshua Bloch names some problems and pitfalls with overriding clone in his excellent book Effective Java (Item 11):

  • “If you override the clone method in a nonfinal class, you shoud return an object obtained by invoking super.clone()”. A problem here is that this is never enforced.
  • “In practice, a class that implements Cloneable is expected to provide a properly functioning public clone method”. Again this is enforced nowhere.
  • “In effect, the clone method functions as another constructor; you must ensure that it does no harm to the original object and that it properly establishes invariants on the clone.”. This means paying extreme attention to the issue of shallow and deep copies. Also be sure not to forget possible side effects your constructors may have like registering the object as a listener.
  • “The clone architecture is incompatible with normal use of final fields referring to mutable objects”. You are sacrificing freedom in your class design because of flaw in the clone() concept.

He also provides better alternatives like copy constructors or copy factories if you really need object copying. I urge you to use one of the alternatives because breaking class contracts is evil and your classes may not work as expected. This one is easy to break. If you absolutely must implement a clone() method because you are subclassing an unchangeable cloneable class be sure to follow the rules. As a sidenote also be aware of the contract that hashCode() and equals() define.

A Small XML Builder in Ruby

From a C++ point of view, i.e. the statically typed world with no “dynamic” features that deserved the name, I guess you would all agree that languages like Groovy or Ruby are truly something completely different. Having strong C++ roots myself, my first Grails project gave me lots of eye openers on some nice “dynamic” possibilities. One of the pretty cool things I encountered there was the MarkupBuilder. With it you can just write XML as if it where normal Groovy Code. Simple and just downright awesome.

The other day in yet another C++ project I was again faced with the task to generate some XML from text file. And, sure enough, my thoughts wandered to the good days in the Grails project where I could just instantiate the MarkupBuilder… But wait! I remembered that a colleague had already done some scripting stuff with Ruby, so the language was already kind of introduced into the project. And despite the fact that it was a new language for him he did some heavy lifting with it in just no time (That sure does not come as a big surprise all you Ruby folks out there).

So if Ruby is such a cool language there must be something like a markup builder in it, right? Yes there is, well, sort of. Unfortunately, it’s not part of the language package and you first have to install a thing called gems to even install the XML builder package. Being in a project with tight guidelines when it comes to external dependencies and counting in the fact that we had no patience to first having to learn what Ruby gems even are, my colleague and I decided to hack our own small XML builder (and of course, just for the fun of it). I mean hey, it’s Ruby, everything is supposed to be easy in Ruby.

Damn right it is! Here is what we came up with in what was maybe an hour or so:

class XmlGen
   def initialize
      @xmlString = ""
      @indentStack = Array.new
   end

   def method_missing(tagId, attr = {})
      argList = attr.map { |key, value|
         "#{key}=\"#{value}\""
      }.reverse.join(' ')

      @xmlString << @indentStack.join('') 
      @xmlString << "<" << tagId.to_s << " " << argList
      if block_given?
         @xmlString << ">\n"
         @indentStack.push "\t"
         yield
         @indentStack.pop
         @xmlString << @indentStack.join('') << "</" << tagId.to_s << ">\n"
      else
         @xmlString << "/>\n"
      end
      self
   end

   def to_s
      @xmlString
   end
end

And here is how you can use it:

xml = XmlGen.new
xml.FirstXmlTag {
   xml.SubTagOne( {'attribute1' => 'value1'} ) {
      someCollection.each { |item|
         xml.CollectionTag( {'itemId' => item.id} )
      }
   }
}

It’s not perfect, it’s not optimized in any way and it may not even be the Ruby way. But hey, it served our needs perfectly, it was a pretty cool Ruby experience, and it sure is not the last piece of Ruby code in this project.

Always be aware of the charset encoding hell

Most developers already struggled with textual data from some third party system and getting garbage special characters and the like because of wrong character encodings.  Some days ago we encountered an obscure problem when it was possible to login into one of our apps from the computer with the password database running but not from other machines using the same db.  After diving into the problem we found out that they SHA-1 hashes generated from our app were slightly different. Looking at the code revealed that platform encoding was used and that lead to different results:platform-encoding

The apps were running on Windows XP and Windows 2k3 Server respectively and you would expect that it would not make much of a difference but in fact it did!

Lesson:

Always specify the encoding explicitly, when exchanging character data with any other system. Here are some examples:

  • String.getBytes(“utf-8”), new Printwriter(file, “ascii”) in Java
  • HTML-Forms with attribute accept-charset="ISO-8859-1"
  • In XML headers <?xml version="1.0" encoding="ISO-8859-15"?>
  • In your Database and/or JDBC driver
  • In your file format documentation
  • In LaTeX documents
  • everywhere where you can provide that info easily (e.g. as a comment in a config file)

Problems with character encodings seem to appear every once in a while either as end user, when your umlauts get garbled or as a programmer that has to deal with third party input like web forms or text files.

The text file rant

After stumbling over an encoding problem *again* I thought a bit about the whole issue and some of my thought manifested in this rant about text files. I do not want to blame our computer science predecessors for inventing and using restricted charsets like ASCII or iso8859. Nobody has forseen the rapid development of computers and their worldwide adoption and use in everyday life and thus need for an extensible charset (think of the addition of new symbols like the €), let aside performance and memory considerations. The problem I see with text files is that there is no standard way to describe the used encoding. Most text files just leave it to the user to guess what the encoding might be whereas almost all binary file formats feature some kind of defined header with metadata about the content, e.g. bit depth and compression method in image files. For text files you usually have to use heuristical tools which work  more or less depending on the input.

A standardized header for text files right from the start would have helped to indicate the encoding and possibly language or encoding version information of the text and many problems we have today would not exist. The encoding attribute in the XML header or the byte order mark in UTF-8 are workarounds for the fundamental problem of a missing text file header.

Evil operator overloading of the day

The other day we encountered a strange stack overflow when cout-ing an instance of a custom class. The stream output operator << was overloaded for the class to get a nice output, but since the class had only two std::string attributes the implementation was very simple:

using namespace std;

class MyClass
{
   public:
   ...
   private:
      string stringA_;
      string stringB_;

   friend ostream& operator << (ostream& out, const MyClass& myClass);
};

ostream& operator << (ostream& out, const MyClass& myClass)
{
   return out << "MyClass (A: " << myClass.stringA_ 
              <<", B: " << myClass.stringB_ << ")"  << std::endl;
}

Because the debugger pointed us to a completely separate code part, our first thought was that maybe some old libraries had been accidently linked or some memory got corrupted somehow. Unfortunately, all efforts in that direction lead to nothing.

That was the time when we noticed that using old-style printf instead of std::cout did work just fine. Hm..

So back to that completely separate code part. Is it really so separate? And what does it do anyway?

We looked closer and after a few minutes we discovered the following code parts. Just look a little while before you read on, it’s not that difficult:

// some .h file somewhere in the code base that somehow got included where our stack overflow occurred:

...
typedef std::string MySpecialName;
...
ostream& operator << (ostream& out, const MySpecialName& name);

// and in some .cpp file nearby

...
ostream& operator << (ostream& out, const MySpecialName& name)
{
   out << "MySpecialName: " << name  << std::endl;
}
...

Got it? Yes, right! That overloaded out-stream operator << for MySpecialName together with that innocent looking typedef above put your program right into death by segmentation fault.  Overloading the out-stream operator for a given type can be a good idea – as long as that type is not a typedef of std::string. The code above not only leads to the operator << recursively calling itself but also sucks every other part of the code into its black hole which happens to include the .h file and wants to << a std::string variable.

You just have to love C++…

Grails Web Application Security: XSS prevention

XSS (Cross Site Scripting) became a favored attack method in the last years. Several things are possible using an XSS vulnerability ranging from small annoyances to a complete desaster.
The XSS prevention cheat sheet states 6 rules to prevent XSS attacks. For a complete solution output encoding is needed in addition to input validation.
Here I take a further look on how to use the built in encoding methods in grails applications to prevent XSS.

Take 1: The global option

There exists a global option that specifies how all output is encoded when using ${}. See grails-app/conf/Config.groovy:

// The default codec used to encode data with ${}
grails.views.default.codec="html" // none, html, base64

So every input inside ${} is encoded but beware of the standard scaffolds where fieldValue is used inside ${}. Since fieldValue uses encoding you get a double escaped output – not a security problem, but the output is garbage.
This leaves the tags from the tag libraries to be reviewed for XSS vulnerability. The standard grails tags use all HTML encoding. If you use older versions than grails 1.1: beware of a bug in the renderErrors tag. Default encoding ${} does not help you when you use your custom tags. In this case you should nevertheless encode the output!
But problems arise with other tags like radioGroup like others found out.
So the global option does not result in much protection (only ${}), double escaping and problems with grails tags.

Take 2: Tainted strings

Other languages/frameworks (like Perl, Ruby, PHP,…) use a taint mode. There are some research works for Java.
Generally speaking in gsps three different outputs have to be escaped: ${}, <%%> and the ones from tags/taglibs. If a tainted String appears you can issue a warning and disallow or escape it. The problem in Java/Groovy is that Strings are value objects and since get copied in every operation so the tainted flag needs to be transferred, too. The same tainted flag must also be introduced for GStrings.
Since there isn’t any implementation or plugin for groovy/grails yet, right now you have to take the classic route:

Take 3: Test suites and reviews

Having a decent test suite in e.g. Selenium and reviewing your code for XSS vulnerabilities is still the best option in your grails apps. Maybe the tainted flags can help you in the future to spot places which you didn’t catch in a review.

P.S. A short overview for Java frameworks and their handling of XSS can be found here

How much boost does a C++ newbie need?

The other day, I talked to a C++ developer, who is relatively new in the language, about the C++ training they just had at his company. The training topics were already somewhat advanced and contained e.g. STL containers and their peculiarities, STL algorithms and some boost stuff like binders and smart pointers. That got me thinking about how much of STL and boost does a C++ developer just has to know in order to survive their C++ projects.

There is also another angle to this. There are certain corners of the C++ language, e.g. template metaprogramming, which are just hard to get, even for more experienced developers. And because of that, in my opinion, they have no place in a standard industry C++ project. But where do you draw the line? With template meta-programming it is obvious that it probably will never be in every day usage by Joe Developer. But what about e.g. boost’s multi-index container or their functional programming stuff? One could say that it depends on the skills of team whether more advanced stuff can be used or not. But suppose your team consist largely of C++ beginners and does not have much experience in the language, would you want to pass on using Boost.Spirit when you had to do some serious parsing? Or would you want to use error codes instead of decent exceptions, because they add a lot more potentially “invisible” code paths? Probably not, but those are certainly no easy decisions.

One of the problems with STL and boost for a C++ beginner can be illustrated with the following easy problem: How do you convert an int into a std::string and back? Having already internalized the stream classes the beginner might come up with something like this:

 int i = 5;
 std::ostringstream out;
 out << i;
 std::string i_string = out.str();  

 int j=0;
 std::istringstream in(i_string);
 in >> j;
 assert(i == j);

But if he just had learned a little boost he would know that, in fact, it is as easy as this:

 int i=5;
 std::string i_string = boost::lexical_cast<std::string>(i);

 int j = boost::lexical_cast<int>(i_string);

So you just have to know some basic boost stuff in order to write fairly decent C++ code. Besides boost::lexical_cast, which is part of the Boost Conversion Library, here is my personal list of mandatory boost knowledge:

Boost.Assign: Why still bother with std::map::push_back and the likes, if there is a much easier and concise syntax to initialize containers?

Boost.Bind (If you use functional programming): No one should be forced to wade through the mud of STL binders any longer. Boost::bind is just so much easier.

Boost.Foreach: Every for-loop becomes a code-smell after your first use of BOOST_FOREACH.

Boost.Member Function: see Boost.Bind

Boost.Smart Pointers: No comment is needed on that one.

As you can see, these are only the most basic libraries. Other extremely useful things for day-to-day programming are e.g. Boost.FileSystem, Boost.DateTime, Boost.Exceptions, Boost.Format, Boost.Unordered and Boost.Utilities.

Of course, you don’t have to memorize every part of the boost libraries, but boost.org should in any case be the first address to look for a solution to your daily  C++ challenges.

Don’t trust micro versions

Normally you would think, that upgrading a third party dependency where its micro version (after the second dot, like x in 2.3.x) changes should make your software work (even) better and not break it. Sadly enough it can easily happen. Some time ago we stumbled over a subtle change in the JNDI implementation of the Jetty webserver and servlet container: In version 6.1.11 you specified (or at least could specify) JNDI resources in jetty-env.xml with URLs like jdbc/myDatabase. After the update to 6.1.12 the specified resource could not be found anymore. Digging through code changelogs and the like provided a solution that finally worked with 6.1.12: java:comp/env/jdbc/myDatabase. The bad thing is that the latter does not work with 6.1.11 so that our configuration became micro-version-dependent on Jetty.

It seems that a new feature around JETTY-725 in the update from 6.1.11 to 6.1.12 broke our software.

Conclusion

Always make sure that your dependencies are fixed for your software releases and test your software everytime when upgrading a dependency. Do not trust some automatic dependency update system or the version numbers of a project. In the end they are just numbers and should indicate the impact of the changes but you never can be sure the changes do not break something for you.

Small gaps in the grails docs

Just for reference, if you come across one of the following problems:

Validation only datasource

Looking at the options of dbCreate in Datasource.groovy I only found 3 values: create-drop, create or update. But there is a fourth one: validate!
This one helps a lot when you use schema generation with Autobase or doing your schema updates external.

Redirect

Controller.redirect has two options for passing an id to the action id and params, but if you specify both which one will be used?

controller.redirect(id:1, params:[id:2])

Trying this out I found the id supersedes the params.id.

Update:
Thanks to Burt and Alvaro for their hints. I submitted a JIRA issue