Evil operator overloading of the day

The other day we encountered a strange stack overflow when cout-ing an instance of a custom class. The stream output operator << was overloaded for the class to get a nice output, but since the class had only two std::string attributes the implementation was very simple:

using namespace std;

class MyClass
{
   public:
   ...
   private:
      string stringA_;
      string stringB_;

   friend ostream& operator << (ostream& out, const MyClass& myClass);
};

ostream& operator << (ostream& out, const MyClass& myClass)
{
   return out << "MyClass (A: " << myClass.stringA_ 
              <<", B: " << myClass.stringB_ << ")"  << std::endl;
}

Because the debugger pointed us to a completely separate code part, our first thought was that maybe some old libraries had been accidently linked or some memory got corrupted somehow. Unfortunately, all efforts in that direction lead to nothing.

That was the time when we noticed that using old-style printf instead of std::cout did work just fine. Hm..

So back to that completely separate code part. Is it really so separate? And what does it do anyway?

We looked closer and after a few minutes we discovered the following code parts. Just look a little while before you read on, it’s not that difficult:

// some .h file somewhere in the code base that somehow got included where our stack overflow occurred:

...
typedef std::string MySpecialName;
...
ostream& operator << (ostream& out, const MySpecialName& name);

// and in some .cpp file nearby

...
ostream& operator << (ostream& out, const MySpecialName& name)
{
   out << "MySpecialName: " << name  << std::endl;
}
...

Got it? Yes, right! That overloaded out-stream operator << for MySpecialName together with that innocent looking typedef above put your program right into death by segmentation fault.  Overloading the out-stream operator for a given type can be a good idea – as long as that type is not a typedef of std::string. The code above not only leads to the operator << recursively calling itself but also sucks every other part of the code into its black hole which happens to include the .h file and wants to << a std::string variable.

You just have to love C++…

Branching support missing in JIRA

JIRA is a really great tool when it comes to issue tracking. It is powerful, extensible, usable and widespread. We and many of our clients use it for years and are quite satisfied coming from other tools like Bugzilla.

One thing we find is missing though is the concept of branching. In JIRA you have projects and can define a roadmap consisting of versions that follow each other. Many software products require sooner or later a development line which is only maintained getting bug and security fixes while feature development happens on a separate branch.

Tracking issues for a software with different branches is a bit more demanding because you have to identify the branches one issue affects and possibly separately fix them on each branch. One and the same issue could have different resolutions per branch, i.e. invalid if a broken functionality does not exist anymore. Nevertheless, all branches have to be checked, likely in different time frames.

Unfortunately JIRA does not support the notion of branches. You have to emulate the behaviour using different schemes like:

  • One issue per version that represents a branch
  • Multiple fix versions for an issue
  • Subtasks for an issue or issue links to ckeck and fix the issue for different versions

To me all those are workarounds which lack polished usability and add overhead to your issue management. Real branching support could help you check on which branch an issue is done, where it has still to be resolved and so on without adding more and more (perhaps loosely connected) issues to your system or forgetting to fix the issue on some branch (no question/warning when resolving an issue with multiple fix versions).

[Update:]

I created a new feature request for JIRA where you can vote or track progress on this issue.

Grails Web Application Security: XSS prevention

XSS (Cross Site Scripting) became a favored attack method in the last years. Several things are possible using an XSS vulnerability ranging from small annoyances to a complete desaster.
The XSS prevention cheat sheet states 6 rules to prevent XSS attacks. For a complete solution output encoding is needed in addition to input validation.
Here I take a further look on how to use the built in encoding methods in grails applications to prevent XSS.

Take 1: The global option

There exists a global option that specifies how all output is encoded when using ${}. See grails-app/conf/Config.groovy:

// The default codec used to encode data with ${}
grails.views.default.codec="html" // none, html, base64

So every input inside ${} is encoded but beware of the standard scaffolds where fieldValue is used inside ${}. Since fieldValue uses encoding you get a double escaped output – not a security problem, but the output is garbage.
This leaves the tags from the tag libraries to be reviewed for XSS vulnerability. The standard grails tags use all HTML encoding. If you use older versions than grails 1.1: beware of a bug in the renderErrors tag. Default encoding ${} does not help you when you use your custom tags. In this case you should nevertheless encode the output!
But problems arise with other tags like radioGroup like others found out.
So the global option does not result in much protection (only ${}), double escaping and problems with grails tags.

Take 2: Tainted strings

Other languages/frameworks (like Perl, Ruby, PHP,…) use a taint mode. There are some research works for Java.
Generally speaking in gsps three different outputs have to be escaped: ${}, <%%> and the ones from tags/taglibs. If a tainted String appears you can issue a warning and disallow or escape it. The problem in Java/Groovy is that Strings are value objects and since get copied in every operation so the tainted flag needs to be transferred, too. The same tainted flag must also be introduced for GStrings.
Since there isn’t any implementation or plugin for groovy/grails yet, right now you have to take the classic route:

Take 3: Test suites and reviews

Having a decent test suite in e.g. Selenium and reviewing your code for XSS vulnerabilities is still the best option in your grails apps. Maybe the tainted flags can help you in the future to spot places which you didn’t catch in a review.

P.S. A short overview for Java frameworks and their handling of XSS can be found here

How much boost does a C++ newbie need?

The other day, I talked to a C++ developer, who is relatively new in the language, about the C++ training they just had at his company. The training topics were already somewhat advanced and contained e.g. STL containers and their peculiarities, STL algorithms and some boost stuff like binders and smart pointers. That got me thinking about how much of STL and boost does a C++ developer just has to know in order to survive their C++ projects.

There is also another angle to this. There are certain corners of the C++ language, e.g. template metaprogramming, which are just hard to get, even for more experienced developers. And because of that, in my opinion, they have no place in a standard industry C++ project. But where do you draw the line? With template meta-programming it is obvious that it probably will never be in every day usage by Joe Developer. But what about e.g. boost’s multi-index container or their functional programming stuff? One could say that it depends on the skills of team whether more advanced stuff can be used or not. But suppose your team consist largely of C++ beginners and does not have much experience in the language, would you want to pass on using Boost.Spirit when you had to do some serious parsing? Or would you want to use error codes instead of decent exceptions, because they add a lot more potentially “invisible” code paths? Probably not, but those are certainly no easy decisions.

One of the problems with STL and boost for a C++ beginner can be illustrated with the following easy problem: How do you convert an int into a std::string and back? Having already internalized the stream classes the beginner might come up with something like this:

 int i = 5;
 std::ostringstream out;
 out << i;
 std::string i_string = out.str();  

 int j=0;
 std::istringstream in(i_string);
 in >> j;
 assert(i == j);

But if he just had learned a little boost he would know that, in fact, it is as easy as this:

 int i=5;
 std::string i_string = boost::lexical_cast<std::string>(i);

 int j = boost::lexical_cast<int>(i_string);

So you just have to know some basic boost stuff in order to write fairly decent C++ code. Besides boost::lexical_cast, which is part of the Boost Conversion Library, here is my personal list of mandatory boost knowledge:

Boost.Assign: Why still bother with std::map::push_back and the likes, if there is a much easier and concise syntax to initialize containers?

Boost.Bind (If you use functional programming): No one should be forced to wade through the mud of STL binders any longer. Boost::bind is just so much easier.

Boost.Foreach: Every for-loop becomes a code-smell after your first use of BOOST_FOREACH.

Boost.Member Function: see Boost.Bind

Boost.Smart Pointers: No comment is needed on that one.

As you can see, these are only the most basic libraries. Other extremely useful things for day-to-day programming are e.g. Boost.FileSystem, Boost.DateTime, Boost.Exceptions, Boost.Format, Boost.Unordered and Boost.Utilities.

Of course, you don’t have to memorize every part of the boost libraries, but boost.org should in any case be the first address to look for a solution to your daily  C++ challenges.

Don’t trust micro versions

Normally you would think, that upgrading a third party dependency where its micro version (after the second dot, like x in 2.3.x) changes should make your software work (even) better and not break it. Sadly enough it can easily happen. Some time ago we stumbled over a subtle change in the JNDI implementation of the Jetty webserver and servlet container: In version 6.1.11 you specified (or at least could specify) JNDI resources in jetty-env.xml with URLs like jdbc/myDatabase. After the update to 6.1.12 the specified resource could not be found anymore. Digging through code changelogs and the like provided a solution that finally worked with 6.1.12: java:comp/env/jdbc/myDatabase. The bad thing is that the latter does not work with 6.1.11 so that our configuration became micro-version-dependent on Jetty.

It seems that a new feature around JETTY-725 in the update from 6.1.11 to 6.1.12 broke our software.

Conclusion

Always make sure that your dependencies are fixed for your software releases and test your software everytime when upgrading a dependency. Do not trust some automatic dependency update system or the version numbers of a project. In the end they are just numbers and should indicate the impact of the changes but you never can be sure the changes do not break something for you.

Small gaps in the grails docs

Just for reference, if you come across one of the following problems:

Validation only datasource

Looking at the options of dbCreate in Datasource.groovy I only found 3 values: create-drop, create or update. But there is a fourth one: validate!
This one helps a lot when you use schema generation with Autobase or doing your schema updates external.

Redirect

Controller.redirect has two options for passing an id to the action id and params, but if you specify both which one will be used?

controller.redirect(id:1, params:[id:2])

Trying this out I found the id supersedes the params.id.

Update:
Thanks to Burt and Alvaro for their hints. I submitted a JIRA issue

A guide through the swamp – The CrapMap

Locate your crappy methods quickly with this treemap visualization tool for Crap4J report data.

One of the most useful metrics to us in the Softwareschneiderei is “CRAP”. For java, it is calculated by the Crap4J tool and provided as an HTML report. The report gives you a rough idea whats going on in your project, but to really know what’s up, you need to look closer.

A closer look on crap

The Crap4J tool spits out lots of numbers, especially for larger projects. But from these numbers, you can’t easily tell some important questions like:

  • Are there regions (packages, classes) with lots more crap than others?
  • What are those regions?

So we thought about the problem and found it to be solvable by data visualization.

Enter CrapMap

If you need to use advanced data visualization techniques, there is a very helpful project called prefuse (which has a successor named flare for web applications). It provides an exhaustive API to visualize nearly everything the way you want to. We wanted our crap statistics drawn in a treemap. A treemap is a bunch of boxes, crammed together by a clever layouting strategy, each one representing data, for example by its size or color.

The CrapMap is a treemap where every box represents a method. The size gives you a hint of the method’s complexity, the color indicates its crappyness. Method boxes reside inside their classes’ boxes which reside in package boxes. That way, the treemap represents your code structure.

A picture worth a thousand numbers

crapmap1

This is a screenshot of the CrapMap in action. You see a medium sized project with few crap methods (less than one percent). Each red rectangle is a crappy method, each green one is an acceptable method regarding its complexity.

Adding interaction

You can quickly identify your biggest problem (in terms of complexity) by selecting it with your mouse. All necessary data about this method is shown in the bottom section of the window. The overall data of the project is shown in the top section.

If you want to answer some more obscure questions about your methods, try the search box in the lower right corner. The CrapMap comes with a search engine using your methods’ names.

Using CrapMap on your project

CrapMap is a java swing application, meant for desktop usage. To visualize your own project, you need the report.xml data file of it from Crap4J. Start the CrapMap application and load the report.xml using the “open file” dialog that shows up. That’s all.

In the near future, CrapMap will be hosted on dev.java.net (crapmap.dev.java.net). Right now, it’s only available as a binary executable from our download server (1MB download size). When you unzip the archive, double-click the crapmap.jar to start the application. CrapMap requires Java6 to be installed.

Show your project

We would be pleased to see your CrapMap. Make a screenshot, upload it and leave a comment containing the link to the image.

Enable Capture/Replay for Selenium Flex

One of the missing features of the SeleniumFlexAPI was capture/replay. So I looked into different ways to enable it:

  • Approach 1: Dispatch a DOM event and listen for it in ide-extensions.js

    Problem: Where do I include the name/id of the Flex control?
  • Approach 2: Custom events

    Problem: How do I listen to them in ide-extensions.js?

Solution: Additions to the Selenium IDE code: window.record

The solution is to add a new method to the Selenium IDE code: window.record which delegates to recorder.record. So that the Flex code can call this method directly through the ExternalInterface. The clear advantage of this technique is that there is no code pollution in your production code. But you have to change the SeleniumIDE code. There is an issue in the SeleniumIDE Jira which describes the additions, so go and vote for it!
Addtional code is also needed in the SeleniumFlexAPI.as in the applicationCompleteHandler:

private function applicationCompleteHandler(event:FlexEvent):void {
  ...
  registerListeners(appTreeParser.thisApp.parent);
  ...
}
private function registerListeners(subject:*):void {
  subject.addEventListener(MouseEvent.CLICK, recordClick);
  subject.addEventListener(MouseEvent.DOUBLE_CLICK, recordDoubleClick);
  subject.addEventListener(Event.ADDED, childAdded);
  addListenerRecursive(subject);
}

Bubbling events like MouseEvent.CLICK can be added here but for the non-bubbling ones you have to recursively walk the displayobject hierarchy:

public function addListenerRecursive(root:*):void {
  for(var i:int = 0; i < root.numChildren; i++) {
    try {
      var child:Object = root.getChildAt(i);
      if (isMenuBar(child)) {
        child.removeEventListener(MenuEvent.ITEM_CLICK, recordMenuItemClick);
        child.addEventListener(MenuEvent.ITEM_CLICK, recordMenuItemClick);
      }
      if (isTextControl(child)) {
        child.removeEventListener(Event.CHANGE, recordTextChange);
        child.addEventListener(Event.CHANGE, recordTextChange);
      }
      if (isDataGrid(child)) {
        child.removeEventListener(DataGridEvent.ITEM_FOCUS_IN, recordItemClick);
        child.addEventListener(DataGridEvent.ITEM_FOCUS_IN, recordItemClick);
      }
      addListenerRecursive(child);
    } catch(e:Error) {}
  }
}

In the event handling functions you just call the record function with the appropiate SeleniumFlex command:

private function recordClick(event:MouseEvent):void {
  ExternalInterface.call("record", "flexClick", "name=" + event.target.name, "");
}

Since Flex Sprites have no ids I use the name here for identifying the clicked target.
Another pitfall is when components are added dynamically (like when a Date opens, it adds a calendar view):

private function childAdded(event:Event):void {
  if (isDate(event.target.parent.parent)) {
    event.target.parent.parent.removeEventListener(CalendarLayoutChangeEvent.CHANGE, recordDate);
    event.target.parent.parent.addEventListener(CalendarLayoutChangeEvent.CHANGE, recordDate);
  }
}

Conclusion

So finally capture/replay in SeleniumFlex becomes a reality! Nonetheless there is some work to do to support the different kinds of flex controls and Selenium commands.

Stacked smartness doesn’t add up

When a software is composed of different layers, friction occurs. That’s when features turn into bugs.

houseofcardsThere is a strong urge to make software smart. Whenever something smart gets built in, it’s called a feature. Features of a software are effects you don’t foresee, but find handy for your use case. If your use case is impaired by a feature, you’ll likely call it a bug.

Some features of various software

To make my point clear, i have to introduce two features of software that are very practical for their anticipated use case and then change their context by adding another layer:

Ant filesets

If you use Ant as a build script language, you’ll find filesets very pratical. If you want to modify, copy or delete a bunch of files, you specify a root directory and some similarities between the files (like equal filetypes or names) and you’re done. Let me give you an example to show the use case:

<delete>
    <fileset dir="${basedir}" include="**/build.xml,**/pom.xml"/>
</delete>

This will very likely delete all ant build scripts and maven setting files in your project (so please use with care). Notice how the include attribute is comma-separated for multiple patterns. According to the documentation, the comma can be omitted for a space character.

Then, there is Hudson, a very powerful continuous integration server. One source of its power is the familiarity of configuration syntax, specifically when accessing a bunch of files:

hudson-include

The given text field specifies the include attribute of an Ant fileset. You immediately inherit all the power of ant’s fileset, but the features, too. Here, it’s a feature that two pattern can be excluded by just a space character. If your path contains spaces, you cannot express your pattern in this text field. When using the fileset directly in Ant, you can alter the syntax and use multiple nested include tags, but within Hudson, you are stuck with the single include attribute.

Struts2 internationalization

The second example is fully described in my previous blog entry (“The perils of \u0027”).

As a short summary: The Struts2 framework inherits the power of Java’s MessageFormat when loading language dependent text. As the apostrophe is a special character to MessageFormat, it cannot be used directly in the text entries.

The principle behind the examples

Both examples share a common principle: “Stacked smartness doesn’t add up“. What’s a feature to one software, may be a bug to a software that builds on top of it.

Software developers tend to “stack up” different third party software products to compose their own product with even higher-level functionality. There is nothing wrong with this approach, as long as the context of the underlying products doesn’t change much. If it changes, features begin to behave like bugs.

The cost of stacking

Stacked products are likely to increase the ability of skilled users to re-use their knowledge. Every developer familiar to Ant will instantly be empowered to use the file patterns of Hudson. Every developer familiar to MessageFormat can produce powerful i18n entries that do most of the formatting automatically. That’s a great productivity gain.

But on the other side, if you aren’t familiar to ant when using hudson or know nothing about MessageFormat when just translating the i18n entries of a Struts2 webapp, you’ll be surprised by strange effects. And you won’t find sufficient documentation of these effects in the first place. There will be a link to some obscure project or class you never heard about, telling you all sorts of details you don’t want to hear right now. You can’t easily put them into the right context anyway. You will be down to trial and error, frustrated that your use case seems impossible without explanation. That’s a great productivity loss.

Often, you can’t blame any part of the stack, not even the topmost, for the occuring bugs. If a specific stack maintains and increases productivity, depends on the use case of the topmost layer compared to the underlying anticipated ones. If those aren’t documentated, its hard to notice the displacement.

A metaphor on software stacks

Whenever I hear about a software stack, a picture of a man on a stack of crates occurs to me. Here is the original photo of my thoughts.

What’s your encounter with a shaky stacking?

The perils of \u0027

Adventures (read: pitfalls) of internationalization with Struts2, concerning the principle “stacked smartness doesn’t add up”.

u0027Struts2 is a framework for web application development in Java. It’s considered mature and feature-rich and inherits the internationalization (i18n) capabilities of the Java platform. This is what you would expect. In fact, the i18n features of Struts2 are more powerful than the platform ones, but the power comes with a price.

Examples of the sunshine path

If you read a book like “Struts 2 in Action” written by Donald Brown and others, you’ll come across a chapter named “Understanding internationalization” (it’s chapter 11). You’ll get a great overview with a real-world example of what is possible (placeholder expansion, for example) and if you read a bit further, there is a word of warning:

“You might also want to further investigate the MessageFormat class of the Java platform. We saw its fundamentals in this chapter when we learned of the native Java support for parameterization of message texts and the autoformatting of date and numbers. As we indicated earlier, the MessageFormat class is much richer than we’ve had the time to demonstrate. We recommend consulting the Java documentation if you have further formatting needs as well. “

If you postpone this warning, you’re doomed. It’s not the fault of the book that their examples are the sunshine case (the best circumstances that might happen). The book tries to teach you the basics of Struts2, not its pitfalls.

A pitfall of Struts2 I18N

You will write a web application in Struts2, using the powerful built-in i18n, just to discover that some entries aren’t printed right. Let’s have an example i18n entry:

impossible.action.message=You can't do this

If you include this entry in a webpage using Struts2 i18n tags, you’ll find the apostrophe (unicode character \u0027) missing:

You cant do this

What happened? You didn’t read all about MessageFormat. The apostrophe is a special character for the MessageFormat parser, indicating regions of non-interpreted text (Quoted Strings). As there is only one apostrophe in our example, it just gets omitted and ignored. If there were two of them, both would be omitted and all expansion effort between them would be ceased.

How to overcome the pitfall

You’ll need to escape the apostrophe to have it show up. Here’s the paragraph of the MessageFormat APIDoc:

Within a String, "''" represents a single quote. A QuotedString can contain arbitrary characters except single quotes; the surrounding single quotes are removed. An UnquotedString can contain arbitrary characters except single quotes and left curly brackets. Thus, a string that should result in the formatted message “‘{0}'” can be written as "'''{'0}''" or "'''{0}'''".

That’s bad news. You have to tell your translators to double-type their apostrophes, else they won’t show up. But only the ones represented by \u0027, not the specialized ones of the higher unicode regions like “grave accent”  or “acute accent”. If you already have a large amount of translations, you need to check every apostrophe if it was meant to be printed or to control the MessageFormat parser.

The underlying principle

This unexpected behaviour of an otherwise powerful functionality is a common sign of a principle I call “stacked smartness doesn’t add up”. I will blog about the principle in the near future, so here’s just a short description: A powerful (smart) behaviour makes sense in the original use case, but when (re-)used in another layer of functionality, it becomes a burden, because strange side-effects need to be taken care of.