Paging with different DBs

Sometimes you cannot or do not want to use an object-relational mapping tool. When not using an OR-mapper like Hibernate or Oracle Toplink you have to deal with database specifics. One common case especially for web applications is limiting the result set to a number of items that fit nicely on a web. You then often want to allow the users to navigate between these “pages” of items aka “paging”.

This type of functionality became part of SQL only as of SQL2008 in the following form:
SELECT * FROM t WHERE ... ORDER BY c OFFSET start_row FETCH count ONLY

Since most popular database management systems (DBMSes) do not yet implement this syntax you have to implement paging in propriatory ways.

My experience with an Oracle DBMS and the frustrating and comparatively long time it took to find the correct™ solution inspired me to write this post. Now I want to present you the syntax for some widely used DBMSes which we encounter frequently in our projects.

  • MySQL, H2 and PostgreSQL (< 8.4 which will also implement the SQL2008 standard) use the same syntax:
    SELECT * FROM t WHERE ... ORDER BY c LIMIT count OFFSET start
  • Oracle is where the fun begins. There is actually no easy and correct way of doing this. So you will end up with a mess like:
    SELECT col1 FROM (SELECT col1, ROWNUM r FROM (SELECT col1 FROM table ORDER BY col1)) WHERE r BETWEEN start AND end
  • DB2 AFAIK uses the syntax proposed in SQL2008 but correct me if I am wrong because we do not yet work with DB2 databases.
  • As we did not need paging with MS SQLServer as of now I did not bother to look for a solution yet. Hints are very welcome.

With all solutions the ORDER BY clause is critical because SQL does not guarantee the order of the returned rows.

Wikipedia delivers some additional and special case information but does not really explain the general, real world case the specific DBMSes.

I hope that I raised some awareness about database specifics and perhaps saved you some time trying to find a solution the problem using your favorite DBMS.

Visualizations with Flare/Prefuse

Recently we were in need of visualizing lots of data in networks. We started with a Javascript based version using RaphaelJS. RaphaelJS uses SVG/VML as the underlying graphics API. Drawing lots of nodes and edges in different shapes and colors worked like a breeze. But we quickly got performance problems when animating the transitions between different states of the network. So we had to look to an alternative technology which is performant enough and pragmatic to use. Reading a lot about Flex and its promising animation capabilities we gave it a try.
Shortly after we stumbled upon Flare/Prefuse and the stunning demos convinced us. It is very easy to use and gives remarkable results in a short time. We integrated it into our web app but suddenly the visualizations weren’t drawn upon startup. Everything went fine when the visualization was ready before the Flex app was fully completed but when the data was visualized after that the edges and nodes weren’t shown. Debugging our app and reading the Flex Sprite API wasn’t much helpful. But one comment in a superclass of all the Flare sprites solved out problem:

Internally, the DirtySprite class maintains a static list of all "dirty" sprites, and redraws each sprite in the list when a Event.RENDER event is issued. Typically, this process is performed automatically. In a few cases, erratic behavior has been observed due to a Flash Player bug that results in RENDER events not being properly issued. As a fallback, the static renderDirty() method can be invoked to manually force each dirty sprite to be redrawn.

These few cases were the default behaviour of our app, so invoking renderDirty solved all our drawing issues. I found no blog entry or hint in the docs and the demos run perfectly since all data is there before the app is shown. So the lesson here is:

  • retest your assumptions (we thought the Flare sprites were all DataSprites and the next superclass is the Flex API Sprite and forgot about the DirtySprite)
  • besides the demos and tutorials/docs read the API docs carefully, often there are implementation/technology specific hints

Software Craftsman Project Priority Survey

Answers to a question of project priorities from the upcoming book “Apprenticeship Patterns”.

apprenticeship-patters-coverThere is an upcoming and very promising book title written by Dave Hoover and Adewale Oshineye called “Apprenticeship Patterns: Guidance For The Aspiring Software Craftsman”.  It will cover all the basic rules you’ll need to become a Software Craftsman. This is a rather new term to describe professional software developers, eventually leading to the Software Craftsmanship Manifesto. The Manifesto itself reads like an addition to the Agile Manifesto:

As aspiring Software Craftsmen we are raising the bar of professional software development by practicing it and helping others learn the craft. Through this work we have come to value:

  • Not only working software, but also well-crafted software
  • Not only responding to change, but also steadily adding value
  • Not only individuals and interactions, but also a community of professionals
  • Not only customer collaboration,but also productive partnerships

That is, in pursuit of the items on the left we have found the items on the right to be indispensable.

© 2009, the undersigned. this statement may be freely copied in any form, but only in its entirety through this notice.

A very good question

When i read the blog of “Apprenticeship Patterns“, i noticed a very good question about project priorities:

Rank the following 3 project attributes in order of importance and explain why.

  • Test Coverage
  • Timely Delivery
  • Code Quality

This question really got me hooked, because there is no single valid answer, only personal statements about values.

An informal survey

I’m in the lucky position of meeting a lot of senior developers and a great number of software engineering students. So I instantly decided to perform a survey on this question and watch out for emerging answer patterns.

I gave each project attribute an unique letter, C for “Test Coverage”, D for “Timely Delivery” and Q for “Code Quality”. There are six possible answers, here are their rates in the survey (when 58 persons gave their answers):stats-all1

  • CDQ: 7 percent
  • CQD: 9 percent
  • DCQ: 5 percent
  • DQC: 7 percent
  • QCD: 41 percent
  • QDC: 31 percent

The vast majority of developers stated Code Quality as their highest goal. This isn’t very surprising to me, as most developers take pride in writing high quality code.

Comparing the answers

But what about the answers of only senior developers? Lets have a look at the numbers without student answers:stats-senior1

  • CDQ: 7 percent
  • CQD: 14 percent
  • DCQ: 7 percent
  • DQC: 14 percent
  • QCD: 21 percent
  • QDC: 36 percent

The big pattern still applies: Code Quality first. It’s amazing to see the other attributes gaining importance, though. To me, that’s a sign that code-centric thinking is one pattern of apprenticeship.

What’s not in the numbers

When i held the survey, the relevant group of people was gathered together, so a discussion of the results arose every time.  But the discussions followed different patterns:

  • The teams (of senior developers) gave very distinct answers while working on the same project. The answers were driven by personal conviction rather than project necessities.
  • The courses (of students) gave more similar answers while having a wide variety of backgrounds. The answers were mostly explained with current project necessities (like security-critical systems as reason for Test Coverage being most important).

When I have to compare the two groups, I tend to say that younger developers are more driven by extrinsic demands while more experienced developers act on their own internal values.

Our duty as Software Craftsman

In conclusion, I see a duty for experienced developers: to share their experience. Leading a discussion about “Team Values” at your current project is the least you can do. Helping others to develop their own set of internal values, even if it isn’t yours, seems crucial to me.

The upcoming “Apprenticeship Patterns” book and the brand new “97 Things Every Software Architect Should Know” are perfect starting points for this.

Make friends with your compiler

Suppose you are a C++ programmer on a project and you have the best intentions to write really good code. The one fellow that you better make friends with is your compiler. He will support you in your efforts whenever he can. Unless you don’t let him. One sure way to reject his help is to switch off all compiler warnings. I know it should be well-known by now that compiling at high warning levels is something to do always and anytime but it seems that many people just don’t do it.
Taking g++ as example, high warning levels do not mean just having “-Wall” switched on. Even if its name suggests otherwise, “-Wall” is just the minimum there. If you just spend like 5 minutes or so to look at the man page of g++ you find many many more helpful and reasonable -W… switches. For example (g++-4.3.2):


-Wctor-dtor-privacy: Warn when a class seems unusable because all the constructors or destructors in that class are private, and it has neither friends nor public static member functions.

Cool stuff! Let’s what else is there:


-Woverloaded-virtual: Warn when a function declaration hides virtual functions from a base class. Example:

class Base
{
public:
virtual void myFunction();
};

class Subclass : public Base
{
public:
void myFunction() const;
};

I would certainly like to be warned about that, but may be that’s just me.


-Weffc++: Warn about violations of the following style guidelines from Scott Meyers’ Effective C++ book

This is certainly one of the most “effective” weapons in your fight for bug-free software. It causes the compiler to spit out warnings about code issues that can lead to subtle and hard-to-find bugs but also about things that are considered good programming practice.

So suppose you read the g++ man page, you enabled all warning switches additional to “-Wall” that seem reasonable to you and you plan to compile your project cleanly without warnings. Unfortunately, chances are quite high that your effort will be instantly thwarted by third-party libraries that your code includes. Because even if your code is clean and shiny, header files of third-pary library may be not. Especially with “-Weffc++” this could result in a massive amount of warning messages in code that you have no control of. Even with the otherwise powerful, easy-to-use and supposedly mature Qt library you run into that problem. Compiling code that includes Qt headers like <QComboBox>with “-Weffc++” switched on is just unbearable.

Leaving aside the fact that my confidence in Qt has declined considerably since I noticed this, the question remains what to do ignore the shortcomings of other peoples code. With GCC you can for example add pragmas around offending includes as desribed here. Or you can create wrapper headers for third-party includes that contain


#pragma GCC system_header

AFAIK Microsoft’s compilers have similar pragmas:


#pragma warning(push, 1)
#include
#include
#pragma warning(pop)

Warning switches are a powerful tool to increase your code quality. You just have to use them!

Drawing your background with Javascript

Having complex backgrounds (like radial gradients) or even dynamic backgrounds in a web page is not an easy task using only images. Javascript helps here and the SVG/VML support in modern browsers makes drawing a reality. Looking around in the Javascript library world there are some which have APIs for crossbrowser drawing or charting. But if you are looking for a small and easy to learn on you have to search. We found raphaeljs, a project which started as a 20%er at Atlassian. It has an API which is close to SVG but also supports non SVG browsers like IE. How to make animations and complex drawings is easy to learn. The examples give a hint of what is possible.
We needed a radial gradient to have a smooth background, looking at the docs and the SVG spec, it was a four liner:

var Paper = Raphael("canvas", width, height);
var b = Paper.circle(width / 2, height / 2, Math.max(width, height));
var gradient = {type: "radial", dots: [{color: "#FFFFFF"}, {color: "#D5D5D5"}], vector: [0, 0, 0, "100%"]};
b.attr({"gradient": gradient});

About String Concatenation in Java or “don’t fear the +”

When it comes to string concatenation in Java many people have almost religious views about performance and style. Sadly, there are some misconceptions and misinformation especially about the performance bits. Many people think that concatenating many strings using + means expensive string copying each time and is thus slow as hell which is mostly wrong.

Justin Lee has a nice writeup of the most prominent concatenation options. But imho he misses out some things and his benchmark is a bit oversimplified although it does tell a true story. I assume that he followed at least the basic rules for performance measurement as his results suggest.

Now I want to try to clarify some points I think he missed and I find important:

  • Concatenation using + in one statement is actually compiled to the use of StringBuilder (at least for Sun Java6 compilers, where I checked it in the debugger, try it yourself!). So it’s no surprise that there is no difference between these two options in Justin’s benchmark.
  • It should be clear that the format variants have some overhead because they actually do more than just concatenate strings. There is at least some string parsing and copying involved so that these methods should be used for the cases where for example parameter reordering (think I18N) is needed or readability suffers using normal concatenation.
  • You have to pay attention when using + concatenation over the course of multiple statements because it then involves string copying. Consider the following code: Critical String Concatenation Here it really does make a difference which option you choose. The StringBuilder will perform far better for higher loop counts. We had a real world issue back some time with that when we used the Simple web framework for serving directory listing of several thousand files. The HTML-code was generated using a concatenatePlus()-style method and took like 40(!) seconds. After changing the code to the StringBuilder variant the page was served in sub-second time.

Whether you use + or StringBuilder is mostly a matter of taste and readability in many cases. When your string concatenation gets more complex you should really consider using StringBuilder as it is the safe bet.

On teaching software engineering

Be sure to include current topics in your lectures when teaching software engineering. Here are some hints.

overheadIn my rare spare time, I hold lectures on software engineering at the University of Cooperative Education in Karlsruhe. The topics range from evergreens like UML to modern subjects like aspect oriented programming (AOP) or Test Driven Development (TDD).

One thing I observe is that students don’t have difficulty separating the old topics from the current ones, even if they hear both of them for the first time. It seems that subject matter ages by itself, just like source code does. So, I’m constantly searching for new topics to include in the lectures, replacing the oldest ones.

Three things to include in your lectures

Some months ago, I read a very good blog post written by Alan Skorkin, titled “3 Things They Should Have Taught In My Computer Science Degree”. Alan covers three points:

  1. Open Source Development
  2. An Agile Process (e.g. XP, Scrum)
  3. Corporate Politics/Building Relationships

The idea of missed opportunities to tell some fundamentals to my students struck me. I compared my presentations to the list, finding the leading two topics covered to a great extent. The last one, corporate politics, is a bit off-topic for a technical lecture. But nevertheless, it’s too important to omit completely, so I already had included some Tom DeMarco lessons in my presentations. Perhaps I can build this part up a bit in the future.

What they should have taught me

Soon afterwards, I though about things my lecturers missed during my study. Here’s the list with only two points in addition to Alan’s list:

  • Age and “maturity” of topic: When I was a student, I quickly identified old topics, like my students do nowadays. What I couldn’t tell was if a topic was mature (a classic) or just deprecated. It would have helped to announce that a topic was necessary, but of little actual relevance in modern software development craftmanship. Or that a topic is academical news, but yet unheard of in the industry and lacking wide-spread acceptance. Both extremes were blended together in the presentations, creating an unique mixture of antiquated and futuristic approaches. This is a common problem of Advanced Beginners in the Dreyfus model.
  • Ergonomics and Effectiveness: I still can’t believe I didn’t hear a word about proper workplace setup from my teachers. I had courses teaching me how to learn, but not a single presentation that taught me how to work. This topic ranges from the right chair over lighting to screen size and quantity and could be skimmed over in less than an hour. But it doesn’t stop with the hardware. Entire books like Neal Ford’s “The Productive Programmer” cover the software side of effective workplace setup. And even further, the minimal set of tools a software developer should use (e.g. IDE, SCM, CI, issue tracker, Wiki) wasn’t even mentioned.

I hope to provide all these topics and information to my students in a recognizable (and rememberable) manner. They deserve to learn about the latest achievements in software engineering. Otherwise you aren’t prepared to work in an industry changing fundamentally every five to ten years. Of course, hearing about the classic stuff is necessary, too.

Give me feedback. What are your missed topics during apprenticeship, study or even work?

Update: In case you can’t visit my lectures but want to know a bit about ergonomics, I’ve written two blog articles on this topic:

Lightweight dependency management

Managing project dependencies without maven or ant ivy, using a custom ant task to ensure classpath orthogonality.

Java’s classpath is a powerful concept – when used appropriate. As your project grows larger in terms of code and people, it gets harder to ensure that your classpath is correct. A great danger arises from JAR files containing different versions of the same resource. You might end up running different code than you think, leading to strange effects. If you build your classpath using wildcards, you can’t even control the order your JAR files are loaded.

Managing dependencies

To avoid the issues mentioned above, you need to manage your project dependencies. It’s a common practice to implement the build process of the project using maven or ant ivy. Both tools provide dependency mangement by declaration. But at a high cost. Especially maven has received some malice lately, criticizing its steep learning curve and complexity.

Scratching the biggest itch

We decided to try a different approach to dependency management, tackling only our biggest concern: The duplication of classpath resources. We take care of the scope of a third-party library, put required JARs in the repository (to us, third party binary artifacts are part of the project source) and update manually. The one thing we cannot assure manually is that every resource is unique. Sometimes, the same class is included in different JARs, as it seems to be common practice among java web frameworks.

Ant to the rescue

Thus, I wrote a custom ant task that, given the classpath, checks for duplicate entries. If it finds one, it lists the culprits and optionally aborts the build process. Included in our continuous integration system, it gets run every time somebody performs a change. You can’t forget to delete an old version of a library or check in the same library twice without breaking the build now.

Our ClasspathCollisionCheckTask

I provide this task here, without any warranty. The source code is included in the JAR alongside the classes, if you want to know what it does exactly.

Assuming you already know how to use custom tasks within an ant build script, here’s only a short usage description.

Import the custom task:

<taskdef
    name="check.collision"
    classname="com.schneide.internal.anttask.ClasspathCollisionCheckTask"
    classpath="${customtasks.library.directory}/schneidetasks.jar"
/>

Next, use it on your classpath:

<check.collision verbose="true" failOnCollision="true">
    <path>
        <fileset dir="${classpath.library.directory}">
            <include name="**/*.jar"/>
        </fileset>
        <fileset dir="${internal.library.directory}">
            <include name="**/*.jar"/>
        </fileset>
    </path>
</check.collision>

The task scans the whole path you give it and reports any collision it detects. You will see the warnings in your build log.

If the failOnCollision parameter is set to true (optional, defaults to false), the build will abort after a collision. If you want to have debug information, set the verbose parameter to true (optional, defaults to false).

Conclusion

If you manage your project dependencies manually, you might find our custom ant task useful. If you use maven or ant ivy, you already have this functionality in your build process.

Feedback

I’m very interested in hearing your opinion on the task or about your way of handling dependencies. Leave us a comment.

Industry Standard C++

The other day I was browsing through the C++ API code of a third-party library. I was not much surprised to see stuff like

#define MAX(a, b) ( (a) >= (b) ? (a) : (b))
#define MIN(a, b) ( (a) <= (b) ? (a) : (b))

because despite the fact that std::min, std::max together with the rest of the C++ standard library is around for quite a while now, you still come across old fashioned code like above frequently. But things got worse:

#define FALSE 0
#define TRUE 1

and later:

...
bool someVariable = TRUE;

As if they learned only half the story about the bool type in C++. But there was more to come:

class ListItem
{
   ListItem* next;
   ListItem* previous;
   ...
};

class List : private ListItem
{
...
};

Yes, that’s right, the API guys created their own linked-list implementation. And a pretty weird one, too, mixing templates with void* pointers to hold the contents. Now, why on earth would you do that when you could just use std::list or std::vector? Makes you wonder about the quality of the rest of the code. Especially with C++ where there are so many little pitfalls and details which can burn you. Hey, if you have no clue about the very basics of a language, leave it alone!

Unfortunately, the above example is not exceptional in industry software. It seems that the C++ world these days is actually split into two worlds. In one, people like Andrei Alexandrescu write great books about Modern C++ design, Scott Meyers gives talks about Effective C++ and the boost guys introduce the next library using even more creative operator overloading that in the spirit library (which is pretty cool stuff, btw).

In the other world, you could easily call it industry reality, people barely know the STL, don’t use templates at all, or fall for misleading and dangerous c++ features like the throw() clause in method signatures. Or they ban certain c++ features because they are supposedly not easy to understand for the new guy on the project or are less readable in general. Take for example the Google C++ Style Guide. They don’t even allow exceptions, or the use of std::auto_ptr. Their take on the boost library is that “some of the libraries encourage … an excessively “functional” style of programming”. What exactly is bad about piece of functional programming used as the right tool in the right place? And what communicates ownership issues better than e.g. returning a heap allocated object using a std::auto_ptr?

The no-exceptions rule is also only partly understandable. Sure enough, exceptions increase code complexity in C++ more than in other languages (read Items 18 and 19 of Herb Sutter’s Exceptional C++ as an eye-opener. Or look here). But IMHO their advantages still outweigh their downsides.

With the upcoming new C++0X standard my guess is that the situation will not get any better, to put it mildly. Most likely, things like type inference with the new auto keyword will sell big because they save typing effort. Same thing with the long overdue feature of constructor delegation. But why would people who find functional programming less readable start to use lambda functions? As little known as the explicit keyword is now, how many people will know about or actually use the new “= delete” keyword, let alone “= default“? Maybe I’m a little too pessimistic here but I will certainly put a mark in my calender on the day I encounter the first concept definition in some piece of industry C++ software.

Update: Concepts have been removed from C++0X so that mark in my calender will not come any time soon…

Creative Wordle usage

Wordle used to summarize conference programs and analyze java projects.

Many of you may have stumbled over Wordle like I did some time ago. It is a nice little tool (implemented as a Java applet) that creates nice word clouds out of some arbitrary text pasted directly into the browser or provided by an URL. Since then I have found some nice, interesting and creative uses for it.

  • schneideblogwordleSchneide Blog Wordle First a Wordle-cloud of our blog frontpage. The left image shows that we are currently talking a lot about projects and our approach to blogging. Compare that with the cloud from a few weeks ago where listener structures and memory management were hot topics.
  • The guys over at EclipseSource ran Wordle over the EclipseCon program to give a quick overview what this conference is all about.
  • Daan analyzed the class naming of popular OpenSource projects and put the very interesting results on his blog.

I quickly hacked something similar together and ran it over two of our projects. The interesting thing is that you can actually get an impression what the projects are about. Let’s take a look at it:

NPA (Nano Particle Analyzer)

NPA Wordle
This seems to be a project where everything is about measuring something. We can see that there is need for calibration and that energy and data play a major role here. We can even spot a laser (try to find it!) out there which is an important part of the whole system.

Ramses

Ramses Wordle
This project seems to be very abstract and generic but there are some concrete pointers to what’s going on here too. It works together with some spectrometry hardware via Genie2k, with a Delphin box and some camera. There seems to be an appointment management integrated, i18n support ready and some more obscure things like fesas.

If you have other cool ideas how to use Wordle I would be glad to hear about them.