You probably forget too much, too soon and way too definitively

Your stored data is probably worth a lot. To rule out accidental removal, you should prohibit the delete operation for your application. Here’s why and how you can implement it.

If you felt addressed by the blog title, you can relax a bit: I didn’t mean you, but your application. And I don’t suggest that your application forgets things; rather, it removes them deliberately. My point is that it shouldn’t be able to do so.

A disaster waiting to happen

Try to imagine a child that is given a pair of sharp scissors to play with by its parents. The child runs around the house, scissors in hand, cutting away at things here and there. Inevitably, as mandated by Murphy’s law, it will stumble and fall, probably hurting itself in the process. This scenario is a disaster waiting to happen. It is a perfect analogy for your application as long as it is able to perform the delete operation.

A safe environment

Now imagine an application that is forbidden to delete data: the database user used by the program is not allowed to issue the SQL DELETE command. This is the child from the analogy, but with the scissors taken away. It will still leave a mess behind while playing that an adult needs to clean up periodically, and it will still fall down, but it won’t stab itself. If you can run your application in such a restricted environment, you can guarantee “no-vanish” data safety: no data element that is stored will ever disappear.

Data safety

In case you wonder: this isn’t the maximum data safety level you can (and perhaps should) have. There is still the danger of accidental alteration, where existing data is replaced or overwritten by other data. To achieve the highest, “no-loss” data safety level, you need a journaling database system that tracks every change ever made in an ever-growing transaction log. If that sounds just like a version control system to you, it’s probably because it essentially will be one.
But “no-vanish” data safety is the first and most important step on the data safety ladder. And it’s easy to accomplish if you incorporate it into your system right from the start, implementing everything around the concept that no deletion will occur on behalf of the system.

Why data safety?

But why should you attempt to adhere to such a restriction? The short answer is: because the data in your system is worth it. We recently determined the immediate monetary worth of primary data entries in one of our systems and found that every entry is worth several thousand euros. And we have several hundred entries in this system alone. So accidentally deleting two or three entries in this system is equivalent to wrecking your car. Who wouldn’t buy a car whose manufacturer guarantees that wrecks cannot happen, by design? That’s what data safety tries to achieve: giving a guarantee that no matter how badly the developers wreck their code, the data will not be affected (at least to a degree, depending on the safety level).

No deletion

The best way to give a guarantee and keep it is to eliminate the root cause of all risks. In our digital world, this is surprisingly easy to accomplish in theory: if you don’t want to lose data (by accidental removal), prohibit usage of the delete operation on the lowest layer (probably the database). If your developers still try to delete things, they only get some kind of runtime error and their application will likely crash, but the data remains intact.
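
To make this concrete, here is a minimal sketch of such a lockdown, assuming a PostgreSQL database and made-up names (mydb, app_user) for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// One-time setup, run by an administrator: the application's database user
// may read, insert and update, but is never granted DELETE.
public class NoVanishSetup {
  public static void main(String[] args) throws Exception {
    try (Connection admin = DriverManager.getConnection(
        "jdbc:postgresql://localhost/mydb", "admin", "secret");
        Statement ddl = admin.createStatement()) {
      ddl.execute("GRANT SELECT, INSERT, UPDATE"
          + " ON ALL TABLES IN SCHEMA public TO app_user");
      // make doubly sure DELETE is gone, even if it was granted before
      ddl.execute("REVOKE DELETE ON ALL TABLES IN SCHEMA public FROM app_user");
    }
  }
}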

Implementation using an RDBMS

If you are using a relational database system, you should be aware that “no-vanish” safety comes at a cost. Every time you fetch a list of something from your database, you need to add the constraint that the result must only contain “non-obsolete” entries. Every one of your main tables will gain some sort of “obsolete” column, housing a boolean flag that indicates that the row was marked as deleted by the application. Remember: you cannot delete a row, you can only mark it as deleted using your own mechanisms.
The tables in your database holding “derived data”, like join tables or data only referenced by foreign keys, don’t necessarily need the obsolete flag, but will clutter up over time. That’s not a problem as long as nobody thinks that all entries in these tables are “living data”. The entries that are referenced by obsolete entries in the main tables will simply be forgotten because they are inaccessible by normal means of data retrieval.
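
A minimal sketch of how this looks in application code, assuming a persons table with an id and an obsolete column (names invented) and plain JDBC:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class PersonRepository {
  private final Connection connection;

  PersonRepository(Connection connection) {
    this.connection = connection;
  }

  // "deleting" a person only flips the flag; the row never vanishes
  void markAsDeleted(long id) throws SQLException {
    try (PreparedStatement update = connection.prepareStatement(
        "UPDATE persons SET obsolete = TRUE WHERE id = ?")) {
      update.setLong(1, id);
      update.executeUpdate();
    }
  }

  // every query has to remember the obsolete constraint
  List<Long> livingPersonIds() throws SQLException {
    List<Long> ids = new ArrayList<>();
    try (PreparedStatement query = connection.prepareStatement(
        "SELECT id FROM persons WHERE obsolete = FALSE");
        ResultSet rows = query.executeQuery()) {
      while (rows.next()) {
        ids.add(rows.getLong("id"));
      }
    }
    return ids;
  }
}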

The dedicated hitman

Using this approach, your database size will grow constantly and never shrink. There will come a moment when you really want to compact the whole thing and weed out the unused data. Don’t give the delete right back to your application’s database user! You basically don’t trust your application with this (remember the child analogy). You want this job done by a professional. You need a dedicated hitman. Create a second database user that doesn’t have the right to alter the data, but can delete it. Now run a separate job (another application) on your database within this new context and remove everything you want removed. The key here is to separate your normal application and the removal task as far as possible to prevent accidental usage. If you think you’ve heard this concept somewhere before: it basically boils down to a “garbage collector”. The term “hitman” is just a dramatization of the bleak reality, trying to remind you to be very careful about which data you really want to assassinate.
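
Here is a sketch of what such a hitman job could look like, again with made-up names; the crucial point is that only this second user (here: reaper) ever holds the DELETE right, and that the job runs as a separate application:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// runs as a separate scheduled job, never as part of the application
public class ObsoleteDataReaper {
  public static void main(String[] args) throws Exception {
    try (Connection reaper = DriverManager.getConnection(
        "jdbc:postgresql://localhost/mydb", "reaper", "secret");
        Statement cleanup = reaper.createStatement()) {
      // removes only what the application itself marked as obsolete
      int removed = cleanup.executeUpdate(
          "DELETE FROM persons WHERE obsolete = TRUE");
      System.out.println("reaped " + removed + " obsolete rows");
    }
  }
}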

Implementation using a graph database

If this seems like a lot of effort to you, perhaps the approach used by graph databases suits you better. In a graph database, you group your data using “graph connections” or “edges”. If you have a node “persons” in your database, every person node in the database will be associated with this node using such a connection. If you want to remove a person node (without throwing the data away), you remove the edge to the “persons” node and add a new edge to the “deleted_persons” node. You can probably see how this will be easier to handle in your application code than ensuring that the obsolete flag is considered everywhere.
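
Here is a toy in-memory sketch of that idea in Java, with no particular graph database product implied: “deleting” a person only moves its edge from the “persons” group node to the “deleted_persons” group node.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class GraphSoftDelete {
  // maps a group node to the nodes connected to it by an edge
  private final Map<String, Set<String>> edges = new HashMap<>();

  void connect(String groupNode, String node) {
    Set<String> members = edges.get(groupNode);
    if (members == null) {
      members = new HashSet<>();
      edges.put(groupNode, members);
    }
    members.add(node);
  }

  // no data is thrown away, only the edge is moved
  void markAsDeleted(String personNode) {
    Set<String> living = edges.get("persons");
    if (living != null) {
      living.remove(personNode);
    }
    connect("deleted_persons", personNode);
  }

  Set<String> livingPersons() {
    Set<String> living = edges.get("persons");
    return living == null ? new HashSet<String>() : living;
  }
}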

Conclusion

This isn’t a bashing of relational database systems, and it isn’t praise of graph databases. In fact, the concept of “no deletion” is agnostic towards your actual persistence technology. It’s a requirement to ensure a basic data safety level and a great way to guarantee to your customers that their business assets are safe with your system.

If you have thoughts on this topic, don’t hesitate to share them!

Impressions from Java Forum Stuttgart 2013

Java Forum Stuttgart (JFS) is a yearly Java-focused conference primarily visited by developers. The conference lasts for a day, offering 45-minute talks plus some time in between for discussions. This was my second visit and I am happy to tell you about my impressions.

vert.x: polyglot – modular – asynchronous

Speaker: Eberhard Wolff from adesso AG

This was my first stop. The topic seemed interesting because at Softwareschneiderei we use a mix of different languages and frameworks for our projects, so learning about a new framework was a welcome opportunity. vert.x runs on a Java VM and can be written in a mixable variety of languages like Java, JavaScript or Groovy. The main part of the presentation consisted of examples in Java and JavaScript showing the asynchronous features and the communication between different components. Judging by the feature set and the questions asked, this seems to be a framework that offers Java developers a smooth transition from the synchronous world to the event-based asynchronous world. Compared to NodeJS, vert.x is currently a small project containing only a handful of modules.

Java 8 innovations

Speaker: Michael Wiedeking from MATHEMA Software GmbH

This one is somewhat special: after thousands of blog entries, presentations and the like, there is only a marginal chance to get fresh news about Java features. The speaker knew this and spiced the presentation up with some jokes while showing increasingly complex code samples. Exactly what I had hoped for: reading code and having fun.

Static code analysis as a quality measurement?

Speaker: Dr. Karl-Heinz Wichert from iteratec GmbH

We are using Grails in some of our projects. Like any other highly dynamic language, Grails suffers from its strength: the weak type system. Without acceptance test support it is hard to verify whether a given piece of code is correct or not. My hope was to hear something about new trends in static analysis that would allow me to detect simple errors faster, without firing up the system. This was the biggest mismatch between imagination and reality that day. The speaker presented the reasons for using static code analysis and its current shortcomings, like the inability to verify that comments match the code they describe, or the inability to detect overly complex implementations of a simple algorithm. An interesting statement was that static analysis fails if not every aspect is checked, the reason being that developers optimize the code against the measured criteria while neglecting other aspects. From my point of view this is not a shortcoming of static analysis, but of the way people use it. It is measurable that a maintainable product has proportionally more readable variable names than an unmaintainable one, but it is not necessarily true that your product becomes maintainable when you rename all your variables. All in all: the speaker managed to motivate me to look for holes in his argumentation and thus to actively think about the topic.

Enterprise portals with Grails: does it work?

Speakers: Tobias Kraft and Manuel Breitfeld from exentio GmbH

Like the previous presentation, this one attracted me because of the Grails context. Additionally, because of the title, I was hoping for a nice description of pitfalls they encountered while building their portals. One part of the presentation described the portal they built and the requirements it has to fulfill. Another part described the Grails platform. They use Grails to deliver snippets for their portal, which is organized as a collection of such snippets. Very valuable was the part about the problems one can encounter when using Grails, where they honestly admitted that the migration from Grails 1.3.7 to 2.x cost them some time. To detect regressions during platform upgrades they recommended putting extra effort into tests.

Car2Car systems – Java and Peer2Peer move into the car

Speaker: Adam Kovacs from msg-systems

After the impressive lunch break my brain waves almost reached zero. The program brochure offered no interesting titles for the next round, so I went for the least common topic. The presentation turned out to be a lucky find. The speaker managed to keep the right level of detail, without diving too deep or merely scratching the surface. He described how Chord, an implementation of a distributed hash table, can be used to share locally relevant traffic data like traffic jams or accidents. To increase the stability and security of the network he introduced the use of existing transmitting stations and certificate authorities.

Kotlin

Speaker: Dr Ralph Guderlei from eXXcellent Solutions

There were two reasons I wanted to visit this presentation. The first: my colleague had already shown me some features of this language. The second: the language is developed by the company that also develops IntelliJ IDEA. Good IDE support is practically built in, isn’t it? The presentation covered syntax, lambdas, type inference, extension methods and how Kotlin handles null references. It looks like Kotlin is going to become something like an improved Java. I hope for the best.

Enterprise Integration Patterns

Speaker: Alexander Heusingfeld from First Point IT Solutions

Another presentation selected because of the least boring-sounding title, and another success. In the first minutes I expected an endless enumeration of commonly known patterns, but this was only true for the first minutes. The topic quickly shifted to asynchronous messaging and increasingly complex patterns to handle it. As two frameworks with a similar range of functions he presented Apache Camel and Spring Integration.

Bottom line

The event was, as always, fun. Unfortunately it was not possible to visit more presentations due to their “parallel execution”. Have you been at JFS too and want to share your impressions about these or other presentations? Post a comment!

Communication Through Code

In a previous post my colleague described our experiment on our ability to transfer the intention of code through tests. The tests describe how the code behaves when called from the outside. An additional approach is to communicate through the code itself.

To understand the code, at least the following two questions have to be answered:

  • How does the code work?
  • What is the reason behind the way the code is implemented?

Challenge

As long as the code is readable, it is possible to deduce its meaning. Improving readability is a common technique to help the reader. This includes using descriptive names, reducing complexity or hiding implementation details until they are absolutely necessary to understand the problem.
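
A tiny invented example: the second version of the same computation answers the “how” question almost by itself.

class Discount {
  // hard to read: what do d, a and b stand for?
  int d(int a, int b) {
    return a * b / 100;
  }

  // the same computation with descriptive names
  int discountInCents(int priceInCents, int discountPercent) {
    return priceInCents * discountPercent / 100;
  }
}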

On the other hand, deducing why exactly this implementation was chosen is an impossible task without the combined knowledge (or lack thereof) of all implementors. One of the missing parts is the assumptions. Our code is full of them. Consider the following example:

void print(char* text)
{
  printf("program says %s", text);
}

In this function the writer assumes that:

  • the text is a valid pointer
  • the text is zero terminated
  • this program can write to stdout, i.e. is a console app
  • the reader speaks English

Or something nastier:

void* allocateBuffer(size_t size)
{
  void* buffer = malloc(size);
  if (!buffer) {
    printf("expect a segmentation fault!");
  }
  return buffer;
}

Here the writer assumes that malloc always returns either NULL or a pointer to dereferenceable memory. That is not always the case:

If size is zero, the return value depends on the particular library implementation (it may or may not be a null pointer), but the returned pointer shall not be dereferenced.

Assumptions not explicitly written down in the code sooner or later lead to hard-to-discover bugs.

Solution approaches

Comments are the quick and dirty way of writing down assumptions. They are the easiest to read, but they are never enforced and tend to diverge from the code with every edit made to it. However, it is better to read “should never come here” and hear the alarm bells ringing than to see nothing but whitespace.

Some of the assumptions can be documented and verified through tests, with varying levels of detail. Unit tests are most efficient for assumptions with little or no context, like verifying that only non-NULL pointers are passed to a function. For more global assumptions, integration or acceptance tests can be used. Together they ensure that no changes to the codebase break the assumptions made earlier. The drawback of unit tests is that they are locally decoupled from the code under test, forcing the reader to gather the information by searching for direct or indirect references to it.

When new code is written, assertions help to document how the API is meant to be used. Since they are executed not only during the test phase, they can also capture wrong assumptions the authors made about the runtime environment. But writing down every possible assumption quickly clutters the code with repeated statements like “assume pointer x is not NULL”, reducing the readability and usefulness of this technique.
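
A small Java sketch of the technique, mirroring the print() example from above (the class name is invented); the assumption becomes explicit and executable once assertions are enabled with the JVM flag -ea:

public class Printer {
  public void print(String text) {
    // the written-down assumption, enforced at runtime when -ea is set
    assert text != null : "caller must pass a non-null text";
    System.out.println("program says " + text);
  }
}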

Conclusion

None of the approaches shown here is new. Each one has an aspect it excels at, so to get the most information out of the code, they all have to be used. Their domains overlap partially, so it is possible to choose the approach depending on the situation, e.g. replacing assertions with unit tests for time-critical code. One niche currently not filled by any of them is the description of global assumptions like the cultural background of the users.

Composite comparators in Java

Some time ago a fellow developer wrote a really comprehensive blog post (unfortunately only available in German) about comparator implementations in Java. More specifically, it is about composite comparators used to compare entities according to different attributes. The best solution for Java 7 he comes up with is a comparator

class FoobarComparator implements Comparator<Foobar> {
  @Override
  public int compare(Foobar lhs, Foobar rhs) {
    return compoundCompare(
      lhs.getLastName().compareTo(rhs.getLastName()),
      lhs.getFirstName().compareTo(rhs.getFirstName()),
      lhs.getPlaceOfBirth().compareTo(rhs.getPlaceOfBirth()),
      lhs.getDateOfBirth().compareTo(rhs.getDateOfBirth()));
  }
}

with a reusable compoundCompare() method

// utility method used with static import
int compoundCompare(int... results) {
  for (int result : results) {
    if (result != 0) {
      return result;
    }
  }
  return 0;
}

While this solution is quite clean and a vast improvement over the criticized implementations, it has the flaw that it eagerly evaluates all attributes even though short-circuiting may be possible for many entities. This may lead to performance problems in some cases. So he goes on to explain how Java 8 will fix this problem with lambdas, or with another solution he calls “KeyMethodComparator”.

Now I want to show you an implementation very similar to his approach above, but without the performance penalty and possible in Java 7, using the composite pattern:

import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class FoobarComparator implements Comparator<Foobar> {

  private List<Comparator<Foobar>> defaultFoobarComparison =
    Arrays.<Comparator<Foobar>>asList(
      new Comparator<Foobar>() {
        @Override
        public int compare(Foobar lhs, Foobar rhs) {
          return lhs.getLastName().compareTo(rhs.getLastName());
        }
      },
      new Comparator<Foobar>() {
        @Override
        public int compare(Foobar lhs, Foobar rhs) {
          return lhs.getFirstName().compareTo(rhs.getFirstName());
        }
      },
      new Comparator<Foobar>() {
        @Override
        public int compare(Foobar lhs, Foobar rhs) {
          return lhs.getPlaceOfBirth().compareTo(rhs.getPlaceOfBirth());
        }
      },
      new Comparator<Foobar>() {
        @Override
        public int compare(Foobar lhs, Foobar rhs) {
          return lhs.getDateOfBirth().compareTo(rhs.getDateOfBirth());
        }
      });

  @Override
  public int compare(Foobar lhs, Foobar rhs) {
    for (Comparator<Foobar> comp : defaultFoobarComparison) {
      int result = comp.compare(lhs, rhs);
      if (result != 0) {
        return result;
      }
    }
    return 0;
  }
}

It features the lazy evaluation demanded by my fellow developer for performance reasons and allows flexible construction of different composite comparators if you, e.g., add a constructor accepting a list of comparators (see the sketch below).
IMHO, it is a quite elegant solution using standard object-oriented programming, available in Java today and not only in the future.
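
Such a constructor could look like this; a sketch of a generic variant of the class above:

import java.util.Comparator;
import java.util.List;

class CompositeComparator<T> implements Comparator<T> {
  private final List<Comparator<T>> comparators;

  // the comparison chain is assembled by the caller
  CompositeComparator(List<Comparator<T>> comparators) {
    this.comparators = comparators;
  }

  @Override
  public int compare(T lhs, T rhs) {
    for (Comparator<T> comparator : comparators) {
      int result = comparator.compare(lhs, rhs);
      if (result != 0) {
        return result;
      }
    }
    return 0;
  }
}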

Test your migrations

Do you trust your database migrations?

An evolving project that changes its persistent data structures can require a transformation of already existing content into the new form. To achieve this goal in our Grails projects, we use the Grails database migration plugin. This plugin allows us to apply changesets to the database and keep track of its current state.

The syntax of the DSL for Groovy database migrations is easy to read. This can trick you into the assumption that everything that looks good, compiles and runs without errors is OK. Of course it is not. Here is an example:

changeSet(author: 'vasili', id: 'copies messages to archive') {
  grailsChange {
    change {
      sql.eachRow("SELECT MESSAGES.ID, MESSAGES.CONTENT, "
                + "MESSAGES.DATE_SENT FROM MESSAGES WHERE "
                + "MESSAGES.DATE_SENT > to_date('2011-01-01 00:00', "
                + "'YYYY-MM-DD HH24:MI:SS')") { row ->
        sql.execute("INSERT INTO MESSAGES_ARCHIVE(ID, CONTENT, DATE_SENT) "
                  + "VALUES(${row.id}, ${row.content}, ${row.date_sent})")
      }
    }
    change {
      sql.eachRow("SELECT MESSAGES.ID, MESSAGES.CONTENT, "
                + "MESSAGES.DATE_SENT FROM MESSAGES WHERE "
                + "MESSAGES.DATE_SENT > to_date('2012-01-01 00:00', "
                + "'YYYY-MM-DD HH24:MI:SS')") { row ->
        sql.execute("INSERT INTO MESSAGES_ARCHIVE(ID, CONTENT, DATE_SENT) "
                + "VALUES(${row.id}, ${row.content}, ${row.date_sent})")
      }
    }
  }
}

Here you see two change closures that differ only by the year in the SQL where clause. What do you think will happen to your database when this migration is applied? The answer is: only changes from the year 2012 will be found in the destination table. The assumption that, because one change closure works inside a grailsChange block, two of them will work as well is, while compilable and runnable, wrong. Looking at the documentation you will see that it shows only one change block in the example code. When you divide the migration into multiple changesets, each of them containing its own single change, everything works as expected.

Currently there is no safety net like unit tests for database migrations. Every assumption you make must be tested manually with some dummy test data.
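
Until such a safety net exists, the manual check can at least be scripted. Here is a sketch in plain JDBC (table names taken from the example above, the date literal is illustrative): count what should have been copied and compare it with what actually arrived.

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class MigrationCheck {
  static void verifyArchiveCopy(Connection connection) throws SQLException {
    try (Statement check = connection.createStatement()) {
      ResultSet expected = check.executeQuery(
          "SELECT COUNT(*) FROM MESSAGES WHERE DATE_SENT > DATE '2011-01-01'");
      expected.next();
      long shouldBeCopied = expected.getLong(1);

      ResultSet actual = check.executeQuery(
          "SELECT COUNT(*) FROM MESSAGES_ARCHIVE");
      actual.next();
      long wereCopied = actual.getLong(1);

      if (shouldBeCopied != wereCopied) {
        throw new AssertionError("migration lost "
            + (shouldBeCopied - wereCopied) + " rows");
      }
    }
  }
}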

Building RPM packages of SCons-based projects

Easy delivery and installation of a project helps massively with user acceptance. Just take a look at all the app stores and user-friendly package managers. For quite a few of our Linux-specific projects we build RPM packages using a build farm and the Jenkins continuous integration (CI) server. Sometimes we have to package dependencies which are not available for the used distributions. A few days ago we packaged some projects that were using the SCons build system. Using SCons is quite simple, but there is one caveat to make it work nicely with rpmbuild: you have to fiddle with the installation prefixes. Let’s have a look at the build and install stanzas of the SPEC file:

# build stanza
%build
scons PREFIX=/usr LIBDIR=%_libdir all

# install stanza
%install
rm -rf $RPM_BUILD_ROOT
scons PREFIX=/usr LIBDIR=%_libdir install --install-sandbox="$RPM_BUILD_ROOT"

The two crucial parts here are:

  1. Setting the correct prefixes in build and install, because the build may bake configured paths into the artifacts, and these have to match the situation of the installed result.
  2. The --install-sandbox command line switch which tells SCons to install everything under the specified location instead of directly to the system. This allows rpmbuild to put the artifacts into the package using the correct layout.

Using the above advice it should be quite easy to build nicely working RPM packages out of projects using SCons.

The difference between Test First and Test Driven Development

If you tend to fall into the “one-two-everything” trap while doing TDD, this blog post might give you some perspective on what really happens and how to avoid the trap by decomposing the problem.

The concept of Test First (“TF”: write a failing test first and make it green by writing exactly enough production code to do so) was always very appealing to me. But I’ve never experienced the guiding effect that is described for Test Driven Development (“TDD”: apply a Test First approach to a problem using baby steps, letting the tests drive the production code). This led to quite some frustration and scepticism on my side. After a lot of attempts and training sessions with experienced TDD practitioners, I concluded that while I had grasped Test First and could apply it to everyday tasks, I wasn’t able to incorporate TDD into my process toolbox. My biggest grievance was that I couldn’t even tell why TDD failed for me.

The bad news is that TDD still lies outside my normal toolbox. The good news is that I can now pinpoint a specific area where I need training in order to learn TDD properly. This blog post is the story of my revelation. I hope that you can gather some ideas for your own progress, provided that you’re no TDD master either.

A simple training session

In order to learn TDD, I always look for fitting problems to apply it to. While developing a repository difference tracker, the Diffibrillator, there was a neat little task: ordering the entries of several lists of commits into a single, chronologically ordered list. I postponed the implementation of the needed algorithm for a TDD session in a relaxed environment. My mind began to spawn background processes about possible solutions. When I finally had a chance to start my session, one solution had already crystallized in my imagination:

An elegant solution

Given several input lists of commits, now used as queues, and one result list that is initially empty, repeat the following step until no more commits are pending in any input queue: compare the head commits of all input queues by their commit date and remove the newest one, adding it to the result list.
I nicknamed this approach the “PEZ algorithm” because each commit list acts like the old PEZ candy dispensers of my childhood, always giving out the topmost sherbet brick when asked for.
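
For illustration, here is a rough sketch of the PEZ idea in Java (my reconstruction, not the project’s actual 130-line implementation; the Commit interface is reduced to its date):

import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Queue;

interface Commit {
  Date getDate();
}

class PezMerge {
  // queues are expected to be ordered newest-first, like the branches below
  static List<Commit> merge(List<Queue<Commit>> queues) {
    List<Commit> result = new ArrayList<>();
    while (true) {
      Queue<Commit> best = null;
      for (Queue<Commit> queue : queues) {
        if (queue.isEmpty()) {
          continue;
        }
        // strict "after" keeps the first queue on ties (secondary criterion)
        if (best == null || queue.peek().getDate().after(best.peek().getDate())) {
          best = queue;
        }
      }
      if (best == null) {
        return result; // all queues are drained
      }
      result.add(best.poll());
    }
  }
}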

A Test First approach

Trying to break the problem down into baby-stepped unit tests, I fell into the “one-two-everything” trap once again. See for yourself which tests I wrote:

@Test
public void emptyIteratorWhenNoBranchesGiven() throws Exception {
  Iterable<ProjectBranch> noBranches = new EmptyIterable<>();
  Iterable<Commit> commits = CombineCommits.from(noBranches).byCommitDate();
  assertThat(commits, is(emptyIterable()));
}

The first test only prepares the class’s interface, naming the methods and trying to establish a fluent coding style.

@Test
public void commitsOfBranchIfOnlyOneGiven() throws Exception {
  final Commit firstCommit = commitAt(10L);
  final Commit secondCommit = commitAt(20L);
  final ProjectBranch branch = branchWith(secondCommit, firstCommit);
  Iterable<Commit> commits = CombineCommits.from(branch).byCommitDate();
  assertThat(commits, contains(secondCommit, firstCommit));
}

The second test was the inevitable “simple and dumb” starting point for a journey (hopefully) led by the tests. It didn’t lead to any meaningful production code. Obviously, a bigger test scenario was needed:

@Test
public void commitsOfSeveralBranchesInChronologicalOrder() throws Exception {
  final Commit commitA_1 = commitAt(10L);
  final Commit commitB_2 = commitAt(20L);
  final Commit commitA_3 = commitAt(30L);
  final Commit commitA_4 = commitAt(40L);
  final Commit commitB_5 = commitAt(50L);
  final Commit commitA_6 = commitAt(60L);
  final ProjectBranch branchA = branchWith(commitA_6, commitA_4, commitA_3, commitA_1);
  final ProjectBranch branchB = branchWith(commitB_5, commitB_2);
  Iterable<Commit> commits = CombineCommits.from(branchA, branchB).byCommitDate();
  assertThat(commits, contains(commitA_6, commitB_5, commitA_4, commitA_3, commitB_2, commitA_1));
}

Now we are talking! If you give the CombineCommits class two branches with intertwined commit dates, the result will be a chronologically ordered collection. The only problem with this test? It needed the complete 100 lines of algorithm code to become green. There it is: the “one-two-everything” trap. The first two tests are merely finger exercises that don’t assert very much. Usually the third test is the last one to be written for a long time, because it requires a lot of work on the production side of the code. After this test, the implementation is mostly complete, with 130 lines of production code and a line coverage of nearly 98%. There wasn’t much guidance from the tests; it was more of a “holding back until a test allows for the whole thing to be written”. Emotionally, the tests only hindered me from jotting down the algorithm I had already envisioned, and when I finally got permission to “show off”, I dived into the production code and only returned when the whole thing was finished. A lot of ego went into those lines, but I didn’t realize it right away.

But wait, there is a detail left out from the test above that needs to be explicitly specified: if two commits happen at the same time, there should be a defined behaviour for the combiner. I declare that the order of the input queues is used as a secondary ordering criterion:

@Test
public void decidesForFirstBranchIfCommitsAtSameDate() throws Exception {
  final Commit commitA_1 = commitAt(10L);
  final Commit commitB_2 = commitAt(10L);
  final Commit commitA_3 = commitAt(20L);
  final ProjectBranch branchA = branchWith(commitA_3, commitA_1);
  final ProjectBranch branchB = branchWith(commitB_2);
  Iterable<Commit> commits = CombineCommits.from(branchA, branchB).byCommitDate();
  assertThat(commits, contains(commitA_3, commitA_1, commitB_2));
}

This test didn’t improve the line coverage and was green right from the start, because the implementation already acted as required. There was no guidance in this test, only assurance.

And that was my session: the four unit tests cover the anticipated algorithm completely, but didn’t provide any guidance that I could grasp. I was very disappointed, because the “one-two-everything” trap is a well-known anti-pattern in my TDD experience and I still fell right into it.

A second approach using TDD

I decided to remove my code again and pair with my co-worker Jens, who had formulated a theory about finding the next test by changing only one facet of the problem for each new test. Sounds interesting? It is! Let’s see where it got us:

@Test
public void noBranchesResultsInEmptyTrail() throws Exception {
  CommitCombiner combiner = new CommitCombiner();
  Iterable<Commit> trail = combiner.getTrail();
  assertThat(trail, is(emptyIterable()));
}

The first test starts as no big surprise; it only sets “the mood”. Notice how we decided to keep the CommitCombiner class simple and plain in its interface as long as the tests don’t get cumbersome.

@Test
public void emptyBranchesResultInEmptyTrail() throws Exception {
  ProjectBranch branchA = branchFor();
  CommitCombiner combiner = new CommitCombiner(branchA);
  assertThat(combiner.getTrail(), is(emptyIterable()));
}

The second test asserts only one thing more than the initial test: If the combiner is given empty commit queues (“branches”) instead of none like in the first test, it still returns an empty result collection (the commit “trail”).

With the single-facet approach, we can only change our tested scenario in one “domain dimension”, and only by the smallest possible amount. So we formulate a test that still uses only one branch, but with one commit in it:

@Test
public void branchWithCommitResultsInEqualTrail() throws Exception {
  Commit commitA1 = commitAt(10L);
  ProjectBranch branchA = branchFor(commitA1);
  CommitCombiner combiner = new CommitCombiner(branchA);
  assertThat(combiner.getTrail(), Matchers.contains(commitA1));
}

With this test, there was the first meaningful appearance of production code. We kept it very simple and trusted our future tests to guide the way to a more complex version.

The next test introduces the central piece of domain knowledge to the production code, just by changing the amount of commits on the only given branch from “one” to “many” (three):

@Test
public void branchWithCommitsAreReturnedInOrder() throws Exception {
  Commit commitA1 = commitAt(10L);
  Commit commitA2 = commitAt(20L);
  Commit commitA3 = commitAt(30L);
  ProjectBranch branchA = branchFor(commitA3, commitA2, commitA1);
  CommitCombiner combiner = new CommitCombiner(branchA);
  assertThat(combiner.getTrail(), Matchers.contains(commitA3, commitA2, commitA1));
}

Notice how this requires the production code to come up with the notion of comparable commit dates that need to be ordered. We haven’t even introduced a second branch into the scenario yet, but are already asserting that the topmost mission-critical functionality works: commit ordering.

Now we need to advance to another requirement: The ability to combine branches. But whatever we develop in the future, it can never break the most important aspect of our implementation.

@Test
public void twoBranchesWithOnlyOneCommit() throws Exception {
  Commit commitA1 = commitAt(10L);
  ProjectBranch branchA = branchFor(commitA1);
  ProjectBranch branchB = branchFor();
  CommitCombiner combiner = new CommitCombiner(branchA, branchB);
  assertThat(combiner.getTrail(), Matchers.contains(commitA1));
}

You might say that we knew about this behaviour of the production code before, when we added the test named “branchWithCommitResultsInEqualTrail”, but it really is the assurance that things don’t change just because the amount of branches changes.

Our production code had no need to advance as far as we could already anticipate, so another test dealing with multiple branches is needed:

@Test
public void allBranchesAreUsed() throws Exception {
  Commit commitA1 = commitAt(10L);
  ProjectBranch branchA = branchFor(commitA1);
  ProjectBranch branchB = branchFor();
  CommitCombiner combiner = new CommitCombiner(branchB, branchA);
  assertThat(combiner.getTrail(), Matchers.contains(commitA1));
}

Note that the only thing that’s different is the order in which the branches are given to the CommitCombiner. With this simple test, there needs to be some important improvements in the production code. Try it for yourself to see the effect!

Finally, it is time to formulate a test that brings the two facets of our algorithm together. We tested the facets separately for so long now that this test feels like the first “real” test, asserting a “real” use case:

@Test
public void twoBranchesWithOneCommitEach() throws Exception {
  Commit commitA1 = commitAt(10L);
  Commit commitB1 = commitAt(20L);
  ProjectBranch branchA = branchFor(commitA1);
  ProjectBranch branchB = branchFor(commitB1);
  CommitCombiner combiner = new CommitCombiner(branchA, branchB);
  assertThat(combiner.getTrail(), Matchers.contains(commitB1, commitA1));
}

If you compare this “full” test case to the third test case in my first approach, you’ll see that it lacks all the mingled complexity of the first try. The test can be clear and concise in its scenario because it can rely on the assurances of the previous tests. The third test in the first approach couldn’t rely on any meaningful single-faceted “support” test. That’s the main difference! This was my error in the first approach: trying to cram more than one new facet into the next test, even putting all required facets in there at once. No wonder the production code needed “everything” when the test required it. No wonder there was no guidance from the tests when I wanted to reach all my goals at once. Decomposing the problem at hand into independent “features” or facets is the most essential step to learn in order to advance from Test First to Test Driven Development. Finding a suitable “dramatic composition” for the tests is another important ability, but it can only be applied after the decomposition is done.

But wait, there is a fourth test from my first approach that needs to be covered here, too:

@Test
public void twoBranchesWithCommitsAtSameTime() throws Exception {
  Commit commitA1 = commitAt(10L);
  Commit commitB1 = commitAt(10L);
  ProjectBranch branchA = branchFor(commitA1);
  ProjectBranch branchB = branchFor(commitB1);
  CommitCombiner combiner = new CommitCombiner(branchA, branchB);
  assertThat(combiner.getTrail(), Matchers.contains(commitA1, commitB1));
}

Thankfully, the implementation already provided this feature. We are done! And in this moment, my ego showed up again: “That implementation is an insult to my developer honour!” I shouted. Keep in mind that I had just thrown away a beautiful 130-line piece of algorithm for this alternate implementation:

public class CommitCombiner {
  private final ProjectBranch[] branches;

  public CommitCombiner(ProjectBranch... branches) {
    this.branches = branches;
  }

  public Iterable<Commit> getTrail() {
    final List<Commit> result = new ArrayList<>();
    for (ProjectBranch each : this.branches) {
      CollectionUtil.addAll(result, each.commits());
    }
    return sortedWithBranchOrderPreserved(result);
  }

  private Iterable<Commit> sortedWithBranchOrderPreserved(List<Commit> result) {
    Collections.sort(result, antichronologically());
    return result;
  }

  private <D extends Dated> Comparator<D> antichronologically() {
    return new Comparator<D>() {
      @Override
      public int compare(D o1, D o2) {
        return o2.getDate().compareTo(o1.getDate());
      }
    };
  }
}

The final and complete second implementation, guided to by the tests, is merely six lines of active code plus some boilerplate! Well, what did I expect? TDD doesn’t lead to particularly elegant solutions; it leads to the simplest thing that could possibly work and assures you that it will work within the realm of your specification. There’s no place for the programmer’s ego between these lines, and that’s a good thing.

Conclusion

Thank you for reading until here! I’ve learnt an important lesson that day (thank you, Jens!). And being able to pinpoint the main hindrance on my way to fully embracing TDD enabled me to further improve my skills even on my own. It felt like opening an ever-closed door for the first time. I hope you’ve extracted some insights from this write-up, too. Feel free to share them!

Does refactoring turn the unit tests of TDD into integration tests?

We really value automated tests and run experiments regarding test driven development (TDD) and tests in general from time to time. In the retrospective of our latest experiment this question struck me: does refactoring turn the unit tests of TDD into integration tests over time?

Let me elaborate on this a bit. When you start out with your tests, you have some unit of functionality, usually a class, in mind. As you add test after test, your implementation slowly fleshes out. You repeat the TDD cycle “write a failing test – make the test pass – refactor” as you add features. The refactoring step is crucial in the whole process because it keeps the code clean and evolvable. But this step is also the cause of my observation: as you add new features, you may extract new classes while refactoring to obey the single responsibility principle (SRP) and keep your design sane. It is very easy to forget or just ignore refactoring the tests. They still pass. You still have the same code coverage. But your tests now test the combination of several units. And what’s worse: you have units without direct tests.

This happened even in relatively small experiments on “communication through tests”, where the reconstructing team could sometimes only guess that some class existed and either went on without it or created the class out of necessity. The problem with this is that there are no obvious and clear indicators that your unit tests are not real unit tests anymore.

Conclusion

I neither have a solution nor am I completely sure how big the problem is in practice. It may help to state the TDD cycle more explicitly, like “write a failing test – make the test pass – refactor implementation and tests”, although that is no 100% remedy. One could also implement a simple, checkstyle-like tool which lists all units without an associated test class (see the sketch below). I will keep an eye on the phenomenon and try to analyse it further. I would love to hear your views and experiences on the matter.
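
Such a tool could start out as simple as this sketch (the Maven-style directory layout and the FooTest naming convention are assumptions):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// lists every production class without a same-named *Test class
public class MissingTestFinder {
  public static void main(String[] args) throws IOException {
    Set<String> testedNames;
    try (Stream<Path> tests = Files.walk(Paths.get("src/test/java"))) {
      testedNames = tests
          .map(path -> path.getFileName().toString())
          .filter(name -> name.endsWith("Test.java"))
          .map(name -> name.replace("Test.java", ""))
          .collect(Collectors.toSet());
    }
    try (Stream<Path> sources = Files.walk(Paths.get("src/main/java"))) {
      List<String> untested = sources
          .map(path -> path.getFileName().toString())
          .filter(name -> name.endsWith(".java"))
          .map(name -> name.replace(".java", ""))
          .filter(name -> !testedNames.contains(name))
          .collect(Collectors.toList());
      untested.forEach(name -> System.out.println("no test found for " + name));
    }
  }
}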

Ugly problems, ugly solutions?

Do workarounds have to be worse than the problems they solve?

One type of project we do is integrating devices into our customers’ infrastructure. The tasks then mostly consist of writing bridging code for the third-party libs of the hardware vendor. The most fun part is when the libs do not have some needed capability or feature.

The situation

In my case I was building a device driver with the following requirements:

  • asynchronous execution of long running tasks.
  • ability to cancel long running tasks.
  • whenever it is asked for its current status, it has to provide it.

The device is accompanied by a DLL with the following interface (simplified):

  • doWork(), a blocking function that returns after a configurable amount of time that can range from milliseconds to hours.
  • abortWork(), which is supposed to cancel the process triggered by doWork() and make doWork() return earlier.

First impressions

I was able to fulfill two of the requirements pretty fast: the status is more or less a simple getter, and the doWork() function was called in a separate thread. Only cancelling the execution didn’t work. More precisely, it didn’t work as expected. In the time between a call to doWork() and the moment it returned, the process always used 100% of one CPU core. After that it always dropped to nearly zero. Now, what happened when I called abortWork()? One of two things: either doWork() returned, but the CPU utilization stayed at 100% for an indefinite amount of time, or the call was ignored completely. Especially funny was the first case, where the API seemed to work until the process ran out of cores and the system practically ground to a halt.

The “Solution”

Banging my head against the desk didn’t help, so my first thought was to forget abortWork() and kill the thread myself. Microsoft provides a function called TerminateThread for that purpose. Everyone who looks at the documentation will see that the list of side effects is quite impressive, memory leaks being the least bad ones. I couldn’t guarantee that the application would still work afterwards, so I decided against it. What would be the alternative? Process shutdown. When you stop the process, all blocked threads should be gone. Being too soft and merely trying to unload the DLL is a bad idea: you get a deadlock when DllMain waits for the worker thread to finish. My last resort was to have the process kill itself!

Now I was able to abort a running task, but my app was no longer available all the time: every attempt to get the current status between the start of a shutdown and a completed restart failed. So a semi-persistent storage containing the last status of the living application was needed. To achieve this, I created an application with the same interface as the real device driver, a proxy that delegated all requests to it while caching the status responses. That way the polling application still assumed that the last action was still running until the restarting app was fully available again.

In the end the solution consisted of two device drivers: one caching the state, the other doing the work. When cancelling the task was required, the latter device driver died and restarted itself.
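
In Java-flavoured pseudocode (the real implementation talked to the vendor DLL; all names here are invented), the caching proxy boils down to this:

// the common interface both the proxy and the real driver implement
interface DeviceDriver {
  String currentStatus();
}

class CachingDriverProxy implements DeviceDriver {
  private final DeviceDriver realDriver;
  private volatile String lastKnownStatus = "idle";

  CachingDriverProxy(DeviceDriver realDriver) {
    this.realDriver = realDriver;
  }

  @Override
  public String currentStatus() {
    try {
      // happy path: the real driver is up and answers itself
      lastKnownStatus = realDriver.currentStatus();
    } catch (RuntimeException driverDownForRestart) {
      // the driver process killed itself: answer from the cache
    }
    return lastKnownStatus;
  }
}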

Final thoughts

I hope that there is a more elegant way to do this and I just overlooked some facts. It is unbelievable that you can lose all control over your app by a simple call into a third-party library, and that the only escape is death.

TDD: avoid getting stuck or what’s the next test?


One central point of practicing TDD is to determine what the next test should be. Choosing the wrong path can lead you into the infamous impasse: to make the next test pass, you need to take not baby steps but giant steps. Some time ago Uncle Bob introduced a principle called the transformation priority premise. To make a test pass, you need to change the implementation; these changes are transformations. There are at least the following transformations (taken from his blog post):

  • ({} -> nil) no code at all -> code that employs nil
  • (nil -> constant)
  • (constant -> constant+) a simple constant to a more complex constant
  • (constant -> scalar) replacing a constant with a variable or an argument
  • (statement -> statements) adding more unconditional statements
  • (unconditional -> if) splitting the execution path
  • (scalar -> array)
  • (array -> container)
  • (statement -> recursion)
  • (if -> while)
  • (expression -> function) replacing an expression with a function or algorithm
  • (variable -> assignment) replacing the value of a variable

To determine what the next test should be, you look at the possible next tests and the changes in the implementation necessary to make each of them pass. The required transformations should be as high in the list as possible. If you always choose the test whose required transformations are highest in the list, you avoid getting stuck in the impasse.
This seems to work, but I think it is pretty complicated and expensive. Shouldn’t there be an easier way?
Let’s take a look at his case study: the word wrap kata. Word wrap is a function which takes two parameters, a string and a column number. It returns the string, but with line breaks inserted at just the right places to make sure that no line is longer than the column number. You try to break lines at word boundaries.
The first three tests (nil, empty string and one word shorter than the wrap position) are obvious and easy, but the next test can lead to an impasse:

@Test
public void twoWordsLongerThanLimitShouldWrap() throws Exception {
  assertThat(wrap("word word", 6), is("word\nword"));
}

With the transformation priority premise you can “calculate” that this is the wrong test and that another one is simpler, meaning it needs transformations higher in the list. But let me introduce another concept: the facets or dimensions of tests.
Each test in a TDD session tests another facet of your problem, and only one more. What a facet is is determined by the problem domain, so you need some domain knowledge; but you usually need that to solve the problem anyway. Back to the word wrap example: what is a facet? The first test tests the nil input; it changes one facet. The empty input test changes another facet. Then comes one word shorter than the wrap position (one facet changed again), and then the fourth test uses two words longer than the wrap position. See it? The fourth test introduces changes in two facets: one word to two words, and shorter to longer. So what can you do instead? Just change one facet. Accordingly, the next test would use one word longer than the wrap position (facet: longer), which is the proposed solution (sketched below). Or you could use two words shorter than the wrap position (facet: word count), but this test would just pass without modifications to the implementation code. So facets of the word wrap kata could be: word count, shorter/longer, number of breaks, break position.
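
Such a single-facet next test could look like this (my sketch, in the style of the test above):

@Test
public void oneWordLongerThanLimitShouldWrap() throws Exception {
  assertThat(wrap("longword", 4), is("long\nword"));
}
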
I know this is a very informal way of finding the next tests. It leans on your experience and domain knowledge, but I think it is less expensive than the transformations. And even better: it can be combined with the transformation priority premise to check and verify your decisions.
What are your experiences with getting stuck in TDD? Do you think the proposed facets could be of help? Is it too informal? Too vague?