bulk operations – Schneide Blog

The performance problem

In one part of our application we have personal messages that are marked as read after viewing. For some users there can be quite a lot messages so we implemented a “mark all as read”-feature. The naive implementation looks like this:

def markAllAsRead() {
    def user = securityService.loggedInUser
    def messages = Messages.findAllByUserAndTimelineEntry.findAllByAuthorAndRead(user, false)
    messages.each { message ->
        message.read = true
        message.save()
    }
    Messages.withSession { session -> session.flush()}
 }

While this is both correct and simple it only works well for a limited amount of messages per user. Performance will degrade because all the domain objects are loaded into domain objects, then modified and save one-by-one to the session. Finally the session is persisted to the database. In our use case this could take several seconds which is much too long for a good user experience.

DetachedCriteria to the rescue

GORM offers a much better solution for such use-cases that does not sacrifice expressiveness. Instead it offers a succinct API called “Where Queries” that creates DetachedCriteria and offers batch-updates.

def markAllAsRead() {
    def user = securityService.loggedInUser
    def messages = Messages.where {
        read == false
        addressee == user
    }
    messages.updateAll(read: true)
}

This implementation takes only a few milliseconds to execute with the same dataset as above which is de facto native SQL performance.

Conclusion

Before cursing GORM for bad performance one should have a deeper look at the alternative querying APIs like Where Queries, Criteria, DetachedCriteria, SQL Projections and Restrictions to enhance your ORM toolbox. Compared to dynamic finders and GORM-methods on domain objects they offer better composability and performance without resorting to HQL or plain SQL.

One exception in a collection operation like for-each or map/collect stops the processing of all the other elements. Instead of letting the whole task blow up it is often more desirable to skip those elements causing failures, log the errors (and possibly notify the user about the failing elements), but have all other elements processed. Examples for such operations are: sending bulk mails to users, bulk import/export, lists in user interfaces etc., and common errors are, for example, NullPointerExceptions, database errors or wrong email addresses.

Here’s some simple code for robust and reusable for-each and map operations in JavaScript:

function robustForEach(array, callback) {
  var failures = [];
  array.forEach(function(elem, i) {
    try {
      callback(elem, i);
    } catch (e) {
      failures.push({element: elem, index: i, error: e});
    }
  });
  return failures;
}

function robustMap(array, callback) {
  var result = { array: [] };
  result.failures = robustForEach(array, function(elem, i) {
    result.array.push(callback(elem, i));
  });
  return result;
}

Similar code can be easily implemented in other languages like Java (especially with Java 8 streams), Groovy, Ruby, etc.

If you decide to log the errors, you have to choose between two possible log strategies: one log operation per error, which can be annoying if you get a mail for each logged error, or one log operation bundling all occurred errors (make sure that a failing toString can’t spoil the whole bunch again).

function logAny(failures) {
  failures.forEach(function(fail) {
    log.error(failMessage(fail));
  });
}

function logAnyBundled(failures) {
  if (failures.length == 0) {
    return;
  }
  log.error(failures.map(function(fail) {
    return failMessage(fail);
  }).join('\n'));
}

function failMessage(fail) {
  return "Could not process '" +
         fail.element + "': " + fail.error;
}

You can easily combine the map and log operations:

function robustMapAndLog(array, callback) {
  var result = robustMap(array, callback);
  logAny(result.failures);
  return result.array;
}

Example usage:

var numbers = [1, 2, 3, 4, 5, 6, 7, 8];
var result = robustMapAndLog(numbers, function(n) {
  if (n == 5) {
    throw 'bad apple';
  }
  return n * n;
});
print(result);

// Error log output:
//  Could not process '5': bad apple
// Output:
//  [ 1, 4, 9, 16, 36, 49, 64 ]

One element could not be processed due to an error, but all other elements were not affected.

Conclusion

Be aware of the bad apple possibility for every loop you write (explicitly or implicitly) and consciously choose the appropriate error handling strategy depending on the situation. Don’t let indifference decide the fate of your bulk operations.

	Writing Integration… on Every Unit Test Is a Stage Pla…
	mariuselvert on C# is very strict about modify…
	Anonymous on C# is very strict about modify…
	Anonymous on Cache configuration with WildF…
	Miq on Nested queries like N+1 in pra…

Tag: bulk operations

Grails Domain update optimisation

The performance problem

DetachedCriteria to the rescue

Conclusion

Don’t let one bad apple spoil the whole bunch

Conclusion