Data-Oriented Design: Using data as interfaces

A Code Centric World

In main-stream OOP, polymorphism is achieved by virtual functions. To reuse some code, you simply need one implementation of a specific “virtual” interface. Bigger programs are composed by some functions calling other functions calling yet other functions. Virtual functions introduce a flexibility here to that allow parts of the call tree to be replaced, allowing calling functions to be reused by running on different, but homogenuous, callees. This is a very “code centric” view of a program. The data is merely used as context for functions calling each other.

Duality

Let us, for the moment, assume that all the functions and objects that such a program runs on, are pure. They never have any side effects, and communicate solely via parameters to and return values from the function. Now that’s not traditional OOP, and a more functional-programming way of doing things, but it is surely possible to structure (at least large parts of) traditional OOP programs that way. This premise helps understanding how data oriented design is in fact dual to the traditional “code centric” view of a program: Instead of looking at the functions calling each other, we can also look at how the data is being transformed by each step in the program because that is exactly what goes into, and comes out of each function. IS-A becomes “produces/consumes compatible data”.

Cooking without functions

I am using C# in the example, because LINQ, or any nice map/reduce implementation, makes this really staight-forward. But the principle applies to many languages. I have been using the technique in C++, C#, Java and even dBase.
Let’s say we have a recipe of sorts that has a few ingredients encoded in a simple class:

class Ingredient
{
  public string Name { get; set; }
  public decimal Amount { get; set; }
}

We store them in a simple List and have a nice function that can compute the percentage of each ingredient:

public static IReadOnlyList<(string, decimal)> 
    Percentages(IEnumerable<Ingredient> incredients)
{
  var sum = incredients.Sum(x => x.Amount);
    return incredients
      .Select(x => (x.Name, x.Amount / sum))
      .ToList();
}

Now things change, and just to make it difficult, we need a new ingredient type that is just a little more complicated:

class IngredientInfo
{
  public string Name { get; set; }
  /* other useful stuff */
}

class ComplicatedIngredient
{
  public IngredientInfo Info { get; set; }
  public decimal Amount { get; set; }
}

And we definitely want to use the old, simple one, as well. But we need our percentage function to work for recipes that have both Ingredients and also ComplicatedIngredients. Now the go-to OOP approach would be to introduce a common interface that is implemented by both classes, like this:

interface IIngredient
{
  string GetName();
  string GetAmount();
}

That is trivial to implement for both classes, but adds quite a bunch of boilerplate, just about doubling the size of our program. Then we just replace IReadOnlyList<Ingredient> by IReadOnlyList<IIngredient> in the Percentage function. That last bit is just so violating the Open/Closed principle, but just because we did not use the interface right away (Who thought YAGNI was a good idea?). Also, the new interface is quite the opposite of the Tell, don’t ask principle, but there’s no easy way around that because the “Percentage” function only has meaning on a List<> of them.

Cooking with data

But what if we just use data as the interface? In this case, it so happens that we can easiely turn a ComplicatedIngredient into an Ingredient for our purposes. In C#’s LINQ, a simple Select() will do nicely:

var simplified = complicated
  .Select(x => new Ingredient
   { 
     Name = x.Info.Name,
     Amount = x.Amount
   });

Now that can easiely be passed into the Percentages function, without even touching it. Great!

In this case, one object could neatly be converted into the other, which is often not the case in practice. However, there’s often a “common denominator class” that can be found pretty much the same way as extracting a common interface would. Just look at the info you can retrieve from that imaginary interface. In this case, that was the same as the original Ingredients class.

Further thoughts

To apply this, you sometimes have to restructure your programs a little bit, which often means going wide instead of deep. For example, you might have to convert your data to a homogenuous form in a preprocessing step instead of accessing different objects homogenuously directly in your algorithms, or use postprocessing afterwards.
In languages like C++, this can even net you a huge performance win, which is often cited as the greatest thing about data-oriented design. But, first and foremost, I find that this leads to programs that are easier to understand for both machine and people. I have found myself using this data-centric form of code reuse a lot more lately.

Are you using something like this as well or are you still firmly on the override train, and why? Tell me in the comments!

One thought on “Data-Oriented Design: Using data as interfaces

  1. Pingback: Math development practices | Schneide Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.