Generating an Icosphere in C++

If you want to render a sphere in 3D, for example in OpenGL or DirectX, it is often a good idea to use a subdivided icosahedron. That usually works better than the “UVSphere”, which simply tessellates the sphere by longitude and latitude: the triangles in an icosphere are distributed much more evenly over the final sphere. Unfortunately, the easiest way to generate such a sphere, it seems, is to do it in a 3D editing program. But loading that into your application requires a 3D file format parser. That’s a lot of overhead if you really just need the sphere, so generating it programmatically is preferable.

At this point, many people just settle for the UVSphere since it is easy to generate programmatically, especially since generating the sphere as an indexed mesh without vertex duplicates further complicates the problem. But it is actually not much harder to generate the icosphere!
Here I’ll show some C++ code that does just that.

C++ Implementation
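
The snippets in this post rely on a few helpers that are not defined here: an Index type, a small three-component vector type v3 with addition, and a normalize function. Here is a minimal sketch of what they could look like (these exact definitions are my assumption; any 3D math library will do):

#include <array>
#include <cmath>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using Index=uint32_t;

// Assumed minimal vector type; replace with your math library's type.
struct v3
{
  float x, y, z;
};

v3 operator+(const v3& a, const v3& b)
{
  return {a.x+b.x, a.y+b.y, a.z+b.z};
}

v3 normalize(const v3& a)
{
  const float length=std::sqrt(a.x*a.x + a.y*a.y + a.z*a.z);
  return {a.x/length, a.y/length, a.z/length};
}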

We start with a hard-coded indexed-mesh representation of the icosahedron:

struct Triangle
{
  Index vertex[3];
};

using TriangleList=std::vector<Triangle>;
using VertexList=std::vector<v3>;

namespace icosahedron
{
const float X=.525731112119133606f;
const float Z=.850650808352039932f;
const float N=0.f;

static const VertexList vertices=
{
  {-X,N,Z}, {X,N,Z}, {-X,N,-Z}, {X,N,-Z},
  {N,Z,X}, {N,Z,-X}, {N,-Z,X}, {N,-Z,-X},
  {Z,X,N}, {-Z,X, N}, {Z,-X,N}, {-Z,-X, N}
};

static const TriangleList triangles=
{
  {0,4,1},{0,9,4},{9,5,4},{4,5,8},{4,8,1},
  {8,10,1},{8,3,10},{5,3,8},{5,2,3},{2,7,3},
  {7,10,3},{7,6,10},{7,11,6},{11,0,6},{0,1,6},
  {6,1,10},{9,0,11},{9,11,2},{9,2,5},{7,2,11}
};
}

[Figure: icosahedron]
Now we iteratively replace each triangle in this icosahedron by four new triangles:

[Figure: subdivision]

Each edge in the old model is subdivided and the resulting vertex is moved onto the unit sphere by normalization. The key here is not to duplicate the newly created vertices. This is done by keeping a lookup table from each edge to the new vertex it generates. Note that the orientation of the edge does not matter here, so we need to normalize the edge direction for the lookup. We do this by always putting the lower index first. Here’s the code that either creates or reuses the vertex for a single edge:

using Lookup=std::map<std::pair<Index, Index>, Index>;

Index vertex_for_edge(Lookup& lookup,
  VertexList& vertices, Index first, Index second)
{
  Lookup::key_type key(first, second);
  if (key.first>key.second)
    std::swap(key.first, key.second);

  auto inserted=lookup.insert({key, vertices.size()});
  if (inserted.second)
  {
    auto& edge0=vertices[first];
    auto& edge1=vertices[second];
    auto point=normalize(edge0+edge1);
    vertices.push_back(point);
  }

  return inserted.first->second;
}

Now you just need to do this for all the edges of all the triangles in the model from the previous iteration:

TriangleList subdivide(VertexList& vertices,
  TriangleList triangles)
{
  Lookup lookup;
  TriangleList result;

  for (auto&& each:triangles)
  {
    std::array<Index, 3> mid;
    for (int edge=0; edge<3; ++edge)
    {
      mid[edge]=vertex_for_edge(lookup, vertices,
        each.vertex[edge], each.vertex[(edge+1)%3]);
    }

    result.push_back({each.vertex[0], mid[0], mid[2]});
    result.push_back({each.vertex[1], mid[1], mid[0]});
    result.push_back({each.vertex[2], mid[2], mid[1]});
    result.push_back({mid[0], mid[1], mid[2]});
  }

  return result;
}

using IndexedMesh=std::pair<VertexList, TriangleList>;

IndexedMesh make_icosphere(int subdivisions)
{
  VertexList vertices=icosahedron::vertices;
  TriangleList triangles=icosahedron::triangles;

  for (int i=0; i<subdivisions; ++i)
  {
    triangles=subdivide(vertices, triangles);
  }

  return{vertices, triangles};
}

There you go, a custom-subdivided icosphere!
[Figure: icosphere]
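
For completeness, here is what calling it could look like (a usage sketch, not code from the original implementation):

// Three subdivision steps yield 642 vertices and 1280 triangles.
IndexedMesh mesh=make_icosphere(3);
const VertexList& sphere_vertices=mesh.first;
const TriangleList& sphere_triangles=mesh.second;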

Performance

Of course, this implementation is not the most runtime-efficient way to get the icosphere, but it is decent and very simple. Its performance depends mainly on the type of lookup used. I used a map instead of an unordered_map here only for brevity, because there’s no premade hash function for a std::pair of indices. In practice, you would almost always use a hash map or some kind of spatial structure, such as a grid, which makes this method a lot harder to beat – and certainly feasible for most applications.
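
If you want the faster lookup, the missing hash is only a few lines. Here is one possible sketch (my addition, not part of the original code); it assumes Index is no wider than 32 bits:

#include <cstdint>
#include <functional>
#include <unordered_map>

// Combines the two edge indices into one 64-bit value and hashes that.
struct EdgeHash
{
  size_t operator()(const std::pair<Index, Index>& edge) const
  {
    return std::hash<uint64_t>{}(
      (uint64_t(edge.first) << 32) | uint64_t(edge.second));
  }
};

using Lookup=std::unordered_map<std::pair<Index, Index>, Index, EdgeHash>;

With this alias in place of the std::map-based Lookup from above, the rest of the code stays unchanged.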

The general pattern

The lookup-or-create pattern used in this code is very useful when creating indexed-meshes programmatically. I’m certainly not the only one who discovered it, but I think it needs to be more widely known. For example, I’ve used it when extracting voxel-membranes and isosurfaces from volumes. It works very well whenever you are creating your vertices from some well-defined parameters. Usually, it’s some tuple that describes the edge you are creating the vertex on. This is the case with marching cubes or marching tetrahedrons. It can, however, also be grid coordinates if you sparsely generate vertices on a grid, for example when meshing heightmaps.
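
Stripped of the icosphere specifics, the pattern might be sketched like this (a generalization I added for illustration; Key stands for whatever parameter tuple describes the vertex, e.g. an edge or grid coordinates):

template <typename Key, typename MakeVertex>
Index lookup_or_create(std::map<Key, Index>& lookup,
  VertexList& vertices, const Key& key, MakeVertex make_vertex)
{
  // Reserve the next index for this key; only create the vertex
  // if the key has not been seen before.
  auto inserted=lookup.insert({key, Index(vertices.size())});
  if (inserted.second)
    vertices.push_back(make_vertex());
  return inserted.first->second;
}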

ECMAScript 6 is coming

ECMAScript is the standardized specification of the scripting language that is the basis for JavaScript implementations in web browsers.

ECMAScript 5, which was released in 2009, is in widespread use today. Six years later in 2015 ECMAScript 6 was released, which introduced a lot of new features, especially additions to the syntax.

However, browser support for ES 6 needed some time to catch up.
If you look at the compatibility table for ES 6 support in modern browsers, you can see that the current versions of the main browsers now support ES 6. This means that soon you will be able to write ES 6 code without the need for an ES6-to-ES5 cross-compiler like Babel.

The two features of ES 6 that will change the way JavaScript code is written are, in my opinion, the new class syntax and the new lambda syntax with lexical scope for ‘this’.

Class syntax

Up to now, if you wanted to define a new type with a constructor and methods on it, you would write something like this:

var MyClass = function(a, b) {
	this.a = a;
	this.b = b;
};
MyClass.prototype.methodA = function() {
	// ...
};
MyClass.prototype.methodB = function() {
	// ...
};
MyClass.prototype.methodC = function() {
	// ...
};

ES 6 introduces syntax support for this pattern with the class keyword, which makes class definitions shorter and more similar to class definitions in other object-oriented programming languages:

class MyClass {
	constructor(a, b) {
		this.a = a;
		this.b = b;
	}
	methodA() {
		// ...
	}
	methodB() {
		// ...
	}
	methodC() {
		// ...
	}
}

Arrow functions and lexical this

The biggest annoyance in JavaScript has always been the scoping rules of this. You always had to remember to rebind this if you wanted to use it inside an anonymous function:

var self = this;
this.numbers.forEach(function(n) {
    self.doSomething(n);
});

An alternative was to use the bind method on an anonymous function:

this.numbers.forEach(function(n) {
    this.doSomething(n);
}.bind(this));

ES 6 introduces a new syntax for lambdas via the “fat arrow” syntax:

this.numbers.forEach((n) => this.doSomething(n));

These arrow functions not only preserve the binding of this but are also shorter, especially if the function body consists of only a single expression. In this case the curly braces and the return keyword can be omitted:

// ES 5
numbers.filter(function(n) { return isPrime(n); });
// ES 6
numbers.filter((n) => isPrime(n));

Wide-spread support for ECMAScript 6 is just around the corner, so it’s time to familiarize yourself with it if you haven’t done so yet. A nice overview of the new features can be found at es6-features.org.

4 questions you need to ask yourself constantly while programming

Most of today’s general purpose programming languages come with a plethora of features. Often there are different levels of abstraction and intended use cases: some features are primarily for library designers, others ease the implementation of domain specific languages, and application developers mostly use yet another feature set.

Some language communities are discussing “language profiles / levels” to ban certain potentially harmful constructs. The typical audience, e.g. application programmers, does not need them, but removing them from the language would limit its usefulness in other cases. Examples are Scala levels (a bit dated), the Google C++ Style Guide or the Profiles in the C++ Core Guidelines.

In the wild

When reading other people’s code I often see novice code dealing with low-level threading, or going overboard with templates, reflection or metaprogramming.

I have even seen custom ClassLoaders in Java written by normal application programmers. People are using threads when workers, tasks, actors or other higher-level abstractions would fit much better.

Especially novices seem to be unable to recognize their limits and to stay off of inappropriate and potentially dangerous features.

How do you decide what is appropriate in your situation?

Well, that is a difficult question. If the task at hand seems hard you should probably take a step back, because:

There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors.

-Jeff Atwood

Then ask yourself some simple questions:

  1. Someone must have done it before. Have I searched thoroughly for hints or solutions?
  2. Is there a (better) library, data structure or abstraction?
  3. Do I really have to do this? There must be a better/easier way!
  4. What do I gain using feature/library/tool X and what are its costs? What about the alternatives?

Conclusion

You need some experience to recognize that you are on the wrong path, solving problems you would not even have if you had done the right thing in the first place.

Experience is what you got by not having it when you needed it.

-Author Unknown

Try to know and admit your limits – there is nothing wrong with struggling to get things working but it helps to frequently check your direction by taking a step back and reflecting.

From Agile to UX: a change in perspective

Usually a project starts with a client sending us a list of requirements in varying levels of detail. In my early days I started with finding the most efficient way to fulfill these requirements with written software.

Over time and with increased experience I broke down the large requirements into smaller ones. With every step I tried to get feedback from the client on whether my solution matched their expectations.

Step by step I refined this iterative process by developing more efficiently, getting earlier feedback, testing and asking questions for more detail about the constraints of the solution. Sometimes I identified some parts that weren’t needed in the solution.

On the journey to becoming more agile I even re-framed the initial question from ‘how can I get to a satisfying solution more efficiently’ to ‘which minimal solution brings the most value to the customer’.

This was the first change of perspective, from a process of solving to a process of value. But a problem still persisted: the solution was based on my assumptions about what brings value to the customer.
The only feedback I would get was that the customer accepted the solution or requested some improvements. But to get this feedback I had to implement the whole solution in software.

The clash of worlds: agile and UX

Diving into the UX and product management world, my view of software development is questioned at its foundation. Agile software development and development projects in general start with a fatal assumption: the goal is to bring value through software that fulfills the requirements. But value is not created by software or by satisfying any requirements. For the user, value is created by helping him get his jobs done and solve his problems in his context.

This might sound harsh, but software is not an end in itself but rather one way to help users achieve their goals.
On top of that, requirements lack the reasons why the user needs them (which jobs they help the user do) and in which situations the user might need them (the context).
In order to account for this I need to shift my focus away from defining the solution (the requirements) to exploring the users, their problems and their context.

The problem with finding problems

As a software developer I am skilled in finding solutions. Because of this I have some difficulty finding the problems without proposing (even subconsciously) a solution right away. If you are like me, you tend to imagine possible solutions in your head while talking with a client or user. On top of that, missing details are filled in by experience or assumptions. The problem is that assumptions are difficult to spot. One way is to look at the language; a repeatable way is to use a process for finding them.

The product kata

Recently I stumbled upon an excellent blog post by Melissa Perri, a product manager and UX designer. In this post she describes an approach called ‘the product kata’.

The product kata starts with defining the direction: the problem, job or task I want to address.
After a listening tour with the different stakeholders (including clients and users), a look at the requirements list and a contextual observation of the current solution, I can at least give a rough description of the problem.

These steps help me to get a basic understanding of the domain and the current situation. In my old way of doing things I would now rush towards the solution. I would identify the next step(s) bearing the most value for the client and along the way remove the shortcomings of the current solution. But wait. The product kata proposes a different way.

A different way

Let’s use an example from a real project. Say the client needs a way to check incoming values measured by some sensors. The old solution plots these values over time in a chart. It lacks important contextual information, has no notion of what ‘checking the values’ means and therefore cannot document the check result. Since the process of checking the values is central to the business we need to put it first and foremost. Following the product kata, I define the direction as ‘check the sensor values’.

Direction: check the sensor values

To see if I reached the goal the kata needs a target condition which I define as ‘the user should be able to check the sensor values and record the check result’.

Target condition: the user should be able to check the sensor values and record the check result

Currently the user isn’t able to check anything. So the next step of the kata is to look at the current condition. If the current condition matches the target condition I am done. The current condition in my example is that the user cannot check the sensor values in the right way.

Current condition: the user cannot check the values in the right way

The first obstacle to achieving the target condition is that I don’t know what the right way is. Since the old solution lacks some important information needed to fulfill the check, the first obstacle I want to address is finding out what information the user needs.

Obstacle: what additional information (besides the values themselves) does the user need

Since the product kata originates from lean product management I need to find an efficient step which addresses this obstacle. In my case I choose to make a simple paper sketch of a chart and interview the user about which data they needed in the past.

Step: paper sketch of chart (to frame the discussion) and interview about information needed in the past

I expect to collect a list with all the information needed.

Expected: a list of past data sources which helped the user in his check process

After doing this I learned what information should be displayed and which information (from the old solution) was not needed in the past.

Learned: two lists of things: what was needed and what was not

Now I repeat the kata from the start. The current condition still does not match the target condition. My next obstacle is that, among the vast amount of information that is needed and the possible actions during the check, we do not know which are related, which form a group and which are the most important ones. So my next step is to do a card sort with the users to take a peek into their mental model. I expect to find out about the priorities and grouping of the possible information and actions.

After I gathered and condensed the information from the card sorts, my next obstacle is to find out more about the journey of the user during the check and the struggles he has. From my earlier contextual observation I have the current user journey and the struggles along the way. Armed with the insights from the earlier steps I can now create a design which maps the user journey and addresses the struggles with the data and the actions according to the mental model of the user.
For this I develop a (prototypical) implementation in software and test it with a group of users. I expect to either verify that my solution matches the user’s mental model or to find problems where it does not.
This process of the product kata is repeated until the current condition meets the target condition.

Why this is needed

What changed for me is that I do not rush towards solving the problem but first build a solid understanding by learning more about the users, their jobs and contexts in a directed way. The product kata helps me to frame my thoughts towards the users and away from the solution and my assumptions about it. It helps me structure my discovery process and the progress of the project.
My old way started from requirements and old solutions, improving what was done before. The problem with this approach is that assumptions and decisions made in the past are not questioned. The old way of doing things may be an inspiration, but it is not a good starting point.
Requirements by the client are predefined solutions: they frame the solution space. If the client is a very good project manager this might work but if I am responsible for leading the project this can lead to a disaster.
The agile way of developing software is great at executing. But without guidance and a way of learning about the users and their problems, agile software development is lost.

The rule of additive changes

Change is in the nature of software development. Most difficult aspects of the craft revolve around dealing with change. How does one keep software extensible? How do you adapt to new business requirements?

With experience comes the intuition that some kinds of changes are riskier than others. For example, it is often safer to add a new function or type to an application than to change an existing one.

This is because adding something new means that it is not already strongly connected to the rest of the application. Or at least that’s the assumption. You have yet to decide how the new component interacts with the rest of the application. Usually this is done by a, preferably small, incision in the innards of your software. The first change, the adding, should not break anything. If anything, the small incision should be the only dangerous aspect of the change.

This is a very important concept: adding should not break things! It is so important that I want to give it a name:

The Rule of Additive Changes

Adding something to a well-designed software system should not break existing functionality. Exceptions should be thoroughly documented and communicated.

Systems should always be designed and thought out so that the rule of additive changes holds. Failure to do so will lead to confusing surprises in the best case, and to well-hidden bugs in worse cases.

The rule is nothing new, however: it’s a foundation, an axiom, to many other rules, such as the Liskov Substitution Principle:

Inheritance

Quoting from Wikipedia:

“If S is a subtype of T, then objects of type T in a program may be replaced with objects of type S without altering any of the desirable properties of that program”

This relies on subtyping being an additive change: S works at least as well as any T, so it is an extension, an addition. You should therefore design your systems in a way that the Liskov Substitution Principle, and therefore the rule of additive changes, both hold: adding a new type to a hierarchy cannot break anything.

Whitelists vs. Blacklists

Blacklists will often violate the rule of additive changes. Once you add a new element to the domain, the domain behind the blacklist will change as well, while the domain behind a whitelist will be unaffected. Ultimately, both can be what you want, but usually, the more contained change will break less – and you can still change the whitelist explicitly later!

Note that systems that filter classes from a hierarchy via RTTI or, even more subtly, via ask-interfaces, are blacklists. Those systems can break easily when new types are introduced to a hierarchy. Extra care needs to be taken to make sure the rule of additive changes holds for these systems.
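
Here is a minimal C++ sketch of such an RTTI-based blacklist (the Shape and GuideLine types are invented for illustration) and why an additive change can break it:

#include <iostream>
#include <memory>
#include <vector>

struct Shape { virtual ~Shape() = default; };
struct Circle : Shape {};
struct GuideLine : Shape {};   // a helper type we do not want to draw

void draw(const Shape&) { std::cout << "drawing a shape\n"; }  // stand-in

void draw_all(const std::vector<std::unique_ptr<Shape>>& shapes)
{
  for (const auto& shape : shapes)
  {
    // Blacklist via RTTI: only the currently known helper type is filtered.
    // Introducing another helper type later is an additive change that
    // silently slips through this check and breaks the intended behavior.
    if (dynamic_cast<const GuideLine*>(shape.get()))
      continue;
    draw(*shape);
  }
}

A whitelist version would instead check for the types (or a marker interface) that should be drawn, so a newly added type stays invisible until it is explicitly admitted.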

Introspection and Reflection

Without introspection and reflection, programs cannot know when you are adding a new type or a new function. However, with introspection, they can. Any additive change can also be an incision point. Therefore, you need to be extra careful when designing systems that use introspection: they should not break existing functionality just because something was added.

For example, adding a function to enable a specific new functionality is okay. A common case of this would be adding a function to a controller in a web framework to add a new action. This will not interfere with existing functionality, so it is fine.

On the other hand, adding a member to a controller should not disable or change functionality. Adding a special member for “filtering” or some kind of security setting falls into this category. You think you’re merely adding something, but in fact you are modifying. A system that relies on such behavior therefore violates the rule of additive changes. Decorating the member is a much better alternative, as that makes it clear that you are indeed modifying something, which might break existing functionality.

Not all languages or frameworks provide this possibility though. In that case, the only alternative is good communication and documentation!

Refactoring

Many engineers implicitly assume that the rule of additive changes holds. In his book “Working Effectively With Legacy Code”, Michael Feathers proposes the sprout and wrap techniques to change legacy software. The underlying technique is the same for both: formulating a potentially breaking change as a mostly additive one, with only a small incision point. In the presence of systems that do not follow the rule of additive changes, such risk minimization does not work at all. For example, adding an additional function can break a system that relies heavily on introspection – which goes against all intuition.

Conclusion

This rule is not a new concept. It is something that many programmers have in their head already, but possibly fractured into lots of smaller guidelines. But it is one overarching concept and it needs a name to be accessible as such. For me, that makes things a lot clearer when reasoning about systems at large.

Monitoring data integrity with health checks

An important aspect of systems that are backed by database storage is maintaining data integrity. Most relational databases offer the possibility to define constraints in order to maintain data integrity, usually referential integrity and entity integrity. Typical constraints are foreign key constraints, not-null constraints, unique constraints and primary key constraints.

SQL also provides the CHECK constraint, which allows you to specify a condition on each row in a table:

ALTER TABLE table_name ADD CONSTRAINT
   constraint_name CHECK ( predicate )

For example:

CHECK (AGE >= 18)

However, these check constraints are limited. They can’t be defined on views, they can’t refer to columns in other tables and they can’t include subqueries.

Health checks

In order to monitor data integrity on a higher level that is closer to the business rules of the domain, we have deployed a technique that we call health checks in some of our applications.

These health checks are database queries, which check that certain constraints are met in accordance with the business rules. The queries are usually designed to return an empty result set on success and to return the faulty data records otherwise.

The health checks are run periodically. For example, we use a Jenkins job to trigger the health checks of one of our web applications every couple of hours. In this case we don’t query the database directly; the application does, and it returns the success or failure states of the health checks in the response to an HTTP GET request.

This way we can detect problems in the stored data in a timely manner and take countermeasures. Of course, if the application is bug free these health checks should never fail, and in fact they rarely do. We mostly use the health checks as an addition to regression tests after a bug fix, to ensure and monitor that the unwanted state in the data will never happen again in the future.

Getting Shibboleth SSO attributes securely to your application

Accounts and user data are a matter of trust. Single sign-on (SSO) can improve the user experience (UX), convenience and security, especially if you are offering several web applications often used by the same user. If you do not want to force your users onto big SSO vendors like Google or Facebook, or do not trust them, you can implement SSO for your offerings with open-source software (OSS) like Shibboleth. With Shibboleth it may even be feasible to join an existing federation like SWITCH, DFN or InCommon, thus enabling logins for thousands of users without creating new accounts and login data.

If you are implementing your SSO with Shibboleth you usually have to enable your web applications to deal with Shibboleth attributes. Shibboleth attributes are information about the authenticated user provided by the SSO infrastructure, e.g. the Apache web server and mod_shib in conjunction with associated identity providers (IdP). In general there are two options for accessing these attributes:

  1. HTTP request headers
  2. Request environment variables (not to be confused with system environment variables!)

Using request headers should be avoided as it is less secure and prone to spoofing. Access to the request environment depends on the framework your web application is using.

Shibboleth attributes in Java Servlet-based apps

In Java Servlet-based applications like Grails or Java EE, access to the Shibboleth attributes is really easy as they are provided as request attributes. Simply calling request.getAttribute("AJP_eppn") will provide you with the value of the eppn (“eduPersonPrincipalName”) attribute set by Shibboleth, if a user is authenticated and the attribute is made available. There are two caveats though:

  1. Request attributes are prefixed by default with AJP_ if you are using mod_proxy_ajp to connect Apache with your servlet container.
  2. Shibboleth attributes are not contained in request.getAttributeNames()! You have to access them directly, knowing their name.

Shibboleth attributes in WSGI-based apps

If you are using a WSGI-compatible Python web framework for your application you can get the Shibboleth attributes from the wsgi.environ dictionary that is part of the request. In CherryPy, for example, you can use the following code to obtain the eppn:

eppn = cherrypy.request.wsgi_environ['eppn']

I did not find the name of the WSGI environment dictionary clearly documented in my efforts to make Shibboleth work with my CherryPy application, but after that everything was a breeze.

Conclusion

Accessing Shibboleth attributes in a safe manner is straightforward in web environments like Java servlets and Python WSGI applications. Nevertheless, you have to know the above aspects regarding naming and visibility or you will be puzzled by the behaviour of the Shibboleth service provider.

Every time you write a getter, a function dies

This blog post explores the difference between classic and Tell, don’t Ask-style code without any source code examples.

Don’t be too alarmed by the title. Functions are immortal concepts and there’s nothing wrong with a getter method. Except when you write code under the rules of the Object Calisthenics (rule number 9 directly forbids getter and setter methods). Or when you try to adhere to the ideal of encapsulation, a cornerstone of object-oriented programming. Or when your code would really benefit from some other design choices. So, most of the time, basically. Nobody dies if you write a getter method, but you should make a conscious decision for it, not just write it out of old habit.

One thing the Object Calisthenics can teach you is the immediate effect of different design choices. The rules are strict enough to place a lot of burden on your programming, so you’ll feel the pain of every trade-off. In most of your day-to-day programming, you also make these decisions, but you don’t feel the consequences right away, so you get used to certain patterns (habits) that work well for the moment and might or might not work in the long run. You should have an alternative right at hand for every pattern you use. Otherwise, it’s not a pattern, it’s a trap.

Some alternatives

Here is an incomplete list of common alternatives to common patterns or structures that you might already be aware of:

  • if-else statement (explicit conditional): You can replace most explicit conditionals with implicit ones. In object-oriented programming, calling polymorphic methods is a common alternative. Instead of writing if and else, you call a method that is overridden in two different fashions. A polymorphic method call can be seen as an implicit switch-case over the object type.
  • else statement: In the Object Calisthenics, rule 2 directly forbids the usage of else. A common alternative is an early return in the then-block. This might require you to extract the if-statement to its own method, but that’s probably a good idea anyway.
  • for-loop: Loops are one of the basic building blocks of every higher-level programming language. These explicit iterations are so common that most programmers forget their implicit counterpart. Yeah, I’m talking about recursion here. You can replace every explicit loop by an implicit loop using recursion and vice versa. Your only limit is the size of your stack – if you are bound to one. Recursion is an early brain-teaser in every computer science curriculum, but not part of the average programmer’s toolbox. I’m not sure if that’s a bad thing, but it’s an alternative nonetheless.
  • setter method: The first and foremost alternative to a state-altering operation are immutable objects. You can’t alter the state of an immutable, so you have to create a series of edited copies. Syntactic sugar like fluent interfaces fit perfectly in this scenario. You can probably imagine that you’ll need to change the whole code dealing with the immutables, but you’ll be surprised how simple things can be once you let go of mutable state, bad conscience about “wasteful” heap usage and any premature thought about “performance”.

Keep in mind that most alternatives aren’t really “better”, they are just different. There is no silver bullet; every approach has its own advantages and drawbacks, both in the short term and in the long run. Your job as a competent programmer is to choose the right approach for each situation. You should make a deliberate choice and probably document your rationale somewhere (a project-related blog, wiki or issue tracker comes to mind). To be able to make that choice, you need to know about the pros and cons of as many alternatives as you can handle. The two lamest rationales are “I’ve always done it this way” and “I don’t know any other way”.

An alternative for get

In this blog post, you’ll learn one possible alternative to getter methods. It might not be the best or even fitting for your specific task, but it’s worth evaluating. The underlying principle is called “Tell, don’t Ask”. You convert the getter (aka asking the object about some value) to a method that applies a function on the value (aka telling the object to work with the value). But what does “applying” mean and what’s a function?

A function is defined as a conversion of some input into some output, preferably without any side-effects. We might also call it a mapping, because we map every possible input to a certain output. In programming, every method that takes a parameter (or several of them) and returns something (isn’t void) can be viewed as a function as long as the internal state of the method’s object isn’t modified. So you’ve probably programmed a lot of functions already, most of the time without realizing it.

In Java 8 and other modern object-oriented programming languages, the notion of a function is an important part of the toolbox. But you have been able to work with functions in Java since the earliest days, just not as conveniently. Let’s talk about an example. I won’t use any code you can look at, so you’ll have to use your imagination for this. So you have a collection of student objects (imagine a group of students standing around). We want to print a list of all these students onto the console. Each student object can say its name and matriculation number if asked by plain old getters. Damn! Somebody already made the design choice for us that these are our duties:

  • Iterate over all student objects in our collection. (If you don’t want to use a loop for this you know an alternative!)
  • Ask each student object about its name and matriculation number.
  • Carry the data over to the console object and tell the console to print both pieces of information.

But because this is only in our imagination, we can go back in imagined time and eliminate the imagined choice for getters. We want to write our student objects without getters, so let’s get rid of them! Instead, each student object knows about its name and matriculation number, but cannot be asked directly. But you can tell the student object to supply this information to the only (or a specific) method of an object that you give to it. Read the previous sentence again (if you’ve not already done it). That’s the whole trick. Our “function” is an object with only one method that happens to have exactly the parameters that can be provided by the student object. This method might return a formatted string that we can take to the console object or it might use the console itself (this would result in no return value and a side effect, but why not?). We create this function object and tell each student object to use it. We don’t ask the student object for data, we tell it to do work (Tell, don’t Ask).
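
If you prefer to see the idea spelled out after all, here is a minimal sketch in C++ (the type, method names and signatures are all made up for illustration):

#include <functional>
#include <iostream>
#include <string>
#include <vector>

class Student
{
public:
  Student(std::string name, int matriculationNumber)
    : name_(std::move(name)), matriculationNumber_(matriculationNumber) {}

  // No getters: the student hands its data to the function we pass in.
  void supplyTo(const std::function<void(const std::string&, int)>& use) const
  {
    use(name_, matriculationNumber_);
  }

private:
  std::string name_;
  int matriculationNumber_;
};

void printAll(const std::vector<Student>& students)
{
  // The "function object": it receives the data instead of asking for it.
  auto printLine=[](const std::string& name, int number)
  {
    std::cout << name << " (" << number << ")\n";
  };

  for (const auto& student : students)
    student.supplyTo(printLine);
}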

In this example, the result is the same. But our first approach centers the action around our “main” algorithm by gathering all the data and then acting on it. We don’t feel pain using this approach, but we were forced to use it by the absence of a function-accepting method and the presence of getters on the student objects. Our second approach prepares the action by creating the function object and then delegates the work to the objects holding the data. We were able to use it because of the presence of a function-accepting method on the student objects. The absence of getters in the second approach is a by-product: they simply aren’t necessary anymore. Why write getters that nobody uses?

We can observe the following characteristics: In a “traditional”, imperative style with getters, the data flows (gets asked) and the functionality stays in place. In a Tell, don’t Ask style with functions, the data tends to stay in place while the functionality gets passed around (“flows”).

Weighing the options

This is just one alternative to the common “imperative getter” style. As stated, it isn’t “better”, just maybe better suited for a particular situation. In my opinion, the “functional operation” style is not straightforward and doesn’t pay off immediately, but it can be very rewarding in the long run. It opens the door to a whole paradigm of writing source code that can reveal inherent or underlying concepts in your solution domain a lot more clearly than the imperative style. By eliminating the getter methods, you force this paradigm on your readers and fellow developers. But maybe you don’t really need to get rid of the getters, just reduce their usage to the hard cases.

So the title of this blog post is a bit misleading. Every time you write a getter, you’ve probably considered all alternatives and made the informed decision that a getter method is the best way forward. Every time you want to change that decision afterwards, you can add the function-accepting method right alongside the getters. No need to be pure or exclusive, just use the best of two worlds. Just don’t let the functions die (or never be born) because you “didn’t know about them” or found the style “unfamiliar”. Those are mere temporary problems. And one of them is solved right now. Happy coding!

Explicit types – and when to use them

Many modern programming languages offer a way to declare variables without an explicit type if the type can be inferred, either dynamically or statically. Many also allow variables to be defined with an explicit type. For example, Scala and C# let you omit the explicit variable type via the var keyword, but both also allow defining variables with explicit types. I’m coming from the C++ world, where auto has been available for this purpose since the relatively recent C++11. However, people are still debating whether you should actually use it.

Pros

Herb Sutter popularised the almost-always-auto style. He advocates that using more type inference is good because it is roughly equivalent to programming against interfaces instead of implementations. He says that “Overcommitting to explicit types makes code less generic and more interdependent, and therefore more brittle and limited.” However, he also mentions that you might sometimes want to use explicit types.

Now what exactly is overcommitting here? When is the right time to use explicit types?

Cons

Opponents to implicit typing, many of them experienced veterans, often state that they want the actual type visible in the source code. They don’t want to rely on type inference being right. They want the code to explicitly state what’s going on.

At first, I figured that was just conservatism in the face of a new “scary” feature that they did not fully understand. After all, IDEs can usually infer the type on-the-fly and you can hover on a variable to let it show you the type.

For C++, the function signature is a natural boundary where you often insert explicit types, unless you want to commit to the compile-time and physical-dependency cost that comes with templates. Other languages, such as Groovy, do not have this trade-off and let you skip explicit types almost everywhere. After working with Groovy/Grails for a while, where the dominant style seems to be to omit types wherever possible, it dawned on me that the opponents of implicit typing have a point. Not only does the IDE often fail to show me the inferred type (even though it still works way more often than I would have anticipated), but I also found it harder to follow and modify code that did not mention explicit types. Seemingly contrary to Herb Sutter’s argument, that code felt more brittle than I would have liked.

Middle-ground

As usual, the truth seems to be somewhere in the middle. I propose the following rule for when to use explicit types:

  • Explicit typing for domain-types
  • Implicit typing everywhere else

Code using types from the problem domain should be as specific as possible. There’s no need for it to be generic – it’s actually counter-productive, as otherwise the code model would be inconsistent with the model of the problem domain. This is also the most important aspect to grok when reading code, so it should be explicit. The type is as important as the action on it.

On the other hand, for pure-fabrication types that do not represent a concept in the domain, the action is important, while the type is merely a means to achieve this action. Typically, most of the elements from a language’s standard library fall into this category: all your containers, iterators, callables. Their types are merely implementation details: an associative container could be an array, a hash-map or a tree structure. Exchanging it rarely changes the meaning of the code in the problem domain – it just changes its performance characteristics.

Containers will occasionally contain domain types in their type. What do you do about those? I think they belong in the “everywhere else” category, but you should take extra care to name the contained type when working with it – for example when declaring the variable of the for-each loop on it, or when inserting something into it. This way, the “collection of domain type” aspect will become clear, but the specific container implementation will stay implicit – like it should.
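
To make the proposed rule concrete, here is a small sketch in C++ (the Invoice and LineItem types are invented for illustration):

#include <map>
#include <string>
#include <vector>

struct LineItem { std::string description; int cents; };
struct Invoice { std::vector<LineItem> items; };

int total(const Invoice& invoice)
{
  // Domain type: spelled out explicitly.
  int sum=0;
  // Container of a domain type: the loop variable names the contained type,
  // while the container itself stays an implementation detail.
  for (const LineItem& item : invoice.items)
    sum+=item.cents;
  return sum;
}

void index_by_description(const Invoice& invoice)
{
  // Pure fabrication: the exact container type is not important here,
  // so implicit typing is fine.
  auto lookup=std::map<std::string, LineItem>{};
  for (const LineItem& item : invoice.items)
    lookup.emplace(item.description, item);
}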

What do you think? Is this a useful proposition for your code?

Exiling a legacy COM component

One of our long-standing Java applications has several dependencies to native libraries, which are called via the Java Native Interface (JNI). We usually avoid native library dependencies, but this application must interface with some hardware devices for which the vendors only provide access through native libraries.

From 32 bit to 64 bit

Until recently the application ran in a 32-bit Java VM and the native libraries were 32-bit DLLs as well. Then the time had come to update the application to the 64-bit world. We wanted the application to run in a 64-bit JVM, and 32-bit library code cannot run in a 64-bit process.

We were able to replace the 32-bit libraries with 64-bit libraries, all except one. This particular dependency is not just a native library, but a Windows COM component. We had developed a wrapper DLL, which connected the COM component via JNI to the Java application. While we do have the source code of the wrapper DLL, we don’t have the source code of the COM component, and the vendor does not provide a 64-bit version of the component.

[Figure: 32 bit]

So we decided to exile this COM component to a separate process, which we refer to as a container process. The main application runs in a 64-bit JVM and communicates with this process via a simple HTTP API. The API calls from the application do not require a very low latency.

Initially we had planned to implement this container process as a C++ program. However, after a spike it turned out that there is a quicker way to both interface with a COM component and provide a simple self-hosted HTTP service on the Windows platform: the .NET framework provides excellent support for COM interoperability. Using a COM component in .NET is as simple as adding the component as a reference to the project, importing the namespace and instantiating the COM object like a regular .NET object. The event handling of the COM component, which requires quite some boilerplate code to set up in C++, gets automatically mapped to C#/.NET event handling. The only reminder of the fact that you’re using a COM component is the number of out and ref parameters.

For the HTTP API we chose the Nancy framework, with which we have had good experiences in previous projects. The architecture now looks like this:

[Figure: 64-bit]

While it is a drawback that the container process now depends on the .NET runtime, this is outweighed by the benefit for us: the C#/.NET code for interfacing with the COM component is more maintainable than the previous JNI wrapper code was, or than a native implementation of the container process would have been.