Transforming C-Style arrays in java

Every now and then some customer asks us to fix or improve some important legacy application other people have written. Usually, such projects are fun and it is rewarding to see the improvements both in code and value for the users.

In one of these projects there is a Java GUI application that uses C-style arrays for some of its central data structures:

public class LegoBox  {
  public LegoBrick[] bricks = new LegoBrick[8000];
  public int brickCount = 0;
}

The array-length is a constant upper bound and does not denote the actual elements in the array. Elements are added dynamically to the array and it looks like a typical job for a automatically growing Collection like java.util.ArrayList. Most operations simply iterate over all elements and perform some calculations. But changing such a central part in a performance sensitive application is not only a lot of work but also risky.

We decided to take an incremental approach to improve code readability and maintainability and measured performance with a large, representative dataset between refactorings. There are two easy alternative APIs that improve working with the above data structure.

Imperative API

Smooth migration from the existing imperative “ask”-code (see “Tell, don’t ask”-principle) can be realized by providing an java.util.Iterable to the underlying array.


public int countRedBricks() {
  int redBrickCount = 0;
  for (int i = 0; i < box.brickCount; i++) {
    if (box.bricks[i].isRed()) {
      redBrickCount++;
    }
  }
  return redBrickCount;
}

Code like above is easily transformed to much clearer code like below:

public class LegoBox  {
  public LegoBrick[] bricks = new LegoBrick[8000];
  public int brickCount = 0;

  public Iterable<LegoBrick> allBricks() {
    return Arrays.stream(tr, 0, brickCount).collect(Collectors.toList());
  }
}

public int countRedBricks() {
  int redBrickCount = 0;
  for (LegoBrick brick : box.bricks) {
    if (brick.isRed()) {
      redBrickCount++;
    }
  }
  return redBrickCount;
}

Functional API

A nice alternative to the imperative solution above is a functional interface to the array. In Java 8 and newer we can provide that easily and encapsulate the iteration over our array:

public class LegoBox  {
  public LegoBrick[] bricks = new LegoBrick[8000];
  public int brickCount = 0;

  public <R> R forAllBricks(Function<Brick, R> operation, R identity, BinaryOperator<R> reducer) {
    return Arrays.stream(bricks, 0, brickCount).map(operation).reduce(identity, reducer);
  }

  public void forAllBricks(Consumer<LegoBrick> operation) {
    Arrays.stream(bricks, 0, brickCount).forEach(operation);
  }
}

public int countRedBricks() {
  return box.forAllBricks(brick -> brick.isRed() ? 1 : 0, 0, (sum, current) -> sum + current);
}

The functional methods can be tailored to your specific needs, of course. I just provided two examples for possible functional interfaces and their implementation.

The function + reducer case is a very general interface and used here for an implementation of our “count the red bricks” use case. Alternatively you could implement this use case with a more specific but easier to use filter + count interface:

public class LegoBox  {
  public LegoBrick[] bricks = new LegoBrick[8000];
  public int brickCount = 0;

  public long countBricks(Predicate<Brick> filter) {
    return Arrays.stream(bricks, 0, brickCount).filter(operation).count();
  }
}

public int countRedBricks() {
  return box.countBricks(brick -> brick.isRed());
}

The consumer case is very simple and found a lot in this specific project because mutation of the array elements is a typical operation and all over the place.

The functional API avoids duplicating the iteration all the time and removes the need to access the array or iterable/collection. It is therefore much more in the spirit of “tell”.

Conclusion

The new interfaces allow for much simpler and maintainable client code and remove a lot of duplicated iterations on the client side. They can be introduced on the way when implementing requested features for the customer.

That way we invested only minimal effort in cleaner, better maintainable and more error-proof code. When someday all accesses to the public array are encapsulated we can use the new found freedom to internalize the array and change it to a better fitting data structure like an ArrayList.

Lessons learned developing hybrid web apps (using Apache Cordova)

In the past year we started exploring a new (at leat for us) terrain: hybrid web apps. We already developed mobile web apps and native apps but this year we took a first step into the combination of both worlds. Here are some lessons learned so far.

Just develop a web app

after all the hybrid app is a (mobile) web app at its core, encapsulating the native interactions helped us testing in a browser and iterating much faster. Also clean architecture supports to defer decisions of the environment to the last possible moment.

Chrome remote debugging is a boon

The tools provided by Chrome for remote debugging on Android web views and browser are really great. You can even see and control the remote UI. The app has some redraw problems when the debugger is connected but overall it works great.

Versioning is really important

Developing web apps the user always has the latest version. But since our app can run offline and is installed as a normal Android app you have to have versions. These versions must be visible by the user, so he can tell you what version he runs.

Android app update fails silently

Sometimes updating our app only worked in parts. It seemed that the web view cached some files and didn’t update others. The problem: the updater told the user everything went smoothly. Need to investigate that further…

Cordova plugins helped to speed up

Talking to bluetooth devices? checked. Saving lots of data in a local sqlite? Plugins got you covered. Writing and reading local files? No problemo. There are some great plugins out there covering your needs without going native for yourself.

JavaScript isn’t as bad as you think

Working with JavaScript needs some discipline. But using a clean architecture approach and using our beloved event bus to flatten and exposing all handlers and callbacks makes it a breeze to work with UIs and logic.

SVG is great

Our apps uses a complex visualization which can be edited, changed, moved and zoomed by the user. SVG really helps here and works great with CSS and JavaScript.

Use log files

When your app runs on a mobile device without a connection (to the internet) you need to get information from the device to you. Just a console won’t cut it. You need log files to record the actions and errors the user provokes.

Accessibility is harder than you think

Modern design trends sometimes make it hard to get a good accessibility. Common problems are low contrast, using only icons on buttons, indiscernible touch targets, color as information bearer and touch targets that are too small.

These are just the first lessons we learned tackling hybrid development but we are sure there are more to come.

4 Tips for better CMake

We are doing one of those list posts again! This time, I will share some tips and insights on better CMake. Number four will surprise you! Let’s hop right in:

Tip #1

model dependencies with target_link_libraries

I have written about this before, and this is still my number one tip on CMake. In short: Do not use the old functions that force properties down the file hierarchy such as include_directories. Instead set properties on the targets via target_link_libraries and its siblings target_compile_definitions, target_include_directories and target_compile_options and “inherit” those properties via target_link_libraries from different modules.

Tip #2

always use find_package with REQUIRED

Sure, having optional dependencies is nice, but skipping on REQUIRED is not the way you want to do it. In the worst case, some of your features will just not work if those packages are not found, with no explanation whatsoever. Instead, use explicit feature-toggles (e.g. using option()) that either skip the find_package call or use it with REQUIRED, so the user will know that another lib is needed for this feature.

Tip #3

follow the physical project structure

You want your build setup to be as straight forward as possible. One way to simplify it is to follow the file system and and the artifact structure of your code. That way, you only have one structure to maintain. Use one “top level” file that does your global configuration, e.g. find_package calls and CPack configuration, and then only defers to subdirectories via add_subdirectory. Only for direct subdirectories though: if you need extra levels, those levels should have their own CMake files. Then build exactly one artifact (e.g. add_executable or add_library) per leaf folder.

Tip #4

make install() an option()

It is often desirable to include other libraries directly into your build process. For example, we usually do this with googletest for our unit test. However, if you do that and use your install target, it will also install the googletest headers. That is usually not what you want! Some libraries handle this automagically by only doing the install() calls when they are the top level project. Similar to the find_package tip above, I like to do this with an option() for explicit user control!

Generating done

That is it for today! I hope this is helps and we will all see better CMake code in the future.

About API astonishments

Nowadays we developers tend to stand on the shoulders of giants: We put powerful building-blocks from different libraries together to build something worth man-years in hours. Or we fill-in the missing pieces in a framework infrastructure to create a complete application in just a few days.

While it is great to have such tools in the form of application programmer interfaces (API) at your disposal it is hard to build high quality APIs. There are many examples for widely used APIs, good and bad. What does “bad API” mean? It depends on your view point:

Bad API for the API user

For the application programmer a bad API means things like:

  • Simple tasks/use cases are complicated
  • Complex tasks are impossible or require patching
  • Easy to misuse producing bugs

A very simple real life example of such an API is a C++ camera API I had to use in a project. Our users were able to change the area of interest (AOI) of the picture to produce images consisting of only a part of full resolution images. Our application did crash or not work as expected without obvious reasons. It took many hours of debugging to spot the subtle API misuse that could be verified be reading the documentation:

The value of camera.Width.GetMax() changed instead of being constant! The reason is that AOI was meant and not the sensor resolution width. The full resolution width we actually wanted is obtained by calling camera.WidthMax.GetValue(). This kind of naming makes the properties almost undistinguishable and communicates nothing of the implications. Terms like AOI or sensor width or full resolution just do not appear in this part of the API.

Small things like the example above may really hurt productivity and user experience of an API.

Bad API for the API programmer

API programmers can easily produce APIs that are bad for themselves because they take away too much freedom away resulting in:

  • Frequent breaking changes
  • API rewrites
  • Unimplementable features
  • Confusing, not fitting interfaces

Design your interfaces small and focused. Use types in the interface that leave as much freedom as possible without hurting usability (see Iterable vs. Collection vs. List vs. ArrayList for example). Try to build composable and extendable types because adding types or methods is less of a problem than changing them.

Conclusion

Developers should put extra care in interfaces they want to publish for others to use. Once the API is out there breaking it means angry users. Be aware that good API design is hard and necessary for a painless evolution of an API. Consider reading books like “Practical API Design” or “Build APIs You Won’t Hate” if you want to target a wider audience.

A good name will shine forever

Naming things is supposedly one of the two hard things in Computer Science. Here are some tips on naming for programmers.

Getters

In the Java world property accessors are traditionally prefixed with “get” and “set”, the Java bean convention:

person.getFirstName()

Code becomes more pleasant to read if you omit the “get” prefix:

person.firstName()

Of course, you can do this only if you don’t use a framework that depends on the convention to recognize properties via reflection (like some OR mappers, for example).

What about setters? I rarely write setters anymore. If you design your classes as immutable types you don’t need setters. Even if your class has mutable state you probably want to control this state via methods more specific to the domain of the problem. Also, the more you apply the tell, don’t ask principle the less you will find the need for getters.

Brevity vs. verbosity

There were times when it was common to see mass variable declarations like the following at the beginning of a function:

int i, j, k, l, m, n;
float a, b, c, u, v, x, y, z;

Fortunately, times have changed for the better, and most programmers are aware that descriptive naming is important. However, some programmers do over-compensate. Length of an identifier is not a virtue by itself.

The Objective-C Cocoa framework is famous for overly long method names:

[array objectAtIndex:index]

Parts of Objective-C were inspired by Smalltalk. But in Smalltalk the same method is called at:

[array at:index]

This is a reasonably sufficient name for such a common functionality in programming.

Here’s another example: If the concept of a measurement station is very prevalent in the domain of your project then it’s ok to call instances just station instead of measurementStation if it’s the only kind of station in the domain.

Yes, the IDE does auto-complete long names. However, readability of the code decreases if the reader has to scan the same long-winded names over and over again:

MeasurementStation measurementStation = new MeasurementStation();
Measurement measurement = measurementStation.startMeasurement();

Often you can find names that are more to the point than longer descriptions, e.g. acquire instead of takeOwnershipOf. (source)

Hungarian notation and friends

The famous Hungarian notation is no longer in widespread use. However, there are variations of it that I would recommend against as well for the sake of readability. For example, bookList or bookArray can be simply books. Another variation would be conventions like myField or m_field for member variables. If you need these notations to determine the origin of a variable, then your scopes are probably too big, i.e. your methods, functions or classes are too long. Additionally, IDEs and editors for programmers can highlight these different scopes anyway. Other examples for unnecessary Hungarian-style notation are IFoo for interfaces, EFoo for enums or the infamous FooImpl.

Screaming constants

There is really no need for constants and enum values to constantly SCREAM at you and other readers. This SCREAMING_CASE convention has its origin in C, where constants used to be defined as macros when the const keyword wasn’t introduced yet, and it later found its way into other programming language ecosystems. Names for constants and enum values are not more important than other identifiers and don’t have to be spelled differently. Try it, you will enjoy the newfound silence in your code.

Conclusion

These are some tips to improve readability of code through better names. Some of these tips go against traditional conventions, so you should discuss them with your team before applying them. Consistency within an existing code base can definitely be more important. But if you have the freedom you should definitely give them a try.

Evolvability of Code: Uniform Access Principle

Most programmers like freedom. So there are many means of hiding implementations in modern programming languages, e.g. interfaces in Java, header files in C/C++ and visibility modifiers like private and protected in most object-oriented languages. Even your ordinary functions or public class interface gives you the freedom to change the implementation without needing to touch the clients. Evolvability in this sense means you can change and refine your implementations without requiring others, namely clients of your code, to change.

Changing the class interface or function signatures within a project is often possible and feasible, at least if you have access to all client code and use powerful refactoring tools. If you published your code as a library or do not want to break all client code or forcing them to adapt to your changes you have to consider your interface code to be fixed. This takes away some of your precious freedom. So you have to design your interfaces carefully with evolability in mind.

Some programming languages implement the uniform access principle (UAP) that eases evolvability in that it allows you to migrate from public attributes to properties/method calls without changing the clients: Read and write access to the attribute uses the same syntax as invoking corresponding methods. For clarification an example in Python where you may start with a class like:

class Person(object):
  def __init__(self, name, age):
    self.name = name
    self.age = age

Using the above class is trivial as follows

>>> pete = Person("pete", 32)
>>> print pete.age
32
# a year has passed
>>> pete.age = 33
>>> print pete.age
33

Now if the age is not a plain value anymore but needs checking, like always being greater zero or is calculated based on some calendar you can turn it to a property like so:

class Person(object):
  def __init__(self, name, age):
    self.name = name
    self._age = age

  @property
  def age(self):
    return self._age

  @age.setter
  def age(self, new_age):
    if new_age < 0:
      raise ValueError("Age under 0 is not possible")
    self._age = new_age

Now the nice thing is: The above client code still works without changes!

Scala uses a similar and quite concise mechanism for implementing the UAP wheres .NET provides some special syntax for properties but still migration from public fields easily possible.

So in languages supporting the UAP you can start really simple with public attributes holding the plain value without worrying about some potential future. If you later need more sophisticated stuff like caching, computation of the value, validation or even remote retrieval you can add it using language features without touching or bothering clients.

Unfortunately some powerful and widespread languages like Java and C++ lack support for UAP. Changing a public field to a more complex property means the introduction of getter and setter methods and changing all clients. Therefore you see, especially in Java, many data classes littered with trivial getter and setter pairs doing nothing interesting and introducing unnecessary bloat to maintain the evolvability of the code.

Why I’m not using C++ unnamed namespaces anymore

Well okay, actually I’m still using them, but I thought the absolute would make for a better headline. But I do not use them nearly as much as I used to. Almost exactly a year ago, I even described them as an integral part of my unit design. Nowadays, most units I write do not have an unnamed namespace at all.

What’s so great about unnamed namespaces?

Back when I still used them, my code would usually evolve gradually through a few different “stages of visibility”. The first of these stages was the unnamed-namespace. Later stages would either be a free-function or a private/public member-function.

Lets say I identify a bit of code that I could reuse. I refactor it into a separate function. Since that bit of code is only used in that compile unit, it makes sense to put this function into an unnamed namespace that is only visible in the implementation of that unit.

Okay great, now we have reusability within this one compile unit, and we didn’t even have to recompile any of the units clients. Also, we can just “Hack away” on this code. It’s very local and exists solely to provide for our implementation needs. We can cobble it together without worrying that anyone else might ever have to use it.

This all feels pretty great at first. You are writing smaller functions and classes after all.

Whole class hierarchies are defined this way. Invisible to all but yourself. Protected and sheltered from the ugly world of external clients.

What’s so bad about unnamed namespaces?

However, there are two sides to this coin. Over time, one of two things usually happens:

1. The code is never needed again outside of the unit. Forgotten by all but the compiler, it exists happily in its seclusion.
2. The code is needed elsewhere.

Guess which one happens more often. The code is needed elsewhere. After all, that is usually the reason we refactored it into a function in the first place. Its reusability. When this is the case, one of these scenarios usually happes:

1. People forgot about it, and solve the problem again.
2. People never learned about it, and solve the problem again.
3. People know about it, and copy-and-paste the code to solve their problem.
4. People know about it and make the function more widely available to call it directly.

Except for the last, that’s a pretty grim outlook. The first two cases are usually the result of the bad discoverability. If you haven’t worked with that code extensively, it is pretty certain that you do not even know that is exists.

The third is often a consequence of the fact that this function was not initially written for reuse. This can mean that it cannot be called from the outside because it cannot be accessed. But often, there’s some small dependency to the exact place where it’s defined. People came to this function because they want to solve another problem, not to figure out how to make this function visible to them. Call it lazyness or pragmatism, but they now have a case for just copying it. It happens and shouldn’t be incentivised.

A Bug? In my code?

Now imagine you don’t care much about such noble long term code quality concerns as code duplication. After all, deduplication just increases coupling, right?

But you do care about satisfied customers, possibly because your job depends on it. One of your customers provides you with a crash dump and the stacktrace clearly points to your hidden and protected function. Since you’re a good developer, you decide to reproduce the crash in a unit test.

Only that does not work. The function is not accessible to your test. You first need to refactor the code to actually make it testable. That’s a terrible situation to be in.

What to do instead.

There’s really only two choices. Either make it a public function of your unit immediatly, or move it to another unit.

For functional units, its usually not a problem to just make them public. At least as long as the function does not access any global data.

For class units, there is a decision to make, but it is simple. Will using preserve all class invariants? If so, you can move it or make it a public function. But if not, you absolutely should move it to another unit. Often, this actually helps with deciding for what to create a new class!

Note that private and protected functions suffer many of the same drawbacks as functions in unnamed-namespaces. Sometimes, either of these options is a valid shortcut. But if you can, please, avoid them.