Bending the Java syntax until it breaks

Java is a peculiar programming language. It is used in lots of professional business cases and yet regarded as an easy language suitable for beginner studies. Java’s syntax in particular is criticized as bloated and overly strict and on the next blog praised as lenient and powerful. Well, lets have a look at a correct, runnable Java program that I like to show my students:

class $ {
	{
		System.out.println("hello world");
	}

	static {
		System.out.println("hello, too");
	}

	$() {
		http://www.softwareschneiderei.de
		while ($()) {
			break http;
		}
	}

	static boolean $() {
		return true;
	}

	public static void main(String[] _$) {
		System.out.println($.$());
	}
}

This Java code compiles, perhaps with some warnings about unlucky naming and can be run just fine. You might want to guess what its output to the console is. Notice that the code is quiet simple. No shenanigans with Java’s generics or the dark corners of lambda expressions and method handles. In fact, this code compiles and runs correctly since more than 20 years.

Lets dissect the code into its pieces. To really know what’s going on, you need to look into Java’s Language Specification, a magnificent compendium about everything that can be known about Java on the syntax level. If you want to dive even deeper, the Java Virtual Machine Specification might be your cup of tea. But be warned! Nobody on this planet understands everything in it completely. Even the much easier Java Language Specification contains chapters of pure magic, according to Joshua Bloch (you might want to watch the whole presentation, the statement in question is around the 6 minute mark). And in the code example above, we’ve used some of the magic, even if they are beginner level tricks.

What about the money?

The first glaring anomaly in the code is the strange names that are dollar signs or underscores. These are valid names, according to chapter 3.8 about Identifiers. And we’ve done great by choosing them. Let me quote the relevant sentence from this chapter:

“The “Java letters” include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII dollar sign ($, or \u0024) and underscore (_, or \u005f). The dollar sign should be used […]”

Oh, and by the way: Java identifiers are of unlimited length. You could go and write valid Java code that never terminates. We’ve gone the other way and made our names as short as possible – one character. Since identifiers are used as class names, method names, variable names and (implicitly) constructor names, we can name them all alike.

The variable name of the arguments in the main method used to be just an underscore, but somebody at Oracle changed this section of the Language Specification and added the following sentence:

“The underscore may be used in identifiers formed of two or more characters, but it cannot be used as a one-character identifier due to being a keyword.”

This change happened in Java 9. You can rename the variable “_$” to just “_” in Java 8 and it will work (with a warning).

URLs as first-class citizens?

The next thing that probably caught your eye is the URL in the first line of the constructor. It just stands there. And as I told you, the code compiles. This line is actually a combination of two things: A labeled statement and a comment. You already know about end-of-line comments that are started with a double slash. The rather unknown thing is the labeled statement before it, ending with a colon. This is one of the darker regions of the Language Specification, because it essentially introduces a poor man’s goto statement. And they knew it, because they explicitly talk about it:

“Unlike C and C++, the Java programming language has no goto statement; identifier statement labels are used with break…”

And this explains the weird line in the while loop: “break http” doesn’t command Java to do harm to the computer’s Internet connection, but to leave and complete the labeled statement, in our case the while loop. This spares us from the looming infinite loop, but raises another question: What names are allowed as labels? You’ve guessed it, it’s a Java identifier. We could have named our label “$:” instead of “http:” and chuckled about the line “break $”.

So, Java has a goto statement, but it isn’t called as such and it’s crippled to the point of being useless in practice. In my 20+ years of Java programming, I’ve seen it used just once in the wild.

What about it all?

This example of Java code serves me as a reminder that a programming language is what we make out of it. Our Java programs could easily all look like this if we wanted to. It’s not Java’s merit that our code is readable. And it’s not Java’s fault that we write bloated code. Both are results of our choices as programmers, not an inevitableness from the language we program in.

I sometimes venture to the darker regions of programming languages to see what the language could look and feel like if somebody makes the wrong decisions. This code example is from one of those little journeys several years ago. And it proved its worth once again when I tried to compile it with Java 9. Remember the addition in the Language Specification that made the single underscore a keyword? It wasn’t random. Java’s authors want to improve the lambda expressions in Project Amber, specifically in the JEP 302 (Lambda Leftovers). This JDK Enhancement Proposal (JEP) was planned for Java 10, is still not included in Java 11 and has no clear release date yet. My code gave me the motivation to dig into the topic and made me watch presentations like the one from Brian Goetz at Devoxx 2017 that’s really interesting and a bit unsettling.

Bending things until they break is one way to learn about their limits. What are your ways to learn about programming languages? Do you always stay in the middle lane? Leave a comment on your journeys.

3 rules for projects under version control

Checking out a project under version control should be easy and repeatable. Here are a few tips on how to achieve that:

1. Self-containment

You should not need a specifically configured machine to start working on a project. Ideally, you clone the project and get started. Many things can be a problem to achieve that.
Maybe your project is needs a specific operating system, dependency installed, hardware or database setup to run. I personally draw that line at this:
You get a manual with the code that helps you to set up your development environment once and this should be as automated & easy as possible. You should always be able to run your projects without periphery, i.e. specific hardware that can be plugged in or databases that need to be installed.
To achieve this for hardware, you can often fake it via polymorphic interfaces and dependency injection, much like mocking it for testing. The same can be done with databases – or you can use in-memory databases as a fallback.

2. Separate build-artifacts

Building the project into an executable form should be clearly separated from the source-controlled files. For example, the build should never ever modify a file that is under version control. Ideally, the “build” directory is completely independent from the source – enabling a true out-of-source build. However, it is often a good middle ground to allow building in a few dedicated directories in your source repository – but these need to be in .gitignore.

3. Separate runtime data

In the same way, running your project should not touch any source controlled files. Ideally, the project can be run out-of-source. This is trivial for small programs that do not have data, but once some data needs to be managed by the source control system, it gets a little more tricky for the executables to find the data. For data that needs to be changed by the program (we call these “stores”), it is advisable to maintain templates in the VCS or in codes, and copy them to the runtime directory during the build process. For data that is not changed by running the program, such as images, videos, translation-tables etc., you can copy them as well, or make sure the program finds them in the source repository.

Following these guidelines will make it easier to work with version control, especially when multiple people are involved.

For what the javascript!

The setting

We are developing and maintaining an important web application for one of our clients. Our application scrapes a web page and embeds our own content into that page frame.

One day our client told us of an additional block of elements at the bottom of each page. The block had a heading “Image Credits” and a broken image link strangely labeled “inArray”. We did not change anything on our side and the new blocks were not part of the HTML code of the pages.

Ok, so some new Javascript code must be the source of these strange elements on our pages.

The investigation

I started the investigation using the development tools of the browser (using F12). A search for the string “Image Credits” instantly brought me to the right place: A Javascript function called on document.ready(). The code was basically getting all images with a copyright attribute and put the findings in an array with the text as the key and the image url as the value. Then it would iterate over the array and add the copyright information at the bottom of each page.

But wait! Our array was empty and we had no images with copyright attributes. Still the block would be put out. I verified all this using the debugger in the browser and was a bit puzzled at first, especially by the strange name “inArray” that sounded more like code than some copyright information.

The cause

Then I looked at the iteration and it struck me like lightning: The code used for (name in copyrightArray) to iterate over the elements. Sounds correct, but it is not! Now we have to elaborate a bit, especially for all you folks without a special degree in Javascript coding:

In Javascript there is no distinct notion of associative arrays but you can access all enumerable properties of an object using square brackets (taken from Mozillas Javascript docs):

var string1 = "";
var object1 = {a: 1, b: 2, c: 3};

for (var property1 in object1) {
  string1 = string1 + object1[property1];
}

console.log(string1);
// expected output: "123"

In the case of an array the indices are “just enumerable properties with integer names and are otherwise identical to general object properties“.

So in our case we had an array object with a length of 0 and a property called inArray. Where did that come from? Digging further revealed that one of our third-party libraries added a function to the array prototype like so:

Array.prototype.inArray = function (value) {
  var i;
  for (i = 0; i < this.length; i++) {
    if (this[i] === value) {
      return true;
    }
  }
  return false;
};

The solution

Usually you would iterate over an array using the integer index (old school) or better using the more modern and readable for…of (which also works on other iterable types). In this case that does not work because we do not use integer indices but string properties. So you have to use Object.keys().forEach() or check with hasOwnProperty() if in your for…in loop if the property is inherited or not to avoid getting unwanted properties of prototypes.

The takeaways

Iteration in Javascript is hard! See this lengthy discussion…The different constructs are named quite similar an all have subtle differences in behaviour. In addition, libraries can mess with the objects you think you know. So finally some advice from me:

  • Arrays are only true arrays with positive integer indices/property names!
  • Do not mess with the prototypes of well known objects, like our third-party library did…
  • Use for…of to iterate over true arrays or other iterables
  • Do not use associative arrays if other options are available. If you do, make sure to check if the properties are own properties and enumerable.

 

Transforming C-Style arrays in java

Every now and then some customer asks us to fix or improve some important legacy application other people have written. Usually, such projects are fun and it is rewarding to see the improvements both in code and value for the users.

In one of these projects there is a Java GUI application that uses C-style arrays for some of its central data structures:

public class LegoBox  {
  public LegoBrick[] bricks = new LegoBrick[8000];
  public int brickCount = 0;
}

The array-length is a constant upper bound and does not denote the actual elements in the array. Elements are added dynamically to the array and it looks like a typical job for a automatically growing Collection like java.util.ArrayList. Most operations simply iterate over all elements and perform some calculations. But changing such a central part in a performance sensitive application is not only a lot of work but also risky.

We decided to take an incremental approach to improve code readability and maintainability and measured performance with a large, representative dataset between refactorings. There are two easy alternative APIs that improve working with the above data structure.

Imperative API

Smooth migration from the existing imperative “ask”-code (see “Tell, don’t ask”-principle) can be realized by providing an java.util.Iterable to the underlying array.


public int countRedBricks() {
  int redBrickCount = 0;
  for (int i = 0; i < box.brickCount; i++) {
    if (box.bricks[i].isRed()) {
      redBrickCount++;
    }
  }
  return redBrickCount;
}

Code like above is easily transformed to much clearer code like below:

public class LegoBox  {
  public LegoBrick[] bricks = new LegoBrick[8000];
  public int brickCount = 0;

  public Iterable<LegoBrick> allBricks() {
    return Arrays.stream(tr, 0, brickCount).collect(Collectors.toList());
  }
}

public int countRedBricks() {
  int redBrickCount = 0;
  for (LegoBrick brick : box.bricks) {
    if (brick.isRed()) {
      redBrickCount++;
    }
  }
  return redBrickCount;
}

Functional API

A nice alternative to the imperative solution above is a functional interface to the array. In Java 8 and newer we can provide that easily and encapsulate the iteration over our array:

public class LegoBox  {
  public LegoBrick[] bricks = new LegoBrick[8000];
  public int brickCount = 0;

  public <R> R forAllBricks(Function<Brick, R> operation, R identity, BinaryOperator<R> reducer) {
    return Arrays.stream(bricks, 0, brickCount).map(operation).reduce(identity, reducer);
  }

  public void forAllBricks(Consumer<LegoBrick> operation) {
    Arrays.stream(bricks, 0, brickCount).forEach(operation);
  }
}

public int countRedBricks() {
  return box.forAllBricks(brick -> brick.isRed() ? 1 : 0, 0, (sum, current) -> sum + current);
}

The functional methods can be tailored to your specific needs, of course. I just provided two examples for possible functional interfaces and their implementation.

The function + reducer case is a very general interface and used here for an implementation of our “count the red bricks” use case. Alternatively you could implement this use case with a more specific but easier to use filter + count interface:

public class LegoBox  {
  public LegoBrick[] bricks = new LegoBrick[8000];
  public int brickCount = 0;

  public long countBricks(Predicate<Brick> filter) {
    return Arrays.stream(bricks, 0, brickCount).filter(operation).count();
  }
}

public int countRedBricks() {
  return box.countBricks(brick -> brick.isRed());
}

The consumer case is very simple and found a lot in this specific project because mutation of the array elements is a typical operation and all over the place.

The functional API avoids duplicating the iteration all the time and removes the need to access the array or iterable/collection. It is therefore much more in the spirit of “tell”.

Conclusion

The new interfaces allow for much simpler and maintainable client code and remove a lot of duplicated iterations on the client side. They can be introduced on the way when implementing requested features for the customer.

That way we invested only minimal effort in cleaner, better maintainable and more error-proof code. When someday all accesses to the public array are encapsulated we can use the new found freedom to internalize the array and change it to a better fitting data structure like an ArrayList.

Lessons learned developing hybrid web apps (using Apache Cordova)

In the past year we started exploring a new (at leat for us) terrain: hybrid web apps. We already developed mobile web apps and native apps but this year we took a first step into the combination of both worlds. Here are some lessons learned so far.

Just develop a web app

after all the hybrid app is a (mobile) web app at its core, encapsulating the native interactions helped us testing in a browser and iterating much faster. Also clean architecture supports to defer decisions of the environment to the last possible moment.

Chrome remote debugging is a boon

The tools provided by Chrome for remote debugging on Android web views and browser are really great. You can even see and control the remote UI. The app has some redraw problems when the debugger is connected but overall it works great.

Versioning is really important

Developing web apps the user always has the latest version. But since our app can run offline and is installed as a normal Android app you have to have versions. These versions must be visible by the user, so he can tell you what version he runs.

Android app update fails silently

Sometimes updating our app only worked in parts. It seemed that the web view cached some files and didn’t update others. The problem: the updater told the user everything went smoothly. Need to investigate that further…

Cordova plugins helped to speed up

Talking to bluetooth devices? checked. Saving lots of data in a local sqlite? Plugins got you covered. Writing and reading local files? No problemo. There are some great plugins out there covering your needs without going native for yourself.

JavaScript isn’t as bad as you think

Working with JavaScript needs some discipline. But using a clean architecture approach and using our beloved event bus to flatten and exposing all handlers and callbacks makes it a breeze to work with UIs and logic.

SVG is great

Our apps uses a complex visualization which can be edited, changed, moved and zoomed by the user. SVG really helps here and works great with CSS and JavaScript.

Use log files

When your app runs on a mobile device without a connection (to the internet) you need to get information from the device to you. Just a console won’t cut it. You need log files to record the actions and errors the user provokes.

Accessibility is harder than you think

Modern design trends sometimes make it hard to get a good accessibility. Common problems are low contrast, using only icons on buttons, indiscernible touch targets, color as information bearer and touch targets that are too small.

These are just the first lessons we learned tackling hybrid development but we are sure there are more to come.

4 Tips for better CMake

We are doing one of those list posts again! This time, I will share some tips and insights on better CMake. Number four will surprise you! Let’s hop right in:

Tip #1

model dependencies with target_link_libraries

I have written about this before, and this is still my number one tip on CMake. In short: Do not use the old functions that force properties down the file hierarchy such as include_directories. Instead set properties on the targets via target_link_libraries and its siblings target_compile_definitions, target_include_directories and target_compile_options and “inherit” those properties via target_link_libraries from different modules.

Tip #2

always use find_package with REQUIRED

Sure, having optional dependencies is nice, but skipping on REQUIRED is not the way you want to do it. In the worst case, some of your features will just not work if those packages are not found, with no explanation whatsoever. Instead, use explicit feature-toggles (e.g. using option()) that either skip the find_package call or use it with REQUIRED, so the user will know that another lib is needed for this feature.

Tip #3

follow the physical project structure

You want your build setup to be as straight forward as possible. One way to simplify it is to follow the file system and and the artifact structure of your code. That way, you only have one structure to maintain. Use one “top level” file that does your global configuration, e.g. find_package calls and CPack configuration, and then only defers to subdirectories via add_subdirectory. Only for direct subdirectories though: if you need extra levels, those levels should have their own CMake files. Then build exactly one artifact (e.g. add_executable or add_library) per leaf folder.

Tip #4

make install() an option()

It is often desirable to include other libraries directly into your build process. For example, we usually do this with googletest for our unit test. However, if you do that and use your install target, it will also install the googletest headers. That is usually not what you want! Some libraries handle this automagically by only doing the install() calls when they are the top level project. Similar to the find_package tip above, I like to do this with an option() for explicit user control!

Generating done

That is it for today! I hope this is helps and we will all see better CMake code in the future.

About API astonishments

Nowadays we developers tend to stand on the shoulders of giants: We put powerful building-blocks from different libraries together to build something worth man-years in hours. Or we fill-in the missing pieces in a framework infrastructure to create a complete application in just a few days.

While it is great to have such tools in the form of application programmer interfaces (API) at your disposal it is hard to build high quality APIs. There are many examples for widely used APIs, good and bad. What does “bad API” mean? It depends on your view point:

Bad API for the API user

For the application programmer a bad API means things like:

  • Simple tasks/use cases are complicated
  • Complex tasks are impossible or require patching
  • Easy to misuse producing bugs

A very simple real life example of such an API is a C++ camera API I had to use in a project. Our users were able to change the area of interest (AOI) of the picture to produce images consisting of only a part of full resolution images. Our application did crash or not work as expected without obvious reasons. It took many hours of debugging to spot the subtle API misuse that could be verified be reading the documentation:

The value of camera.Width.GetMax() changed instead of being constant! The reason is that AOI was meant and not the sensor resolution width. The full resolution width we actually wanted is obtained by calling camera.WidthMax.GetValue(). This kind of naming makes the properties almost undistinguishable and communicates nothing of the implications. Terms like AOI or sensor width or full resolution just do not appear in this part of the API.

Small things like the example above may really hurt productivity and user experience of an API.

Bad API for the API programmer

API programmers can easily produce APIs that are bad for themselves because they take away too much freedom away resulting in:

  • Frequent breaking changes
  • API rewrites
  • Unimplementable features
  • Confusing, not fitting interfaces

Design your interfaces small and focused. Use types in the interface that leave as much freedom as possible without hurting usability (see Iterable vs. Collection vs. List vs. ArrayList for example). Try to build composable and extendable types because adding types or methods is less of a problem than changing them.

Conclusion

Developers should put extra care in interfaces they want to publish for others to use. Once the API is out there breaking it means angry users. Be aware that good API design is hard and necessary for a painless evolution of an API. Consider reading books like “Practical API Design” or “Build APIs You Won’t Hate” if you want to target a wider audience.