Displaying numbers in tables

Many software applications have to display series of numbers, for example statistical information, measurement values or financial data. Of course there are many ways to visualize values graphically with charts, but sometimes the user wants to see the actual values as numbers. The typical layout method to display numbers are tables.

Here are some guidelines you should follow when you have to display numbers in a table.

Integer numbers

Right aligned integer numbers
Right-aligned integer numbers

Integer numbers that are shown in a table column should be right-aligned, because the orders of magnitude of a number’s digits increase from right to left. Additionally you should choose a font with fixed-width digits for numbers. This ensures that digits with the same orders of magnitude line up. Thus the numbers can be compared more easily. The font itself doesn’t have to be a fixed-width font in general. Some proportional fonts with variable widths for letters have fixed-widths for digits, called tabular figures.

Non-integer numbers

Aligned with decimal points
Aligned with decimal points

Non-integer numbers with decimal points should be aligned with their decimal points. The reason is the same as above: digits with the same orders of magnitude should line up. This can be a bit more effort to implement in your application than mere right-alignment, because components such as UI widgets or HTML tables usually don’t directly support this form of alignment.

However, you can implement it by using a font with tabular figures and then right-pad the numbers with spaces. Each of these spaces must have the same width as a digit, of course. This is the case with a fixed-width font, but there is also a special Unicode character for this purpose that can be used with proportional fonts and tabular figures: it’s called figure space and has the Unicode code point U+2007.

Modern developer Issue 4: My SQL toolbox

Thinking in SQL

SQL is such a basic and useful language but the underlying thinking is non-intuitive when you come from imperative languages like Java, Ruby and similar.
SQL is centered around sets and operations on them. The straight forward solution might not be the best one.

Limit

Let’s say we need the maximum value in a certain set. Easy:

select max(value) from table

But what if we need the row with the maximum value? Just adding the other columns won’t work since aggregations only work with other aggregations and group bys. Joining with the same table may be straight forward but better is to not do any joins:

select * from (select * from table order by value desc) where rownum<=1

Group by and having

Even duplicate values can be found without joining:

select value from table group by value having count(*) > 1

Grouping is a powerful operation in SQL land:

select max(value), TO_CHAR(time, 'YYYY-MM') from table group by TO_CHAR(time, 'YYYY-MM')

Finding us the maximum value in each month.

Mapping with outer joins

SQL is also good for calculations. Say we have one table with values and one with a mapping like a precalculated log table. Joining both gets the log of each of your values:

select t.value, log.y from table t left outer join log_table log on t.value=log.x

Simple calculations

We can even use a linear interpolation between two values. Say we have only the function values stored for integers but we values between them and these values between them can be interpolated linearly.

select t.value, (t.value-floor(t.value))*f.y + (ceil(t.value)-t.value)*g.y from table t left outer join function_table f on floor(t.value)=f.x left outer join function_table g on ceil(t.value)=g.x

When you need to calculate for large sets of values and insert them into another table it might be better to calculate in SQL and insert in one step without all the conversion and wrapping stuff present in programming languages.

Conditions

Another often overlooked feature is to use a condition:

select case when MOD(t.value, 2) = 0 then 'divisible by 2' else 'not divisible by 2' end from table t

These handful operations are my basic toolbox when working with SQL, almost all queries I need can be formulated with them.

Dates and timestamps

One last reminder: when you work with time always specify the wanted time zone in your query.

It’s only Cores and Caches but I like it

The programming game changed dramatically in the past ten years. We are playing CPU cores and caches now, but without proper visibility.

759px-amd_am5x86_dieMost of our software development economy is based on a simple promise: The computing power (or “performance”) of a common computer will double every two years. This promise accompanied us for 40 years now, a time during which our computers got monitors, acquired harddisks and provided RAM beyond the 640 kB that was enough for nobody. In the more recent years, we don’t operate systems with one CPU, but four, eight or even twelve of them. So it came as a great irritation when ten years ago, Herb Sutter predicted that “The free lunch is over” and even Gordon Moore, the originator of Moore’s Law that forms the basis of our simple promise said that it will only hold true for ten to fiveteen more years. Or, in other words, until today.

Irritation

That’s a bit unsettling, to say the least, and should be motivation enough to have a good look at everything we are doing. Intel, the biggest manufacturer of CPUs for computers, has indicated earlier this year that Moore’s Law cannot be fulfilled any longer. So, the free lunch is really over. And it turns out to have some hidden costs. One cost is a certain complacency, the conviction that things will continue to be as they were and that coding styles chiseled over years and decades hold an inherent value of experience.

Complacency

Don’t get me wrong – there is great value in experience, but not all knowledge of the past is helpful for the future. Sometimes, fundamental things change. Just as the tables will eventually turn for every optimization trick, we need to reevaluate some axioms of our stance towards performance. Let me reiterate some common knowledge:

There are two types of performance inherently baked into your source code: Theoretical and practical performance.

Performance

The first type is theoretical performance, measured in O(n), O(n²) or even O(n!) and mostly influenced by the complexity class of the algorithm you are using. It will translate into runtime behaviour (like in the case of O(n!) your software is already dead, you just don’t know yet), but isn’t concerned with the details of your implementation. Not using an unnecessary high complexity class for a given problem will continue to be a valueable skill that every developer should master.

On the other hand, practical performance is measured in milliseconds (or nanoseconds if you are into micro-benchmarks and can pull off to measure them correctly) and can heavily depend on just a few lines in your source code. Practical performance is the observable runtime behaviour of your software on a given hardware. There are two subtypes of practical performance:

  • Throughput (How many operations are computed by the system in a given unit of time?)
  • Latency (How long does it take one operation to be computed by the system?)

If you run a service, throughput is your main metric for performance. If you use a service, latency is your main concern. Let me explain this by the metapher of a breakfast egg. If you want to eat your breakfast at a hotel buffet and the eggs are empty, your main concern is how fast you will get your freshly boiled egg (latency). But if you run the hotel kitchen, you probably want to cook a lot of eggs at once (throughput), even if that means that one particular egg might boil slightly longer as if you’d boiled each of them individually.

Latency

Those two subtypes are not entirely independent from each other. But the main concern for most performance based work done by developers is latency. It is relatively easy to measure and to reason about. If you work with latency-based performance issues, you should know about the latency numbers every programmer should know, either in visual form or translated to a more human time scale. Lets iterate some of the numbers and their scaled counterpart here:

  • 1 CPU cycle (0.3 ns): 1 second
  • Level 1 cache access (0.9 ns): 3 seconds
  • Branch mispredict (2.5 ns): 8 seconds
  • Level 2 cache access (2.8 ns): 9 seconds
  • Level 3 cache access (12.9 ns): 43 seconds
  • Main memory access (120 ns): 6 minutes
  • Solid-state disk I/O (50-150 μs): 2-6 days
  • Rotational disk I/O (1-10 ms): 1-12 months

We can discuss any number in detail, but the overall message stands out nonetheless: CPUs are lightning fast and caches are the only system components that can somewhat keep up. As soon as your program hits the RAM, your peak performance is lost. This brings us to the main concept of latency optimization:

Your program’s latency is ultimately decided by your ability to decrease cache misses.

You can save CPU cycles by performing clever hacks, but if you are able to always read your data (and code) from the cache, you’ll be 360 times faster than if your program constantly has to read from RAM. Your source code doesn’t have to change at all for this to happen. A good compiler and/or optimizing runtime can work wonders if you adhere to your programming language’s memory model. In reality, you probably have to rearrange your instructions and align your data structures. That’s the performance optimization of today, not the old cycle stinting. The big challenge is that none of these aspects are visible on the source code level of your program. We have to develop our programs kind of blindfolded currently.

Concurrency

One way how we’ve held up Moore’s Law in the last ten years was the introduction of multiprocessor computing into normal computers. If you cram two CPUs onto the die, the number of transistors on it has doubled. A single-threaded program doesn’t run any faster, though. So we need to look at concurrent programming to unlock the full power of our systems. Basically, there are two types of concurrent programming, deliberate and mechanical.

  • Deliberate concurrent programming means that you as the developer actively introduce threads, fibers or similar concepts into your source code to control parallel computation.
  • Mechanical concurrent programming means that your source code can be parallelized by the compiler and/or runtime and/or the hardware (e.g. hyper-threading) without changing the correctness of your program.

In both types of concurrent programming, you need to be aware about the constraints and limitations of correct concurrency. It doesn’t matter if your program is blazingly fast and utilizes all cores if the result is wrong or only occasionally correct. Once again, the memory model of your programming language is a useful set of rules and abstractions to guide you. Most higher-level concurrency models like actors narrow your possibilities even further, with functional programming being one of the strictest (and most powerful ones).

In the field of software development, we are theoretically well-prepared to take on the task of pervasive concurrent programming. But we need to forget about the good old times of single-core confirmability and embrace the chaotic world of raw computing power, the world of cores and caches.

Cores ‘n’ Caches

This is our live now: We rely on Cores ‘n’ Caches to feed us performance, but the lunch isn’t free anymore. We have to adapt and adjust, to rethink our core axioms and let go of those parts of our experience that are now hindrances. Just like Rock ‘n’ Roll changed the rules in the music business, our new motto changes ours.

Let’s rock.

Why I’m not using C++ unnamed namespaces anymore

Well okay, actually I’m still using them, but I thought the absolute would make for a better headline. But I do not use them nearly as much as I used to. Almost exactly a year ago, I even described them as an integral part of my unit design. Nowadays, most units I write do not have an unnamed namespace at all.

What’s so great about unnamed namespaces?

Back when I still used them, my code would usually evolve gradually through a few different “stages of visibility”. The first of these stages was the unnamed-namespace. Later stages would either be a free-function or a private/public member-function.

Lets say I identify a bit of code that I could reuse. I refactor it into a separate function. Since that bit of code is only used in that compile unit, it makes sense to put this function into an unnamed namespace that is only visible in the implementation of that unit.

Okay great, now we have reusability within this one compile unit, and we didn’t even have to recompile any of the units clients. Also, we can just “Hack away” on this code. It’s very local and exists solely to provide for our implementation needs. We can cobble it together without worrying that anyone else might ever have to use it.

This all feels pretty great at first. You are writing smaller functions and classes after all.

Whole class hierarchies are defined this way. Invisible to all but yourself. Protected and sheltered from the ugly world of external clients.

What’s so bad about unnamed namespaces?

However, there are two sides to this coin. Over time, one of two things usually happens:

1. The code is never needed again outside of the unit. Forgotten by all but the compiler, it exists happily in its seclusion.
2. The code is needed elsewhere.

Guess which one happens more often. The code is needed elsewhere. After all, that is usually the reason we refactored it into a function in the first place. Its reusability. When this is the case, one of these scenarios usually happes:

1. People forgot about it, and solve the problem again.
2. People never learned about it, and solve the problem again.
3. People know about it, and copy-and-paste the code to solve their problem.
4. People know about it and make the function more widely available to call it directly.

Except for the last, that’s a pretty grim outlook. The first two cases are usually the result of the bad discoverability. If you haven’t worked with that code extensively, it is pretty certain that you do not even know that is exists.

The third is often a consequence of the fact that this function was not initially written for reuse. This can mean that it cannot be called from the outside because it cannot be accessed. But often, there’s some small dependency to the exact place where it’s defined. People came to this function because they want to solve another problem, not to figure out how to make this function visible to them. Call it lazyness or pragmatism, but they now have a case for just copying it. It happens and shouldn’t be incentivised.

A Bug? In my code?

Now imagine you don’t care much about such noble long term code quality concerns as code duplication. After all, deduplication just increases coupling, right?

But you do care about satisfied customers, possibly because your job depends on it. One of your customers provides you with a crash dump and the stacktrace clearly points to your hidden and protected function. Since you’re a good developer, you decide to reproduce the crash in a unit test.

Only that does not work. The function is not accessible to your test. You first need to refactor the code to actually make it testable. That’s a terrible situation to be in.

What to do instead.

There’s really only two choices. Either make it a public function of your unit immediatly, or move it to another unit.

For functional units, its usually not a problem to just make them public. At least as long as the function does not access any global data.

For class units, there is a decision to make, but it is simple. Will using preserve all class invariants? If so, you can move it or make it a public function. But if not, you absolutely should move it to another unit. Often, this actually helps with deciding for what to create a new class!

Note that private and protected functions suffer many of the same drawbacks as functions in unnamed-namespaces. Sometimes, either of these options is a valid shortcut. But if you can, please, avoid them.

Arbitrary 2D curves with Highcharts

Highcharts is a versatile JavaScript charting library for the web. The library supports all kinds of charts: scatter plots, line charts, area chart, bar charts, pie charts and more.

A data series of type line or spline connects the points in the order of the x-axis when rendered. It is possible to invert the axes by setting the property inverted: true in the chart options.

var chart = $('#chart').highcharts({
  chart: {
    inverted: true
  },
  series: [{
    name: 'Example',
    type: 'line',
    data: [
      {x: 10, y: 50},
      {x: 20, y: 56.5},
      {x: 30, y: 46.5},
      // ...
    ]
  }]
});

line-chart-inverted

Connecting points in an arbitrary order

Connecting the points in an arbitrary order, however, is not supported by default. I couldn’t find a Highcharts plugin which supports this either, so I implemented a solution by modifying the getGraphPath function of the series object:

var series = $('#chart').highcharts().series[0];
var oldGetGraphPath = series.getGraphPath;
Object.getPrototypeOf(series).getGraphPath = function(points, nullsAsZeroes, connectCliffs) {
  points = points || this.points;
  points.sort(function(a, b) {
    return a.sortIndex - b.sortIndex;
  });
  return oldGetGraphPath.call(this, points, nullsAsZeroes, connectCliffs);
};

The new function sorts the points by a new property called sortIndex before the line path of the chart gets rendered. This new property must be assigned to each point object of the series data:

series.setData([
  {x: 10, y: 50, sortIndex: 1},
  {x: 20, y: 56.5, sortIndex: 2},
  {x: 30, y: 46.5, sortIndex: 3},
  // ...
], true);

Now we can render charts with points connected in an arbitrary order like this:

A line chart with points connected in arbitrary order
A line chart with points connected in arbitrary order

Modern developer #3: Framework independent JavaScript architecture

Usually small JavaScript projects start with simple wiring of callbacks onto DOM elements. This works fine when it the project is in its initial state. But in a short time it gets out of hand. Now we have spaghetti wiring and callback hell. Often at this point we try to get help by looking at adopting a framework, hoping to that its coded best practices draw us out of the mud. But now our project is tied to the new framework.
In search of another, framework independent way I stumbled upon scalable architecture by Nicholas Zakas.
It starts by defining modules as independent units. This means:

  • separate JavaScript and DOM elements from the rest of the application
  • Modules must not reference other modules
  • Modules may not register callbacks or even reference DOM elements outside their DOM tree
  • To communicate with the outside world, modules can only call the sandbox

The sandbox is a central hub. We use a pub/sub system:

sandbox.publish({type: 'event_type', data: {}});

sandbox.subscribe('event_type', this.callback.bind(this));

Besides being an event bus, the sandbox is responsible for access control and provides the modules with a consistent interface.
Modules are started and stopped (in case of misbehaving) in the application core. You could also use the core as an anti corruption layer for third party libraries.
This architecture gives a frame for implementation. But implementing it raises other questions:

  • how do the modules update their state?
  • where do we call the backend?

Handling state

A global model would reside or be referenced by the application core. In addition every module has its own model. Updates are always done in application event handlers, not directly in the DOM event handlers.
Let me illustrate. Say we have a module with keeps track of selected entries:

function Module(sandbox) {
  this.sandbox = sandbox;
  this.selectedEntries = [];
}

Traditionally our DOM event handler would update our model:

button.on('click', function(e) {
  this.selectedEntries.push(entry);
});

A better way would be to publish an application event, subscribe the module to this event and handle it in the application event handler:

this.sandbox.subscribe('entry_selected', this.entrySelected.bind(this));

Module.prototype.entrySelected = function(event) {
  this.selectedEntries.push(event.entry);
};

button.on('click', function(e) {
  this.sandbox.publish({type: 'entry_selected', entry: entry});
});

Other modules can now listen on selecting entries. The module itself does not need to know who selected the entry. All the internal communication of selection is visible. This makes it possible to use event sourcing.

Calling the backend

No module should directly call the backend. For this a special module called extension is used. Extensions encapsulate cross cutting concerns and shield communication with other systems.

Summary

This architecture keeps UI parts together with their corresponding code, flattens callbacks and centralizes the communication with the help of application events and encapsulates outside communication. On top of that it is simple and small.

Children behind the wheel

What happens when you put children behind the steering wheel? This blog entry looks at the Dunning-Kruger effect in software development.

A few weeks ago, I read a funny news article about a 11 years old boy who stole a bus and drove the normal route with it. When the police stopped him, he had already picked up some passengers and somehow managed to only inflict minor damages along the way. The whole story (in german) can be read here.

Author: Vitaly Volkov, Волков Виталий Сергеевич (user kneiphof) https://commons.wikimedia.org/wiki/User:KneiphofThis blog entry is not about the unheeding passengers, it’s about the little boy and his mindset. This mindset exists in the business world, too. It’s the mixture of “what could possibly go wrong?” with “I’m totally able to pull this off” and a large dose of “everybody else is surely faking it, too”. In the context of the Dunning-Kruger effect, this mixture is called “unskilled and unaware of it”. It’s a dangerous situation for both the employee and the employer, because neither of them can properly evaluate the actual risk.

The Dunning-Kruger effect

Let’s start with the known theory. The Dunning-Kruger effect describes a cognitive bias in which unskilled individuals assess their ability (in the context of a given skill) much higher than it really is and highly skilled individuals tend to underestimate their (relative) competence. The problem is not that real experts tend to be modest about their expertise. The real problem is that “if you’re incompetent, you can’t know you’re incompetent. The skills you need to produce a right answer are exactly the skills you need to recognize what a right answer is.”

So in short: Being unskilled in a certain area probably means you don’t really know that you are unskilled.

Or, translated to our young bus driver: If you don’t know anything about driving a bus, you certainly think you are as much a decent bus driver as the man behind the wheel. It looks easy enough from the passenger seat.

The effect in practice

Why should this bother us in software development? Our education system ensures that we are exposed to enough development practice so that we can counter the “unaware of it” part of the Dunning-Kruger effect. But it isn’t effective enough, at least that’s what I see from time to time.

Every once in a while, I have the opportunity to evaluate an existing code base. Most of the time, it produces a working, profitable application, so it cannot be said to be a failure. Sometimes however, the code base itself is so convoluted, bloated and riddled with poor implementation choices that it absolutely cannot be developed any further without a high risk of regression bugs and/or absurd amounts of developer time for little changes.

These hopeless source codes have one thing in common: they are developed by one person and one person alone. This person has developed for months or even years, showing progress and reporting no problems and suddenly resigned, often shortly before a major milestone in the project like going live with the first version or announcing the next version. The code base now lies abandonded and needs to be adopted. And while no code base is perfect (or should even try to be), this one reeks of desperation and frustration. Often, the application itself is not very demanding, but the source code makes it appear to be.

A good example of this kind of project is my scrap metal tale (in three parts) from five years ago:

A more recent case is littered with inline comments that celebrate small victories of the developer:

  • 5 lines of convoluted, contradicatory statements
  • 3 lines commented out
  • 1 comment line stating “YES! This is finally working! Super!”
  • Still three obvious bugs, resource leaks or security flaws in these few lines alone

The most outrageous (and notorious) case might be the Brillant Paula Bean from the Daily WTF, but this code base is at least readily comprehensible.

The origins of the effect

I think a lot of the frustration, desperation, anxiety and outright fear that I can sense through the comments and code structure was really felt by the original author. It must have been incredibly hard to stay on course, work hard and come up with solutions in the face of imminent deadlines, ever-changing requirements and the lingering fear that you’re just not up to the task. Except that we’ve just learned that developers in the “unskilled and unaware of it” state won’t feel the fear. That’s the origin of all the bad code: The absolute conviction that “it’s not me, it’s the problem, the domain, the language, the compiler, the weather and everything else”. Programming is just hard. Nobody else could do this better. It’ll work in the end. Those “minor problems” (like never actually speaking to the hardware, hard-coded paths and addresses, etc.) will be fixed at the last moment. There’s nothing wrong with an occassional exception stack trace in the logfile and if it bothers you, I can always make the catch-block empty.

The most obvious problem is that these developers think that this is how everybody else develops software, too. That we all don’t bother with concurrency correctness, resource lifecycle management, data structures, graphical user interface design, fault tolerance or even just basic logging. That all programming is hard and frustrating. That finding out where to insert a sleep statement to quench that pesky exception is the pinnacle of developer ingenuity. That things like automated tests, code metrics, continuous integration or even version control are eccentric fads that will pass by and be forgotten soon, so no need to deal with it. That we all just fake it and dread the moment our software is used in the wild.

A possible remedy

The “unskilled and unaware of it” developer isn’t dumb or hopeless, he’s just unskilled. The real tragedy is that he doesn’t have a mentor that can alleviate the biggest problems in the code and show better approaches. The unskilled lonesome developer cannot train himself. A good explanation for this novice “lock-in” can be found in the Dreyfus Model of Skill Acquisition (I recommend you watch the excellent talk on this topic by Dave Thomas): Novices lack the skills for proper self-assessment and cannot learn from their mistakes (as stated by the Dunning-Kruger effect, too). They also cannot recognize them as “their” mistakes or even “mistakes”. They need outside feedback (and guidance) to advance themselves. A mentor’s role is to give exactly that.

Conclusion

If you find yourself in a position of being a “skilled and fairly aware of it” developer, please be aware that you’ve probably been mentored sometimes in the past. Pass it on! Be the mentor for an aspiring junior developer.

If you suspect that you might fall in the “unskilled” category of developers, don’t despair! Being aware of this is the first and most important step. Now you can act strategically to improve your skill. There is a whole book giving you invaluable advice: Apprenticeship Patterns: Guidance for the Aspiring Software Craftsman by Dave Hoover and Adewale Oshineye. Two prominent advices from the book are “Be the Worst (of your team)” and “Find Mentors“. And my most prominent advice? “Don’t stick it out alone“.

Programming is (or should be) fun after all.

Remote development with PyCharm

PyCharm is a fantastic tool for python development. One cool feature that I quite like is its support for remote development. We have quite a few projects that need to interact with special hardware, and that hardware is often not attached to the computer we’re developing on.
In order to test your programs, you still need to run it on that computer though, and doing this without tool support can be especially painful. You need to use a tool like scp or rsync to transmit your code to the target machine and then execute it using ssh. This all results in painfully long and error prone iterations.
Fortunately, PyCharm has tool support in its professional edition. After some setup, it allows you do develop just as you would on a local machine. Here’s a small guide on how to set it up with an ubuntu vagrant virtual machine, connecting over ssh. It work just as nicely on remote computers.

1. Create a new deployment configuration

In the Tools->Deployment->Configurations click the small + in the top left corner. Pick a name and choose the SFTP type.
add_server

In the “Connection” Tab of the newly created configuration, make sure to uncheck “Visible only for this project”. Then, setup your host and login information. The root path is usually a central location you have access to, like your home folder. You can use the “Autodetect” button to set this up.

connection
For my VM, the settings look like this.

On the “Mappings” Tab, set the deployment path for your project. This would be the specific folder of your project within the root you set on the previous page. Clicking the associated “…” button here helps, and even lets you create the target folder on the remote machine if it does not exist yet.

2. Activate the upload

Now check “Tools->Deployment->Automatic Upload”. This will do an upload when you change a file, so you still need to do the initial upload manually via “Tools->Deployment->Upload to “.

3. Create a project interpreter

Now the files are synced up, but the runtime environment is not on the remote machine. Go to the “Project Interpreter” page in File->Settings and click the little gear in the top-right corner. Select “Add Remote”.

remote_interpreter
It should have the Deployment configuration you just created already selected. Once you click ok, you’re good to go! You can run and debug your code just like on a local machine.

Have fun developing python applications remotely.

The migration path for Java applets

Java applets have been a deprecated technology for a while. It has become increasingly difficult to run Java applets in modern browsers. Browser vendors are phasing out support for browser plugins based on the Netscape Plugin API (NPAPI) such as the Java plugin, which is required to run applets in the browser. Microsoft Edge and Google Chrome already have stopped supporting NPAPI plugins, Mozilla Firefox will stop supporting it at the end of 2016. This effectively means that Java applets will be dead by the end of the year.

However, it does not necessarily mean that Java applet based projects and code bases, which can’t be easily rewritten to use other technologies such as HTML5, have to become useless. Oracle recommends the migration of Java applets to Java Web Start applications. This is a viable option if the application is not required to be run embedded within a web page.

To convert an applet to a standalone Web Start application you put the applet’s content into a frame and call the code of the former applet’s init() and start() from the main() method of the main class.

Java Web Start applications are launched via the Java Network Launching Protocol (JNLP). This involves the deployment of an XML file on the web server with a “.jnlp” file extension and content type “application/x-java-jnlp-file”. This file can be linked on a web page and looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<jnlp spec="1.0+" codebase="http://example.org/demo/" href="demoproject.jnlp">
 <information>
  <title>Webstart Demo Project</title>
  <vendor>Softwareschneiderei GmbH</vendor>
  <homepage>http://www.softwareschneiderei.de</homepage>
  <offline-allowed/>
 </information>
 <resources>
  <j2se version="1.7+" href="http://java.sun.com/products/autodl/j2se"/>
  <jar href="demo-project.jar" main="true"/>
  <jar href="additional.jar"/>
 </resources>
 <application-desc name="My Project" main-class="com.schneide.demo.WebStartMain">
 </application-desc>
 <security>
  <all-permissions/>
 </security>
 <update check="always"/>
</jnlp>

The JNLP file describes among other things the required dependencies such as the JAR files in the resources block, the main class and the security permissions.

The JAR files must be placed relative to the URL of the “codebase” attribute. If the application requires all permissions like in the example above, all JAR files listed in the resources block must be signed with a certificate.

deployJava.js

The JNLP file can be linked directly in a web page and the Web Start application will be launched if Java is installed on the client operating system. Additionally, Oracle provides a JavaScript API called deployJava.js, which can be used to detect whether the required version of Java is installed on the client system and redirect the user to Oracle’s Java download page if not. It can even create an all-in-one launch button with a custom icon:

<script src="https://www.java.com/js/deployJava.js"></script>
<script>
  deployJava.launchButtonPNG = 'http://example.org/demo/launch.png';
  deployJava.createWebStartLaunchButton('http://example.org/demo/demoproject.jnlp', '1.7.0');
</script>

Conclusion

The Java applet technology has now reached its “end of life”, but valuable applet-based projects can be saved with relatively little effort by converting them to Web Start applications.

Modern developer Issue #2: RPM like deployment on Windows

Deployment is a crucial step in every development project. Without shipping no one would ever see our work (and we get no feedback if our work is good).

drawer

Often we fear deploying to production because of the effort involved and the errors we make. Questions like ‘what if we forget a step?’ or ‘what if the new version we install is buggy?’ buzz in our mind.

fears

Deployment needs to be a non-event, a habit. For this we need to automate every step besides the first one: clicking a button to start deployment.

deploy

On Linux we have wonderful tools for this but what if you are stuck with deploying to Windows?

brave

Fear not, brave developer! Even on Windows we can use a package manager to install and rollback buggy versions. Let me introduce you to chocolatey.

choco

Chocolatey (or choco in short) uses the common NuGet package format. Formerly developed for the .net platform we can use it for other platforms, too. In our following example we use a simple Java application which we install as a service and as a task.
Setting up we need a directory structure for the package like this:

folders

We need to create two files: one which specifies our package (my_project.nuspec) and one script which holds the deployment steps (chocolateyinstall.ps1). The specification file holds things like the package name, the package version (which can be overwritten when building the package), some pointers to project, source and license URLs. We can configure files and directories which will be copied to the package: in our example we use a directory containing our archives (aptly named archives) and a directory containing the installation steps (named tools). Here is a simple example:

<?xml version="1.0" encoding="utf-8"?>
<!-- Do not remove this test for UTF-8: if “Ω” doesn’t appear as greek uppercase omega letter enclosed in quotation marks, you should use an editor that supports UTF-8, not this one. -->
<package xmlns="http://schemas.microsoft.com/packaging/2015/06/nuspec.xsd">
  <metadata>
    <id>my_project</id>
    <title>My Project (Install)</title>
    <version>0.1</version>
    <authors>Me</authors>
    <owners>Me</owners>
    <summary></summary>
    <description>Just an example</description>
    <projectUrl>http://localhost/my_project</projectUrl>
    <packageSourceUrl>http://localhost/git</packageSourceUrl>
    <tags>example</tags>
    <copyright>My company</copyright>
    <licenseUrl>http://localhost/license</licenseUrl>
    <requireLicenseAcceptance>false</requireLicenseAcceptance>
    <releaseNotes></releaseNotes>
  </metadata>
  <files>
    <file src="tools\**" target="tools" />
    <file src="archives\**" target="archives" />
  </files>
</package>

This file tells choco how to build the packages and what to include. For the deployment process we need a script file written in Powershell.

powershell

A Powershell primer

Powershell is not as bad as you might think. Let’s take a look at some basic Powershell syntax.

Variables

Variables are started with a $ sign. As in many other languages ‘=’ is used for assignments.

$ErrorActionPreference = 'Stop'

Strings

Strings can be used with single (‘) and double quotes (“).

$serviceName = 'My Project'
$installDir = "c:\examples"

In double quoted strings we can interpolate by using a $ directly or with curly braces.

$packageDir = "$installDir\my_project"
$packageDir = "${installDir}\my_project"

For escaping double quotes inside a double quoting string we need back ticks (`)

"schtasks /end /f /tn `"${serviceName}`" "

Multiline strings are enclosed by @”

$cmdcontent = @"
cd /d ${packageDir}
java -jar ${packageName}.jar >> output.log 2>&1
"@

Method calls

Calling methods looks a mixture of command line calls with uppercase names.

Write-Host "Stopping and deleting current version of ${packageName}"
Get-Date -format yyyyddMMhhmm
Copy-Item $installFile $packageDir

Some helpful methods are:

  • Write-Host or echo: for writing to the console
  • Get-Date: getting the current time
  • Split-Path: returning the specified part of a path
  • Join-Path: concatenating a path with a specified part
  • Start-Sleep: pause n seconds
  • Start-ChocolateyProcessAsAdmin: starting an elevated command
  • Get-Service: retrieving a Windows service
  • Remove-Item: deleting a file or directory
  • Test-Path: testing for existence of a path
  • New-Item: creating a file or directory
  • Copy-Item: copying a file or directory
  • Set-Content: creating a file with the specified contents
  • Out-Null: swallowing output
  • Resolve-Path: display the path after resolving wildcards

The pipe (|) can be used to redirect output.

Conditions

Conditions can be evaluated with if:

if ($(Get-Service "$serviceName" -ErrorAction SilentlyContinue).Status -eq "Running") {
}

-eq is used for testing equality. -ne for difference.

Deploying with Powershell

For installing our package we need to create the target directories and copy our archives:

$packageName = 'myproject'
$installDir = "c:\examples"
$packageDir = "$installDir\my_project"

Write-Host "Making sure $installDir is in place"
if (!(Test-Path -path $installDir)) {New-Item $installDir -Type Directory  | Out-Null}

Write-Host "Making sure $packageDir is in place"
if (!(Test-Path -path $packageDir)) {New-Item $packageDir -Type Directory  | Out-Null}

Write-Host "Installing ${packageName} to ${packageDir}"
Copy-Item $installFile $packageDir

When reinstalling we first need to delete existing versions:

$installDir = "c:\examples"
$packageDir = "$installDir\my_project"

if (Test-Path -path $packageDir) {
  Remove-Item -recurse $(Join-Path $packageDir "\*") -exclude *.conf, *-bak*, *-old*
}

Now we get to the meat creating a Windows service.

$installDir = "c:\examples"
$packageName = 'myproject'
$serviceName = 'My Project'
$packageDir = "$installDir\my_project"
$cmdFile = "$packageDir\$packageName.cmd"

if (!(Test-Path ($cmdFile)))
{
    $cmdcontent = @"
cd /d ${packageDir}
java -jar ${packageName}.jar >> output.log 2>&1
"@
    echo "Dropping a ${packageName}.cmd file"
    Set-Content $cmdFile $cmdcontent -Encoding ASCII -Force
}

if (!(Get-Service "${serviceName}" -ErrorAction SilentlyContinue))
{
  echo "No ${serviceName} Service detected"
  echo "Installing ${serviceName} Service"
  Start-ChocolateyProcessAsAdmin "install `"${serviceName}`" ${cmdFile}" nssm
}

Start-ChocolateyProcessAsAdmin "set `"${serviceName}`" Start SERVICE_DEMAND_START" nssm

First we need to create a command (.cmd) file which starts our java application. Installing a service calling this command file is done via a helper called nssm. We set it to starting manual because we want to start and stop it periodically with the help of a task.

For enabling a reinstall we first stop an existing service.

$installDir = "c:\examples"
$serviceName = 'My Project'
$packageDir = "$installDir\my_project"

if (Test-Path -path $packageDir) {
  Write-Host $(Get-Service "$serviceName" -ErrorAction SilentlyContinue).Status

  if ($(Get-Service "$serviceName" -ErrorAction SilentlyContinue).Status -eq "Running") {
    Start-ChocolateyProcessAsAdmin "Stop-Service `"${serviceName}`""
    Start-Sleep 2
  }
}

Next we install a task with help of the build in schtasks command.

$serviceName = 'My Project'
$installDir = "c:\examples"
$packageDir = "$installDir\my_project"
$cmdFile = "$packageDir\$packageName.cmd"

echo "Installing ${serviceName} Task"
Start-ChocolateyProcessAsAdmin "schtasks /create /f /ru system /sc hourly /st 00:30 /tn `"${serviceName}`" /tr  `"$cmdFile`""

Stopping and deleting the task enables us to reinstall.

$packageName = 'myproject'
$serviceName = 'My Project'
$installDir = "c:\examples"
$packageDir = "$installDir\my_project"

if (Test-Path -path $packageDir) {
  Write-Host "Stopping and deleting current version of ${packageName}"
  Start-ChocolateyProcessAsAdmin "schtasks /delete /f /tn `"${serviceName}`" "
  Start-Sleep 2
  Start-ChocolateyProcessAsAdmin "schtasks /end /f /tn `"${serviceName}`" "
  Remove-Item -recurse $(Join-Path $packageDir "\*") -exclude *.conf, *-bak*, *-old*
}

tl;dr

Putting it all together looks like this:

$ErrorActionPreference = 'Stop'; # stop on all errors

$packageName = 'myproject'
$serviceName = 'My Project'
$installDir = "c:\examples"
$packageDir = "$installDir\my_project"
$cmdFile = "$packageDir\$packageName.cmd"
$currentDatetime = Get-Date -format yyyyddMMhhmm
$scriptDir = "$(Split-Path -parent $MyInvocation.MyCommand.Definition)"
$installFile = (Join-Path $scriptDir -ChildPath "..\archives\$packageName.jar") | Resolve-Path


if (Test-Path -path $packageDir) {
  Write-Host "Stopping and deleting current version of ${packageName}"
  Start-ChocolateyProcessAsAdmin "schtasks /delete /f /tn `"${serviceName}`" "
  Start-Sleep 2
  Start-ChocolateyProcessAsAdmin "schtasks /end /f /tn `"${serviceName}`" "
  Remove-Item -recurse $(Join-Path $packageDir "\*") -exclude *.conf, *-bak*, *-old*

  Write-Host $(Get-Service "$serviceName" -ErrorAction SilentlyContinue).Status

  if ($(Get-Service "$serviceName" -ErrorAction SilentlyContinue).Status -eq "Running") {
    Write-Host "Stopping and deleting current version of ${packageName}"
    Start-ChocolateyProcessAsAdmin "Stop-Service `"${serviceName}`""
    Start-Sleep 2
  }

  if ($(Get-Service "$serviceName"  -ErrorAction SilentlyContinue).Status -ne "Running") {
    Write-Host "Cleaning ${packageDir} directory"
    Remove-Item -recurse $(Join-Path $packageDir "\*") -exclude *.conf, *-bak*, *-old*
  }
}
 
Write-Host "Making sure $installDir is in place"
if (!(Test-Path -path $installDir)) {New-Item $installDir -Type Directory  | Out-Null}

Write-Host "Making sure $packageDir is in place"
if (!(Test-Path -path $packageDir)) {New-Item $packageDir -Type Directory  | Out-Null}

Write-Host "Installing ${packageName} to ${packageDir}"
Copy-Item $installFile $packageDir

if (!(Test-Path ($cmdFile)))
{
    $cmdcontent = @"
cd /d ${packageDir}
java -jar ${packageName}.jar >> output.log 2>&1
"@
    echo "Dropping a ${packageName}.cmd file"
    Set-Content $cmdFile $cmdcontent -Encoding ASCII -Force
}

if (!(Get-Service "${serviceName}" -ErrorAction SilentlyContinue))
{
  echo "No ${serviceName} Service detected"
  echo "Installing ${serviceName} Service"
  Start-ChocolateyProcessAsAdmin "install `"${serviceName}`" ${cmdFile}" nssm
}

Start-ChocolateyProcessAsAdmin "set `"${serviceName}`" Start SERVICE_DEMAND_START" nssm

echo "Installing ${serviceName} Task"
Start-ChocolateyProcessAsAdmin "schtasks /create /f /ru system /sc hourly /st 00:30 /tn `"${serviceName}`" /tr  `"$cmdFile`""

Finally

Now we just need to create the package in our build script. The package will be named my_project.version.nupkg.
On our build machine we need to install choco. On the target machine we need the following tools installed:
chocolatey and nssm (for service management). Now we can create the package with:

  choco pack --version=${version}

Copy it to the target machine and install the current version with:

choco install -f -y c:\\installations\\${archive.name} --version=${version}

Put these steps inside a build script and use your favourite contininuous integration platform and voila.
Done.

deploy