Centralized project documentation

Project documentation is one thing developers do not like to think about but it is necessary for others to use the software. There are several approaches to project documentation where it is either stored in the source code repository and/or some kind of project web page, e.g. in a wiki. It is often hard for different groups of people to find the documentation they need and to maintain it. I want to show an approach to store and maintain the documentation in one place and integrate it in several other locations.

The project documentation (not API documentation, generated by tools like javadoc or Doxygen) should be version controlled and close to the source code. So a directory in the project source tree seems to be a good place. That way the developers or ducumenters can keep it up-to-date with the current source code version. For others it may be hard to access the docs hidden somewhere in the source tree. So we need to integrate them into other tools to become easily accessible by all the people who need them.

Documentation format

We start with markdown as the documentation format because it is easily read and written using a normal text editor. It can be converted to HTML, PDF and other common document formats. The markdown files reside in a directory next to the source tree, named documentation for example. With pegdown there is a nice java library allowing integration of markdown support in your projects.

Integration in your wiki

Often you want to have your project documentation available on a web page, usually a wiki. With confluence you can directly embed markdown files from an URL in your project page using a plugin. This allows you to enrich the general project documentation in the source tree with your organisation specific documentation. The documentation becomes more widely accessible and searchable. The link can be served by a source code browser like gitweb: http://myrepo/git/?p=MyProject.git;a=blob_plain;f=README.md;hb=HEAD and is alsways up-to-date.

Integration in jenkins

Jenkins has a plugin to use markdown as description format. Combined with the project description setter plugin you can use a file from your workspace to display the job description. Short usage instructions or other notes and links can be maintained in the source tree and show up on the jenkins job page.

Integration in Github or Gitlab

Project hosting platforms like Github or your own repository manager, e.g. gitlab also can display markdown-formatted content from your source tree as the project description yielding a basic project page more or less for free.

Conclusion

Using markdown as a basis for your project documentation is a very flexible approach. It stays usable without any tool support and can be integrated and used in various ways using a plethora of tools and converters. Especially if you plan to open source a project it should contain useful documentation in such a widely understood format distributed with your source code.

Should I test this?

Writing software is hard, writing correct software is even harder. So everything that helps you writing better or more correct software should be used to your advantage. But does every test help? And does every code to be automatically tested? How do I decide what to test and how?

Writing software is hard, writing correct software is even harder. So everything that helps you writing better or more correct software should be used to your advantage. But does every test help? And does every code to be automatically tested? How do I decide what to test and how?
Given a typical web CRUD application, take a look at the following piece of functionality:
We have a model class Element which has a Type type:

class Element {
  ...
  Type type
  ...
}

The view contains a select tag which lets you choose a type:

...
<g:select name="filterByTypeId" from="${types}" value="${filterByType?.id}">
...

And finally in the controller we filter the list of shown elements via the selected type:

...
Type filterByType = Type.get(params['filterByTypeId'])
return [elements: filterByType ? Element.findAllByType(filterByType) : Element.list(), types: Type.list(), filterByType: filterByType]
...

Now ask yourself: would you write an automatic test for this? A functional / acceptance or some unit / integration tests? Would you really test this automatically or just by hand? And how do you decide this?

Dogma

According to TDD you should test everything, there does not exist any code without a test (first). If you really live by TDD the choice is already made: you test this code. But is this pragmatic? Effecient? Productive? And what about the aspects you forgot to test? The order of the types for example. The user wanted to list them lexicographically or by a priority or numbered. What if this part changes and your test is so coupled that you need to change it, too. There are some TDD enthusiasts out there but if you are more pragmatic there are other criteria to help you decide.

Cost

If you look at the code in question and think: how much effort is it to create the test(s)? Or to run the test? If the feedback cycle is too long you lose track of it. I need a test for the controller, this is the easy part. Then I need to test that the view passes the correct parameter and accepts and shows the correct list.
I also can write an acceptance test but this seems like a big gun for a small bird. In our case it heavily depends on the framework how easy or difficult and costly it is to write tests for our filter. What do you have to mock or to simulate? You also have to take the hidden costs into account: how much does it cost to maintain this test? When the requirement changes? When there are more filter criteria? Or if an element can have more than one type?

Value

Another question you can ask is: what is the value for the customer? How much does he need it to work? What is the cost of an error? What happens when the code in question does not work? The value for the customer is not only determined by the functionality it provides. Software can be seen as giving your users capabilities, to enable them. The capability is implemented by two things: implementation (your functionality) and affordance (the UI). The value is determined by both parts. So you hardly can decide on the value of a functionality alone. What if you need to change the UI (in our case the select tag) to increase the value? How does this effect your tests? Does the user reach his goal if the functionality part is broken? What is when the code is correct but it is slow? Or the UI isn’t visible on your user’s screen?

Personal / Team profile

You could decide what and if to test by looking at your past: your personal or team mistakes. Typical problems and bugs you made. Habits you have. You could test more when the (business or technical) domain or the underlying technology is new for you. You could write only few tests when you know the area you work in but more when it is unknown and you need to explore it. You can write more tests if you work in a dynamic language and few in a static language. Or vice versa.

Area / Type of code

You can write tests for every bug you find to prevent regression. You could write tests only for algorithms or data structures. For certain core parts or for interaction with other systems. Or only for (public) interfaces. The area or type of code can help you decide if to test or not.

Visibility

Also you could take a look at how easy it is to spot a bug when manually invoke the code. Do you or your user see the bug immediately? Is it hidden? In our case you should easily see when the list is not filtered or filtered by the wrong criteria. But what if it is just a rounding error or an error where cause and effect is separated by time or location?

Conclusion

Do you have or use additional criteria? How do you decide? I have to admit that I didn’t and I wouldn’t test the above code because I can easily spot problems in the code and try it out by hand if it works (visibility). If the code grows more complex and I cannot easily see the problem (again visibility) or the value (or cost of an error) for the customer is high I would write one.

How to avoid premature optimization

Three simple rules to develop by if you really want to avoid falling into the trap of premature performance optimization.

eternityA common quote linked with Donald E. Knuth of TeX fame is “premature optimization is the root of all evil”. While this might sound a bit harsh, it holds a lot of truth.

Performance as an asset

If you consider software performance as an asset, you can determine its characteristics and derive your decisions about whether to work on it from them. For example, you will discover that while a good performance is paramount, there is a certain threshold when further optimizations are worthless from the asset’s point of view. If you happen to develop a game, you only need to draw as much frames as the monitor can handle. If you process sensor data in real time, there is no need for a prolonged pause between data packets, because computers don’t grow tired.
If you treat performance as an asset, you can also apply a worth to every optimization you want to make and contrast it to the cost of the work you expect to have to invest. This divides the possible optimizations into a group of lucrative and a (probably larger) group of unprofitable investments.

Simple rules

Treating performance as an asset gives you the mental tools to make profound decisions about when and what to optimize. But there are also three simple rules you can apply if you don’t want to write a business plan every time you think “if I just change this line, the code will run much smoother”.

First rule: Don’t

The first rule of performance optimization with a tendency to avoid premature optimization is to just don’t care. You ask yourself if a LinkedList is faster than an ArrayList for a given use case? The short (and ignorant) answer is: both will be fast enough. Is it better to explicitly set all references to null after usage? Why bother when the garbage collector won’t slow you down anyway. Following this rule, you deliberately act dumber than you are with the goal to delay action.

There is a disclaimer, though. There are two different kinds of performance optimization: The first one was referenced in the examples above and deals with actual, but rather local code changes. The second and more important type of performance consideration deals with complexity theory (the one with big O notation) and isn’t measured in milliseconds, but in scalability. You don’t want to be ignorant of the latter type because it will always ruin your runtime behaviour regardless of any optimization of the former type if you implement an exponential or even factorial algorithm. You can be ignorant of “real performance tuning”, but should always be aware of the complexity category your algorithm is living in.

Second rule: Not Yet

There will be a moment when you clearly see an opportunity to improve the runtime performance of your code with just this very small (and very clever) modification. This is when you are ready to break the first rule. Now you should adhere to the second rule: If the cost is as marginal as you say and the gain is profound, go for it – but not now. Performance tuning isn’t a time limited sale that you are only offered right now or never. You can make the same change and reap the same advantages next week or next month. You doubt that you will remember the details? Write an issue or insert some code comment about it. You probably have another task on your todo list that is more important than speeding up the functionality at hand.

The goal of the first rule was to delay action, and that’s the goal of the second rule, too. You’ve probably guessed it already: you avoid premature optimization best by not optimizing at all or at least not optimizing too early. You need to be sure about the value of an optimization before you implement it. As a result of the second rule, your code will be enriched with possibilities for performance improvement. And if you actually need to improve your performance, you can orient yourself along these possibilities or find them then. You want to invest in the tuning business as late as possible, for it is highly speculative.

Third rule: Measure

If you cannot hold on to the first two rules, for example when a real performance issue is reported, you need to take action. But as you are going to invest work into performance optimization, you can as well invest it efficiently. In most applications, there is a 90/10 rule in effect, stating that 90 percent of the runtime is spent in just 10 percent of the code. If you don’t know exactly where your performance bottleneck is, find it using a profiler and remember the 90/10 rule. It’s not efficient nor effective to improve the 90 percent of your code that doesn’t matter in regard to performance.

If you have identified the piece of code that most likely slows your application down, you should remember the second part of the third rule: Never make performance optimizations without a meaningful benchmark that you can run beforehand and afterwards. All to often, the clever performance trick you remember from long ago is actually hurting your performance now. A meaningful benchmark will tell you if you did good. To make a benchmark “meaningful”, you really need to read up on benchmarking in your target platform. In Java, for example, you need to know about proper warm-up of the VM and perform enough cycles to not include one-time effects in your numbers. If you’ve written such a benchmark, keep it! Try to fully automate it and let it be the cornerstone of your growing performance test suite. There might come the day when this test/benchmark tells you that your formerly clever optimization is now obsolete due to internal platform changes.

Conclusion

If you follow these three simple rules, you won’t automatically write high performance software. But you will spend your valuable time fixing real performance issues instead of tinkering with your code to no effect. You definitely won’t optimize prematurely and steer clear of this “root of all evil”.

Using Rails with a legacy database schema

Rails is known for its convention over configuration design paradigm. For example, database table and column names are automatically derived and generated from the model classes. This is very convenient, but when you want to build a new application upon an existing (“legacy”) database schema you have to explore the configuration side of this paradigm.

The most basic operation for dealing with a legacy schema in Rails is to explicitly set the table_names and the primary_keys of model classes:

class User < ActiveRecord::Base
  self.table_name = 'benutzer'
  self.primary_key = 'benutzer_nr'
  # ...
end

Additionally you might want to define aliases for your column names, which are mapped by ActiveRecord to attributes:

class Article < ActiveRecord::Base
  # ...
  alias_attribute :author_id, :autor_nr
  alias_attribute :title, :titel
  alias_attribute :date, :datum
end

This automatically generates getter, setter and query methods for the new alias names. For associations like belongs_to, has_one, has_many you can explicitly specify the foreign_key:

class Article < ActiveRecord::Base
  # ...
  belongs_to :author, class_name: 'User', foreign_key: :autor_nr
end

Here you have to repeat the original name. You can’t use the previously defined alias attribute name. Another place where you have to live with the legacy names are SQL queries:

q = "%#{query.downcase}%"
Article.where('lower(titel) LIKE ?', q).order('datum')

While the usual attribute based finders such as find_by_* are aware of ActiveRecord aliases, Arel queries aren’t:

articles = Article.arel_table
Article.where(articles[:titel].matches("%#{query}%"))

And lastly, the YML files with test fixture data must be named after the database table name, not after the model name. So for the example above the fixture file name would be benutzer.yml, not user.yml.

Conclusion

If you step outside the well-trodden path of convention be prepared for some inconveniences.

Next part: Primary key schema definition and value generation

CMake as a project model

One thing I actually like about Maven is its attempt to create a project model, conventions and a basic project structure. It allows easy integration IDEs, build servers and so on. For projects in C/C++ I find CMake an attractive way to specify the project model.

Despite its name CMake is not really a build system but more of a cross-platform project description with an integrated build system generator. Cross-platform means not only operating system (OS) here but development environment in general. CMake thus can generate project files for MS Visual Studio 20xx, Eclipse CDT, Code::Blocks, KDevelop and the like. QtCreator can even import a CMake-based project natively which brings you up to speed in almost no time.

CMake allows you organise your project in modules and manage your external dependencies. It does not impose as strict rules as Maven does and lacks an own artifact repository infrastructure but you can use the CMake package definitions or pkg-config information many projects provide.

Packaging and deployment with CPack allows you to deliver releases of your project the way your user would expect it: Depending on the target platform you can package your software as a Windows installer executable (using NSIS), several options for Mac OS X (like Package Maker or Bundle) and the popular DEB and RPM package formats for Linux Distributions.

Conclusion

CMake is more than just another build system. It is a flexible tool to define your project and integrate it with your favorite development tools. It can support the whole project lifecyle from initial creation and normal development to deployment and delivery of your software to the user.