Are programming books overrated?

A little insight gathered through feedback from an internship. Software development books are somewhat overrated as they can’t teach practice well.

In the last few weeks, we had an internship of a student that just finished academic high school (“Gymnasium”) and is looking forward to take up studies in computer science. He wanted to get in touch with the practical aspects of the career he is about to choose. The programming courses in school merely covered the basics of a programming language (Java) and some UML.

We prepared the student for the internship by feeding him several books we thought were appropriate for his level of knowledge. The books were a beginner’s book about Java (Head First Java), an introduction to unit testing (Pragmatic Unit Testing) and a foundation on clean code programming (Refactoring). Our student read them thoroughly and could make references to the chapters during pair programming sessions.

Retrospective on the books

But one feedback we got from him was that the books alone were nearly useless for his case. If there wouldn’t have been tutorial style pair programming coding sessions and several short lectures , he couldn’t grasp the deeper meaning of the book chapters he read (he suffered from the “blank slate blockade” several times). This came a bit as a surprise for us, as the student was very clever and really into it. It wasn’t the student, it was the books.

But you can’t blame it on “Refactoring”, for example, as this book is an all-time classic filled with really important knowledge. It has to be the medium itself, books are not the ideal source to learn about programming and software development.

Books are part of the academics

There is an old question in our profession. It revolves around if we are more like engineers or artists, craftsmen or scientists. In the core of this question is a uncertainty about the right model of education. Artists and craftsmen prefer more practical training, with apprentice/master relations and personal knowledge transfer. For engineers and scientists, literature and more standardized lectures are better suited. Academic knowledge is transferred during debate, not during exercises.

The duality of our profession

Projecting the feedback of our student onto this question, there seems to be a duality in our profession: Both (or all four if you want) approaches are needed to form a whole. You can’t learn the theory and expect to excel on the job. But pratical experience alone will not suffice to keep up with the pace of our profession. Good books are like afterburners here, you’ll be hurled forward by every page.

Conclusion

If it’s really true that we need to learn our profession both ways at once, pair programming (in the tour guide or backseat driver style) is an essential part of our qualification. And our current university curriculum fails to deliver this part. Students nowadays can team up to program together on an assignment, but that’s not learning from a master (unless one in the team has distinctly more experience than everybody else and is able to transfer it). So I vote to bring more craftsmanship to the academic education, as the books alone won’t cut it.

Your opinion?

What’s your opinion on this topic? Drop us a line about your thoughts.

The C++ Shoot-yourself-in-the-Foot of the Week

I think we can all agree that C++, compared to other languages, provides quite a number of possibilities to screw up. Everybody working with the language at some point probably had problems with e.g. its exception system, operator overloading or automatic type conversions – to name just a few of the darker corners.

There are also many mitigation strategies which all come down to ban certain aspects of the language and/or define strict code conventions. If you follow Google’s style guide, for example, you cannot use exceptions and you are restricted to a very small set of boost libs.

But developers – being humans – often find creative and surprising ways to thwart every good intentions. In an external project the following conventions are in place:

  • Use const wherever you possibly can
  • Use boost::shared_ptr wherever it makes sense.
  • Define typedefs to shared_ptrs  in order to make code more readable.
  • typedefs to shared_ptrs are to be defined like this:
typedef boost::shared_ptr<MySuperDuperClass> MySuperDuperClassPtr;
  • typedefs to shared const pointers are to be defined like this:
typedef boost::shared_ptr<const MySuperDuperClass> MySuperDuperClassCPtr;

As you can see, postfixes Ptr and CPtr are the markers for normal shared_ptrs and constant shared_ptrs.

Last week, a compile error about some const to non-const conversion made me nearly pull my hair out. The types of variables that were involved all had the CPtr postfix but still the code didn’t compile. After a little digging I found that one of the typedefs involved was like this:

typedef boost::shared_ptr<  MySuperDuperClass> MySuperDuperClassCPtr;

Somebody just deleted the const modifier in front of MySuperDuperClass but left the name with the CPtr untouched. And because non-const to const conversions are allowed this was not detected for a couple of weeks. Nice going!

Any suggestions for a decent style checker for c++? Thanks in advance 😉

Readability of Guard Clauses in Methods

A little story about two opinions on readability of methods containing if-clauses.

Browsing through the code base of one of our customers I frequently stumbled over methods that were roughtly structured like this:

void theMethod
{
  if (some_expression)
  {
    // rest of the method body
    // ...
  }
  // no more code here!
}

And most of the time I was tempted to refactor the method using a guard clause, like so:

void theMethod
{
  if (!some_expression)
  {
    return;
  }
  // rest of the method body
  // ...
}

because this is far more readable for me. When I noticed that the methods were written all by the same guy I told him about by refactoring ideas in absolute certainty that he would agree with me. It came as quite a surprise when, in fact, he didn’t agree with me, at all. Even something like this:

void theMethod
{
  if (some_expression)
  {
    // some code
    // ...
    if (another_expression)
    {
      // some more code
      // ...
    }
    // no more code here ..
  }
  // ... and here
}

was in his eyes far more readable than the refactored version with guard clauses. His rational was that guard clauses make it harder for to see the program flow through the method. And a nested if(…) structure like above was very suitable to express slightly more complicated flows.

All my talks about crappy methods and the downsides of highly indented code were not able to change his mind.

I admit that I can somewhat understand his point about the visibility of the program flow through the method.  And sure, the (nested) ifs increase indentation and the number of possible code paths but since there are no elses and no code after the if-blocks, does that really increase the overall complexity?

Well, I still would prefer smaller methods with guard clauses but as you can see, to a great extend readability lies in the eyes of the beholder.

What do you find readable?

Start with the core

What’s the most important feature you cannot live without? Start with it, you can stop anytime because after that you have a minimal yet usuable system.

If you begin your new product, start from the core functionality. Not the core of the system or architecture.
Ask yourself: What’s the most important feature you cannot live without? Just name one, only one. Build this first and only. Eventually refine or redo if it doesn’t suit your needs. Then continue with the next. You always have a minimal but useable product and can stop anytime.
We once had to develop a web based system where users can file an application which can be reviewed, changed, rated and finally accepted or declined. We started with a web page where you could download a PDF and an email address to send it to when complete. Minimal? Yes. Useless? No. All the functionality which was needed was there. All other stuff could be done via email or phone. It was readily available and useable.
Start with the core.

Aligning the Abstraction Level with constant booleans

Constant booleans can help to maintain a single level of abstraction in one method. They are less expensive than a separate method and a big improvement over a mere comment.

If you ever have done consulting, mentoring or teaching on programming techniques I’m sure you have experienced joy as well as disappointment when your “students” either took on your advice and followed it in their day-to-day work or when they just did what you said as long as you sat next to them but forgot all about it the next day. The disappointing behavior often comes from them not fully appreciating, or not being able to fully recognize the advantages of your solution. (And as you are the mentor/consultant/teacher, the latter might also be your fault).

One example for that is the principle to operate on only one level of abstraction within a method or function. See here for a detailed explanation. I have been applying this technique more or less unconsciously already for a long time now and was reminded of it as the Single-Level-of-Abstraction-Principle in Robert C. Martin’s Clean Code.

I have been trying to put this principle in peoples minds for some time now but often with little success. Sure, they often do see the advantages of arriving at much more readable code but they often ignore it in their own code. Most of the time they just don’t see the necessity to create another method with a meaningful name or they content themselves just with putting a comment above some chunk of lower abstraction code (The resulting loud screams for a Extract-Method refactoring often remain unheard, too)

Lately, I did have fairly good success with one little sub-technique of this principle: constant booleans for if-statements. That is, instead of (C++ code):

void someMethod()
{
   if (hard_to_read_boolean_expression_using_lower_abstractions)
   {
      // do stuff
   }
}

you write:

void someMethod()
{
   const bool expressive_name = 
      hard_to_read_boolean_expression_using_lower_abstractions;
   if (expressive_name)
   {
      // do stuff
   }
}

I guess the main reason for the success of this sub-technique is that it increases readability a lot at a cost that is only a tiny bit greater than a simple comment.

True, in many cases it may be even more readable to put the whole if-statement in another method, but using a boolean constant like above is already a big improvement.

The fallacy of “the right tool”

There is a fallacy around Polyglot Programming, especially the term “the right tool for the job”: Programming languages aren’t tools.

Let me start this blog post with a disclaimer: I’m really convinced of the value of multilingual programming and also think that applying the “right tool for the job” is a good thing. But there is a fallacy around this concept in programming that i want to point out here. The fallacy doesn’t invalidate the concept, keep that in mind.

Polyglot cineasts

Let me start with an odd thought: What if there was a movie, a complicated international thriller around a political intrigue, playing in over half a dozen countries. The actors of each country speak their native tongue and no subtitles are provided. Who would be able to follow the plot? Only a chosen few of really polyglot cineasts would ever appreciate the movie. Most of us wouldn’t want to see it.

Polyglot programming

Our last web application project was comprised of that half a dozen languages (Groovy, Java, HTML, CSS, HQL/SQL, Ant). We could easily include more programming languages if we feel the need to do it. Adding Clojure, Scala or Ruby/JRuby doesn’t sound absurd to us. A programmer capable of knowing and switching between numerous programming languages is called a “Polyglot Programmer“.

The main justification for heterogeneous (polyglot) projects often is the concept of “using the right tool for the job”. The job often is a subtask of the whole project, like building the project, accessing the database, implementing the ever-changing business logic. For each subtask, some other language might outshine the competitors. Besides some reasonable doubt concerning the hidden cost of this approach, there is a misconception of the term “tool”.

Programming languages aren’t tools

If you use a tool in (basic or advanced) engineering, let’s say a hammer to drive some nails into a wooden plate or a screwdriver to decompose your computer, you’ll put the tool aside as soon as “the job” is finished. The resulting product (a new wooden cabinet or a collection of circuit boards) doesn’t include the tool. Most of the times, your job is really finished, without “change requests” to the product.

If your tool happens to be a programming language, you’ll produce source code bound to the tool. Without the tool, the product isn’t working at all. If you regard your compiled binaries as “the product”, you can’t deal with “change requests”, a concept that programmers learn early and painful. The product of a programmer obviously is source code. And programming languages don’t act as tools, but as materials in this respect. Tools go away, materials stick.

Programming languages are materials

As source code is tied to its programming language, they form a conceptional union. So I suggest to change the term to “using the right material for the job” when speaking about programming languages. This is a more profound decision to make in comparision to choosing between a Phillips style or a TORX screwdriver. Materials need to outlast when the tools are long put aside.

But there are tools, too

In my web application example above, we used a lot of tools. Grails is our framework of choice, Jetty our web container to deploy to, the Spring Framework provides mighty utilities and we used IDEA to bolt it all together. We could easily exchange Tomcat for Jetty or IDEA with Eclipse without changing the source code (the example doesn’t work that easy for Grails and Spring, though). Tools need to be replaceable or even disposable.

Summary

The term “the right tool for the job” cannot easily be applied to programming languages, as they aren’t tools, but materials. This is why polyglot programming is dangerous to when used heavily in a single project. It’s easy to end up with a tangled “amalgam project”.

Two more disclaimers:

  • If chosen right, “composite construction” is a powerful concept that unifies the advantages of two materials instead of adding up their drawbacks.
  • Being multilingual is advantageous for a programmer. Just don’t show it all off in one project.

Follow-up to our Dev Brunch November 2009

A follow-up to our November 2009 Dev Brunch, summarizing the talks and providing bonus material.

Today we held our Dev Brunch meeting for November 2009. It was the last possible date for this month, but we were affected by absences nonetheless. This is the follow-up posting for this rather small gathering, summarizing the topics and providing additional information.

The Dev Brunch

If you want to know more about the meaning of the term “Dev Brunch” or how we realize it, have a look at the follow-up posting of October’s brunch. This time, no notebook was needed.

The November 2009 Dev Brunch

The topics of this session were:

  • Object Calisthenics by example – Experiences gained while programming a small project following the Object Calisthenics rules while practicing Test Driven Development, too.
  • Object Calisthenics inspected – Observations and insights gained when explaining Object Calisthenics to several teams, programmers and student courses.

As you can immediately see, the meeting was small, but surprisingly consistent. We didn’t agree upon the topic beforehands, but it was a perfect match. Everybody who missed this brunch definitely missed some very interesting first-hand experiences on Object Calisthenics, too. To ease this lack a bit, let me rephrase the content a bit.

Object Calisthenics

You might have heard about Object Calisthenics before, on this blog or other resources on the net. Perhaps you’ve read the original article, which is highly advised. In short, Object Calisthenics are a set of inspiring, if not irritating programming rules that should lead to better programming style through excercise. You should consult the links above for specifics.

Object Calisthenics by example

When applying the rules to a domain class model, some new techniques arose to compensate the “train wreck line”-programming style (see rule 4) and to introduce first class collections (rule eight) and avoid getters and setters (rule 9). This techniques included the use of the Visitor design pattern, which wasn’t the author’s first choice beforehands. Test Driven Development alone wouldn’t have led to this solution, but the solution works well for the given use case.

The author softened some rules for his example and found valid explanations for doing so. This might be the content of an additional blog posting that still needs to be written. It will be announced in the comments when published.

Test Driven Development and Object Calisthenics do not interfere with each other. They both aim for better code and design, but through different means. They could be regarded as complements in a programmer’s toolbox.

Object Calisthenics inspected

When teaching the nine rules, some effects occurred repeatedly. The first observation was that the rules follow a dramatic composition that orders them from “most obvious and immediate code improvement” to “hardest to achieve code improvement” and in the same order from “easiest to acknowledge” to “most controversial”. At the end of the list, the audience rioted most of the time. But if you reject the last few rules, you’ve silently agreed to the first ones, the ones with the greatest potential for immediate improvement.

Another observation is that the rules stick. Even if you reject them on first notion, it creeps into your thinking, whispering that “it might be possible right now with this code“. It’s a learning catalyst for those of us that aren’t born as programming super-heros. To speak in terms Kent Beck coined: Object Calisthenics provide some handy practices that might eventually lead to a better understanding of their underlying principles. Even beginners can follow the practices and review their code on compliance. When they fully get to know the principles (like Law Of Demeter, for example), they are already halfway there.

The third observation was that most experienced programmers intuitively revealed the principles behind the rules before I could even try to explain. Some even found very interesting associations with other principles that weren’t so obvious.

At last, Object Calisthenics, if performed as a group exercise, can be a team solder. You can rant over code together without regrets – the rules were made elsewhere. And you can discuss different solutions without feeling pointless – fulfilling the rules is the common goal for a short time.

The Dev Brunch retrospected

This brunch was small both in attendee and topic count. That created a very productive discussion. We’ll try to grow the insights gained today into additional blog entries. Stay tuned.

Object Calisthenics On Existing Projects?

A few days ago we discussed Object Calisthenics which where introduced by Jeff Bay in an article for the ThoughtWorks Anthology book. In case you have no idea what I’m talking about, here are again the 9 rules in short form (or your can study them in detail in the book):

1. One level of indentation per method
2. No else keyword
3. Wrap all primitives and strings
4. Use only one dot per line
5. Don’t abbreviate names but keep them short
6. Keep all entities small
7. No more than two instance variables per class
8. Use first-class collections
9. Don’t use any getters/setters or properties

Following the rules supposedly leads to more object-oriented code with a special emphasis on encapsulation. In his article, Jeff Bay suggests to do a new 1000 lines project and to follow the rules excessively without thinking twice. But hey, more object-oriented code can’t be bad for existing projects, either, can it?

Not only on the first look, many of the rules seem pretty hard to follow. For example, check your projects for compatibility with rule 7. How many of your classes have more than two instance variables? That’s what I thought. And sure, some primitives and collections deserve wrapping them into an extra class (rules 3 and 8), but do you really wrap all of them? Well, neither do we.

Other rules lead directly to more readable code. If you value good code quality like we do, rules 1, 2, 5 and 6 are more or less already in the back of your head during your daily programming work.

Especially rule 1 is what you automatically aim for when you want your crap load to remain low.

What really got my attention was rule 9: “Don’t use any getters/setters or properties”. This is the “most object-oriented” rule because it targets the heart of what an object should be: a combination of data and the behavior that uses the data.

But doing a little mental code browsing through our projects, it was easy to see that this rule is not easily retrofitted into an existing code base. The fact that our code is generally well covered with automated tests and considered awesome by a number of software metrics tools does not change that, either. Which is, of course, not surprising since committing to rule 9 is a downright big architectural decision.

So despite the fact that it is difficult to virtually impossible to use the rules in our existing projects right away, Object Calisthenics were certainly very valuable as motivation to constantly improving ourselves and our code. A good example is rule 2 (“No else”) which gets even more attention from now on. And there are definitely one or two primitives and collections that get their own class during the next refactoring.

Always be aware of the charset encoding hell

Most developers already struggled with textual data from some third party system and getting garbage special characters and the like because of wrong character encodings.  Some days ago we encountered an obscure problem when it was possible to login into one of our apps from the computer with the password database running but not from other machines using the same db.  After diving into the problem we found out that they SHA-1 hashes generated from our app were slightly different. Looking at the code revealed that platform encoding was used and that lead to different results:platform-encoding

The apps were running on Windows XP and Windows 2k3 Server respectively and you would expect that it would not make much of a difference but in fact it did!

Lesson:

Always specify the encoding explicitly, when exchanging character data with any other system. Here are some examples:

  • String.getBytes(“utf-8”), new Printwriter(file, “ascii”) in Java
  • HTML-Forms with attribute accept-charset="ISO-8859-1"
  • In XML headers <?xml version="1.0" encoding="ISO-8859-15"?>
  • In your Database and/or JDBC driver
  • In your file format documentation
  • In LaTeX documents
  • everywhere where you can provide that info easily (e.g. as a comment in a config file)

Problems with character encodings seem to appear every once in a while either as end user, when your umlauts get garbled or as a programmer that has to deal with third party input like web forms or text files.

The text file rant

After stumbling over an encoding problem *again* I thought a bit about the whole issue and some of my thought manifested in this rant about text files. I do not want to blame our computer science predecessors for inventing and using restricted charsets like ASCII or iso8859. Nobody has forseen the rapid development of computers and their worldwide adoption and use in everyday life and thus need for an extensible charset (think of the addition of new symbols like the €), let aside performance and memory considerations. The problem I see with text files is that there is no standard way to describe the used encoding. Most text files just leave it to the user to guess what the encoding might be whereas almost all binary file formats feature some kind of defined header with metadata about the content, e.g. bit depth and compression method in image files. For text files you usually have to use heuristical tools which work  more or less depending on the input.

A standardized header for text files right from the start would have helped to indicate the encoding and possibly language or encoding version information of the text and many problems we have today would not exist. The encoding attribute in the XML header or the byte order mark in UTF-8 are workarounds for the fundamental problem of a missing text file header.

Stacked smartness doesn’t add up

When a software is composed of different layers, friction occurs. That’s when features turn into bugs.

houseofcardsThere is a strong urge to make software smart. Whenever something smart gets built in, it’s called a feature. Features of a software are effects you don’t foresee, but find handy for your use case. If your use case is impaired by a feature, you’ll likely call it a bug.

Some features of various software

To make my point clear, i have to introduce two features of software that are very practical for their anticipated use case and then change their context by adding another layer:

Ant filesets

If you use Ant as a build script language, you’ll find filesets very pratical. If you want to modify, copy or delete a bunch of files, you specify a root directory and some similarities between the files (like equal filetypes or names) and you’re done. Let me give you an example to show the use case:

<delete>
    <fileset dir="${basedir}" include="**/build.xml,**/pom.xml"/>
</delete>

This will very likely delete all ant build scripts and maven setting files in your project (so please use with care). Notice how the include attribute is comma-separated for multiple patterns. According to the documentation, the comma can be omitted for a space character.

Then, there is Hudson, a very powerful continuous integration server. One source of its power is the familiarity of configuration syntax, specifically when accessing a bunch of files:

hudson-include

The given text field specifies the include attribute of an Ant fileset. You immediately inherit all the power of ant’s fileset, but the features, too. Here, it’s a feature that two pattern can be excluded by just a space character. If your path contains spaces, you cannot express your pattern in this text field. When using the fileset directly in Ant, you can alter the syntax and use multiple nested include tags, but within Hudson, you are stuck with the single include attribute.

Struts2 internationalization

The second example is fully described in my previous blog entry (“The perils of \u0027”).

As a short summary: The Struts2 framework inherits the power of Java’s MessageFormat when loading language dependent text. As the apostrophe is a special character to MessageFormat, it cannot be used directly in the text entries.

The principle behind the examples

Both examples share a common principle: “Stacked smartness doesn’t add up“. What’s a feature to one software, may be a bug to a software that builds on top of it.

Software developers tend to “stack up” different third party software products to compose their own product with even higher-level functionality. There is nothing wrong with this approach, as long as the context of the underlying products doesn’t change much. If it changes, features begin to behave like bugs.

The cost of stacking

Stacked products are likely to increase the ability of skilled users to re-use their knowledge. Every developer familiar to Ant will instantly be empowered to use the file patterns of Hudson. Every developer familiar to MessageFormat can produce powerful i18n entries that do most of the formatting automatically. That’s a great productivity gain.

But on the other side, if you aren’t familiar to ant when using hudson or know nothing about MessageFormat when just translating the i18n entries of a Struts2 webapp, you’ll be surprised by strange effects. And you won’t find sufficient documentation of these effects in the first place. There will be a link to some obscure project or class you never heard about, telling you all sorts of details you don’t want to hear right now. You can’t easily put them into the right context anyway. You will be down to trial and error, frustrated that your use case seems impossible without explanation. That’s a great productivity loss.

Often, you can’t blame any part of the stack, not even the topmost, for the occuring bugs. If a specific stack maintains and increases productivity, depends on the use case of the topmost layer compared to the underlying anticipated ones. If those aren’t documentated, its hard to notice the displacement.

A metaphor on software stacks

Whenever I hear about a software stack, a picture of a man on a stack of crates occurs to me. Here is the original photo of my thoughts.

What’s your encounter with a shaky stacking?