Arbitrary limits in IT

When I was a young teenager, a game captured my attention for months: Sid Meyer’s Railroad Tycoon. The essence of the game was to build railways between cities and transport as much passengers and freight goods between them as possible.

I tell this story because it was the first time I discovered a software bug and exploited it. In a way, it was the start of my “hacking” career. Don’t worry, I’m a software developer now, not a hacker. I produce the bugs, I don’t search and exploit them.

The bug in Railroad Tycoon had to do with buying industry. You could not only build tracks and buy trains, you could also acquire industrial complexes like petroleum plants and factories. And while you couldn’t build more tracks once your money ran out, you could accrue debt by buying industry. If you did this extensively, your debt suddenly turned into a small fortune. I didn’t know about the exact mechanics back then, but it was a classic 16-bit signed integer overflow bug. If your debt exceeded 32.768 dollars, the sign turned positive. That was a lot of money in the game and you had a lot of industry, too. The english wikipedia article seems to be a bit inaccurate.

If you are accustomed with IT, there a some numbers that you immediately recognize as problematic, like 255, 32.767, 65.535 or 2.147.483.647. If anything unusual or bad happens around these numbers, you know what’s up. It’s usually related to an integer overflow or (in the case of Railroad Tycoon) underflow.

But then, there are problematic numbers that just seem random. If you want to name a table in an older Oracle database, you couldn’t name it longer than 30 characters. Not 32 or something that could somehow be related to a technical cause, but 30. Certain text values couldn’t be longer than 2000 characters. Not 2048 (or 2047 with a terminating zero character), but straight 2000. These numbers look more “usual” for a normal human, but they appear just as arbitrary to the IT professional’s eye as 2048 might seem to others.

Recently, I introduced a new developer to an internal project of ours. We set up the development environment and let the program run a few times. Then, we introduced some observable changes to the code to explain the different parts. But suddenly, a console output didn’t appear. All we did was to introduce a line of code in the form of:

System.out.println(output);

And the output just didn’t show up. We checked that the program executed the code beforehands and even fired up a debugger (something I’m not really fond of) to see that the output string is really filled.

We changed the line of code to:

System.out.println(output.length());

And got our result: 32.483 characters.

As you can see, the number is somewhat near the 32k danger zone. But in IT, these danger zones are really small. You can be directly besides it and don’t notice anything, but one step more and you’re in trouble. In a way, they act like minefields.

There should be nothing wrong with 32.483 characters printed on a console. Well, unless you use Eclipse and Windows. If you do, there is a new danger zone, starting with 32.000 characters. And this zone isn’t small. In fact, it affects any text with more than 32.000 characters that should be printed in an Eclipse console on Windows:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=23406

‘ScriptShape’ WINAPI fails with E_FAIL for OpenType fonts when the length of the string is over 32000 (0x7d00) characters

Notes: 32000 is hardcoded in ‘gdi32full!GenericEngineGetGlyphs’ Windows function.

https://bugs.eclipse.org/bugs/show_bug.cgi?id=23406#c31

There is nothing special about the number 32.000. My guess is that some developer at Microsoft in the nineties had to impose a limit and just thought that “32.000 characters ought to be enough for anybody”. This is a common mistake made by Microsoft and the whole IT industry.

The problem is that now, 20 or even 30 years later, this limit is still in place. Our processing power grew by a factor of 1.000 (yes, one thousand times more power), the amount of available memory even by a factor of 16.000 and we are still limited to 32.000 characters for a line of text. If the limit would grow accordingly, you could now fit up to 32.000.000 characters in that string and it would just work.

So, what is the moral of this story? IT and software development are minefields where you can step on a mine that was hidden 20+ years ago at any turn. But even more important: If you write code, please be aware that every limit you introduce into your solution will cause trouble in the future. Some limits can be explained by other limits, but others are just arbitrary. Make the arbitrary limits visible and maybe even configurable!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.