Schemas, naming and search path in PostgreSQL

Modern (SQL) databases provide a multi-level hierarchy of separate contexts.

A database management system (DBMS) can manage multiple databases.

A database can contain multiple schemas.

A schema usually contains multiple tables, views, sequences and other objects.

Most of the time we developers only care about our database and the objects it contains – ignoring schemas and the DBMS.

In PostgreSQL instances this means we are using the default schema – usually called public – without mentioning it anywhere: neither in the connection string nor in any queries.
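
A minimal sketch of what this looks like in practice (re-using the table name my_table from the example further below): with the stock search path, an unqualified table name resolves to the public schema.

-- on a default setup these two queries are equivalent
select * from my_table;
select * from public.my_table;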

Ok, and why does all of the above even matter?

Special situations

Sometimes customers or operators require us to use special schemas in certain databases. When we do not have full control over our database deployment and usage, we have to adapt to the situation we encounter.

Camel-casing

One thing I really advise against is using upper-case names for tables, columns and so on. SQL keywords are not case sensitive, but many DBMSes differentiate case when it comes to naming, so they require you to use double quotes to reference mixed-case objects like schemas and tables, e.g. "MyImportantTable". In my opinion it is far better to use only lower-case letters and snake_case for all names.
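
To make the pitfall concrete, here is a small PostgreSQL sketch (the table names are illustrative): unquoted identifiers are folded to lower case, so a mixed-case name forces quotes on every reference.

-- created with quotes, the mixed-case name is preserved
create table "MyImportantTable" (id integer);

select * from MyImportantTable;    -- fails: looked up as myimportanttable
select * from "MyImportantTable";  -- works, but quoting is now mandatory

-- created without quotes, the name is folded to lower case
create table my_important_table (id integer);
select * from My_Important_Table;  -- works: also folded to my_important_table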

Multiple schemas in one database

I do not endorse using multiple named schemas in one database. Most of the time you do not need this level of separation and can simply use multiple databases in the same DBMS, each with its default schema. That way you do not have to specify a schema in your queries.
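
A minimal sketch of that alternative (the database names are made up):

-- separate contexts as databases instead of schemas
create database billing;
create database reporting;
-- each database starts out with its own schema named public,
-- so queries against either can remain schema-free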

What if you are required to use a non-default schema? At least in PostgreSQL you can modify your login role to change the search path. That way you can leave all your queries and other deployment artifacts untouched, without schema names scattered all around your project:

-- instead of specifying schema name like
select * from "MyTargetSchema".my_table;

-- you can alter the search path of your role
alter role my_login set search_path = public, "MyTargetSchema";

-- to only write
select * from my_table;
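
The role setting is stored server-side and takes effect for new sessions of my_login. You can verify it from within a session, and – if the requirement applies to all users of a database – set it at the database level instead (my_db is a placeholder name):

-- inspect the effective search path of the current session
show search_path;

-- alternatively, set the search path for a whole database
alter database my_db set search_path = public, "MyTargetSchema";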

Wrapping it up

Today’s software systems are quite complex, and the problems we are trying to solve with them have their own, ever-growing complexity. So when using DBMSes, follow common advice like the above to reduce the complexity of your solution, even if that complexity seems to be mandated by your environment.

Sometimes there are easy-to-follow conventions or configuration options that can reduce your burden as a developer, even if you do not have complete control over the relevant parts of the deployment environment.

Ignoring YAGNI – 12 years later

Fourteen years ago, we started to build a distributed system to gather environmental data in an automated 24/7 fashion. Our development process was agile and made heavy use of short iterations (at least they counted as short back then; today they are normal-sized). So the system grew with many small new features and improvements, giving the customer immediate business value.

One part of the system was the task scheduler. Because the system had to run 24/7 and be mostly independent of human interaction, the task scheduler’s job was to launch different measurement processes at the right time. We had done extensive domain crunching and figured out that all tasks follow a rigid time regime like “start every 10 minutes” or “start every hour”, regardless of the processes’ runtime. This made the scheduler rather easy to develop. You should keep it simple, after all.

But another result of the domain crunching bothered us: the schedule of all tasks originated from the previous software system, built 30 years ago and definitely unfit for the modern software world. The schedules weren’t really rooted in the domain; they all had technical explanations like “the recording of the values is done sequentially and takes up to 8 minutes, we can’t record them more often than that”. For our project, the measurement hardware was changed, so our recording took a couple of milliseconds. We could store and display the values continuously if the need arose.

So we discussed the required simplicity or complexity of the task scheduler with the customer, and they seemed pleased with all the new possibilities. But they decided that the current schedules were sufficient and didn’t need to be changed. We could go ahead and build our simple task scheduler.

And this is when we decided to abandon KISS and make the task scheduler more powerful than needed. “But you ain’t going to need it!” was the enemy. Because we knew that the customer would inevitably come around and make use of their new possibilities. We knew that if we built the system with more complexity, we would be the heroes in a future time, wearing a smug smile and telling the customer: “We’ve already built this, you can use it right away”. Oh, how glorious this prospect of the future shone! Just a few more thoughts going into the code and we’re set for a bright future.

Let me tell you a few details about the “few more thoughts”, using the example of an “every hour” task schedule. Instead of hard-coding the schedule, we added a configuration file with a cron-like expression for the schedule. You could now leverage the power of cron expressions to design your schedule as you saw fit. If you wanted to change the schedule from “every hour” to “every odd minute and when the pale moon rises”, you could do so. The task scheduler had to interpret the configuration file and make sure that tasks didn’t pile up: if you schedule a task to run “every minute” but it takes two minutes to process, you’ve essentially built a time bomb for your system load. The scheduler had to make this impossible.
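
To give an impression, here is a sketch of what such a configuration entry might have looked like (the file format and the key name are made up; the five cron fields are the standard minute, hour, day of month, month and day of week):

# run the measurement task every hour, on the hour
measurement_task = 0 * * * *

# the newly gained flexibility, e.g. every odd minute
# measurement_task = 1-59/2 * * * *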

But it doesn’t stop there. A lot of functionality, most of which wasn’t even present or outlined at the time of our decision, relies implicitly on that schedule. Two examples: There are manual operations that must not be performed during the execution of the task. The system goes into a “protected state” around the task execution. It disables these operations a few minutes before the scheduled execution and even some time afterwards. If you had a fixed schedule of “every hour”, you could even hard-code the protected timespan. With a possible dynamic schedule, you have to calculate your timespan based on the current schedule and warn your operator if it isn’t possible anymore to find a time slot to even perform the manual operation.
The second example is a functionality that supervises the completeness of the recorded data. The problem is: This functionality is on another computer (it’s a distributed system, remember?) that doesn’t know about the configuration files. To be able to scan the data archive and say “everything that should be there, is there”, the second computer needs to know about all the schedules of the first computers (there are many of them, recording their data on their own schedules and transferring it to the second computer). And if a schedule changes, the second computer needs to take the change into account and scan the data archive for two areas: one area with the old schedule and one area with the new schedule. Otherwise, there would be false alarms.

You can probably see that the one decision to make the task scheduler a little more complex and configurable than required had quite some impact on the complexity of other parts of the system. But this investment will be worth it as soon as the customer changes the schedule! The whole system is programmed, tested and documented to facilitate schedule changes. We are ready!

It’s been over twelve years since we wrote the first line of code for the more complex implementation (I’ve checked the source control logs). The customer hasn’t changed a single bit of the schedule yet. There are over twenty “first computers” and they all still run the same task schedule as initially planned. Our decision did nothing but add accidental complexity to the system. It probably introduced some bugs along the way, too. It certainly increased our required level of awareness (“hurdle of understanding”) during the development of features that are somewhat coupled with the task schedule.

In short: It’s been a disaster. The smug smile we thought we’d wear has been replaced by a deep frown. Who wrote all that mess? And why? It wasn’t the customer, it was us. We are never going to need it.

Quick and dirty is a skill

This post is a reminder, first and foremost to myself, that it is important to work according to what’s essential right now: quality or speed.

Being clean coders, we build our software based on quality and reflect on how we do it. We set internal standards in code and UIs, we write tests, we polish.

But there are times when all this focus on quality is obstructive: times when we need to learn something. For example, at the start of a project, when fundamental questions like “Is it feasible?”, “How should this interaction work?” or “What’s the right order of steps?” are unanswered, learning needs to be as cheap as possible.

Here, quick and dirty is important. The problem is our ego. We want to polish our work, we want to build real software with a sound structure. But quality takes time, and quality is not what matters when answering the fundamental project questions; learning is. Maybe a mockup in PowerPoint is enough? (Not even writing code? Ugh.) A simple sketch on a piece of paper. Or maybe just a quick demo hacked together in an afternoon.

I know these suggestions may insult our pride. But we need to focus on what’s important: sometimes that’s quality, sometimes that’s speed.

Decades ago, when I started coding, quick and dirty wasn’t a problem. Everything I wrote was quick and dirty. I was learning all the time. Over time I got better at developing software, structuring applications and building robust systems. But quick and dirty was lost along the way.

When you write something for the purpose of learning, it can happen that you are wrong and all the code has to be thrown away. If it was just two hours of patching something together, that’s okay – but what if you spent a whole week? Just like writing quality software, quick and dirty is a skill in itself, and as with other skills we need to practice it.

But beware: this is not only a problem at the start of a project. As developers we often tend to overthink, planning for every possible outcome and imagining scenarios with weirdly acting users or systems. This is the time to stop and implement something in order to learn, to get feedback – not to overanalyse or overdesign. Just release something and test it with real users; it doesn’t need to be part of the software in production, a demo or staging environment will do. If you need to learn something, focus on that, not on quality.

There should be a stakeholder for simplicity

You have stakeholders for your product idea, you have stakeholders for your clients and their ideas, you have stakeholders for clean code, for quality, for object-oriented programming, … I think we also need stakeholders for simplicity.

Usually, ideas are abundant in every project. Your client has many ideas about what features they want. You and your coworkers have a rich background in the problem domain and the technologies you use, so solutions are never in short supply. But often this experience and confidence leads to abandoning an important trait: simplicity.

Most of the time you are focused on finding good solutions for the problems that arise. From past experiences with clients who could not exactly explain what they really need (which, most of the time, is different from what they want), you tend to include a little extra flexibility in your system. Maybe you will need this or that variation some time in the future, but at this very moment it is a guess at best. And often this guess costs you. You could argue that you are investing in your project. But how many times did this investment really pay off? And how many times did you have a hard time just because you did not want to make restrictions? At first it seems like only a little extra work. But with the next feature you have to continue supporting your little ‘extra’. Over time it infuses your system like leaven infuses bread. In the end it is more work to make it simple than it would have been to keep it simple.

So in every project you approach, there should be a stakeholder for simplicity: someone who focusses on simple solutions. Sometimes you have to cut a bit away from a feature, or you have to view the problem from a different angle. At other times you have to dig deeper into the problem domain, or you need real data from your users (which is always better than what you can make up in your mind). Finding simple solutions is work, but it is much more work to support your over-engineered solutions.