PostgreSQL’s new MERGE command

PostgreSQL version 15 introduces a new SQL command: the MERGE command. This allows merging a table into another table. The MERGE command has existed for some time in other databases such as Oracle or SQL Server.

The principle of this command is that you have a target table into which you want to insert or remove data based on a source table under certain conditions, or whose existing entries you want to update with data from the source table. The source table doesn’t have to be a real table; it can just as easily be a SELECT query.

How to use it, step-by-step

The command begins with MERGE INTO, followed by the name of the target table. We call it dest here:

MERGE
  INTO dest ...

Then you specify the source table with USING. Here we call it src:

MERGE
  INTO dest
  USING src
  ...

If you want to use a SELECT query as the source instead of a real table, you can do it like this:

MERGE
  INTO dest
  USING (SELECT ... FROM ...) AS src
  ...

Now you need a condition that is used to match entries from one table to entries from the other table. This is specified after ON. In this example we simply use the IDs of the two tables:

MERGE
  INTO dest
  USING src
  ON dest.id=src.id
  ...

This is followed by a case distinction that describes what should happen when the condition matches and when it does not. The possible actions are: UPDATE, DELETE, INSERT, or DO NOTHING.

The two cases are specified with WHEN MATCHED THEN and WHEN NOT MATCHED THEN:

MERGE
  INTO dest
  USING src
  ON dest.id=src.id
  WHEN MATCHED THEN
    UPDATE SET ...
  WHEN NOT MATCHED THEN
    INSERT (...) VALUES (...);

If a match exists, then reasonable actions are UPDATE, DELETE, or DO NOTHING. If no match exists, then reasonable actions are INSERT or DO NOTHING.

In the WHEN cases, additional conditions can be specified with AND:

MERGE
  INTO dest
  USING src
  ON dest.id=src.id
  WHEN MATCHED AND dest.value > src.value THEN
    DELETE
  WHEN MATCHED THEN
    UPDATE SET ...
  WHEN NOT MATCHED THEN
    DO NOTHING;

A realistic example

Here’s an example demonstrating a use case that might occur in the real world:

MERGE
  INTO account a
  USING transaction t
  ON a.id=t.account_id
WHEN MATCHED THEN
  UPDATE SET balance = a.balance + t.amount
WHEN NOT MATCHED THEN
  INSERT (id, balance) VALUES (t.account_id, t.amount);

This statement processes a table of monetary transactions and applies them to their matching customer accounts by adding the amount of each transaction to the balance of the matching account. If no matching account exists, it will be created, and its initial balance is the amount of the first transaction.
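
For reference, here is a self-contained sketch of this example that you can run in psql on PostgreSQL 15 or later; the sample data is of course made up:

CREATE TABLE account (
  id bigint PRIMARY KEY,
  balance numeric NOT NULL
);
CREATE TABLE transaction (
  id bigint PRIMARY KEY,
  account_id bigint NOT NULL,
  amount numeric NOT NULL
);

INSERT INTO account VALUES (1, 100);
INSERT INTO transaction VALUES (10, 1, 25), (11, 2, 40);

MERGE
  INTO account a
  USING transaction t
  ON a.id=t.account_id
WHEN MATCHED THEN
  UPDATE SET balance = a.balance + t.amount
WHEN NOT MATCHED THEN
  INSERT (id, balance) VALUES (t.account_id, t.amount);

-- account 1 now has a balance of 125,
-- account 2 was created with a balance of 40

One caveat: MERGE raises an error if more than one source row matches the same target row, so with several transactions per account you would first aggregate them in the USING clause, e.g. with SUM and GROUP BY.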

Speeding up your HQL

Using an object-relational mapper (ORM) to persist your entities, manage their state and query subsets for lists or reports is a widespread practice and may speed up your development.

If not used correctly, it may introduce unexpected performance problems because of inefficient default queries and the overhead the mapping introduces, since most of the time table rows are converted to domain objects. Often this results in many queries and the N+1 query problem.

Nevertheless, the benefits of using an ORM may outweigh the problems, and most problems can be mitigated by the tool’s features and correct usage.

Today I want to present a performance problem we had using GORM/Hibernate and how we easily fixed it without major code restructuring or workarounds.

The Problem

We used an HQL query to load quite a lot of entities, which took about 3 seconds. This was acceptable for our customer. If the user tried to narrow down the results using a filter, however, loading a smaller number of the same entities took over a minute. Obviously, this was totally unacceptable and counter-intuitive.

The Analysis

Further analysis revealed that a particular part of the WHERE clause was responsible for the observed slowdown:

FROM Report r
WHERE r.project.proposal.id = p.id

So we filtered the root entity Report on an entity called Proposal, but needed to load the associated Project entity for all reports under consideration. Even though we are only using entity ids for filtering, the innocent-looking path r.project.proposal.id leads to loading and mapping of hundreds of Project entities.

The Solution

In our example we can fortunately do a lot better without big changes to our domain model, the application code or the query.

The relevant part of the schema is simple: both a Report and a Proposal are associated with a certain Project, and therefore both carry the project’s id. Remember that in Hibernate your entities contain only the id of their one-to-one mapped sub-entities by default. This means that if we change the filter clause to

WHERE r.project.id = p.project.id

we skip loading and mapping all of the Project entities and only load the needed reports and proposals. Since both contain the project id, we can use it in our filter. This resulted in a more than 10x speedup from such a simple and non-invasive change.

General Takeaway

ORMs can be a great tool, but it is very easy to shoot yourself in the foot. With enough care you can achieve both simple code and good performance, but you may run into non-obvious problems every now and then.

Re-ordering table columns in an Oracle database

In an Oracle database, once a table is created, there is no obvious way to change the order of its columns. Sometimes you add a new column to an existing table and want it to be displayed in a different position by default for query results via SELECT *. Imagine you add an ID column to a table after the fact. Wouldn’t it be nice if this appeared in the first position?

Of course you can drop the whole table and create it again with the new column order. But this is cumbersome and potentially dangerous if the table is already filled with data. However, there is a trick that allows you to rearrange the columns without having to recreate the table.

The key to this is an Oracle feature that allows invisible columns. The feature itself is interesting in its own right, but it has a useful side effect that we’ll exploit. The documentation says:

When you make an invisible column visible, the column is included in the table’s column order as the last column. When you make a visible column invisible, the invisible column is not included in the column order, and the order of the visible columns in the table might be re-arranged.

So the plan is to first make a cleverly chosen set of columns invisible, and then to make them visible again in the desired order. This is how it works:

First we have a table with the following columns.

CREATE TABLE t (a NUMBER, b NUMBER, c NUMBER, e NUMBER, f NUMBER);

Later we realize that we need a column d that should be between c and f. So we add it to the table:

ALTER TABLE t ADD (d NUMBER);

This is of course added at the end:

DESC t;
Name Null? Type   
---- ----- ------ 
A          NUMBER 
B          NUMBER 
C          NUMBER 
E          NUMBER 
F          NUMBER 
D          NUMBER 

To get it in the right position, we first hide the columns e and f, and then make them visible again.

ALTER TABLE t MODIFY (e INVISIBLE, f INVISIBLE);
ALTER TABLE t MODIFY (e VISIBLE, f VISIBLE);

And voilà, we have our desired order:

DESC t;
Name Null? Type   
---- ----- ------ 
A          NUMBER 
B          NUMBER 
C          NUMBER 
D          NUMBER 
E          NUMBER 
F          NUMBER 

Note that this doesn’t change the internal, physical layout of the table on the disk. It’s just a cosmetic change.

PostgreSQL’s “DISTINCT ON” clause

Anyone who uses SQL databases knows the DISTINCT modifier for SELECT queries to get result sets without duplicates. However, PostgreSQL has another variant of it that not everyone knows, but which is very useful: the SELECT DISTINCT ON clause. It can be used to query only the first row of each set of rows according to a grouping.

To understand its usefulness, let’s look at an example and solve it in the classical way first.

The complicated way

Given the following table of items, we want to query, for each category, the item with the highest value.

 name │ category │ value
-------------------------
 A    │ X        │ 52
 B    │ X        │ 35
 C    │ X        │ 52
 D    │ Y        │ 27
 E    │ Y        │ 31
 F    │ Y        │ 20
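
If you want to try the queries yourself, a minimal setup for this sample table could look like this:

CREATE TABLE items (name text, category text, value int);

INSERT INTO items VALUES
  ('A', 'X', 52),
  ('B', 'X', 35),
  ('C', 'X', 52),
  ('D', 'Y', 27),
  ('E', 'Y', 31),
  ('F', 'Y', 20);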

Usually we’d start out with a query like this:

SELECT
  category,
  MAX(value) AS highest_value
FROM items
GROUP BY category;
category │ highest_value
--------------------------
 X       │ 52
 Y       │ 31

And then use this query as a sub-select:

SELECT * FROM items
WHERE (category, value) IN (
  SELECT
    category,
    MAX(value) AS highest_value
  FROM items
  GROUP BY category
);
 name │ category │ value
-------------------------
 A    │ X        │ 52
 C    │ X        │ 52
 E    │ Y        │ 31

Unfortunately, there are multiple items in category X with the same highest value 52. But we really only want one row for each category. In this case we might use the ROW_NUMBER() function:

SELECT
  name, category, value
FROM (
  SELECT
    items.*,
    ROW_NUMBER() OVER (
      PARTITION BY category
      ORDER BY value DESC, name
    ) AS rownum
  FROM items
) AS ranked WHERE rownum = 1;
 name │ category │ value
-------------------------
 A    │ X        │ 52
 E    │ Y        │ 31

This is finally our desired result.

The easy way

But I promised it can be easier with the DISTINCT ON clause. How does it work?

SELECT DISTINCT ON (category) *
FROM items
ORDER BY
  category, value DESC, name;

After DISTINCT ON we specify, in parentheses, one or more columns by which to group. The ORDER BY clause determines which row will be the first in each group. We get the same result:

 name │ category │ value
-------------------------
 A    │ X        │ 52
 E    │ Y        │ 31

Range Types in PostgreSQL

How do you store ranges in an SQL database? By ranges I mean things like price ranges, temperature ranges, date ranges for scheduling, etc. You’d probably represent them with two columns in a table, like min_price and max_price, min_temperature and max_temperature, start_date and end_date. If you want to represent an unbounded range, you’d probably make one or both columns nullable and interpret NULL as +/- infinity.

If you want to test if a value is in a range you can use the BETWEEN operator:

SELECT * FROM products WHERE
  target_price BETWEEN min_price AND max_price;

This doesn’t work as nicely anymore if you work with unbounded ranges as described above: you’d have to add additional NULL checks. And what if you want to test whether one of the ranges in the table overlaps with a given range?

SELECT * FROM products WHERE
  max_given >= min_price AND
  min_given <= max_price;

Did I make a mistake here? I’m not sure. What if they should overlap but not cover each other? And again, this becomes even more complicated with unbounded ranges.

Enter range types

PostgreSQL has a better solution for these problems — range types. It comes with these additional built-in data types:

  • int4range: Range of integer
  • int8range: Range of bigint
  • numrange: Range of numeric
  • tsrange: Range of timestamp without time zone
  • tstzrange: Range of timestamp with time zone
  • daterange: Range of date

You can use them as a column type in a table:

CREATE TABLE products (…, price_range numrange);

Construction

You can construct range values for these types like this:

'[20,35]'::int4range
'(5,12]'::int4range
'(6.2,12.5)'::numrange
'[2022-05-01, 2022-05-31]'::daterange
'[2022-05-01 09:30, 2022-05-01 12:00)'::tsrange

As you can see, they use mathematical interval notation. A square bracket means inclusive bound, and a round parenthesis means exclusive bound. They can also be unbounded (infinite) or empty:

'[5,)'::int4range
'(,20]'::int4range
'empty'::int4range

You can get the bounds of a range individually with the lower() and upper() functions:

SELECT * FROM products ORDER BY lower(price_range);

Operators

The range types become really powerful through the range operators. There are a lot, so I will only show some basic examples:

  • The && operator tests if two ranges overlap: range_a && range_b
  • The @> and <@ operators test if the first range contains the second or vice versa: range_a <@ range_b. Used with an element on one side, they test if the element is in a range: element <@ range or range @> element.
  • The -|- operator tests if two ranges are adjacent: range_a -|- range_b

In addition to these boolean tests, you can also calculate new ranges based on existing ranges:

The + operator computes the union of two overlapping or adjacent ranges: range_a + range_b. The * operator computes the intersection of ranges, and the - operator the difference.
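
To make this concrete, here are a few examples; the products table with its price_range column is the one sketched above, and the comments show the expected results:

-- rows whose price range overlaps a given range
SELECT * FROM products
  WHERE price_range && '[20,30]'::numrange;

-- test if an element lies in a range
SELECT 25 <@ '[20,30)'::int4range;               -- true

-- union of two adjacent ranges
SELECT '[1,5)'::int4range + '[5,9)'::int4range;  -- [1,9)

-- intersection of two overlapping ranges
SELECT '[1,6)'::int4range * '[4,9)'::int4range;  -- [4,6)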

Multiranges

There is one more thing I want to mention: for each of the range types there is also a corresponding multirange type: int4multirange, int8multirange, nummultirange, tsmultirange, tstzmultirange, datemultirange. As their names suggest, they store multiple ranges in one value:

'{}'::int4multirange
'{[2,9)}'::int4multirange
'{[2,9), [12,20)}'::int4multirange

The range operators mentioned above work with multiranges as well.
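
For example (again with the expected results in the comments):

-- overlap test between a multirange and a range
SELECT '{[2,9), [12,20)}'::int4multirange && '[8,13)'::int4range;
-- true

-- union of two multiranges
SELECT '{[2,9)}'::int4multirange + '{[12,20)}'::int4multirange;
-- {[2,9),[12,20)}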

Full-text Search with PostgreSQL

If you want to add simple text search functionality to an application backed by an SQL database one of the first things that may come to your mind is the SQL LIKE operator. The LIKE operator and its case-insensitive sibling ILIKE find substrings in text data via wildcards such as %, which matches any sequence of zero or more characters:

SELECT * FROM book WHERE title ILIKE '%dog%';

However, this approach satisfies only very basic requirements for text search, because it only matches exact substrings. That’s why application developers often use an external search engine like Elasticsearch based on the Apache Lucene library.

With a PostgreSQL database there is another option: it comes with a built-in full-text search. A full-text search analyzes text according to the language of the text, parses it into tokens and converts them into so-called lexemes. These are strings, just like tokens, but they have been normalized so that different forms of the same word, for example “pony” and “ponies”, are made alike. Additionally, stop words are eliminated, which are words that are so common that they are useless for searching, like “a” or “the”. For this purpose the search engine uses a dictionary of the target language.

In PostgreSQL, there are two main functions for performing full-text search: to_tsvector and to_tsquery. The ts part in the function names stands for “text search”. The to_tsvector function breaks up the input string and creates a vector of lexemes out of it, which can then be matched against search queries built with the to_tsquery function. The two functions can be combined with the @@ (match) operator, which applies a search query to a search vector:

SELECT title
  FROM book
  WHERE to_tsvector(title) @@ to_tsquery('(cat | dog) & pony');

The query syntax of to_tsquery supports boolean operators like | (or), & (and), ! (not) and grouping using parentheses, but also other operators like <-> (“followed by”) and * (prefix matching).

You can specify the target language as a parameter of to_tsvector:

# SELECT to_tsvector('english', 'Thousands of ponies were grazing on the prairie.');

'graze':5 'poni':3 'prairi':8 'thousand':1

Here’s another example in German:

# SELECT to_tsvector('german', 'Wer einen Fehler begeht, und ihn nicht korrigiert, begeht einen zweiten (Konfuzius)');

'begeht':4,9 'fehl':3 'konfuzius':12 'korrigiert':8 'wer':1 'zweit':11

PostgreSQL comes with dictionaries and configurations for a few dozen languages out of the box, and more can be installed.
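
On larger tables you will want an index so that the match operator does not have to compute tsvectors for every row on the fly. A minimal sketch, assuming the book table from the examples above:

CREATE INDEX book_title_search_idx ON book
  USING GIN (to_tsvector('english', title));

-- queries must use the same configuration ('english')
-- so that the planner can use this expression index
SELECT title FROM book
  WHERE to_tsvector('english', title) @@ to_tsquery('english', 'pony');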

The examples in this article are just a small glimpse of what is possible with regards to full-text search in PostgreSQL. If you want to learn more you should consult the documentation. The key takeaway is that there is another option between simple LIKE clauses and an external search engine.

Commenting SQL database objects

Did you know that you can annotate database objects like tables, views and columns with comments in many SQL database systems? By that I don’t mean comments in SQL scripts, indicated by double dashes (--), but comments attached to the objects themselves, stored in the database. These can be helpful to the database admin by providing a description of what is stored in these objects.

For PostgreSQL and Oracle databases the syntax is as follows:

COMMENT ON TABLE [schema_name.]table_name IS '...';
COMMENT ON COLUMN [schema_name.]table_name.column_name IS '...';

For example:

COMMENT ON COLUMN books.author IS 'The main author''s last name';
COMMENT ON TABLE books IS 'Contains only the best books';

These comments can be viewed in database tools like SQL Developer, which displays them for both tables and columns.

You can also view the comments in psql:

db=# \d+ books
 Column |  Type   |          Description
--------+---------+------------------------------
id      | integer |
author  | text    | The main author's last name
title   | text    |

And for a table:

db=# \dt+ books
                    List of relations
 Schema | Name  | Type  |     |        Description
--------+-------+-------+ ... +------------------------------
public  | books | table |     | Contains only the best books
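
In PostgreSQL you can also retrieve the comments programmatically, e.g. for generated documentation, with the obj_description and col_description functions:

-- the comment on the table itself
SELECT obj_description('books'::regclass, 'pg_class');

-- the comment on a column, addressed by its position (here: 2 = author)
SELECT col_description('books'::regclass, 2);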

In Oracle you can query the comments from the data dictionary views ALL_TAB_COMMENTS and ALL_COL_COMMENTS:

> SELECT * FROM all_col_comments WHERE table_name='BOOKS';
OWNER    TABLE_NAME  COLUMN_NAME  COMMENTS
--------------------------------------------------------------
LIBRARY	 BOOKS	     ID           (null)
LIBRARY	 BOOKS	     AUTHOR       The main author's last name
LIBRARY	 BOOKS	     TITLE        (null)

> SELECT * FROM all_tab_comments WHERE table_name='BOOKS';
OWNER    TABLE_NAME  TABLE_TYPE  COMMENTS
--------------------------------------------------------------
LIBRARY	 BOOKS	     TABLE       Contains only the best books

In Oracle, comments are limited to tables, views, materialized views, columns, operators and indextypes, but in PostgreSQL you can attach comments to nearly everything. Another good use case for this is documentation comments on database functions:

COMMENT ON FUNCTION my_function IS $$
This function does something important.

Parameters:
...
Example usage:
...
$$;

Note: the $$ delimits multi-line strings (so-called dollar-quoted string constants).

Converting character sets in an Oracle database

I recently had to correct text data stored in an Oracle database with a wrong character set. A bug in one of our applications caused text data to be stored in the ISO 8859-1 West European character encoding. The database, however, is configured to work with UTF-8 encoded strings. The bug was fixed, but the existing data had to be corrected afterwards. If you ever encounter the same problem you can use Oracle’s CONVERT() function to fix the data. The syntax is as follows:

CONVERT(text, to_charset, from_charset)

First you have to find out Oracle’s names for the desired source and target character sets. You can look them up in the table of character set names in the Oracle documentation. In my case the character set names are 'UTF8' and 'WE8ISO8859P1' (for Western European 8-bit ISO 8859-1). You can check the function by selecting from DUAL:

SELECT CONVERT('ä ö ü Ä Ö Ü ß', 'UTF8', 'WE8ISO8859P1') FROM DUAL;

If the result shows the same scrambled characters you see in the broken data (for the lowercase umlauts above, mojibake along the lines of Ã¤ Ã¶ Ã¼), you know that you have chosen the correct character set names. The conversion in the other direction should unscramble the characters again.
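
You can verify both directions at once with a round-trip check: scrambling the sample string and unscrambling it again should return the original, unchanged:

SELECT CONVERT(
         CONVERT('ä ö ü Ä Ö Ü ß', 'UTF8', 'WE8ISO8859P1'),
         'WE8ISO8859P1', 'UTF8')
  FROM DUAL;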

Finally, you can convert your data:

UPDATE example_table
  SET text_column=CONVERT(text_column, 'WE8ISO8859P1', 'UTF8');

Understanding, identifying and fixing the N+1 query problem

One of the most common performance pitfalls for applications accessing data from databases is the so-called “N+1 query problem”, sometimes also called the “N+1 selects problem”. It is the first thing you should look for when an application has performance issues related to database access. It is especially easy to run into with object-relational mappers (ORMs).

The problem

The problem typically arises when your entity-relationship model has a 1:n or n:m association. It occurs when application code executes one query to get the objects of one entity and then executes another query for each of these objects to get the objects of an associated entity. An example would be a blog application that executes one query to fetch all authors whose names start with the letter ‘B’, and then another query for each of these authors to fetch their articles. In pseudocode:

# The 1 query
authors = sql("SELECT * FROM author WHERE name LIKE 'B%'");

# The N queries
articles = []
FOR EACH author IN authors:
    articles += sql("SELECT * FROM article WHERE author_id=:aid", aid: author.id)

The first query is the “1” in “N+1”, the following queries in the loop are the “N”.

Of course, to anybody who knows SQL this is a very naive way to get the desired result. However, OR mappers often seduce their users into writing inefficient database access code by hiding the SQL queries and letting users reach for the familiar tools of their favorite programming language, like loops or collection operations such as map. A lot of popular web application frameworks come with OR mappers: Rails with Active Record, Grails with GORM (Hibernate based), Laravel with Eloquent.

How to detect

The easiest way to detect the problem in an application is to log the database queries. Virtually all ORMs have a configuration option to enable query logging.

For Grails/GORM the logging can be enabled per data source in the application.yml config file:

dataSource:
    logSql: true
    formatSql: true

For Rails/ActiveRecord query logging is automatically enabled in the development environment. Since Rails 5.2 the Verbose Query Logs format is enabled by default, which you had to enable explicitly in earlier versions.

For Laravel/Eloquent you can enable and access the query log with these two methods/functions:

DB::connection()->enableQueryLog();
DB::getQueryLog();

Once query logging is enabled, you will quickly see if the same query is executed over and over again, which usually indicates the presence of the N+1 problem.

How to fix

The goal is to replace the N+1 queries with a single query. In SQL this means joining. The example above would be written as a single query:

SELECT article.*
FROM article
JOIN author
  ON article.author_id=author.id
WHERE author.name LIKE 'B%'

The query interface of ORMs usually allows you to write joins as well. Here is the example in ActiveRecord:

Article.joins(:author).where("authors.name LIKE ?", "B%")

Another option when using ORMs is to enable eager loading for associations. In GORM this can be enabled via the fetchMode static property:

class Author {
    static hasMany = [articles: Article]
    static fetchMode = [articles: 'eager']
}

REST APIs

The problem isn’t limited to SQL databases and SQL queries. For REST APIs it’s the “N+1 requests problem”, describing the situation where a client application has to call the server N+1 times to fetch one collection resource plus N child resources. Here the REST API has to be extended or modified to serve the client’s use cases with a single request. Another option is to offer a GraphQL API instead of a REST API. GraphQL is a query language for HTTP APIs that allows complex queries, so the client application can specify exactly which resources it needs in a single request.

Migrating a Grails application from Oracle to PostgreSQL

In my previous post I explained how to migrate an Oracle schema with data to a PostgreSQL database management system (DBMS). Besides the general tasks and issues, there are additional topics when migrating a complete application that uses the database to the other DBMS.

In our specific case we have a Grails application which we have maintained since Grails 1.0 times, for more than 12 years. During that time we did a ton of feature development with lots of refactoring and many database migrations. So the source database will most likely not be perfectly consistent and clean.

General approach

Since Grails/GORM and the DatabaseMigration-Plugin (DBM-Plugin) do a great job of preparing an empty database with a matching schema for the application to run, we let the framework tools generate the schema and only migrate the data using Ora2Pg.

Sounds simple, but how is it done in detail, and what else is there to look out for?

Generating the initial database schema

The DBM-Plugin provides a script to create a database changelog with a schema matching the domain model of your Grails application. It is integrated with Gradle, so you can run grails dbm-generate-gorm-changelog initialdb.groovy to create the migration script providing a fitting schema. You then include this script in grails-app/migrations/changelog.groovy or replace all the migrations previously included there with this initial database changelog.

To prepare an empty database to run with your application you call the Gradle task dbmUpdate.

Checking all plain SQL code

If you are only using GORM’s dynamic finders, the save()/update()/delete() methods, HQL and the criteria API, you are probably fine to run your application and perform the data migration step.

Our application has some specific parts where we use plain SQL. Because of syntactical differences you will want to check all the plain SQL to see if it works with PostgreSQL. The most obvious things to look for are queries dealing with sequences or other queries where you need the DUAL table in Oracle.

Migrating the data

This is probably the part where the most things can go wrong. We had quite some work with data inconsistencies and left-overs from manual corrections that happened over the course of running and upgrading the application for so many years. For younger and simpler applications this may not present any challenges, but for us it was quite time-consuming.

Now you can use Ora2Pg to import the data. After the whole data import has worked as intended, you should check the value of the hibernate_sequence. This sequence is used to generate the ids of all Grails domain objects.
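
Concretely, checking means: the sequence must be ahead of the highest id already imported, otherwise newly saved domain objects will collide with existing rows. A minimal sketch in SQL (the table name is only a placeholder; take the maximum over all your domain object tables):

SELECT setval('hibernate_sequence',
  (SELECT MAX(id) FROM book));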

Do not let the sequences of the autoincrement columns of your domain objects’ tables confuse you! They are not used by Grails/GORM. To avoid this confusion you can remove the default value of the id columns and the accompanying sequences, as sketched below.
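
For a single table this could look like the following sketch; the table and sequence names are hypothetical and depend on what Ora2Pg generated:

ALTER TABLE book ALTER COLUMN id DROP DEFAULT;
DROP SEQUENCE IF EXISTS book_id_seq;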

Checking the result

You should always run acceptance or manual tests to make sufficiently sure that the migration worked as intended. There is always the possibility of a configuration or software error or some oversights in checking the application code.

If possible, test the result on a dedicated system with a snapshot of the real-world data before making the switch on the production system. Good luck!