Preserving Datatypes When Reusing Views in Oracle

Views are often used as building blocks for other database objects in SQL databases like Oracle. You might start by prototyping logic in a view, then later build a reporting view on top of it, or eventually turn the data into a physical table for performance or snapshot reasons.

When you create a table from a view or build a new view on top of an existing one, the same surprise often appears: the column data types are not what you expected. You run something simple like:

CREATE TABLE new_table AS
SELECT * FROM my_view;

… and later discover that some columns have changed. A NUMBER column may have lost its precision or scale, character columns may be longer or shorter than expected, or calculated columns may end up with odd default datatypes.

This post explains how Oracle decides datatypes when you create tables or views from a view, what usually goes wrong, and how to avoid these problems by being explicit.

Why data types “change” when you create a table from a view

When you use CTAS (Create Table As Select), Oracle does not copy column definitions from the source view. Instead, it looks at each expression in the SELECT list and decides the datatype based on Oracle’s SQL expression rules. In other words, Oracle uses what the query returns, not what you intended.

Problems usually appear when a view contains any of the following:

  • expressions such as CASE, string concatenation, or arithmetic operations
  • functions like NVL, COALESCE, TO_CHAR, TRUNC, or ROUND
  • implicit datatype conversions between numbers, strings, and dates
  • constants such as NULL, 0, or 'X'
  • set operations like UNION and UNION ALL where the branches use different types

Oracle’s rules are consistent, but the resulting datatypes do not always match the schema you had in mind.
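
To see which datatypes Oracle actually picked, you can create the table and then look it up in the data dictionary, for example:

SELECT column_name, data_type, data_length, data_precision, data_scale
  FROM user_tab_columns
  WHERE table_name = 'NEW_TABLE';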

If a view is little more than a wrapper around base table columns, CTAS usually works fine. It is generally safe when the view selects columns directly, avoids expressions and implicit conversions, and does not use UNION or UNION ALL with mismatched types.

Using CAST to control datatypes

The most reliable way to force a specific datatype is to CAST each expression to the type you want.

For example, if your view calculates values and you want a stable and predictable table schema, you can do this:

CREATE TABLE sales_snapshot AS
SELECT
CAST(order_id AS NUMBER(10)) AS order_id,
CAST(customer_name AS VARCHAR2(100)) AS customer_name,
CAST(order_date AS DATE) AS order_date,
CAST(amount AS NUMBER(12,2)) AS amount,
CAST(status AS VARCHAR2(20)) AS status
FROM sales_v;

By casting the columns yourself, you remove any guesswork and prevent Oracle from choosing a datatype you did not intend.

When you should always cast

Some expressions are especially likely to cause datatype issues and should almost always be cast explicitly:

NULL columns in a view
SELECT NULL AS some_col ... becomes an untyped null. CTAS can’t infer a useful type unless you cast: CAST(NULL AS VARCHAR2(30)) AS some_col

NVL / COALESCE
These can promote a column to a different type depending on arguments: CAST(COALESCE(num_col, 0) AS NUMBER(10,2)) AS num_col

CASE expressions
All branches should be the same type, or Oracle will pick a “common type” that may surprise you: CAST( CASE WHEN flag = 'Y' THEN amount ELSE 0 END AS NUMBER(12,2) ) AS amount

Concatenation (||)
This always yields a character type; explicitly size it: CAST(first_name || ' ' || last_name AS VARCHAR2(500)) AS full_name

Date formatting
If you convert dates to strings in the view, CTAS will store strings. If you want DATE, don’t TO_CHAR in the view – or cast back (better: avoid the conversion).

What CTAS still does not copy

Even when the column datatypes are correct, CREATE TABLE AS SELECT does not copy everything. Primary keys, foreign keys, check constraints, indexes, triggers, grants, column comments, and default values are not included and must be recreated manually.
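
If the snapshot table is meant to stay around, these pieces have to be added back by hand. A minimal sketch for the sales_snapshot table from above (the constraint, index, and grantee names are just examples):

ALTER TABLE sales_snapshot
  ADD CONSTRAINT sales_snapshot_pk PRIMARY KEY (order_id);

CREATE INDEX sales_snapshot_status_idx ON sales_snapshot (status);

COMMENT ON COLUMN sales_snapshot.amount IS 'Copied from sales_v';

GRANT SELECT ON sales_snapshot TO reporting_user;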

Avoiding datatype drift

Datatype problems in Oracle do not only happen when creating tables from views. They often start earlier, when one view is built on top of another and Oracle silently infers a slightly different datatype. That inferred datatype then carries forward into every downstream view or table.

By casting derived columns early and treating views as real schema objects rather than throwaway queries, you can prevent datatype drift and make sure that both views and tables behave exactly the way you expect.
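
As a sketch of what casting early can look like, assuming a hypothetical intermediate view that derives a total from order items:

CREATE OR REPLACE VIEW order_totals_v AS
SELECT
  order_id,
  -- pin the type at the source so every downstream view or table inherits it
  CAST(SUM(amount) AS NUMBER(12,2)) AS total_amount
FROM order_items
GROUP BY order_id;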

Nested N+1-like queries in practice: an 840-fold performance gain

Every once in a while, I remember a public software engineering talk given by Daniel and attended by me (among others), at a time before I accidentally started working for the Softwareschneiderei for several years, which turned out splendidly – but I digress. While the topic of that talk was (clearly) a culinary excursion about… how onions are superior to spaghetti… one side discussion spawned this quote (originally attributed to Michael Jackson, e.g. here):

You know the First Rule of Optimization?

… don’t.

You know the Second Rule?

… don’t do it YET.

I like that a lot, because while we’ve all been in places where we knew a certain algorithm to be theoretically sub-optimal, “optimal” is also measured by “will it make my customer happy in a given time” and “will our programming resources be spent on the most pressing questions at hand”.

And that is especially tricky in software where a customer starts with a wish like “I want a certain view on my data that I never had before” (thinking of spreadsheets, anyone?), and because they do not know what they will learn from it, no specification can even be thought out in advance. Call it “research” or “just horsing around”, but they will be thankful if you have robust software that does the job, and customers can be very forgiving about speed issues, especially when these accumulate slowly over time.

Now we all know what N+1 query problems are (see e.g. Frederik’s post from some weeks ago), and even without databases, we are generally wary of any nested loops. But given the circumstances, you might write specific queries where you allow yourself some.

It is not only a matter of “I have no time”. Sometimes you can produce much more readable code with nested queries. It can make sense. I mean, LINQ and the like have a telling appearance, e.g. one can read

var plansForOffer = db.PlansFor(week).Where(p => p.Offers.Contains(offer.Id));
if (plansForOffer.Any()) { 
  lastDay = plansForOffer.Max(p => p.ProposedDay);
}

surely quite well; but here alone, the .Any() and the .Max() loop over similar data needlessly, the helper PlansFor(…) probably does something similar internally – and once you run that in a loop over many “offers”, your performance quickly goes down the drain, only because your customers now have 40 instead of 20 entities.

To cut a long story short – with .NET and Entity Framework in place, and in a situation of now-the-customer-finds-it-odd-that-some-queries-take-seven-minutes-to-complete, I did some profiling. In software with few users and ideally one instance running on one dedicated machine, memory is not the issue. So I contrasted the readable, “agile” version with some queries at startup and converged on the following pattern:

var allOffers = await db.Offers
    .Where(o => ... whatever interests you ... )
    .Include(o => ... whatever EF relations you need ...)
    .ToListAsync();
var allOfferIds = allOffers
    .Select(o => o.Id)
    .ToArray();
var allPlans = await db.PlanningUnits
    // keep only plans that belong to one of the offers loaded above
    .Where(p => p.Offers.Any(id => allOfferIds.Contains(id)))
    .AsNoTracking()
    .ToListAsync();
var allPlanIds = allPlans
    .Select(p => p.Id)
    .ToArray();
var allAppointments = await db.Appointments
    .Where(a => allPlanIds.Contains(a.PlanId))
    .AsNoTracking()
    .ToListAsync();
var requiredSupervisors = allAppointments
    .Select(a => a.Supervisor)
    .Distinct()
    .ToArray();
var requiredRooms = await db.Rooms
    .Where(r => requiredSupervisors.Contains(r.SupervisorId))
    .AsNoTracking()
    .ToDictionaryAsync(r => r.Id);

// ...

return allOffers
    .Select(o => CollectEverythingFromMemory(o, week, allOffers, allPlans, allAppointments, ...))
    .ToDictionary(o => o.Id);
    

So the patterns are

  • Collect every data pool that you will possibly query as completely as you need
  • Load it into memory with .ToList(), and where you can – e.g. in a server application – use await … .ToListAsync() in order to ease the load on the request thread pool
  • For ID lists used in subsequent .Contains() calls, a plain .ToArray() is enough, as arrays come with less overhead – although this is less important here.
  • Use .ToDictionaryAsync() for constant-time lookups (although more modern .NET/EF versions might have advanced functions there, this is the more basic fallback that worked for me)
  • Also, note the .AsNoTracking() where you are only reading from the database (not mutating any of this data), so EF can skip the change-tracking overhead.

With that pattern, all inner queries turn into quite efficient in-memory filtering along the lines of

var lastDay = allPlans
    .Where(p => p.Week == week
        && p.Offers.Contains(offer.Id)
    )
    .Max(p => p.ProposedDay);

and while this greatly blew up function signatures, and also required some duplication because these inner functions are not as modular and stand-alone anymore, our customers now enjoy a reduction in query time of really

  • Before: ~ 7 Minutes
  • After: ~ half a second

i.e. an 840-fold increase in performance.

Conclusion: not regretting much

It is somewhat of a dilemma. The point of “optimization…? don’t!” in its formulation as First and Second Rule holds true. I would not have written the first version of the algorithm in that consolidated .ToList(), .ToArray(), .ToDictionary() shape while still in a phase where the customer gets new ideas every few days. You will need code that is easier to change and easier to reason about.

By the way – cf. the source again – the Third Rule is “Profile Before Optimizing”. When dealing with performance, it’s always crucial to find the largest culprit first.

And then, know that there are general structures which make such queries efficient. I can apply these thoughts to other projects and probably do not have to dig deeply into the exact details of other algorithms; I just need to make the trade-off described above.

Scheduling Jobs in Oracle Database for Reliable Background Tasks

As a software developer, you often write code that must run reliably in the background: data cleanup, periodic recalculations, imports, exports, and batch processing. Pushing this responsibility to external scripts or application-level schedulers is one way to do it, but it can add complexity and increase failure points.

Oracle’s job scheduling feature lets you move this logic into the database, close to the data it operates on, using plain SQL and PL/SQL. The result is simpler code, fewer moving parts, and background tasks that run predictably without constant supervision.

The Scheduler

This feature is provided by the DBMS_SCHEDULER package.

A basic example is a daily cleanup task. Suppose you have a table called ORDERS and you want to remove records older than five years every night. First, you create a stored procedure that performs the cleanup.

CREATE OR REPLACE PROCEDURE cleanup_old_orders IS
BEGIN
  DELETE FROM orders
  WHERE order_date < ADD_MONTHS(SYSDATE, -60);
  COMMIT;
END;
/

Next, you create a scheduler job that runs this procedure every day at 2 a.m.:

BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'cleanup_old_orders_job',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'CLEANUP_OLD_ORDERS',
    start_date      => TIMESTAMP '2026-01-01 02:00:00',
    repeat_interval => 'FREQ=DAILY;BYHOUR=2;BYMINUTE=0;BYSECOND=0',
    enabled         => TRUE
  );
END;
/

Once enabled, Oracle runs this job automatically every night.

Another common example is running a job at short, repeating intervals. For instance, you may want to refresh a summary table every 10 minutes:

BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'refresh_sales_summary_job',
    job_type        => 'PLSQL_BLOCK',
    job_action      => '
    BEGIN
      refresh_sales_summary;
    END;',
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=10',
    enabled         => TRUE
  );
END;
/

The scheduler can also be used to run jobs only once. For example, you might need to perform a one-time data fix during a maintenance window:

BEGIN
  DBMS_SCHEDULER.create_job (
    job_name   => 'one_time_data_fix_job',
    job_type   => 'PLSQL_BLOCK',
    job_action => '
     BEGIN
       update_customer_status;
     END;',
    start_date => TIMESTAMP '2026-03-15 23:00:00',
    enabled    => TRUE
  );
END;
/

Required privileges

In addition to defining jobs, it is important to understand the required privileges. To create and manage scheduler jobs, a user needs the CREATE JOB privilege. To create jobs in another schema, CREATE ANY JOB is required. Running jobs that use DBMS_SCHEDULER also requires EXECUTE privilege on the DBMS_SCHEDULER package.

If a job runs a stored procedure, the job owner must have direct privileges on the objects used by that procedure, not privileges granted through roles. For example, if a job deletes rows from the ORDERS table, the job owner must have a direct DELETE grant on that table. For external jobs, additional privileges such as CREATE EXTERNAL JOB and specific operating system credentials are required.
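
As a sketch, the grants for a hypothetical job owner named app_user could look like this:

GRANT CREATE JOB TO app_user;
GRANT EXECUTE ON DBMS_SCHEDULER TO app_user;

-- direct grant (not via a role) on the table the job's procedure modifies
GRANT DELETE ON orders TO app_user;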

These rules ensure that jobs run securely and only perform actions explicitly allowed by the database administrator.

Monitoring

For monitoring, Oracle DB stores job execution details in system views. You can check whether a job is enabled and when it last ran with a simple query:

SELECT job_name, enabled, last_start_date, last_run_duration
  FROM  dba_scheduler_jobs
  WHERE job_name = 'CLEANUP_OLD_ORDERS_JOB';

To see errors and execution history, you can query the job run details:

SELECT job_name, status, actual_start_date, run_duration, error#
  FROM  dba_scheduler_job_run_details
  WHERE job_name = 'CLEANUP_OLD_ORDERS_JOB'
  ORDER BY actual_start_date DESC;

These examples show how Oracle job scheduling works in practice. You define the SQL or PL/SQL logic, attach it to a schedule, and let the database handle execution and logging. This approach keeps automation close to the data, reduces manual intervention, and makes recurring tasks easier to manage and troubleshoot.

Schemas, naming and search path in PostgreSQL

Modern (SQL) databases provide a multi-level hierarchy of separated contexts.

A database management system (DBMS) can manage multiple databases.

A database can contain multiple schemas.

A schema usually contains multiple tables, views, sequences and other objects.

Most of the time we developers only care about our database and the objects it contains – ignoring schemas and the DBMS.

In PostgreSQL instances this means we are using the default schema – usually called public – without mentioning it anywhere. Neither in the connection string nor in any queries.

Ok, and why does all of the above even matter?

Special situations

Sometimes customers or operators require us to use special schemas in certain databases. When we do not have full control over our database deployment and usage we have to adapt to the situation we encounter.

Camel-casing

One thing I really advise against is using upper-case or mixed-case names for tables, columns and so on. SQL keywords and unquoted identifiers are not case sensitive, but many DBMSes preserve the case of quoted identifiers. So they require you to use double-quotes to reference mixed-case objects like schemas and tables, e.g. "MyImportantTable". Imho it is far better to use only lower case letters and snake_case for all names.
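
A small PostgreSQL example of the difference (the table name is made up):

-- unquoted identifiers are folded to lower case
CREATE TABLE MyImportantTable (id int);
SELECT * FROM myimportanttable;     -- works, the table is really called myimportanttable

-- quoted identifiers keep their case and must be quoted from then on
CREATE TABLE "MyImportantTable" (id int);
SELECT * FROM "MyImportantTable";   -- without the quotes you would hit the lower-case table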

Multiple schemas in one database

I do not endorse using multiple named schemas in one database. Most of the time you do not need this level of separation and can simply use multiple databases in the same DBMS with their default schemas. That way you do not have to specify the schema in your queries.

What if you are required to use a non-default schema? At least for PostgreSQL you can modify your login role to change the search path. So you can leave all your queries and maybe other deployments untouched and without some schema name scattered all around your project:

-- instead of specifying schema name like
select * from "MyTargetSchema".my_table;

-- you can alter the search path of your role
alter role my_login set search_path = public,"MyTargetSchema";

-- to only write
select * from my_table;
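
The changed search path applies to new sessions of that role, so after reconnecting you can verify it with:

SHOW search_path;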

Wrapping it up

Today’s software systems are quite complex and the problems we are trying to solve with them have their own, ever-growing complexity. So when using DBMSes, follow common advice like the above to reduce complexity in your solution, even when some of that complexity seems to be mandated from the outside.

Sometimes there are easy to follow conventions or configuration options that can reduce your burden as a developer even if you do not have complete control over the relevant parts of the deployment environment.

Common SQL Performance Gotchas in Application Development

When building apps that use a SQL database, it’s easy to run into performance problems without noticing. Many of these issues come from the way queries are written and used in the code. Below are seven common SQL mistakes developers make, why they happen, and how you can avoid them.

Not Using Prepared Statements

One of the most common mistakes is building SQL queries by concatenating strings. This approach not only introduces the risk of SQL injection but also prevents the database from reusing execution plans. Prepared statements or parameterized queries let the database understand the structure of the query ahead of time, which improves performance and security. They also help avoid subtle bugs caused by incorrect string formatting or escaping.

// Vulnerable and inefficient
String userId = "42";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM users WHERE id = " + userId);

// Safe and performant
String sql = "SELECT * FROM users WHERE id = ?";
PreparedStatement ps = connection.prepareStatement(sql);
ps.setInt(1, 42);
ResultSet rs = ps.executeQuery();

The N+1 Query Problem

The N+1 problem happens when an application fetches a list of items and then runs a separate query for each item to retrieve related data. For example, fetching a list of users and then querying each user’s posts in a loop. This results in one query to fetch the users and N additional queries for their posts. The fix is to restructure the query using joins or batch-fetching strategies, so all the data can be retrieved in fewer queries.
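
As a sketch of the joined variant, assuming hypothetical users and posts tables:

-- one query instead of 1 + N separate ones
SELECT u.id, u.name, p.id AS post_id, p.title
FROM users u
LEFT JOIN posts p ON p.user_id = u.id;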

We have written about it on our blog before: Understanding, identifying and fixing the N+1 query problem

Missing Indexes

When queries filter or join on columns that do not have indexes, the database may need to scan entire tables to find matching rows. This can be very slow, especially as data grows. Adding the right indexes can drastically improve performance. It’s important to monitor slow queries and check whether indexes exist on the columns used in WHERE clauses, JOINs, and ORDER BY clauses.

Here’s how to create an index on an “orders” table for its “customer_id” column:

CREATE INDEX idx_orders_customer_id ON orders(customer_id);

Once the index is added, the query can efficiently find matching rows without scanning the full table.
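
You can confirm that the index is actually picked up by looking at the execution plan, e.g. in PostgreSQL:

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- the plan should now show an index scan using idx_orders_customer_id instead of a sequential scan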

Retrieving Too Much Data

Using SELECT * to fetch all columns from a table is a common habit, but it often retrieves more data than the application needs. This can increase network load and memory usage. Similarly, not using pagination when retrieving large result sets can lead to long query times and a poor user experience. Always select only the necessary columns and use LIMIT or OFFSET clauses to manage result size.

For example:

String sql = "SELECT id, name, price FROM products LIMIT ? OFFSET ?";
PreparedStatement ps = connection.prepareStatement(sql);
ps.setInt(1, 50);
ps.setInt(2, 0);
ResultSet rs = ps.executeQuery();

Chatty Database Interactions

Some applications make many small queries in a single request cycle, creating high overhead from repeated database access. Each round-trip to the database introduces latency. Here’s an inefficient example:

for (int id : productIds) {
    PreparedStatement ps = connection.prepareStatement(
        "UPDATE products SET price = price * 1.1 WHERE id = ?"
    );
    ps.setInt(1, id);
    ps.executeUpdate();
}

Instead of issuing separate queries, it’s often better to combine them or use batch operations where possible. This reduces the number of database interactions and improves overall throughput:

PreparedStatement ps = connection.prepareStatement(
    "UPDATE products SET price = price * 1.1 WHERE id = ?"
);

for (int id : productIds) {
    ps.setInt(1, id);
    ps.addBatch();
}
ps.executeBatch();

Improper Connection Pooling

Establishing a new connection to the database for every query or request is slow and resource-intensive. Connection pooling allows applications to reuse database connections, avoiding the cost of repeatedly opening and closing them. Applications that do not use pooling efficiently may suffer from connection exhaustion or high latency under load. To avoid this, use a connection pooler and configure it with appropriate limits for the workload.

Unbounded Wildcard Searches

Using wildcard searches with patterns like '%term%' in a WHERE clause causes the database to scan the entire table, because indexes cannot be used effectively. These searches are expensive and scale poorly. To handle partial matches more efficiently, consider using full-text search features provided by the database, which are designed for fast text searching. Here’s an example in PostgreSQL:

SELECT * FROM articles
WHERE to_tsvector('english', title) @@ to_tsquery('database');

One of our previous blog posts dives deeper into this topic: Full-text Search with PostgreSQL.

By being mindful of these common pitfalls, you can write SQL that scales well and performs reliably under load. Good database performance isn’t just about writing correct queries – it’s about writing efficient ones.

Have you faced any of these problems before? Every project is different, and we all learn a lot from the challenges we run into. Feel free to share your experiences or tips in the comments. Your story could help someone else improve their app’s performance too.

Using GENERATED AS IDENTITY Instead of SERIAL in PostgreSQL

In PostgreSQL, the SERIAL keyword is commonly used to create auto-incrementing primary keys. While it remains supported and functional, newer versions of PostgreSQL (version 10 and later) offer a more standardized and flexible alternative: the GENERATED … AS IDENTITY syntax.

Limitations of SERIAL

When you define a column as SERIAL, PostgreSQL automatically creates and links a sequence to that column behind the scenes. But this linkage is not explicitly part of the table definition. This can complicate schema management and make the behavior of the column less transparent.

The SERIAL keyword is also not part of the official SQL standard, which may be a concern in environments where cross-database compatibility is important. Additionally, the column remains writable, meaning it’s possible to insert values manually, potentially leading to inconsistencies or conflicts.

Identity Columns

The GENERATED … AS IDENTITY syntax addresses these concerns by making the auto-increment behavior explicit and standards-compliant. An identity column is defined as follows:

CREATE TABLE users (
  id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  username TEXT NOT NULL
);

This syntax makes it clear that the column is managed by the system. PostgreSQL offers two modes for identity columns:

GENERATED ALWAYS: PostgreSQL always generates a value. Manual insertion requires an override.

GENERATED BY DEFAULT: The application can supply a value, or PostgreSQL will use the next sequence value automatically.

To insert a value manually into an ALWAYS identity column, you must use the OVERRIDING SYSTEM VALUE clause:

INSERT INTO users (id, username)
  OVERRIDING SYSTEM VALUE
  VALUES (999, 'admin');
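
For comparison, a GENERATED BY DEFAULT column accepts both variants without an override clause (the audit_log table here is just an example):

CREATE TABLE audit_log (
  id      BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  message TEXT NOT NULL
);

-- the application may supply an id ...
INSERT INTO audit_log (id, message) VALUES (42, 'migrated entry');

-- ... or let PostgreSQL take the next sequence value
INSERT INTO audit_log (message) VALUES ('new entry');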

Managing Sequences

Since identity columns integrate the sequence into the column definition, managing them is more straightforward. For example, to reset the sequence:

ALTER TABLE users ALTER COLUMN id RESTART WITH 1000;

The sequence is tied to the column, making it easier to inspect, back up, and restore using tools like pg_dump. This helps avoid issues that can arise with the implicit sequences used by SERIAL.
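
If you want to check how a column is defined, the information schema exposes the identity properties directly:

SELECT column_name, is_identity, identity_generation
  FROM information_schema.columns
  WHERE table_name = 'users';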

Conclusion

The GENERATED AS IDENTITY syntax offers clearer semantics, better standards compliance, and more predictable behavior than SERIAL. For new database designs, it is generally the preferred choice. While SERIAL continues to be supported, identity columns provide more transparency and control, especially in environments where portability and schema clarity are important.

Oracle and the materialized view update

Materialized views are powerful. They give us precomputed, queryable snapshots of expensive joins and aggregations. But the moment you start layering other views on top of them, you enter tricky territory.

The Scenario

You define a materialized view to speed up a reporting query. Soon after, others discover it and start building new views on top of it. The structure spreads.

Now imagine: you need to extend the base materialized view. Maybe add a column, or adjust its definition. That’s when the trouble starts.

The Problem

Unlike regular views, materialized views don’t offer a convenient CREATE OR REPLACE. You can’t just adjust the definition in place. Oracle also doesn’t allow a simple ALTER to add a column or tweak the structure—recreating the materialized views is often the only option.

Things get even more complicated when other views depend on your materialized view. In that case, Oracle won’t even let you drop it. Instead, you’re greeted with an error about dependent objects, leaving you stuck in a dependency lock-in.

The more dependencies there are, the more brittle the setup becomes. What started as a performance optimization can lock you into a rigid structure that resists change.

As a short example, let’s look at how other databases handle this scenario. In Postgres, you can drop a materialized view even if other views depend on it. The dependent views temporarily lose their base and will fail if queried, but you won’t get an error on the drop. Once you recreate the materialized view with the same name and structure, the dependent views automatically start working again.

What to Do?

That is the hard question. Sometimes you can try to hide materialized views behind stable views. Or you take the SQL of all dependent views, drop them, change the materialized view, and then recreate all dependent views – a process that can be a huge pain.
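
For the drop-and-recreate route, the data dictionary at least helps you collect what you are about to break. A minimal sketch, assuming the materialized view is called SALES_MV and one dependent view is called SALES_REPORT_V:

-- which objects depend on the materialized view?
SELECT name, type
  FROM user_dependencies
  WHERE referenced_name = 'SALES_MV';

-- save the definition of each dependent view before dropping anything
SELECT DBMS_METADATA.GET_DDL('VIEW', 'SALES_REPORT_V') FROM dual;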

How do you manage changes to materialized views that already have dependent views stacked on top? Do you design around it, fight with rebuild scripts every time, or have another solution?

Tuning Without Dropping: Oracle’s Invisible Indexes

When tuning database performance, removing unused indexes can help reduce write overhead and improve efficiency. But dropping an index outright is risky, especially in production systems, because it’s hard to know for sure whether it’s still needed. Oracle Database offers a practical solution: invisible indexes. This feature allows you to hide an index from the optimizer without deleting it, giving you a safe way to test and fine-tune your indexing strategy.

An invisible index behaves like a regular index in most respects. It is maintained during inserts, updates, and deletes. It consumes storage and has a presence in the data dictionary. However, it is ignored by the optimizer when generating execution plans for SQL statements unless explicitly instructed to consider it.

Creating an Invisible Index

To define an index as invisible at creation time, the INVISIBLE keyword is used:

CREATE INDEX idx_salary ON employees(salary) INVISIBLE;

In this example, the index on the salary column will be maintained, but Oracle’s optimizer will not consider it when evaluating execution plans.

Making an Existing Index Invisible

You can also alter an existing index to become invisible:

ALTER INDEX idx_salary INVISIBLE;

To revert the change and make the index visible again:

ALTER INDEX idx_salary VISIBLE;

This change is instantaneous and does not require the index to be rebuilt. It is also fully reversible.

Verifying Index Visibility

To check the visibility status of an index, query the DBA_INDEXES or USER_INDEXES data dictionary view:

SELECT index_name, visibility
  FROM user_indexes
  WHERE table_name = 'EMPLOYEES';

This will show whether each index is VISIBLE or INVISIBLE.

Forcing the Optimizer to Use Invisible Indexes

By default, the optimizer does not use invisible indexes. However, you can enable their use in a specific session by setting the following parameter:

ALTER SESSION SET OPTIMIZER_USE_INVISIBLE_INDEXES = TRUE;

With this setting in place, the optimizer will consider invisible indexes as if they were visible. This is particularly useful when testing query performance with and without a specific index.
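
One way to test this is to compare the execution plans with the parameter switched off and on, using the employees example from above:

EXPLAIN PLAN FOR SELECT * FROM employees WHERE salary > 100000;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

ALTER SESSION SET OPTIMIZER_USE_INVISIBLE_INDEXES = TRUE;

EXPLAIN PLAN FOR SELECT * FROM employees WHERE salary > 100000;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);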

Alternatively, you can use a SQL hint to explicitly direct the optimizer to use a specific index, even if it is invisible:

SELECT /*+ INDEX(employees idx_salary) */ *
  FROM employees
  WHERE salary > 100000;

This gives fine-grained control over execution plans without changing global settings or making the index permanently visible.

Use Cases for Invisible Indexes

Invisible indexes are helpful in scenarios where performance needs to be tested under different indexing strategies. For example, if you suspect that an index is unused or causing performance issues, you can make it invisible and observe how queries behave without it. This avoids the risk of dropping an index that might still be needed.

Invisible indexes also provide a safe way to prepare for index removal in production systems. If no queries rely on the index while it is invisible, it is likely safe to drop it later.

They can also be used to temporarily take an index out of the optimizer’s consideration, for example while experimenting around bulk data loads, without affecting the application logic that relies on the schema. Keep in mind, though, that an invisible index is still maintained during DML; to actually skip index maintenance during a bulk load, the index would have to be marked UNUSABLE instead.

PostgreSQL’s Foreign Data Wrappers

When people think of SQL and PostgreSQL, they usually picture databases full of rows and columns – clean, organized, and stored on a persistent volume like a disk. But what if your data isn’t in a database at all? What if it lives on the web, behind a REST API, in JSON format?

Usually, you’d write a script. You’d use Python or JavaScript, send a request to the API, parse the JSON, and maybe insert the results into your database for analysis. But PostgreSQL has a shortcut that many people don’t know about: Foreign Data Wrappers.

Just like FUSE (Filesystem in Userspace) lets you mount cloud drives or remote folders as if they were local, Foreign Data Wrappers let PostgreSQL mount external resources like other database management systems, files, or web APIs and treat them like SQL tables.
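
As a small aside before the web API example, the same mechanism also covers plain files. A minimal sketch with the file_fdw contrib extension that ships with PostgreSQL (file path and columns are made up):

CREATE EXTENSION file_fdw;

CREATE SERVER csv_files FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE measurements (
  taken_at timestamptz,
  value    numeric
)
SERVER csv_files
OPTIONS (filename '/var/data/measurements.csv', format 'csv', header 'true');

SELECT avg(value) FROM measurements;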

Example

In this article we’ll use http_fdw as an example. With it, you can run a SQL query against a URL and read the result – no extra code, no data pipeline, just SQL. Here’s how you could set that up.

Let’s say you want to explore data from JSONPlaceholder, a free fake API for testing. It has a /posts endpoint that returns a list of blog posts in JSON format.

First, make sure the extension is installed (this may require building from source, depending on your system):

CREATE EXTENSION http_fdw;

Now create a foreign server that points to the API:

CREATE SERVER json_api_server
  FOREIGN DATA WRAPPER http_fdw
  OPTIONS (uri 'https://jsonplaceholder.typicode.com/posts', format 'json');

Then define a foreign table that maps to the shape of the JSON response:

CREATE FOREIGN TABLE api_posts (
  userId integer,
  id     integer,
  title  text,
  body   text
)
SERVER json_api_server
OPTIONS (rowpath '');

Since the API returns an array of posts, and each one is an object with userId, id, title, and body, this table matches that structure.

Now you can query it:

SELECT title FROM api_posts WHERE userId = 1;

Behind the scenes, PostgreSQL sends a GET request to the API, parses the JSON, and returns the result like any other table.

Things to Keep in Mind

Not everything is perfect. http_fdw is read-only, so you can’t use it to send data back to the API. It also relies on the API being available and responsive, and it doesn’t support authentication out of the box – you’ll need to handle that with custom headers if the API requires it. Complex or deeply nested JSON might also require some extra configuration.

But for many use cases, it’s an interesting option to work with external data. You don’t have to leave SQL. You don’t have to wire up a data pipeline. You just run a query.

What are GIN Indexes in PostgreSQL?

If you’ve worked with PostgreSQL and dealt with things like full-text search, arrays, or JSON data, you might have heard about GIN indexes. But what exactly are they, and why are they useful?

GIN stands for Generalized Inverted Index. Most indexes (like the default B-tree index) work best when there’s one clear value per row – like a number or a name. But sometimes, a single column can hold many values. Think of a column that stores a list of tags, words in a document, or key-value data in JSON. That’s where GIN comes in.

Let’s walk through a few examples to see how GIN indexes work and why they’re helpful.

Full-text search example

Suppose you have a table of articles:

CREATE TABLE articles (
  id    serial PRIMARY KEY,
  title text,
  body  text
);

You want to let users search the content of these articles. PostgreSQL has built-in support for full-text search, which works with a special data type called tsvector. To get started, you’d add a column to store this processed version of your article text:

ALTER TABLE articles ADD COLUMN tsv tsvector;
UPDATE articles SET tsv = to_tsvector('english', body);

Now, to speed up searches, you create a GIN index:

CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv);

With that in place, you can search for articles quickly:

SELECT * FROM articles WHERE tsv @@ to_tsquery('tonic & water');

This finds all articles that contain both “tonic” and “water”, and thanks to the GIN index, it’s fast – even if you have thousands of articles.

Array example

GIN is also great for columns that store arrays. Let’s say you have a table of photos, and each photo can have several tags:

CREATE TABLE photos (
  id   serial PRIMARY KEY,
  tags text[]
);

You want to find all photos tagged with “capybara”. You can create a GIN index on the tags column:

CREATE INDEX idx_photos_tags ON photos USING GIN(tags);
SELECT * FROM photos WHERE tags @> ARRAY['capybara'];

(The @> operator means “contains” or “is a superset of”.)

The index lets PostgreSQL find matching rows quickly, without scanning the entire table.

JSONB example

PostgreSQL’s jsonb type lets you store flexible key-value data. Imagine a table of users with extra info stored in a jsonb column:

CREATE TABLE users (
  id   serial PRIMARY KEY,
  data jsonb
);

One row might store {"age": 42, "city": "Karlsruhe"}. To find all users from Karlsruhe, you can use:

SELECT * FROM users WHERE data @> '{"city": "Karlsruhe"}';

And again, with a GIN index on the data column, this query becomes much faster:

CREATE INDEX idx_users_data ON users USING GIN(data);

Things to keep in mind

GIN indexes are very powerful, but they come with some tradeoffs. They’re slower to build and can make insert or update operations a bit heavier. So they’re best when you read (search) data often, but don’t write to the table constantly.

In short, GIN indexes are your friend when you’re dealing with columns that contain multiple values – like arrays, full-text data, or JSON. They let PostgreSQL break apart those values and build a fast lookup system. If your queries feel slow and you’re working with these kinds of columns, adding a GIN index might be exactly what you need.