Re-ordering table columns in an Oracle database

In an Oracle database, once a table is created, there is no obvious way to change the order of its columns. Sometimes you add a new column to an existing table and want it to be displayed in a different position by default for query results via SELECT *. Imagine you add an ID column to a table after the fact. Wouldn’t it be nice if this appeared in the first position?

Of course you can drop the whole table and create it again with the new column order. But this is cumbersome and potentially dangerous if the table is already filled with data. However, there is a trick that allows you to rearrange the columns without having to recreate the table.

The key to this is an Oracle feature that allows invisible columns. The feature itself is interesting in its own right, but it has a useful side effect that we’ll exploit. The documentation says:

When you make an invisible column visible, the column is included in the table’s column order as the last column. When you make a visible column invisible, the invisible column is not included in the column order, and the order of the visible columns in the table might be re-arranged.

So the plan is to make the appropriate columns invisible first by clever choice, and then to make them visible again in the desired order. This is how it works:

First we have a table with the following columns.

CREATE TABLE t (a NUMBER, b NUMBER, c NUMBER, e NUMBER, f NUMBER);

Later we realize that we need a column d that should be between c and f. So we add it to the table:

ALTER TABLE t ADD (d NUMBER);

This is of course added at the end:

DESC t;
Name Null? Type   
---- ----- ------ 
A          NUMBER 
B          NUMBER 
C          NUMBER 
E          NUMBER 
F          NUMBER 
D          NUMBER 

To get it in the right position, we first hide the columns e and f, and then make them visible again.

ALTER TABLE t MODIFY (e INVISIBLE, f INVISIBLE);
ALTER TABLE t MODIFY (e VISIBLE, f VISIBLE);

And voilà, we have our desired order:

DESC t;
Name Null? Type   
---- ----- ------ 
A          NUMBER 
B          NUMBER 
C          NUMBER 
D          NUMBER 
E          NUMBER 
F          NUMBER 

Note that this doesn’t change the internal, physical layout of the table on the disk. It’s just a cosmetic change.

Full-text Search with PostgreSQL

If you want to add simple text search functionality to an application backed by an SQL database one of the first things that may come to your mind is the SQL LIKE operator. The LIKE operator and its case-insensitive sibling ILIKE find substrings in text data via wildcards such as %, which matches any sequence of zero or more characters:

SELECT * FROM book WHERE title ILIKE '%dog%'.

However, this approach satisfies only very basic requirements for text search, because it only matches exact substrings. That’s why application developers often use an external search engine like Elasticsearch based on the Apache Lucene library.

With a PostgreSQL database there is another option: it comes with a built-in full-text search. A full-text search analyzes text according to the language of the text, parses it into tokens and converts them into so-called lexemes. These are strings, just like tokens, but they have been normalized so that different forms of the same word, for example “pony” and “ponies”, are made alike. Additionally, stop words are eliminated, which are words that are so common that they are useless for searching, like “a” or “the”. For this purpose the search engine uses a dictionary of the target language.

In PostgreSQL, there are two main functions to perform full-text search: they are to_tsvector and to_tsquery. The ts part in the function names stands for “text search”. The to_tsvector function breaks up the input string and creates a vector of lexemes out of it, which are then used to perform full-text search using the to_tsquery function. The two functions can be combined with the @@ (match) operator, which applies a search query to a search vector:

SELECT title
  FROM book
  WHERE to_tsvector(title) @@ to_tsquery('(cat | dog) & pony')

The query syntax of ts_query supports boolean operators like | (or), & (and), ! (not) and grouping using parentheses, but also other operators like and <-> (“followed by”) and * (prefix matching).

You can specify the target language as a parameter of to_tsvector:

# SELECT to_tsvector('english', 'Thousands of ponies were grazing on the prairie.');

'graze':5 'poni':3 'prairi':8 'thousand':1

Here’s another example in German:

# SELECT to_tsvector('german', 'Wer einen Fehler begeht, und ihn nicht korrigiert, begeht einen zweiten (Konfuzius)');

'begeht':4,9 'fehl':3 'konfuzius':12 'korrigiert':8 'wer':1 'zweit':11

PostgreSQL supports dictionaries for about 80+ languages out-of-the-box.

The examples in this article are just a small glimpse of what is possible with regards to full-text search in PostgreSQL. If you want to learn more you should consult the documentation. The key takeaway is that there is another option between simple LIKE clauses and an external search engine.

Commenting SQL database objects

Did you know that you can annotate database object like tables, views and columns with comments in many SQL database systems? By that I don’t mean comments in SQL scripts, indicated by double dashes (--), but comments attached to the objects themselves, stored in the database. These may be helpful to the database admin by providing context via a description text on what is stored in these objects.

For PostgreSQL and Oracle databases the syntax is as follows:

COMMENT ON TABLE [schema_name.]table_name IS '...';
COMMENT ON COLUMN [schema_name.]table_name.column_name IS '...';

For example:

COMMENT ON COLUMN books.author IS 'The main author''s last name';
COMMENT ON TABLE books IS 'Contains only the best books';

These comments can be viewed in database tools like SQL Developer:

Comments on columns
Comments on tables

You can also view the comments in psql:

db=# \d+ books
 Column |  Type   |          Description
--------+---------+------------------------------
id      | integer |
author  | text    | The main author''s last name
title   | text    |

And for a table:

db=# \dt+ books
                    List of relations
 Schema | Name  | Type  |     |        Description
--------+-------+-------+ ... +------------------------------
public  | books | table |     | Contains only the best books

In Oracle you can query the comments from the data dictionary views ALL_TAB_COMMENTS and ALL_COL_COMMENTS:

> SELECT * FROM all_col_comments WHERE table_name='BOOKS';
OWNER    TABLE_NAME  COLUMN_NAME  COMMENTS
--------------------------------------------------------------
LIBRARY	 BOOKS	     ID           (null)
LIBRARY	 BOOKS	     AUTHOR       The main author's last name
LIBRARY	 BOOKS	     TITLE        (null)

> SELECT * FROM all_tab_comments WHERE table_name='BOOKS';
OWNER    TABLE_NAME  TABLE_TYPE  COMMENTS
--------------------------------------------------------------
LIBRARY	 BOOKS	     TABLE       Contains only the best books

In Oracle comments are limited to tables, views, materialized views, columns, operators and indextypes, but in PostgreSQL you can attach comments to nearly everything. Another good use case for this are documentation comments on database functions:

COMMENT ON FUNCTION my_function IS $$
This function does something important.

Parameters:
...
Example usage:
...
$$;

Note: the $$ delimits multi-line strings (called dollar quoted string constants).

Generating Rows in Oracle Database

Sometimes you want to automatically populate a database table with a number rows. Maybe you need a big table with lots of entries for a performance experiment or some dummy data for development. Unfortunately, there’s no standard SQL statement to achieve this task. There are different possibilities for the various database management systems. For the Oracle database (10g or later) I will show you the simplest one I have encountered so far. It actually “abuses” an unrelated functionality: the CONNECT BY clause for hierarchical queries in combination with the DUAL table.

Here’s how it can be used:

SELECT ROWNUM id
FROM dual
CONNECT BY LEVEL <= 1000;

This select creates a result set with the numbers from 1 to 1000. You can combine it with INSERT to populate the following table with rows:

CREATE TABLE example (
  id   NUMBER(5,0),
  name VARCHAR2(200)
);

INSERT INTO example (id, name)
SELECT ROWNUM, 'Name '||ROWNUM
FROM dual
CONNECT BY LEVEL <= 10;

The resulting table is:

ID  NAME
1   Name 1
2   Name 2
3   Name 3
...
10  Name 10

Of course, you can use the incrementing ROWNUM in more creative ways. The following example populates a table for time series data with a million values forming a sinus curve with equidistant timestamps (in this case 15 minute intervals) starting with a specified time:

CREATE TABLE example (
  id    NUMBER(5,0),
  time  TIMESTAMP,
  value NUMBER
);

INSERT INTO example (id, time, value)
SELECT
  ROWNUM,
  TIMESTAMP'2020-05-01 12:00:00'
     + (ROWNUM-1)*(INTERVAL '15' MINUTE),
  SIN(ROWNUM/10)
FROM dual
CONNECT BY LEVEL <= 1000000;
ID  TIME              VALUE
1   2020-05-01 12:00  0.099833
2   2020-05-01 12:15  0.198669
3   2020-05-01 12:30  0.295520
...

As mentioned at the beginning, there are other row generator techniques to achieve this. But this one is the simplest so far, at least for Oracle.

Database table naming conventions

Naming things well is an important part of writing maintainable software, and renaming things once their names have become established in a code base can be tedious work. This is true as well for the names of an application database schema, where a schema change usually requires a database migration script. That’s why you should take some time beforehand to set up a naming convention.

Many applications use object-relational mappers (ORM), which have a default naming convention to map class and property names to table and column names. But if you’re not using an ORM, you should set up conventions as well. Here are some tips:

  • Be consistent. For example, choose either only plural or singular for table names, e.g. “books” or “book”, and stick to it. Many sources recommend singular for table names.
  • On abbreviations: Some database systems like Oracle have a character limit for names. The limit for Oracle database table names is 30 characters, which means abbreviations are almost inevitable. If you introduce abbreviations be consistent and document them in a glossary, for example in the project Wiki.
  • Separate word boundaries with underscores and form a hierarchy like “namespace_entity_subentity”, e.g. “blog_post_author”. This way you can sort the tables by name and have them grouped by topic.
  • Avoid unnecessary type markers. A table is still a table if you don’t prefix it with “tbl_”, and adding a “_s” postfix to a column of type string doesn’t really add useful information that couldn’t be seen in the schema browser of any database tool. This is similar to Hungarian notation, which has fallen out of use in today’s software development. If you still want to mark special database objects, for example materialised views, then you should prefer a postfix, e.g. “_mv”, over a prefix, because a prefix would mess up the lexicographic hierarchy established by the previous tip.

And the final advice: Document your conventions so that other team members are aware of them and make them mandatory.

Organize complex SQL queries with Common Table Expressions

Complex SQL database queries often contain subqueries.

SELECT * FROM ... 
   WHERE name IN (SELECT name 
         FROM ... 
         WHERE ...)

These can quickly become unreadable, especially if multiple subqueries are involved. A nice way to organise such queries with multiple subqueries is called Common Table Expressions (CTE), or colloquially: “WITH queries”. They are supported by most contemporary SQL databases.

When using Common Table Expressions you list all the subqueries after a WITH keyword in front of the actual query and assign a name to each subquery:

WITH
  <subquery1_name> AS (SELECT ...),
  <subquery2_name> AS (SELECT ...),
  [ ... ]
SELECT ...
FROM ...
[ ... ]

You can now refer to these subqueries in the main query by their names. They are conceptually just like temporary views. You can also refer to a subquery in the main query multiple times, whereas if the subquery was inlined you would have to repeat it. A subquery defined as a Common Table Expression can also refer to the preceding subqueries in the subquery list.

Recursion

A Common Table Expression can even refer to itself, which is a recursive definition. In some database systems you have to add the RECURSIVE keyword after WITH, in others you can leave it out. Here’s a recursive CTE that calculates factorial numbers:

WITH RECURSIVE factorials (n, fact) AS 
  (SELECT 0, 1
   UNION ALL
   SELECT n+1, (n+1)*fact FROM factorials
          WHERE n < 6)
SELECT * FROM factorials;
N FACT
0    1
1    1
2    2
3    6
4   24
5  120

You could also use such a recursive query to descend into a tree hierarchy.

While the use cases for recursive queries are less frequent, I find the general concept of Common Table Expressions very useful to make complex queries more readable.

Determining the sizes of Oracle database tables and indexes

For one of our projects we store large amounts of timeseries data in an Oracle database. Sometimes we want to get an overview of how big the tables and related indexes are. Some database client tools like Toad for Oracle can show this information directly in their user interface, but if you use other tools like the SQuirreL SQL Client or JetBrains DataGrip you have to gather this information yourself via SQL queries.

DBA_SEGMENTS and DBA_INDEXES

For Oracle databases this meta information is available via the DBA_SEGMENTS and DBA_INDEXES tables. To query the sizes of several tables in MB use the following query:

SELECT segment_name, segment_type, bytes/1024/1024 MB
  FROM dba_segments
  WHERE segment_type = 'TABLE'
    AND segment_name IN ('TABLE_NAME_1', 'TABLE_NAME_2');

This SQL query returns the sizes of TABLE_NAME_1 and TABLE_NAME_2.

If you want to see the sizes of all the indexes that are associated with a table or a set of tables you can use the following query:

SELECT idx.table_name, idx.index_name, SUM(bytes)/1024/1024 MB
  FROM dba_segments seg,
       dba_indexes idx
  WHERE idx.table_owner = 'SCHEMA_NAME'
    AND idx.table_name IN ('TABLE_NAME_1', 'TABLE_NAME_2')
    AND idx.owner       = seg.owner
    AND idx.index_name  = seg.segment_name
  GROUP BY idx.index_name, idx.table_name;

Of course, you have to replace SCHEMA_NAME, and TABLE_NAME_x with the names in your database.

Unfortunately, access to this kind of meta information is different for each database system, and the queries above only work for Oracle databases.

Selecting all columns of a database table with an SQL GROUP BY expression

Suppose we have an SQL database table named “temperatures” with the following contents:

LOCATION  TIME        CELSIUS
inside    2018-08-01  24
inside    2018-08-02  28
inside    2018-08-03  21
inside    2018-08-04  28
outside   2018-08-01  29
outside   2018-08-02  31
outside   2018-08-03  25
outside   2018-08-04  30

We want to find the highest temperature for each location. We use the MAX aggregate function and a GROUP BY expression:

SELECT location, MAX(celsius) celsius
FROM temperatures
GROUP BY location;

As expected, the result is:

LOCATION  CELSIUS
outside   31
inside    28

Now suppose we also want to know when each of these extreme temperatures occured. Naively, we try the following query:

SELECT location, time, MAX(celsius) celsius
FROM temperatures
GROUP BY location;

The response is an error: “not a GROUP BY expression”. In a GROUP BY expression all selected columns must be either part of the GROUP BY clause or an aggregate.

To achieve what we want we can use a JOIN:

SELECT
  t.location, t.time, t.celsius
FROM
  temperatures t
JOIN (SELECT location, MAX(celsius) celsius
      FROM temperatures
      GROUP BY location) tmax
ON
  t.location=tmax.location AND t.celsius=tmax.celsius;

This query results in multiple rows per location if the maximum temperature was recorded at different times:

LOCATION  TIME        CELSIUS
outside   2018-08-02  31
inside    2018-08-04  28
inside    2018-08-02  28

If we are only interested in the first occurrence of the maximum temperature per location, we can use the following query:

SELECT
  location,
  MIN(time) KEEP (DENSE_RANK LAST ORDER BY celsius) time,
  MAX(celsius) celsius
FROM
  temperatures
GROUP BY
  location;
LOCATION  TIME        CELSIUS
inside    2018-08-02  28
outside   2018-08-02  31

Here we don’t need a JOIN anymore, because select clause for the time column is an aggregate as well.

Handling database warnings with JDBC

Database administrators have the possibility to set lifetimes for user passwords. This can be considered a security feature, so that passwords get updated regularly. But if one of your software services logs into the database with such an account, you want to know when the password expires in good time before this happens, so that you can update the password. Otherwise your service will stop working unexpectedly.

Of course, you can mark the date in your calendar in order to be reminded beforehand, and you probably should. But there is an additional measure you can take. The database administrator can not only set the lifetime of a password, but also a “grace period”. For example:

ALTER PROFILE app_user LIMIT PASSWORD_LIFE_TIME 180 PASSWORD_GRACE_TIME 14;

This SQL command sets the password life time to 180 days (roughly six months) and the grace period to 14 days (two weeks). If you log into the database with this user you will see a warning two weeks before the password will expire. For Oracle databases the warning looks like this:

ORA-28002: the password will expire within 14 days

But your service logs in automatically, without any user interaction. Is it possible to programmatically detect a warning like this? Yes, it is. For example, with JDBC the following code detects warnings after a connection was established:

// Error codes for ORA-nnnnn warnings
static final int passwordWillExpireSoon = 28002;
static final int accountWillExpireSoon = 28011;

void handleWarnings(Connection connection) throws SQLException {
    SQLWarning warning = connection.getWarnings();
    while (null != warning) {
        String message = warning.getMessage();
        log.warn(message);

        int code = warning.getErrorCode();
        if (code == passwordWillExpireSoon) {
            System.out.println("ORA-28002 warning detected");
            // handle appropriately
        }
        if (code == accountWillExpireSoon) {
            System.out.println("ORA-28011 warning detected");
            // handle appropriately
        }
        warning = warning.getNextWarning();
    }
}

Instead of just logging the warnings, you can use this code to send an email to your address, so that you will get notified about a soon-to-be-expired password in advance. The error code depends on your database system.

With this in place you should not be unpleasantly surprised by an expired password. Of course, this only works if the administrator sets a grace period, so you should agree on this approach with your administrator.

Monitoring long running operations in Oracle databases

We regularly work with database tables with hundreds of millions of entries. Some operations on these table can take a while. Not necessarily queries, but operations in preparation to make queries fast, for example the creation of materialized views or indexes.

The problem with most SQL tools is: once you run your SQL statement you have no indication of how long it will take to complete the operation. No progress bar and no display of the remaining time. Will it take minutes or hours?

Oracle databases have a nice feature I learned about recently that can answer these questions. Operations that take longer than 6 seconds to complete are considered “long operations” and get an entry in a special view called V$SESSION_LONGOPS.

This view does not only contain the currently running long operations but also the history of completed long operations. You can query the status of the current long operations like this:

SELECT * FROM V$SESSION_LONGOPS 
  WHERE time_remaining > 0;

This view contains columns like

  • TARGET (table or view on which the operation is carried out)
  • SOFAR (units of work done so far)
  • TOTALWORK (total units of work)
  • ELAPSED_SECONDS (number of elapsed seconds from the start of the operation)

Based on these values the view offers another column, which contains the estimated remaining time in seconds: TIME_REMAINING.

This remaining time is really just an estimate, because it assumes long running operations to be linear, which is not necessarily true. Also some SQL statements can spawn multiple consecutive operations, e.g. first a “Table Scan” operation and then a “Sort Output” operation, which will only become visible after the first operation has finished. Nevertheless I found this feature quite helpful to get a rough idea of how long I will have to wait or to inform decisions such as whether I really want to perform an operation until completion or if I want to cancel it.