Unveiling the secrets of invisible database columns

After my last blog post, where I wrote about Generated and Virtual Columns, I would like to dedicate this post to another type of database column: Invisible Columns. This feature exists in MySQL since version 8.0 and in Oracle Database since version 12c. PostgreSQL and MS SQL Server do not support this feature.

Invisible columns, as the name suggests, are columns within a table that are hidden from standard query results by default. Unlike traditional columns that are visible and accessible in query results, invisible columns are not included unless explicitly specified in the query.

This feature provides a level of control over data visibility, allowing developers to hide certain columns from applications or other database users while still retaining their functionality within the database.

Defining invisible columns

When creating a table in MySQL or Oracle, you can designate certain columns as invisible by using the INVISIBLE keyword in the column definition. For example:

CREATE TABLE your_table (
  visible_column   INT,
  invisible_column INT INVISIBLE
);

In this example, the invisible_column is marked as invisible, while the visible_column remains visible by default. To alter an existing table and make a column invisible:

ALTER TABLE your_table
  MODIFY COLUMN existing_column_name INVISIBLE;

Replace your_table with the name of your table and existing_column_name with the name of the column you want to make invisible.

When querying the your_table, the invisible column will not be included in the result set unless explicitly specified:

SELECT * FROM your_table;

visible_column
--------------
   4
   8
  15

By default, invisible columns are hidden from query results, providing a cleaner and more concise view of the data. However, developers can still access invisible columns when needed by explicitly including them in the query:

SELECT visible_column, invisible_column FROM your_table;

visible_column | invisible_column
---------------------------------
   4           |   16
   8           |   23
  15           |   42
Unveiling invisible columns

To list the invisible columns of a table in MySQL, you can query the information_schema.columns system table and filter the results based on the COLUMN_DEFAULT column. Invisible columns have NULL as their default value. Here’s a simple SQL query to accomplish this:

SELECT COLUMN_NAME
  FROM information_schema.columns
  WHERE TABLE_SCHEMA = 'your_database'
    AND TABLE_NAME = 'your_table'
    AND COLUMN_DEFAULT IS NULL;

In Oracle, you can query the USER_TAB_COLUMNS or ALL_TAB_COLUMNS data dictionary views to list the invisible columns of a table. Here’s how you can do it:

SELECT COLUMN_NAME
  FROM USER_TAB_COLUMNS
  WHERE TABLE_NAME = 'your_table'
    AND INVISIBLE = 'YES';

If you want to list invisible columns from all tables in the current schema, you can use the ALL_TAB_COLUMNS view instead:

SELECT TABLE_NAME, COLUMN_NAME
  FROM ALL_TAB_COLUMNS
  WHERE INVISIBLE = 'YES';
Are invisible columns actually useful?

Invisible columns can make schema evolution easier by providing a flexible mechanism for evolving database schemas over time without disrupting existing applications or queries. You can test new features or data structures without committing to them fully. Invisible columns provide a way to add experimental columns to your tables without exposing them to production environments until they are fully tested and ready for use.

They can create cleaner and more concise views of your data by hiding less relevant columns. This can make it easier for developers, analysts, and users to work with the data without unnecessary clutter. However, I would argue that this is also achievable with normal database views.

The downside of introducing invisible columns is that they add complexity to the database schema, which can make it harder to understand and maintain, especially for developers who are not familiar with the invisible columns feature. They also add potential for confusion: Developers may forget about the presence of invisible columns, leading to unexpected behavior in queries or applications.

You probably shouldn’t use them to hide sensitive data, since invisible columns don’t have any additional access control, and security through obscurity is not a good idea. If you grant SELECT permission on the table to a user, they will be able to query visible and invisible columns alike.

Now that you know about them, you can make your own choice.

Half table, half view: Generated Columns

Anyone familiar with SQL database basics knows the two fundamental structures of relational databases: tables and views. Data is stored in tables, while views are virtual tables calculated on-the-fly from a SQL query. Additionally, relational database management systems often support materialized views, which, like views, are based on a query from other tables, but their results are actually persisted and only recalculated as needed.

What many don’t know is that the most common SQL databases (PostgreSQL, MySQL, Oracle) nowadays also support something in between: we’re talking about Generated Columns, which will be introduced in this blog post.

So, what are Generated Columns? Generated Columns are columns within a normal database table. But unlike regular columns, their values are not stored as independent individual values; rather, they are computed from other values in the table.

Below is an example of how to define a Generated Column. The example is for PostgreSQL, but the syntax is similar in other popular relational database systems that support this feature.

CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
quantity INTEGER,
unit_price DECIMAL(10, 2),
total_price DECIMAL(10, 2) GENERATED ALWAYS
AS (quantity * unit_price) STORED
);

As seen above, a Generated Column is defined with the keywords GENERATED ALWAYS AS. The GENERATED ALWAYS is even optional (you could just write AS), but it clarifies what it’s about. Following AS is the expression that computes the value.

At the end of the column definition, either the keyword STORED or VIRTUAL can be used. In the example above, it says STORED, which means the computed value is physically stored on the disk. The value is recalculated and stored only after an INSERT or UPDATE. In contrast, with VIRTUAL, the value is not stored but always computed on-the-fly. Thus, virtual Generated Columns behave similarly to a view, while STORED is more comparable to a materialized view.

The choice between the two options depends on the specific requirements. Stored Generated Columns consume more disk space, while virtual Generated Columns save space at the expense of performance.

In the expression following AS, other columns of the table can be referenced. Even other Generated Columns can be referenced, as long as they are specified in the table definition before the current column. However, SQL subqueries cannot be used in the expression.

In conclusion, Generated Columns are a useful feature that combines parts of a table with the benefits of a view.

SQL Database Window Functions

Window functions allow users to perform calculations across a set of rows that are somehow related to the current row. This can include calculations like running totals, moving averages, and ranking without the need to group the entire query into one aggregate result.

Despite their flexibility, window functions are sometimes underutilised, either because users are unaware of them or because they’re considered too complex for everyday tasks. Learning how to effectively use window functions can improve the efficiency and readability of SQL queries, particularly for reporting and data analysis purposes. This article will explore several use cases.

Numbering Rows

The simplest application area for window functions is the numbering of rows. The ROW_NUMBER() function assigns a unique number to each row within the partition of a result set. The numbering is sequential and starts at 1. It’s useful for creating a unique identifier for rows within a partition, even when the rows are identical in terms of data.

Consider the following database table of library checkouts:

bookcheckout_datemember_id
The Great Adventure2024-02-15102
The Great Adventure2024-01-10105
Mystery of the Seas2024-01-20103
Mystery of the Seas2024-03-01101
Journey Through Time2024-02-01104
Journey Through Time2024-02-18102

We want to assign a unique row number to each checkout instance for every book, ordered by the checkout date to analyze the circulation trend:

SELECT
 book,
checkout_date,
member_id,
ROW_NUMBER() OVER (PARTITION BY book ORDER BY checkout_date) AS checkout_order
FROM library_checkouts;

The result:

bookcheckout_datemember_idcheckout_order
The Great Adventure2024-01-101051
The Great Adventure2024-02-151022
Mystery of the Seas2024-01-201031
Mystery of the Seas2024-03-011012
Journey Through Time2024-02-011041
Journey Through Time2024-02-181022

Ranking

In the context of SQL and specifically regarding window functions, “ranking” refers to the process of assigning a unique position or rank to each row within a partition of a result set based on a specified ordering.

The RANK() function provides a ranking for each row within a partition, with gaps in the ranking sequence when there are ties. It’s useful for ranking items that have the same value.

Consider the following database table of scores in a game tournament:

playergamescore
AliceSpace Invaders4200
BobSpace Invaders5700
CharlieSpace Invaders5700
DanaDonkey Kong6000
EveDonkey Kong4800
FrankDonkey Kong6000
AliceAsteroids8500
BobAsteroids9300
CharlieAsteroids7600

We want to rank the players within each game based on their score, with gaps in rank for ties:

SELECT
 player,
 game,
score,
RANK() OVER (PARTITION BY game ORDER BY score DESC) AS rank
FROM scores;

The result looks like this:

playergamescorerank
BobSpace Invaders57001
CharlieSpace Invaders57001
AliceSpace Invaders42003
DanaDonkey Kong60001
FrankDonkey Kong60001
EveDonkey Kong48003
BobAsteroids93001
AliceAsteroids85002
CharlieAsteroids76003

If you don’t want to have gaps in the ranking sequence when there are ties, you can substitute DENSE_RANK() for RANK().

Cumulative Sum

The SUM() function can be used as a window function to calculate the cumulative sum of a column over a partition of rows.

Example: We are tracking our garden’s vegetable harvest in a database table, and we want to calculate the cumulative yield for each type of vegetable over the harvesting season.

vegetableharvest_dateyield_kg
Carrots2024-06-1810
Carrots2024-07-1015
Tomatos2024-06-1520
Tomatos2024-07-0130
Tomatos2024-07-2025
Zucchini2024-06-2015
Zucchini2024-07-0520

We calculate the running total (cumulative yield) for each vegetable type as the season progresses, using the SUM() function:

SELECT
 vegetable,
harvest_date,
yield_kg,
SUM(yield_kg) OVER (PARTITION BY vegetable ORDER BY harvest_date ASC) AS cumulative_yield
FROM garden_harvest;

Now we can see which vegetables are most productive and how yield accumulates throughout the season:

vegetableharvest_dateyield_kgcumulative_yield
Carrots2024-06-181010
Carrots2024-07-101525
Tomatos2024-06-152020
Tomatos2024-07-013050
Tomatos2024-07-202575
Zucchini2024-06-201515
Zucchini2024-07-052035

Time travel with Oracle database’s Flashback Queries

Oracle’s database management system offers a feature known as Flashback Queries, allowing users to peek into the past and retrieve data as it existed at a previous point in time. This functionality can eliminate the need for manual data restoration from backups, making it a useful asset for both developers and database administrators.

Enabling Flashback Queries

Before using Flashback Queries, ensure that the database has the required configuration. Firstly, confirm that the database’s DB_FLASHBACK_RETENTION_TARGET parameter is appropriately set. This parameter defines the period for which historical data is retained. Adjust it based on your organization’s data retention policies. Before making changes, you can check its current value:

SHOW PARAMETER DB_FLASHBACK_RETENTION_TARGET;

Use the ALTER SYSTEM command to set the parameter. For example, to set it to retain data for 7 days:

ALTER SYSTEM SET DB_FLASHBACK_RETENTION_TARGET=604800 SCOPE=BOTH;

604800 is the retention period in seconds (7 days × 24 hours × 60 minutes × 60 seconds). Please note that setting the parameter to a higher value consumes more space in the flashback recovery area, so consider your storage constraints. SCOPE=BOTH ensures that the change persists across database restarts, i.e. it changes the value both in memory and in the server parameter file.

To enable Flashback Queries for a specific table, execute the ALTER TABLE command with the FLASHBACK option:

ALTER TABLE table_name FLASHBACK ARCHIVE;

This setup allows Oracle to maintain historical changes for the specified table.

Using Flashback Queries

Consider a scenario where an employee accidentally updates critical data in the employees table. With Flashback Queries, you can rectify the mistake:

SELECT * FROM employees AS OF TIMESTAMP TO_TIMESTAMP('2023-11-10 15:00:00', 'YYYY-MM-DD HH24:MI:SS');

This query retrieves the data from the employees table as it existed before the erroneous update.

You can also recover dropped tables if they are still within the retention period:

FLASHBACK TABLE orders TO BEFORE DROP;

This command restores the dropped table and its data.

Flashback Queries offer a mechanism to navigate through time within a database, providing a simple way to recover historical data or inspect changes. They stand as a useful asset in the arsenal of database administrators and developers, fostering greater confidence in managing data.

PostgreSQL’s auto-explain feature and execution plans

PostgreSQL’s auto-explain is a built-in feature that automatically generates and logs execution plans for SQL statements. It’s a useful tool for developers to understand how the query planner is executing SQL queries.

You enable and configure auto-explain by setting parameters in the PostgreSQL configuration file (postgresql.conf). Set auto_explain.log_analyze to on to log execution plans along with statistics, and set auto_explain.log_min_duration to specify the minimum execution time in milliseconds that a query must take to be logged. For example, if you want to log queries taking longer than 100 milliseconds, set it to 100. Set auto_explain.log_buffers to on if you want to include information about memory usage, and auto_explain.log_timing to log timing information.

Here’s an example of how to configure these parameters in postgresql.conf:

auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_timing = on
auto_explain.log_min_duration = 100

Reading the execution plan

Suppose you have a “recipe database” that stores recipes, ingredients, and chefs. You want to retrieve a list of recipes along with the names of the chefs who created them and the ingredients they contain. Here’s a query that accomplishes this:

SELECT recipes.recipe_name, chefs.chef_name, ingredients.ingredient_name
  FROM recipes
  JOIN chefs ON recipes.chef_id=chefs.chef_id
  JOIN recipe_ingredients ON recipes.recipe_id=recipe_ingredients.recipe_id
  JOIN ingredients ON recipe_ingredients.ingredient_id=ingredients.ingredient_id
WHERE recipes.cuisine='Italian';

This query fetches Italian recipes, their respective chefs, and the ingredients they use.

When you run this query with auto-explain enabled, PostgreSQL will log the execution plan. The query plan might look something like this:

Hash Join  (cost=100.25..350.75 rows=50 width=96)
  Hash Cond: (recipe_ingredients.recipe_id = recipes.recipe_id)
  ->  Hash Join  (cost=50.12..200.37 rows=50 width=60)
        Hash Cond: (recipes.chef_id = chefs.chef_id)
        ->  Seq Scan on recipes  (cost=0.00..100.00 rows=50 width=24)
              Filter: (cuisine = 'Italian'::text)
        ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
              ->  Seq Scan on chefs  (cost=0.00..30.00 rows=1000 width=36)
  ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
        ->  Seq Scan on recipe_ingredients  (cost=0.00..30.00 rows=1000 width=36)
              Filter: (recipe_id IS NOT NULL)

In this query plan Hash Join indicates a join operation using a hash-based algorithm. Seq Scan signifies a sequential scan of the table, which might imply a full table scan. Hash Cond shows the join condition for the respective hash join.

cost represents the estimated execution cost for each operation, and rows indicates the estimated number of rows returned by each operation.

The estimated cost in PostgreSQL query execution plans is typically represented in an abstract unit known as “cost units.” These cost units are used for relative cost estimation and are not expressed in any specific real-world measurement like time or money. They are designed to provide a relative measure of the cost of different query plan operations so that the query planner can make informed decisions about which plan to choose.

Reading this plan, PostgreSQL starts by filtering Italian recipes (a Seq Scan with a filter). It then joins the recipes with chefs using a hash join, and the result is further joined with ingredients using another hash join. The cost values provide relative estimates of resource usage, allowing you to identify potentially expensive parts of the query, and you can consider improving the performance of the SQL statement with optimisations like indexing.

Porting an application from Oracle to PostgreSQL

Currently, we are in the process of porting a larger application from an Oracle database to a PostgreSQL database as an additional database system. The aim is for the application to be functional not only with one database system but with both: with Oracle for one customer and with PostgreSQL for another customer.

The first step in this process was to conduct a high-level analysis of what effort this would entail and where the application would need to be adjusted. The application consists of multiple services. A frontend service in Ruby on Rails and a variety of backend services in Java. Rails uses an O/R mapper called Active Record, which is relatively database-agnostic. No major adaptation effort was expected here. Ultimately, only the Active Record adapter needs to be configured per instance.

The Java services do not use an O/R mapper. They make direct SQL queries to the target database system using JDBC. This is where the greatest potential for issues existed. Fortunately, all database queries here are encapsulated in repository classes, making the SQL queries easily accessible and adaptable for different database management systems. However, this has not proven necessary so far. The SQL queries are written in standard SQL, without using database-specific features. The code is limited to basic features such as tables, views, sequences, SELECTs, INSERTs, UPDATEs, and JOINS, all of which are supported by both databases. Particularly, no stored procedures are used, which wouldn’t necessarily be portable.

For implementation, the initial goal was to make the existing Oracle database schema of the application reproducible on a PostgreSQL database, in a way that a new instance can be set up fresh in a Docker container at any time.

The best way to achieve this is with a database migration tool. The options considered were: Active Record Migrations, Liquibase, and Flyway. We ultimately chose Liquibase, which we have had positive experiences with. It can be used standalone and allows for formulating migrations in a database-independent manner. Additionally, it offers the option to export an existing schema as a migrations changelog file using the generate-changelog command.

We only had to slightly adjust the generated changelog to make it work for both database systems. In a Liquibase changelog, you can define variables referred to as properties, which can take on different values for various database systems. One use case for this is having different names for column data types:

<property name="varchar" value="varchar" dbms="postgresql"/>
<property name="varchar" value="VARCHAR2" dbms="oracle"/>

These properties can then be used, for example, in the following way:

<column name="DESCRIPTION" type="${varchar}(4000)">

Here’s a helpful table that maps Oracle datatypes to their equivalent PostgreSQL data types. After some editing of the changelog, mainly through search and replace, we had a migration for the database schema that works for both systems.

Table inheritance in PostgreSQL

In a previous post I have written about tables as data types in PostgreSQL. In addition to that, PostgreSQL has a feature similar to the inheritance mechanism of data types in object-oriented programming: table inheritance.

Table inheritance allows you to create a hierarchical relationship between tables. With this feature you define a parent table, and child tables inherit columns and some constraints (like CHECK constraints and NOT NULL constraints) from it.

How it works

To begin, we create the parent table products using the following SQL code:

CREATE TABLE products (
  id     SERIAL         PRIMARY KEY,
  name   TEXT           NOT NULL,
  price  DECIMAL(10, 2) NOT NULL
);

Next, let’s create child tables that inherit from the products table. We will create two product categories: electronics and clothing. Each child table will have its own specific columns in addition to the inherited columns. To set up the inheritance relation we use the INHERITS keyword:

CREATE TABLE electronics (
  warranty_period   INT,
  power_consumption INT
) INHERITS (products);

CREATE TABLE clothing (
  size  TEXT,
  color TEXT
) INHERITS (products);

Now we insert some rows into the child tables:

INSERT INTO electronics
  (name, price, warranty_period, power_consumption)
  VALUES ('Laptop', 1439.0, 2, 536);

INSERT INTO clothing
  (name, price, size, color)
  VALUES ('T-Shirt', 5.99, 'L', 'red');

When we query each child table individually, we only get the rows we inserted into it, as expected. However, when we query the parent table, we get all the products from the child the tables:

SELECT * FROM products;

id	name	 price
--------------------
1	Laptop	 1439.00
2	T-Shirt	 5.99

Note that you can also insert rows into the parent table:

INSERT INTO products
  (name, price)
  VALUES ('Milk', 1.15);

This row will not show up in the child tables, only when querying the parent table. If you are asking yourself now if there is an equivalent to an “abstract class”, there isn’t one. Just don’t insert into the parent table if it doesn’t make sense in your domain.

If you want to query rows only from the parent table you can use the ONLY keyword:

SELECT name, price FROM ONLY products;

This keyword works also with other commands like UPDATE and DELETE.

On a final note, you can also remove the inheritance relationship from a child table with the NO INHERIT clause:

ALTER TABLE clothing NO INHERIT products;

Oracle DB’s Gradual Password Rollover Feature

It is good security practice to change passwords regularly. When changing a database password, however, the problem arises that applications that access this database have to be reconfigured if the password changes. If multiple applications or services use the same database user, then they all need to be reconfigured at once, typically during a scheduled downtime.

Oracle 21c introduced a new feature called Gradual Password Rollover that can help make such a password change less disruptive. The feature was also backported to Oracle 19c. If this feature is switched on for a user profile, a transition time is granted when the password is changed, during which both the old and the new password are valid. The applications can then change their configuration to the new password within this period according to their own schedule.

How to enable it

You must first be logged in as a privileged user who is allowed to manage users. The grace period for which both passwords should be valid after a password change is set via a user profile. A user profile is a set of limits on the database resources and the user password. The profile setting for this feature is called PASSWORD_ROLLOVER_TIME. You either create a new profile and specify this setting as a limit, or you adjust an existing profile. Here are both variants:

-- Create a new profile ...
CREATE PROFILE example_profile LIMIT PASSWORD_ROLLOVER_TIME 1;

-- ... or alter an existing profile
ALTER PROFILE example_profile LIMIT PASSWORD_ROLLOVER_TIME 1;

The unit of this setting is days. The minimum value is one hour (1/24) and the maximum value is 60 (days). You can assign this profile to a user with the following statement:

ALTER USER example_user PROFILE example_profile;

Now change the user’s password:

ALTER USER example_user IDENTIFIED BY thenewpassword;

Now you should be able to log in as this user with both the old and the new password. You can query the current status from the dba_users table:

SELECT username, account_status, profile
  FROM  dba_users
  WHERE username='example_user';

The value of the account_status column should have changed from OPEN to OPEN & IN ROLLOVER. This indicates that the user account is in the password rollover phase, and two passwords are active at the same time. You can end this period early with the following command:

ALTER USER example_user EXPIRE PASSWORD ROLLOVER PERIOD;

A final note: If you change the password again during the rollover period only the original password (the one before the rollover period was started) and the latest password are valid, which means a user account can’t have more than two valid passwords at the same time.

Materialized views in Oracle

Most relational database management systems (RDBMs) support not only views, but also materialized views. Materialized views and normal views are both database objects used to present data to users, but they work in different ways.

A database view is a virtual table or a named query that presents data from one or more tables in a specific way. They are not physical tables and do not store data directly, but instead retrieve data from the underlying tables based on a specified query.

Materialized views are similar to normal views, but they store pre-computed results. When a materialized view is created, the results of the underlying query are computed and stored in the database. One advantage of materialized views is faster query performance by avoiding the need to compute the same results repeatedly. This is especially useful for complex and time-consuming queries, as the results can be stored and accessed quickly.

The syntax for creating a normal view in an Oracle database is as follows:

CREATE VIEW view_name AS SELECT … FROM … WHERE …;

To create a materialized view instead of a normal view you add the MATERIALIZED keyword:

CREATE MATERIALIZED VIEW view_name AS SELECT … FROM … WHERE …;

When creating a materialized view you should think about and decide on three aspects of materialized views:

  1. the refresh method,
  2. the refresh interval,
  3. and the storage properties

The refresh method

The refresh method determines how the data in the materialized view is updated or refreshed to reflect changes in the base tables.

  • COMPLETE: This one completely rebuilds the materialized view from scratch. It drops the existing contents of the materialized view and then re-executes the query to populate it with fresh data. This method can be resource-intensive and slow, especially for large materialized views.
  • FAST: Updates only the rows in the materialized view that have changed since the last refresh. It uses the materialized view logs on the base tables to identify the changed rows and then applies the changes to the materialized view. It can be much faster than a complete refresh, especially if there are only a few changes to the data.
  • FORCE: Tries to perform a fast refresh if possible, but falls back to a complete refresh if necessary. This method is useful if you want to try to perform a fast refresh, but you’re not sure if it will be possible due to the nature of the data or the query.

You can specify the refresh method when creating the materialized view using the REFRESH keyword:

CREATE MATERIALIZED VIEW view_name
  REFRESH FAST
  AS SELECT ...;

If you do not specify a refresh mode FORCE is the default.

The refresh interval

The refresh interval controls how often the materialized view is automatically refreshed. It determines how frequently the materialized view is updated to reflect changes in the underlying data.

Some refresh interval options in Oracle are:

  • ON COMMIT: The materialized view is refreshed automatically every time a transaction that modifies the underlying data is committed. This interval is useful when you need to keep the materialized view up-to-date in near real-time.
  • ON DEMAND: The materialized view is refreshed only when you explicitly request a refresh using the DBMS_MVIEW.REFRESH.
  • START WITH … NEXT: With this interval, the materialized view is refreshed automatically at regular intervals. It is useful when you want to balance the need for up-to-date data with the resources required to refresh the materialized view.

You can specify the refresh interval when creating the materialized view by adding it to the REFRESH clause when creating the view:

CREATE MATERIALIZED VIEW view_name
  REFRESH FAST ON COMMIT
  AS SELECT ...;

The following materialized view gets refreshed every hour:

CREATE MATERIALIZED VIEW view_name
  REFRESH FAST START WITH SYSDATE NEXT SYSDATE + 1/24
  AS SELECT ...;

Storage properties

Storage properties affect how the data in the materialized view is stored and accessed. In Oracle, some of these are:

  • CACHE: The data is stored in the database buffer cache, which is a portion of memory used to cache frequently accessed data. It improves query performance by reducing disk I/O, but it can consume a significant amount of memory.
  • LOGGING: Changes to the materialized view data are logged in the database redo logs. This property ensures that changes to the materialized view can be recovered in case of a system failure but can result in additional overhead.
  • TABLESPACE: Allows you to specify the tablespace where the materialized view data is stored.

Again, you can specify these properties when creating the materialized view:

CREATE MATERIALIZED VIEW view_name
  CACHE
  LOGGING
  TABLESPACE tablespace_name
AS SELECT ... FROM ... WHERE ...;

Now you know the basics for creating materialized views in an Oracle database when needed. There is still more to learn about them. You can find the full reference here.

Partitioning in Oracle Database: Because Who Wants to Search an Endless Table?

As data volumes continue to grow, managing large database tables and indexes can become a challenge. This is where partitioning comes in. Partitioning is a feature of database systems that allows you to divide large tables and indexes into smaller, more manageable parts, known as partitions. This can improve the performance and manageability of your database. Aside from performance considerations, maintenance operations, such as backups and index rebuilds, can become easier by allowing them to be performed on smaller subsets of data.

This is achieved by reducing the amount of data that needs to be scanned during query execution. When a query is executed, the database can use the partitioning information to skip over partitions that do not contain the relevant data, instead of having to scan the entire table. This reduces the amount of I/O required to execute the query, which can result in significant performance gains, especially for large tables.

There are several types of partitioning available in Oracle Database, including range partitioning, hash partitioning, list partitioning, and composite partitioning. Each type of partitioning is suited to different use cases and can be used to optimize the performance of your database in different ways. In this blog post we will look range partitioning.

Range partitioning

Here is an example of range-based partitioning in Oracle:

CREATE TABLE books (
  id NUMBER,
  title VARCHAR2(200),
  publication_year NUMBER
)

PARTITION BY RANGE (publication_year) (
  PARTITION p_before_2000 VALUES LESS THAN (2000),
  PARTITION p_2000s VALUES LESS THAN (2010),
  PARTITION p_2010s VALUES LESS THAN (2020),
  PARTITION p_after_2020 VALUES LESS THAN (MAXVALUE)
);

In this example, we have created a table called books that stores book titles, partitioned by the year of publication. We have defined four partitions, p_before_2000, p_2000s, p_2010s, and p_after_2020.

Now, when we insert data into the books table, it will automatically be placed in the appropriate partition based on the year of publication:

INSERT INTO books (id, title, publication_year)
  VALUES (1, 'Nineteen Eighty-Four', 1949);

This book will be inserted into partition p_before_2000, as the year of publication is before 2000. The following book will be placed into partition p_2000s:

INSERT INTO books (id, title, publication_year)
  VALUES (2, 'The Hunger Games', 2008);

When we query the books table, the database will only access the partitions that contain the data we need. For example, if we want to retrieve data for books published in 2015 and 2016, the database will only access partition p_2010s.

SELECT * FROM books WHERE publication_year>=2015 AND publication_year<=2016:

However, you should be aware that while partitioning can improve query performance for some types of queries, it can also negatively impact query performance for others, especially if the partitioning scheme does not align well with the query patterns. Therefore, you should tailor the partitioning to your needs and check if it brings the desired effect.