postgresql – Page 2 – Schneide Blog

Working with JSON-DOM mapping in EntityFramework and PostgreSQL

A while ago, one of my colleagues covered JSON usage in PostgreSQL on the database level in two interesting blog posts (“Working with JSON data in PostgreSQL” and “JSON as a table in PostgreSQL 17”).

Today, I want to show the usage of JSON in EntityFramework with PostgreSQL as the database. We have an event sourcing application similar to the one in my colleagues first blog post written in C#/AspNetCore using EntityFramework Core (EF Core). Fortunately, EF Core and the PostgreSQL database driver have relatively easy to use JSON support.

You have essentially three options when working with JSON data and EF Core:

Simple string
EF owned entities
System.Text.Json DOM types

Our event sourcing use case requires query support on the JSON data and the data has no stable and fixed schema, so the first two options are not really appealing. For more information on them, see the npgsql documentation.

Let us have a deeper look at the third option which suits our event sourcing use-case best.

Setup

The setup is ultra-simple. Just declare the relevant properties in your entities as JsonDocument and make them disposable:

public class Event : IDisposable
{
    public long Id { get; set; }

    public DateTime Date { get; set; }
    public string Type { get; set; }
    public JsonDocument Data { get; set; }
    public string Username { get; set; }
    public void Dispose() => Json?.Dispose();
}

Using dotnet ef migrate EventJsonSupport should generate changes for the database migrations and the database context. Now we are good to start querying and deserializing our JSON data.

Saving our events to the database does not require additional changes!

Writing queries using JSON properties

With this setup we can use JSON properties in our LINQ database queries like this:

var eventsForId = db.Events.Where(ev =>
  ev.Data.RootElement.GetProperty("payload").GetProperty("id").GetInt64() == id
)
.ToList();

Deserializing the JSON data

Now, that our entities contain JsonDocument (or JsonElement) properties, we can of course use the System.Text.Json API to create our own domain objects from the JSON data as we need it:

var eventData = event.Data.RootElement.GetProperty("payload");
return new HistoryEntry
{
    Timestamp = eventData.Date,
    Action = new Action
    {
        Id = eventData.GetProperty("id").GetInt64(),
        Action = eventData.GetProperty("action").GetString(),
    },
    Username = eventData.Username,
};

We could for example deserialize different domain object depending on the event type or deal with evolution of our JSON data over time to accomodate new features or refactorings on the data side.

Conclusion

Working with JSON data inside a classical application using an ORM and a relational database has become suprisingly easy and efficient. The times of fragile full-text queries using LIKE or similar stuff to find your data are over!

JSON as a table in PostgreSQL 17

Some time ago, I described in a blog post how to work with JSON data in a PostgreSQL database. Last month (September 2024), PostgreSQL 17 was released, which offers another feature for working with JSON data: the JSON_TABLE() function. Such a function already existed in other database systems, such as MySQL and Oracle.

The core idea behind JSON_TABLE() is to temporarily transform JSON data into a relational format that can be queried using standard SQL commands. In doing so, developers can apply the full power of SQL, such as filtering, sorting, aggregating, and joining, to data that originally exists in JSON format: It enables you to use SQL to query JSON data stored in a column as though it were relational data, and you can join JSON data with regular relational tables for a seamless mix of structured and semi-structured data.

Syntax

The syntax for JSON_TABLE looks a bit complex at first glance, but once you break it down, it becomes quite clear. Here’s the basic structure:

JSON_TABLE(
  json_doc, path
  COLUMNS (
    column_name type PATH 'path_to_value' [DEFAULT default_value ON ERROR],
    ...
  )
) AS alias_name

The json_doc is the JSON data you’re querying. It can be a JSON column or a JSON string. It is followed by a path expression, which describes the location in the JSON document that you want to process. Path expressions are specified as strings in single quotation marks and have a special syntax called JSONPath (loosely inspired by XPath for XML). For example, such an expression could look like this: '$.store.book[0].title'. The dollar sign represents the root of the JSON tree, the sub-properties are separated by dots, and array elements are referenced using square brackets.

This is followed by the COLUMNS keyword. It specifies the columns to be extracted from the JSON document. Each column is specified by a name of the column in the resulting table, a data type of the extracted value (e.g., VARCHAR, INT), followed by the PATH keyword and a JSON path expression that references a JSON value below the original path expression.

The DEFAULT keyword optionally provides a default value if the path does not exist or if there’s an error during extraction.

After the JSON_TABLE function call you can specify the alias name for the table with the AS keyword.

Example

It’s time for an example. Let’s assume we have a table named fruit_store with a column named json_details that holds the some JSON data about fruits:

INSERT INTO fruit_store (id, json_details) VALUES (
    1,
    '{
        "category": "Citrus",
        "fruits": [
            {
                "name": "Orange",
                "color": "Orange",
                "taste": "Sweet",
                "price_per_kg": 3.5
            },
            {
                "name": "Lemon",
                "color": "Yellow",
                "taste": "Sour",
                "price_per_kg": 4.0
            }
        ]
    }'
);

Now we can use the JSON_TABLE function to extract the details of each fruit from the JSON document. The goal is to retrieve the fruit name, color, taste, and price for each fruit in the array. Here’s the query:

SELECT *
  FROM fruit_store,
  JSON_TABLE(
    json_details, '$.fruits[*]'
    COLUMNS (
      fruit_name   VARCHAR(50)  PATH '$.name',
      fruit_color  VARCHAR(50)  PATH '$.color',
      fruit_taste  VARCHAR(50)  PATH '$.taste',
      price_per_kg DECIMAL(5,2) PATH '$.price_per_kg'
    )
  ) AS fruit_table;

Running this query will give you the following result:

id	fruit_name	fruit_color	fruit_taste	price_per_kg
1	Orange	Orange	Sweet	3.50
1	Lemon	Yellow	Sour	4.00

Defeating TimescaleDB or shooting yourself in the foot

Today, almost everything produces periodically data: A car, cloud-connected smart-home solutions, your “smart” refrigerator and yourself using a fitness tracker. This data has to be stored in a certain way to be usable. We want fast access, beautiful charts and different aggregations over time.

There a several options both free and commercial for storing such time-series data like InfluxDB, Prometheus or TimescaleDB. They are optimised for this kind of data taking advantage of different time-series properties:

Storing is (mostly) append-only
Data points have a (strict) ordering in time
Data points often have fixed and variable meta data in addition to the data value itself

In one of our projects we have to store large quantities of data every 200ms. Knowing PostgreSQL as a powerful relational database management system we opted for TimescaleDB that promises time-series features for an SQL database.

Ingestion, probably the biggest problem of traditional (SQL) databases, of the data worked as expected from the beginning. We were able to insert new data at a constant rate without degrading performance.

The problem

However we got severe performance problems when querying data after some time…

So one of the key points of using a time-series database strangely did not work for us… Fortunately, since Timescale is only an extension to PostgreSQL we could use just use explain/explain analyse with our slow queries to find out what was going on.

It directly showed us that *all* chunks of our hypertable were queried instead of only the ones containing the data we were looking for. Checking our setup and hypertable definition showed that Timescale itself was working as expected and chunking the data correctly on the time axis.

After a bit more analysis and thought is was clear: The problem was our query! We used something like start_time <= to AND from <= end_time in our where clause to denote the time interval containing the requested data. Here start_time is the partition column of our hypertable.

This way we were forcing the database to look at all chunks from long ago until our end-time timestamp.

The solution

Ok, so we have to reformulate our where clause that only the relevant chunks are queried. Timescale can easily do this when we do something like start_time >= from AND start_time < to where from and to are the start and end of our desired interval. That way usually only one or only a few chunks of the hypertable are search and everything is lighning-fast even with billions of rows in our hypertable.

Of course the results of our two queries are not 100% the same sematically. But you can easily achieve the desired results by correctly calculation the start and end of the time-series data to fetch.

Conclusion

Time-series databases are powerful tools for todays data requirements. As with other tools you need some understanding of the underlying concepts and sometimes even implementation to not accidently defeat the purpose and features of the tools.

Used in the correct way they enable you to work with large amounts of data in a performant way todays users would expect. We had similar experiences with full text search engines like solr and ElasticSearch, too.

Time Intervals in SQL

SQL not only supports date and timestamp data types but also time intervals.

An interval in SQL represents a period of time, such as days, hours, minutes, or seconds. SQL supports intervals in two main types: INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND.

The two types

YEAR TO MONTH is used to represent a span of years and months. Examples include calculating ages, durations of contracts, or project timelines.

DAY TO SECOND is used to represent a span of days, hours, minutes, and seconds. Examples include scheduling events, logging activities, and tracking precise durations.

-- Represents 1 year and 6 months
INTERVAL '1-6' YEAR TO MONTH

-- Represents 5 days, 12 hours, 30 minutes, and 45 seconds
INTERVAL '5 12:30:45' DAY TO SECOND

This distinction may seem arbitrary at first, but it aligns with real-world use cases. An interval of “2 months” is clearly different from “60 days,” even though they might seem similar. The former is handled as a “YEAR TO MONTH” interval, and the latter as a “DAY TO SECOND” interval, making it clear how operations should be performed. Storing only the relevant units also reduces the amount of space required. For instance, storing a “YEAR TO MONTH” interval doesn’t need to allocate space for hours or seconds, thus saving storage.

Despite its name a “DAY TO SECOND” interval can also include fractional seconds:

INTERVAL '1 12:30:45.123' DAY TO SECOND

There are also shorter forms of interval literals if you only want to use one or several units of the interval:

For “YEAR TO MONTH”:

INTERVAL '3' YEAR   -- 3 years
INTERVAL '5' MONTH  -- 5 months

For “DAY TO SECOND”:

INTERVAL '2' DAY                -- 2 days
INTERVAL '5:30' HOUR TO MINUTE  -- 5 hours and 30 minutes
INTERVAL '15' MINUTE            -- 15 minutes
INTERVAL '30' SECOND            -- 30 seconds
INTERVAL '45.678' SECOND        -- 45.678 seconds

Note that the syntax always uses the singular forms of the unit words.

The two types can be used as data types for table columns:

CREATE TABLE interval_examples (
  id NUMBER,
  a INTERVAL YEAR TO MONTH,
  b INTERVAL DAY TO SECOND
);

Calculations with intervals

Intervals can be added to or subtracted from dates to calculate future or past dates.

-- Adding 10 days to the current date
CURRENT_DATE + INTERVAL '10' DAY

-- Subtracting 3 months from a specific date
DATE '2024-06-14' - INTERVAL '3' MONTH

It’s important to understand that intervals of type YEAR TO MONTH do not have a fixed duration. The number of days added to CURRENT_TIMESTAMP when you add an INTERVAL ‘1’ MONTH depends on the current month and year, as different months have different numbers of days.

Intervals can also be compared to each other with the usual operators such as =, <, <=, >, >=, <>.

Combining both types

Suppose you need to add an interval of 1 year, 2 months, 3 days, 4 hours, 5 minutes, and 6 seconds to the current timestamp. In this case you need to combine intervals of the two types:

SELECT (CURRENT_TIMESTAMP + INTERVAL '1-2' YEAR TO MONTH) + INTERVAL '3 04:05:06' DAY TO SECOND AS result;

Notes on PostgreSQL

PostgreSQL supports the above standard interval syntax, but it also supports a unified syntax for intervals with the following syntax:

INTERVAL '1 year 2 months'
INTERVAL '10 days 12 hours 30 minutes 45.123 seconds'
INTERVAL '1 year 2 months 3 days 4 hours 5 minutes 6 seconds'

In fact, in PostgreSQL, YEAR TO MONTH and DAY TO SECOND are not a distinct data type on their own; they are specializations of the broader INTERVAL data type.

It also supports multiplication and division of intervals:

-- Results in an interval of 6 days
SELECT INTERVAL '2 days' * 3 AS multiplied_interval;

-- Results in an interval of 2 hours
SELECT INTERVAL '3 hours' / 1.5 AS divided_interval;

Half table, half view: Generated Columns

Anyone familiar with SQL database basics knows the two fundamental structures of relational databases: tables and views. Data is stored in tables, while views are virtual tables calculated on-the-fly from a SQL query. Additionally, relational database management systems often support materialized views, which, like views, are based on a query from other tables, but their results are actually persisted and only recalculated as needed.

What many don’t know is that the most common SQL databases (PostgreSQL, MySQL, Oracle) nowadays also support something in between: we’re talking about Generated Columns, which will be introduced in this blog post.

So, what are Generated Columns? Generated Columns are columns within a normal database table. But unlike regular columns, their values are not stored as independent individual values; rather, they are computed from other values in the table.

Below is an example of how to define a Generated Column. The example is for PostgreSQL, but the syntax is similar in other popular relational database systems that support this feature.

CREATE TABLE products (
  id          SERIAL PRIMARY KEY,
  name        VARCHAR(100),
  quantity    INTEGER,
  unit_price  DECIMAL(10, 2),
  total_price DECIMAL(10, 2) GENERATED ALWAYS
                             AS (quantity * unit_price) STORED
);

As seen above, a Generated Column is defined with the keywords GENERATED ALWAYS AS. The GENERATED ALWAYS is even optional (you could just write AS), but it clarifies what it’s about. Following AS is the expression that computes the value.

At the end of the column definition, either the keyword STORED or VIRTUAL can be used. In the example above, it says STORED, which means the computed value is physically stored on the disk. The value is recalculated and stored only after an INSERT or UPDATE. In contrast, with VIRTUAL, the value is not stored but always computed on-the-fly. Thus, virtual Generated Columns behave similarly to a view, while STORED is more comparable to a materialized view.

The choice between the two options depends on the specific requirements. Stored Generated Columns consume more disk space, while virtual Generated Columns save space at the expense of performance.

In the expression following AS, other columns of the table can be referenced. Even other Generated Columns can be referenced, as long as they are specified in the table definition before the current column. However, SQL subqueries cannot be used in the expression.

In conclusion, Generated Columns are a useful feature that combines parts of a table with the benefits of a view.

SQL Database Window Functions

Window functions allow users to perform calculations across a set of rows that are somehow related to the current row. This can include calculations like running totals, moving averages, and ranking without the need to group the entire query into one aggregate result.

Despite their flexibility, window functions are sometimes underutilised, either because users are unaware of them or because they’re considered too complex for everyday tasks. Learning how to effectively use window functions can improve the efficiency and readability of SQL queries, particularly for reporting and data analysis purposes. This article will explore several use cases.

Numbering Rows

The simplest application area for window functions is the numbering of rows. The ROW_NUMBER() function assigns a unique number to each row within the partition of a result set. The numbering is sequential and starts at 1. It’s useful for creating a unique identifier for rows within a partition, even when the rows are identical in terms of data.

Consider the following database table of library checkouts:

book	checkout_date	member_id
The Great Adventure	2024-02-15	102
The Great Adventure	2024-01-10	105
Mystery of the Seas	2024-01-20	103
Mystery of the Seas	2024-03-01	101
Journey Through Time	2024-02-01	104
Journey Through Time	2024-02-18	102

We want to assign a unique row number to each checkout instance for every book, ordered by the checkout date to analyze the circulation trend:

SELECT
  book,
  checkout_date,
  member_id,
  ROW_NUMBER() OVER (PARTITION BY book ORDER BY checkout_date) AS checkout_order
FROM library_checkouts;

The result:

book	checkout_date	member_id	checkout_order
The Great Adventure	2024-01-10	105	1
The Great Adventure	2024-02-15	102	2
Mystery of the Seas	2024-01-20	103	1
Mystery of the Seas	2024-03-01	101	2
Journey Through Time	2024-02-01	104	1
Journey Through Time	2024-02-18	102	2

Ranking

In the context of SQL and specifically regarding window functions, “ranking” refers to the process of assigning a unique position or rank to each row within a partition of a result set based on a specified ordering.

The RANK() function provides a ranking for each row within a partition, with gaps in the ranking sequence when there are ties. It’s useful for ranking items that have the same value.

Consider the following database table of scores in a game tournament:

player	game	score
Alice	Space Invaders	4200
Bob	Space Invaders	5700
Charlie	Space Invaders	5700
Dana	Donkey Kong	6000
Eve	Donkey Kong	4800
Frank	Donkey Kong	6000
Alice	Asteroids	8500
Bob	Asteroids	9300
Charlie	Asteroids	7600

We want to rank the players within each game based on their score, with gaps in rank for ties:

SELECT
 player, 
 game, 
  score,
  RANK() OVER (PARTITION BY game ORDER BY score DESC) AS rank
FROM scores;

The result looks like this:

player	game	score	rank
Bob	Space Invaders	5700	1
Charlie	Space Invaders	5700	1
Alice	Space Invaders	4200	3
Dana	Donkey Kong	6000	1
Frank	Donkey Kong	6000	1
Eve	Donkey Kong	4800	3
Bob	Asteroids	9300	1
Alice	Asteroids	8500	2
Charlie	Asteroids	7600	3

If you don’t want to have gaps in the ranking sequence when there are ties, you can substitute DENSE_RANK() for RANK().

Cumulative Sum

The SUM() function can be used as a window function to calculate the cumulative sum of a column over a partition of rows.

Example: We are tracking our garden’s vegetable harvest in a database table, and we want to calculate the cumulative yield for each type of vegetable over the harvesting season.

vegetable	harvest_date	yield_kg
Carrots	2024-06-18	10
Carrots	2024-07-10	15
Tomatos	2024-06-15	20
Tomatos	2024-07-01	30
Tomatos	2024-07-20	25
Zucchini	2024-06-20	15
Zucchini	2024-07-05	20

We calculate the running total (cumulative yield) for each vegetable type as the season progresses, using the SUM() function:

SELECT
 vegetable,
 harvest_date,
 yield_kg,
 SUM(yield_kg) OVER (PARTITION BY vegetable ORDER BY harvest_date ASC) AS cumulative_yield
FROM garden_harvest;

Now we can see which vegetables are most productive and how yield accumulates throughout the season:

vegetable	harvest_date	yield_kg	cumulative_yield
Carrots	2024-06-18	10	10
Carrots	2024-07-10	15	25
Tomatos	2024-06-15	20	20
Tomatos	2024-07-01	30	50
Tomatos	2024-07-20	25	75
Zucchini	2024-06-20	15	15
Zucchini	2024-07-05	20	35

PostgreSQL’s auto-explain feature and execution plans

PostgreSQL’s auto-explain is a built-in feature that automatically generates and logs execution plans for SQL statements. It’s a useful tool for developers to understand how the query planner is executing SQL queries.

You enable and configure auto-explain by setting parameters in the PostgreSQL configuration file (postgresql.conf). Set auto_explain.log_analyze to on to log execution plans along with statistics, and set auto_explain.log_min_duration to specify the minimum execution time in milliseconds that a query must take to be logged. For example, if you want to log queries taking longer than 100 milliseconds, set it to 100. Set auto_explain.log_buffers to on if you want to include information about memory usage, and auto_explain.log_timing to log timing information.

Here’s an example of how to configure these parameters in postgresql.conf:

auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_timing = on
auto_explain.log_min_duration = 100

Reading the execution plan

Suppose you have a “recipe database” that stores recipes, ingredients, and chefs. You want to retrieve a list of recipes along with the names of the chefs who created them and the ingredients they contain. Here’s a query that accomplishes this:

SELECT recipes.recipe_name, chefs.chef_name, ingredients.ingredient_name
  FROM recipes
  JOIN chefs ON recipes.chef_id=chefs.chef_id
  JOIN recipe_ingredients ON recipes.recipe_id=recipe_ingredients.recipe_id
  JOIN ingredients ON recipe_ingredients.ingredient_id=ingredients.ingredient_id
WHERE recipes.cuisine='Italian';

This query fetches Italian recipes, their respective chefs, and the ingredients they use.

When you run this query with auto-explain enabled, PostgreSQL will log the execution plan. The query plan might look something like this:

Hash Join  (cost=100.25..350.75 rows=50 width=96)
  Hash Cond: (recipe_ingredients.recipe_id = recipes.recipe_id)
  ->  Hash Join  (cost=50.12..200.37 rows=50 width=60)
        Hash Cond: (recipes.chef_id = chefs.chef_id)
        ->  Seq Scan on recipes  (cost=0.00..100.00 rows=50 width=24)
              Filter: (cuisine = 'Italian'::text)
        ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
              ->  Seq Scan on chefs  (cost=0.00..30.00 rows=1000 width=36)
  ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
        ->  Seq Scan on recipe_ingredients  (cost=0.00..30.00 rows=1000 width=36)
              Filter: (recipe_id IS NOT NULL)

In this query plan Hash Join indicates a join operation using a hash-based algorithm. Seq Scan signifies a sequential scan of the table, which might imply a full table scan. Hash Cond shows the join condition for the respective hash join.

cost represents the estimated execution cost for each operation, and rows indicates the estimated number of rows returned by each operation.

The estimated cost in PostgreSQL query execution plans is typically represented in an abstract unit known as “cost units.” These cost units are used for relative cost estimation and are not expressed in any specific real-world measurement like time or money. They are designed to provide a relative measure of the cost of different query plan operations so that the query planner can make informed decisions about which plan to choose.

Reading this plan, PostgreSQL starts by filtering Italian recipes (a Seq Scan with a filter). It then joins the recipes with chefs using a hash join, and the result is further joined with ingredients using another hash join. The cost values provide relative estimates of resource usage, allowing you to identify potentially expensive parts of the query, and you can consider improving the performance of the SQL statement with optimisations like indexing.

Porting an application from Oracle to PostgreSQL

Currently, we are in the process of porting a larger application from an Oracle database to a PostgreSQL database as an additional database system. The aim is for the application to be functional not only with one database system but with both: with Oracle for one customer and with PostgreSQL for another customer.

The first step in this process was to conduct a high-level analysis of what effort this would entail and where the application would need to be adjusted. The application consists of multiple services. A frontend service in Ruby on Rails and a variety of backend services in Java. Rails uses an O/R mapper called Active Record, which is relatively database-agnostic. No major adaptation effort was expected here. Ultimately, only the Active Record adapter needs to be configured per instance.

The Java services do not use an O/R mapper. They make direct SQL queries to the target database system using JDBC. This is where the greatest potential for issues existed. Fortunately, all database queries here are encapsulated in repository classes, making the SQL queries easily accessible and adaptable for different database management systems. However, this has not proven necessary so far. The SQL queries are written in standard SQL, without using database-specific features. The code is limited to basic features such as tables, views, sequences, SELECTs, INSERTs, UPDATEs, and JOINS, all of which are supported by both databases. Particularly, no stored procedures are used, which wouldn’t necessarily be portable.

For implementation, the initial goal was to make the existing Oracle database schema of the application reproducible on a PostgreSQL database, in a way that a new instance can be set up fresh in a Docker container at any time.

The best way to achieve this is with a database migration tool. The options considered were: Active Record Migrations, Liquibase, and Flyway. We ultimately chose Liquibase, which we have had positive experiences with. It can be used standalone and allows for formulating migrations in a database-independent manner. Additionally, it offers the option to export an existing schema as a migrations changelog file using the generate-changelog command.

We only had to slightly adjust the generated changelog to make it work for both database systems. In a Liquibase changelog, you can define variables referred to as properties, which can take on different values for various database systems. One use case for this is having different names for column data types:

<property name="varchar" value="varchar" dbms="postgresql"/>
<property name="varchar" value="VARCHAR2" dbms="oracle"/>

These properties can then be used, for example, in the following way:

<column name="DESCRIPTION" type="${varchar}(4000)">

Here’s a helpful table that maps Oracle datatypes to their equivalent PostgreSQL data types. After some editing of the changelog, mainly through search and replace, we had a migration for the database schema that works for both systems.

Table inheritance in PostgreSQL

In a previous post I have written about tables as data types in PostgreSQL. In addition to that, PostgreSQL has a feature similar to the inheritance mechanism of data types in object-oriented programming: table inheritance.

Table inheritance allows you to create a hierarchical relationship between tables. With this feature you define a parent table, and child tables inherit columns and some constraints (like CHECK constraints and NOT NULL constraints) from it.

How it works

To begin, we create the parent table products using the following SQL code:

CREATE TABLE products (
  id     SERIAL         PRIMARY KEY,
  name   TEXT           NOT NULL,
  price  DECIMAL(10, 2) NOT NULL
);

Next, let’s create child tables that inherit from the products table. We will create two product categories: electronics and clothing. Each child table will have its own specific columns in addition to the inherited columns. To set up the inheritance relation we use the INHERITS keyword:

CREATE TABLE electronics (
  warranty_period   INT,
  power_consumption INT
) INHERITS (products);

CREATE TABLE clothing (
  size  TEXT,
  color TEXT
) INHERITS (products);

Now we insert some rows into the child tables:

INSERT INTO electronics
  (name, price, warranty_period, power_consumption)
  VALUES ('Laptop', 1439.0, 2, 536);

INSERT INTO clothing
  (name, price, size, color)
  VALUES ('T-Shirt', 5.99, 'L', 'red');

When we query each child table individually, we only get the rows we inserted into it, as expected. However, when we query the parent table, we get all the products from the child the tables:

SELECT * FROM products;

id	name	 price
--------------------
1	Laptop	 1439.00
2	T-Shirt	 5.99

Note that you can also insert rows into the parent table:

INSERT INTO products
  (name, price)
  VALUES ('Milk', 1.15);

This row will not show up in the child tables, only when querying the parent table. If you are asking yourself now if there is an equivalent to an “abstract class”, there isn’t one. Just don’t insert into the parent table if it doesn’t make sense in your domain.

If you want to query rows only from the parent table you can use the ONLY keyword:

SELECT name, price FROM ONLY products;

This keyword works also with other commands like UPDATE and DELETE.

On a final note, you can also remove the inheritance relationship from a child table with the NO INHERIT clause:

ALTER TABLE clothing NO INHERIT products;

PostgreSQL’s hstore module for semi-structured data

PostgreSQL has an extension module called hstore that allows you to store semi-structured data in a key/value format. Values of an hstore object are stored like in a dictionary. You can also reference its values in SQL queries.

To use the extension, it must first be loaded into the current database:

CREATE EXTENSION hstore;

Now you can use the data type hstore. Here, we create a table with some regular columns and one column of type hstore:

CREATE TABLE animals (
    id     serial PRIMARY KEY,
    name   text,
    props  hstore
);

Literals of type hstore are written in single quotes, containing a set of key => value pairs separated by commas:

INSERT INTO
    animals (name, props)
VALUES
    ('Octopus', 'arms => 8, habitat => sea, color => varying'),
    ('Cat',     'legs => 4, fur => soft'),
    ('Bee',     'legs => 6, wings => 4, likes => pollen');

The order of the pairs is irrelevant. Keys within a hstore are unique. If you declare the same key more than once only one instance will be kept and the others will be discarded. You can use double quotes to include spaces or special characters:

'"fun-fact" => "Cats sleep for around 13 to 16 hours a day (70% of their life)"'

If the type of the literal can’t be inferred you can append ::hstore as a type indicator:

'legs => 4, fur => soft'::hstore

Both keys and values are stored as strings, so these two are equivalent:

'legs => 4, fur => soft'
'"legs" => "4", "fur" => "soft"'

Another limitation of hstore values is that they cannot be nested, which means they are less powerful than JSON objects.

You can use the -> operator to dereference a key, for example in a SELECT:

SELECT
    name, props->'legs' AS number_of_legs
FROM
    animals;

It returns NULL if the key is not present. Of course, you can also use it in a WHERE clause:

SELECT * FROM animals WHERE props->'fur' = 'soft';

There are many other operators and functions that can be used with hstore objects. Here is a small selection (please refer to the documentation for a complete list):

The || operator concatenates (merges) two hstores: a || b
The ? operator checks the existence of a key and returns a boolean value: props ? 'fur'
The - operator deletes a key from a hstore: props - 'fur'
The akeys function returns an array of a hstore’s keys: akeys(hstore)

You can also convert a hstore object to JSON: hstore_to_json(hstore). If you want to learn more about JSON in PostgreSQL you can continue reading this blog post: Working with JSON data in PostgreSQL

	mariuselvert on Calculating the Number of Segm…
	Anonymous on Calculating the Number of Segm…
	Anonymous on Avoiding Code Style Discu…
	Anonymous on What Happens When We Don’t Lis…
	Understanding the in… on Understanding, identifying and…