PostgreSQL’s auto-explain is a built-in feature that automatically generates and logs execution plans for SQL statements. It’s a useful tool for developers to understand how the query planner is executing SQL queries.
You enable and configure auto-explain by setting parameters in the PostgreSQL configuration file (postgresql.conf). Set auto_explain.log_analyze to on to log execution plans along with statistics, and set auto_explain.log_min_duration to specify the minimum execution time in milliseconds that a query must take to be logged. For example, if you want to log queries taking longer than 100 milliseconds, set it to 100. Set auto_explain.log_buffers to on if you want to include information about memory usage, and auto_explain.log_timing to log timing information.
Here’s an example of how to configure these parameters in postgresql.conf:
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_timing = on
auto_explain.log_min_duration = 100
Reading the execution plan
Suppose you have a “recipe database” that stores recipes, ingredients, and chefs. You want to retrieve a list of recipes along with the names of the chefs who created them and the ingredients they contain. Here’s a query that accomplishes this:
SELECT recipes.recipe_name, chefs.chef_name, ingredients.ingredient_name
FROM recipes
JOIN chefs ON recipes.chef_id=chefs.chef_id
JOIN recipe_ingredients ON recipes.recipe_id=recipe_ingredients.recipe_id
JOIN ingredients ON recipe_ingredients.ingredient_id=ingredients.ingredient_id
WHERE recipes.cuisine='Italian';
This query fetches Italian recipes, their respective chefs, and the ingredients they use.
When you run this query with auto-explain enabled, PostgreSQL will log the execution plan. The query plan might look something like this:
Hash Join (cost=100.25..350.75 rows=50 width=96)
Hash Cond: (recipe_ingredients.recipe_id = recipes.recipe_id)
-> Hash Join (cost=50.12..200.37 rows=50 width=60)
Hash Cond: (recipes.chef_id = chefs.chef_id)
-> Seq Scan on recipes (cost=0.00..100.00 rows=50 width=24)
Filter: (cuisine = 'Italian'::text)
-> Hash (cost=30.00..30.00 rows=1000 width=36)
-> Seq Scan on chefs (cost=0.00..30.00 rows=1000 width=36)
-> Hash (cost=30.00..30.00 rows=1000 width=36)
-> Seq Scan on recipe_ingredients (cost=0.00..30.00 rows=1000 width=36)
Filter: (recipe_id IS NOT NULL)
In this query plan Hash Join indicates a join operation using a hash-based algorithm. Seq Scan signifies a sequential scan of the table, which might imply a full table scan. Hash Cond shows the join condition for the respective hash join.
cost represents the estimated execution cost for each operation, and rows indicates the estimated number of rows returned by each operation.
The estimated cost in PostgreSQL query execution plans is typically represented in an abstract unit known as “cost units.” These cost units are used for relative cost estimation and are not expressed in any specific real-world measurement like time or money. They are designed to provide a relative measure of the cost of different query plan operations so that the query planner can make informed decisions about which plan to choose.
Reading this plan, PostgreSQL starts by filtering Italian recipes (a Seq Scan with a filter). It then joins the recipes with chefs using a hash join, and the result is further joined with ingredients using another hash join. The cost values provide relative estimates of resource usage, allowing you to identify potentially expensive parts of the query, and you can consider improving the performance of the SQL statement with optimisations like indexing.