SQL Performance in Practice with PostgreSQL: from slow to fast

A practical guide to measuring and improving SQL performance with indexes, sargable filters, and real execution plans: learn how to compare a slow query and an optimized query in PostgreSQL using EXPLAIN ANALYZE.

April 25, 2026 · 6 min read · Tawan Silva


If you work with APIs, dashboards, or reports, a slow query can become a bottleneck really fast, and sometimes you do not even notice it at first.

In this post, we are going to take a real-world-ish case and improve performance in an objective way, measuring the result with EXPLAIN ANALYZE.

Practical problem

We have a high-volume orders table and we need to fetch revenue from the last 30 days for paid orders.

The first version works, but it performs terribly and scales even worse :(

Complete example

The complete example for this post is available on my GitHub. Give it a star while you are there :)

Code walkthrough

1) Start the environment

Go to the example folder:

cd contents/003-sql-performance-postgres

Create the .env file from the example:

cp .env.example .env

On PowerShell, use:

Copy-Item .env.example .env -Force

Then start PostgreSQL:

docker compose up -d
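If you want to confirm the database is accepting connections before running anything, pg_isready ships with PostgreSQL and works well for that:

docker compose exec postgres pg_isready -U app_user -d performance_lab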

The docker-compose.yml file mounts two important volumes:

  • sql/bootstrap at /docker-entrypoint-initdb.d, to create and seed the database on first startup

  • sql at /sql, to run the benchmark scripts inside the container

2) Understand the data a little bit

The example starts with the bootstrap scripts:

  • sql/bootstrap/01-schema.sql

  • sql/bootstrap/02-seed.sql

In short, we create customers and orders, then generate enough data to make the query plan interesting.

This matters because SQL performance without volume tends to hide bottlenecks.

The main schema is:

CREATE TABLE orders (
  id BIGSERIAL PRIMARY KEY,
  customer_id BIGINT NOT NULL REFERENCES customers(id),
  status VARCHAR(20) NOT NULL,
  total_amount NUMERIC(10, 2) NOT NULL CHECK (total_amount >= 0),
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

The script loads 200000 orders distributed across 365 days, also known as one year:

INSERT INTO orders (customer_id, status, total_amount, created_at)
SELECT
  ((random() * 4999)::INT + 1) AS customer_id,
  (ARRAY['PENDING', 'PAID', 'SHIPPED', 'CANCELLED'])[1 + (random() * 3)::INT] AS status,
  round((random() * 500 + 20)::NUMERIC, 2) AS total_amount,
  NOW() - (random() * INTERVAL '365 days') AS created_at
FROM generate_series(1, 200000);

At the end of the seed, the script runs:

ANALYZE customers;
ANALYZE orders;

This updates the statistics the optimizer uses to choose the execution plan.
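If you are curious what the planner actually sees after ANALYZE, the pg_stats view exposes those per-column statistics. A quick peek from psql:

-- Per-column statistics collected by ANALYZE (distinct counts, most common values)
SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'orders';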

3) Run the non-optimized query with EXPLAIN ANALYZE

Run the first benchmark:

docker compose exec postgres psql -U app_user -d performance_lab -f /sql/benchmarks/01-non-optimized.sql

The sql/benchmarks/01-non-optimized.sql file runs the query with EXPLAIN ANALYZE:

EXPLAIN ANALYZE
SELECT
  date_trunc('day', created_at) AS order_day,
  COUNT(*) AS total_orders,
  SUM(total_amount) AS total_revenue
FROM orders
WHERE status = 'PAID'
  AND created_at::date BETWEEN CURRENT_DATE - INTERVAL '30 days' AND CURRENT_DATE
GROUP BY order_day
ORDER BY order_day;

The problematic part is the date filter, as usual:

WHERE status = 'PAID'
  AND created_at::date BETWEEN CURRENT_DATE - INTERVAL '30 days' AND CURRENT_DATE

The cast in created_at::date transforms the column value before every comparison, which makes the predicate non-sargable: the database has to compute the expression for each row before it can compare anything.

Even if an index exists on created_at, this filter does not match it well, because the condition applies to an expression derived from the column rather than to the column itself.
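A tempting shortcut would be to index the expression itself, but on a TIMESTAMPTZ column PostgreSQL refuses: the cast to DATE depends on the session time zone, and index expressions must be immutable (idx_orders_created_date here is just an illustrative name):

-- Fails on a TIMESTAMPTZ column: the cast to DATE is STABLE, not IMMUTABLE
CREATE INDEX idx_orders_created_date ON orders ((created_at::date));
-- ERROR:  functions in index expression must be marked IMMUTABLE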

In the optimized version, the direct range comparison allows the plan to use the composite index with Bitmap Index Scan and then fetch the rows with Bitmap Heap Scan.

When reading the EXPLAIN ANALYZE output from the non-optimized query, look mainly for:

  • Seq Scan, when the database reads the entire table or a large part of it

  • Rows Removed by Filter, which shows rows read and discarded by the filter

  • Execution Time, which is the real time spent executing the query

  • Planning Time, which is the time spent building the plan

A bad plan usually reveals that the query had to inspect many rows to return only a few days of revenue.

In the optimized query, Rows Removed by Filter may stop being the main signal; in that case, look at Bitmap Index Scan, Bitmap Heap Scan, Recheck Cond, and Heap Blocks.
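To make the contrast concrete, a non-optimized run typically has a shape like this (an illustrative sketch; node order, row counts, and timings will differ on your machine):

GroupAggregate
  -> Sort
       Sort Key: (date_trunc('day', created_at))
       -> Seq Scan on orders
            Filter: ((status = 'PAID') AND ...)
            Rows Removed by Filter: ...
Planning Time: ...
Execution Time: ...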

4) Run the optimized query with EXPLAIN ANALYZE

Run the second benchmark:

docker compose exec postgres psql -U app_user -d performance_lab -f /sql/benchmarks/02-optimized.sql

In the optimized version, the index is created to match the status plus time-window access pattern:

CREATE INDEX IF NOT EXISTS idx_orders_status_created_at ON orders (status, created_at);

Then the script updates the statistics:

ANALYZE orders;

And the filter starts comparing created_at directly:

WHERE status = 'PAID'
  AND created_at >= date_trunc('day', NOW()) - INTERVAL '30 days'
  AND created_at < date_trunc('day', NOW()) + INTERVAL '1 day'

This gives the optimizer a much better chance to use the composite index and reduce read cost: the equality on status matches the first index column and the created_at range matches the second, so the scan visits only the relevant slice of the index.

The complete file looks like this:

CREATE INDEX IF NOT EXISTS idx_orders_status_created_at ON orders (status, created_at);

ANALYZE orders;

EXPLAIN ANALYZE
SELECT
  date_trunc('day', created_at) AS order_day,
  COUNT(*) AS total_orders,
  SUM(total_amount) AS total_revenue
FROM orders
WHERE status = 'PAID'
  AND created_at >= date_trunc('day', NOW()) - INTERVAL '30 days'
  AND created_at < date_trunc('day', NOW()) + INTERVAL '1 day'
GROUP BY order_day
ORDER BY order_day;

In the EXPLAIN ANALYZE output, compare it with the first version and look for:

  • Use of Bitmap Index Scan on idx_orders_status_created_at

  • Bitmap Heap Scan on orders, using the bitmap produced by the index

  • Index Cond or Recheck Cond containing status = 'PAID' and the created_at range

  • Reduction in Execution Time

  • Lower cost to locate PAID orders inside the date window

Cloud providers usually charge for processing, so well-optimized queries hurt your wallet a little less :)

A common optimized plan for this example looks like this:

Bitmap Heap Scan on orders
  Recheck Cond: status = 'PAID' AND created_at >= ... AND created_at < ...
  -> Bitmap Index Scan on idx_orders_status_created_at
       Index Cond: status = 'PAID' AND created_at >= ... AND created_at < ...

A bitmap plan is not worse than a simple Index Scan here.

When many rows match the filter, PostgreSQL may prefer to build a bitmap of the matching positions from the index and then fetch the table blocks in physical order, which is cheaper than jumping around the heap row by row.
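If you are curious how a plain Index Scan compares on your data, you can discourage bitmap scans for your session and re-run the benchmark. This is purely an experiment, not a production setting:

-- Session-local planner toggle, useful only for comparing plans
SET enable_bitmapscan = off;
-- re-run the EXPLAIN ANALYZE query from 02-optimized.sql here
RESET enable_bitmapscan;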

5) Compare the results

For a clean comparison, run the non-optimized version first and then the optimized version on the same freshly-created database:

docker compose exec postgres psql -U app_user -d performance_lab -f /sql/benchmarks/01-non-optimized.sql

docker compose exec postgres psql -U app_user -d performance_lab -f /sql/benchmarks/02-optimized.sql

If you have already run the optimized version before, the idx_orders_status_created_at index already exists.

In that case, recreate the environment or drop the index before measuring the non-optimized query again.
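One way to reset without recreating everything is to drop the index from psql and refresh the statistics:

DROP INDEX IF EXISTS idx_orders_status_created_at;
ANALYZE orders;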

The exact numbers will vary because the dataset is generated with random(), but the reading should follow this idea:

What to observe

  • Non-optimized: filter on created_at::date, higher chance of broad reads and discarded rows.

  • Optimized: composite (status, created_at) index, less work to find the interval.

The goal is not to memorize one timing number.

The goal is to prove, through the real execution plan, that the optimized query does less work.

Common mistakes

  • Comparing queries with different data volumes.

  • Creating an index and forgetting ANALYZE.

  • Using functions/casts on the filtered column and losing selectivity.

  • Optimizing without measuring with EXPLAIN ANALYZE.

Conclusion and next steps

SQL performance is not only about creating indexes.

It is about combining:

  • Correct, minimal data modeling

  • Well-written filtering

  • Reading a plan that matches reality

And stay critical: keep measuring constantly. In some cases, time is money, especially in the cloud.

Next steps:

  1. Test date partitioning

  2. Evaluate a partial index for status = 'PAID' (see the sketch after this list)

  3. Add slow-query monitoring in the real environment
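For step 2, a partial index is a natural candidate, since this report only ever looks at PAID orders. A sketch (the index name is just illustrative):

-- Stores only PAID rows, so the index is smaller and cheaper to traverse
CREATE INDEX IF NOT EXISTS idx_orders_paid_created_at
  ON orders (created_at)
  WHERE status = 'PAID';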

Support the content

Enjoyed the post? Buy me a coffee.

If this helped you, you can support the content with a small PayPal contribution.
