The Day I Learned SQL Doesn’t Care About My Feelings

Vishnu Kainadan

Introduction

Last Thursday, right after standup, one of our senior engineers walked over to my desk.

“Got an hour?” he asked.
“Let’s look at how queries actually run.”

No alarms. No urgency. Just curiosity.

What followed wasn’t a lecture or a code review. It was a quiet, focused session—one of those conversations that subtly rewires how you think about a core tool you use every day.

We didn’t talk about syntax.
We didn’t talk about ORMs.

We talked about how the database thinks.

This is what I learned.

The Ghost in the Machine

What Is the Query Planner, Really?

When you write SQL, you’re not writing instructions—you’re writing a wish list.

\r SELECT *\r FROM orders\r WHERE customer_id = 42;
\r

This doesn’t tell PostgreSQL how to find those rows.
It just says what you want.

The query planner is the invisible architect that figures out the how.

It evaluates multiple strategies:

Should I scan the whole table?
Should I use an index?
In what order should I join tables?

Then it picks the cheapest option based on:

Statistics
Row estimates
Available indexes

The Catch

The planner doesn’t always get it right.

It makes educated guesses using:

Outdated statistics
Incomplete information
Assumptions about data distribution

Two queries that look similar can behave very differently.

\r — Query A\r \r SELECT *FROM users WHERE email = 'alice@example.com';
\r \r — Query B\r \r SELECT * FROM users WHERE status = 'active';
\r

If email has an index and status doesn’t:

Query A → instant
Query B → scans millions of rows

The planner knows this.
You need to know it too.

Key Insight
The planner is cost-based, not rule-based.
Having an index does not guarantee it’ll be used.

If status = ‘active’ matches 90% of rows, PostgreSQL may ignore your beautiful index entirely.

CTEs: The Helper You Trust a Bit Too Much

What’s a CTE?

A Common Table Expression (CTE) is a named subquery you can reference later.

\r
WITH recent_orders AS (\r
\r
  SELECT user_id, order_id, total_amount\r
  FROM orders\r
  WHERE created_at > NOW()  - INTERVAL '7 days'\r
)\r
\r
SELECT user_id,\r
  COUNT(*) AS order_count,\r
  SUM(total_amount) AS total_spent\r
FROM recent_orders\r
GROUP BY user_id;\r

Developers love CTEs because they:

Improve readability
Replace deeply nested subqueries
Make complex logic feel clean and logical

But here’s what many junior devs miss:

CTEs can lie to the optimizer.

Optimization Fences (The Old Days)

In PostgreSQL versions before 12, CTEs were always materialized.

That means:

PostgreSQL executes the CTE
Stores the result
Uses that stored result—even if it blocks better optimizations

Modern PostgreSQL is smarter, but materialization still matters.

When the Database Takes Your CTE Literally

What Is Materialization?

Materialization means:

The CTE runs once
Results are written to memory (or disk)
The rest of the query treats it like a physical table

When It’s Good

\r
WITH expensive_calculation AS (\r
\r
  SELECT product_id, AVG(rating) AS avg_rating\r
  FROM reviews\r
  WHERE review_date > '2024-01-01'\r
  GROUP BY product_id\r
)\r
\r
SELECT p.name, ec.avg_rating\r
FROM products p\r
JOIN expensive_calculation ec\r
  ON p.id = ec.product_id\r
\r
WHERE ec.avg_rating > 4.5;\r

If expensive_calculation is reused multiple times:

Compute once
Reuse everywhere
👍 Worth it

When It’s Terrible

\r
WITH all_users AS (\r
  SELECT * FROM users -- 10 million rows\r
)\r
\r
SELECT * FROM all_users\r
WHERE account_type = 'premium';\r

Here:

PostgreSQL materializes 10 million rows
Then applies the filter

Without the CTE, PostgreSQL could:

Push the filter down
Use an index

Avoid loading everything

PostgreSQL 12+ Controls

\r — Force materialization\r WITH active_users AS MATERIALIZED (...)\r \r — Let the planner inline it\r WITH active_users AS NOT MATERIALIZED (...)\r

Rule of Thumb

Used once + downstream filters → NOT MATERIALIZED
Expensive + reused → MATERIALIZED

Two Ways to Dance: Nested Loop vs Hash Join

Why Joins Decide Performance

Joins are where queries live or die.

PostgreSQL chooses join strategies based on:

Data size
Indexes
Available memory

Nested Loop Join

Conceptually:

for each row in outer_table:

find matching rows in inner_table

Example:

\r
SELECT  o.order_id, c.name\r
\r
FROM orders o\r
JOIN customers c \r
  ON o.customer_id = c.id\r
\r
WHERE o.created_at = '2025-01-10';\r

If:

The date filter returns ~50 rows
id is indexed

→ Perfect use case.

When it’s good

Small outer table
Indexed inner table
High selectivity

When it’s a disaster

Large outer table
No index
Two big tables joined blindly

Hash Join

\r
SELECT u.name, COUNT(p.post_id) AS post_count\r
FROM users u\r
JOIN posts p \r
  ON u.id = p.author_id\r
GROUP BY u.name;\r

How it works:

Build a hash table in memory
Probe it for matches

Time complexity: O(n + m)

When it’s good

Large tables
Equality joins
Enough work_mem

The catch
If the hash doesn’t fit in memory → disk spill → slow query.

Breaking the Fourth Wall: LATERAL Joins

What Is LATERAL?

A LATERAL join allows a subquery to reference columns from earlier tables.

Perfect for Top-N per group problems.

\r
SELECT c.name,  recent.order_id,  recent.created_at\r
FROM customers c\r
CROSS JOIN LATERAL (\r
  SELECT order_id, created_at\r
  FROM orders\r
  WHERE orders.customer_id = c.id\r
  ORDER BY created_at DESC\r
  LIMIT 3\r
) \r
recent;\r

Why it’s powerful

Clean syntax
Avoids window-function complexity
Planner can optimize it well

Why it’s risky

Subquery runs once per outer row
100k customers → 100k subqueries

Use it when

Outer table is small or heavily filtered
Proper indexes exist

Why GROUP BY Is Quietly Expensive

The Hidden Cost of Aggregation

GROUP BY does more than it looks:

Groups rows
Applies aggregates
Returns one row per group

Two strategies:

Hash Aggregation

Fast
Memory-heavy

Group Aggregation

Sorts first
Slower without pre-sorted data

\r SELECT department, COUNT(*) AS employee_count\r FROM employees\r GROUP BY department;
\r

If there are 50 departments → tiny hash table → efficient.

But grouping by a unique column?
You’re rebuilding the entire table in memory.

Optimization tip
Only group by what you need.
Group by IDs, join names later.

Indexes that match GROUP BY columns can eliminate sorting entirely.

The Autopsy Report

Why EXPLAIN ANALYZE Is Your Best Friend

You can’t optimize what you don’t measure.

\r EXPLAIN (ANALYZE, BUFFERS)\r SELECT department, COUNT(*) FROM employees\r WHERE hire_date > '2024-01-01'\r GROUP BY department;
\r

Key things to watch:

Seq Scan on large tables → missing index
Estimated vs actual rows → stale statistics
Nested Loop with many loops → danger
High buffer reads → disk I/O

The most important line:

actual time=45.231..45.298 rows=48 loops=1

If loops = 10,000, even “fast” steps become slow.

Before vs After

CREATE INDEX idx_hire_date ON employees (hire_date);

Rerun:

Index Scan using idx_hire_date

(actual time=0.045..12.123)

45ms → 12ms

Measurement wins.

Closing Thoughts: Thinking Like the Database

What stayed with me wasn’t a trick or command—it was a shift in perspective.

SQL isn’t a black box.
Performance issues aren’t random.

Every slow query is the result of a logical decision made by the planner using the information it had.

Senior developers don’t fight the database.
They listen to it.

They:

Open EXPLAIN ANALYZE
Read the plan top to bottom
Ask simple questions:
- Why this scan?
- Why this join?
- Why this estimate?

Then they change one thing and measure again.

That’s the real skill.

Not memorizing syntax—but learning how to have a conversation with the planner.

And once you do, SQL stops hurting your feelings.