Inside the Query Optimizer: How Databases choose and Evaluate Query Plans

You write a query.

The database runs it.

You get your result.

Simple, right?

Not quite.

Before your query ever runs, the database is already making decisions behind the scenes. It evaluates different ways to execute your query, compares them, and chooses what it believes is the most efficient path. After execution, it also measures how well that decision performed.

This is where the query optimizer and performance evaluation come in.

Understanding both is what takes you from simply writing queries to truly understanding how databases behave.

Series: Database Query Fundamentals

Part 1: What Happens When You Run a Query? → Query part 1
Part 2: Why some queries are fast and others are slow → Query Part 2
Part 3: How to Read Query Execution Plans and Fix Slow Queries → Query Part 3
Part 4: How Databases Choose and Evaluate Query Plans: Inside the Optimizer (this article)

The Role of the Query Optimizer

When you write a SQL query, you describe what you want, not how to get it.

For example:

SELECT * FROM orders WHERE customer_id = 100;

There are multiple ways the database could execute this query. It could scan the entire table and filter rows one by one, or it could use an index to quickly locate only the matching records.

The query optimizer is responsible for deciding which approach to use.

It does this by generating several possible execution plans, estimating the cost of each one, and selecting the plan with the lowest estimated cost.

A Simple Example

To make this clearer, imagine a table with one million rows.

If you run the query above, the database has two obvious options. It can scan all one million rows to find matches, or it can use an index to jump directly to the rows where the customer ID is 100.

Scanning the entire table means checking every single row. Using an index means going straight to the relevant data.

Even before running the query, the optimizer can estimate that using the index will require far less work. Based on that estimate, it chooses the index-based plan.

What Cost Really Means

Cost is one of the most misunderstood concepts in databases.

You might see something like this in an execution plan:

(cost=0.00..1025.00)

At first glance, it is tempting to assume this represents time. It does not.

Cost has no real-world unit. It is not measured in milliseconds or seconds.

Instead, it is an internal score the database uses to estimate how much work a query will require.

A helpful way to think about it is like a fuel estimate for a journey. It does not tell you exactly how long the trip will take, but it helps you compare different routes and choose the one that requires less effort.

In the same way, the optimizer compares costs and selects the plan with the lowest estimated effort.

That said, a lower cost plan is only an estimate of being faster. It is not a guarantee. This is why actual execution time still matters.

What Affects Query Cost

The optimizer does not make decisions randomly. Its estimates are based on several important factors.

Disk I O cost: Reading data from disk is expensive, especially when large tables are involved. Queries that require scanning a lot of data tend to have higher costs.
CPU cost: Every row that is processed requires computation. Filtering rows, evaluating conditions, and performing joins all consume CPU resources.
Memory usage :Operations such as sorting and joining often rely on memory to be efficient. When there is not enough memory available, the database may fall back to disk operations, which slows things down.
Size of intermediate results: During execution, queries often generate temporary datasets. If these datasets are large, performance can degrade significantly. For example, joining two large tables before filtering can create unnecessary work.
Network cost: Moving data between servers introduces latency and increases the overall cost of execution.
Query complexity: Queries with multiple joins, nested subqueries, or aggregations naturally require more processing.
Query plan structure : The order in which operations are performed can have a big impact. Filtering early reduces the amount of data that needs to be processed later, while joining smaller datasets is usually more efficient.
Indexes: They allow the database to locate data quickly. Without them, the system may be forced to scan entire tables.
Data statistics: These statistics describe things like the number of rows and how data is distributed. If they are outdated or inaccurate, the optimizer may choose a poor execution plan.

When the Optimizer Gets It Wrong

Although the optimizer is powerful, it is not perfect.

There are situations where it makes the wrong decision. This often happens when statistics are outdated, when data distribution is uneven, or when the optimizer’s assumptions do not match reality.

As a result, you may see an execution plan that looks efficient but performs poorly in practice.

This is why it is important to go beyond estimates and look at actual performance.

How Query Performance Is Evaluated

Choosing a plan is only half the story. The next step is understanding how well that plan performs in practice.

Here are the main approaches:

Execution plan analysis: Study how data is accessed, how many rows are processed, and where time is spent.
Benchmarking: Run the same query multiple times under similar conditions to compare results.
Profiling: Track how system resources like CPU, memory, and disk I/O are used during execution.
Tuning experiments: Make small changes (like adding an index or rewriting a query) and measure performance before and after.

Each method helps you see whether the optimizer’s choice was truly efficient or just looked good on paper.

Key Performance Metrics

To properly evaluate a query, you need to look at a combination of metrics.

Execution time: tells you how long the query takes to run.
Disk I O operations: show how much data is read from disk, which is often a major factor in performance.
CPU usage: reflects the amount of processing required.
Memory consumption: indicates how much memory is used during execution.
Throughput: measures how many queries the system can handle over time.
Plan stability: shows whether the execution plan remains consistent or changes frequently.

Finally, comparing cost estimates with actual performance helps you identify when the optimizer’s predictions are inaccurate.

A Common Mistake

One common mistake is focusing only on execution time.

A query might be fast today but become slow as the dataset grows.

Looking at how many rows are processed, how data is accessed, and why a particular plan was chosen gives you a deeper understanding of performance.

Putting It All Together

The optimizer chooses a plan based on estimated cost. The execution plan shows how the query is actually run. Performance metrics reveal how well that plan performs in reality.

When you understand all three, you gain full visibility into query performance.

Here is a simple way to think about it:

Your query defines what you want.
The optimizer decides how to get it.
The execution plan shows the path taken.
The metrics tell you how good that path was.

Takeaway

Database performance is not just about writing correct queries. It is about understanding how the database thinks, how it makes decisions, and how it evaluates those decisions.

Once you understand this, you stop guessing. You start making informed improvements. And that is what elevates you from SQL user to database engineer.