How to Set Up pg_stat_statements: The Complete Guide
If you want to understand what's happening inside your PostgreSQL database, pg_stat_statements is the single most important extension you can enable. It tracks execution statistics for every SQL statement your database processes: how often each query runs, how long it takes, how much data it reads, and whether it's hitting the cache or going to disk.
Most PostgreSQL monitoring tools, including Basira, are built on top of pg_stat_statements. Understanding how it works gives you a better mental model of what your monitoring tool is actually showing you.
What pg_stat_statements Does
pg_stat_statements is a PostgreSQL extension that records cumulative execution statistics for all SQL statements. It normalizes queries by replacing literal values with parameter placeholders, so SELECT * FROM users WHERE id = 42 and SELECT * FROM users WHERE id = 99 are tracked as the same query: SELECT * FROM users WHERE id = $1.
This normalization is what makes it useful. Instead of seeing millions of individual query executions, you see patterns: 50 unique query shapes, each with aggregate statistics showing total execution time, call count, rows returned, and buffer usage.
The extension has been included in PostgreSQL's contrib modules since version 8.4, and it's available on every major managed PostgreSQL service: AWS RDS, Google Cloud SQL, Azure Database for PostgreSQL, and Supabase. You don't need to install anything extra. You just need to enable it.
Enabling pg_stat_statements
Step 1: Add to shared_preload_libraries
pg_stat_statements must be loaded at server startup. Add it to shared_preload_libraries in your postgresql.conf:
shared_preload_libraries = 'pg_stat_statements'
This requires a PostgreSQL restart. There's no way around this. On managed services like RDS, you modify the parameter group and reboot the instance.
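On a self-managed server, a minimal sketch of the same change using ALTER SYSTEM (which writes to postgresql.auto.conf rather than postgresql.conf, and still only takes effect after a full restart):

-- Run as a superuser. If you already preload other libraries,
-- list them all here, comma-separated, or you'll drop them.
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
-- A restart is still required for the change to take effect.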
Step 2: Create the Extension
After restarting, connect to your database and create the extension:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
You need to do this in each database you want to monitor. The extension tracks queries per-database, so if you have a myapp database and an analytics database, you need to create the extension in both.
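For example, using the myapp and analytics databases mentioned above, a psql session would look like:

\c myapp
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
\c analytics
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;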
Step 3: Verify It's Working
Run a few queries, then check:
SELECT count(*) FROM pg_stat_statements;
If this returns rows, you're good. If it returns zero, verify the extension was loaded correctly with:
SHOW shared_preload_libraries;
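You can also confirm the extension object exists in the current database:

SELECT extname, extversion
FROM pg_extension
WHERE extname = 'pg_stat_statements';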
Configuration Parameters
pg_stat_statements has several configuration parameters that control its behavior. They go in postgresql.conf; only pg_stat_statements.max requires a restart to change, while the tracking parameters (pg_stat_statements.track, track_utility, track_planning) take effect after a configuration reload.
pg_stat_statements.max (default: 5000)
The maximum number of distinct query statements tracked. When this limit is reached, the least-executed statements are evicted to make room for new ones. For most production databases, the default of 5,000 is sufficient. If you have a highly dynamic workload with thousands of unique query shapes (common in ORMs that generate lots of ad-hoc queries), increase this to 10,000 or 20,000.
The memory cost is roughly 500 bytes per entry, so 10,000 entries use about 5MB of shared memory. This is negligible on any production server.
pg_stat_statements.max = 10000
pg_stat_statements.track (default: top)
Controls which statements are tracked:
- top: Only top-level statements (what your application sends directly). This is what you want in most cases.
- all: Includes statements inside functions and stored procedures. Enable this if you use PL/pgSQL functions heavily and want to see which queries inside those functions are slow.
- none: Disables tracking entirely.
pg_stat_statements.track = top
pg_stat_statements.track_utility (default: on)
Whether to track utility commands like CREATE TABLE, VACUUM, ANALYZE, COPY. Turning this off reduces noise if you only care about DML queries (SELECT, INSERT, UPDATE, DELETE). In practice, leaving it on is usually fine because utility commands are a small fraction of total queries.
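If you do decide to turn it off:

pg_stat_statements.track_utility = off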
pg_stat_statements.track_planning (default: off, PG 13+)
When enabled, tracks planning time separately from execution time. This is useful for identifying queries where the planner is spending significant time evaluating join strategies or index choices. The overhead is usually small, though the PostgreSQL documentation warns it can become noticeable when many concurrent connections execute the same statement and contend to update the same entry, so it's worth enabling on PostgreSQL 13+ and measuring the impact.
pg_stat_statements.track_planning = on
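Once it's on, the plan-time columns (total_plan_time and mean_plan_time, available from PostgreSQL 13) let you spot plan-heavy queries; a quick sketch:

SELECT
  left(query, 80) AS query_preview,
  calls,
  round(mean_plan_time::numeric, 2) AS mean_plan_ms,
  round(mean_exec_time::numeric, 2) AS mean_exec_ms
FROM pg_stat_statements
ORDER BY mean_plan_time DESC
LIMIT 20;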
Understanding the Output
The pg_stat_statements view has many columns. Here are the ones that matter most for performance monitoring:
The Essential Columns
| Column | What It Tells You |
|---|---|
| queryid | A hash identifying the normalized query shape |
| query | The normalized query text (with $1, $2 parameters) |
| calls | Total number of times this query has been executed |
| total_exec_time | Total execution time across all calls (milliseconds) |
| mean_exec_time | Average execution time per call (milliseconds) |
| rows | Total number of rows returned or affected |
| shared_blks_hit | Number of blocks found in the shared buffer cache |
| shared_blks_read | Number of blocks not found in shared buffers (read from the OS page cache or disk) |
On PostgreSQL 12 and earlier, the timing columns are named total_time and mean_time; they were renamed when planning statistics were added in PostgreSQL 13.
Derived Metrics
From these raw columns, you can compute several useful metrics:
Cache hit ratio per query:
shared_blks_hit::float / NULLIF(shared_blks_hit + shared_blks_read, 0)
A ratio below 0.95 for a frequently-called query suggests it's reading more from disk than it should. This could mean the working set doesn't fit in shared_buffers, or the query is scanning more data than necessary.
Time per row:
total_exec_time / NULLIF(rows, 0)
Useful for comparing similar queries. If one query returns 100 rows in 50ms and another returns 100 rows in 500ms, the second one likely has an optimization opportunity.
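Both metrics can be computed directly against the view; here's a sketch (the calls > 100 filter is an arbitrary threshold to skip rarely-run statements):

SELECT
  left(query, 80) AS query_preview,
  calls,
  round(shared_blks_hit::numeric
        / NULLIF(shared_blks_hit + shared_blks_read, 0), 4) AS cache_hit_ratio,
  round((total_exec_time / NULLIF(rows, 0))::numeric, 3) AS time_per_row_ms
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;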
Essential Queries for Analyzing Performance
Top queries by total time
The most impactful optimization target. Queries with the highest total execution time are consuming the most database resources, regardless of whether individual calls are fast or slow.
SELECT
queryid,
left(query, 80) AS query_preview,
calls,
round(total_exec_time::numeric, 2) AS total_time_ms,
round(mean_exec_time::numeric, 2) AS mean_time_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
Most frequently called queries
High call count with low mean time might indicate an N+1 query pattern. If you see a simple SELECT being called 100,000 times per hour, the application is probably issuing it in a loop instead of using a batch query.
SELECT
queryid,
left(query, 80) AS query_preview,
calls,
round(mean_exec_time::numeric, 2) AS mean_time_ms,
round(rows::numeric / NULLIF(calls, 0), 1) AS avg_rows_per_call
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 20;
Queries with the worst cache hit ratio
These queries are hitting disk more than cache, which is slow and puts pressure on your I/O subsystem.
SELECT
queryid,
left(query, 80) AS query_preview,
calls,
shared_blks_hit,
shared_blks_read,
round(
shared_blks_hit::numeric / NULLIF(shared_blks_hit + shared_blks_read, 0),
4
) AS cache_hit_ratio
FROM pg_stat_statements
WHERE shared_blks_hit + shared_blks_read > 100
ORDER BY cache_hit_ratio ASC
LIMIT 20;
Queries getting slower over time
This is where delta computation becomes essential (see next section).
Why You Need Delta Computation
The values in pg_stat_statements are cumulative. They accumulate from the moment the extension was loaded (or the last time statistics were reset). This means total_exec_time for a query might be 500,000 milliseconds, but you don't know if that accumulated over 6 months of normal operation or spiked in the last hour.
To get meaningful time-series data, you need to take snapshots and compute deltas. The process:
1. Record a snapshot of all rows in pg_stat_statements at time T1.
2. Record another snapshot at time T2 (say, 60 seconds later).
3. For each queryid, compute delta_calls = calls_T2 - calls_T1 and delta_time = total_exec_time_T2 - total_exec_time_T1.
4. Now you know: in the last 60 seconds, this query was called N times and took M milliseconds in total.
This delta computation is the foundation of how monitoring tools turn raw pg_stat_statements data into useful dashboards. Without it, you're looking at all-time cumulative numbers that tell you very little about current performance.
Manual delta computation is tedious. You'd need a cron job that snapshots the view, stores it somewhere, and computes diffs. This is exactly the kind of operational overhead that a monitoring tool should handle for you.
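As a rough illustration of the idea (not how a production agent would do it), you could keep a snapshot table and diff against it. The pgss_snapshot table name is made up for this sketch, and for simplicity it keys on queryid alone, even though the view can hold one row per (userid, dbid, queryid):

-- One-time setup: capture the current cumulative counters
CREATE TABLE IF NOT EXISTS pgss_snapshot AS
SELECT now() AS snapped_at, queryid, calls, total_exec_time
FROM pg_stat_statements;

-- Later (e.g. from a cron job): diff current counters against the snapshot
SELECT
  cur.queryid,
  cur.calls - prev.calls AS delta_calls,
  round((cur.total_exec_time - prev.total_exec_time)::numeric, 2) AS delta_time_ms
FROM pg_stat_statements cur
JOIN pgss_snapshot prev ON prev.queryid = cur.queryid
WHERE cur.calls > prev.calls
ORDER BY delta_time_ms DESC
LIMIT 20;

-- Then refresh the snapshot for the next interval
TRUNCATE pgss_snapshot;
INSERT INTO pgss_snapshot
SELECT now(), queryid, calls, total_exec_time FROM pg_stat_statements;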
Common Pitfalls
Forgetting shared_preload_libraries
If you skip this step and just run CREATE EXTENSION pg_stat_statements, the extension's view and functions will be created, but querying the view fails with an error along the lines of "pg_stat_statements must be loaded via shared_preload_libraries" and no data is collected. This is the most common setup mistake.
Not resetting statistics after schema changes
After a major schema change (adding indexes, restructuring queries), the cumulative statistics still include data from the old schema. This can make it look like a query is performing the same as before when it's actually much faster now. Use pg_stat_statements_reset() judiciously after significant changes to get a clean baseline.
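Resetting is a single function call (it requires appropriate privileges, typically superuser):

-- Wipe all accumulated statistics and start a fresh baseline
SELECT pg_stat_statements_reset();

-- PostgreSQL 12+ can also reset a single query's stats by user, database, and queryid;
-- a zero argument means "don't filter on that dimension"
-- SELECT pg_stat_statements_reset(0, 0, <queryid>);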
queryid collisions
The queryid is a 64-bit hash, so in rare cases two different queries can produce the same value. This is uncommon but can cause confusing results where a single queryid appears to have wildly inconsistent performance. PostgreSQL 14 moved query ID computation into the core server (the same identifier now appears in pg_stat_activity), but as with any hash, collisions remain possible.
Running out of max entries
If pg_stat_statements.max is too low for your workload, the least-executed queries get evicted. This means you might miss infrequent but slow queries. Monitor the number of rows in the view relative to your max setting. If they're consistently equal, increase the max.
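A quick way to check whether you're bumping against the limit:

SELECT
  (SELECT count(*) FROM pg_stat_statements) AS tracked_statements,
  current_setting('pg_stat_statements.max') AS max_statements;

-- PostgreSQL 14+ also exposes an eviction counter
SELECT dealloc FROM pg_stat_statements_info;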
Querying pg_stat_statements too aggressively
The view acquires a lightweight lock when read. On very busy databases, querying it every second can add contention. Once per minute is a good collection interval for most workloads.
How Monitoring Tools Use pg_stat_statements
Understanding this pipeline helps you evaluate monitoring tools:
1. Collection: An agent runs near your database and queries pg_stat_statements at regular intervals (typically every 60 seconds).
2. Delta computation: The agent computes deltas between consecutive snapshots to produce per-interval metrics.
3. Normalization: Query text is further normalized and fingerprinted for consistent grouping across database restarts and statistics resets.
4. Storage: Deltas are shipped to a time-series database (ClickHouse, in Basira's case) for long-term retention and fast analytical queries.
5. Visualization: Dashboards show query performance trends, top queries by various dimensions, and alert on significant changes.
The quality difference between monitoring tools largely comes down to how well they handle steps 2-4. Delta computation with proper handling of statistics resets, queryid changes across PostgreSQL major versions, and eviction from the max entries pool is where the complexity lives.
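For example, when statistics are reset or the server restarts, a query's cumulative counters drop and a naive subtraction produces negative deltas; a collector typically detects the backwards counter and treats the new cumulative value as that interval's delta. A sketch, reusing the hypothetical pgss_snapshot table from earlier:

SELECT
  cur.queryid,
  CASE
    WHEN cur.calls >= prev.calls THEN cur.calls - prev.calls
    ELSE cur.calls  -- counter went backwards: stats were reset during the interval
  END AS delta_calls
FROM pg_stat_statements cur
JOIN pgss_snapshot prev ON prev.queryid = cur.queryid;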
Getting Started
If you haven't enabled pg_stat_statements yet, do it now. Even if you're not using a monitoring tool, the raw data is invaluable for understanding your database's workload.
For a quick start:
-- After enabling in postgresql.conf and restarting
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Verify it's working
SELECT count(*) FROM pg_stat_statements;
-- See your top queries by total time
SELECT left(query, 60), calls, round(total_exec_time::numeric) AS total_ms
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Once you have this running, you'll immediately see which queries dominate your database's workload. From there, you can either build your own analysis workflow or use a tool like Basira to automate the collection, delta computation, and visualization. The agent setup is fully API-driven and takes about a minute. The pricing is $29/database/month with no per-query or per-metric charges, so the cost stays predictable as you scale.