PostgreSQL Indices: A Complete Guide - The Caffeinated Engineer

Understanding PostgreSQL indices is crucial for database performance. Indices are data structures that improve the speed of data retrieval operations, but they come with trade-offs in storage space and write performance. Let's explore the different types of indices available in PostgreSQL and when to use each one.

What Are Indices?

An index is a data structure that provides a fast lookup mechanism for data in a database table. Think of it like an index in a book—instead of reading through every page to find a topic, you can quickly jump to the right section.

B-tree Indices (Default)

B-tree indices are the most common and versatile type in PostgreSQL. They're automatically created for PRIMARY KEY and UNIQUE constraints.

-- Create a B-tree index on a single column
CREATE INDEX idx_users_email ON users(email);

-- Create a composite B-tree index
CREATE INDEX idx_users_name_email ON users(last_name, first_name, email);

-- B-tree indices work well with these operators:
-- =, <, <=, >, >=, BETWEEN, IN, IS NULL, IS NOT NULL

When to use B-tree indices:

Equality comparisons (=)
Range queries (<, >, BETWEEN)
Sorting operations (ORDER BY)
Most common use cases

Hash Indices

Hash indices are optimized for equality comparisons only.

-- Create a hash index
CREATE INDEX idx_users_email_hash ON users USING hash(email);

-- Hash indices only work with equality operators (=)
-- They don't support range queries or sorting

When to use hash indices:

Only equality comparisons
When you don't need range queries or sorting
Can be faster than B-tree for simple equality lookups

GiST (Generalized Search Tree) Indices

GiST indices are useful for geometric data and full-text search.

-- Create a GiST index for geometric data
CREATE INDEX idx_locations_coordinates ON locations USING gist(coordinates);

-- Create a GiST index for full-text search
CREATE INDEX idx_documents_content ON documents USING gist(to_tsvector('english', content));

When to use GiST indices:

Geometric data types (point, line, polygon)
Full-text search with tsvector
Custom data types that implement GiST operators

GIN (Generalized Inverted Index) Indices

GIN indices are excellent for composite values and arrays.

-- Create a GIN index for arrays
CREATE INDEX idx_products_tags ON products USING gin(tags);

-- Create a GIN index for full-text search
CREATE INDEX idx_documents_content_gin ON documents USING gin(to_tsvector('english', content));

-- Create a GIN index for JSONB
CREATE INDEX idx_users_preferences ON users USING gin(preferences);

When to use GIN indices:

Array columns
JSONB columns
Full-text search (more efficient than GiST for search)
Composite values

BRIN (Block Range INdex) Indices

BRIN indices are space-efficient and work well with naturally ordered data.

-- Create a BRIN index on a timestamp column
CREATE INDEX idx_logs_created_at ON logs USING brin(created_at);

-- Create a BRIN index with custom page range
CREATE INDEX idx_logs_created_at_brin ON logs USING brin(created_at) WITH (pages_per_range = 128);

When to use BRIN indices:

Large tables with naturally ordered data
When storage space is a concern
Time-series data
When exact precision isn't required

Partial Indices

Partial indices only index rows that satisfy a condition.

-- Create a partial index for active users only
CREATE INDEX idx_users_active_email ON users(email) WHERE active = true;

-- Create a partial index for non-null values
CREATE INDEX idx_products_price ON products(price) WHERE price IS NOT NULL;

When to use partial indices:

When you frequently query a subset of data
To reduce index size and maintenance overhead
For conditional queries

Expression Indices

Expression indices index the result of an expression rather than a column.

-- Create an index on a function result
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Create an index on a computed column
CREATE INDEX idx_products_name_length ON products(LENGTH(name));

When to use expression indices:

When you frequently use functions in WHERE clauses
For case-insensitive searches
Computed columns that are queried often

Index Maintenance

-- Check index usage statistics
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Rebuild an index
REINDEX INDEX idx_users_email;

-- Analyze table to update statistics
ANALYZE users;

Best Practices

Don't over-index: Each index has maintenance costs
Monitor index usage: Remove unused indices
Consider the query pattern: Index columns that appear in WHERE, JOIN, and ORDER BY clauses
Test performance: Use EXPLAIN ANALYZE to verify index usage
Consider storage impact: Indices consume disk space

When NOT to Index

Small tables (sequential scan might be faster)
Columns with low selectivity (many duplicate values)
Tables that are mostly read-only
Columns that are rarely queried

Performance Impact

-- Check if an index is being used
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';

-- Compare performance with and without index
-- (Drop and recreate index to test)

Remember: Indices are a powerful tool, but they're not a silver bullet. Always test and measure the performance impact of your indexing strategy.