PostgreSQL Indices: A Complete Guide
Understanding PostgreSQL indices is crucial for database performance. Indices are data structures that improve the speed of data retrieval operations, but they come with trade-offs in storage space and write performance. Let's explore the different types of indices available in PostgreSQL and when to use each one.
What Are Indices?
An index is a data structure that provides a fast lookup mechanism for data in a database table. Think of it like an index in a book—instead of reading through every page to find a topic, you can quickly jump to the right section.
B-tree Indices (Default)
B-tree indices are the most common and versatile type in PostgreSQL. They're automatically created for PRIMARY KEY and UNIQUE constraints.
-- Create a B-tree index on a single column
CREATE INDEX idx_users_email ON users(email);
-- Create a composite B-tree index
CREATE INDEX idx_users_name_email ON users(last_name, first_name, email);
-- B-tree indices work well with these operators:
-- =, <, <=, >, >=, BETWEEN, IN, IS NULL, IS NOT NULL
When to use B-tree indices:
- Equality comparisons (
=
) - Range queries (
<
,>
,BETWEEN
) - Sorting operations (
ORDER BY
) - Most common use cases
Hash Indices
Hash indices are optimized for equality comparisons only.
-- Create a hash index
CREATE INDEX idx_users_email_hash ON users USING hash(email);
-- Hash indices only work with equality operators (=)
-- They don't support range queries or sorting
When to use hash indices:
- Only equality comparisons
- When you don't need range queries or sorting
- Can be faster than B-tree for simple equality lookups
GiST (Generalized Search Tree) Indices
GiST indices are useful for geometric data and full-text search.
-- Create a GiST index for geometric data
CREATE INDEX idx_locations_coordinates ON locations USING gist(coordinates);
-- Create a GiST index for full-text search
CREATE INDEX idx_documents_content ON documents USING gist(to_tsvector('english', content));
When to use GiST indices:
- Geometric data types (point, line, polygon)
- Full-text search with
tsvector
- Custom data types that implement GiST operators
GIN (Generalized Inverted Index) Indices
GIN indices are excellent for composite values and arrays.
-- Create a GIN index for arrays
CREATE INDEX idx_products_tags ON products USING gin(tags);
-- Create a GIN index for full-text search
CREATE INDEX idx_documents_content_gin ON documents USING gin(to_tsvector('english', content));
-- Create a GIN index for JSONB
CREATE INDEX idx_users_preferences ON users USING gin(preferences);
When to use GIN indices:
- Array columns
- JSONB columns
- Full-text search (more efficient than GiST for search)
- Composite values
BRIN (Block Range INdex) Indices
BRIN indices are space-efficient and work well with naturally ordered data.
-- Create a BRIN index on a timestamp column
CREATE INDEX idx_logs_created_at ON logs USING brin(created_at);
-- Create a BRIN index with custom page range
CREATE INDEX idx_logs_created_at_brin ON logs USING brin(created_at) WITH (pages_per_range = 128);
When to use BRIN indices:
- Large tables with naturally ordered data
- When storage space is a concern
- Time-series data
- When exact precision isn't required
Partial Indices
Partial indices only index rows that satisfy a condition.
-- Create a partial index for active users only
CREATE INDEX idx_users_active_email ON users(email) WHERE active = true;
-- Create a partial index for non-null values
CREATE INDEX idx_products_price ON products(price) WHERE price IS NOT NULL;
When to use partial indices:
- When you frequently query a subset of data
- To reduce index size and maintenance overhead
- For conditional queries
Expression Indices
Expression indices index the result of an expression rather than a column.
-- Create an index on a function result
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Create an index on a computed column
CREATE INDEX idx_products_name_length ON products(LENGTH(name));
When to use expression indices:
- When you frequently use functions in WHERE clauses
- For case-insensitive searches
- Computed columns that are queried often
Index Maintenance
-- Check index usage statistics
SELECT schemaname, tablename, indexname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
-- Rebuild an index
REINDEX INDEX idx_users_email;
-- Analyze table to update statistics
ANALYZE users;
Best Practices
- Don't over-index: Each index has maintenance costs
- Monitor index usage: Remove unused indices
- Consider the query pattern: Index columns that appear in WHERE, JOIN, and ORDER BY clauses
- Test performance: Use EXPLAIN ANALYZE to verify index usage
- Consider storage impact: Indices consume disk space
When NOT to Index
- Small tables (sequential scan might be faster)
- Columns with low selectivity (many duplicate values)
- Tables that are mostly read-only
- Columns that are rarely queried
Performance Impact
-- Check if an index is being used
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';
-- Compare performance with and without index
-- (Drop and recreate index to test)
Remember: Indices are a powerful tool, but they're not a silver bullet. Always test and measure the performance impact of your indexing strategy.