What are indexes and why are they used?

Answer

Indexes are database objects that improve the speed of data retrieval operations on a table. They work like an index in a book, providing a fast path to locate specific data without scanning the entire table.

How Indexes Work

Think of an index as a separate structure that contains:

  • Key values from one or more columns
  • Pointers to the actual rows in the table
  • Sorted organization for fast searching
-- Without index: Full table scan
SELECT * FROM employees WHERE last_name = 'Smith';
-- Database scans all 100,000 rows

-- With index on last_name: Index seek
CREATE INDEX idx_last_name ON employees(last_name);
SELECT * FROM employees WHERE last_name = 'Smith';
-- Database uses index to find rows quickly

Types of Indexes

1. Clustered Index

  • Physical ordering: Rows are stored in the same order as the index
  • One per table: Only one clustered index allowed
  • Primary key: Usually created automatically on primary key
-- Clustered index (usually on primary key)
CREATE CLUSTERED INDEX idx_emp_id ON employees(emp_id);

2. Non-Clustered Index

  • Logical ordering: Separate structure pointing to data rows
  • Multiple allowed: Can have many non-clustered indexes
  • Additional storage: Requires extra disk space
-- Non-clustered indexes
CREATE INDEX idx_last_name ON employees(last_name);
CREATE INDEX idx_department ON employees(department);
CREATE INDEX idx_hire_date ON employees(hire_date);

3. Unique Index

  • Enforces uniqueness: Prevents duplicate values
  • Automatic creation: Created automatically with UNIQUE constraints
-- Unique index
CREATE UNIQUE INDEX idx_email ON employees(email);

4. Composite Index

  • Multiple columns: Index on combination of columns
  • Column order matters: Most selective column should be first
-- Composite index
CREATE INDEX idx_dept_salary ON employees(department, salary);

-- Effective for queries like:
SELECT * FROM employees WHERE department = 'IT' AND salary > 50000;

Benefits of Indexes

1. Faster SELECT Operations

-- Without index: O(n) - linear scan
SELECT * FROM employees WHERE emp_id = 12345;

-- With index: O(log n) - binary search
-- Dramatically faster on large tables

2. Faster JOIN Operations

-- Indexes on join columns improve performance
SELECT e.name, d.dept_name
FROM employees e
JOIN departments d ON e.dept_id = d.dept_id; -- Index on dept_id helps

3. Faster ORDER BY

-- Index on ordered column avoids sorting
SELECT * FROM employees ORDER BY last_name; -- Index on last_name helps

4. Faster GROUP BY

-- Index on grouped column improves performance
SELECT department, COUNT(*) 
FROM employees 
GROUP BY department; -- Index on department helps

Drawbacks of Indexes

1. Storage Overhead

-- Each index requires additional disk space
-- Rule of thumb: 10-20% of table size per index

2. Slower INSERT/UPDATE/DELETE

-- Every modification must update indexes too
INSERT INTO employees (name, department) VALUES ('John Doe', 'IT');
-- Must update table + all indexes on the table

3. Maintenance Overhead

-- Indexes need maintenance as data changes
-- Fragmentation can occur over time

When to Create Indexes

Good Candidates:

  • Frequently queried columns in WHERE clauses
  • JOIN columns used in table relationships
  • ORDER BY columns for sorting
  • Foreign key columns for referential integrity
-- Good index candidates
CREATE INDEX idx_customer_id ON orders(customer_id); -- FK column
CREATE INDEX idx_order_date ON orders(order_date);   -- Frequently filtered
CREATE INDEX idx_status ON orders(status);           -- WHERE clause column

Poor Candidates:

  • Small tables (< 1000 rows)
  • Frequently updated columns
  • Columns with low selectivity (few unique values)
-- Poor index candidates
-- CREATE INDEX idx_gender ON employees(gender);     -- Only M/F values
-- CREATE INDEX idx_active ON users(is_active);      -- Only true/false

Index Management

Creating Indexes

-- Basic index
CREATE INDEX idx_name ON table_name(column_name);

-- Composite index
CREATE INDEX idx_multi ON table_name(col1, col2, col3);

-- Unique index
CREATE UNIQUE INDEX idx_unique ON table_name(column_name);

-- Partial index (PostgreSQL)
CREATE INDEX idx_active_users ON users(email) WHERE is_active = true;

Viewing Indexes

-- MySQL
SHOW INDEXES FROM table_name;

-- PostgreSQL
\d table_name

-- SQL Server
SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID('table_name');

Dropping Indexes

DROP INDEX idx_name ON table_name; -- MySQL
DROP INDEX idx_name;                -- PostgreSQL/SQL Server

Performance Examples

Before Index (Slow)

-- Table with 1 million employees
SELECT * FROM employees WHERE last_name = 'Smith';
-- Execution time: 2.5 seconds (full table scan)
-- Rows examined: 1,000,000

After Index (Fast)

CREATE INDEX idx_last_name ON employees(last_name);
SELECT * FROM employees WHERE last_name = 'Smith';
-- Execution time: 0.01 seconds (index seek)
-- Rows examined: 127 (only matching rows)

Index Design Best Practices

1. Column Order in Composite Indexes

-- Good: Most selective column first
CREATE INDEX idx_search ON products(category, brand, price);
-- Effective for: WHERE category = 'Electronics' AND brand = 'Apple'

-- Poor: Less selective column first
CREATE INDEX idx_poor ON products(price, category, brand);
-- Less effective for category-based searches

2. Include Columns (SQL Server)

-- Covering index with included columns
CREATE INDEX idx_covering ON employees(department) 
INCLUDE (name, salary, hire_date);
-- Query can be satisfied entirely from index

3. Filtered Indexes

-- Index only active records
CREATE INDEX idx_active_employees ON employees(last_name) 
WHERE status = 'ACTIVE';

Monitoring Index Usage

-- Check index usage statistics (SQL Server)
SELECT 
    i.name AS index_name,
    s.user_seeks,
    s.user_scans,
    s.user_lookups
FROM sys.indexes i
JOIN sys.dm_db_index_usage_stats s ON i.object_id = s.object_id;

Interview Tips

  • Understand that indexes trade storage space for query speed
  • Know the difference between clustered and non-clustered indexes
  • Be able to identify good and poor index candidates
  • Understand the impact on INSERT/UPDATE/DELETE operations
  • Know how to create composite indexes with proper column ordering

Test Your Knowledge

Take a quick quiz to test your understanding of this topic.