What is the difference between correlated and non-correlated subqueries?
Answer
The main difference lies in how the subquery executes and whether it depends on values from the outer query.
Non-Correlated Subquery
Definition: A subquery that executes independently of the outer query and can run on its own.
Characteristics:
- Independent execution - Runs once before the outer query
 - No reference to outer query columns
 - Better performance - Executes only once
 - Self-contained - Can be run separately
 
-- Non-correlated subquery example
SELECT name, salary
FROM employees
WHERE department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE location = 'New York'
);
-- The subquery can run independently:
SELECT department_id FROM departments WHERE location = 'New York';Correlated Subquery
Definition: A subquery that references columns from the outer query and executes once for each row processed by the outer query.
Characteristics:
- Dependent execution - Runs for each outer query row
 - References outer query columns
 - Slower performance - Executes multiple times
 - Cannot run alone - Depends on outer query context
 
-- Correlated subquery example
SELECT name, salary
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id  -- References outer query
);
-- The subquery cannot run independently due to e1.department_id referenceExecution Flow Comparison
Non-Correlated Execution:
-- Step 1: Execute subquery once
SELECT department_id FROM departments WHERE location = 'New York';
-- Result: [10, 20, 30]
-- Step 2: Use result in outer query
SELECT name, salary FROM employees WHERE department_id IN (10, 20, 30);Correlated Execution:
-- For each employee row:
-- Row 1: John (dept_id = 10)
SELECT AVG(salary) FROM employees WHERE department_id = 10;
-- Compare John's salary with department 10 average
-- Row 2: Jane (dept_id = 20)  
SELECT AVG(salary) FROM employees WHERE department_id = 20;
-- Compare Jane's salary with department 20 average
-- ... continues for each rowPerformance Comparison
Non-Correlated (Faster)
-- Executes subquery once for 1000 employees
SELECT customer_name
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id 
    FROM orders 
    WHERE order_date >= '2024-01-01'
);
-- Subquery executions: 1Correlated (Slower)
-- Executes subquery 1000 times for 1000 employees
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.order_date >= '2024-01-01'
);
-- Subquery executions: 1000Practical Examples
Sample Data Setup
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    name VARCHAR(100),
    department_id INT,
    salary DECIMAL(10,2),
    hire_date DATE
);
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100),
    location VARCHAR(50)
);Non-Correlated Examples
-- 1. Employees in specific locations
SELECT name, salary
FROM employees
WHERE department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE location IN ('New York', 'California')
);
-- 2. Employees with above-average company salary
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- 3. Employees in departments with more than 10 people
SELECT name, department_id
FROM employees
WHERE department_id IN (
    SELECT department_id
    FROM employees
    GROUP BY department_id
    HAVING COUNT(*) > 10
);Correlated Examples
-- 1. Employees earning more than their department average
SELECT name, salary, department_id
FROM employees e1
WHERE salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);
-- 2. Employees who are the highest paid in their department
SELECT name, salary, department_id
FROM employees e1
WHERE salary = (
    SELECT MAX(salary)
    FROM employees e2
    WHERE e2.department_id = e1.department_id
);
-- 3. Customers with their latest order date
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date = (
        SELECT MAX(order_date)
        FROM orders o2
        WHERE o2.customer_id = c.customer_id
    )
);Converting Between Types
Correlated to Non-Correlated (When Possible)
-- Correlated version
SELECT name FROM employees e1
WHERE salary > (
    SELECT AVG(salary) FROM employees e2 
    WHERE e2.department_id = e1.department_id
);
-- Non-correlated version using window functions
SELECT name
FROM (
    SELECT 
        name,
        salary,
        AVG(salary) OVER (PARTITION BY department_id) as dept_avg
    FROM employees
) t
WHERE salary > dept_avg;Non-Correlated to Correlated
-- Non-correlated version
SELECT name FROM employees
WHERE department_id IN (
    SELECT department_id FROM departments WHERE location = 'New York'
);
-- Correlated version (less efficient)
SELECT name FROM employees e
WHERE EXISTS (
    SELECT 1 FROM departments d 
    WHERE d.department_id = e.department_id 
    AND d.location = 'New York'
);When to Use Each Type
Use Non-Correlated When:
- Subquery result is independent of outer query
 - Performance is critical
 - Subquery returns a small result set
 - Logic can be expressed without row-by-row comparison
 
Use Correlated When:
- Need row-by-row comparison with outer query
 - Logic requires context from outer query
 - Using EXISTS/NOT EXISTS for existence checks
 - Finding records based on relationships within the same table
 
Optimization Strategies
1. Replace Correlated with JOINs
-- Slow correlated subquery
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
);
-- Faster JOIN
SELECT DISTINCT c.customer_name
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id;2. Use Window Functions
-- Instead of correlated subquery
SELECT name, salary
FROM employees e1
WHERE salary = (
    SELECT MAX(salary) FROM employees e2 
    WHERE e2.department_id = e1.department_id
);
-- Use window function
SELECT name, salary
FROM (
    SELECT 
        name, 
        salary,
        MAX(salary) OVER (PARTITION BY department_id) as max_dept_salary
    FROM employees
) t
WHERE salary = max_dept_salary;Common Patterns
EXISTS Pattern (Usually Correlated)
-- Find customers who have placed orders
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
);IN Pattern (Usually Non-Correlated)
-- Find employees in specific departments
SELECT name
FROM employees
WHERE department_id IN (
    SELECT department_id FROM departments WHERE budget > 100000
);Interview Tips
- Understand execution patterns: non-correlated runs once, correlated runs per row
 - Know performance implications: non-correlated is generally faster
 - Practice identifying which type a subquery is
 - Learn to convert between types when possible
 - Understand when each type is most appropriate
 
Test Your Knowledge
Take a quick quiz to test your understanding of this topic.