What are aggregate functions in SQL?
Answer
Aggregate functions perform calculations on a set of values and return a single result. They are essential for data analysis, reporting, and statistical operations in SQL.
Common Aggregate Functions
1. COUNT()
Counts the number of rows or non-NULL values.
-- Count all rows
SELECT COUNT(*) FROM employees;
-- Count non-NULL values in a specific column
SELECT COUNT(email) FROM employees;
-- Count distinct values
SELECT COUNT(DISTINCT department) FROM employees;
2. SUM()
Calculates the total sum of numeric values.
-- Total salary expense
SELECT SUM(salary) FROM employees;
-- Sum by department
SELECT department, SUM(salary) as total_salary
FROM employees
GROUP BY department;
3. AVG()
Calculates the average of numeric values.
-- Average salary
SELECT AVG(salary) FROM employees;
-- Average salary by department
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;
4. MIN()
Returns the minimum value.
-- Lowest salary
SELECT MIN(salary) FROM employees;
-- Earliest hire date by department
SELECT department, MIN(hire_date) as earliest_hire
FROM employees
GROUP BY department;
5. MAX()
Returns the maximum value.
-- Highest salary
SELECT MAX(salary) FROM employees;
-- Latest hire date by department
SELECT department, MAX(hire_date) as latest_hire
FROM employees
GROUP BY department;
Sample Data for Examples
CREATE TABLE employees (
emp_id INT PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(50),
salary DECIMAL(10,2),
hire_date DATE
);
INSERT INTO employees VALUES
(1, 'John Doe', 'Engineering', 75000, '2022-01-15'),
(2, 'Jane Smith', 'Marketing', 65000, '2022-03-20'),
(3, 'Mike Johnson', 'Engineering', 80000, '2021-11-10'),
(4, 'Sarah Wilson', 'Marketing', 70000, '2023-02-05'),
(5, 'Tom Brown', 'Engineering', 85000, '2021-08-30');
Practical Examples
Basic Statistics
-- Overall company statistics
SELECT
COUNT(*) as total_employees,
SUM(salary) as total_payroll,
AVG(salary) as average_salary,
MIN(salary) as lowest_salary,
MAX(salary) as highest_salary
FROM employees;
-- Result:
-- total_employees | total_payroll | average_salary | lowest_salary | highest_salary
-- 5 | 375000 | 75000 | 65000 | 85000
Department Analysis
-- Statistics by department
SELECT
department,
COUNT(*) as employee_count,
AVG(salary) as avg_salary,
MIN(salary) as min_salary,
MAX(salary) as max_salary
FROM employees
GROUP BY department;
-- Result:
-- department | employee_count | avg_salary | min_salary | max_salary
-- Engineering | 3 | 80000 | 75000 | 85000
-- Marketing | 2 | 67500 | 65000 | 70000
Advanced Examples
-- Departments with more than 2 employees
SELECT department, COUNT(*) as emp_count
FROM employees
GROUP BY department
HAVING COUNT(*) > 2;
-- Salary ranges by department
SELECT
department,
MAX(salary) - MIN(salary) as salary_range
FROM employees
GROUP BY department;
-- Percentage of total payroll by department
SELECT
department,
SUM(salary) as dept_payroll,
ROUND((SUM(salary) * 100.0 / (SELECT SUM(salary) FROM employees)), 2) as percentage
FROM employees
GROUP BY department;
Aggregate Functions with NULL Values
-- Sample data with NULLs
INSERT INTO employees VALUES (6, 'Alex Green', 'Sales', NULL, '2023-01-10');
-- COUNT(*) includes NULLs, COUNT(column) excludes NULLs
SELECT
COUNT(*) as total_rows, -- Returns 6
COUNT(salary) as non_null_salaries -- Returns 5
FROM employees;
-- Other aggregates ignore NULL values
SELECT
AVG(salary), -- Calculates average of non-NULL values only
SUM(salary) -- Sums non-NULL values only
FROM employees;
String Aggregate Functions
GROUP_CONCAT() / STRING_AGG()
Concatenates values from multiple rows.
-- MySQL syntax
SELECT department, GROUP_CONCAT(name) as employees
FROM employees
GROUP BY department;
-- PostgreSQL/SQL Server syntax
SELECT department, STRING_AGG(name, ', ') as employees
FROM employees
GROUP BY department;
Mathematical Aggregate Functions
STDDEV() and VARIANCE()
-- Standard deviation and variance of salaries
SELECT
department,
STDDEV(salary) as salary_stddev,
VARIANCE(salary) as salary_variance
FROM employees
GROUP BY department;
Window Functions vs Aggregate Functions
-- Aggregate function (one result per group)
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;
-- Window function (keeps all rows)
SELECT
name,
salary,
department,
AVG(salary) OVER (PARTITION BY department) as dept_avg_salary
FROM employees;
Performance Considerations
- Indexes: Aggregate functions benefit from indexes on grouped columns
- WHERE vs HAVING: Filter with WHERE before grouping when possible
- DISTINCT: Use COUNT(DISTINCT) carefully as it can be expensive
- Large datasets: Consider using approximate functions for very large tables
Common Mistakes
❌ Mixing aggregates with non-aggregated columns
-- This will cause an error in most databases
SELECT name, COUNT(*) FROM employees;
✅ Correct usage with GROUP BY
SELECT department, COUNT(*) FROM employees GROUP BY department;
Interview Tips
- Know the five main aggregate functions: COUNT, SUM, AVG, MIN, MAX
- Understand how NULL values are handled
- Practice using aggregates with GROUP BY and HAVING
- Know the difference between COUNT(*) and COUNT(column)
- Be familiar with window functions as an alternative
Test Your Knowledge
Take a quick quiz to test your understanding of this topic.