When working with databases, duplicate values in query results can clutter your output and make data analysis more difficult. PostgreSQL's DISTINCT
keyword provides a powerful solution for eliminating duplicate rows from your result sets. This guide covers everything you need to know about using DISTINCT
effectively in your SQL queries.
Understanding the DISTINCT Keyword
The DISTINCT
keyword is used in SQL to remove duplicate rows from the result set. When you're querying data, especially from large tables, you may encounter duplicate rows that aren't what you need for your analysis or reporting.
You place DISTINCT
directly after SELECT
:
SELECT DISTINCT column1 FROM table_name;
This tells PostgreSQL to return only unique values for the specified column(s).
Setting Up Sample Data
Let's create an employees
table to demonstrate DISTINCT
functionality:
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name TEXT,
department TEXT,
salary NUMERIC(10,2),
city TEXT
);
INSERT INTO employees (name, department, salary, city) VALUES
('Alice', 'Engineering', 75000.00, 'New York'),
('Bob', 'HR', 50000.00, 'Chicago'),
('Charlie', 'Engineering', 72000.00, 'New York'),
('Diana', 'Marketing', 65000.00, 'Los Angeles'),
('Eve', 'HR', 51000.00, 'Chicago'),
('Frank', 'Marketing', 62000.00, 'New York'),
('Grace', 'Engineering', 78000.00, 'San Francisco');
Basic DISTINCT Usage
Single Column DISTINCT
Let's say we want to know all departments in our company. Without DISTINCT
:
SELECT department FROM employees;
Result:
Engineering
HR
Engineering
Marketing
HR
Marketing
Engineering
With DISTINCT
:
SELECT DISTINCT department FROM employees;
Result:
Engineering
HR
Marketing
Much cleaner and more useful for analysis!
Multiple Examples
-- Get all unique cities
SELECT DISTINCT city FROM employees;
-- Get all unique salary levels
SELECT DISTINCT salary FROM employees ORDER BY salary DESC;
-- Count unique departments
SELECT COUNT(DISTINCT department) AS department_count FROM employees;
Query Purpose | Example | Use Case |
---|---|---|
Unique categories | SELECT DISTINCT department FROM employees |
Dropdown lists, filters |
Unique values with sorting | SELECT DISTINCT salary FROM employees ORDER BY salary |
Salary ranges, price lists |
Count unique values | SELECT COUNT(DISTINCT city) FROM employees |
Analytics, reporting |
DISTINCT with Multiple Columns
You can use DISTINCT
with multiple columns to find unique combinations:
SELECT DISTINCT department, city FROM employees;
This query removes rows where the combination of department and city is the same, not just individual columns.
Result:
Engineering | New York
HR | Chicago
Marketing | Los Angeles
Marketing | New York
Engineering | San Francisco
Understanding Multi-Column DISTINCT
Concept | Explanation | Example |
---|---|---|
Row-level uniqueness | Entire row combination must be unique | ('Engineering', 'New York') vs ('Engineering', 'Chicago') |
Not column-level | Individual columns can still have duplicates | 'Engineering' appears multiple times but with different cities |
Order matters | Column order affects result presentation | DISTINCT dept, city vs DISTINCT city, dept |
Practical Multi-Column Examples
-- Unique department-salary combinations
SELECT DISTINCT department, salary
FROM employees
ORDER BY department, salary DESC;
-- Unique city-department pairs with employee count
SELECT
department,
city,
COUNT(*) as employee_count
FROM employees
GROUP BY department, city
ORDER BY department, city;
DISTINCT vs GROUP BY
While DISTINCT
and GROUP BY
can sometimes produce similar results, they serve different purposes:
Aspect | DISTINCT | GROUP BY |
---|---|---|
Purpose | Remove duplicates | Group rows for aggregation |
Aggregation | Limited (COUNT only) | Full aggregation support |
Performance | Generally faster for simple deduplication | Better for complex aggregations |
Syntax | SELECT DISTINCT col FROM table |
SELECT col FROM table GROUP BY col |
When to Use Each
-- Use DISTINCT for simple deduplication
SELECT DISTINCT department FROM employees;
-- Use GROUP BY when you need aggregation
SELECT department, COUNT(*) as employee_count
FROM employees
GROUP BY department;
-- DISTINCT with COUNT
SELECT COUNT(DISTINCT department) FROM employees;
-- GROUP BY equivalent (more complex)
SELECT COUNT(*) FROM (
SELECT department FROM employees GROUP BY department
) subquery;
Advanced DISTINCT Techniques
DISTINCT with Expressions
-- Distinct calculated values
SELECT DISTINCT
EXTRACT(YEAR FROM hire_date) as hire_year
FROM employees
ORDER BY hire_year;
-- Distinct string manipulations
SELECT DISTINCT
UPPER(LEFT(name, 1)) as first_letter
FROM employees
ORDER BY first_letter;
-- Distinct conditional values
SELECT DISTINCT
CASE
WHEN salary >= 70000 THEN 'High'
WHEN salary >= 60000 THEN 'Medium'
ELSE 'Low'
END as salary_category
FROM employees;
DISTINCT in Subqueries
-- Find employees in departments with multiple locations
SELECT name, department
FROM employees
WHERE department IN (
SELECT DISTINCT department
FROM employees
GROUP BY department
HAVING COUNT(DISTINCT city) > 1
);
-- Employees with unique salary in their department
SELECT name, department, salary
FROM employees e1
WHERE salary = (
SELECT DISTINCT salary
FROM employees e2
WHERE e2.department = e1.department
AND salary = (
SELECT MAX(salary)
FROM employees e3
WHERE e3.department = e1.department
)
);
Performance Considerations
Impact on Query Performance
Scenario | Performance Impact | Optimization Strategy |
---|---|---|
Small result sets | Minimal impact | Use freely |
Large tables | Can be expensive | Add indexes on DISTINCT columns |
Multiple columns | Higher cost | Consider composite indexes |
Complex expressions | Significant overhead | Use functional indexes |
Optimization Examples
-- Create index to speed up DISTINCT queries
CREATE INDEX idx_employees_department ON employees(department);
CREATE INDEX idx_employees_city_dept ON employees(city, department);
-- Use EXPLAIN to analyze performance
EXPLAIN ANALYZE
SELECT DISTINCT department FROM employees;
-- Compare with and without index
EXPLAIN ANALYZE
SELECT DISTINCT department, city FROM employees;
Real-World Use Cases
Data Analysis and Reporting
-- Customer segmentation
SELECT DISTINCT
CASE
WHEN total_purchases > 1000 THEN 'VIP'
WHEN total_purchases > 500 THEN 'Regular'
ELSE 'New'
END as customer_segment
FROM customer_summary;
-- Product categories for filters
SELECT DISTINCT category
FROM products
WHERE active = true
ORDER BY category;
-- Geographic analysis
SELECT DISTINCT
state,
COUNT(*) OVER (PARTITION BY state) as location_count
FROM store_locations
ORDER BY location_count DESC;
Application Development
-- Populate dropdown menus
SELECT DISTINCT department as value, department as label
FROM employees
WHERE active = true
ORDER BY department;
-- User preference options
SELECT DISTINCT
preference_category,
preference_value
FROM user_preferences
WHERE preference_category = 'notification_type';
-- Data validation
SELECT DISTINCT email_domain
FROM users
WHERE email_domain NOT IN ('gmail.com', 'yahoo.com', 'outlook.com');
Common Pitfalls and Solutions
Pitfall 1: Unnecessary DISTINCT Usage
-- Unnecessary - ID is already unique
SELECT DISTINCT id, name FROM employees;
-- Better - Remove DISTINCT when not needed
SELECT id, name FROM employees;
Pitfall 2: DISTINCT with ORDER BY
-- This might not work as expected
SELECT DISTINCT department
FROM employees
ORDER BY salary; -- Error: salary not in SELECT
-- Correct approaches
SELECT DISTINCT department
FROM employees
ORDER BY department;
-- Or include the ORDER BY column
SELECT DISTINCT department, salary
FROM employees
ORDER BY salary;
Pitfall 3: Performance Issues
-- Slow on large tables
SELECT DISTINCT expensive_function(column) FROM large_table;
-- Better - use GROUP BY with proper indexing
SELECT expensive_function(column)
FROM large_table
GROUP BY expensive_function(column);
Testing and Validation
Verify DISTINCT Results
-- Check if DISTINCT is working correctly
SELECT
COUNT(*) as total_rows,
COUNT(DISTINCT department) as unique_departments
FROM employees;
-- Validate multi-column DISTINCT
SELECT
department,
city,
COUNT(*) as occurrence_count
FROM employees
GROUP BY department, city
HAVING COUNT(*) > 1; -- Should return no rows if DISTINCT worked
Best Practices
- Use DISTINCT purposefully: Don't add it by default—verify that duplicates actually exist
- Consider alternatives: Sometimes GROUP BY with aggregation is more appropriate
- Index appropriately: Create indexes on columns used with DISTINCT for better performance
- Test with real data: Verify that DISTINCT produces the expected results
- Monitor performance: Use EXPLAIN ANALYZE to understand query costs
- Document intent: Make it clear why DISTINCT is necessary in your queries
Practice Exercises
Try these exercises with the sample data:
- List all unique salaries in the employees table
- Show all unique pairs of department and city
- Find the number of unique departments
- Get unique cities where Engineering employees work
- List unique salary ranges (High: >70k, Medium: 60-70k, Low: <60k)
Exercise Solutions
-- 1. Unique salaries
SELECT DISTINCT salary FROM employees ORDER BY salary DESC;
-- 2. Unique department-city pairs
SELECT DISTINCT department, city FROM employees ORDER BY department, city;
-- 3. Count unique departments
SELECT COUNT(DISTINCT department) as unique_departments FROM employees;
-- 4. Cities with Engineering employees
SELECT DISTINCT city FROM employees WHERE department = 'Engineering';
-- 5. Salary ranges
SELECT DISTINCT
CASE
WHEN salary > 70000 THEN 'High'
WHEN salary >= 60000 THEN 'Medium'
ELSE 'Low'
END as salary_range
FROM employees;