Pattern Matching with LIKE, ILIKE, and SIMILAR TO

PostgreSQL Queries

The Ultimate PostGIS course

Get full access to this course and all its content.

₹3399.00 ₹1299.00
Buy Course Now
Chapter Info
Course The Ultimate PostGIS course
Module PostgreSQL Queries
Chapter Pattern Matching with LIKE, ILIKE, and SIMILAR TO

Chapter Content

Text pattern matching is a fundamental requirement in database applications, from implementing search functionality to data validation and cleanup. PostgreSQL provides powerful pattern matching capabilities through LIKE, ILIKE, and SIMILAR TO operators, each serving different use cases and offering varying levels of sophistication. This comprehensive guide covers all aspects of pattern matching in PostgreSQL.

Introduction to Pattern Matching

Pattern matching allows you to search for specific formats or patterns in text fields rather than exact matches. PostgreSQL offers three main pattern matching tools:

Operator Case Sensitivity Pattern Type Best Use Case
LIKE Case-sensitive Simple wildcards Basic pattern matching
ILIKE Case-insensitive Simple wildcards User-friendly searches
SIMILAR TO Case-sensitive Regular expressions Complex pattern matching

Setting Up Sample Data

Let's create a users table to demonstrate pattern matching capabilities:

CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name TEXT,
  email TEXT,
  phone TEXT,
  company TEXT,
  website TEXT
);

INSERT INTO users (name, email, phone, company, website) VALUES
('Alice Smith', 'alice.smith@example.com', '+1-555-0101', 'TechCorp Inc.', 'www.techcorp.com'),
('Bob Johnson', 'bob.j@example.com', '555.0102', 'DataSoft LLC', 'datasoft.net'),
('Charlie Brown', 'charlie.b@example.com', '(555) 0103', 'WebSolutions', 'websolutions.org'),
('David King', 'king.david@gmail.com', '555-0104', 'King Consulting', 'kingconsult.co.uk'),
('Eva Queen', 'evaq@hotmail.com', '5550105', 'Queen Enterprises', 'queen-ent.com'),
('Frank Miller', 'f.miller@company.co.uk', '+44-20-7946-0958', 'Miller & Associates', 'miller-assoc.co.uk'),
('Grace Lee', 'grace@startup.io', '1-800-STARTUP', 'StartupHub', 'startup.io');

LIKE Operator - Case-Sensitive Pattern Matching

The LIKE operator uses two special wildcard characters:

Wildcard Matches Example Description
% Zero or more characters 'A%' Starts with 'A'
_ Exactly one character 'A_' Two characters starting with 'A'

Basic LIKE Examples

-- Names starting with 'A'
SELECT name FROM users WHERE name LIKE 'A%';

-- Names ending with 'n'
SELECT name FROM users WHERE name LIKE '%n';

-- Names containing 'o'
SELECT name FROM users WHERE name LIKE '%o%';

-- Names with exactly 8 characters
SELECT name FROM users WHERE name LIKE '________';

-- Second character is 'a'
SELECT name FROM users WHERE name LIKE '_a%';

Email Pattern Matching

-- Gmail addresses
SELECT name, email FROM users WHERE email LIKE '%@gmail.com';

-- Emails with single character before @
SELECT name, email FROM users WHERE email LIKE '_@%';

-- Company email addresses (not free providers)
SELECT name, email FROM users 
WHERE email NOT LIKE '%@gmail.com' 
  AND email NOT LIKE '%@hotmail.com'
  AND email NOT LIKE '%@yahoo.com';

Phone Number Patterns

-- Phone numbers with parentheses
SELECT name, phone FROM users WHERE phone LIKE '(%)%';

-- Phone numbers with dashes
SELECT name, phone FROM users WHERE phone LIKE '%-%';

-- International numbers (starting with +)
SELECT name, phone FROM users WHERE phone LIKE '+%';

-- US format numbers
SELECT name, phone FROM users WHERE phone LIKE '___-____' OR phone LIKE '(___) ___-____';

ILIKE Operator - Case-Insensitive Pattern Matching

ILIKE works exactly like LIKE but ignores case differences, making it perfect for user-facing search functionality:

-- Case-insensitive name search
SELECT name FROM users WHERE name ILIKE 'alice%';
SELECT name FROM users WHERE name ILIKE 'ALICE%';
SELECT name FROM users WHERE name ILIKE 'Alice%';
-- All three queries return the same result

-- Flexible company search
SELECT name, company FROM users WHERE company ILIKE '%tech%';
SELECT name, company FROM users WHERE company ILIKE '%CORP%';

-- Email domain search (case-insensitive)
SELECT name, email FROM users WHERE email ILIKE '%@EXAMPLE.COM';

User Search Implementation

-- Flexible user search function
CREATE OR REPLACE FUNCTION search_users(search_term TEXT)
RETURNS TABLE(name TEXT, email TEXT, company TEXT) AS $$
BEGIN
  RETURN QUERY
  SELECT u.name, u.email, u.company
  FROM users u
  WHERE u.name ILIKE '%' || search_term || '%'
     OR u.email ILIKE '%' || search_term || '%'
     OR u.company ILIKE '%' || search_term || '%';
END;
$$ LANGUAGE plpgsql;

-- Usage examples
SELECT * FROM search_users('smith');
SELECT * FROM search_users('TECH');
SELECT * FROM search_users('example');

SIMILAR TO - Regular Expression Pattern Matching

SIMILAR TO uses SQL standard regular expressions, providing more powerful pattern matching capabilities:

Basic SIMILAR TO Syntax

Pattern Meaning Example
| Alternation (OR) '(Alice|Bob)'
* Zero or more 'A*'
+ One or more 'A+'
? Zero or one 'A?'
{n} Exactly n times 'A{3}'
{n,m} Between n and m times 'A{2,4}'
[...] Character class '[A-Z]'

SIMILAR TO Examples

-- Names starting with Alice or Bob
SELECT name FROM users WHERE name SIMILAR TO '(Alice|Bob)%';

-- Email addresses ending with .com or .org
SELECT name, email FROM users WHERE email SIMILAR TO '%\.(com|org)';

-- Phone numbers with exactly 10 digits
SELECT name, phone FROM users WHERE phone SIMILAR TO '%[0-9]{10}%';

-- Names with repeated letters
SELECT name FROM users WHERE name SIMILAR TO '%([A-Za-z])\1%';

-- Complex email validation
SELECT name, email FROM users 
WHERE email SIMILAR TO '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}';

Advanced Pattern Examples

-- UK phone numbers
SELECT name, phone FROM users 
WHERE phone SIMILAR TO '\+44-[0-9]{2}-[0-9]{4}-[0-9]{4}';

-- Website URLs
SELECT name, website FROM users 
WHERE website SIMILAR TO '(www\.)?[a-zA-Z0-9-]+\.(com|net|org|co\.uk|io)';

-- Company names with specific patterns
SELECT name, company FROM users 
WHERE company SIMILAR TO '.*(Inc\.|LLC|Ltd\.).*';

Escaping Special Characters

When you need to match literal special characters, use the ESCAPE clause:

-- Match literal % character
SELECT * FROM users WHERE company LIKE '%\%%' ESCAPE '\';

-- Match literal underscore
SELECT * FROM users WHERE name LIKE '%\_%' ESCAPE '\';

-- Custom escape character
SELECT * FROM users WHERE email LIKE '%#_%' ESCAPE '#';

-- SIMILAR TO escaping
SELECT * FROM users WHERE website SIMILAR TO '%\\.com';

Performance Considerations

Index Usage and Performance

Pattern Type Index Usage Performance Recommendation
'prefix%' Can use B-tree index Fast Preferred for prefix searches
'%suffix' Cannot use standard index Slow on large tables Consider reverse indexes or full-text search
'%middle%' Cannot use standard index Slow on large tables Use trigram indexes or full-text search

Optimizing Pattern Matching

-- Enable trigram extension for better LIKE performance
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Create trigram index
CREATE INDEX idx_users_name_trgm ON users USING gin (name gin_trgm_ops);
CREATE INDEX idx_users_email_trgm ON users USING gin (email gin_trgm_ops);

-- Analyze performance
EXPLAIN ANALYZE
SELECT * FROM users WHERE name ILIKE '%smith%';

-- Compare with and without index
EXPLAIN ANALYZE
SELECT * FROM users WHERE email LIKE '%@example.com';

Performance Testing

-- Create larger dataset for testing
INSERT INTO users (name, email, company)
SELECT 
  'User ' || generate_series(1, 100000),
  'user' || generate_series(1, 100000) || '@test.com',
  'Company ' || (generate_series(1, 100000) % 1000);

-- Test different pattern types
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM users WHERE name LIKE 'User 1%';  -- Fast (prefix)

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM users WHERE name LIKE '%User%';   -- Slower (contains)

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) FROM users WHERE name LIKE '%123';     -- Slowest (suffix)

Real-World Applications

Data Validation and Cleanup

-- Find invalid email formats
SELECT name, email FROM users 
WHERE email NOT SIMILAR TO '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}';

-- Standardize phone number formats
UPDATE users 
SET phone = REGEXP_REPLACE(phone, '[^0-9+]', '', 'g')
WHERE phone SIMILAR TO '%[^0-9+\-\(\) ]%';

-- Find inconsistent company naming
SELECT DISTINCT company FROM users 
WHERE company SIMILAR TO '.*(Inc|Inc\.|Incorporated).*'
ORDER BY company;

Search and Filtering

-- Flexible search with multiple criteria
SELECT name, email, company
FROM users
WHERE (name ILIKE '%' || $1 || '%' 
       OR email ILIKE '%' || $1 || '%' 
       OR company ILIKE '%' || $1 || '%')
  AND ($2 IS NULL OR email LIKE '%@' || $2)
  AND ($3 IS NULL OR phone SIMILAR TO $3);

-- Domain-based filtering
SELECT name, email, 
  CASE 
    WHEN email LIKE '%@gmail.com' THEN 'Gmail'
    WHEN email LIKE '%@hotmail.com' THEN 'Hotmail'
    WHEN email LIKE '%@yahoo.com' THEN 'Yahoo'
    WHEN email SIMILAR TO '%@[a-zA-Z0-9-]+\.(edu|gov)' THEN 'Institutional'
    ELSE 'Corporate'
  END as email_category
FROM users;

Content Management

-- URL validation and categorization
SELECT name, website,
  CASE 
    WHEN website LIKE '%.com' THEN 'Commercial'
    WHEN website LIKE '%.org' THEN 'Organization'
    WHEN website LIKE '%.edu' THEN 'Educational'
    WHEN website LIKE '%.gov' THEN 'Government'
    WHEN website SIMILAR TO '%\.(co\.uk|org\.uk)' THEN 'UK Domain'
    ELSE 'Other'
  END as domain_type
FROM users
WHERE website IS NOT NULL;

Advanced Techniques

Dynamic Pattern Building

-- Build patterns dynamically
CREATE OR REPLACE FUNCTION flexible_search(
  search_term TEXT,
  search_type TEXT DEFAULT 'contains'
) RETURNS TABLE(name TEXT, email TEXT) AS $$
DECLARE
  pattern TEXT;
BEGIN
  CASE search_type
    WHEN 'starts_with' THEN pattern := search_term || '%';
    WHEN 'ends_with' THEN pattern := '%' || search_term;
    WHEN 'exact' THEN pattern := search_term;
    ELSE pattern := '%' || search_term || '%';
  END CASE;
  
  RETURN QUERY
  SELECT u.name, u.email
  FROM users u
  WHERE u.name ILIKE pattern OR u.email ILIKE pattern;
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT * FROM flexible_search('smith', 'starts_with');
SELECT * FROM flexible_search('example.com', 'ends_with');

Pattern Matching with Arrays

-- Search for multiple patterns
SELECT name, email FROM users
WHERE email SIMILAR TO '%(gmail|hotmail|yahoo)\.com';

-- Using ANY with LIKE
SELECT name, company FROM users
WHERE company LIKE ANY(ARRAY['%Tech%', '%Data%', '%Web%']);

-- Complex multi-pattern search
WITH search_patterns AS (
  SELECT unnest(ARRAY['%corp%', '%inc%', '%llc%']) as pattern
)
SELECT DISTINCT u.name, u.company
FROM users u, search_patterns sp
WHERE u.company ILIKE sp.pattern;

Common Pitfalls and Solutions

Pitfall 1: Case Sensitivity Confusion

-- Problem: Case-sensitive LIKE missing results
SELECT * FROM users WHERE name LIKE 'alice%';  -- Returns nothing

-- Solution: Use ILIKE for case-insensitive search
SELECT * FROM users WHERE name ILIKE 'alice%'; -- Returns Alice Smith

Pitfall 2: Performance Issues with Leading Wildcards

-- Problem: Slow query with leading wildcard
SELECT * FROM users WHERE email LIKE '%@example.com';

-- Solution: Use trigram index or restructure query
CREATE INDEX idx_users_email_trgm ON users USING gin (email gin_trgm_ops);

-- Or consider domain extraction
ALTER TABLE users ADD COLUMN email_domain TEXT;
UPDATE users SET email_domain = split_part(email, '@', 2);
CREATE INDEX idx_users_email_domain ON users(email_domain);
SELECT * FROM users WHERE email_domain = 'example.com';

Pitfall 3: Incorrect Escaping

-- Problem: Not escaping special characters
SELECT * FROM users WHERE company LIKE '%&%';  -- Doesn't work as expected

-- Solution: Proper escaping
SELECT * FROM users WHERE company LIKE '%\&%' ESCAPE '\';

Testing and Validation

Pattern Validation Functions

-- Email validation function
CREATE OR REPLACE FUNCTION is_valid_email(email TEXT)
RETURNS BOOLEAN AS $$
BEGIN
  RETURN email SIMILAR TO '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}';
END;
$$ LANGUAGE plpgsql;

-- Phone validation function
CREATE OR REPLACE FUNCTION is_valid_phone(phone TEXT)
RETURNS BOOLEAN AS $$
BEGIN
  RETURN phone SIMILAR TO '(\+?[1-9]\d{1,14}|[0-9\-\(\) ]{10,})';
END;
$$ LANGUAGE plpgsql;

-- Test validation
SELECT name, email, is_valid_email(email) as valid_email,
       phone, is_valid_phone(phone) as valid_phone
FROM users;

Best Practices

  1. Choose the right operator: Use LIKE for simple patterns, ILIKE for case-insensitive searches, SIMILAR TO for complex patterns
  2. Consider performance: Avoid leading wildcards when possible, use appropriate indexes
  3. Escape special characters: Always escape literal special characters in patterns
  4. Validate patterns: Test your patterns thoroughly with edge cases
  5. Use functions for complex logic: Encapsulate complex pattern matching in functions
  6. Document patterns: Comment complex regular expressions for maintainability
  7. Consider alternatives: For complex text search, consider PostgreSQL's full-text search capabilities

Practice Exercises

Try these exercises with the sample data:

  1. Find all users whose names start with 'C'
  2. Find all users with Gmail addresses
  3. Find all names that contain exactly 5 characters
  4. Case-insensitively search for names starting with 'e'
  5. Use SIMILAR TO to find users with email domains ending in .com or .org
  6. Find phone numbers that contain parentheses
  7. Find companies with 'Inc.' or 'LLC' in their names
  8. Validate all email addresses using SIMILAR TO

Exercise Solutions

-- 1. Names starting with 'C'
SELECT name FROM users WHERE name LIKE 'C%';

-- 2. Gmail addresses
SELECT name, email FROM users WHERE email LIKE '%@gmail.com';

-- 3. Names with exactly 5 characters
SELECT name FROM users WHERE name LIKE '_____';

-- 4. Case-insensitive search for names starting with 'e'
SELECT name FROM users WHERE name ILIKE 'e%';

-- 5. Email domains ending in .com or .org
SELECT name, email FROM users WHERE email SIMILAR TO '%\.(com|org)';

-- 6. Phone numbers with parentheses
SELECT name, phone FROM users WHERE phone LIKE '%(%';

-- 7. Companies with Inc. or LLC
SELECT name, company FROM users WHERE company SIMILAR TO '%.*(Inc\.|LLC).*';

-- 8. Email validation
SELECT name, email, 
       email SIMILAR TO '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' as is_valid
FROM users;