Fixed-Length vs. Variable-Length
PostgreSQL offers three core text types:
Type | Description | Maximum Length |
---|---|---|
CHAR(n) |
Fixed-length, blank-padded | 1 to 10,485,76 |
VARCHAR(n) |
Variable-length with limit | 1 to 10,485,76 |
TEXT |
Variable unlimited length | Unlimited |
Use Cases:
CHAR(n)
: ISO country codes (CHAR(2)
)VARCHAR(n)
: Email addresses with known max lengthTEXT
: Descriptions, comments, logs
CREATE TABLE addresses (
id SERIAL PRIMARY KEY,
country_code CHAR(2),
email VARCHAR(255),
notes TEXT
);
INSERT INTO addresses (country_code, email, notes)
VALUES
('US', 'alice@example.com', 'First contact.'),
('IN', 'bob@example.in', 'Follow-up pending.');
Case-Insensitive Text with CITEXT
The citext
extension provides case-insensitive text comparisons:
CREATE EXTENSION IF NOT EXISTS citext;
CREATE TABLE users (
username CITEXT PRIMARY KEY,
full_name TEXT
);
INSERT INTO users (username, full_name)
VALUES ('Alice', 'Alice Smith');
-- This matches regardless of case:
SELECT * FROM users WHERE username = 'alice';
Benefit: Simplifies comparisons and uniqueness.
Domain Types for Validation
Wrap text in a domain with a regex check, e.g., email format:
CREATE DOMAIN email_addr AS VARCHAR(255)
CHECK (VALUE ~* '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$');
CREATE TABLE contacts (
id SERIAL,
email email_addr
);
INSERT INTO contacts (email) VALUES ('valid@example.com');
-- Invalid addresses will be rejected.
Altering Column Types
Convert between types safely:
-- TEXT to VARCHAR(100)
ALTER TABLE addresses
ALTER COLUMN notes TYPE VARCHAR(100)
USING LEFT(notes,100);
Advice: Use USING
to trim or cast values during conversion.
Full-Text Search Basics
PostgreSQL supports full-text search via tsvector
and tsquery
:
CREATE TABLE articles (
id SERIAL,
title TEXT,
body TEXT
);
INSERT INTO articles (title, body)
VALUES ('Postgres Tips', 'Learn about text types and full-text search.');
-- Create a tsvector column for search:
ALTER TABLE articles ADD COLUMN document tsvector;
UPDATE articles SET document = to_tsvector(body);
-- Search for keyword 'text':
SELECT id, title
FROM articles
WHERE document @@ to_tsquery('text');
Indexing Text Columns
- B-tree on
VARCHAR
/TEXT
for equality - GIN on
tsvector
for full-text search
-- Equality index:
CREATE INDEX idx_email ON contacts (email);
-- Full-text index:
CREATE INDEX idx_article_search ON articles USING GIN(document);
Performance and Storage
CHAR(n)
always stores n bytes (plus overhead)VARCHAR(n)
andTEXT
store actual length + 1–4 bytesCITEXT
adds overhead (~5–10 bytes per value)
Use the smallest type that satisfies your needs to save space.
Character Encoding & Collation
What is Character Encoding?
Character encoding determines how characters are stored and interpreted in bytes. PostgreSQL supports many encodings, with UTF8
(Unicode) being the most common.
Why Encoding Matters:
- Without proper encoding, international text like Hindi, Japanese, or emoji can break.
- Text input/output needs consistent encoding across database, client, and application layers.
Check Current Encoding:
SHOW SERVER_ENCODING;
Set Encoding (only at database creation):
CREATE DATABASE international_db WITH ENCODING = 'UTF8';
What is Collation?
Collation defines the rules for sorting and comparing text, such as alphabetic order, case sensitivity, and locale preferences.
Check Column Collation:
SELECT column_name, collation_name
FROM information_schema.columns
WHERE table_name = 'contacts';
Set Collation on Column/Table:
CREATE TABLE cities_de (
name TEXT COLLATE "de_DE"
);
CREATE TABLE cities_en (
name TEXT COLLATE "en_US"
);
List Available Collations:
SELECT * FROM pg_collation;
Encoding vs. Collation Summary
Feature | Purpose | Set At | Affects |
---|---|---|---|
Encoding | Byte representation of characters | Database creation | Input/output |
Collation | Sort and comparison rules | Column/table | ORDER BY, LIKE, = |
Practical Examples
-- Incorrect sorting:
SELECT name FROM cities ORDER BY name;
-- Correct with collation:
SELECT name FROM cities ORDER BY name COLLATE "en_US";
Best Practices
- Always use
UTF8
encoding unless working with legacy systems. - Set collation explicitly if your application supports multiple languages/locales.
- Avoid mixing collations across tables unless needed — it can cause query errors or performance issues.
- Set default collation at database creation:
CREATE DATABASE mydb
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
ENCODING = 'UTF8';