Text Data Types

Postgresql datatypes

Video Locked

Please log in to watch this video

Log In
Chapter Info
Course The Ultimate PostGIS course
Module Postgresql datatypes
Chapter Text Data Types

Chapter Content

Fixed-Length vs. Variable-Length

PostgreSQL offers three core text types:

Type Description Maximum Length
CHAR(n) Fixed-length, blank-padded 1 to 10,485,76
VARCHAR(n) Variable-length with limit 1 to 10,485,76
TEXT Variable unlimited length Unlimited

Use Cases:

  • CHAR(n): ISO country codes (CHAR(2))
  • VARCHAR(n): Email addresses with known max length
  • TEXT: Descriptions, comments, logs
CREATE TABLE addresses (
  id SERIAL PRIMARY KEY,
  country_code CHAR(2),
  email VARCHAR(255),
  notes TEXT
);

INSERT INTO addresses (country_code, email, notes)
VALUES
  ('US', 'alice@example.com', 'First contact.'),
  ('IN', 'bob@example.in', 'Follow-up pending.');

Case-Insensitive Text with CITEXT

The citext extension provides case-insensitive text comparisons:

CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE users (
  username CITEXT PRIMARY KEY,
  full_name TEXT
);

INSERT INTO users (username, full_name)
VALUES ('Alice', 'Alice Smith');

-- This matches regardless of case:
SELECT * FROM users WHERE username = 'alice';

Benefit: Simplifies comparisons and uniqueness.

Domain Types for Validation

Wrap text in a domain with a regex check, e.g., email format:

CREATE DOMAIN email_addr AS VARCHAR(255)
  CHECK (VALUE ~* '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$');

CREATE TABLE contacts (
  id SERIAL,
  email email_addr
);

INSERT INTO contacts (email) VALUES ('valid@example.com');
-- Invalid addresses will be rejected.

Altering Column Types

Convert between types safely:

-- TEXT to VARCHAR(100)
ALTER TABLE addresses
  ALTER COLUMN notes TYPE VARCHAR(100)
  USING LEFT(notes,100);

Advice: Use USING to trim or cast values during conversion.

Full-Text Search Basics

PostgreSQL supports full-text search via tsvector and tsquery:

CREATE TABLE articles (
  id SERIAL,
  title TEXT,
  body TEXT
);

INSERT INTO articles (title, body)
VALUES ('Postgres Tips', 'Learn about text types and full-text search.');

-- Create a tsvector column for search:
ALTER TABLE articles ADD COLUMN document tsvector;
UPDATE articles SET document = to_tsvector(body);

-- Search for keyword 'text':
SELECT id, title
FROM articles
WHERE document @@ to_tsquery('text');

Indexing Text Columns

  • B-tree on VARCHAR/TEXT for equality
  • GIN on tsvector for full-text search
-- Equality index:
CREATE INDEX idx_email ON contacts (email);

-- Full-text index:
CREATE INDEX idx_article_search ON articles USING GIN(document);

Performance and Storage

  • CHAR(n) always stores n bytes (plus overhead)
  • VARCHAR(n) and TEXT store actual length + 1–4 bytes
  • CITEXT adds overhead (~5–10 bytes per value)

Use the smallest type that satisfies your needs to save space.

Character Encoding & Collation

What is Character Encoding?

Character encoding determines how characters are stored and interpreted in bytes. PostgreSQL supports many encodings, with UTF8 (Unicode) being the most common.

Why Encoding Matters:

  • Without proper encoding, international text like Hindi, Japanese, or emoji can break.
  • Text input/output needs consistent encoding across database, client, and application layers.

Check Current Encoding:

SHOW SERVER_ENCODING;

Set Encoding (only at database creation):

CREATE DATABASE international_db WITH ENCODING = 'UTF8';

What is Collation?

Collation defines the rules for sorting and comparing text, such as alphabetic order, case sensitivity, and locale preferences.

Check Column Collation:

SELECT column_name, collation_name
FROM information_schema.columns
WHERE table_name = 'contacts';

Set Collation on Column/Table:

CREATE TABLE cities_de (
  name TEXT COLLATE "de_DE"
);

CREATE TABLE cities_en (
  name TEXT COLLATE "en_US"
);

List Available Collations:

SELECT * FROM pg_collation;

Encoding vs. Collation Summary

Feature Purpose Set At Affects
Encoding Byte representation of characters Database creation Input/output
Collation Sort and comparison rules Column/table ORDER BY, LIKE, =

Practical Examples

-- Incorrect sorting:
SELECT name FROM cities ORDER BY name;

-- Correct with collation:
SELECT name FROM cities ORDER BY name COLLATE "en_US";

Best Practices

  • Always use UTF8 encoding unless working with legacy systems.
  • Set collation explicitly if your application supports multiple languages/locales.
  • Avoid mixing collations across tables unless needed — it can cause query errors or performance issues.
  • Set default collation at database creation:
CREATE DATABASE mydb
  LC_COLLATE = 'en_US.UTF-8'
  LC_CTYPE = 'en_US.UTF-8'
  ENCODING = 'UTF8';