Text Data Types

Postgresql datatypes

The Ultimate PostGIS course

Get full access to this course and all its content.

₹3399.00 ₹1299.00
Buy Course Now
Chapter Info
Course The Ultimate PostGIS course
Module Postgresql datatypes
Chapter Text Data Types

Chapter Content

Fixed-Length vs. Variable-Length

PostgreSQL offers three core text types:

Type Description Maximum Length
CHAR(n) Fixed-length, blank-padded 1 to 10,485,76
VARCHAR(n) Variable-length with limit 1 to 10,485,76
TEXT Variable unlimited length Unlimited

Use Cases:

  • CHAR(n): ISO country codes (CHAR(2))
  • VARCHAR(n): Email addresses with known max length
  • TEXT: Descriptions, comments, logs
CREATE TABLE addresses (
  id SERIAL PRIMARY KEY,
  country_code CHAR(2),
  email VARCHAR(255),
  notes TEXT
);

INSERT INTO addresses (country_code, email, notes)
VALUES
  ('US', 'alice@example.com', 'First contact.'),
  ('IN', 'bob@example.in', 'Follow-up pending.');

Case-Insensitive Text with CITEXT

The citext extension provides case-insensitive text comparisons:

CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE users (
  username CITEXT PRIMARY KEY,
  full_name TEXT
);

INSERT INTO users (username, full_name)
VALUES ('Alice', 'Alice Smith');

-- This matches regardless of case:
SELECT * FROM users WHERE username = 'alice';

Benefit: Simplifies comparisons and uniqueness.

Domain Types for Validation

Wrap text in a domain with a regex check, e.g., email format:

CREATE DOMAIN email_addr AS VARCHAR(255)
  CHECK (VALUE ~* '^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$');

CREATE TABLE contacts (
  id SERIAL,
  email email_addr
);

INSERT INTO contacts (email) VALUES ('valid@example.com');
-- Invalid addresses will be rejected.

Altering Column Types

Convert between types safely:

-- TEXT to VARCHAR(100)
ALTER TABLE addresses
  ALTER COLUMN notes TYPE VARCHAR(100)
  USING LEFT(notes,100);

Advice: Use USING to trim or cast values during conversion.

Full-Text Search Basics

PostgreSQL supports full-text search via tsvector and tsquery:

CREATE TABLE articles (
  id SERIAL,
  title TEXT,
  body TEXT
);

INSERT INTO articles (title, body)
VALUES ('Postgres Tips', 'Learn about text types and full-text search.');

-- Create a tsvector column for search:
ALTER TABLE articles ADD COLUMN document tsvector;
UPDATE articles SET document = to_tsvector(body);

-- Search for keyword 'text':
SELECT id, title
FROM articles
WHERE document @@ to_tsquery('text');

Indexing Text Columns

  • B-tree on VARCHAR/TEXT for equality
  • GIN on tsvector for full-text search
-- Equality index:
CREATE INDEX idx_email ON contacts (email);

-- Full-text index:
CREATE INDEX idx_article_search ON articles USING GIN(document);

Performance and Storage

  • CHAR(n) always stores n bytes (plus overhead)
  • VARCHAR(n) and TEXT store actual length + 1–4 bytes
  • CITEXT adds overhead (~5–10 bytes per value)

Use the smallest type that satisfies your needs to save space.

Character Encoding & Collation

What is Character Encoding?

Character encoding determines how characters are stored and interpreted in bytes. PostgreSQL supports many encodings, with UTF8 (Unicode) being the most common.

Why Encoding Matters:

  • Without proper encoding, international text like Hindi, Japanese, or emoji can break.
  • Text input/output needs consistent encoding across database, client, and application layers.

Check Current Encoding:

SHOW SERVER_ENCODING;

Set Encoding (only at database creation):

CREATE DATABASE international_db WITH ENCODING = 'UTF8';

What is Collation?

Collation defines the rules for sorting and comparing text, such as alphabetic order, case sensitivity, and locale preferences.

Check Column Collation:

SELECT column_name, collation_name
FROM information_schema.columns
WHERE table_name = 'contacts';

Set Collation on Column/Table:

CREATE TABLE cities_de (
  name TEXT COLLATE "de_DE"
);

CREATE TABLE cities_en (
  name TEXT COLLATE "en_US"
);

List Available Collations:

SELECT * FROM pg_collation;

Encoding vs. Collation Summary

Feature Purpose Set At Affects
Encoding Byte representation of characters Database creation Input/output
Collation Sort and comparison rules Column/table ORDER BY, LIKE, =

Practical Examples

-- Incorrect sorting:
SELECT name FROM cities ORDER BY name;

-- Correct with collation:
SELECT name FROM cities ORDER BY name COLLATE "en_US";

Best Practices

  • Always use UTF8 encoding unless working with legacy systems.
  • Set collation explicitly if your application supports multiple languages/locales.
  • Avoid mixing collations across tables unless needed — it can cause query errors or performance issues.
  • Set default collation at database creation:
CREATE DATABASE mydb
  LC_COLLATE = 'en_US.UTF-8'
  LC_CTYPE = 'en_US.UTF-8'
  ENCODING = 'UTF8';