Krishna G. Lodha

Chapter Content

Proximity analysis is one of the most fundamental operations in GIS, answering the critical question "what's near what?" Whether you're analyzing service coverage, assessing environmental impacts, or planning emergency response, understanding distance relationships is essential. PostGIS provides powerful tools for proximity analysis through distance functions and buffer operations that enable sophisticated spatial analysis.

Understanding Distance-Based Analysis

Distance-based spatial analysis helps us understand spatial relationships by measuring and analyzing proximity between geographic features. This type of analysis is crucial for:

Service area planning and coverage analysis
Emergency response and accessibility studies
Environmental impact assessment
Market analysis and customer catchment areas
Transportation and logistics optimization
Risk assessment and safety planning

Function	Purpose	Performance	Best Use Case
ST_Distance	Calculate exact distance between geometries	Slower for large datasets	Precise measurements, analysis
ST_DWithin	Test if geometries are within distance	Fast with spatial indexes	Proximity filtering, selection
ST_Buffer	Create zones around geometries	Medium, depends on complexity	Service areas, impact zones
KNN (`<->`)	Find nearest neighbors efficiently	Very fast with indexes	Nearest facility, routing

Setting Up Sample Data

Let's create a realistic dataset representing hospitals, residential areas, and transportation infrastructure:

-- Create hospitals table
CREATE TABLE hospitals (
  id SERIAL PRIMARY KEY,
  name TEXT,
  hospital_type TEXT,
  bed_capacity INTEGER,
  emergency_services BOOLEAN,
  geom GEOMETRY(Point, 4326)
);

-- Create residential areas table
CREATE TABLE residential_areas (
  id SERIAL PRIMARY KEY,
  area_name TEXT,
  population INTEGER,
  income_level TEXT,
  geom GEOMETRY(POLYGON, 4326)
);

-- Create roads table
CREATE TABLE roads (
  id SERIAL PRIMARY KEY,
  name TEXT,
  road_type TEXT,
  speed_limit INTEGER,
  geom GEOMETRY(LINESTRING, 4326)
);

-- Insert sample hospitals (Delhi and Mumbai coordinates)
INSERT INTO hospitals (name, hospital_type, bed_capacity, emergency_services, geom) VALUES
('All India Institute of Medical Sciences', 'Government', 2500, true, ST_Point(77.2090, 28.5672)),
('Apollo Hospital Delhi', 'Private', 800, true, ST_Point(77.2290, 28.5420)),
('Fortis Hospital', 'Private', 600, true, ST_Point(77.2180, 28.5950)),
('Safdarjung Hospital', 'Government', 1500, true, ST_Point(77.1950, 28.5680)),
('Tata Memorial Hospital Mumbai', 'Specialized', 1200, true, ST_Point(72.8560, 19.0110)),
('Kokilaben Hospital Mumbai', 'Private', 750, true, ST_Point(72.8290, 19.1350)),
('King Edward Memorial Hospital', 'Government', 1800, true, ST_Point(72.8310, 18.9890));

-- Insert residential areas
INSERT INTO residential_areas (area_name, population, income_level, geom) VALUES
('Connaught Place', 25000, 'High', ST_GeomFromText('POLYGON((77.215 28.625, 77.225 28.625, 77.225 28.635, 77.215 28.635, 77.215 28.625))', 4326)),
('Karol Bagh', 45000, 'Medium', ST_GeomFromText('POLYGON((77.185 28.645, 77.200 28.645, 77.200 28.660, 77.185 28.660, 77.185 28.645))', 4326)),
('Lajpat Nagar', 35000, 'Medium', ST_GeomFromText('POLYGON((77.235 28.565, 77.250 28.565, 77.250 28.580, 77.235 28.580, 77.235 28.565))', 4326)),
('Dwarka', 60000, 'Medium', ST_GeomFromText('POLYGON((77.045 28.585, 77.070 28.585, 77.070 28.610, 77.045 28.610, 77.045 28.585))', 4326)),
('Bandra Mumbai', 40000, 'High', ST_GeomFromText('POLYGON((72.825 19.050, 72.840 19.050, 72.840 19.065, 72.825 19.065, 72.825 19.050))', 4326)),
('Andheri Mumbai', 55000, 'Medium', ST_GeomFromText('POLYGON((72.835 19.115, 72.855 19.115, 72.855 19.135, 72.835 19.135, 72.835 19.115))', 4326));

-- Insert major roads
INSERT INTO roads (name, road_type, speed_limit, geom) VALUES
('Ring Road Delhi', 'Highway', 80, ST_GeomFromText('LINESTRING(77.150 28.550, 77.200 28.580, 77.250 28.600, 77.280 28.640)', 4326)),
('NH-1 Delhi', 'National Highway', 100, ST_GeomFromText('LINESTRING(77.180 28.500, 77.210 28.580, 77.240 28.660)', 4326)),
('Western Express Highway Mumbai', 'Highway', 80, ST_GeomFromText('LINESTRING(72.820 18.980, 72.830 19.050, 72.840 19.120)', 4326)),
('Eastern Express Highway Mumbai', 'Highway', 80, ST_GeomFromText('LINESTRING(72.880 19.000, 72.870 19.070, 72.860 19.140)', 4326));

-- Create spatial indexes
CREATE INDEX idx_hospitals_geom ON hospitals USING GIST (geom);
CREATE INDEX idx_residential_geom ON residential_areas USING GIST (geom);
CREATE INDEX idx_roads_geom ON roads USING GIST (geom);

-- Create additional indexes for performance
CREATE INDEX idx_hospitals_type ON hospitals (hospital_type);
CREATE INDEX idx_hospitals_emergency ON hospitals (emergency_services);

Using ST_Distance for Precise Measurements

ST_Distance calculates the exact distance between geometries. While not optimized for large-scale filtering, it's essential for precise measurements and analysis.

Basic Distance Calculations

-- Calculate distances from residential areas to nearest hospitals
SELECT 
  r.area_name,
  r.population,
  h.name AS nearest_hospital,
  h.hospital_type,
  ROUND(ST_Distance(r.geom::geography, h.geom::geography) / 1000, 2) AS distance_km
FROM residential_areas r
CROSS JOIN LATERAL (
  SELECT name, hospital_type, geom
  FROM hospitals
  ORDER BY r.geom <-> geom
  LIMIT 1
) h
ORDER BY distance_km;

-- Distance analysis by hospital type
SELECT 
  r.area_name,
  r.population,
  r.income_level,
  h.hospital_type,
  h.name AS hospital_name,
  ROUND(ST_Distance(r.geom::geography, h.geom::geography) / 1000, 2) AS distance_km
FROM residential_areas r
CROSS JOIN hospitals h
ORDER BY r.area_name, distance_km;

Statistical Distance Analysis

-- Analyze healthcare accessibility by income level
WITH distance_analysis AS (
  SELECT 
    r.income_level,
    r.area_name,
    r.population,
    MIN(ST_Distance(r.geom::geography, h.geom::geography)) / 1000 AS nearest_hospital_km
  FROM residential_areas r
  CROSS JOIN hospitals h
  GROUP BY r.id, r.income_level, r.area_name, r.population
)
SELECT 
  income_level,
  COUNT(*) AS area_count,
  SUM(population) AS total_population,
  ROUND(AVG(nearest_hospital_km), 2) AS avg_distance_km,
  ROUND(MIN(nearest_hospital_km), 2) AS min_distance_km,
  ROUND(MAX(nearest_hospital_km), 2) AS max_distance_km,
  ROUND(STDDEV(nearest_hospital_km), 2) AS stddev_distance_km
FROM distance_analysis
GROUP BY income_level
ORDER BY avg_distance_km;

-- Population-weighted accessibility analysis
WITH weighted_distances AS (
  SELECT 
    r.area_name,
    r.population,
    r.income_level,
    MIN(ST_Distance(r.geom::geography, h.geom::geography)) / 1000 AS nearest_hospital_km
  FROM residential_areas r
  CROSS JOIN hospitals h
  GROUP BY r.id, r.area_name, r.population, r.income_level
)
SELECT 
  'Overall' AS category,
  SUM(population * nearest_hospital_km) / SUM(population) AS weighted_avg_distance_km,
  SUM(population) AS total_population
FROM weighted_distances

UNION ALL

SELECT 
  income_level AS category,
  SUM(population * nearest_hospital_km) / SUM(population) AS weighted_avg_distance_km,
  SUM(population) AS total_population
FROM weighted_distances
GROUP BY income_level
ORDER BY weighted_avg_distance_km;

Using ST_DWithin for Efficient Proximity Filtering

ST_DWithin is optimized for spatial filtering, making it ideal for finding features within a specified distance.

Basic Proximity Queries

-- Find residential areas within 5km of hospitals
SELECT 
  r.area_name,
  r.population,
  h.name AS nearby_hospital,
  h.hospital_type,
  ROUND(ST_Distance(r.geom::geography, h.geom::geography) / 1000, 2) AS distance_km
FROM residential_areas r
JOIN hospitals h ON ST_DWithin(r.geom::geography, h.geom::geography, 5000)
ORDER BY r.area_name, distance_km;

-- Emergency services coverage analysis
SELECT 
  r.area_name,
  r.population,
  COUNT(h.id) AS emergency_hospitals_within_10km,
  array_agg(h.name ORDER BY ST_Distance(r.geom, h.geom)) AS hospital_names
FROM residential_areas r
LEFT JOIN hospitals h ON ST_DWithin(r.geom::geography, h.geom::geography, 10000)
  AND h.emergency_services = true
GROUP BY r.id, r.area_name, r.population
ORDER BY emergency_hospitals_within_10km DESC;

Multi-Distance Analysis

-- Healthcare accessibility at different distance thresholds
WITH accessibility_analysis AS (
  SELECT 
    r.area_name,
    r.population,
    r.income_level,
    COUNT(CASE WHEN ST_DWithin(r.geom::geography, h.geom::geography, 2000) THEN 1 END) AS hospitals_2km,
    COUNT(CASE WHEN ST_DWithin(r.geom::geography, h.geom::geography, 5000) THEN 1 END) AS hospitals_5km,
    COUNT(CASE WHEN ST_DWithin(r.geom::geography, h.geom::geography, 10000) THEN 1 END) AS hospitals_10km,
    COUNT(CASE WHEN ST_DWithin(r.geom::geography, h.geom::geography, 15000) THEN 1 END) AS hospitals_15km
  FROM residential_areas r
  CROSS JOIN hospitals h
  GROUP BY r.id, r.area_name, r.population, r.income_level
)
SELECT 
  area_name,
  population,
  income_level,
  hospitals_2km,
  hospitals_5km,
  hospitals_10km,
  hospitals_15km,
  CASE 
    WHEN hospitals_2km > 0 THEN 'Excellent'
    WHEN hospitals_5km > 0 THEN 'Good'
    WHEN hospitals_10km > 0 THEN 'Fair'
    WHEN hospitals_15km > 0 THEN 'Poor'
    ELSE 'Very Poor'
  END AS accessibility_rating
FROM accessibility_analysis
ORDER BY hospitals_2km DESC, hospitals_5km DESC;

Creating Buffer Zones with ST_Buffer

ST_Buffer creates polygonal zones around geometries, perfect for visualizing service areas and impact zones.

Basic Buffer Creation

-- Create service areas around hospitals
SELECT 
  h.name,
  h.hospital_type,
  h.bed_capacity,
  ST_Buffer(h.geom::geography, 2000)::geometry AS service_area_2km,
  ST_Buffer(h.geom::geography, 5000)::geometry AS service_area_5km,
  ST_Buffer(h.geom::geography, 10000)::geometry AS service_area_10km
FROM hospitals h
ORDER BY h.bed_capacity DESC;

-- Calculate buffer areas and population coverage
WITH hospital_buffers AS (
  SELECT 
    h.id,
    h.name,
    h.hospital_type,
    h.bed_capacity,
    ST_Buffer(h.geom::geography, 5000)::geometry AS buffer_5km
  FROM hospitals h
)
SELECT 
  hb.name,
  hb.hospital_type,
  hb.bed_capacity,
  ROUND(ST_Area(hb.buffer_5km::geography) / 1000000, 1) AS service_area_km2,
  COUNT(r.id) AS residential_areas_served,
  SUM(r.population) AS population_served,
  ROUND(SUM(r.population)::NUMERIC / hb.bed_capacity, 0) AS people_per_bed
FROM hospital_buffers hb
LEFT JOIN residential_areas r ON ST_Intersects(r.geom, hb.buffer_5km)
GROUP BY hb.id, hb.name, hb.hospital_type, hb.bed_capacity, hb.buffer_5km
ORDER BY population_served DESC;

Multi-Ring Buffer Analysis

-- Create concentric service zones
WITH hospital_rings AS (
  SELECT 
    h.name,
    h.hospital_type,
    -- Inner ring (0-2km)
    ST_Buffer(h.geom::geography, 2000)::geometry AS inner_ring,
    -- Middle ring (2-5km)
    ST_Difference(
      ST_Buffer(h.geom::geography, 5000)::geometry,
      ST_Buffer(h.geom::geography, 2000)::geometry
    ) AS middle_ring,
    -- Outer ring (5-10km)
    ST_Difference(
      ST_Buffer(h.geom::geography, 10000)::geometry,
      ST_Buffer(h.geom::geography, 5000)::geometry
    ) AS outer_ring
  FROM hospitals h
  WHERE h.emergency_services = true
)
SELECT 
  hr.name,
  hr.hospital_type,
  'Inner (0-2km)' AS service_zone,
  COUNT(r.id) AS residential_areas,
  COALESCE(SUM(r.population), 0) AS population_served
FROM hospital_rings hr
LEFT JOIN residential_areas r ON ST_Intersects(r.geom, hr.inner_ring)
GROUP BY hr.name, hr.hospital_type

UNION ALL

SELECT 
  hr.name,
  hr.hospital_type,
  'Middle (2-5km)' AS service_zone,
  COUNT(r.id) AS residential_areas,
  COALESCE(SUM(r.population), 0) AS population_served
FROM hospital_rings hr
LEFT JOIN residential_areas r ON ST_Intersects(r.geom, hr.middle_ring)
GROUP BY hr.name, hr.hospital_type

UNION ALL

SELECT 
  hr.name,
  hr.hospital_type,
  'Outer (5-10km)' AS service_zone,
  COUNT(r.id) AS residential_areas,
  COALESCE(SUM(r.population), 0) AS population_served
FROM hospital_rings hr
LEFT JOIN residential_areas r ON ST_Intersects(r.geom, hr.outer_ring)
GROUP BY hr.name, hr.hospital_type

ORDER BY name, service_zone;

Advanced Proximity Analysis

Transportation-Based Accessibility

-- Analyze accessibility considering road proximity
WITH road_accessibility AS (
  SELECT 
    r.area_name,
    r.population,
    MIN(ST_Distance(r.geom::geography, rd.geom::geography)) / 1000 AS nearest_road_km,
    MIN(ST_Distance(r.geom::geography, h.geom::geography)) / 1000 AS nearest_hospital_km
  FROM residential_areas r
  CROSS JOIN roads rd
  CROSS JOIN hospitals h
  GROUP BY r.id, r.area_name, r.population
)
SELECT 
  area_name,
  population,
  ROUND(nearest_road_km, 2) AS road_access_km,
  ROUND(nearest_hospital_km, 2) AS hospital_access_km,
  CASE 
    WHEN nearest_road_km < 0.5 AND nearest_hospital_km < 5 THEN 'Excellent Access'
    WHEN nearest_road_km < 1.0 AND nearest_hospital_km < 10 THEN 'Good Access'
    WHEN nearest_road_km < 2.0 AND nearest_hospital_km < 15 THEN 'Fair Access'
    ELSE 'Poor Access'
  END AS overall_accessibility
FROM road_accessibility
ORDER BY nearest_hospital_km;

Competitive Analysis

-- Hospital market analysis - overlapping service areas
WITH hospital_competition AS (
  SELECT 
    h1.name AS hospital1,
    h2.name AS hospital2,
    h1.hospital_type AS type1,
    h2.hospital_type AS type2,
    ST_Distance(h1.geom::geography, h2.geom::geography) / 1000 AS distance_km,
    ST_Area(ST_Intersection(
      ST_Buffer(h1.geom::geography, 5000)::geometry,
      ST_Buffer(h2.geom::geography, 5000)::geometry
    )::geography) / 1000000 AS overlap_area_km2
  FROM hospitals h1
  JOIN hospitals h2 ON h1.id < h2.id
  WHERE ST_DWithin(h1.geom::geography, h2.geom::geography, 10000)
)
SELECT 
  hospital1,
  hospital2,
  type1,
  type2,
  ROUND(distance_km, 2) AS distance_km,
  ROUND(overlap_area_km2, 2) AS service_overlap_km2,
  CASE 
    WHEN overlap_area_km2 > 50 THEN 'High Competition'
    WHEN overlap_area_km2 > 20 THEN 'Medium Competition'
    WHEN overlap_area_km2 > 0 THEN 'Low Competition'
    ELSE 'No Competition'
  END AS competition_level
FROM hospital_competition
WHERE overlap_area_km2 > 0
ORDER BY overlap_area_km2 DESC;

Combining Distance Functions

-- Comprehensive proximity analysis combining multiple approaches
WITH proximity_comprehensive AS (
  SELECT 
    r.area_name,
    r.population,
    r.income_level,
    -- Nearest hospital analysis
    (SELECT h.name 
     FROM hospitals h 
     ORDER BY r.geom <-> h.geom 
     LIMIT 1) AS nearest_hospital,
    (SELECT ROUND(ST_Distance(r.geom::geography, h.geom::geography) / 1000, 2)
     FROM hospitals h 
     ORDER BY r.geom <-> h.geom 
     LIMIT 1) AS nearest_hospital_km,
    -- Emergency services analysis
    (SELECT COUNT(*) 
     FROM hospitals h 
     WHERE ST_DWithin(r.geom::geography, h.geom::geography, 10000)
       AND h.emergency_services = true) AS emergency_hospitals_10km,
    -- Hospital capacity within range
    (SELECT SUM(h.bed_capacity)
     FROM hospitals h 
     WHERE ST_DWithin(r.geom::geography, h.geom::geography, 15000)) AS total_beds_15km,
    -- Road accessibility
    (SELECT MIN(ST_Distance(r.geom::geography, rd.geom::geography)) / 1000
     FROM roads rd) AS nearest_road_km
  FROM residential_areas r
)
SELECT 
  area_name,
  population,
  income_level,
  nearest_hospital,
  nearest_hospital_km,
  emergency_hospitals_10km,
  total_beds_15km,
  ROUND(nearest_road_km, 2) AS road_access_km,
  ROUND(population::NUMERIC / NULLIF(total_beds_15km, 0), 0) AS people_per_bed_ratio,
  CASE 
    WHEN nearest_hospital_km < 5 AND emergency_hospitals_10km >= 2 THEN 'Excellent'
    WHEN nearest_hospital_km < 10 AND emergency_hospitals_10km >= 1 THEN 'Good'
    WHEN nearest_hospital_km < 15 THEN 'Fair'
    ELSE 'Poor'
  END AS healthcare_accessibility_score
FROM proximity_comprehensive
ORDER BY healthcare_accessibility_score, nearest_hospital_km;

Performance Optimization

Technique	Description	Performance Impact	When to Use
Spatial Indexes	GIST indexes on geometry columns	Dramatic improvement	Always for proximity queries
KNN Operator (`<->`)	Efficient nearest neighbor search	Very fast	Finding closest features
Geography vs Geometry	Use geography for accurate distances	Slower but accurate	Global or large-scale analysis
Bounding Box Filters	Pre-filter with && operator	Significant improvement	Large datasets
LATERAL Joins	Efficient for nearest neighbor queries	Better than CROSS JOIN	Finding k-nearest for each feature

Performance Optimization Examples

-- Optimized nearest neighbor query
EXPLAIN ANALYZE
SELECT r.area_name, h.name, ST_Distance(r.geom::geography, h.geom::geography) / 1000 AS distance_km
FROM residential_areas r
CROSS JOIN LATERAL (
  SELECT name, geom
  FROM hospitals
  ORDER BY r.geom <-> geom
  LIMIT 1
) h;

-- Compare with less efficient approach
EXPLAIN ANALYZE
SELECT DISTINCT ON (r.area_name) 
  r.area_name, h.name, ST_Distance(r.geom::geography, h.geom::geography) / 1000 AS distance_km
FROM residential_areas r
CROSS JOIN hospitals h
ORDER BY r.area_name, r.geom <-> h.geom;

Real-World Applications

Emergency Response Planning

-- Emergency response coverage analysis
WITH emergency_coverage AS (
  SELECT 
    r.area_name,
    r.population,
    COUNT(h.id) AS emergency_hospitals_5km,
    MIN(ST_Distance(r.geom::geography, h.geom::geography)) / 1000 AS nearest_emergency_km,
    SUM(h.bed_capacity) AS total_emergency_beds
  FROM residential_areas r
  LEFT JOIN hospitals h ON ST_DWithin(r.geom::geography, h.geom::geography, 5000)
    AND h.emergency_services = true
  GROUP BY r.id, r.area_name, r.population
)
SELECT 
  area_name,
  population,
  emergency_hospitals_5km,
  ROUND(nearest_emergency_km, 2) AS response_distance_km,
  total_emergency_beds,
  CASE 
    WHEN emergency_hospitals_5km = 0 THEN 'Critical Gap'
    WHEN emergency_hospitals_5km = 1 AND nearest_emergency_km > 3 THEN 'Service Gap'
    WHEN emergency_hospitals_5km >= 2 THEN 'Well Served'
    ELSE 'Adequate'
  END AS emergency_coverage_status
FROM emergency_coverage
ORDER BY 
  CASE emergency_coverage_status
    WHEN 'Critical Gap' THEN 1
    WHEN 'Service Gap' THEN 2
    WHEN 'Adequate' THEN 3
    ELSE 4
  END,
  population DESC;

Public Health Planning

-- Healthcare equity analysis
WITH equity_analysis AS (
  SELECT 
    r.income_level,
    r.area_name,
    r.population,
    MIN(ST_Distance(r.geom::geography, h.geom::geography)) / 1000 AS nearest_hospital_km,
    COUNT(CASE WHEN ST_DWithin(r.geom::geography, h.geom::geography, 10000) THEN 1 END) AS hospitals_10km
  FROM residential_areas r
  CROSS JOIN hospitals h
  GROUP BY r.id, r.income_level, r.area_name, r.population
)
SELECT 
  income_level,
  COUNT(*) AS areas_count,
  SUM(population) AS total_population,
  ROUND(AVG(nearest_hospital_km), 2) AS avg_distance_km,
  ROUND(AVG(hospitals_10km), 1) AS avg_hospitals_nearby,
  ROUND(SUM(population * nearest_hospital_km) / SUM(population), 2) AS weighted_avg_distance
FROM equity_analysis
GROUP BY income_level
ORDER BY weighted_avg_distance DESC;

Best Practices

Use appropriate distance functions: ST_DWithin for filtering, ST_Distance for precise measurements
Choose correct coordinate systems: Use geography for accurate global distances
Create spatial indexes: Essential for good performance with proximity queries
Use KNN operator: Most efficient for nearest neighbor searches
Consider real-world constraints: Account for transportation networks and barriers
Validate distance units: Ensure consistency between meters, kilometers, and degrees
Test with known distances: Verify results with measurable real-world examples

Common Pitfalls and Solutions

Pitfall	Problem	Solution
Incorrect distance units	Mixing degrees and meters	Use geography type for metric distances
Poor performance	No spatial indexes	Create GIST indexes on geometry columns
Inaccurate results	Using planar calculations globally	Use geography type or appropriate projections
Inefficient queries	Using ST_Distance for filtering	Use ST_DWithin for proximity filtering
Complex buffer overlaps	Double-counting in analysis	Use ST_Union to merge overlapping buffers

Proximity and Distance Analysis

The Ultimate PostGIS course

Chapter Info