Proximity analysis is one of the most fundamental operations in GIS, answering the critical question "what's near what?" Whether you're analyzing service coverage, assessing environmental impacts, or planning emergency response, understanding distance relationships is essential. PostGIS provides powerful tools for proximity analysis through distance functions and buffer operations that enable sophisticated spatial analysis.
Understanding Distance-Based Analysis
Distance-based spatial analysis helps us understand spatial relationships by measuring and analyzing proximity between geographic features. This type of analysis is crucial for:
- Service area planning and coverage analysis
- Emergency response and accessibility studies
- Environmental impact assessment
- Market analysis and customer catchment areas
- Transportation and logistics optimization
- Risk assessment and safety planning
Function | Purpose | Performance | Best Use Case |
---|---|---|---|
ST_Distance | Calculate exact distance between geometries | Slower for large datasets | Precise measurements, analysis |
ST_DWithin | Test if geometries are within distance | Fast with spatial indexes | Proximity filtering, selection |
ST_Buffer | Create zones around geometries | Medium, depends on complexity | Service areas, impact zones |
KNN (<-> ) |
Find nearest neighbors efficiently | Very fast with indexes | Nearest facility, routing |
Setting Up Sample Data
Let's create a realistic dataset representing hospitals, residential areas, and transportation infrastructure:
-- Create hospitals table
CREATE TABLE hospitals (
id SERIAL PRIMARY KEY,
name TEXT,
hospital_type TEXT,
bed_capacity INTEGER,
emergency_services BOOLEAN,
geom GEOMETRY(Point, 4326)
);
-- Create residential areas table
CREATE TABLE residential_areas (
id SERIAL PRIMARY KEY,
area_name TEXT,
population INTEGER,
income_level TEXT,
geom GEOMETRY(POLYGON, 4326)
);
-- Create roads table
CREATE TABLE roads (
id SERIAL PRIMARY KEY,
name TEXT,
road_type TEXT,
speed_limit INTEGER,
geom GEOMETRY(LINESTRING, 4326)
);
-- Insert sample hospitals (Delhi and Mumbai coordinates)
INSERT INTO hospitals (name, hospital_type, bed_capacity, emergency_services, geom) VALUES
('All India Institute of Medical Sciences', 'Government', 2500, true, ST_Point(77.2090, 28.5672)),
('Apollo Hospital Delhi', 'Private', 800, true, ST_Point(77.2290, 28.5420)),
('Fortis Hospital', 'Private', 600, true, ST_Point(77.2180, 28.5950)),
('Safdarjung Hospital', 'Government', 1500, true, ST_Point(77.1950, 28.5680)),
('Tata Memorial Hospital Mumbai', 'Specialized', 1200, true, ST_Point(72.8560, 19.0110)),
('Kokilaben Hospital Mumbai', 'Private', 750, true, ST_Point(72.8290, 19.1350)),
('King Edward Memorial Hospital', 'Government', 1800, true, ST_Point(72.8310, 18.9890));
-- Insert residential areas
INSERT INTO residential_areas (area_name, population, income_level, geom) VALUES
('Connaught Place', 25000, 'High', ST_GeomFromText('POLYGON((77.215 28.625, 77.225 28.625, 77.225 28.635, 77.215 28.635, 77.215 28.625))', 4326)),
('Karol Bagh', 45000, 'Medium', ST_GeomFromText('POLYGON((77.185 28.645, 77.200 28.645, 77.200 28.660, 77.185 28.660, 77.185 28.645))', 4326)),
('Lajpat Nagar', 35000, 'Medium', ST_GeomFromText('POLYGON((77.235 28.565, 77.250 28.565, 77.250 28.580, 77.235 28.580, 77.235 28.565))', 4326)),
('Dwarka', 60000, 'Medium', ST_GeomFromText('POLYGON((77.045 28.585, 77.070 28.585, 77.070 28.610, 77.045 28.610, 77.045 28.585))', 4326)),
('Bandra Mumbai', 40000, 'High', ST_GeomFromText('POLYGON((72.825 19.050, 72.840 19.050, 72.840 19.065, 72.825 19.065, 72.825 19.050))', 4326)),
('Andheri Mumbai', 55000, 'Medium', ST_GeomFromText('POLYGON((72.835 19.115, 72.855 19.115, 72.855 19.135, 72.835 19.135, 72.835 19.115))', 4326));
-- Insert major roads
INSERT INTO roads (name, road_type, speed_limit, geom) VALUES
('Ring Road Delhi', 'Highway', 80, ST_GeomFromText('LINESTRING(77.150 28.550, 77.200 28.580, 77.250 28.600, 77.280 28.640)', 4326)),
('NH-1 Delhi', 'National Highway', 100, ST_GeomFromText('LINESTRING(77.180 28.500, 77.210 28.580, 77.240 28.660)', 4326)),
('Western Express Highway Mumbai', 'Highway', 80, ST_GeomFromText('LINESTRING(72.820 18.980, 72.830 19.050, 72.840 19.120)', 4326)),
('Eastern Express Highway Mumbai', 'Highway', 80, ST_GeomFromText('LINESTRING(72.880 19.000, 72.870 19.070, 72.860 19.140)', 4326));
-- Create spatial indexes
CREATE INDEX idx_hospitals_geom ON hospitals USING GIST (geom);
CREATE INDEX idx_residential_geom ON residential_areas USING GIST (geom);
CREATE INDEX idx_roads_geom ON roads USING GIST (geom);
-- Create additional indexes for performance
CREATE INDEX idx_hospitals_type ON hospitals (hospital_type);
CREATE INDEX idx_hospitals_emergency ON hospitals (emergency_services);
Using ST_Distance for Precise Measurements
ST_Distance
calculates the exact distance between geometries. While not optimized for large-scale filtering, it's essential for precise measurements and analysis.
Basic Distance Calculations
-- Calculate distances from residential areas to nearest hospitals
SELECT
r.area_name,
r.population,
h.name AS nearest_hospital,
h.hospital_type,
ROUND(ST_Distance(r.geom::geography, h.geom::geography) / 1000, 2) AS distance_km
FROM residential_areas r
CROSS JOIN LATERAL (
SELECT name, hospital_type, geom
FROM hospitals
ORDER BY r.geom <-> geom
LIMIT 1
) h
ORDER BY distance_km;
-- Distance analysis by hospital type
SELECT
r.area_name,
r.population,
r.income_level,
h.hospital_type,
h.name AS hospital_name,
ROUND(ST_Distance(r.geom::geography, h.geom::geography) / 1000, 2) AS distance_km
FROM residential_areas r
CROSS JOIN hospitals h
ORDER BY r.area_name, distance_km;
Statistical Distance Analysis
-- Analyze healthcare accessibility by income level
WITH distance_analysis AS (
SELECT
r.income_level,
r.area_name,
r.population,
MIN(ST_Distance(r.geom::geography, h.geom::geography)) / 1000 AS nearest_hospital_km
FROM residential_areas r
CROSS JOIN hospitals h
GROUP BY r.id, r.income_level, r.area_name, r.population
)
SELECT
income_level,
COUNT(*) AS area_count,
SUM(population) AS total_population,
ROUND(AVG(nearest_hospital_km), 2) AS avg_distance_km,
ROUND(MIN(nearest_hospital_km), 2) AS min_distance_km,
ROUND(MAX(nearest_hospital_km), 2) AS max_distance_km,
ROUND(STDDEV(nearest_hospital_km), 2) AS stddev_distance_km
FROM distance_analysis
GROUP BY income_level
ORDER BY avg_distance_km;
-- Population-weighted accessibility analysis
WITH weighted_distances AS (
SELECT
r.area_name,
r.population,
r.income_level,
MIN(ST_Distance(r.geom::geography, h.geom::geography)) / 1000 AS nearest_hospital_km
FROM residential_areas r
CROSS JOIN hospitals h
GROUP BY r.id, r.area_name, r.population, r.income_level
)
SELECT
'Overall' AS category,
SUM(population * nearest_hospital_km) / SUM(population) AS weighted_avg_distance_km,
SUM(population) AS total_population
FROM weighted_distances
UNION ALL
SELECT
income_level AS category,
SUM(population * nearest_hospital_km) / SUM(population) AS weighted_avg_distance_km,
SUM(population) AS total_population
FROM weighted_distances
GROUP BY income_level
ORDER BY weighted_avg_distance_km;
Using ST_DWithin for Efficient Proximity Filtering
ST_DWithin
is optimized for spatial filtering, making it ideal for finding features within a specified distance.
Basic Proximity Queries
-- Find residential areas within 5km of hospitals
SELECT
r.area_name,
r.population,
h.name AS nearby_hospital,
h.hospital_type,
ROUND(ST_Distance(r.geom::geography, h.geom::geography) / 1000, 2) AS distance_km
FROM residential_areas r
JOIN hospitals h ON ST_DWithin(r.geom::geography, h.geom::geography, 5000)
ORDER BY r.area_name, distance_km;
-- Emergency services coverage analysis
SELECT
r.area_name,
r.population,
COUNT(h.id) AS emergency_hospitals_within_10km,
array_agg(h.name ORDER BY ST_Distance(r.geom, h.geom)) AS hospital_names
FROM residential_areas r
LEFT JOIN hospitals h ON ST_DWithin(r.geom::geography, h.geom::geography, 10000)
AND h.emergency_services = true
GROUP BY r.id, r.area_name, r.population
ORDER BY emergency_hospitals_within_10km DESC;
Multi-Distance Analysis
-- Healthcare accessibility at different distance thresholds
WITH accessibility_analysis AS (
SELECT
r.area_name,
r.population,
r.income_level,
COUNT(CASE WHEN ST_DWithin(r.geom::geography, h.geom::geography, 2000) THEN 1 END) AS hospitals_2km,
COUNT(CASE WHEN ST_DWithin(r.geom::geography, h.geom::geography, 5000) THEN 1 END) AS hospitals_5km,
COUNT(CASE WHEN ST_DWithin(r.geom::geography, h.geom::geography, 10000) THEN 1 END) AS hospitals_10km,
COUNT(CASE WHEN ST_DWithin(r.geom::geography, h.geom::geography, 15000) THEN 1 END) AS hospitals_15km
FROM residential_areas r
CROSS JOIN hospitals h
GROUP BY r.id, r.area_name, r.population, r.income_level
)
SELECT
area_name,
population,
income_level,
hospitals_2km,
hospitals_5km,
hospitals_10km,
hospitals_15km,
CASE
WHEN hospitals_2km > 0 THEN 'Excellent'
WHEN hospitals_5km > 0 THEN 'Good'
WHEN hospitals_10km > 0 THEN 'Fair'
WHEN hospitals_15km > 0 THEN 'Poor'
ELSE 'Very Poor'
END AS accessibility_rating
FROM accessibility_analysis
ORDER BY hospitals_2km DESC, hospitals_5km DESC;
Creating Buffer Zones with ST_Buffer
ST_Buffer
creates polygonal zones around geometries, perfect for visualizing service areas and impact zones.
Basic Buffer Creation
-- Create service areas around hospitals
SELECT
h.name,
h.hospital_type,
h.bed_capacity,
ST_Buffer(h.geom::geography, 2000)::geometry AS service_area_2km,
ST_Buffer(h.geom::geography, 5000)::geometry AS service_area_5km,
ST_Buffer(h.geom::geography, 10000)::geometry AS service_area_10km
FROM hospitals h
ORDER BY h.bed_capacity DESC;
-- Calculate buffer areas and population coverage
WITH hospital_buffers AS (
SELECT
h.id,
h.name,
h.hospital_type,
h.bed_capacity,
ST_Buffer(h.geom::geography, 5000)::geometry AS buffer_5km
FROM hospitals h
)
SELECT
hb.name,
hb.hospital_type,
hb.bed_capacity,
ROUND(ST_Area(hb.buffer_5km::geography) / 1000000, 1) AS service_area_km2,
COUNT(r.id) AS residential_areas_served,
SUM(r.population) AS population_served,
ROUND(SUM(r.population)::NUMERIC / hb.bed_capacity, 0) AS people_per_bed
FROM hospital_buffers hb
LEFT JOIN residential_areas r ON ST_Intersects(r.geom, hb.buffer_5km)
GROUP BY hb.id, hb.name, hb.hospital_type, hb.bed_capacity, hb.buffer_5km
ORDER BY population_served DESC;
Multi-Ring Buffer Analysis
-- Create concentric service zones
WITH hospital_rings AS (
SELECT
h.name,
h.hospital_type,
-- Inner ring (0-2km)
ST_Buffer(h.geom::geography, 2000)::geometry AS inner_ring,
-- Middle ring (2-5km)
ST_Difference(
ST_Buffer(h.geom::geography, 5000)::geometry,
ST_Buffer(h.geom::geography, 2000)::geometry
) AS middle_ring,
-- Outer ring (5-10km)
ST_Difference(
ST_Buffer(h.geom::geography, 10000)::geometry,
ST_Buffer(h.geom::geography, 5000)::geometry
) AS outer_ring
FROM hospitals h
WHERE h.emergency_services = true
)
SELECT
hr.name,
hr.hospital_type,
'Inner (0-2km)' AS service_zone,
COUNT(r.id) AS residential_areas,
COALESCE(SUM(r.population), 0) AS population_served
FROM hospital_rings hr
LEFT JOIN residential_areas r ON ST_Intersects(r.geom, hr.inner_ring)
GROUP BY hr.name, hr.hospital_type
UNION ALL
SELECT
hr.name,
hr.hospital_type,
'Middle (2-5km)' AS service_zone,
COUNT(r.id) AS residential_areas,
COALESCE(SUM(r.population), 0) AS population_served
FROM hospital_rings hr
LEFT JOIN residential_areas r ON ST_Intersects(r.geom, hr.middle_ring)
GROUP BY hr.name, hr.hospital_type
UNION ALL
SELECT
hr.name,
hr.hospital_type,
'Outer (5-10km)' AS service_zone,
COUNT(r.id) AS residential_areas,
COALESCE(SUM(r.population), 0) AS population_served
FROM hospital_rings hr
LEFT JOIN residential_areas r ON ST_Intersects(r.geom, hr.outer_ring)
GROUP BY hr.name, hr.hospital_type
ORDER BY name, service_zone;
Advanced Proximity Analysis
Transportation-Based Accessibility
-- Analyze accessibility considering road proximity
WITH road_accessibility AS (
SELECT
r.area_name,
r.population,
MIN(ST_Distance(r.geom::geography, rd.geom::geography)) / 1000 AS nearest_road_km,
MIN(ST_Distance(r.geom::geography, h.geom::geography)) / 1000 AS nearest_hospital_km
FROM residential_areas r
CROSS JOIN roads rd
CROSS JOIN hospitals h
GROUP BY r.id, r.area_name, r.population
)
SELECT
area_name,
population,
ROUND(nearest_road_km, 2) AS road_access_km,
ROUND(nearest_hospital_km, 2) AS hospital_access_km,
CASE
WHEN nearest_road_km < 0.5 AND nearest_hospital_km < 5 THEN 'Excellent Access'
WHEN nearest_road_km < 1.0 AND nearest_hospital_km < 10 THEN 'Good Access'
WHEN nearest_road_km < 2.0 AND nearest_hospital_km < 15 THEN 'Fair Access'
ELSE 'Poor Access'
END AS overall_accessibility
FROM road_accessibility
ORDER BY nearest_hospital_km;
Competitive Analysis
-- Hospital market analysis - overlapping service areas
WITH hospital_competition AS (
SELECT
h1.name AS hospital1,
h2.name AS hospital2,
h1.hospital_type AS type1,
h2.hospital_type AS type2,
ST_Distance(h1.geom::geography, h2.geom::geography) / 1000 AS distance_km,
ST_Area(ST_Intersection(
ST_Buffer(h1.geom::geography, 5000)::geometry,
ST_Buffer(h2.geom::geography, 5000)::geometry
)::geography) / 1000000 AS overlap_area_km2
FROM hospitals h1
JOIN hospitals h2 ON h1.id < h2.id
WHERE ST_DWithin(h1.geom::geography, h2.geom::geography, 10000)
)
SELECT
hospital1,
hospital2,
type1,
type2,
ROUND(distance_km, 2) AS distance_km,
ROUND(overlap_area_km2, 2) AS service_overlap_km2,
CASE
WHEN overlap_area_km2 > 50 THEN 'High Competition'
WHEN overlap_area_km2 > 20 THEN 'Medium Competition'
WHEN overlap_area_km2 > 0 THEN 'Low Competition'
ELSE 'No Competition'
END AS competition_level
FROM hospital_competition
WHERE overlap_area_km2 > 0
ORDER BY overlap_area_km2 DESC;
Combining Distance Functions
-- Comprehensive proximity analysis combining multiple approaches
WITH proximity_comprehensive AS (
SELECT
r.area_name,
r.population,
r.income_level,
-- Nearest hospital analysis
(SELECT h.name
FROM hospitals h
ORDER BY r.geom <-> h.geom
LIMIT 1) AS nearest_hospital,
(SELECT ROUND(ST_Distance(r.geom::geography, h.geom::geography) / 1000, 2)
FROM hospitals h
ORDER BY r.geom <-> h.geom
LIMIT 1) AS nearest_hospital_km,
-- Emergency services analysis
(SELECT COUNT(*)
FROM hospitals h
WHERE ST_DWithin(r.geom::geography, h.geom::geography, 10000)
AND h.emergency_services = true) AS emergency_hospitals_10km,
-- Hospital capacity within range
(SELECT SUM(h.bed_capacity)
FROM hospitals h
WHERE ST_DWithin(r.geom::geography, h.geom::geography, 15000)) AS total_beds_15km,
-- Road accessibility
(SELECT MIN(ST_Distance(r.geom::geography, rd.geom::geography)) / 1000
FROM roads rd) AS nearest_road_km
FROM residential_areas r
)
SELECT
area_name,
population,
income_level,
nearest_hospital,
nearest_hospital_km,
emergency_hospitals_10km,
total_beds_15km,
ROUND(nearest_road_km, 2) AS road_access_km,
ROUND(population::NUMERIC / NULLIF(total_beds_15km, 0), 0) AS people_per_bed_ratio,
CASE
WHEN nearest_hospital_km < 5 AND emergency_hospitals_10km >= 2 THEN 'Excellent'
WHEN nearest_hospital_km < 10 AND emergency_hospitals_10km >= 1 THEN 'Good'
WHEN nearest_hospital_km < 15 THEN 'Fair'
ELSE 'Poor'
END AS healthcare_accessibility_score
FROM proximity_comprehensive
ORDER BY healthcare_accessibility_score, nearest_hospital_km;
Performance Optimization
Technique | Description | Performance Impact | When to Use |
---|---|---|---|
Spatial Indexes | GIST indexes on geometry columns | Dramatic improvement | Always for proximity queries |
KNN Operator (<-> ) |
Efficient nearest neighbor search | Very fast | Finding closest features |
Geography vs Geometry | Use geography for accurate distances | Slower but accurate | Global or large-scale analysis |
Bounding Box Filters | Pre-filter with && operator | Significant improvement | Large datasets |
LATERAL Joins | Efficient for nearest neighbor queries | Better than CROSS JOIN | Finding k-nearest for each feature |
Performance Optimization Examples
-- Optimized nearest neighbor query
EXPLAIN ANALYZE
SELECT r.area_name, h.name, ST_Distance(r.geom::geography, h.geom::geography) / 1000 AS distance_km
FROM residential_areas r
CROSS JOIN LATERAL (
SELECT name, geom
FROM hospitals
ORDER BY r.geom <-> geom
LIMIT 1
) h;
-- Compare with less efficient approach
EXPLAIN ANALYZE
SELECT DISTINCT ON (r.area_name)
r.area_name, h.name, ST_Distance(r.geom::geography, h.geom::geography) / 1000 AS distance_km
FROM residential_areas r
CROSS JOIN hospitals h
ORDER BY r.area_name, r.geom <-> h.geom;
Real-World Applications
Emergency Response Planning
-- Emergency response coverage analysis
WITH emergency_coverage AS (
SELECT
r.area_name,
r.population,
COUNT(h.id) AS emergency_hospitals_5km,
MIN(ST_Distance(r.geom::geography, h.geom::geography)) / 1000 AS nearest_emergency_km,
SUM(h.bed_capacity) AS total_emergency_beds
FROM residential_areas r
LEFT JOIN hospitals h ON ST_DWithin(r.geom::geography, h.geom::geography, 5000)
AND h.emergency_services = true
GROUP BY r.id, r.area_name, r.population
)
SELECT
area_name,
population,
emergency_hospitals_5km,
ROUND(nearest_emergency_km, 2) AS response_distance_km,
total_emergency_beds,
CASE
WHEN emergency_hospitals_5km = 0 THEN 'Critical Gap'
WHEN emergency_hospitals_5km = 1 AND nearest_emergency_km > 3 THEN 'Service Gap'
WHEN emergency_hospitals_5km >= 2 THEN 'Well Served'
ELSE 'Adequate'
END AS emergency_coverage_status
FROM emergency_coverage
ORDER BY
CASE emergency_coverage_status
WHEN 'Critical Gap' THEN 1
WHEN 'Service Gap' THEN 2
WHEN 'Adequate' THEN 3
ELSE 4
END,
population DESC;
Public Health Planning
-- Healthcare equity analysis
WITH equity_analysis AS (
SELECT
r.income_level,
r.area_name,
r.population,
MIN(ST_Distance(r.geom::geography, h.geom::geography)) / 1000 AS nearest_hospital_km,
COUNT(CASE WHEN ST_DWithin(r.geom::geography, h.geom::geography, 10000) THEN 1 END) AS hospitals_10km
FROM residential_areas r
CROSS JOIN hospitals h
GROUP BY r.id, r.income_level, r.area_name, r.population
)
SELECT
income_level,
COUNT(*) AS areas_count,
SUM(population) AS total_population,
ROUND(AVG(nearest_hospital_km), 2) AS avg_distance_km,
ROUND(AVG(hospitals_10km), 1) AS avg_hospitals_nearby,
ROUND(SUM(population * nearest_hospital_km) / SUM(population), 2) AS weighted_avg_distance
FROM equity_analysis
GROUP BY income_level
ORDER BY weighted_avg_distance DESC;
Best Practices
- Use appropriate distance functions: ST_DWithin for filtering, ST_Distance for precise measurements
- Choose correct coordinate systems: Use geography for accurate global distances
- Create spatial indexes: Essential for good performance with proximity queries
- Use KNN operator: Most efficient for nearest neighbor searches
- Consider real-world constraints: Account for transportation networks and barriers
- Validate distance units: Ensure consistency between meters, kilometers, and degrees
- Test with known distances: Verify results with measurable real-world examples
Common Pitfalls and Solutions
Pitfall | Problem | Solution |
---|---|---|
Incorrect distance units | Mixing degrees and meters | Use geography type for metric distances |
Poor performance | No spatial indexes | Create GIST indexes on geometry columns |
Inaccurate results | Using planar calculations globally | Use geography type or appropriate projections |
Inefficient queries | Using ST_Distance for filtering | Use ST_DWithin for proximity filtering |
Complex buffer overlaps | Double-counting in analysis | Use ST_Union to merge overlapping buffers |