Spatial Aggregate Functions

Working with data

Video Locked

Please log in to watch this video

Log In
Chapter Info
Course The Ultimate PostGIS course
Module Working with data
Chapter Spatial Aggregate Functions

Chapter Content

When working with large spatial datasets, the ability to summarize and merge geometries across groups becomes essential. Spatial aggregate functions in PostGIS provide powerful tools for combining geometries, calculating extents, and performing group-wise spatial analysis. This comprehensive guide covers all the key spatial aggregation techniques you need for advanced spatial data processing.

Introduction to Spatial Aggregates

Spatial aggregate functions allow you to summarize spatial data across groups, similar to how traditional SQL aggregates like SUM and COUNT work with numeric data. These functions are essential for:

  • Combining all roads in a city into a single network
  • Merging forest patches per district
  • Getting bounding boxes of all buildings in a region
  • Creating simplified representations of complex spatial datasets
  • Generating summary statistics for spatial reporting
Function Purpose Output Type Use Case
ST_Collect Groups geometries into collection GeometryCollection Bundling related features
ST_Union Merges overlapping geometries Single geometry Dissolving boundaries
ST_Extent Bounding box of group BOX2D Viewport calculation
ST_MemUnion Memory-efficient union Single geometry Large dataset merging
ST_Accum Array of geometries Geometry array Advanced processing

Setting Up Sample Data

Let's create realistic sample data representing spatial parcels in different districts:

-- Create table for spatial parcels
CREATE TABLE spatial_parcels (
  id SERIAL PRIMARY KEY,
  district TEXT,
  parcel_type TEXT,
  area_hectares NUMERIC,
  geom GEOMETRY(POLYGON, 4326)
);

-- Insert sample polygons representing parcels in Delhi and Mumbai
INSERT INTO spatial_parcels (district, parcel_type, area_hectares, geom) VALUES
-- Delhi parcels
('Delhi', 'Residential', 2.5, ST_GeomFromText('POLYGON((77.1 28.6, 77.15 28.6, 77.15 28.65, 77.1 28.65, 77.1 28.6))', 4326)),
('Delhi', 'Commercial', 1.8, ST_GeomFromText('POLYGON((77.15 28.6, 77.2 28.6, 77.2 28.65, 77.15 28.65, 77.15 28.6))', 4326)),
('Delhi', 'Industrial', 5.2, ST_GeomFromText('POLYGON((77.2 28.6, 77.3 28.6, 77.3 28.7, 77.2 28.7, 77.2 28.6))', 4326)),
('Delhi', 'Residential', 1.9, ST_GeomFromText('POLYGON((77.1 28.65, 77.15 28.65, 77.15 28.7, 77.1 28.7, 77.1 28.65))', 4326)),

-- Mumbai parcels
('Mumbai', 'Residential', 1.2, ST_GeomFromText('POLYGON((72.8 19.0, 72.85 19.0, 72.85 19.05, 72.8 19.05, 72.8 19.0))', 4326)),
('Mumbai', 'Commercial', 3.1, ST_GeomFromText('POLYGON((72.85 19.0, 72.9 19.0, 72.9 19.1, 72.85 19.1, 72.85 19.0))', 4326)),
('Mumbai', 'Industrial', 4.7, ST_GeomFromText('POLYGON((72.9 19.0, 73.0 19.0, 73.0 19.1, 72.9 19.1, 72.9 19.0))', 4326)),

-- Bangalore parcels
('Bangalore', 'Residential', 2.8, ST_GeomFromText('POLYGON((77.5 12.9, 77.55 12.9, 77.55 12.95, 77.5 12.95, 77.5 12.9))', 4326)),
('Bangalore', 'Commercial', 1.5, ST_GeomFromText('POLYGON((77.55 12.9, 77.6 12.9, 77.6 12.95, 77.55 12.95, 77.55 12.9))', 4326));

-- Create spatial index
CREATE INDEX idx_spatial_parcels_geom ON spatial_parcels USING GIST (geom);

ST_Collect - Grouping Geometries Together

ST_Collect groups multiple geometries into a GEOMETRYCOLLECTION without merging or dissolving boundaries. It simply bundles geometries together.

-- Basic collection by district
SELECT 
  district, 
  ST_Collect(geom) AS collected_geom,
  ST_NumGeometries(ST_Collect(geom)) AS num_parcels
FROM spatial_parcels
GROUP BY district;

-- Collection by parcel type
SELECT 
  parcel_type,
  ST_Collect(geom) AS collected_geom,
  COUNT(*) AS parcel_count,
  SUM(area_hectares) AS total_area_hectares
FROM spatial_parcels
GROUP BY parcel_type
ORDER BY total_area_hectares DESC;

Advanced ST_Collect Usage

-- Collect with filtering and ordering
SELECT 
  district,
  ST_Collect(geom ORDER BY area_hectares DESC) AS parcels_by_size,
  array_agg(area_hectares ORDER BY area_hectares DESC) AS areas_sorted
FROM spatial_parcels
WHERE area_hectares > 2.0
GROUP BY district;

-- Conditional collection
SELECT 
  district,
  ST_Collect(CASE WHEN parcel_type = 'Residential' THEN geom END) AS residential_parcels,
  ST_Collect(CASE WHEN parcel_type = 'Commercial' THEN geom END) AS commercial_parcels,
  ST_Collect(CASE WHEN parcel_type = 'Industrial' THEN geom END) AS industrial_parcels
FROM spatial_parcels
GROUP BY district;

ST_Union - Merging Geometries Together

ST_Union actually merges overlapping or adjacent polygons into a single seamless shape, dissolving internal boundaries.

-- Basic union by district
SELECT 
  district, 
  ST_Union(geom) AS merged_geom,
  ST_Area(ST_Union(geom)) AS total_area_degrees,
  COUNT(*) AS original_parcel_count
FROM spatial_parcels
GROUP BY district;

-- Union by parcel type across all districts
SELECT 
  parcel_type,
  ST_Union(geom) AS merged_geom,
  ST_Area(ST_Union(geom)) AS total_area_degrees,
  SUM(area_hectares) AS total_area_hectares
FROM spatial_parcels
GROUP BY parcel_type;

Comparing ST_Collect vs ST_Union

-- Compare collection vs union results
SELECT 
  district,
  ST_NumGeometries(ST_Collect(geom)) AS collected_parts,
  ST_NumGeometries(ST_Union(geom)) AS union_parts,
  ST_Area(ST_Collect(geom)) AS collected_area,
  ST_Area(ST_Union(geom)) AS union_area,
  CASE 
    WHEN ST_Area(ST_Collect(geom)) = ST_Area(ST_Union(geom)) THEN 'No Overlap'
    ELSE 'Has Overlap'
  END AS overlap_status
FROM spatial_parcels
GROUP BY district;
Aspect ST_Collect ST_Union
Boundary handling Preserves all boundaries Dissolves internal boundaries
Overlaps Preserves overlapping areas Merges overlapping areas
Performance Faster Slower (more processing)
Output complexity Multi-part geometry Simplified geometry
Use case Grouping for analysis Creating seamless areas

ST_Extent - Bounding Box of Grouped Features

ST_Extent returns the bounding box of all geometries in a group as a BOX2D type.

-- Calculate extents by district
SELECT 
  district, 
  ST_Extent(geom) AS bbox,
  ST_XMin(ST_Extent(geom)) AS min_longitude,
  ST_YMin(ST_Extent(geom)) AS min_latitude,
  ST_XMax(ST_Extent(geom)) AS max_longitude,
  ST_YMax(ST_Extent(geom)) AS max_latitude,
  (ST_XMax(ST_Extent(geom)) - ST_XMin(ST_Extent(geom))) * 
  (ST_YMax(ST_Extent(geom)) - ST_YMin(ST_Extent(geom))) AS bbox_area_degrees
FROM spatial_parcels
GROUP BY district;

-- Overall dataset extent
SELECT 
  'All Districts' AS scope,
  ST_Extent(geom) AS overall_bbox,
  COUNT(*) AS total_parcels,
  SUM(area_hectares) AS total_area_hectares
FROM spatial_parcels;

Converting Extent to Geometry

-- Convert extent to actual geometry for mapping
SELECT 
  district,
  ST_AsText(ST_Envelope(ST_Union(geom))) AS envelope_geom,
  ST_Area(ST_Envelope(ST_Union(geom))) AS envelope_area,
  ST_Area(ST_Union(geom)) AS actual_area,
  ST_Area(ST_Union(geom)) / ST_Area(ST_Envelope(ST_Union(geom))) AS fill_ratio
FROM spatial_parcels
GROUP BY district;

ST_MemUnion - Memory-Efficient Merging

ST_MemUnion is a memory-efficient version of ST_Union, ideal for large datasets where performance is critical.

-- Compare ST_Union vs ST_MemUnion performance
EXPLAIN ANALYZE
SELECT district, ST_Union(geom) AS union_result
FROM spatial_parcels
GROUP BY district;

EXPLAIN ANALYZE
SELECT district, ST_MemUnion(geom) AS memunion_result
FROM spatial_parcels
GROUP BY district;

-- Use ST_MemUnion for large dataset aggregation
SELECT 
  parcel_type,
  ST_MemUnion(geom) AS fast_merged_geom,
  COUNT(*) AS parcel_count,
  AVG(area_hectares) AS avg_area_hectares
FROM spatial_parcels
GROUP BY parcel_type;

ST_Accum - Array of Geometries

ST_Accum returns an array of geometries instead of a single collection, useful for advanced processing.

-- Create geometry arrays
SELECT 
  district,
  ST_Accum(geom) AS geom_array,
  array_length(ST_Accum(geom), 1) AS array_length,
  ST_Accum(area_hectares) AS area_array
FROM spatial_parcels
GROUP BY district;

-- Process geometry arrays
WITH district_arrays AS (
  SELECT 
    district,
    ST_Accum(geom) AS geom_array,
    ST_Accum(area_hectares) AS area_array
  FROM spatial_parcels
  GROUP BY district
)
SELECT 
  district,
  array_length(geom_array, 1) AS num_parcels,
  geom_array[1] AS first_parcel,
  geom_array[array_length(geom_array, 1)] AS last_parcel,
  area_array[1] AS first_area,
  (SELECT MAX(unnest) FROM unnest(area_array)) AS max_area
FROM district_arrays;

ST_ConcaveHull - Compact Aggregation Geometry

ST_ConcaveHull creates a tighter-fitting polygon than the convex hull around a group of geometries.

-- Compare different hull types
SELECT 
  district,
  ST_AsText(ST_ConvexHull(ST_Collect(geom))) AS convex_hull,
  ST_AsText(ST_ConcaveHull(ST_Collect(geom), 0.99)) AS concave_hull_99,
  ST_AsText(ST_ConcaveHull(ST_Collect(geom), 0.8)) AS concave_hull_80,
  ST_Area(ST_ConvexHull(ST_Collect(geom))) AS convex_area,
  ST_Area(ST_ConcaveHull(ST_Collect(geom), 0.99)) AS concave_area_99
FROM spatial_parcels
GROUP BY district;

-- Urban sprawl analysis using concave hull
SELECT 
  district,
  COUNT(*) AS parcel_count,
  ST_Area(ST_Union(geom)) AS actual_area,
  ST_Area(ST_ConcaveHull(ST_Collect(geom), 0.9)) AS sprawl_area,
  ST_Area(ST_ConcaveHull(ST_Collect(geom), 0.9)) / ST_Area(ST_Union(geom)) AS sprawl_ratio
FROM spatial_parcels
GROUP BY district
ORDER BY sprawl_ratio DESC;

Combining Spatial and Attribute Aggregates

Spatial aggregates can be combined with traditional aggregates for comprehensive analysis.

-- Comprehensive district analysis
SELECT 
  district,
  COUNT(*) AS parcel_count,
  SUM(area_hectares) AS total_area_hectares,
  AVG(area_hectares) AS avg_area_hectares,
  MIN(area_hectares) AS min_area_hectares,
  MAX(area_hectares) AS max_area_hectares,
  STDDEV(area_hectares) AS area_stddev,
  ST_Union(geom) AS district_boundary,
  ST_Area(ST_Union(geom)) AS boundary_area_degrees,
  ST_Centroid(ST_Union(geom)) AS district_center
FROM spatial_parcels
GROUP BY district
ORDER BY total_area_hectares DESC;

-- Parcel type analysis across districts
SELECT 
  parcel_type,
  COUNT(*) AS total_parcels,
  COUNT(DISTINCT district) AS districts_present,
  SUM(area_hectares) AS total_area_hectares,
  AVG(area_hectares) AS avg_parcel_size,
  ST_Union(geom) AS type_footprint,
  ST_Area(ST_Union(geom)) AS footprint_area_degrees
FROM spatial_parcels
GROUP BY parcel_type
ORDER BY total_area_hectares DESC;

Advanced Multi-Level Aggregation

-- Hierarchical aggregation: district -> parcel type
WITH district_type_summary AS (
  SELECT 
    district,
    parcel_type,
    COUNT(*) AS parcels,
    SUM(area_hectares) AS area_hectares,
    ST_Union(geom) AS type_geom
  FROM spatial_parcels
  GROUP BY district, parcel_type
)
SELECT 
  district,
  json_object_agg(
    parcel_type, 
    json_build_object(
      'parcels', parcels,
      'area_hectares', area_hectares,
      'geometry', ST_AsGeoJSON(type_geom)
    )
  ) AS parcel_types,
  SUM(parcels) AS total_parcels,
  SUM(area_hectares) AS total_area_hectares
FROM district_type_summary
GROUP BY district;

Performance Optimization for Large Datasets

Indexing Strategies

-- Create appropriate indexes
CREATE INDEX idx_spatial_parcels_district ON spatial_parcels(district);
CREATE INDEX idx_spatial_parcels_type ON spatial_parcels(parcel_type);
CREATE INDEX idx_spatial_parcels_area ON spatial_parcels(area_hectares);

-- Composite index for common grouping
CREATE INDEX idx_spatial_parcels_district_type ON spatial_parcels(district, parcel_type);

Memory and Performance Considerations

Function Memory Usage Performance Best For
ST_Collect Low Fast Simple grouping
ST_Union High Slow Boundary dissolution
ST_MemUnion Medium Medium Large datasets
ST_Extent Very Low Very Fast Bounding boxes
ST_ConcaveHull Medium Medium Shape analysis

Real-World Applications

Urban Planning Dashboard

-- Create comprehensive urban planning summary
SELECT 
  district,
  COUNT(*) AS total_parcels,
  SUM(area_hectares) AS total_area_hectares,
  
  -- Area by type
  SUM(CASE WHEN parcel_type = 'Residential' THEN area_hectares ELSE 0 END) AS residential_hectares,
  SUM(CASE WHEN parcel_type = 'Commercial' THEN area_hectares ELSE 0 END) AS commercial_hectares,
  SUM(CASE WHEN parcel_type = 'Industrial' THEN area_hectares ELSE 0 END) AS industrial_hectares,
  
  -- Percentages
  ROUND(100.0 * SUM(CASE WHEN parcel_type = 'Residential' THEN area_hectares ELSE 0 END) / SUM(area_hectares), 1) AS residential_percent,
  ROUND(100.0 * SUM(CASE WHEN parcel_type = 'Commercial' THEN area_hectares ELSE 0 END) / SUM(area_hectares), 1) AS commercial_percent,
  ROUND(100.0 * SUM(CASE WHEN parcel_type = 'Industrial' THEN area_hectares ELSE 0 END) / SUM(area_hectares), 1) AS industrial_percent,
  
  -- Spatial aggregates
  ST_AsGeoJSON(ST_Union(geom)) AS district_boundary,
  ST_AsGeoJSON(ST_Centroid(ST_Union(geom))) AS district_center,
  ST_Area(ST_Union(geom)) AS boundary_area_degrees
  
FROM spatial_parcels
GROUP BY district
ORDER BY total_area_hectares DESC;

Environmental Impact Assessment

-- Environmental fragmentation analysis
SELECT 
  district,
  parcel_type,
  COUNT(*) AS fragment_count,
  SUM(area_hectares) AS total_area,
  AVG(area_hectares) AS avg_fragment_size,
  
  -- Fragmentation metrics
  ST_Area(ST_Union(geom)) AS total_footprint,
  ST_Perimeter(ST_Union(geom)) AS total_perimeter,
  ST_Area(ST_ConvexHull(ST_Collect(geom))) AS convex_hull_area,
  
  -- Shape complexity
  ST_Perimeter(ST_Union(geom)) / (2 * sqrt(pi() * ST_Area(ST_Union(geom)))) AS shape_index,
  ST_Area(ST_Union(geom)) / ST_Area(ST_ConvexHull(ST_Collect(geom))) AS convexity_ratio
  
FROM spatial_parcels
WHERE parcel_type IN ('Residential', 'Industrial')
GROUP BY district, parcel_type
ORDER BY district, shape_index DESC;

Best Practices

  1. Choose the right aggregate function: Use ST_Collect for grouping, ST_Union for merging
  2. Consider performance: Use ST_MemUnion for large datasets
  3. Index appropriately: Create indexes on grouping columns and geometry
  4. Validate results: Check for NULL geometries and empty collections
  5. Combine with attribute aggregates: Create comprehensive summaries
  6. Use appropriate coordinate systems: Project for accurate area calculations
  7. Handle edge cases: Account for empty groups and invalid geometries

Common Pitfalls and Solutions

Pitfall Problem Solution
Memory issues with ST_Union Large datasets cause memory errors Use ST_MemUnion or process in batches
Invalid geometries Aggregation fails on invalid input Use ST_MakeValid() before aggregating
Empty results Groups with no geometries return NULL Use COALESCE() or filter empty groups
Mixed geometry types Cannot union points with polygons Filter by geometry type before aggregating
Coordinate system issues Inaccurate area calculations Transform to appropriate projection