When working with large spatial datasets, the ability to summarize and merge geometries across groups becomes essential. Spatial aggregate functions in PostGIS provide powerful tools for combining geometries, calculating extents, and performing group-wise spatial analysis. This comprehensive guide covers all the key spatial aggregation techniques you need for advanced spatial data processing.
Introduction to Spatial Aggregates
Spatial aggregate functions allow you to summarize spatial data across groups, similar to how traditional SQL aggregates like SUM and COUNT work with numeric data. These functions are essential for:
- Combining all roads in a city into a single network
- Merging forest patches per district
- Getting bounding boxes of all buildings in a region
- Creating simplified representations of complex spatial datasets
- Generating summary statistics for spatial reporting
Function | Purpose | Output Type | Use Case |
---|---|---|---|
ST_Collect | Groups geometries into collection | GeometryCollection | Bundling related features |
ST_Union | Merges overlapping geometries | Single geometry | Dissolving boundaries |
ST_Extent | Bounding box of group | BOX2D | Viewport calculation |
ST_MemUnion | Memory-efficient union | Single geometry | Large dataset merging |
ST_Accum | Array of geometries | Geometry array | Advanced processing |
Setting Up Sample Data
Let's create realistic sample data representing spatial parcels in different districts:
-- Create table for spatial parcels
CREATE TABLE spatial_parcels (
id SERIAL PRIMARY KEY,
district TEXT,
parcel_type TEXT,
area_hectares NUMERIC,
geom GEOMETRY(POLYGON, 4326)
);
-- Insert sample polygons representing parcels in Delhi and Mumbai
INSERT INTO spatial_parcels (district, parcel_type, area_hectares, geom) VALUES
-- Delhi parcels
('Delhi', 'Residential', 2.5, ST_GeomFromText('POLYGON((77.1 28.6, 77.15 28.6, 77.15 28.65, 77.1 28.65, 77.1 28.6))', 4326)),
('Delhi', 'Commercial', 1.8, ST_GeomFromText('POLYGON((77.15 28.6, 77.2 28.6, 77.2 28.65, 77.15 28.65, 77.15 28.6))', 4326)),
('Delhi', 'Industrial', 5.2, ST_GeomFromText('POLYGON((77.2 28.6, 77.3 28.6, 77.3 28.7, 77.2 28.7, 77.2 28.6))', 4326)),
('Delhi', 'Residential', 1.9, ST_GeomFromText('POLYGON((77.1 28.65, 77.15 28.65, 77.15 28.7, 77.1 28.7, 77.1 28.65))', 4326)),
-- Mumbai parcels
('Mumbai', 'Residential', 1.2, ST_GeomFromText('POLYGON((72.8 19.0, 72.85 19.0, 72.85 19.05, 72.8 19.05, 72.8 19.0))', 4326)),
('Mumbai', 'Commercial', 3.1, ST_GeomFromText('POLYGON((72.85 19.0, 72.9 19.0, 72.9 19.1, 72.85 19.1, 72.85 19.0))', 4326)),
('Mumbai', 'Industrial', 4.7, ST_GeomFromText('POLYGON((72.9 19.0, 73.0 19.0, 73.0 19.1, 72.9 19.1, 72.9 19.0))', 4326)),
-- Bangalore parcels
('Bangalore', 'Residential', 2.8, ST_GeomFromText('POLYGON((77.5 12.9, 77.55 12.9, 77.55 12.95, 77.5 12.95, 77.5 12.9))', 4326)),
('Bangalore', 'Commercial', 1.5, ST_GeomFromText('POLYGON((77.55 12.9, 77.6 12.9, 77.6 12.95, 77.55 12.95, 77.55 12.9))', 4326));
-- Create spatial index
CREATE INDEX idx_spatial_parcels_geom ON spatial_parcels USING GIST (geom);
ST_Collect - Grouping Geometries Together
ST_Collect
groups multiple geometries into a GEOMETRYCOLLECTION
without merging or dissolving boundaries. It simply bundles geometries together.
-- Basic collection by district
SELECT
district,
ST_Collect(geom) AS collected_geom,
ST_NumGeometries(ST_Collect(geom)) AS num_parcels
FROM spatial_parcels
GROUP BY district;
-- Collection by parcel type
SELECT
parcel_type,
ST_Collect(geom) AS collected_geom,
COUNT(*) AS parcel_count,
SUM(area_hectares) AS total_area_hectares
FROM spatial_parcels
GROUP BY parcel_type
ORDER BY total_area_hectares DESC;
Advanced ST_Collect Usage
-- Collect with filtering and ordering
SELECT
district,
ST_Collect(geom ORDER BY area_hectares DESC) AS parcels_by_size,
array_agg(area_hectares ORDER BY area_hectares DESC) AS areas_sorted
FROM spatial_parcels
WHERE area_hectares > 2.0
GROUP BY district;
-- Conditional collection
SELECT
district,
ST_Collect(CASE WHEN parcel_type = 'Residential' THEN geom END) AS residential_parcels,
ST_Collect(CASE WHEN parcel_type = 'Commercial' THEN geom END) AS commercial_parcels,
ST_Collect(CASE WHEN parcel_type = 'Industrial' THEN geom END) AS industrial_parcels
FROM spatial_parcels
GROUP BY district;
ST_Union - Merging Geometries Together
ST_Union
actually merges overlapping or adjacent polygons into a single seamless shape, dissolving internal boundaries.
-- Basic union by district
SELECT
district,
ST_Union(geom) AS merged_geom,
ST_Area(ST_Union(geom)) AS total_area_degrees,
COUNT(*) AS original_parcel_count
FROM spatial_parcels
GROUP BY district;
-- Union by parcel type across all districts
SELECT
parcel_type,
ST_Union(geom) AS merged_geom,
ST_Area(ST_Union(geom)) AS total_area_degrees,
SUM(area_hectares) AS total_area_hectares
FROM spatial_parcels
GROUP BY parcel_type;
Comparing ST_Collect vs ST_Union
-- Compare collection vs union results
SELECT
district,
ST_NumGeometries(ST_Collect(geom)) AS collected_parts,
ST_NumGeometries(ST_Union(geom)) AS union_parts,
ST_Area(ST_Collect(geom)) AS collected_area,
ST_Area(ST_Union(geom)) AS union_area,
CASE
WHEN ST_Area(ST_Collect(geom)) = ST_Area(ST_Union(geom)) THEN 'No Overlap'
ELSE 'Has Overlap'
END AS overlap_status
FROM spatial_parcels
GROUP BY district;
Aspect | ST_Collect | ST_Union |
---|---|---|
Boundary handling | Preserves all boundaries | Dissolves internal boundaries |
Overlaps | Preserves overlapping areas | Merges overlapping areas |
Performance | Faster | Slower (more processing) |
Output complexity | Multi-part geometry | Simplified geometry |
Use case | Grouping for analysis | Creating seamless areas |
ST_Extent - Bounding Box of Grouped Features
ST_Extent
returns the bounding box of all geometries in a group as a BOX2D type.
-- Calculate extents by district
SELECT
district,
ST_Extent(geom) AS bbox,
ST_XMin(ST_Extent(geom)) AS min_longitude,
ST_YMin(ST_Extent(geom)) AS min_latitude,
ST_XMax(ST_Extent(geom)) AS max_longitude,
ST_YMax(ST_Extent(geom)) AS max_latitude,
(ST_XMax(ST_Extent(geom)) - ST_XMin(ST_Extent(geom))) *
(ST_YMax(ST_Extent(geom)) - ST_YMin(ST_Extent(geom))) AS bbox_area_degrees
FROM spatial_parcels
GROUP BY district;
-- Overall dataset extent
SELECT
'All Districts' AS scope,
ST_Extent(geom) AS overall_bbox,
COUNT(*) AS total_parcels,
SUM(area_hectares) AS total_area_hectares
FROM spatial_parcels;
Converting Extent to Geometry
-- Convert extent to actual geometry for mapping
SELECT
district,
ST_AsText(ST_Envelope(ST_Union(geom))) AS envelope_geom,
ST_Area(ST_Envelope(ST_Union(geom))) AS envelope_area,
ST_Area(ST_Union(geom)) AS actual_area,
ST_Area(ST_Union(geom)) / ST_Area(ST_Envelope(ST_Union(geom))) AS fill_ratio
FROM spatial_parcels
GROUP BY district;
ST_MemUnion - Memory-Efficient Merging
ST_MemUnion
is a memory-efficient version of ST_Union
, ideal for large datasets where performance is critical.
-- Compare ST_Union vs ST_MemUnion performance
EXPLAIN ANALYZE
SELECT district, ST_Union(geom) AS union_result
FROM spatial_parcels
GROUP BY district;
EXPLAIN ANALYZE
SELECT district, ST_MemUnion(geom) AS memunion_result
FROM spatial_parcels
GROUP BY district;
-- Use ST_MemUnion for large dataset aggregation
SELECT
parcel_type,
ST_MemUnion(geom) AS fast_merged_geom,
COUNT(*) AS parcel_count,
AVG(area_hectares) AS avg_area_hectares
FROM spatial_parcels
GROUP BY parcel_type;
ST_Accum - Array of Geometries
ST_Accum
returns an array of geometries instead of a single collection, useful for advanced processing.
-- Create geometry arrays
SELECT
district,
ST_Accum(geom) AS geom_array,
array_length(ST_Accum(geom), 1) AS array_length,
ST_Accum(area_hectares) AS area_array
FROM spatial_parcels
GROUP BY district;
-- Process geometry arrays
WITH district_arrays AS (
SELECT
district,
ST_Accum(geom) AS geom_array,
ST_Accum(area_hectares) AS area_array
FROM spatial_parcels
GROUP BY district
)
SELECT
district,
array_length(geom_array, 1) AS num_parcels,
geom_array[1] AS first_parcel,
geom_array[array_length(geom_array, 1)] AS last_parcel,
area_array[1] AS first_area,
(SELECT MAX(unnest) FROM unnest(area_array)) AS max_area
FROM district_arrays;
ST_ConcaveHull - Compact Aggregation Geometry
ST_ConcaveHull
creates a tighter-fitting polygon than the convex hull around a group of geometries.
-- Compare different hull types
SELECT
district,
ST_AsText(ST_ConvexHull(ST_Collect(geom))) AS convex_hull,
ST_AsText(ST_ConcaveHull(ST_Collect(geom), 0.99)) AS concave_hull_99,
ST_AsText(ST_ConcaveHull(ST_Collect(geom), 0.8)) AS concave_hull_80,
ST_Area(ST_ConvexHull(ST_Collect(geom))) AS convex_area,
ST_Area(ST_ConcaveHull(ST_Collect(geom), 0.99)) AS concave_area_99
FROM spatial_parcels
GROUP BY district;
-- Urban sprawl analysis using concave hull
SELECT
district,
COUNT(*) AS parcel_count,
ST_Area(ST_Union(geom)) AS actual_area,
ST_Area(ST_ConcaveHull(ST_Collect(geom), 0.9)) AS sprawl_area,
ST_Area(ST_ConcaveHull(ST_Collect(geom), 0.9)) / ST_Area(ST_Union(geom)) AS sprawl_ratio
FROM spatial_parcels
GROUP BY district
ORDER BY sprawl_ratio DESC;
Combining Spatial and Attribute Aggregates
Spatial aggregates can be combined with traditional aggregates for comprehensive analysis.
-- Comprehensive district analysis
SELECT
district,
COUNT(*) AS parcel_count,
SUM(area_hectares) AS total_area_hectares,
AVG(area_hectares) AS avg_area_hectares,
MIN(area_hectares) AS min_area_hectares,
MAX(area_hectares) AS max_area_hectares,
STDDEV(area_hectares) AS area_stddev,
ST_Union(geom) AS district_boundary,
ST_Area(ST_Union(geom)) AS boundary_area_degrees,
ST_Centroid(ST_Union(geom)) AS district_center
FROM spatial_parcels
GROUP BY district
ORDER BY total_area_hectares DESC;
-- Parcel type analysis across districts
SELECT
parcel_type,
COUNT(*) AS total_parcels,
COUNT(DISTINCT district) AS districts_present,
SUM(area_hectares) AS total_area_hectares,
AVG(area_hectares) AS avg_parcel_size,
ST_Union(geom) AS type_footprint,
ST_Area(ST_Union(geom)) AS footprint_area_degrees
FROM spatial_parcels
GROUP BY parcel_type
ORDER BY total_area_hectares DESC;
Advanced Multi-Level Aggregation
-- Hierarchical aggregation: district -> parcel type
WITH district_type_summary AS (
SELECT
district,
parcel_type,
COUNT(*) AS parcels,
SUM(area_hectares) AS area_hectares,
ST_Union(geom) AS type_geom
FROM spatial_parcels
GROUP BY district, parcel_type
)
SELECT
district,
json_object_agg(
parcel_type,
json_build_object(
'parcels', parcels,
'area_hectares', area_hectares,
'geometry', ST_AsGeoJSON(type_geom)
)
) AS parcel_types,
SUM(parcels) AS total_parcels,
SUM(area_hectares) AS total_area_hectares
FROM district_type_summary
GROUP BY district;
Performance Optimization for Large Datasets
Indexing Strategies
-- Create appropriate indexes
CREATE INDEX idx_spatial_parcels_district ON spatial_parcels(district);
CREATE INDEX idx_spatial_parcels_type ON spatial_parcels(parcel_type);
CREATE INDEX idx_spatial_parcels_area ON spatial_parcels(area_hectares);
-- Composite index for common grouping
CREATE INDEX idx_spatial_parcels_district_type ON spatial_parcels(district, parcel_type);
Memory and Performance Considerations
Function | Memory Usage | Performance | Best For |
---|---|---|---|
ST_Collect | Low | Fast | Simple grouping |
ST_Union | High | Slow | Boundary dissolution |
ST_MemUnion | Medium | Medium | Large datasets |
ST_Extent | Very Low | Very Fast | Bounding boxes |
ST_ConcaveHull | Medium | Medium | Shape analysis |
Real-World Applications
Urban Planning Dashboard
-- Create comprehensive urban planning summary
SELECT
district,
COUNT(*) AS total_parcels,
SUM(area_hectares) AS total_area_hectares,
-- Area by type
SUM(CASE WHEN parcel_type = 'Residential' THEN area_hectares ELSE 0 END) AS residential_hectares,
SUM(CASE WHEN parcel_type = 'Commercial' THEN area_hectares ELSE 0 END) AS commercial_hectares,
SUM(CASE WHEN parcel_type = 'Industrial' THEN area_hectares ELSE 0 END) AS industrial_hectares,
-- Percentages
ROUND(100.0 * SUM(CASE WHEN parcel_type = 'Residential' THEN area_hectares ELSE 0 END) / SUM(area_hectares), 1) AS residential_percent,
ROUND(100.0 * SUM(CASE WHEN parcel_type = 'Commercial' THEN area_hectares ELSE 0 END) / SUM(area_hectares), 1) AS commercial_percent,
ROUND(100.0 * SUM(CASE WHEN parcel_type = 'Industrial' THEN area_hectares ELSE 0 END) / SUM(area_hectares), 1) AS industrial_percent,
-- Spatial aggregates
ST_AsGeoJSON(ST_Union(geom)) AS district_boundary,
ST_AsGeoJSON(ST_Centroid(ST_Union(geom))) AS district_center,
ST_Area(ST_Union(geom)) AS boundary_area_degrees
FROM spatial_parcels
GROUP BY district
ORDER BY total_area_hectares DESC;
Environmental Impact Assessment
-- Environmental fragmentation analysis
SELECT
district,
parcel_type,
COUNT(*) AS fragment_count,
SUM(area_hectares) AS total_area,
AVG(area_hectares) AS avg_fragment_size,
-- Fragmentation metrics
ST_Area(ST_Union(geom)) AS total_footprint,
ST_Perimeter(ST_Union(geom)) AS total_perimeter,
ST_Area(ST_ConvexHull(ST_Collect(geom))) AS convex_hull_area,
-- Shape complexity
ST_Perimeter(ST_Union(geom)) / (2 * sqrt(pi() * ST_Area(ST_Union(geom)))) AS shape_index,
ST_Area(ST_Union(geom)) / ST_Area(ST_ConvexHull(ST_Collect(geom))) AS convexity_ratio
FROM spatial_parcels
WHERE parcel_type IN ('Residential', 'Industrial')
GROUP BY district, parcel_type
ORDER BY district, shape_index DESC;
Best Practices
- Choose the right aggregate function: Use ST_Collect for grouping, ST_Union for merging
- Consider performance: Use ST_MemUnion for large datasets
- Index appropriately: Create indexes on grouping columns and geometry
- Validate results: Check for NULL geometries and empty collections
- Combine with attribute aggregates: Create comprehensive summaries
- Use appropriate coordinate systems: Project for accurate area calculations
- Handle edge cases: Account for empty groups and invalid geometries
Common Pitfalls and Solutions
Pitfall | Problem | Solution |
---|---|---|
Memory issues with ST_Union | Large datasets cause memory errors | Use ST_MemUnion or process in batches |
Invalid geometries | Aggregation fails on invalid input | Use ST_MakeValid() before aggregating |
Empty results | Groups with no geometries return NULL | Use COALESCE() or filter empty groups |
Mixed geometry types | Cannot union points with polygons | Filter by geometry type before aggregating |
Coordinate system issues | Inaccurate area calculations | Transform to appropriate projection |