Loading spatial data into PostGIS is a fundamental skill for anyone working with geographic information systems. Whether you're importing shapefiles, converting GeoJSON, or creating geometries from coordinate data, PostGIS provides multiple methods to get your spatial data into the database. This guide covers the essential techniques and tools for each approach.
Using GDAL/OGR Tools
GDAL (Geospatial Data Abstraction Library) and its vector counterpart OGR provide powerful command-line tools for converting between different spatial data formats and loading them directly into PostGIS.
Installing GDAL/OGR
# Ubuntu/Debian
sudo apt install gdal-bin
# macOS with Homebrew
brew install gdal
# Windows
# Download from https://trac.osgeo.org/osgeo4w/
Using ogr2ogr
The ogr2ogr command-line tool converts various spatial formats directly to PostGIS:
# Basic syntax
ogr2ogr -f "PostgreSQL" PG:"host=localhost user=postgres dbname=mydb password=mypassword" input_file.shp
# Specify output layer name
ogr2ogr -f "PostgreSQL" PG:"host=localhost user=postgres dbname=mydb password=mypassword" input_file.shp -nln schema_name.table_name
# Specify SRID
ogr2ogr -f "PostgreSQL" PG:"host=localhost user=postgres dbname=mydb password=mypassword" input_file.shp -a_srs EPSG:4326
# Reproject data during import
ogr2ogr -f "PostgreSQL" PG:"host=localhost user=postgres dbname=mydb password=mypassword" input_file.shp -s_srs EPSG:27700 -t_srs EPSG:4326
# Append to existing table
ogr2ogr -f "PostgreSQL" PG:"host=localhost user=postgres dbname=mydb password=mypassword" input_file.shp -append -nln existing_table
Common File Formats Supported by OGR
Format | Extension | Description | Use Case |
---|---|---|---|
Shapefile | .shp | ESRI's vector format | Traditional GIS data exchange |
GeoJSON | .geojson, .json | JSON-based geographic format | Web applications, APIs |
KML/KMZ | .kml, .kmz | Google Earth format | Visualization, GPS data |
GML | .gml | Geography Markup Language | Standards-based exchange |
MapInfo | .tab | MapInfo native format | MapInfo data migration |
CSV | .csv | Comma-separated values with coordinates | Simple point data |
GeoPackage | .gpkg | SQLite-based spatial format | Modern spatial data exchange |
FileGDB | .gdb | ESRI File Geodatabase | ArcGIS data migration |
Example: Converting GeoJSON to PostGIS
ogr2ogr -f "PostgreSQL" PG:"host=localhost user=postgres dbname=spatial_db password=secret" \
cities.geojson \
-nln public.cities \
-a_srs EPSG:4326
Using QGIS
QGIS provides a user-friendly interface for loading spatial data into PostGIS, making it accessible for users who prefer graphical tools.
Using DB Manager
- Connect to the PostGIS database:
  - Go to "Layer" > "Data Source Manager" > "PostgreSQL" tab
  - Create a new connection with your database credentials
  - Test the connection and save it
- Open DB Manager:
  - Go to "Database" > "DB Manager"
  - Expand "PostgreSQL" and select your database connection
- Import the layer:
  - Click the "Import Layer/File" button
  - Select your input file
  - Configure import options (schema, table name, primary key, SRID)
  - Check "Create spatial index" if needed
  - Click "OK" to start the import
Using Processing Toolbox
- Open the Processing Toolbox:
  - Go to "Processing" > "Toolbox"
- Find "Export to PostgreSQL":
  - Search for "Export to PostgreSQL" under "Vector database"
- Run the export:
  - Configure the connection parameters
  - Select the layer to export
  - Configure table name, schema, and other options
  - Run the algorithm
Using Command-Line Tools
shp2pgsql
The shp2pgsql tool converts Shapefiles to SQL statements that can be executed in PostgreSQL:
# Generate SQL from a Shapefile
shp2pgsql -s 4326 -I input_file.shp schema_name.table_name > output.sql
# Pipe directly to psql
shp2pgsql -s 4326 -I input_file.shp schema_name.table_name | psql -d mydb -U postgres
Option | Description | Purpose |
---|---|---|
-s | SRID | Set spatial reference system |
-I | Create a spatial index | Improve query performance |
-D | Use PostgreSQL dump format | Faster loading for large files |
-c | Create new table (default) | Standard import mode |
-a | Append to existing table | Add data to existing table |
-d | Drop table, create new one | Replace existing table |
raster2pgsql
Similar to shp2pgsql, but for raster data:
# Generate SQL from a raster file
raster2pgsql -s 4326 -I -C -M input_file.tif schema_name.table_name > output.sql
# Pipe directly to psql
raster2pgsql -s 4326 -I -C -M input_file.tif schema_name.table_name | psql -d mydb -U postgres
Option | Description | Purpose |
---|---|---|
-s | SRID | Set spatial reference system |
-I | Create a spatial index | Improve query performance |
-C | Apply raster constraints | Enforce data consistency |
-M | Vacuum analyze the table | Update table statistics |
-t | Cut the raster into tiles of the specified size | Optimize for large rasters |
Using SQL to Create Spatial Data
Creating Geometries with SQL Functions
-- Create a point
INSERT INTO places (name, geom)
VALUES ('Seattle', ST_SetSRID(ST_MakePoint(-122.3321, 47.6062), 4326));
-- Create a linestring (ST_MakeLine takes two points or an array of points)
INSERT INTO rivers (name, geom)
VALUES ('Example River',
    ST_SetSRID(ST_MakeLine(ARRAY[
        ST_MakePoint(-122.3, 47.6),
        ST_MakePoint(-122.2, 47.65),
        ST_MakePoint(-122.1, 47.63)
    ]), 4326)
);
-- Create a polygon
INSERT INTO lakes (name, geom)
VALUES ('Example Lake',
ST_SetSRID(ST_MakePolygon(
ST_MakeLine(ARRAY[
ST_MakePoint(-122.3, 47.6),
ST_MakePoint(-122.2, 47.6),
ST_MakePoint(-122.2, 47.7),
ST_MakePoint(-122.3, 47.7),
ST_MakePoint(-122.3, 47.6)
])
), 4326)
);
Creating Geometries from Well-Known Text (WKT)
-- Create a point from WKT
INSERT INTO places (name, geom)
VALUES ('New York', ST_GeomFromText('POINT(-74.0060 40.7128)', 4326));
-- Create a linestring from WKT
INSERT INTO rivers (name, geom)
VALUES ('Hudson River',
ST_GeomFromText('LINESTRING(-74.0060 40.7128, -73.9762 40.7831, -73.9220 41.1058)', 4326)
);
-- Create a polygon from WKT
INSERT INTO lakes (name, geom)
VALUES ('Central Park Reservoir',
ST_GeomFromText('POLYGON((-73.9667 40.7856, -73.9558 40.7856, -73.9558 40.7965, -73.9667 40.7965, -73.9667 40.7856))', 4326)
);
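When geometry data originates in application code, the WKT strings fed to ST_GeomFromText can be built programmatically instead of typed by hand. A minimal Python sketch, assuming coordinates come as (lon, lat) pairs; the helper names are illustrative, not part of any library:

```python
def wkt_point(lon, lat):
    """Build a WKT POINT string for use with ST_GeomFromText."""
    return f"POINT({lon} {lat})"

def wkt_linestring(coords):
    """Build a WKT LINESTRING from a sequence of (lon, lat) pairs."""
    return "LINESTRING(" + ", ".join(f"{x} {y}" for x, y in coords) + ")"

def wkt_polygon(ring):
    """Build a WKT POLYGON from an exterior ring, closing it if needed."""
    pts = list(ring)
    if pts[0] != pts[-1]:
        pts.append(pts[0])  # WKT polygon rings must start and end at the same point
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in pts) + "))"

# Pass the result as a query parameter, e.g.:
#   INSERT INTO places (name, geom) VALUES (%s, ST_GeomFromText(%s, 4326))
```

Generating WKT in code keeps the SQL itself parameterized, which avoids quoting problems with coordinate values.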
Creating Geometries from GeoJSON
-- Create a point from GeoJSON
INSERT INTO places (name, geom)
VALUES ('London',
ST_GeomFromGeoJSON('{"type":"Point","coordinates":[-0.1278,51.5074]}')
);
-- Create a polygon from GeoJSON
INSERT INTO countries (name, geom)
VALUES ('Simple Square',
ST_GeomFromGeoJSON('{"type":"Polygon","coordinates":[[[0,0],[1,0],[1,1],[0,1],[0,0]]]}')
);
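The same function scales to bulk loads: a short script can read a GeoJSON FeatureCollection and hand each feature's geometry to ST_GeomFromGeoJSON as a query parameter. A sketch using only the standard library; the places table and the name property are assumptions carried over from the examples above:

```python
import json

def feature_rows(feature_collection):
    """Yield (name, geometry_json) tuples ready for a parameterized INSERT."""
    for feature in feature_collection["features"]:
        name = feature.get("properties", {}).get("name", "Unnamed")
        # ST_GeomFromGeoJSON expects the geometry object only, not the whole feature
        yield name, json.dumps(feature["geometry"])

fc = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"name": "London"},
         "geometry": {"type": "Point", "coordinates": [-0.1278, 51.5074]}},
    ],
}

rows = list(feature_rows(fc))
# Each row pairs with:
#   INSERT INTO places (name, geom) VALUES (%s, ST_GeomFromGeoJSON(%s))
```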
Converting Coordinates to Geometries
Creating Points from Latitude and Longitude Columns
-- Create a table with latitude and longitude columns
CREATE TABLE locations (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
latitude NUMERIC(10, 7),
longitude NUMERIC(10, 7)
);
-- Insert some data
INSERT INTO locations (name, latitude, longitude) VALUES
('Paris', 48.8566, 2.3522),
('Tokyo', 35.6762, 139.6503),
('Sydney', -33.8688, 151.2093);
-- Add a geometry column
ALTER TABLE locations ADD COLUMN geom GEOMETRY(POINT, 4326);
-- Update the geometry column from lat/lon
UPDATE locations
SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
-- Create an index on the geometry column
CREATE INDEX idx_locations_geom ON locations USING GIST (geom);
Creating Geometries from Address Data (Geocoding)
Using the PostGIS Tiger Geocoder (for US addresses):
-- Install the necessary extensions (postgis_tiger_geocoder depends on fuzzystrmatch)
CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION postgis_tiger_geocoder;
-- Geocode an address
SELECT g.rating, ST_AsText(g.geomout) AS geom, (addy).address, (addy).streetname, (addy).streettypeabbrev
FROM geocode('1600 Pennsylvania Ave NW, Washington, DC 20500') AS g;
Bulk Loading from CSV Files
Loading CSV with Coordinates
-- Create a table for the CSV data
CREATE TABLE csv_locations (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
description TEXT,
latitude NUMERIC(10, 7),
longitude NUMERIC(10, 7)
);
-- Load CSV data using COPY
COPY csv_locations(name, description, latitude, longitude)
FROM '/path/to/locations.csv'
WITH (FORMAT CSV, HEADER true, DELIMITER ',');
-- Add a geometry column
ALTER TABLE csv_locations ADD COLUMN geom GEOMETRY(POINT, 4326);
-- Update the geometry column from lat/lon
UPDATE csv_locations
SET geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
Using ogr2ogr with CSV
# Load CSV with coordinates directly to PostGIS
ogr2ogr -f "PostgreSQL" PG:"host=localhost user=postgres dbname=mydb password=mypassword" \
locations.csv \
-oo X_POSSIBLE_NAMES=longitude,lon,x -oo Y_POSSIBLE_NAMES=latitude,lat,y \
-a_srs EPSG:4326 \
-nln locations
Using Programming Languages
Python with psycopg2 and GeoPandas
import geopandas as gpd
from sqlalchemy import create_engine
# Connect to the database
engine = create_engine('postgresql://postgres:password@localhost:5432/spatial_db')
# Read a shapefile
gdf = gpd.read_file('neighborhoods.shp')
# Assign the CRS if it is missing; use gdf.to_crs(epsg=4326) instead to reproject
gdf = gdf.set_crs(epsg=4326)
# Write to PostGIS
gdf.to_postgis(
name='neighborhoods',
con=engine,
schema='public',
if_exists='replace',
index=True,
index_label='id'
)
Node.js with pg and wkx
const { Client } = require("pg");
const wkx = require("wkx");
const fs = require("fs");
// Connect to the database
const client = new Client({
host: "localhost",
port: 5432,
database: "spatial_db",
user: "postgres",
password: "password",
});
async function loadGeoJSON() {
await client.connect();
// Read GeoJSON file
const geojson = JSON.parse(fs.readFileSync("points.geojson", "utf8"));
// Create table if it doesn't exist
await client.query(`
CREATE TABLE IF NOT EXISTS points (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
geom GEOMETRY(POINT, 4326)
)
`);
// Insert features
for (const feature of geojson.features) {
const name = feature.properties.name || "Unnamed";
const coords = feature.geometry.coordinates;
await client.query(
"INSERT INTO points (name, geom) VALUES ($1, ST_SetSRID(ST_MakePoint($2, $3), 4326))",
[name, coords[0], coords[1]]
);
}
console.log(`Inserted ${geojson.features.length} points`);
await client.end();
}
loadGeoJSON().catch((err) => console.error(err));
Best Practices for Loading Spatial Data
Practice | Description | Benefit |
---|---|---|
Check your SRID | Ensure you're using the correct spatial reference system | Accurate spatial operations |
Create spatial indexes | Add a GiST index on geometry columns | Better query performance |
Validate geometries | Use ST_IsValid to check for invalid geometries | Prevent spatial operation errors |
Use transactions | Wrap bulk operations in transactions | Data consistency and rollback capability |
Consider using COPY | For large datasets, COPY is faster than INSERT | Improved loading performance |
Disable triggers temporarily | For very large imports, disable triggers and constraints | Faster bulk loading |
Use appropriate data types | Choose the right geometry type for your data | Storage efficiency and query optimization |
Check for duplicates | Remove or handle duplicate geometries | Data quality and consistency |
Example of a Well-Structured Import Process
-- Example of a well-structured import process
BEGIN;
-- Create the table
CREATE TABLE parcels (
id SERIAL PRIMARY KEY,
parcel_id VARCHAR(20) UNIQUE NOT NULL,
address VARCHAR(200),
geom GEOMETRY(POLYGON, 4326)
);
-- Create temporary table for raw data
CREATE TEMP TABLE parcels_temp (
parcel_id VARCHAR(20),
address VARCHAR(200),
wkt TEXT
);
-- Load the data (\copy is a psql meta-command, so run this from psql)
\copy parcels_temp FROM 'parcels.csv' WITH CSV HEADER;
-- Process and validate (ST_MakeValid repairs invalid geometries
-- rather than silently dropping those rows)
INSERT INTO parcels (parcel_id, address, geom)
SELECT
    parcel_id,
    address,
    ST_MakeValid(ST_SetSRID(ST_GeomFromText(wkt), 4326))
FROM
    parcels_temp
WHERE
    wkt IS NOT NULL;
-- Create spatial index
CREATE INDEX idx_parcels_geom ON parcels USING GIST (geom);
-- Analyze for query planner
ANALYZE parcels;
COMMIT;
Troubleshooting Common Issues
Issue | Cause | Solution |
---|---|---|
Invalid geometries | Malformed input data | Use ST_MakeValid() or ST_IsValid() to check and fix |
Wrong SRID | Incorrect spatial reference system | Verify and set correct SRID using ST_SetSRID() |
Encoding issues | Character encoding mismatch | Specify correct encoding in COPY or ogr2ogr |
Permission errors | Insufficient database privileges | Grant necessary permissions or use superuser |
Memory issues | Large datasets exceeding memory | Process in smaller batches or increase memory |
Slow imports | Missing indexes or constraints | Temporarily disable indexes during bulk loading |
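For the two most common problems above, invalid geometries and a mislabeled SRID, the repairs look roughly like this; the parcels table is illustrative, and note that ST_SetSRID only relabels coordinates while ST_Transform actually reprojects them:

```sql
-- Repair invalid geometries in place
UPDATE parcels
SET geom = ST_MakeValid(geom)
WHERE NOT ST_IsValid(geom);

-- Fix a geometry column that was loaded with the wrong SRID label
ALTER TABLE parcels
  ALTER COLUMN geom TYPE GEOMETRY(POLYGON, 4326)
  USING ST_SetSRID(geom, 4326);
```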
Performance Optimization Tips
- Batch processing: Load data in batches rather than row by row
- Disable autocommit: Use explicit transactions for better performance
- Temporarily disable constraints: Remove constraints during bulk loading and restore them afterwards
- Use COPY instead of INSERT: COPY is significantly faster for bulk operations
- Create indexes after loading: Build indexes after data is loaded
- Increase work_mem: Temporarily increase PostgreSQL's work_mem setting
- Use unlogged tables: For temporary data, consider unlogged tables
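The first tip, batch processing, can be as simple as a generator that groups rows into fixed-size chunks so that each chunk becomes one multi-row INSERT or one transaction. A minimal sketch; the chunk size of 1000 is an arbitrary starting point, and the commented usage assumes a psycopg2-style cursor:

```python
def chunked(rows, size=1000):
    """Yield successive lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# Usage sketch: one transaction per batch instead of one per row
# for batch in chunked(rows, 1000):
#     cur.executemany(
#         "INSERT INTO points (name, geom) VALUES (%s, ST_GeomFromText(%s, 4326))",
#         batch,
#     )
#     conn.commit()

batches = list(chunked(range(2500), 1000))
```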