Setting Up Your Python Environment

Introduction to Python for GIS

Course: The Ultimate GeoPython course

A robust Python environment is the foundation of GIS development. This guide walks you through installing Anaconda, configuring VS Code and JupyterLab, and finding your way around the Python ecosystem for geospatial applications. By the end of this lesson, you'll have a working setup with isolated conda environments, ready for GIS programming.

📚 Learning Objectives

  • Install and configure Anaconda distribution for Python GIS development
  • Set up and customize VS Code and JupyterLab for geospatial programming
  • Master the differences between Python scripts and Jupyter notebooks
  • Navigate and leverage the Python ecosystem for geospatial libraries
  • Create and manage virtual environments for different GIS projects
  • Troubleshoot common installation and configuration issues
  • Implement best practices for professional GIS development workflows

🔑 Key Concepts

Why Python for GIS?

Python has become the de facto standard for GIS programming for several reasons:

  • Rich Ecosystem: Extensive libraries like GeoPandas, Shapely, Rasterio, and GDAL
  • Integration: Built-in scripting interfaces in major GIS software (ArcPy in ArcGIS, PyQGIS in QGIS)
  • Data Science: Powerful data analysis capabilities with Pandas and NumPy
  • Visualization: Advanced mapping with Folium, Matplotlib, and Plotly
  • Community: Large, active community with extensive documentation
  • Cross-platform: Works consistently across Windows, macOS, and Linux

Understanding the Python Distribution Landscape

Anaconda vs. Miniconda vs. Standard Python

Distribution      Size      Pre-installed Packages     Best For
Anaconda          ~3GB      1500+ packages             Beginners, data science, GIS
Miniconda         ~400MB    Essential packages only    Advanced users, custom setups
Standard Python   ~100MB    Standard library only      Minimal installations, production

Development Environment Options

Jupyter Notebooks vs. Python Scripts

📓 Jupyter Notebooks (.ipynb)

Best for:

  • Data exploration and analysis
  • Prototyping and experimentation
  • Creating tutorials and documentation
  • Sharing results with visualizations

Features:

  • Interactive code execution
  • Inline visualizations
  • Markdown support
  • Easy sharing and collaboration

🐍 Python Scripts (.py)

Best for:

  • Production code and automation
  • Command-line tools
  • Reusable functions and modules
  • Version control and collaboration

Features:

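  • Lower overhead: no notebook server or hidden kernel state to manage
  • Easier debugging: breakpoints and a predictable top-to-bottom execution order
  • Version control friendly: plain-text files produce readable diffs
  • Fits professional workflows: testing, packaging, and scheduled runs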
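
To make the script side of this comparison concrete, here is a minimal sketch of a .py file that offers both a reusable function and a command-line entry point. The file name bbox_tool.py, the functions bounding_box and parse_point, and the --points option are all hypothetical names chosen for illustration; only the standard library is used.

```python
# bbox_tool.py (hypothetical file name): a reusable function plus a CLI entry point
import argparse


def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for an iterable of (x, y) pairs."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def parse_point(text):
    """Parse a 'lon,lat' string into a (float, float) tuple."""
    lon, lat = text.split(",")
    return float(lon), float(lat)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Print the bounding box of some points")
    parser.add_argument(
        "--points",
        nargs="+",
        type=parse_point,
        # Illustrative defaults: Cairo and Tokyo as lon,lat
        default=[(31.2357, 30.0444), (139.6503, 35.6762)],
        help="points as lon,lat pairs, e.g. 31.2357,30.0444",
    )
    args = parser.parse_args(argv)
    print(bounding_box(args.points))


if __name__ == "__main__":
    main()
```

Under these assumptions the file could be run as python bbox_tool.py --points 31.2357,30.0444 139.6503,35.6762, and the same function could be reused from a notebook with from bbox_tool import bounding_box. Note that argparse treats a value starting with - as an option, so western longitudes would need different input handling, which this sketch leaves out.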

🛠️ Detailed Installation Guide

Step 1: Installing Anaconda

Windows Installation

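  1. Download Anaconda from the official website
  2. Run the installer; the default "Just Me" install does not need administrator rights
  3. Leave "Add Anaconda to PATH" unchecked (the installer itself marks this option as not recommended) and use the Anaconda Prompt from the Start menu instead
  4. Complete the installation and open the Anaconda Prompt to run conda commands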

macOS Installation

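# Method 1: Download installer from website
# Method 2: Using Homebrew
brew install --cask anaconda

# Method 3: Command line
# The archive has no "latest" alias: replace <version> with a release listed at
# https://repo.anaconda.com/archive/ and use arm64 instead of x86_64 on Apple Silicon
curl -O https://repo.anaconda.com/archive/Anaconda3-<version>-MacOSX-x86_64.sh
bash Anaconda3-<version>-MacOSX-x86_64.sh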

Linux Installation

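# Download and install
# Replace <version> with a release listed at https://repo.anaconda.com/archive/
# (the archive has no "latest" alias)
wget https://repo.anaconda.com/archive/Anaconda3-<version>-Linux-x86_64.sh
bash Anaconda3-<version>-Linux-x86_64.sh

# Add to PATH (if not done automatically)
# Use $HOME rather than ~, which is not expanded inside double quotes
echo 'export PATH="$HOME/anaconda3/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc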

Step 2: Verifying Installation

# Check conda installation
conda --version
conda info

# Check Python installation
python --version
which python    # on Windows, use: where python

# List installed packages
conda list
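
The same check can be made from inside Python, which helps when several interpreters are installed. The sketch below is one possible approach, not an official conda tool: it assumes that a conda-managed prefix contains a conda-meta directory and that conda activate sets the CONDA_DEFAULT_ENV variable. The function name describe_interpreter is hypothetical.

```python
import os
import sys
from pathlib import Path


def describe_interpreter(prefix=None, environ=None):
    """Summarize which Python is running and whether it looks conda-managed."""
    prefix = Path(prefix if prefix is not None else sys.prefix)
    environ = os.environ if environ is None else environ
    return {
        "executable": sys.executable,
        "prefix": str(prefix),
        # conda keeps its package records in <prefix>/conda-meta
        "looks_like_conda": (prefix / "conda-meta").is_dir(),
        # Set by `conda activate`; None when no environment is active
        "active_env": environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If looks_like_conda comes back False, the python on your PATH is probably not the one Anaconda installed.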

Step 3: Creating Your First GIS Environment

Basic GIS Environment

# Create environment with specific Python version
conda create -n gis python=3.9

# Activate environment
conda activate gis

# Install essential GIS packages
conda install -c conda-forge geopandas rasterio fiona shapely folium contextily

Advanced GIS Environment with Machine Learning

# Create comprehensive GIS environment
conda create -n gis-ml python=3.9

# Activate environment
conda activate gis-ml

# Install GIS packages
conda install -c conda-forge geopandas rasterio fiona shapely folium contextily

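# Stay on conda-forge for every install; mixing channels with the GIS stack
# is a common source of package conflicts

# Install data science packages
conda install -c conda-forge pandas numpy matplotlib seaborn plotly

# Install machine learning packages
conda install -c conda-forge scikit-learn tensorflow

# Install web development packages
conda install -c conda-forge flask dash streamlit

# Install additional utilities
conda install -c conda-forge requests beautifulsoup4 selenium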

Step 4: Setting Up Development Environments

VS Code Configuration

  1. Install VS Code: Download from official website
  2. Install Essential Extensions:
    • Python (Microsoft)
    • Jupyter (Microsoft)
    • Python Docstring Generator
    • GitLens
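    • Bracket pair colorization is now built into VS Code (setting editor.bracketPairColorization.enabled), so the deprecated Bracket Pair Colorizer extension is not needed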
    • Material Icon Theme
  3. Configure Python Interpreter:
    # Open VS Code
    # Press Ctrl+Shift+P (Cmd+Shift+P on Mac)
    # Type "Python: Select Interpreter"
    # Choose your conda environment

JupyterLab Setup

# Install JupyterLab
conda install -c conda-forge jupyterlab

# Install useful extensions
# JupyterLab 3 and later pick up extensions shipped as regular packages, so the
# older `jupyter labextension install` step (which needs Node.js) is not required
conda install -c conda-forge ipywidgets plotly

# Start JupyterLab
jupyter lab

💻 Comprehensive Code Examples

Environment Testing Script

# test_gis_environment.py
import sys
import platform
from datetime import datetime

def test_basic_imports():
    """Test basic Python functionality"""
    try:
        import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt
        print("✅ Basic data science packages imported successfully")
        return True
    except ImportError as e:
        print(f"❌ Error importing basic packages: {e}")
        return False

def test_gis_imports():
    """Test GIS-specific packages"""
    import importlib

    gis_packages = ['geopandas', 'shapely', 'fiona', 'rasterio', 'folium']
    
    results = {}
    for package in gis_packages:
        try:
            importlib.import_module(package)
            results[package] = "✅ Available"
        except ImportError:
            results[package] = "❌ Not installed"
    
    return results

def system_info():
    """Display system information"""
    info = {
        'Python Version': sys.version,
        'Platform': platform.platform(),
        'Architecture': platform.architecture()[0],
        'Processor': platform.processor(),
        'Test Date': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    }
    return info

def main():
    print("🔍 GIS Environment Test Report")
    print("=" * 50)
    
    # System information
    print("\n📊 System Information:")
    for key, value in system_info().items():
        print(f"{key}: {value}")
    
    # Test basic imports
    print("\n🧪 Testing Basic Packages:")
    basics_ok = test_basic_imports()
    
    # Test GIS imports
    print("\n🗺️ Testing GIS Packages:")
    gis_results = test_gis_imports()
    for package, status in gis_results.items():
        print(f"{package}: {status}")
    
    # Performance test (skipped when the basic packages failed to import above)
    print("\n⚡ Performance Test:")
    if not basics_ok:
        print("Skipped: basic packages are missing")
        return
    import time
    import numpy as np
    
    # perf_counter is the clock intended for timing short code sections
    start_time = time.perf_counter()
    large_array = np.random.random((1000, 1000))
    result = np.sum(large_array)
    elapsed = time.perf_counter() - start_time
    
    print(f"Array computation time: {elapsed:.4f} seconds")
    print(f"Result: {result:.2f}")

if __name__ == "__main__":
    main()
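
The test script above reports whether each package imports, but not which version is installed. As a complementary sketch, the standard-library importlib.metadata module can look up installed versions without importing the packages. The function name installed_versions is hypothetical, and the package list simply mirrors the one used above.

```python
from importlib import metadata


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["geopandas", "shapely", "fiona", "rasterio", "folium"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a bug report or a project README gives others the exact versions your environment resolved to.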

Sample GIS Workflow

# sample_gis_workflow.py
import geopandas as gpd
import pandas as pd
import matplotlib.pyplot as plt
import folium
from shapely.geometry import Point

def create_sample_data():
    """Create sample geospatial data"""
    # Sample cities data
    cities_data = {
        'city': ['New York', 'London', 'Tokyo', 'Sydney', 'Cairo'],
        'country': ['USA', 'UK', 'Japan', 'Australia', 'Egypt'],
        'population': [8.4, 9.0, 13.9, 5.3, 9.1],
        'latitude': [40.7128, 51.5074, 35.6762, -33.8688, 30.0444],
        'longitude': [-74.0060, -0.1278, 139.6503, 151.2093, 31.2357]
    }
    
    # Create DataFrame
    df = pd.DataFrame(cities_data)
    
    # Create geometry column
    geometry = [Point(xy) for xy in zip(df.longitude, df.latitude)]
    
    # Create GeoDataFrame
    gdf = gpd.GeoDataFrame(df, geometry=geometry, crs='EPSG:4326')
    
    return gdf

def analyze_data(gdf):
    """Perform basic analysis"""
    print("📊 Data Analysis Results:")
    print(f"Total cities: {len(gdf)}")
    print(f"Average population: {gdf['population'].mean():.1f} million")
    print(f"Largest city: {gdf.loc[gdf['population'].idxmax(), 'city']}")
    print(f"Countries represented: {gdf['country'].nunique()}")

def create_static_map(gdf):
    """Create static map with matplotlib"""
    fig, ax = plt.subplots(figsize=(12, 8))
    
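    # Plot world boundaries (if available)
    # gpd.datasets was deprecated in GeoPandas 0.14 and removed in 1.0, so on
    # current versions this falls through to the points-only branch
    try:
        world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
        world.plot(ax=ax, color='lightgray', edgecolor='white')
    except Exception:
        print("World boundaries not available, plotting points only")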
    
    # Plot cities
    gdf.plot(ax=ax, color='red', markersize=gdf['population']*10, alpha=0.7)
    
    # Add labels
    for idx, row in gdf.iterrows():
        ax.annotate(row['city'], (row.geometry.x, row.geometry.y), 
                   xytext=(5, 5), textcoords='offset points', fontsize=8)
    
    ax.set_title('World Cities by Population', fontsize=16, fontweight='bold')
    ax.set_xlabel('Longitude')
    ax.set_ylabel('Latitude')
    
    plt.tight_layout()
    plt.savefig('world_cities_map.png', dpi=300, bbox_inches='tight')
    plt.show()

def create_interactive_map(gdf):
    """Create interactive map with folium"""
    # Create base map
    m = folium.Map(location=[20, 0], zoom_start=2)
    
    # Add markers for each city
    for idx, row in gdf.iterrows():
        folium.CircleMarker(
            location=[row['latitude'], row['longitude']],
            radius=row['population'],
            popup=f"{row['city']}, {row['country']}<br>Population: {row['population']}M",
            color='red',
            fill=True,
            fillColor='red',
            fillOpacity=0.6
        ).add_to(m)
    
    # Save map
    m.save('interactive_world_cities.html')
    print("Interactive map saved as 'interactive_world_cities.html'")

def main():
    """Main workflow"""
    print("🗺️ Sample GIS Workflow")
    print("=" * 30)
    
    # Create sample data
    print("\n1. Creating sample geospatial data...")
    gdf = create_sample_data()
    print(f"Created GeoDataFrame with {len(gdf)} cities")
    
    # Display data info
    print("\n2. Data Overview:")
    print(gdf.head())
    print(f"\nCRS: {gdf.crs}")
    print(f"Geometry type: {gdf.geometry.geom_type.iloc[0]}")
    
    # Analyze data
    print("\n3. Data Analysis:")
    analyze_data(gdf)
    
    # Create visualizations
    print("\n4. Creating visualizations...")
    create_static_map(gdf)
    create_interactive_map(gdf)
    
    print("\n✅ Workflow completed successfully!")

if __name__ == "__main__":
    main()
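
One detail in the workflow above is easy to get wrong: points are built as (longitude, latitude), that is (x, y), even though coordinates are usually spoken as "lat, lon". The sketch below shows the same ordering using only the standard library, by writing two of the sample cities as GeoJSON, whose positions are also [longitude, latitude]. The function name cities_to_geojson is hypothetical.

```python
import json


def cities_to_geojson(cities):
    """Build a GeoJSON FeatureCollection from dicts with 'longitude' and 'latitude' keys."""
    features = []
    for city in cities:
        properties = {k: v for k, v in city.items() if k not in ("longitude", "latitude")}
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude], i.e. (x, y)
            "geometry": {
                "type": "Point",
                "coordinates": [city["longitude"], city["latitude"]],
            },
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


if __name__ == "__main__":
    sample = [
        {"city": "Tokyo", "country": "Japan", "latitude": 35.6762, "longitude": 139.6503},
        {"city": "Cairo", "country": "Egypt", "latitude": 30.0444, "longitude": 31.2357},
    ]
    print(json.dumps(cities_to_geojson(sample), indent=2))
```

Saved to a .geojson file, this output should be readable with gpd.read_file, which makes it a handy way to check that points land where you expect.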

🎯 Advanced Environment Management

Environment Configuration Files

Complete environment.yml for GIS Projects

name: gis-complete
channels:
  - conda-forge
  - defaults
dependencies:
  # Core Python
  - python=3.9
  
  # Data manipulation
  - pandas>=1.3.0
  - numpy>=1.21.0
  
  # Geospatial core
  - geopandas>=0.10.0
  - shapely>=1.8.0
  - fiona>=1.8.0
  - rasterio>=1.2.0
  - pyproj>=3.2.0
  - rtree>=0.9.0
  
  # Visualization
  - matplotlib>=3.4.0
  - seaborn>=0.11.0
  - folium>=0.12.0
  - contextily>=1.2.0
  - plotly>=5.0.0
  
  # Web mapping
  - leafmap
  - ipyleaflet
  
  # Data sources
  - requests>=2.26.0
  - beautifulsoup4>=4.10.0
  
  # Development tools
  - jupyter>=1.0.0
  - jupyterlab>=3.0.0
  - ipykernel>=6.0.0
  - nb_conda_kernels
  
  # Code quality
  - black
  - flake8
  - pytest
  
  # Additional utilities
  - tqdm
  - openpyxl
  - xlrd
  
  # Pip packages
  - pip
  - pip:
    - geopy
    - osmnx
    - earthpy
    - rasterstats

Environment Management Commands

# Create environment from file
conda env create -f environment.yml

# Update existing environment
conda env update -f environment.yml --prune

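# Export current environment
# (add --from-history to list only the packages you explicitly requested,
# which ports better across operating systems)
conda env export > environment.yml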

# Clone environment
conda create --name gis-backup --clone gis

# List all environments
conda env list

# Remove environment
conda env remove --name old-env

# Activate/deactivate
conda activate gis
conda deactivate
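
These commands can also be driven from Python, for example in a project setup script. The sketch below assumes that conda env list --json prints a JSON object with an "envs" list of environment paths. The function names parse_env_list and list_conda_envs are hypothetical, and the base environment appears under its install folder name (such as anaconda3) rather than as "base".

```python
import json
import shutil
import subprocess
from pathlib import Path


def parse_env_list(json_text):
    """Return {folder name: path} from the output of `conda env list --json`."""
    paths = json.loads(json_text).get("envs", [])
    return {Path(p).name: p for p in paths}


def list_conda_envs():
    """Run conda and return its environments, or {} if conda is not on PATH."""
    conda = shutil.which("conda")
    if conda is None:
        return {}
    completed = subprocess.run(
        [conda, "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_env_list(completed.stdout)
```

Calling list_conda_envs() and checking for "gis" in the result is one way to decide whether conda env create still needs to run.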

🔧 Professional Tips and Best Practices

🏗️ Project Structure Best Practices

gis-project/
├── data/
│   ├── raw/              # Original, immutable data
│   ├── processed/        # Cleaned, processed data
│   └── external/         # External data sources
├── notebooks/            # Jupyter notebooks for exploration
├── src/                  # Source code
│   ├── __init__.py
│   ├── data/             # Data processing modules
│   ├── features/         # Feature engineering
│   ├── models/           # Model definitions
│   └── visualization/    # Plotting functions
├── tests/                # Unit tests
├── docs/                 # Documentation
├── environment.yml       # Conda environment
├── requirements.txt      # Pip requirements
├── README.md             # Project description
└── .gitignore            # Git ignore file
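
Creating this layout by hand for every project gets tedious. As a rough sketch, the helper below builds the same folders and empty files with pathlib. The function name scaffold and the two constant names are hypothetical, and the files are created empty for you to fill in.

```python
from pathlib import Path

# Directories and empty files taken from the layout above
DIRECTORIES = [
    "data/raw", "data/processed", "data/external",
    "notebooks",
    "src/data", "src/features", "src/models", "src/visualization",
    "tests", "docs",
]
FILES = ["src/__init__.py", "environment.yml", "requirements.txt", "README.md", ".gitignore"]


def scaffold(root):
    """Create the project skeleton under `root`, leaving existing file contents alone."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates the file if missing and never truncates an existing one
        (root / name).touch(exist_ok=True)
    return root
```

Calling scaffold("gis-project") from the folder where you keep your work creates the tree; running it again on an existing project is harmless.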

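💡 Performance Optimization Tips

  • Memory Management:
    # Monitor memory usage (psutil is a third-party package)
    import pandas as pd
    import psutil
    print(f"Memory usage: {psutil.virtual_memory().percent}%")
    
    # Use chunking for large datasets; process_chunk stands in for your own function
    for chunk in pd.read_csv('large_file.csv', chunksize=10000):
        process_chunk(chunk)
  • Parallel Processing:
    # Use multiprocessing for CPU-intensive tasks
    from multiprocessing import Pool
    import geopandas as gpd
    
    def process_geometry(geom):
        # Buffer distance is in CRS units, so use a projected CRS (meters), not EPSG:4326
        return geom.buffer(100)
    
    # The __main__ guard is required on Windows and macOS, where worker
    # processes are started by re-importing this module
    if __name__ == "__main__":
        gdf = gpd.read_file('your_data.gpkg')  # placeholder path
        with Pool() as pool:
            results = pool.map(process_geometry, gdf.geometry)
  • Efficient Data Types:
    # Optimize data types to save memory
    gdf['category'] = gdf['category'].astype('category')
    # downcast='integer' only shrinks whole-number columns; use 'float' for decimals
    gdf['population'] = pd.to_numeric(gdf['population'], downcast='integer')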

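🔒 Security Best Practices

  • Environment Variables:
    # Use environment variables for sensitive data
    import os
    api_key = os.getenv('GIS_API_KEY')  # returns None if the variable is not set
    
    # Create .env file (never commit to version control)
    # GIS_API_KEY=your_secret_key_here
    # Note: Python does not read .env files on its own; export the variable in
    # your shell or load the file with a helper such as python-dotenv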
  • Virtual Environments: Always use isolated environments for different projects
  • Package Verification: Only install packages from trusted sources
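
Because os.getenv only sees variables that are already in the process environment, a .env file has to be loaded explicitly. The python-dotenv package is the usual choice; the sketch below is a deliberately minimal standard-library stand-in that handles only simple KEY=VALUE lines. The function name load_env_file is hypothetical.

```python
import os


def load_env_file(path=".env", environ=None):
    """Copy KEY=VALUE pairs from a file into environ, keeping values already set."""
    environ = os.environ if environ is None else environ
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and anything without an '='
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault lets real environment variables win over the file
            environ.setdefault(key.strip(), value.strip().strip("'\""))
    return environ
```

After calling load_env_file() at the top of a script, os.getenv('GIS_API_KEY') returns the value from the file unless the variable was already set in the shell.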

🚨 Troubleshooting Common Issues

Installation Problems

❌ Problem: "conda: command not found"

Solution:

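# Add conda to PATH for the current shell (assumes the default ~/anaconda3 location)
export PATH="$HOME/anaconda3/bin:$PATH"

# Or let conda configure your shell permanently, then open a new terminal
"$HOME/anaconda3/bin/conda" init

# Windows: use the Anaconda Prompt from the Start menu, where conda is already available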
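
Before editing PATH, it helps to confirm where conda actually lives. The sketch below checks PATH and then looks in a few common install folders under the home directory. The folder names and relative paths are assumptions based on typical default installs and may not match yours, and the function name find_conda is hypothetical.

```python
import shutil
from pathlib import Path

# Assumed default install folder names for Anaconda and Miniconda
CANDIDATE_DIRS = ["anaconda3", "miniconda3"]
# Assumed locations of the conda executable inside an install folder
RELATIVE_PATHS = ["bin/conda", "condabin/conda", "Scripts/conda.exe"]


def find_conda(home=None):
    """Return (conda found on PATH or None, conda executables found under home)."""
    home = Path(home) if home is not None else Path.home()
    found = []
    for folder in CANDIDATE_DIRS:
        for relative in RELATIVE_PATHS:
            candidate = home / folder / relative
            if candidate.is_file():
                found.append(str(candidate))
    return shutil.which("conda"), found
```

If the first value is None but the list is not empty, conda is installed and only the PATH entry is missing, so the fix above applies. If both are empty, the install itself is somewhere else or did not complete.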

❌ Problem: Package conflicts during installation

Solution:

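# Use mamba (faster conda alternative)
# Recent conda releases (23.10 and later) already use the same libmamba solver by default
conda install -n base -c conda-forge mamba
mamba install -c conda-forge geopandas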

# Or create fresh environment
conda create -n fresh-gis python=3.9
conda activate fresh-gis

Jupyter Issues

⚠️ Problem: Kernel not found in Jupyter

Solution:

# Install ipykernel in your environment
conda activate gis
conda install ipykernel

# Register kernel
python -m ipykernel install --user --name gis --display-name "Python (GIS)"

# List available kernels
jupyter kernelspec list
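
A kernel can be listed and still point at the wrong Python. The sketch below reads the output of jupyter kernelspec list --json and reports the executable each kernel launches, assuming the usual layout where each entry has a "spec" object with an "argv" list. The function name kernel_interpreters is hypothetical.

```python
import json


def kernel_interpreters(json_text):
    """Map kernel name to the executable it launches, from `jupyter kernelspec list --json`."""
    specs = json.loads(json_text).get("kernelspecs", {})
    result = {}
    for name, info in specs.items():
        argv = info.get("spec", {}).get("argv", [])
        # The first argv entry is the program Jupyter starts for this kernel
        result[name] = argv[0] if argv else None
    return result
```

For the kernel registered above, the executable should sit inside the gis environment folder. A bare python, or a path into another environment, means the kernel was registered from the wrong environment.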

Import Errors

ℹ️ Problem: "No module named 'geopandas'"

Solution:

# Verify environment is activated (the active one is marked with *)
conda info --envs

# Install missing package
conda install -c conda-forge geopandas

# Check installation
python -c "import geopandas; print(geopandas.__version__)"
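
When the package is installed but the import still fails, the usual cause is that a different interpreter is running. The sketch below prints the interpreter in use and where it would load a module from, using importlib.util.find_spec from the standard library. The function name locate_module is hypothetical, and the sketch is meant for top-level module names only.

```python
import importlib.util
import sys


def locate_module(name):
    """Return where a top-level module would be imported from, or None if not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"geopandas: {locate_module('geopandas') or 'not found by this interpreter'}")
```

Run it both in a terminal and in a notebook cell: if the interpreter paths differ, the notebook kernel is not using the environment where geopandas was installed.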

🎓 Hands-on Exercises

Exercise 1: Environment Setup Challenge

  1. Create a new conda environment called "gis-challenge"
  2. Install Python 3.9 and essential GIS packages
  3. Create a Jupyter notebook that imports all packages successfully
  4. Export your environment to a YAML file

Exercise 2: Development Environment Comparison

  1. Create the same simple GIS analysis in both a Jupyter notebook and Python script
  2. Compare the development experience
  3. Document the pros and cons of each approach

Exercise 3: Troubleshooting Practice

  1. Intentionally create a package conflict
  2. Practice resolving the conflict using conda commands
  3. Document your troubleshooting process

🔗 Additional Resources and Next Steps

What's Next?

Now that you have a solid Python environment set up, you're ready to dive into:

  • Variables & Data Types - Understanding Python fundamentals
  • Working with Spatial Data - Loading and manipulating GIS data
  • Creating Maps - Visualization techniques
  • Spatial Analysis - Advanced GIS operations

🎉 Congratulations!

You've successfully set up a professional Python environment for GIS development. This foundation will serve you well throughout your geospatial programming journey. Remember to keep your environments organized, document your setups, and don't hesitate to experiment with new packages and tools!