A comprehensive preprocessing pipeline for multispectral crop classification using temporal orthoimages and machine learning.
This pipeline transforms raw multispectral orthoimages and crop-field vectors into machine-learning-ready datasets with spatially aware train/validation/test splits. It is designed for time-series crop classification with Random Forest and other machine learning and deep learning algorithms.
- Patch Extraction: Extract labeled patches from multispectral orthoimages using field boundaries
- Temporal Stacking: Combine multi-date patches into 4D time-series arrays
- Spatial Splitting: Create spatially aware train/val/test splits to prevent data leakage
- Configurable Pipeline: Fully configurable parameters via command line or config files
- Built-in Analytics: Comprehensive statistics and validation throughout the process
```bash
# Clone the repository
git clone https://github.com/your-username/Multispectral-UAV-Crop-Classification-Pipeline
cd Multispectral-UAV-Crop-Classification-Pipeline

# Run setup script (creates venv, installs dependencies, sets up directories)
python setup.py
```

Activate the virtual environment:

```bash
# Windows
venv\Scripts\activate

# Linux/macOS
source venv/bin/activate
```

Place your data in the following structure:
```
data/raw/
├── orthoimages/                      # Multispectral TIFF files
│   ├── 230601_reflectance_ortho.tif
│   ├── 230615_reflectance_ortho.tif
│   └── ...
├── vectors/                          # Crop field boundaries
│   └── crop_fields.geojson
└── zone_masks/                       # Spatial zone masks for splitting
    └── spatial_zones.tif
```
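Before running the pipeline, it can help to confirm the inputs are where the commands below expect them. A minimal sketch using only the standard library (`check_raw_layout` is a hypothetical helper, not part of the pipeline):

```python
from pathlib import Path

def check_raw_layout(raw_dir):
    """Count the expected input files under a data/raw-style directory."""
    raw = Path(raw_dir)
    return {
        "orthoimages": len(list((raw / "orthoimages").glob("*_reflectance_ortho.tif"))),
        "vectors": len(list((raw / "vectors").glob("*.geojson"))),
        "zone_masks": len(list((raw / "zone_masks").glob("*.tif"))),
    }

# Any zero count indicates a misplaced or misnamed input
print(check_raw_layout("data/raw"))
```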
```bash
# Complete pipeline in one command
python main.py pipeline \
    --vector data/raw/vectors/crop_fields.geojson \
    --ortho-dir data/raw/orthoimages/ \
    --zone-mask data/raw/zone_masks/spatial_zones.tif
```

You can run each step independently for more control:
```bash
# Step 1: Extract patches
python main.py extract \
    --vector data/raw/vectors/crop_fields.geojson \
    --ortho-dir data/raw/orthoimages/ \
    --patch-size 256 \
    --stride 128
```
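Conceptually, extraction slides a window of `--patch-size` across each orthoimage at the given `--stride`; the real extractor additionally labels patches from the field boundaries. A minimal numpy sketch of the windowing alone:

```python
import numpy as np

def sliding_patches(image, patch_size=256, stride=128):
    """Yield (top-left offset, patch) pairs from an (H, W, C) array."""
    h, w = image.shape[:2]
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            yield (y, x), image[y:y + patch_size, x:x + patch_size]

# A 512x512 image with 10 bands yields a 3x3 grid of overlapping patches
img = np.zeros((512, 512, 10), dtype=np.float32)
patches = list(sliding_patches(img))
print(len(patches))  # 9
```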
```bash
# Step 2: Stack temporal data
python main.py stack \
    --patches-dir data/processed/patches \
    --output-dir data/processed/stacked
```
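Conceptually, this step gathers each patch's per-date arrays and stacks them along a new time axis, which can then be flattened into the `(time × bands, height, width)` layout described under the output data formats. A minimal numpy sketch (shapes are illustrative):

```python
import numpy as np

# Three acquisition dates of the same 256x256 patch, 10 bands each (C, H, W)
dates = [np.random.rand(10, 256, 256).astype(np.float32) for _ in range(3)]

# Stack along a new time axis, then merge time and bands into one channel axis
stacked = np.stack(dates, axis=0)               # (time, bands, H, W) -> (3, 10, 256, 256)
flat = stacked.reshape(-1, *stacked.shape[2:])  # (time * bands, H, W) -> (30, 256, 256)
print(flat.shape)  # (30, 256, 256)
```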
```bash
# Step 3: Create spatial splits
python main.py split \
    --mapping data/processed/stacked/stack_mapping.csv \
    --zone-mask data/raw/zone_masks/spatial_zones.tif \
    --stacked-folder data/processed/stacked
```

Key parameters:

- `--patch-size`: Size of patches in pixels (default: 256)
- `--stride`: Sliding-window stride (default: 128)
- `--channel-first`: Store patches as (C, H, W) instead of (H, W, C)
- `--min-plots-per-class`: Minimum field plots required per crop class (default: 3)
- `--expected-bands`: Number of spectral bands expected (default: 10)
- `--min-temporal-samples`: Minimum temporal observations required (default: 3)
- `--date-pattern`: Regex pattern for date folder matching
- `--excluded-classes`: Crop class IDs to exclude (default: [1, 2])
- `--zone-mapping`: Custom zone-to-split mapping (format: `zone:split`)
By default, the pipeline uses:
- Zones 1,2 → train
- Zone 3 → validation
- Zone 4 → test
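The `zone:split` pairs accepted by `--zone-mapping` can be parsed into a lookup table like the default above; a hypothetical sketch (the real splitter's parsing may differ):

```python
def parse_zone_mapping(pairs):
    """Parse 'zone:split' strings, e.g. ['1:train', '3:val'] -> {1: 'train', 3: 'val'}."""
    return {int(zone): split for zone, split in (p.split(":") for p in pairs)}

# Reconstructing the default mapping from CLI-style arguments
default = parse_zone_mapping(["1:train", "2:train", "3:val", "4:test"])
print(default[3])  # val
```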
Customize with:

```bash
python main.py split ... --zone-mapping 1:train 2:train 3:val 4:test 5:test
```

Project structure:

```
Multispectral-UAV-Crop-Classification-Pipeline/
├── main.py                      # Main pipeline orchestrator
├── setup.py                     # Environment setup script
├── requirements.txt             # Python dependencies
├── README.md                    # This file
├── .gitignore                   # Git ignore rules
├── src/
│   └── libs/                    # Core library modules
│       ├── __init__.py
│       ├── patch_extractor.py   # Patch extraction functionality
│       ├── temporal_stacker.py  # Temporal stacking functionality
│       └── spatial_splitter.py  # Spatial splitting functionality
├── data/
│   ├── raw/                     # Input data
│   │   ├── orthoimages/
│   │   ├── vectors/
│   │   └── zone_masks/
│   └── processed/               # Pipeline outputs
│       ├── patches/
│       └── stacked/
├── output/                      # Final results
│   ├── models/
│   ├── predictions/
│   └── visualizations/
├── notebooks/                   # Jupyter notebooks
├── configs/                     # Configuration files
└── logs/                        # Log files
```
After preprocessing, use the included training notebooks:
```bash
jupyter notebook notebooks/
```

Choose between classic machine learning and deep learning. The notebooks demonstrate:

- Feature engineering from 4D temporal data
- PCA-based dimensionality reduction
- Comprehensive model evaluation
- Spatial error analysis and visualization
- An automated deep learning approach
```python
from src.libs import PatchExtractor, TemporalStacker, SpatialSplitter

# Extract patches
extractor = PatchExtractor(
    vector_path="data/raw/vectors/fields.geojson",
    ortho_dir="data/raw/orthoimages/",
    output_dir="data/processed/patches"
)
extractor.extract_all_orthos()

# Stack temporal data
stacker = TemporalStacker(
    base_dir="data/processed/patches",
    output_dir="data/processed/stacked"
)
mapping_path = stacker.run()

# Create spatial splits
splitter = SpatialSplitter(
    mapping_csv=mapping_path,
    zone_mask_path="data/raw/zone_masks/zones.tif",
    stacked_folder="data/processed/stacked"
)
splitter.run()
```

Each module is designed to be flexible and extensible:

```python
# Custom crop filtering
extractor = PatchExtractor(...)
extractor.min_plots_per_class = 5  # Require more plots per class

# Custom temporal requirements
stacker = TemporalStacker(...)
stacker.min_temporal_samples = 5  # Require more temporal observations

# Custom spatial zones
splitter = SpatialSplitter(...)
splitter.set_custom_zone_mapping({1: "train", 2: "val", 3: "test"})
```

Orthoimages:

- Format: GeoTIFF (.tif)
- Naming: `YYMMDD_reflectance_ortho.tif` (e.g., `230601_reflectance_ortho.tif`)
- Bands: 10 spectral bands (configurable)
- Coordinate System: Any projected CRS
- Data Type: Float32 or UInt16
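An orthoimage's metadata can be checked against these requirements before a full run. A sketch with a hypothetical helper; the commented lines show how it might be fed from rasterio (a third-party library) without loading any pixels:

```python
def validate_ortho_meta(band_count, dtype, expected_bands=10):
    """Check band count and data type against the orthoimage requirements."""
    if band_count != expected_bands:
        return f"expected {expected_bands} bands, got {band_count}"
    if dtype not in ("float32", "uint16"):
        return f"unexpected data type: {dtype}"
    return "ok"

# With rasterio installed:
# import rasterio
# with rasterio.open("data/raw/orthoimages/230601_reflectance_ortho.tif") as src:
#     print(validate_ortho_meta(src.count, src.dtypes[0]))
```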
Field vectors:

- Format: GeoJSON (.geojson) or Shapefile (.shp)
- Required Fields: `crop`: crop type name (string); `plot_ID`: unique plot identifier (optional)
- Geometry: Polygon features representing field boundaries
- Coordinate System: Any CRS (will be reprojected to match the orthoimages)
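The attribute and geometry requirements can be checked up front. A sketch with a hypothetical helper; the commented lines show how it might be driven from geopandas (a third-party library):

```python
def validate_fields(columns, geom_types):
    """Check required attribute fields and geometry types for the field vectors."""
    problems = []
    if "crop" not in columns:
        problems.append("missing required 'crop' field")
    if not set(geom_types) <= {"Polygon", "MultiPolygon"}:
        problems.append(f"non-polygon geometries found: {sorted(set(geom_types))}")
    return problems

# With geopandas installed:
# import geopandas as gpd
# fields = gpd.read_file("data/raw/vectors/crop_fields.geojson")
# print(validate_fields(fields.columns, fields.geom_type))
```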
Zone masks:

- Format: GeoTIFF (.tif)
- Values: Integer zone IDs (1, 2, 3, 4, etc.)
- Coordinate System: Must match the orthoimages
- Purpose: Define spatial regions for train/val/test splitting
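Because a CRS mismatch between zone mask and orthoimages is a common failure mode (see Troubleshooting), it is worth verifying early. A sketch with a hypothetical helper; the commented lines show one way to feed it from rasterio (third-party):

```python
def check_zone_mask(zone_ids, mask_crs, ortho_crs):
    """Verify zone IDs are positive integers and the CRS matches the orthoimages."""
    problems = []
    if mask_crs != ortho_crs:
        problems.append(f"CRS mismatch: {mask_crs} vs {ortho_crs}")
    bad = [z for z in zone_ids if not (isinstance(z, int) and z > 0)]
    if bad:
        problems.append(f"invalid zone IDs: {bad}")
    return problems

# With rasterio installed:
# import rasterio, numpy as np
# with rasterio.open("data/raw/zone_masks/spatial_zones.tif") as mask, \
#      rasterio.open("data/raw/orthoimages/230601_reflectance_ortho.tif") as ortho:
#     zones = [int(z) for z in np.unique(mask.read(1)) if z != mask.nodata]
#     print(check_zone_mask(zones, mask.crs, ortho.crs))
```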
Extracted patches:

- Format: GeoTIFF files organized by date
- Naming: `{patch_id}_class{crop_class}.tif`
- Size: Configurable (default 256×256 pixels)
- Bands: Same as the input orthoimages

Stacked arrays:

- Format: NumPy binary files (.npy)
- Shape: `(time × bands, height, width)`
- Metadata: CSV mapping file with temporal and spatial information
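A loaded stack can be reshaped back into separate time and band axes for per-date work; a sketch assuming 3 dates and 10 bands (the load path is illustrative):

```python
import numpy as np

def split_time_bands(arr, n_bands=10):
    """Reshape a (time*bands, H, W) stack into (time, bands, H, W)."""
    tb, h, w = arr.shape
    assert tb % n_bands == 0, "channel axis is not a multiple of the band count"
    return arr.reshape(tb // n_bands, n_bands, h, w)

stack = np.zeros((30, 256, 256), dtype=np.float32)  # e.g. np.load("patch_0001.npy")
print(split_time_bands(stack).shape)  # (3, 10, 256, 256)
```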
Split files:

- Format: CSV files
- Content: Patch assignments to train/validation/test splits
- Spatial Awareness: Ensures no spatial overlap between splits
Memory Errors: Reduce the patch size or process in batches:

```bash
python main.py extract --patch-size 128 --stride 64
```

Missing Dependencies: Reinstall the requirements:

```bash
pip install -r requirements.txt
```

Date Pattern Mismatch: Adjust the regex pattern:

```bash
python main.py stack --date-pattern "\d{8}_ortho"
```

Spatial Projection Issues: Ensure CRS compatibility:

- Zone mask must match the orthoimage CRS
- Vector data will be reprojected automatically
Large Datasets:
- Use larger stride values to reduce patch count
- Increase minimum temporal samples to filter sparse data
- Use smaller patch sizes to reduce memory usage
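The effect of stride on patch count can be estimated directly: patches per axis is floor((image_size − patch_size) / stride) + 1, so doubling the stride roughly quarters the total. A small sketch (image sizes are illustrative):

```python
def patch_count(img_h, img_w, patch=256, stride=128):
    """Number of sliding-window patches that fit in an image."""
    rows = (img_h - patch) // stride + 1
    cols = (img_w - patch) // stride + 1
    return rows * cols

print(patch_count(4096, 4096, stride=128))  # 961  (31 x 31)
print(patch_count(4096, 4096, stride=256))  # 256  (16 x 16)
```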
Computational Resources:
- The pipeline supports parallel processing where possible
- Consider running on high-memory systems for large datasets
- Use SSD storage for improved I/O performance
The pipeline provides comprehensive statistics at each step:
- Patch Extraction: Number of patches per class, spatial distribution
- Temporal Stacking: Temporal sample statistics, data quality metrics
- Spatial Splitting: Train/val/test distribution, class balance per split
Example output:

```
Patch Extraction Complete:
  Total patches: 15,432
  Crop classes: 5 (Potato: 3,245, Soybean: 4,123, ...)

Temporal Stacking Complete:
  Stacked patches: 12,847
  Temporal samples: min=3, max=8, mean=5.2

Spatial Splitting Complete:
  Train: 8,459 patches (65.8%)
  Validation: 2,144 patches (16.7%)
  Test: 2,244 patches (17.5%)
```
Contributions are welcome! Please:

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Clone for development
git clone https://github.com/your-username/Multispectral-UAV-Crop-Classification-Pipeline.git
cd Multispectral-UAV-Crop-Classification-Pipeline

# Install in development mode
pip install -e .

# Run tests (if available)
python -m pytest tests/
```

This project is licensed under the MIT License; see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: See the `docs/` directory for additional documentation
- GeoPatch Library: For efficient patch extraction from geospatial data
- Rasterio/GDAL: For robust geospatial data handling
- Scikit-learn: For machine learning utilities and validation metrics
- Contributors: Thanks to all contributors who have helped improve this pipeline
If you use this pipeline in your research, please cite:
```bibtex
@software{Multispectral-UAV-Crop-Classification-Pipeline,
  title={Multispectral-UAV-Crop-Classification-Pipeline: A Preprocessing Framework for Multispectral Time-Series Data},
  author={Pinheiro, Nelson and Martin, Lena and Assenmacher, Marina and Pei, Ziyu},
  year={2024},
  url={https://github.com/your-username/Multispectral-UAV-Crop-Classification-Pipeline}
}
```

🌾 Happy Crop Classifying! 🌾