A comprehensive preprocessing pipeline for multispectral crop classification using temporal orthoimages and machine learning.
This pipeline transforms raw multispectral orthoimages and crop-field vectors into machine-learning-ready datasets with spatially aware train/validation/test splits. It is designed for time-series crop classification with Random Forest and other machine learning and deep learning algorithms.
- Patch Extraction: Extract labeled patches from multispectral orthoimages using field boundaries
- Temporal Stacking: Combine multi-date patches into 4D time-series arrays
- Spatial Splitting: Create spatially aware train/val/test splits to prevent data leakage
- Configurable Pipeline: Fully configurable parameters via command line or config files
- Built-in Analytics: Comprehensive statistics and validation throughout the process
```bash
# Clone the repository
git clone https://github.com/your-username/Multispectral-UAV-Crop-Classification-Pipeline
cd Multispectral-UAV-Crop-Classification-Pipeline

# Run setup script (creates venv, installs dependencies, sets up directories)
python setup.py
```

Activate the virtual environment:

```bash
# Windows
venv\Scripts\activate

# Linux/macOS
source venv/bin/activate
```

Place your data in the following structure:
```
data/raw/
├── orthoimages/                      # Multispectral TIFF files
│   ├── 230601_reflectance_ortho.tif
│   ├── 230615_reflectance_ortho.tif
│   └── ...
├── vectors/                          # Crop field boundaries
│   └── crop_fields.geojson
└── zone_masks/                       # Spatial zone masks for splitting
    └── spatial_zones.tif
```
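Before running the pipeline, it can help to confirm the inputs are where the commands below expect them. A minimal sketch using only the standard library (`check_raw_layout` is a hypothetical helper, not part of the pipeline):

```python
from pathlib import Path

def check_raw_layout(raw_dir):
    """Count the expected input files under a data/raw-style directory."""
    raw = Path(raw_dir)
    return {
        "orthoimages": len(list((raw / "orthoimages").glob("*_reflectance_ortho.tif"))),
        "vectors": len(list((raw / "vectors").glob("*.geojson"))),
        "zone_masks": len(list((raw / "zone_masks").glob("*.tif"))),
    }

# Any zero count indicates a misplaced or misnamed input
print(check_raw_layout("data/raw"))
```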
```bash
# Complete pipeline in one command
python main.py pipeline \
    --vector data/raw/vectors/crop_fields.geojson \
    --ortho-dir data/raw/orthoimages/ \
    --zone-mask data/raw/zone_masks/spatial_zones.tif
```

You can run each step independently for more control:
```bash
# Step 1: Extract patches
python main.py extract \
    --vector data/raw/vectors/crop_fields.geojson \
    --ortho-dir data/raw/orthoimages/ \
    --patch-size 256 \
    --stride 128
```
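Conceptually, extraction slides a window of `--patch-size` across each orthoimage at the given `--stride`; the real extractor additionally labels patches from the field boundaries. A minimal numpy sketch of the windowing alone:

```python
import numpy as np

def sliding_patches(image, patch_size=256, stride=128):
    """Yield (top-left offset, patch) pairs from an (H, W, C) array."""
    h, w = image.shape[:2]
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            yield (y, x), image[y:y + patch_size, x:x + patch_size]

# A 512x512 image with 10 bands yields a 3x3 grid of overlapping patches
img = np.zeros((512, 512, 10), dtype=np.float32)
patches = list(sliding_patches(img))
print(len(patches))  # 9
```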
```bash
# Step 2: Stack temporal data
python main.py stack \
    --patches-dir data/processed/patches \
    --output-dir data/processed/stacked
```
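Conceptually, this step gathers each patch's per-date arrays and stacks them along a new time axis, which can then be flattened into the `(time × bands, height, width)` layout described under the output data formats. A minimal numpy sketch (shapes are illustrative):

```python
import numpy as np

# Three acquisition dates of the same 256x256 patch, 10 bands each (C, H, W)
dates = [np.random.rand(10, 256, 256).astype(np.float32) for _ in range(3)]

# Stack along a new time axis, then merge time and bands into one channel axis
stacked = np.stack(dates, axis=0)               # (time, bands, H, W) -> (3, 10, 256, 256)
flat = stacked.reshape(-1, *stacked.shape[2:])  # (time * bands, H, W) -> (30, 256, 256)
print(flat.shape)  # (30, 256, 256)
```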
```bash
# Step 3: Create spatial splits
python main.py split \
    --mapping data/processed/stacked/stack_mapping.csv \
    --zone-mask data/raw/zone_masks/spatial_zones.tif \
    --stacked-folder data/processed/stacked
```

Key parameters:

- `--patch-size`: Size of patches in pixels (default: 256)
- `--stride`: Sliding-window stride (default: 128)
- `--channel-first`: Store patches as (C, H, W) instead of (H, W, C)
- `--min-plots-per-class`: Minimum field plots required per crop class (default: 3)
- `--expected-bands`: Number of spectral bands expected (default: 10)
- `--min-temporal-samples`: Minimum temporal observations required (default: 3)
- `--date-pattern`: Regex pattern for date folder matching
- `--excluded-classes`: Crop class IDs to exclude (default: [1, 2])
- `--zone-mapping`: Custom zone-to-split mapping (format: `zone:split`)
By default, the pipeline uses:
- Zones 1,2 → train
- Zone 3 → validation
- Zone 4 → test
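The `zone:split` pairs accepted by `--zone-mapping` can be parsed into a lookup table like the default above; a hypothetical sketch (the real splitter's parsing may differ):

```python
def parse_zone_mapping(pairs):
    """Parse 'zone:split' strings, e.g. ['1:train', '3:val'] -> {1: 'train', 3: 'val'}."""
    return {int(zone): split for zone, split in (p.split(":") for p in pairs)}

# Reconstructing the default mapping from CLI-style arguments
default = parse_zone_mapping(["1:train", "2:train", "3:val", "4:test"])
print(default[3])  # val
```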
Customize with:

```bash
python main.py split ... --zone-mapping 1:train 2:train 3:val 4:test 5:test
```

Project structure:

```
Multispectral-UAV-Crop-Classification-Pipeline/
├── main.py                      # Main pipeline orchestrator
├── setup.py                     # Environment setup script
├── requirements.txt             # Python dependencies
├── README.md                    # This file
├── .gitignore                   # Git ignore rules
├── src/
│   └── libs/                    # Core library modules
│       ├── __init__.py
│       ├── patch_extractor.py   # Patch extraction functionality
│       ├── temporal_stacker.py  # Temporal stacking functionality
│       └── spatial_splitter.py  # Spatial splitting functionality
├── data/
│   ├── raw/                     # Input data
│   │   ├── orthoimages/
│   │   ├── vectors/
│   │   └── zone_masks/
│   └── processed/               # Pipeline outputs
│       ├── patches/
│       └── stacked/
├── output/                      # Final results
│   ├── models/
│   ├── predictions/
│   └── visualizations/
├── notebooks/                   # Jupyter notebooks
├── configs/                     # Configuration files
└── logs/                        # Log files
```
After preprocessing, use the included training notebooks:
```bash
jupyter notebook notebooks/
```

Choose between classic machine learning and deep learning. The notebooks demonstrate:

- Feature engineering from 4D temporal data
- PCA-based dimensionality reduction
- Comprehensive model evaluation
- Spatial error analysis and visualization
- An automated deep learning approach
```python
from src.libs import PatchExtractor, TemporalStacker, SpatialSplitter

# Extract patches
extractor = PatchExtractor(
    vector_path="data/raw/vectors/fields.geojson",
    ortho_dir="data/raw/orthoimages/",
    output_dir="data/processed/patches"
)
extractor.extract_all_orthos()

# Stack temporal data
stacker = TemporalStacker(
    base_dir="data/processed/patches",
    output_dir="data/processed/stacked"
)
mapping_path = stacker.run()

# Create spatial splits
splitter = SpatialSplitter(
    mapping_csv=mapping_path,
    zone_mask_path="data/raw/zone_masks/zones.tif",
    stacked_folder="data/processed/stacked"
)
splitter.run()
```

Each module is designed to be flexible and extensible:

```python
# Custom crop filtering
extractor = PatchExtractor(...)
extractor.min_plots_per_class = 5  # Require more plots per class

# Custom temporal requirements
stacker = TemporalStacker(...)
stacker.min_temporal_samples = 5  # Require more temporal observations

# Custom spatial zones
splitter = SpatialSplitter(...)
splitter.set_custom_zone_mapping({1: "train", 2: "val", 3: "test"})
```

Orthoimages:

- Format: GeoTIFF (.tif)
- Naming: `YYMMDD_reflectance_ortho.tif` (e.g., `230601_reflectance_ortho.tif`)
- Bands: 10 spectral bands (configurable)
- Coordinate System: Any projected CRS
- Data Type: Float32 or UInt16
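An orthoimage's metadata can be checked against these requirements before a full run. A sketch with a hypothetical helper; the commented lines show how it might be fed from rasterio (a third-party library) without loading any pixels:

```python
def validate_ortho_meta(band_count, dtype, expected_bands=10):
    """Check band count and data type against the orthoimage requirements."""
    if band_count != expected_bands:
        return f"expected {expected_bands} bands, got {band_count}"
    if dtype not in ("float32", "uint16"):
        return f"unexpected data type: {dtype}"
    return "ok"

# With rasterio installed:
# import rasterio
# with rasterio.open("data/raw/orthoimages/230601_reflectance_ortho.tif") as src:
#     print(validate_ortho_meta(src.count, src.dtypes[0]))
```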
Field vectors:

- Format: GeoJSON (.geojson) or Shapefile (.shp)
- Required Fields: `crop`: crop type name (string); `plot_ID`: unique plot identifier (optional)
- Geometry: Polygon features representing field boundaries
- Coordinate System: Any CRS (will be reprojected to match the orthoimages)
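The attribute and geometry requirements can be checked up front. A sketch with a hypothetical helper; the commented lines show how it might be driven from geopandas (a third-party library):

```python
def validate_fields(columns, geom_types):
    """Check required attribute fields and geometry types for the field vectors."""
    problems = []
    if "crop" not in columns:
        problems.append("missing required 'crop' field")
    if not set(geom_types) <= {"Polygon", "MultiPolygon"}:
        problems.append(f"non-polygon geometries found: {sorted(set(geom_types))}")
    return problems

# With geopandas installed:
# import geopandas as gpd
# fields = gpd.read_file("data/raw/vectors/crop_fields.geojson")
# print(validate_fields(fields.columns, fields.geom_type))
```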
Zone masks:

- Format: GeoTIFF (.tif)
- Values: Integer zone IDs (1, 2, 3, 4, etc.)
- Coordinate System: Must match the orthoimages
- Purpose: Define spatial regions for train/val/test splitting
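Because a CRS mismatch between zone mask and orthoimages is a common failure mode (see Troubleshooting), it is worth verifying early. A sketch with a hypothetical helper; the commented lines show one way to feed it from rasterio (third-party):

```python
def check_zone_mask(zone_ids, mask_crs, ortho_crs):
    """Verify zone IDs are positive integers and the CRS matches the orthoimages."""
    problems = []
    if mask_crs != ortho_crs:
        problems.append(f"CRS mismatch: {mask_crs} vs {ortho_crs}")
    bad = [z for z in zone_ids if not (isinstance(z, int) and z > 0)]
    if bad:
        problems.append(f"invalid zone IDs: {bad}")
    return problems

# With rasterio installed:
# import rasterio, numpy as np
# with rasterio.open("data/raw/zone_masks/spatial_zones.tif") as mask, \
#      rasterio.open("data/raw/orthoimages/230601_reflectance_ortho.tif") as ortho:
#     zones = [int(z) for z in np.unique(mask.read(1)) if z != mask.nodata]
#     print(check_zone_mask(zones, mask.crs, ortho.crs))
```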
Extracted patches:

- Format: GeoTIFF files organized by date
- Naming: `{patch_id}_class{crop_class}.tif`
- Size: Configurable (default 256×256 pixels)
- Bands: Same as the input orthoimages

Stacked arrays:

- Format: NumPy binary files (.npy)
- Shape: `(time × bands, height, width)`
- Metadata: CSV mapping file with temporal and spatial information
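A loaded stack can be reshaped back into separate time and band axes for per-date work; a sketch assuming 3 dates and 10 bands (the load path is illustrative):

```python
import numpy as np

def split_time_bands(arr, n_bands=10):
    """Reshape a (time*bands, H, W) stack into (time, bands, H, W)."""
    tb, h, w = arr.shape
    assert tb % n_bands == 0, "channel axis is not a multiple of the band count"
    return arr.reshape(tb // n_bands, n_bands, h, w)

stack = np.zeros((30, 256, 256), dtype=np.float32)  # e.g. np.load("patch_0001.npy")
print(split_time_bands(stack).shape)  # (3, 10, 256, 256)
```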
Split files:

- Format: CSV files
- Content: Patch assignments to train/validation/test splits
- Spatial Awareness: Ensures no spatial overlap between splits
Memory Errors: Reduce the patch size or process in batches:

```bash
python main.py extract --patch-size 128 --stride 64
```

Missing Dependencies: Reinstall the requirements:

```bash
pip install -r requirements.txt
```

Date Pattern Mismatch: Adjust the regex pattern:

```bash
python main.py stack --date-pattern "\d{8}_ortho"
```

Spatial Projection Issues: Ensure CRS compatibility:

- Zone mask must match the orthoimage CRS
- Vector data will be reprojected automatically
Large Datasets:
- Use larger stride values to reduce patch count
- Increase minimum temporal samples to filter sparse data
- Use smaller patch sizes to reduce memory usage
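The effect of stride on patch count can be estimated directly: patches per axis is floor((image_size − patch_size) / stride) + 1, so doubling the stride roughly quarters the total. A small sketch (image sizes are illustrative):

```python
def patch_count(img_h, img_w, patch=256, stride=128):
    """Number of sliding-window patches that fit in an image."""
    rows = (img_h - patch) // stride + 1
    cols = (img_w - patch) // stride + 1
    return rows * cols

print(patch_count(4096, 4096, stride=128))  # 961  (31 x 31)
print(patch_count(4096, 4096, stride=256))  # 256  (16 x 16)
```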
Computational Resources:
- The pipeline supports parallel processing where possible
- Consider running on high-memory systems for large datasets
- Use SSD storage for improved I/O performance
The pipeline provides comprehensive statistics at each step:
- Patch Extraction: Number of patches per class, spatial distribution
- Temporal Stacking: Temporal sample statistics, data quality metrics
- Spatial Splitting: Train/val/test distribution, class balance per split
Example output:

```
Patch Extraction Complete:
  Total patches: 15,432
  Crop classes: 5 (Potato: 3,245, Soybean: 4,123, ...)

Temporal Stacking Complete:
  Stacked patches: 12,847
  Temporal samples: min=3, max=8, mean=5.2

Spatial Splitting Complete:
  Train: 8,459 patches (65.8%)
  Validation: 2,144 patches (16.7%)
  Test: 2,244 patches (17.5%)
```
Contributions are welcome! Please:

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Clone for development
git clone https://github.com/your-username/Multispectral-UAV-Crop-Classification-Pipeline.git
cd Multispectral-UAV-Crop-Classification-Pipeline

# Install in development mode
pip install -e .

# Run tests (if available)
python -m pytest tests/
```

This project is licensed under the MIT License; see the LICENSE file for details.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: See the `docs/` directory for additional documentation
- GeoPatch Library: For efficient patch extraction from geospatial data
- Rasterio/GDAL: For robust geospatial data handling
- Scikit-learn: For machine learning utilities and validation metrics
- Contributors: Thanks to all contributors who have helped improve this pipeline
If you use this pipeline in your research, please cite:
```bibtex
@software{Multispectral-UAV-Crop-Classification-Pipeline,
  title={Multispectral-UAV-Crop-Classification-Pipeline: A Preprocessing Framework for Multispectral Time-Series Data},
  author={Pinheiro, Nelson and Martin, Lena and Assenmacher, Marina and Pei, Ziyu},
  year={2024},
  url={https://github.com/your-username/Multispectral-UAV-Crop-Classification-Pipeline}
}
```

🌾 Happy Crop Classifying! 🌾