🌲 Fire-Ready Forests Data Challenge (Winner) ->Team Agni Astra

Welcome to the repository for Team Agni Astra's submission to the Fire-Ready Forests Data Challenge hosted by the National Data Platform. Our aim was to build a robust and interpretable machine learning pipeline for classifying forest trees in California across three biological levels:

Plant Functional Type (PFT)
Genus
Species

We used ecological, morphological, and spatial features to train a hierarchical ensemble of classifiers that achieved strong predictive performance across all levels.

📁 Repository Structure

├── data/               # Processed and merged datasets (FIA + REF_SPECIES)
├── Images/             # All project-related figures and visualizations
├── final_NB.ipynb            # Final pipeline notebook (run this)
├── plots.ipynb         # Exploratory plots and feature analyses
├── requirements.txt    # Required Python packages
├── .gitattributes      # Git LFS tracking config
├── README.md           # This file

⚠️ Git LFS Required

This repository uses Git Large File Storage (LFS) to handle large files like CA_TREE.csv.

Before cloning, ensure LFS is installed and initialized:

# One-time setup (if you haven't already)
git lfs install

if you are running on a conda environment, initialize new environment with the following code:

conda create -n "env-name" python=3.11

conda activate env-name

If not on conda, then you can directly skip to the below steps

# Clone the repo
git clone <repo-url>
cd <repo-name>

pip install -r requirements.txt

then proceed on running final_NB.ipynb file.

If you don't set up Git LFS, large files like data/CA_TREE.csv will not download correctly.

🚀 Pipeline Overview

We implemented a hierarchical stacking ensemble pipeline to classify:

🌿 PFTs using tree structure, location, and ecological data
🌲 Genus using all of the above + predicted PFT
🌳 Species using all of the above + predicted Genus

🔧 Preprocessing Steps

Merged FIA and REF_SPECIES datasets
Dropped highly correlated features (e.g., BasalA)
Filtered by ecological zones (ECOSUBCD)
One-hot encoded categorical variables
Handled missing values
Converted units (e.g., inches to cm)
Stratified train-test split and standardized numeric columns

🧠 Modeling Strategy

Base Models: Random Forest, Extra Trees, Decision Tree, Bagging
Meta-Learner: Tuned XGBoost classifier

Features:

Numerical: DIA (diameter), HT (height), LAT, LON
One-hot encoded: Ecological Zones

Target Variables: PFT → Genus → Species (hierarchical)

📊 Performance Summary

Task	Accuracy	Weighted F1-Score
PFT	90.0%	0.89
Genus	83.3%	0.83
Species	96.9%	0.97

The models showed high consistency, especially for dominant classes (e.g., Evergreen conifer, Pinus). Site-level evaluations were conducted for PFTs.

📈 Key Visuals & Insights

All plots are included in Images/ and plots.ipynb. Highlights:

KDE maps of tree density
Distribution plots of diameter, height, and basal area
Correlation heatmaps
Feature importance charts
Scatter and box plots for structural analysis
Pipeline architecture visualizations

📄 Project Report

Read our full methodology, evaluation strategy, modeling diagrams, and discussion in the official PDF report:

📘 SDSC_Report.pdf

✨ Acknowledgments

This project was developed for the Fire-Ready Forests Challenge hosted by the National Data Platform and San Diego Supercomputer Center (SDSC).

Special thanks to Pedro and Leticia for their prompt and helpful responses throughout the challenge. Your support made a huge difference in helping our team resolve doubts and stay on track!

👥 Team Agni Astra

Guruprasad Parasnis
Abhay Lal
Yash Vishe

📌 Future Work

To improve rare-class detection and generalizability, we plan to explore:

📉 Class-aware losses (e.g., focal loss)
🚐 Integration of multispectral/remote sensing & soil data
🌍 Spatial cross-validation techniques
🔀 Continual learning for incremental field data updates

💬 Contact

For questions, suggestions, or collaborations, feel free to open an issue or reach out via email.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🌲 Fire-Ready Forests Data Challenge (Winner) ->Team Agni Astra

📁 Repository Structure

⚠️ Git LFS Required

🚀 Pipeline Overview

🔧 Preprocessing Steps

🧠 Modeling Strategy

📊 Performance Summary

📈 Key Visuals & Insights

📄 Project Report

✨ Acknowledgments

👥 Team Agni Astra

📌 Future Work

💬 Contact

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
Data		Data
Experimental_NBs		Experimental_NBs
Images		Images
.DS_Store		.DS_Store
.gitattributes		.gitattributes
README.md		README.md
SDSC_Report.pdf		SDSC_Report.pdf
final_NB.ipynb		final_NB.ipynb
plots.ipynb		plots.ipynb
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

🌲 Fire-Ready Forests Data Challenge (Winner) ->Team Agni Astra

📁 Repository Structure

⚠️ Git LFS Required

🚀 Pipeline Overview

🔧 Preprocessing Steps

🧠 Modeling Strategy

📊 Performance Summary

📈 Key Visuals & Insights

📄 Project Report

✨ Acknowledgments

👥 Team Agni Astra

📌 Future Work

💬 Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages