
Alberta wildfire pattern analysis (2006 - 2025) using machine learning and geospatial visualization


cnero101/alberta-wildfire-analysis


🔥 Alberta Wildfire Data Story (2006-2025)

A comprehensive data science investigation of 26,551 wildfire incidents across two decades



📊 Project Overview

This project analyzes 26,551 wildfire incidents from Alberta, Canada (2006-2025), combining statistical testing, geospatial analysis, and machine learning to answer six questions about fire patterns, causes, and predictability.

🎯 Research Questions

| # | Question | Methods | Key Finding |
|---|---|---|---|
| 1 | Are wildfires increasing? | Linear regression, trend analysis | High variability, no simple trend |
| 2 | Where do fires concentrate? | Geospatial clustering (EPSG:3403) | Three distinct regions identified |
| 3 | What causes fires by region? | Chi-square test, contingency analysis | Causes vary significantly from north to south |
| 4 | Does fast response reduce size? | Correlation analysis | Weak correlation (r ≈ 0.3) |
| 5 | What weather predicts fire behavior? | Pearson correlation, scatter analysis | Weather combinations matter most |
| 6 | Can ML predict fire types? | K-means, Random Forest | 87% accuracy, 4 fire types |

💡 Key Insights

✅ 87% ML prediction accuracy for fire size classification
✅ 4 distinct fire behavior types identified through clustering
✅ Regional differences support tailored management strategies
✅ High year-to-year variability dominates temporal patterns
✅ Weather combinations predict risk better than individual variables


🚀 Quick Start

Prerequisites

  • Python 3.12 or higher
  • Jupyter Notebook
  • 4GB+ RAM recommended

Installation

# Clone repository
git clone https://github.com/cnero101/alberta-wildfire-analysis.git
cd alberta-wildfire-analysis

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Launch Jupyter
jupyter notebook Wildfire_DataStory_Enhanced.ipynb

Get the Data

Option 1: Download from Source

  1. Visit Alberta Wildfire Historical Data
  2. Download complete dataset (2006-2025)
  3. Save as data/wildfire_data.csv

Option 2: Use Sample Data

  • Sample dataset available in /data folder (10% random sample for testing)

📁 Repository Structure

alberta-wildfire-analysis/
│
├── Wildfire_DataStory_Enhanced.ipynb    # Main analysis notebook ⭐
├── README.md                            # This file
├── requirements.txt                     # Python dependencies
├── LICENSE                              # MIT License
├── .gitignore                           # Git ignore rules
│
├── data/
│   ├── README.md                        # Data source info
│   └── wildfire_data.csv                # Dataset (download separately)
│
├── images/                              # Visualizations
│   ├── eda_*.png                        # Exploratory charts
│   ├── q1_*.png                         # Question 1 visuals
│   ├── q2_*.png                         # Question 2 visuals
│   └── ...                              # All generated charts
│
└── docs/                                # Additional documentation
    ├── methodology.md                   # Detailed methods
    └── data_dictionary.md               # Variable definitions

🛠️ Technical Stack

Data Science & ML

| Library | Purpose | Version |
|---|---|---|
| pandas | Data manipulation | 2.0+ |
| numpy | Numerical computing | 1.24+ |
| scipy | Statistical analysis | 1.10+ |
| scikit-learn | Machine learning | 1.3+ |

Visualization

| Library | Purpose | Version |
|---|---|---|
| matplotlib | Static plots | 3.7+ |
| seaborn | Statistical graphics | 0.12+ |
| plotly | Interactive charts | 5.14+ |

Geospatial

| Library | Purpose | Version |
|---|---|---|
| geopandas | Geographic data structures | 0.13+ |
| pyproj | Coordinate transformations | 3.5+ |
| contextily | Basemap tiles | 1.3+ |
| shapely | Geometric operations | 2.0+ |

Coordinate System: EPSG:3403 (NAD83 Alberta 10-TM Forest)


📈 Analysis Workflow

1. Data Loading & Profiling

  • Import 26,551 fire records
  • Assess data quality (completeness, types, distributions)
  • Identify missing data patterns
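A minimal pandas sketch of the profiling step, assuming the dataset has been saved as data/wildfire_data.csv per the "Get the Data" instructions. The helper name `profile_missing` is hypothetical; it reports each column's type, non-null count, and percent missing so that missing-data patterns stand out.

```python
import pandas as pd


def profile_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize dtype, non-null count, and percent missing for each column."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "pct_missing": (df.isna().mean() * 100).round(1),
    })
    # Most incomplete columns first, so missing-data patterns surface at the top.
    return summary.sort_values("pct_missing", ascending=False)


# Usage (path taken from the "Get the Data" steps):
# fires = pd.read_csv("data/wildfire_data.csv")
# print(len(fires))
# print(profile_missing(fires).head(15))
```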

2. Data Preparation

  • Handle missing values appropriately
  • Engineer features (Fire Weather Index, periods, regions)
  • Convert dates and categorize variables
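A sketch of the date conversion and categorization step, under stated assumptions: the column names `start_date` and `size_ha`, the month-to-period mapping, and the A to E hectare thresholds are illustrative placeholders, not the notebook's actual schema or categories. Fire Weather Index and region features are left out because their exact definitions are not given here.

```python
import pandas as pd

# Assumed size classes in hectares (Alberta-style A to E); the notebook's may differ.
SIZE_BINS = [0, 0.1, 4, 40, 200, float("inf")]
SIZE_LABELS = ["A", "B", "C", "D", "E"]

# Hypothetical month-to-period mapping.
PERIOD_BY_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "fall", 10: "fall", 11: "fall"}


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Parse dates and derive categorical features (column names are hypothetical)."""
    out = df.copy()
    # Unparseable dates become NaT instead of raising an error.
    out["start_date"] = pd.to_datetime(out["start_date"], errors="coerce")
    out["year"] = out["start_date"].dt.year
    out["month"] = out["start_date"].dt.month
    # na_action="ignore" leaves rows with missing dates as NaN instead of mapping them.
    out["period"] = out["month"].map(PERIOD_BY_MONTH.get, na_action="ignore")
    # Bins are right-closed; include_lowest=True puts exactly 0 ha in class A.
    out["size_class"] = pd.cut(out["size_ha"], bins=SIZE_BINS,
                               labels=SIZE_LABELS, include_lowest=True)
    return out
```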

3. Exploratory Data Analysis

  • Visualize distributions
  • Identify temporal and spatial patterns
  • Compute initial correlations

4-9. Six Research Questions

  • Each question follows: Motivation → Methods → Analysis → Findings → Implications
  • Statistical rigor: hypothesis tests, significance levels, confidence intervals
  • Multiple visualization types for each question

10. Machine Learning

  • Unsupervised: K-means clustering (k=4) to discover fire types
  • Supervised: Random Forest to predict fire size categories
  • Validation: Silhouette scores, confusion matrices, precision/recall
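A sketch of k selection for the clustering step with scikit-learn, assuming a numeric feature matrix `X` (for example, the nine variables described under Methodology Highlights). It standardizes the features, fits K-means for a range of k, and records inertia (for an elbow plot) and silhouette score. `choose_k` is a hypothetical helper, and taking the highest silhouette is one simple rule, not necessarily how k=4 was chosen in the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k(X, k_values=range(2, 9), random_state=0):
    """Return (best_k, per-k metrics) using silhouette score on standardized features."""
    # K-means uses Euclidean distance, so features on large scales would dominate.
    Xs = StandardScaler().fit_transform(X)
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(Xs)
        results[k] = {
            "inertia": km.inertia_,  # within-cluster sum of squares, for the elbow plot
            "silhouette": silhouette_score(Xs, km.labels_),
        }
    best_k = max(results, key=lambda k: results[k]["silhouette"])
    return best_k, results
```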

11. Synthesis & Conclusions

  • Connect findings across questions
  • Identify actionable insights
  • Acknowledge limitations
  • Recommend next steps

📊 Sample Visualizations

Temporal Trends (Question 1)

Annual fire frequency shows high year-to-year variability with extreme years (2016, 2019, 2023) rather than a consistent linear increase.
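The trend test behind Question 1 can be sketched with `scipy.stats.linregress` on annual counts. This assumes one start year per fire record as input; `annual_trend` is an illustrative helper, not the notebook's code. Numerically, "high variability, no simple trend" would show up as a low R² and a large p-value for the slope.

```python
import pandas as pd
from scipy import stats


def annual_trend(years: pd.Series) -> dict:
    """OLS fit of annual fire count against year; `years` has one entry per fire."""
    counts = years.dropna().astype(int).value_counts().sort_index()
    # Years with no recorded fires count as 0 instead of silently disappearing.
    full_range = range(counts.index.min(), counts.index.max() + 1)
    counts = counts.reindex(full_range, fill_value=0)
    fit = stats.linregress(counts.index.to_numpy(), counts.to_numpy())
    return {
        "slope_per_year": fit.slope,   # change in fire count per year
        "r_squared": fit.rvalue ** 2,  # share of variance explained by the line
        "p_value": fit.pvalue,         # two-sided test of slope == 0
    }
```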

Spatial Clustering (Questions 2 & 6)

Geographic analysis reveals three distinct fire environments: Northern boreal (remote, lightning-caused), Central transition zone, and Southern grassland (human-caused).

Machine Learning Results (Question 6)

K-means clustering identified 4 fire behavior types, and a Random Forest classifier predicted fire size categories with 87% accuracy.

Note: All visualizations are generated automatically when running the notebook


🎓 Methodology Highlights

Statistical Methods

  • Linear Regression - Trend detection (coefficients, R², p-values)
  • Pearson Correlation - Association strength and significance
  • Chi-Square Test - Independence testing (categorical variables)
  • Hypothesis Testing - α = 0.05 significance level throughout
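The chi-square and Pearson tests listed above can be sketched with SciPy at the stated α = 0.05. Column names such as `region` and `cause` are hypothetical, and the helper names are illustrative. `pearson_test` drops incomplete pairs first, which corresponds to the complete-case approach noted under Limitations.

```python
import pandas as pd
from scipy.stats import chi2_contingency, pearsonr

ALPHA = 0.05  # significance level used throughout the analysis


def chi_square_independence(df: pd.DataFrame, row_col: str, col_col: str) -> dict:
    """Test whether two categorical columns (e.g. region and cause) are independent."""
    table = pd.crosstab(df[row_col], df[col_col])
    chi2, p, dof, expected = chi2_contingency(table)
    return {
        "chi2": float(chi2),
        "p_value": float(p),
        "dof": int(dof),
        # The chi-square approximation is unreliable when expected counts fall below ~5.
        "min_expected": float(expected.min()),
        "reject_independence": bool(p < ALPHA),
    }


def pearson_test(x: pd.Series, y: pd.Series) -> dict:
    """Pearson r computed only on pairs where both values are present."""
    mask = x.notna() & y.notna()
    r, p = pearsonr(x[mask], y[mask])
    return {"r": float(r), "p_value": float(p), "n": int(mask.sum()),
            "significant": bool(p < ALPHA)}
```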

Machine Learning

  • K-Means Clustering

    • Optimal k selection via elbow method and silhouette scores
    • Feature standardization (StandardScaler)
    • 9 variables: weather, location, timing, fire characteristics
  • Random Forest Classification

    • 70/30 train/test split
    • Hyperparameter tuning via grid search
    • Performance metrics: accuracy, precision, recall, F1
    • Feature importance analysis
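A sketch of the supervised step with scikit-learn: a 70/30 split, grid search over a small hypothetical parameter grid, and the accuracy, per-class precision/recall/F1, confusion matrix, and feature importances listed above. Stratifying the split and tuning on macro-F1 are assumptions, chosen so that rare large-fire classes are not ignored; `train_size_classifier` is an illustrative name.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical grid; the values searched in the notebook are not documented here.
PARAM_GRID = {"n_estimators": [50, 100], "max_depth": [None, 5]}


def train_size_classifier(X, y, random_state=0):
    """Grid-search a Random Forest on 70% of the data and evaluate on the held-out 30%."""
    # stratify=y keeps each size class's proportion similar in both partitions.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state)
    search = GridSearchCV(RandomForestClassifier(random_state=random_state),
                          PARAM_GRID, cv=5, scoring="f1_macro")
    search.fit(X_train, y_train)
    y_pred = search.predict(X_test)
    return {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "report": classification_report(y_test, y_pred, output_dict=True, zero_division=0),
        "confusion": confusion_matrix(y_test, y_pred),
        "importances": search.best_estimator_.feature_importances_,
    }
```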

Geospatial Analysis

  • Projection: EPSG:3403 (NAD83 Alberta 10-TM Forest)
  • Grid Resolution: 5 km × 5 km cells
  • Smoothing: Gaussian kernel (σ = 2.5 cells)
  • Density Mapping: 2D histograms with interpolation
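A sketch of the density mapping with NumPy and SciPy, assuming fire coordinates have already been projected to EPSG:3403 (metres), for example with geopandas' `GeoDataFrame.to_crs(epsg=3403)`. It bins fires into 5 km cells and smooths the counts with a Gaussian kernel of σ = 2.5 cells; display interpolation is left to the plotting call. `fire_density_grid` is a hypothetical helper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

CELL_M = 5_000      # 5 km x 5 km cells (EPSG:3403 units are metres)
SIGMA_CELLS = 2.5   # Gaussian smoothing width, in cells


def _edges(v: np.ndarray, cell_m: float) -> np.ndarray:
    """Bin edges aligned to multiples of cell_m that cover every value in v."""
    start = np.floor(v.min() / cell_m) * cell_m
    n_bins = max(1, int(np.ceil((v.max() - start) / cell_m)))
    return start + cell_m * np.arange(n_bins + 1)


def fire_density_grid(x, y, cell_m=CELL_M, sigma_cells=SIGMA_CELLS):
    """Count fires per grid cell, then smooth the counts with a Gaussian kernel.

    x, y are projected coordinates in metres (assumed EPSG:3403).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_edges, y_edges = _edges(x, cell_m), _edges(y, cell_m)
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    smoothed = gaussian_filter(counts, sigma=sigma_cells)
    return counts, smoothed, x_edges, y_edges
```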

💡 Key Findings & Implications

For Fire Managers

✅ Pre-position resources based on identified geographic hotspots
✅ Use cluster profiles for initial fire risk assessment
✅ Differentiate strategies by region (North vs. Central vs. South)
✅ Plan for peak suppression capacity in June-August

For Policy Makers

✅ Evidence supports regional (not province-wide) strategies
✅ Invest in northern detection (helicopters, remote sensing)
✅ Invest in southern prevention (public education, fuel management)
✅ Climate adaptation: prepare for a high-variability future

For Researchers

✅ Demonstrates ML feasibility for fire classification
✅ Identifies data gaps (fuel moisture, suppression effort)
✅ Provides baseline for climate change studies
✅ Methodology transferable to other regions


⚠️ Limitations & Caveats

Data Quality

  • About 60% of environmental data (weather measurements) is missing
    • Small fires receive abbreviated assessments, so missingness is tied to fire size
    • Correlations may therefore over-represent larger fires
    • Complete-case analysis is straightforward but introduces this bias
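One way to check the size bias described above is to compare missingness across fire size groups and see how group shares change once incomplete rows are dropped. This is a sketch; the helper names and the column names (`size_class` and the weather columns) are hypothetical.

```python
import pandas as pd


def missing_rate_by_group(df: pd.DataFrame, value_cols: list, group_col: str) -> pd.Series:
    """Percent of records in each group missing at least one of value_cols."""
    any_missing = df[value_cols].isna().any(axis=1)
    return (any_missing.groupby(df[group_col]).mean() * 100).round(1)


def composition_shift(df: pd.DataFrame, value_cols: list, group_col: str) -> pd.DataFrame:
    """Group shares among all records vs. among complete cases only."""
    all_records = df[group_col].value_counts(normalize=True)
    complete = df.dropna(subset=value_cols)[group_col].value_counts(normalize=True)
    # Groups with no complete cases at all get a share of 0 rather than NaN.
    return pd.DataFrame({"all_records": all_records,
                         "complete_cases": complete}).fillna(0.0)
```

If small fires make up a noticeably smaller share of complete cases than of all records, complete-case correlations lean toward larger fires.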

Analytical Scope

  • Correlation ≠ Causation - We show associations, not proven causes
  • 20-year window - May be too short for climate trend detection
  • Suppression effects - Final fire size reflects both behavior AND firefighting
  • Missing variables - Fuel moisture, suppression effort, economic costs

Model Limitations

  • 87% accuracy - A 13% error rate remains, including 168 misclassified large fires
  • Cannot replace experts - Models support, don't replace human judgment
  • Temporal validity - Patterns may shift with climate change

See notebook for complete limitations discussion


🤝 Contributing

Contributions welcome! Please follow these guidelines:

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/improvement)
  3. Commit your changes (git commit -m 'Add improvement')
  4. Push to branch (git push origin feature/improvement)
  5. Open a Pull Request

Areas for Contribution

  • 🔬 Additional analyses - Temporal forecasting, fuel type deep-dive
  • 📊 New visualizations - Interactive dashboards, animated maps
  • 🤖 Model improvements - Alternative ML algorithms, ensemble methods
  • 📝 Documentation - Data dictionary expansion, methodology details
  • 🐛 Bug fixes - Code optimization, error handling

Code Style

  • Follow PEP 8 guidelines
  • Add docstrings to functions
  • Include comments for complex logic
  • Update requirements.txt if adding dependencies

📧 Contact & Support

Questions? Open an issue

Project Maintainer:

Want to collaborate? Reach out directly or open a discussion!


📜 License

This project is licensed under the MIT License - see LICENSE file for details.

Data License

Alberta Wildfire data is openly published by the Government of Alberta.
Attribution required: "Data provided by Alberta Wildfire Management"


🙏 Acknowledgments

Data Source

  • Alberta Wildfire Management for maintaining comprehensive public records
  • Government of Alberta for open data commitment

Inspiration & Support

  • Fire management professionals whose expertise keeps communities safe
  • Data science community for tools and best practices
  • Open source contributors for excellent libraries

Tools & Technologies

  • Python ecosystem (pandas, scikit-learn, matplotlib, geopandas)
  • Jupyter Project for notebook environment
  • GitHub for version control and collaboration




🌟 Support This Project

If you found this analysis useful:

⭐ Star this repository
🔀 Fork for your own use
📢 Share with colleagues
💬 Provide feedback
🤝 Contribute improvements

Every star helps make data science research more visible!


📅 Version History

v1.0.0 (February 2026)

  • Initial release
  • Complete 6-question analysis
  • Machine learning implementation
  • EPSG:3403 geospatial visualization
  • Comprehensive documentation

🔥 Analyzing wildfires with data science to build a more resilient Alberta

Last Updated: February 2026
