- Overview
- Key Features
- Project Architecture
- Pre-Match Analysis
- In-Match Analysis
- Data Collection Process
- Feature Engineering
- Model Architecture
- Project Structure
- How to Run
- Results & Performance
- Future Work
Overview
InjurySense.AI is an advanced football injury prediction system that uses machine learning to help teams prevent player injuries proactively. The system offers two complementary modes of analysis:
- Pre-Match Analysis: Predicts injury risks before matches based on historical data
- In-Match Analysis: Provides real-time monitoring of injury risks during games
Figure: Pre-Match Injury Risk Dashboard showing player risks, position breakdowns, and team risk assessment
Figure: In-Match Live Risk Monitoring showing real-time player risk tracking throughout the match
This AI-driven solution helps coaches and medical staff make data-informed decisions to optimize player health and team performance.
Key Features
- Pre-Match Analysis:
  - Player-specific injury risk predictions before games
  - Confidence intervals for risk assessment
  - Visual risk distribution by player position
  - Identification of high-risk players requiring special attention
  - Team vs. opponent contextual analysis
- In-Match Analysis:
  - Real-time player risk monitoring
  - Dynamic tracking of risk rankings over time
  - High-risk zone alerts
  - Minute-by-minute risk trend visualization
  - Auto-refresh for live data
Project Architecture
The system is built on a three-layer architecture:
- Data Layer:
  - Historical player and match data collection via API-Football
  - Feature preprocessing and transformation
  - Storage of pre-computed models and inference results
- Model Layer:
  - Pre-match XGBoost classification model (accuracy: 0.92)
  - In-match streaming prediction pipeline
  - Feature importance analysis for explainability
- Presentation Layer:
  - Interactive Streamlit dashboard
  - Real-time visualization components
  - Team and player selection interface
  - Risk distribution charts and player cards
Pre-Match Analysis
The pre-match analysis system uses a combination of:
- Previous season statistics for baseline player metrics
- Player's recent match data (previous 5 matches)
- Team performance metrics in recent matches
- Opponent team analysis and match context
- Historical injury records to identify recurring patterns
The model generates a risk probability score (0-1) for each player with confidence intervals, allowing medical staff to prioritize preventative measures.
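The README does not say how the confidence intervals are produced. One common option, assumed here, is to score each player with several models trained on bootstrap resamples of the training data and then summarize the spread of their predicted probabilities. This minimal sketch covers only that summarizing step; `RiskEstimate`, `summarize_risk`, and the 90% level are illustrative choices, not part of the project.

```python
import statistics
from dataclasses import dataclass


@dataclass
class RiskEstimate:
    player: str
    risk: float   # mean predicted injury probability, in [0, 1]
    lower: float  # lower bound of the central percentile interval
    upper: float  # upper bound of the central percentile interval


def percentile(sorted_vals: list[float], q: float) -> float:
    """Linearly interpolated percentile (q in [0, 1]) of an already-sorted list."""
    if not sorted_vals:
        raise ValueError("need at least one prediction")
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize_risk(player: str, bootstrap_probs: list[float], level: float = 0.90) -> RiskEstimate:
    """Collapse one player's per-model probabilities into a point estimate and interval.

    Assumption: bootstrap_probs come from models trained on bootstrap resamples;
    the README does not document how its confidence intervals are computed.
    """
    vals = sorted(bootstrap_probs)
    alpha = (1.0 - level) / 2.0
    return RiskEstimate(
        player=player,
        risk=statistics.fmean(vals),
        lower=percentile(vals, alpha),
        upper=percentile(vals, 1.0 - alpha),
    )
```

For example, `summarize_risk("Player A", [0.61, 0.55, 0.70, 0.64, 0.58])` gives a mean risk of about 0.616 with a 90% interval of roughly 0.556 to 0.688.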
Figure: Pre-match injury risk visualization per player with confidence intervals
The pre-match dashboard includes:
- Team Risk Overview: Summary of high-risk vs. low-risk players
- Risk Distribution by Position: Analysis of which positions carry the highest predicted injury risk
- High Risk Players Spotlight: Focused attention on players requiring intervention
- Player Injury Risk Assessment: Detailed per-player risk assessment with confidence intervals
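The overview, position breakdown, and spotlight panels can all be derived from one list of per-player scores. The sketch below assumes a hypothetical `(name, position, risk)` input and a 0.5 high-risk cutoff; the README does not state the threshold the dashboard actually uses.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical cutoff: the README does not specify the dashboard's threshold.
HIGH_RISK_THRESHOLD = 0.5


def team_risk_overview(players: list[tuple[str, str, float]], threshold: float = HIGH_RISK_THRESHOLD) -> dict:
    """players: (name, position, risk) tuples.

    Returns high/low counts, mean risk per position, and the spotlight list.
    """
    high = [p for p in players if p[2] >= threshold]
    by_position = defaultdict(list)
    for _, position, risk in players:
        by_position[position].append(risk)
    return {
        "high_risk": len(high),
        "low_risk": len(players) - len(high),
        # Risk Distribution by Position: average predicted risk per position.
        "position_mean": {pos: fmean(risks) for pos, risks in by_position.items()},
        # High Risk Players Spotlight: riskiest players first.
        "spotlight": [name for name, _, _ in sorted(high, key=lambda p: p[2], reverse=True)],
    }
```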
In-Match Analysis
The in-match analysis system provides:
- Live Risk Rankings: Current risk status of all players
- Risk Trend Visualization: How player risk evolves over match time
- High Risk Zone Tracking: Visual identification of players entering dangerous risk levels
- Position-Specific Monitoring: Different thresholds based on player roles
The system updates every minute during live matches and visualizes which players are trending toward higher risk, allowing coaches to make substitution decisions or adjust player roles.
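A minimal sketch of the per-minute update behind the live rankings and high-risk zone alerts. It re-ranks players on every update and alerts only when a player crosses into the zone, so a player who stays above the threshold does not trigger an alert every minute. `LiveRiskMonitor` and the `POSITION_THRESHOLDS` values are hypothetical; the README only says thresholds differ by player role.

```python
# Hypothetical per-position thresholds; the README does not publish the real values.
POSITION_THRESHOLDS = {"Goalkeeper": 0.80, "Defender": 0.65, "Midfielder": 0.60, "Forward": 0.60}
DEFAULT_THRESHOLD = 0.60


class LiveRiskMonitor:
    """Tracks each player's latest risk and flags new entries into the high-risk zone."""

    def __init__(self, positions: dict[str, str]) -> None:
        self.positions = positions          # player -> position
        self.latest: dict[str, float] = {}  # player -> most recent risk score
        self.in_zone: set[str] = set()      # players currently at or above their threshold

    def update(self, minute: int, risks: dict[str, float]) -> list[str]:
        """Apply one minute of risk scores and return alerts for players entering the zone."""
        alerts = []
        for player, risk in risks.items():
            self.latest[player] = risk
            limit = POSITION_THRESHOLDS.get(self.positions.get(player, ""), DEFAULT_THRESHOLD)
            if risk >= limit:
                if player not in self.in_zone:
                    self.in_zone.add(player)
                    alerts.append(f"{minute}': {player} entered high-risk zone ({risk:.2f})")
            else:
                self.in_zone.discard(player)  # left the zone; a later re-entry alerts again
        return alerts

    def rankings(self) -> list[tuple[str, float]]:
        """Live risk rankings: players sorted from highest to lowest current risk."""
        return sorted(self.latest.items(), key=lambda kv: kv[1], reverse=True)
```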
Figure: In-match risk trends showing how player risk evolves throughout the game
Data Collection Process
Positive samples (injured players) are collected as follows:
- Injury Data Collection:
  - Query the API for players with recorded injuries in the 2024 season
  - Extract injury dates, player details, and match contexts
- Feature Expansion:
  - For each injury, identify the opponent team
  - Gather the previous 5 fixtures for the player's team
  - Gather the previous 5 fixtures for the opponent team
- Previous Season Stats Collection:
  - Gather player statistics from the previous season
  - Include metrics such as appearances, goals, and assists
- Raw Stats Collection:
  - For each of the previous 5 matches, collect:
    - Player-specific performance metrics
    - Team aggregate statistics (shots, possession, passing accuracy)
    - Opponent team statistics
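The feature-expansion step amounts to selecting each team's last five fixtures before the injury date. The sketch below assumes the fixtures have already been fetched from API-Football into simple records; the `Fixture` class and its fields are hypothetical and do not mirror the API's response schema, and excluding fixtures on the injury date itself is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fixture:
    """Hypothetical, simplified fixture record (not the API-Football response schema)."""
    fixture_id: int
    played_on: date
    home_team: str
    away_team: str


def previous_fixtures(fixtures: list[Fixture], team: str, before: date, n: int = 5) -> list[Fixture]:
    """Return the team's last n fixtures played strictly before `before`, most recent first."""
    played = [f for f in fixtures if team in (f.home_team, f.away_team) and f.played_on < before]
    played.sort(key=lambda f: f.played_on, reverse=True)
    return played[:n]


def expand_injury(fixtures: list[Fixture], team: str, opponent: str, injury_date: date, n: int = 5) -> dict:
    """Fixture context for one injury: previous n fixtures of the player's team and the opponent."""
    return {
        "team_fixtures": [f.fixture_id for f in previous_fixtures(fixtures, team, injury_date, n)],
        "opponent_fixtures": [f.fixture_id for f in previous_fixtures(fixtures, opponent, injury_date, n)],
        "label": 1,  # assumed positive label; the README only states label = 0 for negatives
    }
```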
Negative samples (non-injured players) are collected to create balanced training data:
- Non-Injury Player Identification:
  - Find players who participated in matches without getting injured
- Random Fixture Selection:
  - For each non-injured player, randomly select 5 fixtures they participated in
  - Tag these as "non-injury" examples (label = 0)
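A sketch of the random fixture selection for negative samples, assuming a hypothetical `player_id -> fixture_ids` mapping built from the collected match data. Seeding the generator keeps the training set reproducible; taking all fixtures when a player has fewer than five is an assumption the README does not address.

```python
import random


def sample_negative_examples(
    appearances: dict[int, list[int]],
    injured_players: set[int],
    per_player: int = 5,
    seed: int = 42,
) -> list[dict]:
    """Pick up to `per_player` random fixtures per non-injured player and label them 0.

    appearances: player_id -> fixture_ids the player took part in (hypothetical input shape).
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    rows = []
    for player_id, fixture_ids in sorted(appearances.items()):
        if player_id in injured_players:
            continue  # skip players with recorded injuries
        k = min(per_player, len(fixture_ids))
        for fixture_id in rng.sample(fixture_ids, k):
            rows.append({"player_id": player_id, "fixture_id": fixture_id, "label": 0})
    return rows
```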
Feature Engineering
The system leverages these feature categories:
- Player features:
  - Recent form indicators (avg. minutes, rating, shots, goals)
  - Physical load metrics (duels, tackles, fouls)
  - Performance consistency measures
  - Historical injury patterns
  - Position-specific metrics
- Team features:
  - Team's recent shooting efficiency
  - Ball possession patterns
  - Team passing accuracy
  - Defensive metrics (tackles, interceptions)
  - Set piece statistics (corners, free kicks)
- Opponent features:
  - Opponent's aggressive play indicators
  - Match difficulty based on opponent form
  - Opponent pressing intensity
- Comparative features:
  - Team vs. opponent statistical differentials
  - Performance ratios (shots, possession, passing)
  - Historical matchup patterns
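Three of the feature families above are sketched below: averages over recent matches, injury-history features (the two most predictive features listed under Model Architecture), and team vs. opponent differentials and ratios. The function names, the 183-day approximation of six months, and leaving missing metrics as `None` for the later imputation step are assumptions, not the project's code.

```python
from datetime import date
from statistics import fmean
from typing import Optional


def recent_form(matches: list[dict], keys=("minutes", "rating", "shots", "goals")) -> dict[str, Optional[float]]:
    """Average each metric over the player's recent matches (e.g. the previous 5).

    Missing values (None or absent) are skipped; a metric with no values stays None
    so the preprocessing pipeline's imputation step can fill it.
    """
    features = {}
    for key in keys:
        values = [m[key] for m in matches if m.get(key) is not None]
        features[f"avg_{key}"] = fmean(values) if values else None
    return features


def injury_history(injury_dates: list[date], as_of: date, window_days: int = 183) -> dict:
    """Injury count in the trailing window (about 6 months) and days since the last injury."""
    past = [d for d in injury_dates if d < as_of]
    recent = [d for d in past if (as_of - d).days <= window_days]
    return {
        "injuries_last_6_months": len(recent),
        "days_since_last_injury": (as_of - max(past)).days if past else None,
    }


def team_vs_opponent(team: dict[str, float], opponent: dict[str, float], eps: float = 1e-9) -> dict[str, float]:
    """Differentials (team - opponent) and ratios (team / opponent) for shared metrics."""
    features = {}
    for key in sorted(team.keys() & opponent.keys()):
        features[f"diff_{key}"] = team[key] - opponent[key]
        features[f"ratio_{key}"] = team[key] / (opponent[key] + eps)  # eps avoids division by zero
    return features
```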
Model Architecture
The core predictive engine uses XGBoost, optimized for injury prediction:
- Preprocessing Pipeline:
  - Feature cleaning and normalization
  - Missing value imputation
  - Feature selection based on importance
- Model Configuration:
  - Optimized hyperparameters (n_estimators=80, learning_rate=0.1)
  - Cross-validation with StratifiedGroupKFold to prevent data leakage
  - Group-based splitting by player_id so that no player appears in both the training and test sets
- Evaluation Metrics:
  - Accuracy: 0.92
  - An improvement of 0.36 over the baseline accuracy of 0.56
- Key Performance Insights:
  - Most predictive features:
    - Number of injuries in the last 6 months
    - Days since the last injury incident
    - Player's age
    - Team's passing accuracy in the last 5 matches
Project Structure
📁 AI_Football_Injury_Prevention/
├── assets/                              # UI assets and logos
├── components/                          # Streamlit UI components
│   ├── in_match_analysis.py             # In-match analysis page
│   ├── live_match_visualization.py      # Real-time visualizations
│   ├── metric_plane.py                  # Team risk summary components
│   ├── player_cards.py                  # Player spotlight cards
│   ├── risk_charts.py                   # Distribution charts
│   ├── sidebar.py                       # Navigation sidebar
│   └── table.py                         # Player risk table
├── data/                                # Data storage
├── model_inferences/                    # Model inference outputs
├── models/                              # Trained models
├── scripts/                             # Data preparation scripts
│   ├── neg_samples_generation.ipynb     # Negative samples collection
│   ├── pos_samples_generation.ipynb     # Positive samples collection
│   └── football-model.ipynb             # Model training
├── utils/                               # Utility functions
├── config.yaml                          # Configuration settings
├── main.py                              # Main application entry point
└── README.md                            # This file
How to Run
- Clone this repository:
  git clone https://github.com/username/AI_Football_Injury_Prevention.git
  cd AI_Football_Injury_Prevention
- Install the required packages:
  pip install -r requirements.txt
- Run the Streamlit application:
  streamlit run main.py
- Access the dashboard at http://localhost:8501
Results & Performance
The model achieves strong predictive performance:
- Accuracy: 0.92
- Baseline accuracy: 0.56
Limitations:
- Performance varies by injury type
- Some rare injury causes remain difficult to predict
- Accurate predictions require recent performance data
Future Work
- In-Match Enhancement:
  - Integrate GPS tracking data for more precise fatigue estimation
  - Implement computer vision analysis of movement patterns
  - Create personalized player risk profiles
- UI Development:
  - Expand team comparison features
  - Add predictive substitution recommendations
  - Develop a mobile application for sideline use
- Enhanced Features:
  - Integrate physical performance metrics (GPS data)
  - Add biomechanical risk factors when available
  - Incorporate weather and pitch conditions
- Model Improvements:
  - Experiment with deep learning architectures
  - Test ensemble methods for improved accuracy
  - Implement explainable AI features for coaching staff
InjurySense.AI © 2025 | Powered by AI Football Injury Prevention System
