A machine learning system that predicts solar and wind energy generation from real-time weather data, following the flowchart process outlined in the project requirements.
This system implements a complete weather-based energy prediction pipeline that:
- Fetches real-time weather data from Open-Meteo API
- Cleans and processes historical CSV data for model training
- Trains machine learning models for solar and wind energy prediction
- Generates hourly energy predictions using real-time weather inputs
- Provides comprehensive model evaluation and performance analysis
The system follows the exact process flow from the flowchart:
API (Real-time Weather Data) → CSV Update → Data Cleaning → Model Training → Energy Prediction
↓
Weather Parameters: Temperature, Wind Speed, Solar Irradiance, Humidity, Cloud Cover
↓
Hourly Energy Predictions: Solar (kWh) + Total Energy (kWh)
## Project Structure
weather-prediction-model/
├── Data/
│   ├── weather_last_year_data .csv        # Historical training data
│   ├── cleaned_weather_data.csv           # Processed historical data
│   └── weather.csv                        # Additional weather data
├── weather_forecast.py                    # Original weather data fetcher
├── weather_energy_prediction_model.py     # Main prediction model
├── data_preprocessing.py                  # Data cleaning and preprocessing
├── model_evaluation.py                    # Model performance evaluation
├── requirements.txt                       # Python dependencies
└── README.md                              # This file
## Quick Start
### 1. Install Dependencies
```bash
pip install -r requirements.txt
```
### 2. Run the Complete Pipeline
```bash
python weather_energy_prediction_model.py
```
This will:
- Fetch real-time weather data from the API
- Clean historical CSV data
- Train solar and wind energy models
- Generate hourly energy predictions
- Save the trained model
### 3. Preprocess the Data
```bash
python data_preprocessing.py
```
This will:
- Clean and analyze historical data
- Create additional features
- Generate data distribution visualizations
- Save cleaned data for model training
### 4. Evaluate the Model
```bash
python model_evaluation.py
```
This will:
- Load the trained model
- Evaluate performance metrics
- Generate performance visualizations
- Create a comprehensive performance report
Class: `WeatherEnergyPredictionModel` (in `weather_energy_prediction_model.py`)
Key Methods:
- `fetch_realtime_weather_data()`: Fetches weather data from the Open-Meteo API
- `clean_csv_data()`: Cleans historical CSV data
- `train_model()`: Trains Random Forest models for solar and wind energy
- `predict_energy()`: Makes energy predictions using real-time weather data
- `run_complete_pipeline()`: Executes the complete workflow
Features:
- Dual model approach (solar + wind energy)
- Real-time weather data integration
- Automatic feature engineering
- Model persistence and loading
Key Functions (in `data_preprocessing.py`):
- `load_and_explore_data()`: Loads and explores historical data
- `clean_data()`: Removes missing values, duplicates, and outliers
- `create_features()`: Creates interaction and efficiency features
- `encode_categorical_variables()`: Encodes categorical variables
- `analyze_data_distribution()`: Creates data distribution visualizations
- `analyze_correlations()`: Creates correlation heatmaps
Class: `ModelEvaluator` (in `model_evaluation.py`)
Key Methods:
- `evaluate_model_performance()`: Calculates performance metrics
- `create_performance_visualizations()`: Creates prediction vs. actual plots
- `create_feature_importance_plot()`: Shows feature importance analysis
- `generate_performance_report()`: Creates a comprehensive performance report
The system expects historical data with the following columns:
| Column | Description | Type |
|---|---|---|
| Source_Type | Energy source (Solar/Wind) | Categorical |
| Solar_Irradiance | Solar radiation intensity (W/m²) | Numerical |
| Wind_Speed | Wind speed (m/s) | Numerical |
| Ambient_Temperature | Air temperature (°C) | Numerical |
| Humidity | Relative humidity (%) | Numerical |
| Cloud_Cover | Cloud coverage (%) | Numerical |
| Panel_Area | Solar panel area (m²) | Numerical |
| Blade_Length | Wind turbine blade length (m) | Numerical |
| Storage_Capacity | Energy storage capacity | Numerical |
| Maintenance_Schedule | Maintenance schedule | Numerical |
| Energy_Output_Class | Energy output classification | Categorical |
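Before training, it can help to confirm that a CSV actually carries the columns listed in the table. The following is a minimal sketch using only the standard library; the helper name `missing_columns` and the in-memory sample are illustrative assumptions, not part of the project code.

```python
import csv
import io

# Column names taken from the table above.
EXPECTED_COLUMNS = [
    "Source_Type", "Solar_Irradiance", "Wind_Speed", "Ambient_Temperature",
    "Humidity", "Cloud_Cover", "Panel_Area", "Blade_Length",
    "Storage_Capacity", "Maintenance_Schedule", "Energy_Output_Class",
]


def missing_columns(csv_file, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in expected if name not in present]


# Hypothetical sample whose header lacks the last two columns.
sample = io.StringIO(
    "Source_Type,Solar_Irradiance,Wind_Speed,Ambient_Temperature,Humidity,"
    "Cloud_Cover,Panel_Area,Blade_Length,Storage_Capacity\n"
    "Solar,650.0,3.2,28.5,40,10,20.0,0.0,100\n"
)
sample_missing = missing_columns(sample)
print(sample_missing)
```

For a file on disk, the same helper could be called as `missing_columns(open(path, newline=""))`, with `path` pointing at one of the CSVs in `Data/`.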
The system fetches real-time weather data including:
- Temperature (°C)
- Wind Speed (m/s)
- Solar Irradiance (W/m²)
- Humidity (%)
- Cloud Cover (%)
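As a rough illustration of how these five parameters could be requested, the sketch below builds an Open-Meteo forecast URL and reshapes the hourly response into one record per hour. It is not the project's `fetch_realtime_weather_data()` implementation: the function names, the mapping from Open-Meteo variable names to README column names, and the choice of `shortwave_radiation` as the irradiance source are assumptions, and the query parameters should be checked against the current Open-Meteo documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.open-meteo.com/v1/forecast"

# Open-Meteo hourly variable -> column name used in this README (assumed mapping).
HOURLY_VARIABLES = {
    "temperature_2m": "Temperature",
    "wind_speed_10m": "Wind_Speed",
    "shortwave_radiation": "Solar_Irradiance",
    "relative_humidity_2m": "Humidity",
    "cloud_cover": "Cloud_Cover",
}


def build_forecast_url(latitude, longitude):
    """Build the request URL for the hourly variables listed above."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(HOURLY_VARIABLES),
        "wind_speed_unit": "ms",  # request wind speed in m/s
        "timezone": "auto",
    }
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


def hourly_rows(payload):
    """Turn the column-oriented 'hourly' block into one dict per hour."""
    hourly = payload["hourly"]
    rows = []
    for i, timestamp in enumerate(hourly["time"]):
        row = {"Datetime": timestamp}
        for api_name, column in HOURLY_VARIABLES.items():
            row[column] = hourly[api_name][i]
        rows.append(row)
    return rows


def fetch_hourly_weather(latitude, longitude, timeout=10):
    """Perform the HTTP request (needs network access) and return hourly rows."""
    with urlopen(build_forecast_url(latitude, longitude), timeout=timeout) as resp:
        return hourly_rows(json.load(resp))


# Offline demo with a hand-written payload shaped like an hourly response.
sample_payload = {
    "hourly": {
        "time": ["2024-01-15T09:00", "2024-01-15T10:00"],
        "temperature_2m": [18.5, 20.1],
        "wind_speed_10m": [3.2, 4.0],
        "shortwave_radiation": [410.0, 560.0],
        "relative_humidity_2m": [55, 48],
        "cloud_cover": [20, 10],
    }
}
sample_rows = hourly_rows(sample_payload)
print(sample_rows[0])
```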
## Model Architecture
- Random Forest Regressor for both solar and wind energy prediction
- Separate models for solar and wind energy generation
- Feature scaling using StandardScaler
- Categorical encoding using LabelEncoder
### Features
- Base Features: Solar_Irradiance, Wind_Speed, Ambient_Temperature, Humidity, Cloud_Cover, Panel_Area, Blade_Length
- Engineered Features: Temperature_Squared, Humidity_Temperature_Interaction, Solar_Wind_Interaction
- Categorical Features: Source_Type_Encoded
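The README names the engineered features but does not spell out their formulas. The sketch below assumes the simplest reading: the squared temperature and plain products of the two base columns named in each interaction. The integer encoding mimics what a label encoder does (codes assigned in sorted label order). Both function names are hypothetical.

```python
def add_engineered_features(row):
    """Return a copy of `row` with the three engineered features added.

    Assumption: each interaction term is a plain product of two base columns.
    """
    enriched = dict(row)
    temperature = row["Ambient_Temperature"]
    enriched["Temperature_Squared"] = temperature ** 2
    enriched["Humidity_Temperature_Interaction"] = row["Humidity"] * temperature
    enriched["Solar_Wind_Interaction"] = row["Solar_Irradiance"] * row["Wind_Speed"]
    return enriched


def encode_labels(values):
    """Map each distinct label to an integer code, in sorted label order."""
    classes = sorted(set(values))
    mapping = {label: code for code, label in enumerate(classes)}
    return [mapping[value] for value in values], mapping


sample = {
    "Solar_Irradiance": 600.0,
    "Wind_Speed": 4.0,
    "Ambient_Temperature": 30.0,
    "Humidity": 50.0,
}
features = add_engineered_features(sample)
codes, mapping = encode_labels(["Wind", "Solar", "Solar"])
print(features)
print(codes, mapping)
```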
### Energy Calculation
- Solar Energy: Solar_Irradiance × Panel_Area × Efficiency_Factor
- Wind Energy: Wind_Speed³ × Blade_Length × Power_Coefficient
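The two expressions translate directly into code. The sketch below reproduces them exactly as written, using the efficiency factor (0.15) and power coefficient (0.4) from the configuration values in this README as defaults; the function names are illustrative. Note that with irradiance in W/m² and area in m², the solar expression yields watts, so converting to kWh is left to the caller. The wind expression is kept as the README states it; the textbook turbine formula additionally involves air density and swept rotor area.

```python
SOLAR_EFFICIENCY = 0.15        # Efficiency_Factor (README configuration value)
WIND_POWER_COEFFICIENT = 0.4   # Power_Coefficient (README configuration value)


def solar_energy(solar_irradiance, panel_area, efficiency=SOLAR_EFFICIENCY):
    """Solar_Irradiance x Panel_Area x Efficiency_Factor."""
    return solar_irradiance * panel_area * efficiency


def wind_energy(wind_speed, blade_length, power_coefficient=WIND_POWER_COEFFICIENT):
    """Wind_Speed^3 x Blade_Length x Power_Coefficient, as written in the README."""
    return wind_speed ** 3 * blade_length * power_coefficient


# Hypothetical inputs: 800 W/m² on 10 m² of panels, 5 m/s wind on a 2 m blade.
solar_value = solar_energy(800.0, 10.0)
wind_value = wind_energy(5.0, 2.0)
print(solar_value, wind_value)
```

Because wind speed enters as a cube, doubling it raises the wind term eightfold, which is why outlier handling on `Wind_Speed` matters for this target.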
The system generates hourly predictions in the format:
| Datetime | Predicted_Solar_kWh | Predicted_Energy_kWh |
|---|---|---|
| 2024-01-15 09:00 | 25.0 | 35.0 |
| 2024-01-15 10:00 | 40.0 | 50.0 |
| ... | ... | ... |
## Output Files
- Real-time weather data: `weather_forecast_YYYY-MM-DD.csv`
- Energy predictions: `energy_predictions_YYYYMMDD_HHMMSS.csv`
- Trained model: `weather_energy_model.pkl`
- Cleaned data: `Data/cleaned_weather_data.csv`
- Visualizations:
  - `data_distribution_analysis.png`
  - `correlation_heatmap.png`
  - `model_performance_analysis.png`
  - `feature_importance_analysis.png`
- Reports: `model_performance_report.txt`
## Workflow
### 1. Data Collection
- Fetch real-time weather data from Open-Meteo API
- Update CSV with latest weather information
### 2. Data Cleaning
- Remove missing values and duplicates
- Handle outliers using IQR method
- Create additional engineered features
- Encode categorical variables
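The first two cleaning steps can be sketched without any third-party library. The example below drops rows with missing values, removes exact duplicates, and then filters outliers with the usual IQR fences (Q1 − 1.5·IQR, Q3 + 1.5·IQR). The function names and sample readings are hypothetical, and `statistics.quantiles` requires Python 3.8 or newer and at least two data points per column.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean_rows(rows, numeric_columns, k=1.5):
    """Drop rows with missing values, then exact duplicates, then IQR outliers."""
    # 1. Remove rows where any numeric column is missing.
    complete = [r for r in rows if all(r.get(c) is not None for c in numeric_columns)]

    # 2. Remove exact duplicates while keeping the first occurrence.
    seen, unique = set(), []
    for row in complete:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 3. Keep rows that fall inside the IQR fences for every numeric column.
    fences = {c: iqr_bounds([r[c] for r in unique], k) for c in numeric_columns}
    return [
        r for r in unique
        if all(fences[c][0] <= r[c] <= fences[c][1] for c in numeric_columns)
    ]


readings = [3.0, 4.0, 5.0, 4.0, None, 6.0, 7.0, 100.0]  # hypothetical wind speeds
rows = [{"Wind_Speed": value} for value in readings]
cleaned = clean_rows(rows, ["Wind_Speed"])
print([r["Wind_Speed"] for r in cleaned])
```

The fences here are computed after deduplication, so repeated rows do not skew the quartiles; whether the project computes them before or after is not stated.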
### 3. Model Training
- Split data into training and testing sets
- Scale features using StandardScaler
- Train Random Forest models for solar and wind energy
- Evaluate model performance
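A compact sketch of these four training steps with scikit-learn is shown below. It uses synthetic data in place of the project's CSV, and the three-feature layout, the target formulas, and all variable names are assumptions made for illustration. The hyperparameters (`n_estimators=100`, `random_state=42`, `test_size=0.2`) are the configuration values listed in this README.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned historical data (assumption: three features).
rng = np.random.default_rng(42)
n_samples = 500
irradiance = rng.uniform(0, 1000, n_samples)   # W/m²
wind_speed = rng.uniform(0, 15, n_samples)     # m/s
temperature = rng.uniform(5, 40, n_samples)    # °C
X = np.column_stack([irradiance, wind_speed, temperature])

# Hypothetical targets: one per energy source, each with some noise.
targets = {
    "solar": irradiance * 20.0 * 0.15 + rng.normal(0, 5, n_samples),
    "wind": wind_speed ** 3 * 2.0 * 0.4 + rng.normal(0, 5, n_samples),
}

# Split row indices once so both models share the same train/test rows.
train_idx, test_idx = train_test_split(
    np.arange(n_samples), test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only to avoid leaking test statistics.
scaler = StandardScaler().fit(X[train_idx])
X_train = scaler.transform(X[train_idx])
X_test = scaler.transform(X[test_idx])

# Train one Random Forest per energy source and score it on the held-out rows.
models, scores = {}, {}
for name, y in targets.items():
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y[train_idx])
    models[name] = model
    scores[name] = r2_score(y[test_idx], model.predict(X_test))

print({name: round(score, 3) for name, score in scores.items()})
```

Tree-based models do not strictly need scaled inputs; the scaler is kept here only because the README lists `StandardScaler` as part of the pipeline.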
### 4. Energy Prediction
- Use trained models with real-time weather data
- Generate hourly energy predictions
- Save predictions to CSV file
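The final step, writing predictions to a timestamped CSV, might look like the sketch below. The column names and the `energy_predictions_YYYYMMDD_HHMMSS.csv` pattern come from this README; the function name `save_predictions` and its parameters are assumptions. The demo writes into a temporary directory so it leaves nothing behind.

```python
import csv
import tempfile
from datetime import datetime
from pathlib import Path

FIELDNAMES = ["Datetime", "Predicted_Solar_kWh", "Predicted_Energy_kWh"]


def save_predictions(rows, output_dir=".", now=None):
    """Write hourly predictions to energy_predictions_YYYYMMDD_HHMMSS.csv."""
    now = now or datetime.now()
    path = Path(output_dir) / f"energy_predictions_{now:%Y%m%d_%H%M%S}.csv"
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Sample rows copied from the output format table in this README.
predictions = [
    {"Datetime": "2024-01-15 09:00", "Predicted_Solar_kWh": 25.0, "Predicted_Energy_kWh": 35.0},
    {"Datetime": "2024-01-15 10:00", "Predicted_Solar_kWh": 40.0, "Predicted_Energy_kWh": 50.0},
]

# Demo: a fixed timestamp makes the generated file name predictable.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_predictions(predictions, tmp, now=datetime(2024, 1, 15, 8, 30, 0))
    saved_name = saved.name
    saved_text = saved.read_text()

print(saved_name)
```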
## Configuration
```python
# In weather_energy_prediction_model.py
latitude = 23.077080   # Your location latitude
longitude = 76.85131   # Your location longitude

# Random Forest parameters
n_estimators = 100     # Number of trees
random_state = 42      # Random seed for reproducibility
test_size = 0.2        # Test set size (20%)

# Efficiency factors
solar_efficiency = 0.15        # Solar panel efficiency
wind_power_coefficient = 0.4   # Wind turbine power coefficient
```
The system evaluates models using:
- R² Score: Coefficient of determination
- RMSE: Root Mean Square Error
- MAE: Mean Absolute Error
- MAPE: Mean Absolute Percentage Error
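For reference, the four metrics can be computed by hand as in the sketch below. This is a plain-Python illustration of the standard definitions, not the project's `evaluate_model_performance()` code. One choice here is an assumption: MAPE skips samples whose actual value is zero (common for night-time solar output), since the percentage error is undefined there.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, MAE and MAPE (in percent) for two equal-length lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)    # total sum of squares
    # Assumption: ignore zero actuals, where percentage error is undefined.
    pct_errors = [abs(e / t) for e, t in zip(errors, y_true) if t != 0]
    return {
        "r2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(pct_errors) / len(pct_errors) if pct_errors else float("nan"),
    }


# Hypothetical actual vs. predicted hourly energy values.
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
print(metrics)
```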
## Troubleshooting
- API Connection Error
  - Check internet connection
  - Verify API endpoint availability
  - Check latitude/longitude coordinates
- Data Loading Error
  - Ensure CSV file exists in Data/ folder
  - Check file permissions
  - Verify CSV format matches expected structure
- Model Training Error
  - Ensure sufficient training data
  - Check for missing values in features
  - Verify feature column names match
- Memory Issues
  - Reduce dataset size for testing
  - Use smaller Random Forest parameters
  - Process data in chunks
Common error messages:
- `Failed to fetch weather data`: API connection issue
- `Error cleaning data`: Data preprocessing problem
- `Error training model`: Model training failure
- `Error making predictions`: Prediction generation issue
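Two of the API checks above lend themselves to a small guard in code: validating the coordinates before a request, and retrying a failed fetch a few times before giving up. The sketch below is one possible way to do this; `validate_coordinates` and `with_retries` are hypothetical helpers, not functions from the project. Catching `OSError` covers `urllib` network errors and timeouts, and also the exceptions raised by `requests`, which derive from it.

```python
import time


def validate_coordinates(latitude, longitude):
    """Raise ValueError if the coordinates are outside their valid ranges."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude {latitude} is outside [-90, 90]")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude {longitude} is outside [-180, 180]")


def with_retries(fetch, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch() up to `attempts` times, waiting longer after each failure."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except OSError as exc:  # network errors and timeouts
            last_error = exc
            if attempt < attempts:
                sleep(delay_seconds * attempt)  # linear back-off
    raise ConnectionError(f"fetch failed after {attempts} attempts") from last_error


# Demo with a fake fetcher that fails twice and then succeeds.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise OSError("simulated network failure")
    return {"status": "ok"}


validate_coordinates(23.077080, 76.85131)
result = with_retries(flaky_fetch, sleep=lambda seconds: None)  # skip real waiting
print(result, calls["count"])
```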
## Future Enhancements
- Real-time Dashboard: Web interface for live monitoring
- Multiple Weather APIs: Fallback and redundancy
- Advanced ML Models: Deep learning and ensemble methods
- Weather Forecasting: Multi-day predictions
- Energy Optimization: Load balancing and storage optimization
- Hyperparameter Tuning: Grid search and optimization
- Feature Selection: Automated feature importance analysis
- Cross-validation: K-fold cross-validation for robust evaluation
- Model Ensembling: Combine multiple algorithms
## Dependencies
- pandas: Data manipulation and analysis
- numpy: Numerical computing
- scikit-learn: Machine learning algorithms
- requests: HTTP library for API calls
- joblib: Model persistence
- matplotlib: Data visualization
- seaborn: Statistical data visualization
## System Requirements
- Python: 3.7 or higher
- Memory: Minimum 4GB RAM
- Storage: 1GB free space
- Internet: Required for API calls
## Expected Performance
- Training Time: 1-5 minutes (depending on data size)
- Prediction Time: <1 second per predicted hour
- Memory Usage: 100-500MB during training
- Storage: ~50MB for trained models
## Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
### Code Style
- Follow PEP 8 guidelines
- Add docstrings for all functions
- Include type hints where possible
- Write clear commit messages
## License
This project is licensed under the MIT License; see the LICENSE file for details.
## Support
For questions and support:
- Create an issue in the repository
- Check the troubleshooting section
- Review the error logs
- Verify system requirements
## Acknowledgments
- Open-Meteo API for weather data
- Scikit-learn for machine learning algorithms
- Python community for excellent libraries
- Contributors and users of this system
Note: This system is designed for educational and research purposes. For production use, please ensure proper testing, validation, and compliance with local regulations.