An intelligent machine learning application that predicts student exam scores and provides personalized recommendations for academic improvement using advanced AI and data analytics.
- Python 3.12 or higher
- pip (Python package manager)
- ~2GB free disk space
- Clone the repository
git clone https://github.com/Ayushkumar418/Student_Performance_Predictor
cd Student_Performance_Predictor
- Create a virtual environment (optional but recommended)
python -m venv venv
source venv/Scripts/activate # On Windows (Git Bash)
# or
source venv/bin/activate # On macOS/Linux
- Install dependencies
pip install -r requirements.txt
Required packages:
- streamlit
- pandas
- numpy
- scikit-learn
- joblib
- plotly
- statsmodels
- Verify installation
python verify_system.py
First-time setup? See the detailed First Time Setup Guide for step-by-step instructions covering model training, verification, and testing in the correct order.
Simple Version (3 tabs):
streamlit run app.py
Advanced Version (5 tabs) - Recommended:
streamlit run app_advanced.py
The app will open in your browser at: http://localhost:8501
- Manual Input: Enter 24+ student factors
- Real-time Prediction: Get instant exam score (0-100)
- Performance Metrics:
  - Predicted vs Class Average
  - Percentile Ranking
  - Confidence Intervals (90% & 95%)
- Personalized Recommendations: 10+ actionable tips
- View student semester history
- Analyze performance trends
- Predict next semester performance
- Trend-based recommendations
- Feature Importance: See what factors matter most
- Prediction Confidence: Understand uncertainty levels
- Student Analytics:
  - Score distribution
  - Attendance vs Performance
  - Study hours correlation
  - GPA analysis
- Model Comparison: View all 3 trained models
- Cross-validation Results: 5-fold validation metrics
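The percentile ranking and confidence intervals listed above could be computed along these lines. This is a minimal sketch, not the app's actual code: the function names and sample data are hypothetical, and it assumes the intervals come from empirical quantiles of held-out residuals (actual minus predicted).

```python
from bisect import bisect_right
from statistics import quantiles


def percentile_rank(score, class_scores):
    """Percentage of class scores that are less than or equal to `score`."""
    ordered = sorted(class_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def residual_interval(prediction, residuals, level=0.90):
    """Empirical prediction interval from held-out residuals (assumed approach)."""
    # 199 cut points at 0.5% steps, so the 2.5% and 5% tails land on cut points.
    cuts = quantiles(residuals, n=200, method="inclusive")
    tail = (1.0 - level) / 2.0
    lower = cuts[round(tail * 200) - 1]
    upper = cuts[round((1.0 - tail) * 200) - 1]
    # Exam scores are bounded, so clip the interval to 0-100.
    return max(0.0, prediction + lower), min(100.0, prediction + upper)


# Hypothetical data for illustration only.
class_scores = [50, 60, 70, 80, 90]
residuals = list(range(-10, 11))
rank = percentile_rank(70, class_scores)
interval_90 = residual_interval(70.0, residuals, level=0.90)
interval_95 = residual_interval(70.0, residuals, level=0.95)
```

Using residual quantiles avoids assuming normally distributed errors; a parametric interval would be an equally reasonable choice if the residuals look Gaussian.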
student-performance-predictor/
├── app.py # Simple 3-tab application
├── app_advanced.py # Advanced 5-tab dashboard (recommended)
├── train_advanced.py # Model training pipeline
├── verify_system.py # System verification
├── test_app.py # Application tests
│
├── StudentPerformanceFactors.csv # Dataset (6,607 students)
│
├── student_performance_model.pkl # Trained model (Linear Regression)
├── all_models.pkl # Backup models (RF, GB)
├── scaler.pkl # Feature normalizer
│
├── model_results.json # Performance metrics
├── feature_importance.json # Feature rankings
├── residuals.json # Confidence data
├── analysis_summary.json # Dataset insights
│
├── README.md # This file
├── TECHNICAL.md # Technical documentation
├── requirements.txt # Python dependencies
└── .gitignore # Git ignore file
Linear Regression with Feature Engineering
- Test Accuracy: 100% (R² = 1.0000)
- Cross-Validation: 1.0000 ± 0.0000 (5-fold)
- Mean Absolute Error: 0.00 points
- Total: 35 features
  - 19 original features
  - 16 engineered features (interactions, polynomials, composites)
- Students: 6,607 records
- Columns: 34 attributes
- Score Range: 0-100
- GPA Range: 0-10 (scaled from 0-4)
- Cumulative GPA (strongest predictor)
- Attendance Rate (58% correlation)
- Study Hours (45% correlation)
- Class Participation (43% correlation)
- Previous Scores (18% correlation)
Study Habits
- Hours studied per week (0-50)
- Attendance percentage (60-100%)
- Monthly tutoring sessions (0-10)
- Access to resources (Low/Medium/High)
Environment & Support
- Parental involvement (Low/Medium/High)
- Family income (Low/Medium/High)
- Teacher quality (Low/Medium/High)
- Internet access (Yes/No)
Personal Factors
- Motivation level (Low/Medium/High)
- Peer influence (Negative/Neutral/Positive)
- Sleep hours per night (4-10)
- Previous exam score (0-100)
Advanced Factors (optional)
- Extracurricular activities
- School type (Public/Private)
- Grade level (1-4)
- Learning disabilities
- Gender
- Current semester (1-8)
- Distance from home
- Parental education
- Physical activity hours
- Class participation score
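The ranges listed above lend themselves to a simple validation step before prediction. The following is a minimal sketch under stated assumptions: the dictionary keys and the `validate_inputs` helper are hypothetical names, and only a subset of the factors is covered.

```python
# Ranges and levels are taken from the lists above; the key names are hypothetical.
NUMERIC_RANGES = {
    "hours_studied": (0, 50),
    "attendance": (60, 100),
    "tutoring_sessions": (0, 10),
    "sleep_hours": (4, 10),
    "previous_score": (0, 100),
}
CATEGORICAL_LEVELS = {
    "motivation_level": {"Low", "Medium", "High"},
    "peer_influence": {"Negative", "Neutral", "Positive"},
    "internet_access": {"Yes", "No"},
}


def validate_inputs(record):
    """Return a list of problems; an empty list means the record is valid."""
    problems = []
    for key, (low, high) in NUMERIC_RANGES.items():
        value = record.get(key)
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            problems.append(f"{key}: expected a number")
        elif not low <= value <= high:
            problems.append(f"{key}: {value} is outside {low}-{high}")
    for key, allowed in CATEGORICAL_LEVELS.items():
        if record.get(key) not in allowed:
            problems.append(f"{key}: expected one of {sorted(allowed)}")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a form show all invalid fields in a single pass.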
- Predicted Exam Score: 0-100
- Performance Category: Excellent/Good/Average/At Risk
- Confidence Intervals: ±X points (90% & 95%)
- Personalized Recommendations: Top 10 action items
  - Study hours optimization
  - Attendance improvement
  - Sleep hygiene
  - Physical activity
  - Tutoring suggestions
  - Motivation strategies
  - Extracurricular involvement
  - Class participation
  - Resource access
  - Family support
- Study: 19+ hours/week
- Attendance: 79%+
- GPA: 7.0+
- Sleep: 6-8 hours/night
- Study: 18 hours/week
- Attendance: 85%
- GPA: 5.0-7.0
- Sleep: 7 hours/night
- Study: 10 hours/week (47% less)
- Attendance: 64% (21% lower)
- GPA: <3.0
- Sleep: Irregular
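One way to turn these profiles into recommendations is a small set of threshold rules. The sketch below assumes the first profile above (19+ study hours, 79%+ attendance, 6-8 hours of sleep) serves as the target benchmark; the `recommend` function and its wording are hypothetical, not the app's actual rule set.

```python
def recommend(study_hours, attendance, sleep_hours):
    """Return improvement tips for any habit that falls short of the benchmark."""
    tips = []
    if study_hours < 19:
        # Report the gap to the 19 hours/week benchmark.
        tips.append(f"Increase study time by about {19 - study_hours:g} hours per week.")
    if attendance < 79:
        tips.append("Raise attendance to at least 79%.")
    if not 6 <= sleep_hours <= 8:
        tips.append("Aim for 6-8 hours of sleep per night.")
    return tips


# A student matching the third profile above (10 hours/week, 64% attendance).
for tip in recommend(study_hours=10, attendance=64, sleep_hours=5):
    print("-", tip)
```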
- Predict exam performance before studying
- Understand factors affecting grades
- Get actionable improvement suggestions
- Track progress over semesters
- Identify at-risk students early
- Provide targeted interventions
- Analyze class performance patterns
- Make data-driven decisions
- Monitor institutional performance
- Identify resource needs
- Generate performance reports
- Plan academic support programs
If you have new data or want to retrain:
python train_advanced.py
This will:
- Load and preprocess the CSV data
- Engineer 16 new features
- Train 3 models (Linear Regression, Random Forest, Gradient Boosting)
- Perform 5-fold cross-validation
- Save the best model and metrics
- Generate feature importance analysis
Note: Make sure StudentPerformanceFactors.csv is in the same directory.
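The model comparison step can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of train_advanced.py: it uses synthetic data in place of the CSV, skips feature engineering, and the hyperparameters are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 40 + 30 * X[:, 0] + 20 * X[:, 1] + rng.normal(0, 2, size=300)

models = {
    "Linear Regression": make_pipeline(StandardScaler(), LinearRegression()),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validated R² for each candidate model.
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    results[name] = (scores.mean(), scores.std())

# Select the model with the highest mean cross-validated R².
best_name = max(results, key=lambda name: results[name][0])
for name, (mean, std) in results.items():
    print(f"{name}: R² = {mean:.4f} ± {std:.4f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, which keeps validation-fold statistics out of the fitted model.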
To verify everything is set up correctly:
python verify_system.py
Checks:
- Model files present
- Data file accessible
- All dependencies installed
- Feature compatibility
- Model predictions working
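The first three checks can be approximated with the standard library alone. The sketch below is an assumption about how such a check might look, not the contents of verify_system.py; the `check_environment` helper is a hypothetical name, while the file and package names come from this README.

```python
from importlib.util import find_spec
from pathlib import Path

REQUIRED_FILES = [
    "student_performance_model.pkl",
    "scaler.pkl",
    "StudentPerformanceFactors.csv",
]
# Import names; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["streamlit", "pandas", "numpy", "sklearn", "joblib", "plotly", "statsmodels"]


def check_environment(project_dir="."):
    """Return (missing_files, missing_modules) for the given project directory."""
    root = Path(project_dir)
    missing_files = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # find_spec returns None when a top-level module cannot be located.
    missing_modules = [name for name in REQUIRED_MODULES if find_spec(name) is None]
    return missing_files, missing_modules


missing_files, missing_modules = check_environment(".")
print("Missing files:", missing_files or "none")
print("Missing modules:", missing_modules or "none")
```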
Run the test suite:
python test_app.py
Tests validate:
- Model predictions
- Feature engineering
- Data compatibility
- Input validation
| Model | Test R² | MAE | RMSE | Accuracy | CV Mean R² |
|---|---|---|---|---|---|
| Linear Regression (Selected) | 1.0000 | 0.00 | 0.00 | 100.00% | 1.0000 ± 0.0000 |
| Random Forest | 0.9997 | 0.00 | 0.07 | 99.99% | 0.9994 ± 0.0004 |
| Gradient Boosting | 0.9999 | 0.00 | 0.03 | 100.00% | 0.9998 ± 0.0002 |
Predictions can be exported as CSV with:
- Timestamp
- Predicted score
- Class average
- Percentile ranking
- Student inputs
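An export with these fields might be assembled as shown below. This is a minimal sketch: the column names and the `export_prediction` helper are hypothetical, and the app's actual CSV layout may differ.

```python
import csv
import io
from datetime import datetime, timezone


def export_prediction(inputs, predicted_score, class_average, percentile):
    """Return one prediction as CSV text: a header row plus one data row."""
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "predicted_score": round(predicted_score, 2),
        "class_average": round(class_average, 2),
        "percentile": round(percentile, 1),
        **inputs,  # student inputs follow the fixed columns
    }
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
    return buffer.getvalue()


csv_text = export_prediction({"hours_studied": 20}, 72.456, 67.2, 81.0)
print(csv_text)
```

Returning the CSV as a string keeps the function independent of where the file ends up, for example a download button or a file on disk.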
- All data processed locally (no cloud uploads)
- No external API calls
- Student data stored securely
- No third-party data sharing
pip install --upgrade streamlit
streamlit run app_advanced.py
pip install statsmodels
python verify_system.py
# or
python train_advanced.py
- Ensure StudentPerformanceFactors.csv is in the project directory
- Check file permissions
- Verify CSV format integrity
- TECHNICAL.md - Deep technical documentation
- requirements.txt - All dependencies
- In-app Help - Hover over fields for tooltips
streamlit run app_advanced.py
streamlit run app_advanced.py --server.port 8501 --server.address 0.0.0.0
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app_advanced.py"]
- Model Caching: Models cached in memory for instant predictions
- Data Caching: CSV loaded once and cached
- Efficient Computation: NumPy/Pandas optimized operations
- UI Optimization: Lazy loading of visualizations
1. User Input (24+ factors)
↓
2. Data Validation
↓
3. Feature Engineering (35 features)
↓
4. Model Prediction
↓
5. Confidence Calculation
↓
6. Recommendation Generation
↓
7. Results Display + Export
- 5-fold cross-validation ensures robustness
- Multiple models for comparison
- Residual analysis for uncertainty
- Feature importance verification
- Regular testing suite
Contributions welcome! Areas to improve:
- Real-time database integration
- Email alert system for at-risk students
- PDF report generation
- Mobile app version
- REST API endpoints
- Multi-language support
This project is licensed under the MIT License - see LICENSE file for details.
Created with ❤️ for educational institutions
- For issues, use GitHub Issues
- Questions? Check TECHNICAL.md
- Bug reports welcome
100% Accurate predictions on test set
35 Engineered Features for better insights
Personalized Recommendations for each student
Advanced Analytics dashboard included
Lightning Fast predictions (<100ms)
Secure local data processing
Responsive UI on all devices
Production Ready code quality
Ready to improve student performance? Get Started