Build a machine learning model that can automatically evaluate spoken audio and assign a grammar score (1–5) based on sentence structure and syntax quality.

Dataset layout:

- Audio files (`.wav`)
  - `audios_train/`
  - `audios_test/`
- `train.csv`: filename and grammar score
- `test.csv`: filename only
- `sample_submission.csv`: sample format for the output

---
## ⚙️ Workflow / Pipeline
- 🎧 Audio to text: transcribe each recording with Whisper
- ✨ Text cleaning: remove punctuation, lowercase, clean up spaces
- 🧮 Feature extraction: TF-IDF vectorizer (max 1000 features)
- 🌲 Model training: Random Forest regressor
- 📊 Evaluation: Pearson correlation
- 🧪 Prediction on the test set: generate `submission.csv`
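
The steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes the `openai-whisper` and `scikit-learn` packages, the function names (`clean_text`, `transcribe_all`, `build_model`) are hypothetical, and the Whisper model size `"base"` and `random_state=0` are placeholder choices. Only the 1000-feature TF-IDF limit and the Random Forest regressor come from the workflow itself.

```python
import re
import string


def clean_text(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def transcribe_all(audio_paths, model_size: str = "base"):
    """Transcribe a list of .wav paths with Whisper.

    Assumes the openai-whisper package; "base" is an assumed model size.
    """
    # Imported lazily: heavy dependency that downloads weights on first use.
    import whisper

    model = whisper.load_model(model_size)  # load once, reuse for every file
    return [model.transcribe(path)["text"] for path in audio_paths]


def build_model(max_features: int = 1000, random_state: int = 0):
    """TF-IDF features (capped at max_features) feeding a Random Forest regressor."""
    # Imported lazily so clean_text stays usable without scikit-learn installed.
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline

    return make_pipeline(
        TfidfVectorizer(max_features=max_features),
        RandomForestRegressor(random_state=random_state),
    )
```

Under these assumptions, training would look like `model = build_model()` followed by `model.fit([clean_text(t) for t in train_texts], train_scores)`, and `model.predict(...)` on the cleaned test transcripts would give the values written to `submission.csv`.
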
---
## 📈 Evaluation Metric
**Pearson correlation** between the predicted and true grammar scores is used to evaluate prediction quality.
📌 Final Public Score: 0.519
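
For reference, the metric can be computed by hand as the covariance of the two score lists divided by the product of their standard deviations. The sketch below is a dependency-free illustration with a hypothetical function name (`pearson`). `scipy.stats.pearsonr` and `numpy.corrcoef` compute the same quantity.

```python
import math


def pearson(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 items")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n

    # Sums of centered products; the 1/n factors cancel in the final ratio.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)

    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / math.sqrt(var_t * var_p)
```

A value near 1 means the predictions rise and fall with the true scores. Because the metric ignores scale and offset, a model can score well even if its raw predictions are shifted away from the 1–5 range.
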
---
## 📁 Files Included
- `Untitled0.ipynb` - Main notebook with code and explanations
- `submission.csv` - Output file with predictions for test set
---
## 💡 Future Enhancements
- Use more advanced models (e.g. BERT, XGBoost)
- Handle diverse accents
- Use grammar-checking NLP tools
- Add audio-based features (e.g. fluency, pause detection)
---
## 👤 Author
**Crafted with care by [Avin Raj]** ✨
📬 For queries or collaborations, feel free to reach out!