EngrEeshaKhan/Learning-Agency-Lab---Automated-Essay-Scoring-2.0

📘 Automated Essay Scoring (AES)

📌 Project Overview

This project implements an Automated Essay Scoring (AES) system trained on the Learning Agency Lab – AES 2 Kaggle dataset. The system analyzes each essay's linguistic richness, coherence, structure, and semantic quality, and uses these signals to predict the score a human grader would assign.


📂 Project Structure

AES-Project/
│
├── Dataset/
│   ├── train.zip
│   └── test.csv
│
├── Notebook/
│   └── Model_Usage.ipynb
│
├── References/
│   └── CEP.pdf
│
├── requirements.txt
├── .gitignore
├── LICENSE
└── README.md

📊 Dataset Description

The dataset comes from the Learning Agency Lab Automated Essay Scoring 2 challenge (Kaggle). It contains thousands of student essays, each scored by human graders.

Included in the dataset:

  • Essay text
  • Human-assigned scores
  • Essay set information
  • Training & testing partitions
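A minimal loading sketch using only the Python standard library. It assumes `Dataset/train.zip` holds a single CSV with a header row and columns named `full_text` (essay text) and `score` (human score); those column names follow the Kaggle competition files but are assumptions about this repository's copy, and `load_essays` is a hypothetical helper, not a function from the notebook.

```python
import csv
import io
import zipfile


def load_essays(zip_path, text_col="full_text", score_col="score"):
    """Hypothetical helper: read (essay_text, score) pairs from the first CSV in a zip.

    The column names are assumptions; adjust them to match the actual CSV header.
    `zip_path` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # Use the first CSV member found in the archive (e.g. train.csv).
        csv_name = next(n for n in zf.namelist() if n.lower().endswith(".csv"))
        with zf.open(csv_name) as raw:
            # newline="" lets the csv module handle line breaks inside quoted essays.
            reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
            return [(row[text_col], int(row[score_col])) for row in reader]
```

Usage would look like `essays = load_essays("Dataset/train.zip")`. Reading through `zipfile` avoids unpacking the archive by hand; pandas' `read_csv` can also read a single-file zip directly if the project already depends on it.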

🧹 Preprocessing Pipeline

  • Lowercasing & normalization
  • Removing special characters
  • Tokenization (NLTK + spaCy)
  • Lemmatization
  • Stopword removal
  • Sentence segmentation
  • Grammar/spelling cleanup
  • Removal of extremely short essays
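The pipeline steps above can be sketched roughly as follows. This is an illustrative stand-in, not the project's code: it swaps NLTK/spaCy for simple regular expressions, uses a tiny hand-written stopword list, skips lemmatization and grammar/spelling cleanup, and `MIN_WORDS = 50` is a hypothetical cutoff for "extremely short" essays.

```python
import re

# Tiny illustrative stopword list; the project itself relies on NLTK/spaCy resources.
STOPWORDS = {"a", "an", "the", "and", "or", "is", "are", "was", "to", "of", "in", "it"}

MIN_WORDS = 50  # hypothetical cutoff for "extremely short" essays


def normalize(text):
    """Lowercase, replace characters other than ASCII letters, digits, whitespace
    and basic punctuation with spaces, then collapse whitespace.
    Note: this also removes paragraph breaks, so structural features
    should be computed on the raw text instead."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s.,!?']", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Regex word tokenizer standing in for NLTK/spaCy tokenization."""
    return re.findall(r"[a-z0-9']+", text)


def preprocess(essay, min_words=MIN_WORDS):
    """Return (sentences, content_tokens), or None if the essay is too short."""
    sentences = split_sentences(normalize(essay))
    tokens = tokenize(" ".join(sentences))
    if len(tokens) < min_words:
        return None  # dropped as an extremely short essay
    content_tokens = [t for t in tokens if t not in STOPWORDS]
    return sentences, content_tokens
```

Keeping each step in its own small function makes it easy to replace a regex with the real spaCy or NLTK call later without touching the rest of the pipeline.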

🧠 Feature Engineering

Extracted Feature Groups

  • Lexical Features: word count, unique words, vocabulary richness
  • Syntactic Features: POS ratios, sentence lengths
  • Semantic Features: TF-IDF, transformer embeddings
  • Error-Based Features: grammar & spelling errors
  • Structural Features: paragraph count, transitions
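A few of the lexical, syntactic, and structural features listed above can be computed with the standard library alone, as in this sketch. It is a simplification under stated assumptions: `extract_features` and the `TRANSITIONS` word list are hypothetical, paragraphs are assumed to be separated by blank lines, and POS ratios, TF-IDF, embeddings, and error counts are left out because they need spaCy, a vectorizer, a transformer, or a grammar checker.

```python
import re

# Hypothetical, deliberately small list of transition words.
TRANSITIONS = {"however", "therefore", "moreover", "furthermore", "consequently",
               "additionally", "finally", "firstly", "secondly", "overall"}


def extract_features(raw_text):
    """Compute simple lexical, syntactic and structural features from a raw essay.

    Works on the unprocessed text so paragraph breaks (assumed to be blank
    lines) are still present.
    """
    words = re.findall(r"[a-z']+", raw_text.lower())
    sentences = [s for s in re.split(r"[.!?]+", raw_text) if s.strip()]
    paragraphs = [p for p in re.split(r"\n\s*\n", raw_text) if p.strip()]
    n_words = len(words)
    n_unique = len(set(words))
    return {
        # Lexical features
        "word_count": n_words,
        "unique_words": n_unique,
        "type_token_ratio": n_unique / n_words if n_words else 0.0,
        # Syntactic features (sentence length only; POS ratios need a tagger)
        "sentence_count": len(sentences),
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Structural features
        "paragraph_count": len(paragraphs),
        "transition_count": sum(w in TRANSITIONS for w in words),
    }
```

Returning a flat dict keeps the features easy to stack into a table for the classical models in the comparison.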

🔥 Model Comparison

Model                              R² Score     Notes
Linear Regression                  0.40–0.45    Simple baseline
Random Forest                      ~0.68        Strong classical model
Gradient Boosting                  ~0.70        Handles non-linear patterns
XGBoost                            ~0.75        High performance
BERT / RoBERTa Regression Model    0.80–0.82    Best overall results
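For reference, R² (the coefficient of determination) is 1 − SS_res / SS_tot: the share of score variance the model explains. A plain-Python sketch follows; scikit-learn's `sklearn.metrics.r2_score` computes the same quantity. The toy scores are made up for illustration and are not project results.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual error
    ss_tot = sum((t - mean) ** 2 for t in y_true)                # total variance
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true scores are equal")
    return 1.0 - ss_res / ss_tot


# Toy scores, illustrative only
true_scores = [2, 3, 4, 5]
pred_scores = [2.5, 3.0, 3.5, 5.0]
print(round(r2_score(true_scores, pred_scores), 3))  # → 0.9
```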

📘 Notebook: Model_Usage.ipynb

The notebook performs:

  • Dataset loading
  • Full preprocessing pipeline
  • Feature engineering
  • ML + Transformer model training
  • Model evaluation
  • Generating predictions

🚀 How to Run the Project

1. Install Dependencies

Make sure Python 3.8+ is installed, then run:

pip install -r requirements.txt

2. Download spaCy Language Model

This project uses spaCy for text preprocessing. Install the required model:

python -m spacy download en_core_web_sm

3. Open the Notebook

Open Notebook/Model_Usage.ipynb and run it in your preferred environment:

  • Google Colab
  • Jupyter Notebook
  • VS Code (Jupyter Extension)

✔️ After completing these steps, the system is ready for use.
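To confirm that steps 1 and 2 took effect before opening the notebook, a quick stdlib-only check can help. `check_environment` is a hypothetical helper, not part of the repository. It relies on the fact that spaCy models installed with `python -m spacy download` are ordinary Python packages, so `importlib.util.find_spec` can locate them without loading spaCy.

```python
import importlib.util
import sys


def check_environment(spacy_model="en_core_web_sm"):
    """Hypothetical helper: report whether the setup steps appear complete."""
    return {
        "python_ok": sys.version_info >= (3, 8),  # README asks for Python 3.8+
        "spacy_installed": importlib.util.find_spec("spacy") is not None,
        "model_installed": importlib.util.find_spec(spacy_model) is not None,
    }


for name, ok in check_environment().items():
    print(f"{name}: {'OK' if ok else 'MISSING'}")
```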


📎 Dataset Source

This project uses the dataset from the Learning Agency Lab – Automated Essay Scoring 2 competition.

🔗 The dataset is available on Kaggle under the competition name above.

Kaggle submission score

[Screenshot of the Kaggle submission score]
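Note that Kaggle ranks AES 2.0 submissions by quadratic weighted kappa (QWK) between predicted and human scores, not by R². A minimal sketch, assuming integer scores on the competition's 1–6 scale; regression outputs would first need rounding and clipping to that range. scikit-learn's `cohen_kappa_score(y_true, y_pred, labels=list(range(1, 7)), weights="quadratic")` should give the same value.

```python
def quadratic_weighted_kappa(y_true, y_pred, min_rating=1, max_rating=6):
    """Quadratic weighted kappa for integer ratings in [min_rating, max_rating]."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    n = max_rating - min_rating + 1
    total = len(y_true)
    # Observed matrix: observed[i][j] counts pairs with true=i and pred=j.
    observed = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1
    # Marginal histograms for each rater.
    hist_true = [sum(row) for row in observed]
    hist_pred = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2                 # quadratic disagreement penalty
            expected = hist_true[i] * hist_pred[j] / total       # chance agreement
            num += weight * observed[i][j]
            den += weight * expected
    if den == 0:
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - num / den


print(quadratic_weighted_kappa([1, 2, 3, 4], [1, 2, 3, 4]))  # → 1.0
```

QWK penalizes a prediction more the further it lands from the human score, which is why off-by-one errors cost far less than off-by-three errors.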

📄 License

This project is licensed under the MIT License.

📎 References

Kaggle AES 2 Dataset

CEP PDF (in /References/)

Standard NLP & AES literature
