📘 Automated Essay Scoring (AES)

📌 Project Overview

This project implements an Automated Essay Scoring (AES) system trained on the Learning Agency Lab – AES 2 Kaggle dataset. The system evaluates essays based on linguistic richness, coherence, structure, and semantic quality to predict their final human-assigned score.

📂 Project Structure

AES-Project/
│
├── Dataset/
│   ├── train.zip
│   └── test.csv
│
├── Notebook/
│   └── Model_Usage.ipynb
│
├── References/
│   └── CEP.pdf
│
├── Requirements.txt
├── .gitignore
├── LICENSE
└── README.md

📊 Dataset Description

The dataset comes from the Learning Agency Lab Automated Essay Scoring 2 challenge (Kaggle). It contains thousands of student essays, each scored by human graders.

Included in the dataset:

Essay text
Human-assigned scores
Essay set information
Training & testing partitions
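
As a rough illustration of the loading step, the sketch below reads one CSV file out of Dataset/train.zip using only the standard library. The member name train.csv and the example column names (essay_id, full_text, score) are assumptions, not taken from this repository; check the archive contents before relying on them.

```python
import csv
import io
import zipfile


def load_essays(zip_path, member="train.csv"):
    """Read one CSV member of a zip archive into a list of dicts.

    The default member name "train.csv" is an assumption about the archive.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            # newline="" lets the csv module handle line breaks inside quoted essays.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

A call such as load_essays("Dataset/train.zip") would then return one dictionary per essay, keyed by whatever header row the CSV actually has.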

🧹 Preprocessing Pipeline

Lowercasing & normalization
Removing special characters
Tokenization (NLTK + spaCy)
Lemmatization
Stopword removal
Sentence segmentation
Grammar/spelling cleanup
Removal of extremely short essays
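
A minimal, dependency-free sketch of a few of these steps follows. It is a regex stand-in for illustration, not the notebook's NLTK/spaCy code: the stopword list, the MIN_WORDS cutoff, and all function names are hypothetical, and lemmatization and grammar/spelling cleanup are left out because they need a real NLP library.

```python
import re

# Tiny illustrative stopword list; NLTK and spaCy ship much larger ones.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "it"}
MIN_WORDS = 5  # hypothetical cutoff for "extremely short" essays


def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def clean_tokens(text):
    """Lowercase, replace special characters with spaces, tokenize, drop stopwords."""
    lowered = text.lower()
    cleaned = re.sub(r"[^a-z0-9\s']", " ", lowered)
    return [tok for tok in cleaned.split() if tok not in STOPWORDS]


def keep_essay(text, min_words=MIN_WORDS):
    """Return False for essays with fewer than min_words whitespace-separated words."""
    return len(text.split()) >= min_words
```

Sentence segmentation runs on the raw text because the cleaning step strips the punctuation it depends on.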

🧠 Feature Engineering

Extracted Feature Groups

Lexical Features: word count, unique words, vocabulary richness
Syntactic Features: POS ratios, sentence lengths
Semantic Features: TF-IDF, transformer embeddings
Error-Based Features: grammar & spelling errors
Structural Features: paragraph count, transitions
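
To make the lexical and structural groups concrete, here is a small sketch that computes a handful of them with the standard library. The feature names, the TRANSITIONS word list, and the regex-based sentence and paragraph splitting are assumptions made for illustration; POS ratios, TF-IDF, embeddings, and error counts need external libraries and are not shown.

```python
import re

# Hypothetical transition-word list, for illustration only.
TRANSITIONS = {"however", "therefore", "moreover", "furthermore", "finally"}


def extract_features(essay):
    """Compute a few lexical and structural features for one essay."""
    words = re.findall(r"[a-z']+", essay.lower())
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", essay.strip()) if s]
    # Paragraphs are assumed to be separated by at least one blank line.
    paragraphs = [p for p in re.split(r"\n\s*\n", essay) if p.strip()]
    word_count = len(words)
    unique_words = len(set(words))
    return {
        "word_count": word_count,
        "unique_words": unique_words,
        # Vocabulary richness as a type-token ratio.
        "type_token_ratio": unique_words / word_count if word_count else 0.0,
        "avg_sentence_length": word_count / len(sentences) if sentences else 0.0,
        "paragraph_count": len(paragraphs),
        "transition_count": sum(w in TRANSITIONS for w in words),
    }
```

Each essay yields one flat dictionary, which can be stacked into a feature table for the classical models.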

🔥 Model Comparison

Model	R² Score	Notes
Linear Regression	0.40–0.45	Simple baseline
Random Forest	~0.68	Strong classical model
Gradient Boosting	~0.70	Handles non-linear patterns
XGBoost	~0.75	High performance
BERT / RoBERTa Regression Model	0.80–0.82	Best overall results
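
For reference, R² is one minus the ratio of the residual sum of squares to the total sum of squares, so a value of 0.75 means the model explains about 75% of the variance in the human scores. The sketch below is a plain-Python version of that standard definition; the function name is ours, and the notebook may well use a library implementation instead.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # All true scores are identical, so R² is undefined.
        raise ValueError("R² is undefined when y_true has zero variance")
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean score gets 0, a perfect model gets 1, and a model worse than the mean goes negative.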

📘 Notebook: Model_Usage.ipynb

The notebook performs:

Dataset loading
Full preprocessing pipeline
Feature engineering
ML + Transformer model training
Model evaluation
Generating predictions

🚀 How to Run the Project

1. Install Dependencies

Make sure Python 3.8+ is installed, then run:

pip install -r Requirements.txt

2. Download spaCy Language Model

This project uses spaCy for text preprocessing. Install the required model:

python -m spacy download en_core_web_sm
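
One optional way to confirm that both install steps worked is to check that the expected modules can be located before opening the notebook. The helper below is a hypothetical convenience, not part of this repository; it relies on the fact that the downloaded spaCy model is installed as an importable package named en_core_web_sm.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_modules(["spacy", "nltk", "en_core_web_sm"]) should return an empty list once steps 1 and 2 have succeeded.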

3. Open the Notebook

Run the main project notebook from your preferred environment:

Google Colab
Jupyter Notebook
VS Code (Jupyter Extension)

Notebook/Model_Usage.ipynb

✔️ After completing these steps, the system is ready for use.

📎 Dataset Source

This project uses the dataset from the Learning Agency Lab – Automated Essay Scoring 2 competition.

🔗 Click to view the dataset on Kaggle

Kaggle submission score

📄 License

This project is licensed under the MIT License.

📎 References

Kaggle AES 2 Dataset

CEP PDF (in /References/)

Standard NLP & AES literature

EngrEeshaKhan/Learning-Agency-Lab---Automated-Essay-Scoring-2.0
