AI Copychecking automates grading by comparing an original answer key with a student's handwritten answer script. It combines OCR for text extraction, NLP similarity measures, and the BLEU and ROUGE-N metrics to assign a grade. The application is deployed on Azure and has a web frontend for uploading files and viewing results.
- Input:
  - Users upload two files:
    - The original answer key in PDF format.
    - The student's handwritten notes (image or scanned PDF).
- Text Extraction:
  - From PDFs (Typed Text):
    - Text is extracted using the PyPDF2 library.
    - This yields clean, structured text from the typed answer key.
  - From Handwritten Notes:
    - Handwritten text is extracted using the Gemini OCR API.
    - The OCR API processes the image or scanned notes and converts them into machine-readable text.
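The typed-PDF step above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes a PyPDF2 release that provides `PdfReader` (2.x or later), and the function names `normalize_text` and `extract_pdf_text` are hypothetical.

```python
import re


def normalize_text(raw: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, newlines) into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def extract_pdf_text(path: str) -> str:
    """Return the normalized text of every page in a typed PDF."""
    # Imported lazily so normalize_text works even without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    # extract_text() can return None or "" for pages with no text layer,
    # for example scanned pages, which need the OCR path instead.
    pages = [page.extract_text() or "" for page in reader.pages]
    return normalize_text("\n".join(pages))
```

Normalizing whitespace before comparison keeps line breaks in the PDF layout from affecting the similarity scores.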
- Text Comparison:
  - Naive Similarity:
    - Basic word overlap and matching techniques are applied.
  - Context-Based Similarity:
    - Tools like Gensim and its Word2Vec model are used to measure the semantic similarity between the extracted texts.
  - Evaluation Metrics:
    - BLEU (Bilingual Evaluation Understudy): Measures precision-based n-gram similarity.
    - ROUGE-N: Measures recall-based n-gram similarity.
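To make the overlap-based measures concrete, here is a small pure-Python sketch. It is not the project's implementation and it is not full BLEU: it computes Jaccard word overlap as one possible "naive" similarity, plus clipped n-gram precision (the building block of BLEU, without the brevity penalty or averaging over several n) and n-gram recall (the core of ROUGE-N). All function names are hypothetical, and the answer key is assumed to be the reference text.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def jaccard(reference, candidate):
    """Naive similarity: shared unique words divided by all unique words."""
    ref, cand = set(tokenize(reference)), set(tokenize(candidate))
    union = ref | cand
    return len(ref & cand) / len(union) if union else 0.0


def ngram_overlap(reference, candidate, n=1):
    """Return (precision, recall) of clipped n-gram matches.

    Precision is the BLEU-style share of candidate n-grams found in the
    reference; recall is the ROUGE-N-style share of reference n-grams
    found in the candidate.
    """
    ref = ngrams(tokenize(reference), n)
    cand = ngrams(tokenize(candidate), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    matches = sum((ref & cand).values())
    precision = matches / sum(cand.values()) if cand else 0.0
    recall = matches / sum(ref.values()) if ref else 0.0
    return precision, recall


key = "Plants make food using sunlight"
answer = "Plants use sunlight to make food"
print(jaccard(key, answer))           # 4 shared words out of 7 unique words
print(ngram_overlap(key, answer, 1))  # unigram precision 4/6, recall 4/5
```

A paraphrased answer such as the one above scores well on unigrams but poorly on bigrams, which is one reason to pair these metrics with a context-based measure.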
- Grading System:
  - A grading algorithm assigns scores based on threshold values of BLEU, ROUGE-N, and other metrics.
  - These thresholds can be adjusted for different grading criteria.
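One way threshold-based grading could work is sketched below. The band cut-offs, the letter grades, and the equal-weight averaging are illustrative assumptions, not values taken from the project; `combined_score` and `assign_grade` are hypothetical names.

```python
# Illustrative bands only: (minimum score, grade). Adjust per grading criteria.
DEFAULT_BANDS = [(0.75, "A"), (0.50, "B"), (0.25, "C")]


def combined_score(metrics, weights=None):
    """Weighted average of metric values, e.g. {"bleu": 0.4, "rouge_n": 0.8}."""
    if not metrics:
        raise ValueError("at least one metric is required")
    # Default to equal weights when none are supplied.
    weights = weights or {name: 1.0 for name in metrics}
    total = sum(weights[name] for name in metrics)
    return sum(metrics[name] * weights[name] for name in metrics) / total


def assign_grade(score, bands=DEFAULT_BANDS, fallback="F"):
    """Return the grade of the highest band whose minimum the score reaches."""
    for minimum, grade in sorted(bands, reverse=True):
        if score >= minimum:
            return grade
    return fallback


score = combined_score({"bleu": 0.4, "rouge_n": 0.8})
print(assign_grade(score))
```

Passing a different `bands` list, or per-metric `weights`, is how the thresholds would be tuned for stricter or more lenient marking in this sketch.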
- Frontend:
  - A user-friendly interface allows:
    - File uploads.
    - Viewing similarity scores and grades.
  - Built using Gradio or Hugging Face Spaces.
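A two-file upload form of the kind described could be wired up in Gradio roughly as follows. This is a hedged sketch rather than the project's frontend: `build_demo`, `format_result`, and the `grade_files` callback are hypothetical names, and what `gr.File` passes to the callback (a file path or a temporary-file object) depends on the installed Gradio version.

```python
def format_result(similarity: float, grade: str) -> str:
    """Render the values shown to the user after grading."""
    return f"Similarity: {similarity:.2f} | Grade: {grade}"


def build_demo(grade_files):
    """Wrap a grade_files(answer_key, student_script) -> str callable in a UI."""
    import gradio as gr  # lazy import: only needed when launching the UI

    return gr.Interface(
        fn=grade_files,
        inputs=[
            gr.File(label="Answer key (PDF)"),
            gr.File(label="Student script (image or scanned PDF)"),
        ],
        outputs=gr.Textbox(label="Result"),
        title="AI Copychecking",
    )
```

Calling `build_demo(my_callback).launch(server_port=5000)` would serve the form locally on port 5000; Gradio's own default port is 7860.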
- Deployment:
  - The entire application is hosted on Azure for scalability and reliability.
- NLP Operations:
- Word Similarity Mapping by Context:
- Text Extraction:
  - Typed Text from PDFs: PyPDF2, for extracting text from PDF files.
  - Handwritten Notes: Gemini OCR API, for converting handwritten content into text. (Paid API; ensure you have access.)
- Frontend:
  - Gradio: For building interactive user interfaces.
  - Hugging Face Spaces: Alternative for hosting simple apps.
- Backend:
- Deployment:
  - Azure: For hosting and scaling the application.
- Install Python 3.8+.
- Create an Azure account.
- Obtain a subscription for the Gemini OCR API (if needed).
- Clone the Repository:
  git clone https://github.com/yourusername/ai-copychecking.git
  cd ai-copychecking
- Set Up a Virtual Environment:
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Dependencies:
  pip install -r requirements.txt
- Configure API Keys:
  - Create a config.py file.
  - Add your API keys:
    GEMINI_OCR_API_KEY = "your_api_key_here"
- Run the Application:
  python app.py
- Access the Application:
  - Open your browser and navigate to http://localhost:5000.
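For the API key step, the application needs to read `GEMINI_OCR_API_KEY` at startup and fail early if it was never filled in. The helper below is one possible approach, not the project's code: `load_api_key` is a hypothetical name, and checking an environment variable before `config.py` is an added assumption that keeps the key out of version control.

```python
import os


def load_api_key(name="GEMINI_OCR_API_KEY"):
    """Return the API key from the environment or config.py, or raise."""
    value = os.environ.get(name)
    if not value:
        try:
            import config  # the config.py created during setup
            value = getattr(config, name, None)
        except ImportError:
            value = None
    # Reject a missing key and the README's placeholder value alike.
    if not value or value == "your_api_key_here":
        raise RuntimeError(f"{name} is not set; add it to config.py or the environment")
    return value
```

If `config.py` holds a real key, it is usually worth listing the file in `.gitignore` so the key is not committed.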
- Automated Text Extraction:
  - Extracts text from PDFs and handwritten notes seamlessly.
- Advanced Comparison:
  - Uses both naive and context-based similarity techniques.
- Customizable Grading System:
  - Adjust thresholds for BLEU, ROUGE-N, and other metrics.
- Interactive Frontend:
  - Simple interface for uploading files and viewing results.
- Cloud Deployment:
  - Hosted on Azure for high availability and scalability.
- BLEU Metric: BLEU Explained
- ROUGE Metric: ROUGE Explained
- Gradio: Documentation
- Azure Deployment: Getting Started
- Gemini OCR API: API Details
- FastAPI: Documentation
Contributions are welcome! Please follow these steps:
- Fork the repository.
- Create a new branch:
git checkout -b feature/your-feature-name
- Commit your changes:
git commit -m "Add your message here"
- Push to the branch:
git push origin feature/your-feature-name
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.