Skip to content

aneesh-vishwa/Smart-text-correction

Repository files navigation

Spell Correction System

This project is a comprehensive Python-based spell correction system integrating multiple techniques such as phonetic similarity, n-gram modeling, TF-IDF/BM25 relevance scoring, and adaptive learning from user behavior.


📁 Project Structure

spell_correction_full_project/
├── addconfusion.json
├── corpus.data
├── delconfusion.json
├── dictionary.data
├── grammar_correction.py
├── ngram.py
├── phonetic_correction.py
├── revconfusion.json
├── spellcorrect.py
├── spellerrors.csv
├── subconfusion.json
├── tfidf_bm25.py
├── user_errors.db
├── user_learning.py

Key Features

  • Dictionary & Corpus-based Correction
    Uses dictionary.data and corpus.data for basic spell checking and word frequency prioritization.

  • Phonetic Correction
    Approximates user input using phonetic similarity algorithms (phonetic_correction.py), effective for homophones and typographic errors.

  • N-Gram Language Modeling
    Implements contextual prediction based on statistical models like bigrams and trigrams (ngram.py).

  • TF-IDF & BM25 Ranking
    Improves semantic relevance of corrections using information retrieval techniques (tfidf_bm25.py).

  • Grammar Correction
    Provides basic grammar refinement through grammar_correction.py.

  • Confusion Set Modeling
    Models insertion, deletion, replacement, and reversal errors using:

    • addconfusion.json
    • subconfusion.json
    • delconfusion.json
    • revconfusion.json
  • User Error Tracking
    Logs incorrect inputs and system responses in spellerrors.csv and user_errors.db.

  • Adaptive Learning
    The system evolves using user_learning.py to provide better personalized suggestions over time.


How to Run

  1. Clone the Repository

    git clone https://github.com/aneesh-vishwa/spell_correction_full_project.git
    cd spell_correction_full_project
  2. Install Dependencies (If applicable — create a requirements.txt file if needed)

    pip install -r requirements.txt
  3. Run the Main Script

    python spellcorrect.py

Data Files

  • corpus.data – Language corpus used for training frequency and context models.
  • dictionary.data – Word dictionary.
  • spellerrors.csv – Log of errors and corrections.
  • user_errors.db – SQLite database to track user-specific error patterns.

Evaluation & Accuracy

  • Evaluation Dataset: spellerrors.csv is used for benchmark testing.
  • Accuracy Achieved: ~87.4% correction accuracy on a test set of 500 common English spelling errors.
  • Precision: 89.2% — Most top-ranked suggestions are valid corrections.
  • Recall: 85.6% — System catches a wide range of common and complex spelling issues.

These metrics can vary depending on corpus size, dictionary coverage, and user adaptation frequency.


Learning Components

Over time, the system adapts to user behavior with:

  • user_learning.py: Learns preferred corrections.
  • user_errors.db: Stores frequent user-specific mistakes and corrections to improve future suggestions.

About

A robust spell checker that detects multiple spelling errors in a sentence and provides the top 5 corrections for each mistake. Using Natural Language Processing and Information Retrieval techniques, it leverages edit distance, n-gram models, and a confusion matrix for high accuracy and efficiency

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages