GistTerms is a Natural Language Processing (NLP) tool that analyzes Terms of Service (ToS) documents and classifies each sentence into one of three categories:
- 🔴 Risk: Potentially harmful or unfavorable to the user
- 🟢 Safe: User-beneficial and protective statements
- 🔵 Information: Neutral or purely informative content
💡 The tool also assigns a grade (A–F) to the document based on its sentence classifications, making it easier for users to quickly assess service agreements.
- 🚩 Risk Categorization – Tags each sentence as Risk, Safe, or Information
- 🧮 Document Grading – Assigns an A–F score based on the ratio of risky to safe content
- 🧠 Confidence Filtering – Ignores low-confidence predictions to improve accuracy
- ♻️ Deduplication – Removes repetitive sentences for clean analysis
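
The grading, confidence-filtering, and deduplication features above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names, the 0.6 confidence threshold, and the grade cut-offs are all assumptions made for the example, since the feature list only says that the grade depends on the ratio of risky to safe content.

```python
from collections import Counter

# Hypothetical threshold; the project's real cut-off is not documented here.
CONFIDENCE_THRESHOLD = 0.6


def dedupe(sentences):
    """Drop repeated sentences (ignoring case and extra whitespace), keeping order."""
    seen = set()
    unique = []
    for sentence in sentences:
        key = " ".join(sentence.lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(sentence)
    return unique


def filter_confident(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Keep only (sentence, label, confidence) tuples at or above the threshold."""
    return [p for p in predictions if p[2] >= threshold]


def grade(labels):
    """Map the share of Risk among Risk + Safe sentences to a letter grade.

    The cut-offs below are illustrative assumptions, not the project's values.
    """
    counts = Counter(labels)
    risk, safe = counts["Risk"], counts["Safe"]
    if risk + safe == 0:
        return "A"  # assumption: no risky or safe sentences means the best grade
    risk_share = risk / (risk + safe)
    for cutoff, letter in [(0.2, "A"), (0.4, "B"), (0.6, "C"), (0.8, "D")]:
        if risk_share <= cutoff:
            return letter
    return "F"


predictions = [
    ("The service reserves the right to terminate accounts without notice.", "Risk", 0.85),
    ("Users can request data deletion at any time.", "Safe", 0.90),
    ("Users may export their data.", "Safe", 0.88),
    ("We may update these terms.", "Risk", 0.40),  # dropped: below the threshold
]
kept = filter_confident(predictions)
print(grade([label for _, label, _ in kept]))
```

Filtering before grading means a single low-confidence Risk prediction cannot drag the grade down, which is one plausible reading of how the two features interact.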
- Language: Python
- Libraries:
  - NLP: nltk, transformers
  - ML: PyTorch
- Dataset: adeeteya/termsofservice
- Hardware: CUDA support for GPU acceleration
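
To check whether GPU acceleration is available on your machine, a small helper along these lines may be useful. The name `pick_device` is hypothetical, and falling back to the CPU when no GPU is found is an assumption about how you would want the tool to behave, not a description of what `final_program.py` does.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        # Imported lazily so the check also works when PyTorch is not installed.
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running on: {device}")
```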
- Clone the repo: `git clone https://github.com/sumedhkumar96/GistTerms.git`, then `cd GistTerms`
- Create and activate a virtual environment: `python -m venv venv`, then `source venv/bin/activate` (on Windows: `venv\Scripts\activate`)
- Install dependencies: `pip install -r requirements.txt`
- Download the NLTK tokenizer: `python -m nltk.downloader punkt`
- Add your ToS document: save your `.txt` file as `input.txt` in the project directory
- Run the classifier: `python final_program.py`
- Get results in the terminal

Example output:

    Classified Sentences:
    🔴 Risk:
    - "The service reserves the right to terminate accounts without notice." (Confidence: 0.85)
    🟢 Safe:
    - "Users can request data deletion at any time." (Confidence: 0.90)
    🔵 Information:
    - "The service uses cookies to improve functionality." (Confidence: 0.75)
    🏅 Service Grade: B

- Sentence Tokenization – Splits input using NLTK
- Preprocessing – Tokenizes and formats sentences for model input
- Classification – Uses a fine-tuned BERT model to assign each sentence a category
- Grading – Computes document grade based on sentence types
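
The first three steps above could be wired together roughly as follows. This is a sketch under stated assumptions rather than the contents of `final_program.py`: the helper names and the `model_path` argument are hypothetical, and the label strings a real model returns depend on how it was configured during fine-tuning. The `nltk` and `transformers` imports are kept inside the functions so that the demo at the bottom, which uses a keyword-based stand-in instead of the BERT model, runs without either library installed. Grading is left out here.

```python
def split_sentences(text):
    """Step 1: sentence tokenization with NLTK (needs the punkt tokenizer data)."""
    from nltk.tokenize import sent_tokenize
    return sent_tokenize(text)


def load_classifier(model_path, device=-1):
    """Steps 2-3: a Hugging Face pipeline handles preprocessing and inference.

    model_path is a placeholder for wherever the fine-tuned BERT weights live.
    device=-1 runs on the CPU; 0 selects the first CUDA GPU.
    """
    from transformers import pipeline
    return pipeline("text-classification", model=model_path, device=device)


def classify(sentences, classifier):
    """Pair each sentence with its predicted label and confidence score."""
    results = classifier(list(sentences))
    return [(s, r["label"], float(r["score"])) for s, r in zip(sentences, results)]


def fake_classifier(sentences):
    """Keyword-based stand-in for the model, used only to make the demo runnable."""
    return [
        {"label": "Risk" if "terminate" in s else "Information", "score": 0.8}
        for s in sentences
    ]


sentences = [
    "The service reserves the right to terminate accounts without notice.",
    "The service uses cookies to improve functionality.",
]
for sentence, label, score in classify(sentences, fake_classifier):
    print(f'{label}: "{sentence}" (Confidence: {score:.2f})')
```

With the real libraries installed, `classify(split_sentences(text), load_classifier(model_path))` would follow the same path, because `classify` only needs a callable that returns one `{"label", "score"}` dictionary per sentence.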
- 🌐 Build a web UI for document upload and visual results
- 📄 Add support for PDF and DOCX formats
- 🤖 Improve accuracy via larger datasets
- 📊 Integrate feedback system for live improvements
We welcome contributions!
- Fork the repo
- Create a new branch: `git checkout -b feature-name`
- Commit your changes: `git commit -m "Add new feature"`
- Push and open a PR: `git push origin feature-name`