📄 GistTerms - Terms of Service Document Classifier

🔍 Overview

GistTerms is a Natural Language Processing (NLP) tool that analyzes Terms of Service (ToS) documents and classifies each sentence into one of three categories:

🔴 Risk: Potentially harmful or unfavorable to the user
🟢 Safe: User-beneficial and protective statements
🔵 Information: Neutral or purely informative content

💡 The tool also assigns a grade (A–F) to the document based on its sentence classifications, making it easier for users to quickly assess service agreements.

✨ Features

🚩 Risk Categorization – Tags each sentence as Risk, Safe, or Information
🧮 Document Grading – Assigns an A–F score based on the ratio of risky to safe content
🧠 Confidence Filtering – Ignores low-confidence predictions to improve accuracy
♻️ Deduplication – Removes repetitive sentences for clean analysis
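
The confidence filtering and deduplication features above can be sketched roughly as follows. This is a minimal illustration, not the code in final_program.py: the Prediction type, the function names, the 0.6 threshold, and the choice to deduplicate after classification are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    sentence: str
    label: str         # "Risk", "Safe", or "Information"
    confidence: float  # model probability for the chosen label, 0.0 to 1.0


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so near-identical sentences compare equal."""
    return " ".join(sentence.lower().split())


def filter_and_dedupe(predictions, min_confidence=0.6):
    """Drop low-confidence predictions, then keep the first copy of each sentence.

    min_confidence=0.6 is a hypothetical default, not the project's actual value.
    """
    seen = set()
    kept = []
    for pred in predictions:
        if pred.confidence < min_confidence:
            continue  # too uncertain to report
        key = normalize(pred.sentence)
        if key in seen:
            continue  # repeated sentence, already kept once
        seen.add(key)
        kept.append(pred)
    return kept
```

Normalizing before comparison means that copies of a clause differing only in case or spacing count as one sentence. A stricter or looser notion of "repetitive" may suit real ToS documents better.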

🛠️ Tech Stack

Language: Python
Libraries:
- NLP: nltk, transformers
- ML: PyTorch
Dataset: adeeteya/termsofservice
Hardware: CUDA support for GPU acceleration

🚀 Installation

Clone the repo
```
git clone https://github.com/sumedhkumar96/GistTerms.git
cd GistTerms
```
Create virtual environment
```
python -m venv venv
source venv/bin/activate     # On Windows: venv\Scripts\activate
```

Install dependencies
```
pip install -r requirements.txt
```
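Download the NLTK Punkt tokenizer data (on NLTK 3.9 or later, the sentence tokenizer looks for the punkt_tab resource instead of punkt)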
```
python -m nltk.downloader punkt
```

🧪 Usage

Add your ToS document
- Save your .txt file as input.txt in the project directory

Run the classifier
```
python final_program.py
```

Results are printed to the terminal
Example output:

Classified Sentences:

🔴 Risk:
  - "The service reserves the right to terminate accounts without notice." (Confidence: 0.85)

🟢 Safe:
  - "Users can request data deletion at any time." (Confidence: 0.90)

🔵 Information:
  - "The service uses cookies to improve functionality." (Confidence: 0.75)

🏅 Service Grade: B

⚙️ How It Works

Sentence Tokenization – Splits input using NLTK
Preprocessing – Tokenizes and formats sentences for model input
Classification – Uses fine-tuned BERT to categorize content
Grading – Computes document grade based on sentence types
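
The classification and grading steps can be illustrated with a small self-contained sketch. It assumes the model returns one logit per class in the order Risk, Safe, Information, and it uses invented grade thresholds. Neither the label order nor the cut-offs are taken from final_program.py, and the function names are hypothetical.

```python
import math

LABELS = ("Risk", "Safe", "Information")  # assumed label order


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_label(logits):
    """Return (label, confidence) for one sentence's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


def grade(labels):
    """Map the share of Risk sentences among Risk + Safe to a letter grade.

    The thresholds are hypothetical; Information sentences are ignored.
    """
    risk = labels.count("Risk")
    safe = labels.count("Safe")
    if risk + safe == 0:
        return "A"  # assumption: no Risk or Safe sentences means top grade
    risk_share = risk / (risk + safe)
    for limit, letter in ((0.2, "A"), (0.4, "B"), (0.6, "C"), (0.8, "D")):
        if risk_share <= limit:
            return letter
    return "F"
```

Under these assumed thresholds, one Risk sentence against two Safe ones gives a risk share of about 0.33, which the sketch maps to B. The confidence returned by to_label is the value a confidence filter would compare against its cut-off.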

🛤️ Roadmap

🌐 Build a web UI for document upload and visual results
📄 Add support for PDF and DOCX formats
🤖 Improve accuracy via larger datasets
📊 Integrate feedback system for live improvements

🤝 Contributing

We welcome contributions!

Fork the repo
Create a new branch
```
git checkout -b feature-name
```
Commit your changes
```
git commit -m "Add new feature"
```
Push and open a PR
```
git push origin feature-name
```
