SyntaxBase Moderation Microservice

SyntaxBase Moderation Microservice is a scalable, interactive comment moderation system designed to automatically classify user comments into multiple toxicity levels: safe, mild, toxic, and severe.

The system uses a hybrid pipeline combining classical machine learning and modern NLP models:

Classical ML – XGBoost with TF-IDF and numeric text features for fast first-pass classification.
BERT-based models – Deep NLP models for semantic understanding and finer toxicity detection.
ToxicBERT – Specialized model for aggressive or identity-based content.
LLM fallback – Large language model handles uncertain cases and provides reasoning.

The microservice is fully Dockerized and orchestrated via docker-compose, with each component running as an independent service for flexible scaling and fault isolation.

Key Features

Multi-level toxicity detection: safe, mild, toxic, severe
Hybrid architecture combining classical ML, transformers, and LLMs
Dockerized services for reproducible deployment
Detailed reasoning for LLM-classified content

Tech Stack

Python, FastAPI, XGBoost, Transformers, Docker, docker-compose

Project Structure

SyntaxBase-moderation-microservice/
│
├── data/
│ └── processed/
│ ├── raw/
│ └── test/
│ └── utils/
|
├── docker/
│ └── api/
    └── Dockerfile
│ ├── bert/
    └── Dockerfile
│ └── classical/
    └── Dockerfile
│ └── llm/
    └── Dockerfile
│ └── toxic_bert/
    └── Dockerfile
|
├── docs/
│ └── architecture.md
│ └── docs.md
│ └── index.md
│ └── research_paper.md
│ └── training_logs.md
│
├── models/
│ └── saved/
│   └── bert/
|   └── classical/
|   └── toxic_bert/
│
├── notebooks/
│ └── 01_baseline_experiments.ipynb
│ └── 02_distilbert_experiments.ipynb
│ └── 03_toxicbert_experiments.ipynb
│ └── 04_model_comparison.ipynb
│ └── 05_error_analysis.ipynb
│ └── 06_metrics_analysis.ipynb
|
├── results/
│ └── comparisons/
|   └── bert/
|   └── llm/
│ └── model_comparison_summary.csv
│ └── runtime_memory_tradeoff.csv
|
│ └── logs/
|   └── classical/
|   └── llm/
|       └── meta-llama-3.1-8b-instruct/
|       └── phi-4-reasoning-plus/
|       └── qwen3-4b-thinking-2507/
|   └── transformer/
|       └── distil_bert/
|       └── toxic_bert/
│ └── metrics/
│ └── visuals/
│
├── src/
│ ├── classical/
│ ├── llm/
│   ├── prompts/
│   ├── evaluator.py/
│   ├── llm_moderation_client.py/
│   ├── llm_service.py/
│ ├── transformer/
│ ├── utils/
│ │ └── generate_comments/
│ │     └── all_round_comments.py # general comments from various sources
│ │     └── forum_based_comments.py # comments that will likely be in forum
│ │ └── aggregateLlmResults.py
│ │ └── compare_all_llms.py
│ │ └── evaluate_both_models.py
│ │ └── evaluate_checkpoints.py
│ │ └── model_comparison_summary.py
│ │ └── predict_comments_bert.py
│ │ └── predict_comments_toxicbert.py
│ │ └── predict_comments.py
│ │ ├── preprocessing.py
│ │ ├── runtime_memory_tradeoff_comparison.py
| │api_service.py   
| |config.py
| |inference.py   
| |model_loader.py   
| |utils.py   
|
├── .gitattributes
├── .gitignore
├── docker-compose.yml
├── LICENSE
├── README.md
└── requirements.txt

Features

Multilevel Toxicity Classification

Classifies comments into safe, mild, toxic, and severe categories.

Classical ML Baseline

TF-IDF features for textual representation.
Numeric text features including character count, word count, capitalization ratio, punctuation count, and swear word detection.
Fast and interpretable baseline for moderation.

Transformer-Based Models

DistilBERT for semantic understanding and contextual analysis.
ToxicBERT for fine-grained toxic content detection.
Confidence scoring and threshold-based decision-making.

Hybrid Moderation Pipeline

Multi-step classification: classical ML → BERT → ToxicBERT → LLM reasoning.
LLM invoked only when previous models are uncertain or disagree.
Flexible orchestration ensures high accuracy while minimizing compute costs.

LLM Integration

Uses Qwen3-4B for context-aware moderation and reasoning.
Supports reasoning output along with label prediction for transparency.
Easily extendable to other open-source LLMs (e.g., LLaMA3, Mistral, Phi-4).

Interactive CLI

Live moderation interface for testing individual comments.
Provides labels, confidence scores, and LLM reasoning when applicable.

REST API & Microservices

Fully containerized services for classical, BERT, ToxicBERT, LLM, and API orchestration.
Docker Compose for multi-service deployment and health monitoring.
Endpoints for remote moderation requests.

Experiment Tracking & Notebooks

Jupyter notebooks for model evaluation, error analysis, and comparison across classical and transformer models.
Supports reproducible experiments and visualizations.

Monitoring & Logging

Logs and metrics stored for each service (classical, transformers, LLM).
Facilitates debugging and performance analysis.

Planned / Future Enhancements

Continuous retraining and monitoring with MLflow or Weights & Biases.
Hybrid model improvements and additional LLM integrations.
Advanced web integration with dashboards and user feedback loop.

Installation & Setup

1. Clone the repository

git clone https://github.com/ismiljanic/SyntaxBase-moderation-microservice.git
cd SyntaxBase-moderation-microservice

2. Install Docker & Docker Compose

Make sure you have Docker and Docker Compose installed.

Check versions:

docker --version
docker-compose --version

3. Build and start services

All core services (Classical ML, BERT, ToxicBERT, LLM, API) are containerized.

docker-compose build
docker-compose up -dx

This will build and start all microservices.

Each service exposes a port for API requests:

classical: 7001
bert: 7002
toxicbert: 7003
llm: 7004
api: 8000

4. Verify services

Check logs to ensure services are running:

docker-compose logs -f api
docker-compose ps

Usage

1. Using the REST API

Send a POST request to the API endpoint:

curl -X POST http://localhost:8000/classify \
     -H "Content-Type: application/json" \
     -d '{"text": "Some test comment"}'

Response will include:

final_label
Intermediate labels & confidence scores from classical, BERT, and ToxicBERT
LLM reasoning (if invoked)

Note on LLM verification: To use LLM-based moderation, you need to have LM Studio running with a model loaded, or another local open-source LLM available via API. This codebase assumes LM Studio is running on the default endpoint (typically http://localhost:1234/v1). Without a running LLM server, the LLM service will not be able to provide reasoning or verification.

2. Interactive CLI (optional)

If you want to use the CLI for single-comment moderation:

1.Classical phase 1 moderation

api python src/utils/predict_comments.py

or

docker-compose exec api python src/utils/predict_comments.py

2.Distil BERT moderation (phase 2)

python src/utils/predict_comments_bert.py

or

docker-compose exec api python src/utils/predict_comments_bert.py

3.ToxicBERT moderation (phase 2)

 python src/utils/predict_comments.py

or

docker-compose exec api python src/utils/predict_comments_toxicbert.py

3. Monitoring

Each service has logs accessible via docker-compose logs <service_name>
Health checks are built-in for all microservices
detailed training logs are available in docs/training_log.md
detailed results and comparisons are available in results/ folder

Notes

Models are preloaded in each service; no local setup of .pt/.safetensors files is needed for API use
Future updates may include retraining scripts and experiment tracking with MLflow or W&B, which can also run inside containers

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SyntaxBase Moderation Microservice

Key Features

Tech Stack

Project Structure

Features

Installation & Setup

1. Clone the repository

2. Install Docker & Docker Compose

3. Build and start services

4. Verify services

Usage

1. Using the REST API

2. Interactive CLI (optional)

1.Classical phase 1 moderation

2.Distil BERT moderation (phase 2)

3.ToxicBERT moderation (phase 2)

3. Monitoring

Notes

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 95 Commits
data		data
docker		docker
docs		docs
models/saved		models/saved
notebooks		notebooks
results		results
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt

License

ismiljanic/SyntaxBase-moderation-microservice

Folders and files

Latest commit

History

Repository files navigation

SyntaxBase Moderation Microservice

Key Features

Tech Stack

Project Structure

Features

Installation & Setup

1. Clone the repository

2. Install Docker & Docker Compose

3. Build and start services

4. Verify services

Usage

1. Using the REST API

2. Interactive CLI (optional)

1.Classical phase 1 moderation

2.Distil BERT moderation (phase 2)

3.ToxicBERT moderation (phase 2)

3. Monitoring

Notes

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages