Note: This project was heavily influenced and informed by the incremental learnings and implementations of my mlops project monorepo, mlops-du. The architectural patterns, deployment strategies, and MLOps best practices demonstrated in this repository build upon the foundational work and lessons learned.
In addition, the final project rubric and screenshots of the AWS deployment can be found at:
- assets/docs/RUBRIC.md
- assets/docs/SCREENSHOTS.md
Note: For easy setup on macOS/Linux, use the included installation script after cloning the repo:
git clone https://github.com/jairus-m/toxic-mlops.git
cd toxic-mlops
bash install-tools.sh

This shell script will install uv, task, aws, and terraform if they're not already present and will verify the installation.
Toxic Comment Analysis - AWS Deployment
This project deploys a toxic comment analysis application on AWS using a fully automated Terraform setup. The architecture consists of a FastAPI backend, a Streamlit frontend, ML training, a Streamlit monitoring dashboard, and an MLflow tracking server. Each service runs in a Docker container on its own dedicated EC2 instance.
This project is structured as a multi-package monorepo using uv workspaces. Each application (fastapi_backend, streamlit_frontend, sklearn_training, streamlit_monitoring) has its own modules and dependencies while sharing a single uv.lock file at the root.
Architecture Overview:
- MLflow Server (EC2): Experiment tracking and model registry with PostgreSQL backend
- ML Training Service (EC2): Downloads data, trains model, logs to MLflow, uploads to S3
- FastAPI Backend (EC2): Provides inference API for toxic comment classification
- Streamlit Frontend (EC2): User interface for testing comment toxicity
- Streamlit Monitoring (EC2): ML Model performance monitoring dashboard
- S3: Stores model and data files, MLflow artifacts
- DynamoDB: Stores ML performance monitoring logs
- RDS PostgreSQL: MLflow backend store for experiments and runs
- Infrastructure:
- Terraform manages AWS resources (EC2, S3, RDS, DynamoDB, Security Groups, CloudWatch)
- Automated deployment with Docker containers
- CloudWatch logging for all services
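To make the ML Training Service's role more concrete, the sketch below shows roughly how a training run might be logged to the MLflow server and the resulting model pushed to S3. It is a minimal illustration only; the tracking URI, experiment name, bucket, and S3 key are assumptions based on the configuration shown later, not the project's actual code.

```python
import boto3
import mlflow
import mlflow.sklearn

# Assumed values: the real tracking URI, experiment name, bucket, and key
# come from the project's config.yaml and environment, not from this sketch.
mlflow.set_tracking_uri("http://mlflow-server:5000")
mlflow.set_experiment("toxic-comments-classification")


def log_training_run(pipeline, metrics: dict, model_path: str, bucket: str) -> None:
    """Log a trained sklearn pipeline to MLflow and upload the pickled model to S3."""
    with mlflow.start_run():
        mlflow.log_params({"model_type": type(pipeline).__name__})
        mlflow.log_metrics(metrics)                  # e.g. {"f1": 0.91, "roc_auc": 0.96}
        mlflow.sklearn.log_model(pipeline, "model")  # artifact stored via the MLflow server

    # Upload the serialized model so the FastAPI backend can load it at startup
    boto3.client("s3").upload_file(model_path, bucket, "models/toxic_model.pkl")
```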
Locally, this project is orchestrated by Docker Compose. Each service is containerized with the exact same logic/code flow as the AWS deployment and follows the same overall architecture depicted above. However, when running with the environment variable APP_ENV=development (automatically set in task dev:up), the application uses config.yaml to determine which local file paths to use instead of S3, DynamoDB, or RDS PostgreSQL:
development:
paths: # Local file paths
data: "assets/data/train.csv"
model: "assets/models/toxic_model.pkl"
model_metadata: "assets/models/toxic_model_metadata.json"
prediction_logging:
handler: "file"
path: "assets/logs/prediction_logs.json"
mlflow: # MLflow tracking via local server
tracking_uri: "http://mlflow-server:5000"
experiment_name: "toxic-comments-classification" # Uses local SQLite DB for backend store

The following need to be installed:
- uv >= 0.7.10
- task >= 3.43.3
- Docker and Docker Compose installed on your local machine.
For running this project on your local machine, no configuration is needed. The application will use your local files instead of AWS services for storage and your CPU for computation.
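For illustration, a config loader that implements this APP_ENV switch could look like the minimal sketch below (the function and file names are illustrative, not the project's actual module):

```python
import os
from pathlib import Path

import yaml  # PyYAML


def load_config(config_file: str = "config.yaml") -> dict:
    """Return the config section for the current environment.

    APP_ENV=development (set automatically by `task dev:up`) selects the
    local file paths; any other value falls through to the AWS-backed settings.
    """
    env = os.getenv("APP_ENV", "development")
    with Path(config_file).open() as f:
        return yaml.safe_load(f)[env]


# Example usage in development:
# cfg = load_config()
# model_path = cfg["paths"]["model"]  # -> "assets/models/toxic_model.pkl"
```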
For a streamlined and consistent development experience that mirrors the production environment locally, you can use Docker Compose and the custom task commands.
From the root of the project, you can use the following commands:
- Build and Start All Services

  This single command builds the Docker images for the MLflow server, ML training, the FastAPI backend, and the Streamlit frontend/monitoring services, and then starts them in the correct order. It will first run the training container to generate the model and then launch the backend and frontend services.

  task dev:up TRAIN_MODEL=true

  You will see the logs from all services streamed to your terminal. Note that you can also pass TRAIN_MODEL=false to skip training (in order to save time).

- Access the Applications!
- Frontend URL: http://localhost:8501
- Submit a comment for toxicity analysis
- Get the predicted toxicity classification
- Provide feedback for whether the prediction was correct or not
- Monitoring URL: http://localhost:8502
- View the performance/status of the ML toxic comment classification app
- MLflow UI: http://localhost:5000
- Track experiments, compare model performance, and manage model registry
- Backend API Docs: http://localhost:8000/docs
- Note: If you want to follow the logs of the running services without blocking your main terminal, you can run:
task dev:logs
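If you want to hit the backend directly instead of going through the frontend, a request might look like the sketch below. The /predict route and payload field are assumptions; check the OpenAPI docs at http://localhost:8000/docs for the actual path and schema.

```python
import requests

# Hypothetical request against the local backend; the actual route and payload
# schema are documented at http://localhost:8000/docs.
response = requests.post(
    "http://localhost:8000/predict",  # assumed endpoint name
    json={"comment": "you are a wonderful human"},
    timeout=10,
)
response.raise_for_status()
print(response.json())  # e.g. predicted toxicity label / probability
```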
- Stop and Clean Up
To stop all the running services and remove the containers and network, run:
task dev:down
- Optional: Unit Tests
To run unit tests:
task dev:unit
Unit tests in tests/ do not need running services. They mock dependencies like the Kaggle dataset download, model training pipeline, Streamlit frontend imports, FastAPI backend, etc., to ensure the core functionality of this application works in isolation.
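As a rough illustration of this mocking approach (not the project's actual test code), a self-contained test might patch boto3 so no AWS call is ever made:

```python
from unittest.mock import MagicMock, patch


def load_model_from_s3(bucket: str, key: str, local_path: str) -> str:
    """Toy stand-in for the kind of helper the services use to fetch the model."""
    import boto3

    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path


@patch("boto3.client")
def test_load_model_does_not_hit_aws(mock_client):
    """boto3 is patched out, so the test needs no credentials or network access."""
    mock_s3 = MagicMock()
    mock_client.return_value = mock_s3

    path = load_model_from_s3("toxic-bucket", "models/toxic_model.pkl", "/tmp/model.pkl")

    mock_s3.download_file.assert_called_once_with(
        "toxic-bucket", "models/toxic_model.pkl", "/tmp/model.pkl"
    )
    assert path == "/tmp/model.pkl"
```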
- The following need to be installed: the AWS CLI and Terraform (the install-tools.sh script above covers both).

Within the project root, create a .env file and add the following:
# For AWS CLI / Boto3
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_SESSION_TOKEN=
# For Terraform (re-using the same values as above)
TF_VAR_aws_access_key_id=$AWS_ACCESS_KEY_ID
TF_VAR_aws_secret_access_key=$AWS_SECRET_ACCESS_KEY
TF_VAR_aws_session_token=$AWS_SESSION_TOKEN
While these AWS credentials are only used in production, they are needed locally so Terraform can authenticate when running CLI commands like plan and apply, and so they can be passed directly to the EC2 instances at launch time.
Note: You can find these credentials in the AWS Academy Learner Lab. From the Vocareum main page: Start Lab > AWS Details > AWS CLI: Show > Copy and paste the CLI details into this local .env file. Putting them here instead of in your ~/.aws/credentials file allows Terraform to pick up the credentials and also makes it easier to reproduce my steps for this isolated project.
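As an optional sanity check (not part of the project tooling), you can confirm that the credentials in .env resolve to your lab account before running any Terraform commands:

```python
import boto3
from dotenv import load_dotenv  # python-dotenv (assumed available)

load_dotenv()  # reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN from .env

# boto3 reads the same environment variables that Terraform will use
identity = boto3.client("sts").get_caller_identity()
print(identity["Account"], identity["Arn"])  # confirms the credentials resolve correctly
```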
This workflow provisions the entire infrastructure and deploys the applications with Terraform.
- Initialize Terraform
From the root of the project, run the following command to initialize Terraform. This will download the necessary providers and set up the backend.
task prod:init
- Apply the Terraform Plan
Run the following command to create the AWS resources and deploy the application.
task prod:apply TRAIN_MODEL=true
This process will take some time as the EC2 instances need to start, transfer the necessary application files, install dependencies, build the Docker images, and run them. You can pass TRAIN_MODEL=false to skip the model training (in order to save time).
Once the terraform apply command is complete, it will output the public IP addresses of the application services.
- Frontend URL: http://<FRONTEND_PUBLIC_IP>:8501
- Monitoring URL: http://<MONITORING_PUBLIC_IP>:8502
- MLflow Server URL: http://<MLFLOW_SERVER_PUBLIC_IP>:5000
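As an optional post-deployment smoke test (not part of the project tooling), you can confirm the services respond, substituting the public IPs from the Terraform output:

```python
import requests

# Placeholder IPs: substitute the public IPs printed by `task prod:apply`
services = {
    "frontend": "http://<FRONTEND_PUBLIC_IP>:8501",
    "monitoring": "http://<MONITORING_PUBLIC_IP>:8502",
    "mlflow": "http://<MLFLOW_SERVER_PUBLIC_IP>:5000",
}

for name, url in services.items():
    try:
        status = requests.get(url, timeout=10).status_code
        print(f"{name}: {url} -> HTTP {status}")
    except requests.RequestException as exc:
        print(f"{name}: {url} -> unreachable ({exc})")
```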
To tear down all the AWS resources created by this project, run the destroy command:
task prod:destroy

Tip: Be wary of AWS state versus your local Terraform state. Manually changing resources in AWS and then running Terraform commands from the CLI can cause a lot of confusion and headaches, so be careful when running these commands against your production AWS resources. However, if your Terraform lockfiles and state get out of sync and are causing you issues, run:

task prod:reset

This will delete all the local Terraform artifacts (releasing the lock and resetting state) and re-initialize Terraform.
The production deployment creates the following AWS resources:
- EC2 instances (one for each of the 5 services):
- MLflow Server (t2.small) - Experiment tracking and model registry
- ML Training (t2.medium) - Model training (sized for its higher memory requirements)
- FastAPI Backend (t2.micro) - Inference API
- Streamlit Frontend (t2.micro) - User interface
- Streamlit Monitoring (t2.micro) - Performance monitoring
- S3 bucket for model/data storage and MLflow artifacts
- DynamoDB for ML performance monitoring logs
- RDS PostgreSQL for MLflow backend store
- CloudWatch for centralized logging (easily viewable logs in AWS)
- Security Groups for access control
- Key Pair for SSH access to instances
