InboxAI is a Retrieval-Augmented Generation (RAG) pipeline for email data processing and analysis. The system integrates email fetching, processing, and storage, and provides a web interface for advanced search and analytics.
The project follows a modular architecture with these key components:

- `/data_pipeline`: Data ingestion and processing scripts
- `/backend`: Server implementation and RAG resources
- `/frontend`: User interface components and assets
- `/rag_model`: RAG evaluators and model quality validation tools
- `/mlflow`: MLflow tracking server (Docker-based)
- `/vector_db`: Chroma DB vector database setup
Note: Access requires Northeastern University credentials
## InboxAI Platform
- Web interface for email analytics and search
- Secure access with authentication
- Real-time RAG pipeline integration
- Real-time performance monitoring
- Resource utilization tracking
- System health indicators
The system architecture is illustrated below:
Refer to individual component directories for specific setup instructions and documentation.
## Deploying on a GCP VM with a Self-Hosted GitHub Actions Runner

This is a complete guide to setting up a Cloud VM on GCP and installing a self-hosted GitHub Actions runner, tailored for deploying the Airflow pipelines from GitHub Actions.
Go to the GCP Console → VM Instances and click "Create Instance".
Recommended configuration for Airflow + Vector DB:
- Name: `inboxai-deployment-server`
- Region/Zone: `us-central1-a` (or as needed)
- Machine Type: `e2-standard-4` (4 vCPU, 16 GB RAM)
- Boot Disk: Ubuntu 22.04 LTS, 100 GB SSD
- Firewall:
  - [x] Allow HTTP traffic
  - [x] Allow HTTPS traffic
After creation, SSH into the VM:

```sh
gcloud compute ssh inboxai-deployment-server --zone=us-central1-a
```

Install base dependencies:

```sh
sudo apt update && sudo apt upgrade -y
sudo apt install -y docker.io docker-compose git unzip
sudo usermod -aG docker $USER
```

Install Node.js (required for the GitHub runner):

```sh
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt install -y nodejs
```

Log out and log back in to apply the Docker group changes:

```sh
exit
gcloud compute ssh inboxai-deployment-server --zone=us-central1-a
```

Go to your GitHub repo → Settings → Actions → Runners.
Click "New self-hosted runner" → choose OS: Linux → Architecture: x64.
Follow the given commands, or use this:

```sh
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64-2.316.0.tar.gz -L https://github.com/actions/runner/releases/download/v2.316.0/actions-runner-linux-x64-2.316.0.tar.gz
tar xzf ./actions-runner-linux-x64-2.316.0.tar.gz
```

Now run the setup command from GitHub, e.g.:

```sh
./config.sh --url https://github.com/your-org/your-repo \
    --token <TOKEN_FROM_GITHUB>
```

Then start the runner:

```sh
./run.sh
```

If you need to expose ports for Airflow, MLflow, or ChromaDB:
```sh
gcloud compute firewall-rules create airflow-rule \
    --allow tcp:8080,tcp:5555,tcp:8000,tcp:7070 \
    --target-tags airflow-server \
    --description="Allow Airflow and vector DB ports"
```

Note that the rule only applies to instances carrying the `airflow-server` network tag; if the VM was created without it, add the tag with `gcloud compute instances add-tags`.

## Setting Up PostgreSQL on Google Cloud SQL

To enable scalable and managed database support for InboxAI across all components (backend, MLflow, and Airflow), follow these steps to provision a PostgreSQL instance on Google Cloud SQL, apply schema DDLs, and make the DB accessible via SQLAlchemy-compatible connection strings.
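Whether you are exposing the Airflow ports above or connecting to Postgres later, a quick TCP probe confirms a port is actually reachable before you debug the service itself. A minimal sketch using only the Python standard library (`is_port_open` and `VM_EXTERNAL_IP` are hypothetical, not part of the repo):

```python
import socket

def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the Airflow UI port on the VM (placeholder host).
# is_port_open("VM_EXTERNAL_IP", 8080)
```

A `False` result only tells you the TCP handshake failed; it does not distinguish a firewall drop from a service that is not running, so check both.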
Enable the Cloud SQL Admin API and create the instance:

```sh
gcloud services enable sqladmin.googleapis.com

gcloud sql instances create inboxai-postgres \
    --database-version=POSTGRES_14 \
    --cpu=2 \
    --memory=4GB \
    --region=us-central1
```

> Note: By default, public IP connections are disabled. We'll allow access from specific IPs later.
Create the database and user:

```sh
gcloud sql databases create inboxai_db \
    --instance=inboxai-postgres

gcloud sql users create inboxai_user \
    --instance=inboxai-postgres \
    --password=inboxai-password
```

Authorize your current IP and look up the instance's public address:

```sh
gcloud sql instances patch inboxai-postgres \
    --authorized-networks=$(curl -s ifconfig.me)/32

gcloud sql instances describe inboxai-postgres \
    --format="value(ipAddresses.ipAddress)"
```

Let's say the IP is: `DB_HOST_IP`.
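One caveat with `--authorized-networks=$(curl -s ifconfig.me)/32`: if the lookup returns something unexpected (an HTML error page, or an IPv6 address), the patch bakes a broken rule into the instance. Validating the address first makes a garbled response fail loudly with a `ValueError` instead. A sketch using only the stdlib (`cidr_for_host` is a hypothetical helper):

```python
import ipaddress

def cidr_for_host(ip_str: str) -> str:
    """Validate a single IP address and return it as a host CIDR (/32 or /128)."""
    addr = ipaddress.ip_address(ip_str.strip())
    return f"{addr}/{32 if addr.version == 4 else 128}"

print(cidr_for_host("203.0.113.7"))  # → 203.0.113.7/32
```

An IPv6 result (`.../128`) is a signal to look up your IPv4 address instead before patching the instance.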
Install the Cloud SQL Auth Proxy, or connect directly:

```sh
psql "host=DB_HOST_IP port=5432 dbname=inboxai_db user=inboxai_user password=inboxai-password"
```

> Tip: You can use tools like TablePlus or DBeaver for GUI-based access too.
Apply the schema for your app by running:

```sh
psql "host=DB_HOST_IP port=5432 dbname=inboxai_db user=inboxai_user password=inboxai-password" \
    -f backend/POSTGRES_DDLS.sql
```

This sets up all required tables, constraints, and enums used in your backend API and Airflow DAGs.
Update your `.env` file with this format:

```
DB_NAME=your_db
DB_USER=your_db_user
DB_PASSWORD=your_db_pass
DB_HOST=DB_HOST_IP
DB_PORT=5432
```

These connection settings can now be used across:

- `backend/.env`
- `data_pipeline/.env`
- `mlflow/.env`
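For reference, the keys above assemble into a SQLAlchemy-compatible URL like this (a sketch: `make_db_url` is a hypothetical helper, and the values are the placeholders from this guide):

```python
def make_db_url(env: dict) -> str:
    """Assemble a SQLAlchemy-compatible postgresql:// URL from .env-style keys."""
    return (
        f"postgresql://{env['DB_USER']}:{env['DB_PASSWORD']}"
        f"@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    )

# Placeholder values from this guide; in the apps, read these via os.environ.
example = {
    "DB_NAME": "inboxai_db",
    "DB_USER": "inboxai_user",
    "DB_PASSWORD": "inboxai-password",
    "DB_HOST": "DB_HOST_IP",
    "DB_PORT": "5432",
}
print(make_db_url(example))
# → postgresql://inboxai_user:inboxai-password@DB_HOST_IP:5432/inboxai_db
```

If credentials may contain characters like `@` or `/`, prefer `sqlalchemy.engine.URL.create()`, which escapes each component for you.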
GCP Cloud Logs

