An end-to-end compliance assurance and monitoring (CAM) agent for healthcare language-model scenarios. This project bundles:
- RAG-backed question answering with retrieval of regulatory documents.
- Judge orchestration (local MedGemma via Ollama, Gemini Flash, or both) to score safety/compliance.
- Interactive dashboard with live timeline playback, KPI metrics, and a realtime console to submit ad‑hoc queries.
The repository contains the Python pipeline, FastAPI UI bridge, and a React + Vite dashboard ready for local demos.
- Prerequisites
- Quick Start
- Python Environment
- Configuring Judge Backends
- Running the Compliance Pipeline
- UI Services
- GPU & Ollama Notes
- Testing
- Project Structure
- Troubleshooting
## Prerequisites

| Component | Version / Notes |
|---|---|
| Python | 3.10+ (developed against 3.12) |
| Node.js | 18+ (recommended 20 LTS) |
| Ollama | 0.3.12+ with GPU support (for MedGemma judges) |
| Google Gemini API key | Required only if enabling the Gemini judge |
| NVIDIA GPU | Dual 24 GiB RTX 4090s were used in development. See GPU & Ollama Notes if running heavy models. |
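To confirm the toolchain before installing anything:

```bash
python --version    # expect 3.10+
node --version      # expect v18 or newer
ollama --version    # expect 0.3.12+
nvidia-smi          # check that the GPU(s) and driver are visible
```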
## Quick Start

```bash
# 1. Clone the repository
git clone https://github.com/nullnuller/cam_agent.git
cd cam_agent

# 2. Create and activate a Python environment
python -m venv .venv
source .venv/bin/activate

# 3. Install Python dependencies
pip install --upgrade pip
pip install -r requirements.txt

# 4. Install dashboard dependencies
cd ui/dashboard
npm install
cd ../..

# 5. Copy the environment template and customise it
cp .env.example .env
# edit .env to point at your Ollama endpoints, judge models, and Gemini credentials
# make sure CAM_MODEL_* values match tags from `ollama list` on this machine
```

Populate the regulatory knowledge base (optional but recommended) via:

```bash
./download_health_regulations.sh
```
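As a sanity check, confirm the download produced documents (assuming the script populates `health_docs/`, the directory the RAG store is rebuilt from):

```bash
ls health_docs | head   # assumption: regulations land in health_docs/
```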
## Python Environment

All Python packages live in `requirements.txt`. Activate the virtual environment before running any scripts:

```bash
source .venv/bin/activate
```

The most relevant modules:

- `cam_pipeline.py` – orchestrates ingestion, RAG lookup, model inference, and judge scoring.
- `cam_agent/services` – LLM clients, retrieval managers, and compliance rules.
- `cam_agent/ui/api.py` – FastAPI service that exposes audit logs and a realtime console for the dashboard.
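A quick smoke test that the package imports cleanly from the active environment:

```bash
python -c "import cam_agent; print('cam_agent imports OK')"
```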
## Configuring Judge Backends

We support three judge modes:

- Local Ollama (generate) – the default MedGemma judge.
- Local Ollama (chat endpoint) – if you expose `/api/chat`.
- OpenAI-compatible REST – for llama.cpp or hosted OpenAI models.
In `.env`, choose the block that matches your setup. Example for a local llama.cpp (OpenAI-compatible) server:

```bash
JUDGE_MODE=openai
JUDGE_MODEL=google_medgemma-27b
JUDGE_BASE_URL=http://localhost:8678/v1/chat/completions
CAM_MODEL_MEDGEMMA_LARGE=google_medgemma-27b
CAM_MODEL_MEDGEMMA_LARGE_API_MODE=openai
CAM_MODEL_MEDGEMMA_LARGE_ENDPOINT=http://localhost:8678/v1/chat/completions
```

Gemini judge configuration:

```bash
GEMINI_API_KEY=your_api_key
GEMINI_MODEL=models/gemini-2.5-flash
GEMINI_RPM=10
```

Tip: If the 27B judge OOMs on your GPU, create a Modelfile with lower `num_ctx` and `parallel` settings (see GPU & Ollama Notes).
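For the default local Ollama (generate) mode, a minimal `.env` sketch might look like the block below. `JUDGE_MODE=ollama` and the base URL are assumptions (11434 is Ollama's default port); the model tag must match `ollama list` on your machine:

```bash
JUDGE_MODE=ollama                      # assumption: mode name mirrors the --judge-mode values
JUDGE_MODEL=medgemma27b-judge          # a tag from `ollama list` (see GPU & Ollama Notes)
JUDGE_BASE_URL=http://localhost:11434  # assumption: default Ollama port
JUDGE_NUM_CTX=4096                     # optional: smaller context to save VRAM
```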
## Running the Compliance Pipeline

```bash
source .venv/bin/activate

# Evaluate a batch of questions using the MedGemma and Gemini judges
python cam_pipeline.py \
    --questions_file project_bundle/questions.txt \
    --enable-med-judge \
    --enable-gemini-judge
```

Key outputs land under `project_bundle/`:

- `cam_suite_report.html` – interactive report.
- `cam_suite_report.json` – machine-readable summary.
- `cam_suite_audit.jsonl` – timeline audit log (consumed by the UI).
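Because the audit log is plain JSONL, you can watch events stream in while the pipeline runs (assumes `jq` is installed; the event schema is whatever the pipeline emits):

```bash
tail -f project_bundle/cam_suite_audit.jsonl | jq .
```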
Flags of interest:

| Flag | Purpose |
|---|---|
| `--skip-ollama-judge` | Disable the local judge while keeping Gemini. |
| `--judge-mode <ollama\|gemini\|both>` | Select which judge backend(s) to run. |
| `--refresh-store` | Rebuild the RAG store from `health_docs/`. |
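The flags compose. For example, a Gemini-only run against a freshly rebuilt store (a sketch using only the flags above):

```bash
python cam_pipeline.py \
    --questions_file project_bundle/questions.txt \
    --refresh-store \
    --skip-ollama-judge \
    --enable-gemini-judge
```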
## UI Services

Expose the audit log and realtime console:

```bash
source .venv/bin/activate
CAM_UI_AUDIT_LOG=project_bundle/cam_suite_audit.jsonl \
CAM_UI_API_HOST=0.0.0.0 \
CAM_UI_API_PORT=8080 \
python -m cam_agent.ui.server
```

Environment knobs:

- `CAM_UI_STORE_DIR` – override the RAG store path.
- `CAM_UI_DIGEST_PATH` – specify a digest file for judge context.
- `CAM_UI_API_RELOAD=1` – enable autoreload during development.
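Once the server is up, FastAPI's auto-generated docs page makes a quick smoke test (`/docs` is FastAPI's default route; the project's own endpoints live in `cam_agent/ui/api.py`):

```bash
curl -s http://127.0.0.1:8080/docs >/dev/null && echo "UI API is up"
```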
Run the dashboard dev server:

```bash
cd ui/dashboard
npm run dev   # or: npm run build && npm run preview
```

The dashboard expects `VITE_CAM_API_BASE` to point at the FastAPI server (defaults to `http://127.0.0.1:8080`).
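To point the dashboard at a non-default API host without exporting variables, Vite's standard env-file loading works; a sketch:

```bash
# ui/dashboard/.env.local is read automatically by Vite dev/build
echo 'VITE_CAM_API_BASE=http://127.0.0.1:8080' > ui/dashboard/.env.local
```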
Features:
- Animated timeline highlighting user → LLM → judge flow.
- KPI cards with weighted judge agreement, latency, and violations.
- Safety panel grouped by severity and category.
- Realtime console with selectable base model and judge backend.
## GPU & Ollama Notes

MedGemma 27B is memory-hungry. If you keep it on Ollama and see `cudaMalloc failed`:

- Reduce concurrency: set `OLLAMA_NUM_PARALLEL=1` or create a Modelfile with `PARAMETER parallel 1`.
- Lower context: set `JUDGE_NUM_CTX=4096` in `.env` (or adjust the Modelfile `num_ctx`) to shrink GPU memory.
- Split across GPUs: `PARAMETER num_gpu 2` ensures weights span both 24 GiB cards.
- Unload other models: `ollama ps`, then `ollama stop <id>` before starting the judge.
- Fallback judge: use the 4B MedGemma (`hf.co/bartowski/google_medgemma-4b-it-GGUF:latest`) for interactive demos.
Example Modelfile:

```
FROM hf.co/bartowski/google_medgemma-27b-it-GGUF:latest
PARAMETER num_ctx 4096
PARAMETER parallel 1
PARAMETER num_gpu 2
```
Create it once:

```bash
ollama create medgemma27b-judge -f Modelfile
```

Then point `JUDGE_MODEL=medgemma27b-judge` in `.env`.
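To verify the parameters were baked into the new tag:

```bash
ollama show medgemma27b-judge --modelfile
```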
## Testing

Python tests (pytest):

```bash
source .venv/bin/activate
pytest
```

Dashboard lint/build:

```bash
cd ui/dashboard
npm run lint
npm run build
```
## Project Structure

```text
cam_agent/
├── cam_pipeline.py         # Batch orchestration entry-point
├── cam_agent/
│   ├── evaluation/         # Judge and scenario definitions
│   ├── services/           # LLM clients, retrieval, agent logic
│   ├── ui/                 # FastAPI bridge for the dashboard
│   └── storage/            # Audit logging utilities
├── docs/                   # Developer notes and guides
├── project_bundle/         # Generated reports, audit logs, RAG store
└── ui/dashboard/           # React + Vite SPA (timeline & console)
```
## Troubleshooting

| Symptom | Fix |
|---|---|
| Ollama HTTP 500 … `cudaMalloc failed` | Apply the GPU strategies above to reduce VRAM usage. |
| Gemini judge returns 429 | Increase `GEMINI_RPM`, add backoff, or pause before re-running the pipeline. |
| Dashboard shows pending judge agreement | No judge events were emitted. Check the FastAPI logs for `External judge returned no results`. |
| Realtime console does nothing | Ensure `CAM_UI_AUDIT_LOG` points at a writable JSONL. The UI API logs a message for each submission. |
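For local-judge failures, first confirm Ollama is serving and the judge tag actually exists; `/api/tags` is Ollama's model-listing endpoint (adjust the port if yours differs):

```bash
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
```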
Happy monitoring!