Merged
47 changes: 47 additions & 0 deletions CLAUDE.md
@@ -71,6 +71,53 @@ cd api && prisma studio
 cd api && prisma migrate dev --create-only --name migration_name
 ```

+### Synthetic Data Seeding
+
+#### Local Development Seeding
+```bash
+# Seed basic patient demographics (original script)
+python scripts/seed.py
+
+# Seed staging data with synthetic vitals (requires API running)
+python scripts/seed_staging.py
+
+# Or with environment variable (mimics Railway staging)
+RAILWAY_ENVIRONMENT=staging python scripts/seed_staging.py
+```
+
+#### Railway Staging Deployment
+Synthetic data is **automatically seeded** on Railway staging deployments when `RAILWAY_ENVIRONMENT=staging` is set.
+
+**Configuration:**
+1. In Railway staging project, set environment variable:
+```
+RAILWAY_ENVIRONMENT=staging
+```
+
+2. On deployment, the Dockerfile automatically:
+- Runs `prisma migrate deploy`
+- **Seeds synthetic data** (if patient count < 3)
+- Starts the API
+
+**Seed Script Behavior:**
+- **Environment-aware**: Only runs when `RAILWAY_ENVIRONMENT=staging`
+- **Idempotent**: Safe to run multiple times (checks patient count threshold)
+- **Fast**: Completes in <10 seconds
+- **Generates**:
+- 15 synthetic patients with realistic demographics (Faker)
+- 2-5 vital signs readings per patient
+- Clinically plausible values (BP: 90-140/60-90 mmHg, Pulse: 60-100 bpm)
+- Timestamps spread over past 1-4 weeks
+- MRN prefix: `STAGING-` to distinguish from production data
+
+**Manual Trigger:**
+```bash
+# SSH into Railway container (if needed)
+railway run python /scripts/seed_staging.py
+```
+
+See [ADR-0005](./docs/adr/0005-synthetic-data-generation.md) for implementation details and decision rationale.
+
 ### Docker & Infrastructure
 ```bash
 # Check EHRBase status (wait 30-60s after docker compose up)
15 changes: 13 additions & 2 deletions api/Dockerfile
@@ -23,11 +23,15 @@ COPY api/src ./src
 COPY api/prisma ./prisma
 COPY api/templates ./templates

+# Copy seed scripts (from repo root)
+COPY scripts /scripts
+
 # Generate Prisma client (binaries will be cached in PRISMA_BINARY_CACHE_DIR)
 RUN prisma generate

-# Grant non-root user ownership of app directory and Prisma binaries
+# Grant non-root user ownership of app directory, scripts, and Prisma binaries
 RUN chown -R appuser:appgroup /app && \
+    chown -R appuser:appgroup /scripts && \
     chown -R appuser:appgroup /home/appuser/.cache && \
     chown -R appuser:appgroup /usr/local/lib/python*/site-packages/prisma/

@@ -38,4 +42,11 @@ EXPOSE 8000
 # Use Railway's $PORT if available, otherwise default to 8000
 # Use shell form to allow environment variable substitution
 # Run migrations before starting the server
-CMD sh -c "echo 'Running database migrations...' && prisma migrate deploy && echo 'Starting uvicorn on port ${PORT:-8000}...' && uvicorn src.main:app --host 0.0.0.0 --port ${PORT:-8000}"
+# Conditionally seed staging data if RAILWAY_ENVIRONMENT=staging
+CMD sh -c "echo 'Running database migrations...' && \
+    prisma migrate deploy && \
+    if [ \"$RAILWAY_ENVIRONMENT\" = \"staging\" ]; then \
+        echo 'Seeding staging data...' && python /scripts/seed_staging.py; \
+    fi && \
+    echo 'Starting uvicorn on port ${PORT:-8000}...' && \
+    uvicorn src.main:app --host 0.0.0.0 --port ${PORT:-8000}"
Comment on lines +45 to +52
⚠️ Potential issue | 🔴 Critical

Critical: The seed script requires a running API but executes before uvicorn starts.

The seed script is executed before starting uvicorn in the CMD chain, but scripts/seed_staging.py makes HTTP requests to API endpoints:

  • Line 50: GET /health
  • Line 61: GET /api/patients
  • Lines 82-85: POST /api/patients
  • Lines 101-104: POST /api/vital-signs

Since the API hasn't started yet, all HTTP requests will fail, causing should_seed() to return False (lines 55-56), so seeding is skipped every time.

This means automatic staging seeding will never execute successfully despite the documentation claiming it does.
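
To make the failure mode concrete, here is a minimal sketch of how a check like should_seed() can silently skip seeding. The actual code in scripts/seed_staging.py is not shown in this diff, so the injected `fetch_patient_count` callable, the `OSError` handler, and the helper `api_not_started` are assumptions for illustration only; the threshold of 3 comes from the CLAUDE.md text.

```python
from typing import Callable


def should_seed(fetch_patient_count: Callable[[], int], threshold: int = 3) -> bool:
    """Return True only when the patient count could be read and is below the threshold."""
    try:
        return fetch_patient_count() < threshold
    except OSError:
        # Assumed behavior: a connection failure is treated as "do not seed".
        return False


def api_not_started() -> int:
    """Hypothetical stand-in for GET /api/patients before uvicorn is listening."""
    raise ConnectionRefusedError("API is not listening yet")


if __name__ == "__main__":
    print(should_seed(api_not_started))  # connection refused
    print(should_seed(lambda: 0))        # empty database, API reachable
```

With a check shaped like this, "the API is unreachable" and "the database is already seeded" produce the same result, which is why the skipped seeding would not surface as a deployment error.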

Proposed solutions

Solution 1: Use database directly (Recommended)

Refactor seed_staging.py to use the Prisma client directly instead of the HTTP API:

from prisma import Prisma

async def main():
    db = Prisma()
    await db.connect()
    try:
        # Create patients directly in DB
        patient = await db.patient.create(data={...})
        # Create compositions via ehrbase_client (imported from api/src/ehrbase/client.py)
        await ehrbase_client.create_composition(...)
    finally:
        await db.disconnect()
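
One way Solution 1 could keep the documented idempotency check (seed only when fewer than 3 patients exist) without going through HTTP is sketched below. `PatientStore`, `InMemoryStore`, and `seed_if_needed` are hypothetical names, not part of the PR; a real version would back the store with the Prisma client from the snippet above, and the model and field names would depend on the project's schema.

```python
import asyncio
from typing import Protocol


class PatientStore(Protocol):
    """Hypothetical storage interface; a Prisma-backed class could implement it."""

    async def count(self) -> int: ...

    async def create(self, data: dict) -> None: ...


class InMemoryStore:
    """Stand-in store so the sketch runs without a database."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    async def count(self) -> int:
        return len(self.rows)

    async def create(self, data: dict) -> None:
        self.rows.append(data)


async def seed_if_needed(store: PatientStore, patients: list[dict], threshold: int = 3) -> int:
    """Insert the patients unless the store already holds `threshold` or more rows.

    Returns the number of rows written, so callers can log a skipped run.
    """
    if await store.count() >= threshold:
        return 0
    for patient in patients:
        await store.create(patient)
    return len(patients)


if __name__ == "__main__":
    demo_store = InMemoryStore()
    demo_patients = [{"mrn": f"STAGING-{i:04d}"} for i in range(1, 16)]
    print(asyncio.run(seed_if_needed(demo_store, demo_patients)))  # first run writes rows
    print(asyncio.run(seed_if_needed(demo_store, demo_patients)))  # second run is a no-op
```

Because the check reads the store directly, it no longer depends on uvicorn being up, so it could run at the same point in the CMD chain where the PR places the seed step.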

Solution 2: Run seed after API starts

Use Railway deployment lifecycle hooks or a separate service to run the seed script after the API is healthy:

# Remove seeding from CMD, handle via Railway deployment actions
CMD sh -c "prisma migrate deploy && uvicorn src.main:app --host 0.0.0.0 --port ${PORT:-8000}"

Then configure Railway to run the seed script as a post-deployment hook.

Solution 3: Background seeding

Start uvicorn in the background, then run the seed script once it is up:

CMD sh -c "prisma migrate deploy && \
  uvicorn src.main:app --host 0.0.0.0 --port ${PORT:-8000} & \
  sleep 5 && \
  if [ \"$RAILWAY_ENVIRONMENT\" = \"staging\" ]; then python /scripts/seed_staging.py; fi && \
  wait"

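However, this adds complexity and fragility: the fixed sleep 5 does not guarantee that the API is ready, and if the seed script exits non-zero, wait is skipped and the shell (and with it the container) exits.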
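
If Solution 3 were pursued, the fixed sleep 5 could be replaced by polling the /health endpoint that the seed script already calls. The following is a rough sketch only: `http_ok` and `wait_until_ready` are hypothetical helpers, and the timeout and interval defaults are assumptions.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib's URLError and HTTPError both subclass OSError, as do
        # connection-refused and timeout errors.
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call `probe` until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Inside the container this might be called as `wait_until_ready(lambda: http_ok("http://127.0.0.1:8000/health"))` before seeding; the host and port here are assumptions based on the Dockerfile's default port.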

1 change: 1 addition & 0 deletions api/pyproject.toml
@@ -13,6 +13,7 @@ dependencies = [
     "httpx>=0.26.0",
     "python-multipart>=0.0.6",
     "aiofiles>=23.2.0",
+    "faker>=22.0.0",
 ]

 [project.optional-dependencies]