⚠️ Beta Status: Warp is currently in beta (version 0.1.2) and should not be considered production-ready. While we welcome development contributions and feedback, please use this tool with caution in non-production environments. The project is actively being developed and may undergo significant changes. Warp will be considered production-ready upon the release of version 1.0.0.
Warp importing 100 patients to OpenEMR in <1 min.
Patients uploaded by Warp displayed in OpenEMR's Patient Finder
- Overview
- Features
- Installation
- Quick Start
- Usage
- Credential Auto-Discovery
- Architecture
- Performance
- Kubernetes Deployment
- Configuration
- Troubleshooting
- Development
- Advanced Topics
- License
Warp is a high-performance tool for uploading data to OpenEMR installations. It bypasses APIs and web interfaces entirely, writing directly to the database and filesystem for maximum speed and reliability.
Warp provides horizontally scalable, accelerated imports to OpenEMR by:
- Writing directly to OpenEMR's MySQL database
- Avoiding API authentication and web session overhead
- Processing records in parallel with optimized batch sizes
- 🚀 Direct Database Import: Writes directly to OpenEMR database (ONLY METHOD)
- ⚡ Maximum Performance: Horizontally scalable workers write directly to the database
- 🔒 Reliable: No API authentication or web session dependencies
- 📦 Kubernetes-Native: Designed to run as a resource-intensive pod
- 🌐 Multiple Data Sources: Supports S3, local files, and other sources
- 🔧 Auto-Discovery: Automatically finds credentials from Kubernetes/Terraform
- 📊 OMOP CDM Support: Loads OMOP Common Data Model data for direct database import
- 🔄 Parallel Processing: Multi-worker architecture for maximum throughput
- Python 3.8 or higher (3.14 recommended)
- Access to OpenEMR database (required for direct database import)
- AWS credentials (for S3 data sources)
# Clone the repository
git clone https://github.com/openemr/openemr-on-eks-dev.git
cd openemr-on-eks-dev/warp
# Install dependencies
pip install -r requirements.txt
# Or install warp as a package
pip install -e .
warp --version
warp --help

Warp automatically discovers credentials from Kubernetes secrets or Terraform:
# No credentials needed - auto-discovered!
warp ccda_data_upload \
--data-source s3://synpuf-omop/cmsdesynpuf1k/ \
--max-records 100

If auto-discovery fails, provide database credentials manually:
warp ccda_data_upload \
--db-host aurora-cluster.region.rds.amazonaws.com \
--db-user openemr \
--db-password password \
--data-source s3://synpuf-omop/cmsdesynpuf1k/ \
--max-records 100

Warp uses direct database import exclusively. It writes directly to OpenEMR's MySQL database, matching the exact structure used by OpenEMR's internal functions.
Warp automatically discovers database credentials from:
- Kubernetes Secrets: `openemr-db-credentials` secret in the `openemr` namespace
- Terraform Outputs: Aurora endpoint and password from Terraform state
- Environment Variables: `DB_HOST`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`
# Auto-discover all credentials
warp ccda_data_upload \
--data-source s3://synpuf-omop/cmsdesynpuf1k/ \
--max-records 1000

# Set via environment variables
export DB_HOST="aurora-cluster.region.rds.amazonaws.com"
export DB_USER="openemr"
export DB_PASSWORD="password"
export DB_NAME="openemr"
warp ccda_data_upload \
--data-source s3://synpuf-omop/cmsdesynpuf1k/ \
--max-records 1000

# Increase batch size for better performance
warp ccda_data_upload \
--data-source s3://synpuf-omop/cmsdesynpuf1k/ \
--batch-size 500 \
--workers 8 \
--max-records 10000

Warp can automatically discover database credentials from multiple sources, eliminating the need to manually provide them.
- Kubernetes Secrets (highest priority)
  - Secret: `openemr-db-credentials` in namespace `openemr`
  - Keys: `mysql-host`, `mysql-user`, `mysql-password`, `mysql-database`
- Terraform Outputs
  - Aurora endpoint: `aurora_endpoint` output
  - Database password: `aurora_password` output
- Environment Variables
  - `DB_HOST`, `DB_USER`, `DB_PASSWORD`, `DB_NAME`
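
The priority order above can be pictured as a merge over per-source dictionaries. The following is a minimal sketch, not Warp's actual implementation: `resolve_credentials` and `env_source` are hypothetical names, and the rule that explicit command-line values win over every discovered source is an assumption based on the override examples in this README.

```python
from typing import Dict, Iterable, Mapping, Optional

REQUIRED_KEYS = ("host", "user", "password", "database")


def resolve_credentials(
    cli_overrides: Mapping[str, Optional[str]],
    sources: Iterable[Mapping[str, Optional[str]]],
) -> Dict[str, str]:
    """Merge credentials: CLI values first, then sources in priority order."""
    resolved: Dict[str, str] = {}
    for candidate in (cli_overrides, *sources):
        for key in REQUIRED_KEYS:
            value = candidate.get(key)
            # Keep the first non-empty value seen for each key.
            if value and key not in resolved:
                resolved[key] = value
    missing = [key for key in REQUIRED_KEYS if key not in resolved]
    if missing:
        raise ValueError(f"missing database credentials: {', '.join(missing)}")
    return resolved


def env_source(environ: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Read the DB_* environment variables (hypothetical helper)."""
    return {
        "host": environ.get("DB_HOST"),
        "user": environ.get("DB_USER"),
        "password": environ.get("DB_PASSWORD"),
        "database": environ.get("DB_NAME"),
    }
```

Passing the Kubernetes secret first, Terraform outputs second, and `env_source(os.environ)` last reproduces the listed order, while a partial override such as `{"user": "openemr"}` replaces only that one field.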
# Fully automatic (recommended)
warp ccda_data_upload --data-source s3://bucket/path
# Partial override
warp ccda_data_upload \
--db-user openemr \
--data-source s3://bucket/path
# Manual override (disables auto-discovery)
warp ccda_data_upload \
--db-host aurora-cluster.region.rds.amazonaws.com \
--db-user openemr \
--db-password password \
--data-source s3://bucket/path

Warp writes directly to OpenEMR's MySQL database using:
- Direct SQL: Uses `INSERT INTO` matching OpenEMR's `newPatientData()` function
- Schema-Aware: Understands OpenEMR's database schema from source code
- Transaction-Safe: Uses transactions for atomic operations
- Parallel Processing: Multiple workers writing concurrently
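
To illustrate the transaction-safe batch write described above, here is a minimal sketch. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere, and a three-column `patient_data` table that is a simplified assumption, not OpenEMR's real schema. `insert_batch` is a hypothetical helper.

```python
import sqlite3
from typing import Iterable, Tuple

PatientRow = Tuple[int, str, str]  # (pid, DOB, sex)


def insert_batch(conn: sqlite3.Connection, rows: Iterable[PatientRow]) -> int:
    """Insert one batch atomically: either every row commits or none do."""
    rows = list(rows)
    try:
        # "with conn" commits on success and rolls back if an exception escapes.
        with conn:
            conn.executemany(
                "INSERT INTO patient_data (pid, DOB, sex) VALUES (?, ?, ?)", rows
            )
    except sqlite3.IntegrityError:
        return 0  # the whole batch was rolled back, e.g. on a duplicate pid
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient_data (pid INTEGER PRIMARY KEY, DOB TEXT, sex TEXT)")
print(insert_batch(conn, [(1, "1950-01-01", "Male"), (2, "1948-06-01", "Female")]))
print(insert_batch(conn, [(3, "1960-03-01", "Male"), (1, "1950-01-01", "Male")]))
print(conn.execute("SELECT COUNT(*) FROM patient_data").fetchone()[0])
```

The second batch contains a duplicate `pid`, so it is rolled back as a whole and the table still holds only the two rows from the first batch.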
| OMOP Table | OMOP Field | OpenEMR Table | OpenEMR Field |
|---|---|---|---|
| PERSON | person_id | patient_data | pid |
| PERSON | year_of_birth | patient_data | DOB (year) |
| PERSON | gender_concept_id | patient_data | sex |
| CONDITION_OCCURRENCE | condition_concept_id | lists | diagnosis |
| CONDITION_OCCURRENCE | condition_start_date | lists | begdate |
| DRUG_EXPOSURE | drug_concept_id | lists | diagnosis |
| DRUG_EXPOSURE | drug_exposure_start_date | lists | begdate |
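
As a rough illustration of the PERSON rows in the table above, the sketch below maps one OMOP record to `patient_data` fields. The concept IDs 8507 (male) and 8532 (female) are the standard OMOP gender concepts. The output strings `"Male"` and `"Female"`, the fallback to January 1 when month or day is missing, and the function name `person_to_patient_data` are assumptions for illustration, not Warp's confirmed behavior.

```python
from datetime import date
from typing import Dict, Mapping

# Standard OMOP gender concept IDs; any other value maps to an empty string here.
GENDER_CONCEPTS = {8507: "Male", 8532: "Female"}


def person_to_patient_data(person: Mapping[str, object]) -> Dict[str, object]:
    """Map one OMOP PERSON row to a subset of patient_data columns."""
    dob = date(
        int(person["year_of_birth"]),
        int(person.get("month_of_birth") or 1),  # assumed default when missing
        int(person.get("day_of_birth") or 1),  # assumed default when missing
    )
    return {
        "pid": int(person["person_id"]),
        "DOB": dob.isoformat(),
        "sex": GENDER_CONCEPTS.get(int(person["gender_concept_id"]), ""),
    }


row = {"person_id": "1", "year_of_birth": "1950", "gender_concept_id": "8532"}
print(person_to_patient_data(row))
```

Values are converted with `int()` because rows read from CSV files arrive as strings.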
- Load OMOP Data: Reads from S3 or local filesystem
- Direct Database Write: Writes directly to OpenEMR database tables
- Parallel Processing: Multiple workers process batches concurrently
Note: Warp writes directly to OpenEMR database tables - no CCDA conversion or intermediate formats are used.
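
The load, batch, and parallel-write flow above could look roughly like the following sketch. It is not Warp's code: `batched` and `import_in_parallel` are hypothetical helpers, and `write_batch` stands for whatever function writes one batch to the database and returns the number of rows written.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def import_in_parallel(
    records: Iterable[T],
    batch_size: int,
    workers: int,
    write_batch: Callable[[List[T]], int],
) -> int:
    """Write batches concurrently and return the total number of rows written."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(write_batch, batched(records, batch_size)))


# Stand-in writer: "writing" a batch just reports its size.
print(import_in_parallel(range(250), batch_size=100, workers=4, write_batch=len))
```

In a real import each worker would need its own database connection, which is one reason multi-worker runs require careful connection management.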
Full Dataset Import (synpuf-omop 1k dataset):
- Dataset: 1,000 patients with 160,322 conditions, 49,542 medications, 13,481 observations
- Configuration: Single worker, batch size 100
- Results:
- Patients successfully uploaded: 1,000 (100% success rate)
- Failed: 0
- Total duration: 132.96 seconds (~2.22 minutes)
- Processing rate: ~7.5 patients/second (1,000 patients in 132.96 seconds)
- Total data imported: 224,345 records (1,000 patients + 160,322 conditions + 49,542 medications + 13,481 observations)
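
The benchmark figures can be cross-checked with a few lines of arithmetic. This sketch uses only the numbers reported above. It shows that the ~7.5 per-second figure corresponds to patients, while the rate across all imported rows is much higher.

```python
patients = 1_000
conditions = 160_322
medications = 49_542
observations = 13_481
duration_seconds = 132.96

total_records = patients + conditions + medications + observations

print(f"total records:   {total_records:,}")
print(f"duration:        {duration_seconds / 60:.2f} minutes")
print(f"patients/second: {patients / duration_seconds:.2f}")
print(f"records/second:  {total_records / duration_seconds:.0f}")
```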
Performance Notes:
- Single worker configuration provides stable, reliable imports
- Multi-worker configuration can achieve higher throughput but requires careful database connection management
- Processing time includes data loading from S3, transformation, and database insertion
Standard Configuration (tested benchmark):
resources:
  requests:
    cpu: "2"
    memory: "4Gi"
  limits:
    cpu: "4"
    memory: "8Gi"

- Performance: ~7.5 patients imported per second
- Tested: Successfully imported 1,000 patients in 2.22 minutes
For comprehensive testing of the complete Warp deployment and data import workflow, use the automated end-to-end test script:
# Full end-to-end test (deploys infrastructure, OpenEMR, and imports 1000 records)
cd ../scripts
./test-warp-end-to-end.sh
# Import 500 records instead
./test-warp-end-to-end.sh --max-records 500
# Use existing infrastructure
./test-warp-end-to-end.sh --skip-terraform --skip-openemr

What the End-to-End Test Does:
- Deploys Terraform infrastructure (EKS, RDS, Redis, EFS, etc.)
- Deploys OpenEMR on EKS
- Installs Warp via ConfigMap
- Imports test data using Warp
- Prints OpenEMR login URL and credentials
- Waits a default of 5 minutes while the user verifies successful data import
- Deletes all infrastructure with `destroy.sh`
See scripts/README.md for complete documentation of the end-to-end test script.
Warp is designed to run as a Kubernetes Job with generous resources. There are two deployment approaches:
How it works: Warp uses IRSA (IAM Roles for Service Accounts) to access S3 buckets securely without hardcoded credentials.
- Service Account: The Kubernetes Job uses the `openemr-sa` service account
- IAM Role Binding: The service account is annotated with an AWS IAM role ARN
- Automatic Credential Discovery: When `boto3` runs in the pod, it automatically discovers credentials from:
  - The mounted service account token at `/var/run/secrets/eks.amazonaws.com/serviceaccount/token`
  - AWS SDK automatically uses IRSA credentials (no manual configuration needed)
- IAM Permissions: The IAM role must have S3 read permissions for your dataset buckets
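
As a sketch of what credential-free S3 access looks like from inside the pod, the code below splits an `s3://` data source into bucket and prefix and lists the object keys with `boto3`. No access keys are passed, so `boto3` falls back to its default credential chain, which includes the web identity token that IRSA mounts. `split_s3_uri` and `list_keys` are hypothetical helpers, not part of Warp.

```python
from typing import Iterator, Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/prefix/' into ('bucket', 'prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def list_keys(uri: str) -> Iterator[str]:
    """Yield every object key under the given S3 URI."""
    import boto3  # imported lazily so split_s3_uri works without boto3 installed

    bucket, prefix = split_s3_uri(uri)
    # No explicit keys: the default credential chain supplies IRSA credentials.
    client = boto3.client("s3")
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]


print(split_s3_uri("s3://synpuf-omop/cmsdesynpuf1k/"))
```

Listing requires `s3:ListBucket` on the bucket ARN and reading requires `s3:GetObject` on the object ARNs, which is why both resource forms appear in the IAM policy.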
Required IAM Permissions:
The IAM role (configured in terraform/iam.tf) needs S3 permissions for dataset buckets. For security, these permissions are commented out by default to prevent accidental access.
Enabling S3 Access:
- Edit `terraform/iam.tf` and locate the "S3 permissions for Warp dataset access" section
- Uncomment the Resource array entries and specify your bucket ARNs:
{
  # S3 permissions for Warp dataset access (OMOP/CCDA data sources)
  Effect = "Allow"
  Action = [
    "s3:GetObject",
    "s3:ListBucket"
  ]
  Resource = [
    # Uncomment and specify your bucket ARNs:
    "arn:aws:s3:::synpuf-omop",
    "arn:aws:s3:::synpuf-omop/*"
  ]
}
- Apply Terraform changes:
cd terraform
terraform apply

Security Best Practice: Only grant access to specific buckets that Warp needs. Never use wildcards (*) for bucket names in production environments.
Note: The job does NOT use the cluster's node IAM role. It uses IRSA for pod-level, least-privilege access.
Why: This approach uses an off-the-shelf Python 3.14 image and builds warp inside the pod from a ConfigMap. This eliminates the need to:
- Build and maintain custom Docker images
- Push images to container registries
- Deal with image versioning and updates
- Require public repository access
How it works:
- Warp code is packaged as a tarball and stored in a Kubernetes ConfigMap
- Pod uses `python:3.14-slim` base image
- On startup, the pod extracts the code from ConfigMap and installs warp
- Warp runs with direct database access from within the cluster
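
One practical constraint of shipping code through a ConfigMap is that Kubernetes limits ConfigMap data to 1 MiB. The sketch below builds a gzipped tarball in memory and checks it against that limit before it is uploaded. `build_tarball` and `fits_in_configmap` are hypothetical helpers, and measuring the base64-encoded size is a deliberately conservative assumption about how binary files are stored.

```python
import base64
import io
import tarfile
from pathlib import Path
from typing import Iterable

CONFIGMAP_LIMIT_BYTES = 1024 * 1024  # Kubernetes caps ConfigMap data at 1 MiB


def build_tarball(paths: Iterable[Path]) -> bytes:
    """Pack the given files or directories into an in-memory .tar.gz."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in paths:
            archive.add(str(path), arcname=path.name)
    return buffer.getvalue()


def fits_in_configmap(payload: bytes) -> bool:
    """Check the base64-encoded size, which is about a third larger than raw."""
    return len(base64.b64encode(payload)) < CONFIGMAP_LIMIT_BYTES


print(fits_in_configmap(b"x" * 1000))
```

If the check fails, the pre-built image approach avoids the size limit entirely.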
Setup:
# 1. Package warp code
cd warp
tar czf /tmp/warp-code.tar.gz warp/ setup.py requirements.txt README.md
# 2. Create ConfigMap
kubectl create configmap warp-code \
--from-file=warp-code.tar.gz=/tmp/warp-code.tar.gz \
-n openemr
# 3. Deploy job (see k8s-job-test.yaml)
kubectl apply -f k8s-job-test.yaml

Example Job:
apiVersion: batch/v1
kind: Job
metadata:
  name: warp-ccda-upload-test
  namespace: openemr
spec:
  template:
    spec:
      containers:
        - name: warp
          # Python image version is managed in versions.yaml under applications.python
          # The version automatically tracks the latest Python 3.xx release
          # To use a specific version, replace with: python:3.14-slim
          image: python:3.14-slim
          command: ["/bin/bash"]
          args:
            - -c
            - |
              apt-get update && apt-get install -y gcc && rm -rf /var/lib/apt/lists/*
              # Quote version specifiers so the shell does not treat ">" as a redirect
              pip install --no-cache-dir "pymysql>=1.1.0" "boto3>=1.28.0"
              tar xzf /warp-code/warp-code.tar.gz
              cd warp && pip install --no-cache-dir -e .
              warp ccda_data_upload --db-host "$DB_HOST" ...
          volumeMounts:
            - name: warp-code
              mountPath: /warp-code
              readOnly: true
      # Jobs require restartPolicy to be Never or OnFailure
      restartPolicy: Never
      volumes:
        - name: warp-code
          configMap:
            name: warp-code

Python Version Management:
The Python Docker image version is centrally managed in versions.yaml:
- Location: `applications.python.current` (defaults to `3.14`)
- Auto-detection: When `auto_detect_latest: true`, the version manager automatically checks Docker Hub for the latest Python 3.xx release
- Script: Use `scripts/get-python-image-version.sh` to get the current version programmatically
- Updates: The monthly version check workflow will notify when newer Python 3.xx versions are available
If you prefer a pre-built image, you can build and push a custom Docker image:
apiVersion: batch/v1
kind: Job
metadata:
  name: warp-import
  namespace: openemr
spec:
  template:
    spec:
      containers:
        - name: warp
          image: openemr/warp:latest
          command: ["warp", "ccda_data_upload"]
          args:
            - "--data-source"
            - "s3://synpuf-omop/cmsdesynpuf1k/"
            - "--max-records"
            - "10000"
          resources:
            requests:
              cpu: "4"
              memory: "8Gi"
            limits:
              cpu: "8"
              memory: "16Gi"
      restartPolicy: Never

See k8s-job.yaml and k8s-job-test.yaml for complete examples.
Uploads patient data to OpenEMR from OMOP format datasets using direct database import.
| Option | Description | Default |
|---|---|---|
| `--db-host` | Database host | Auto-discovered |
| `--db-user` | Database username | Auto-discovered |
| `--db-password` | Database password | Auto-discovered |
| `--db-name` | Database name | openemr |
| `--data-source` | Data source (S3 path or local directory) | Required |
| `--batch-size` | Records per batch | Auto-calculated |
| `--workers` | Number of parallel workers (for a single task) | CPU count |
| `--max-records` | Maximum records to process | All records |
| `--start-from` | Start processing from record number | 0 |
| `--dry-run` | Dry run mode (no actual import) | False |
| `--aws-region` | AWS region for S3 access | Auto-detected |
| `--namespace` | Kubernetes namespace | openemr |
| `--terraform-dir` | Terraform directory path | Auto-detected |
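
The interaction between `--start-from` and `--max-records` can be pictured as a slice over the record stream. The sketch below assumes a zero-based offset and that `--max-records` counts records after the offset. Both are assumptions about the semantics, and `select_records` is a hypothetical helper, not Warp's code.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def select_records(
    records: Iterable[T],
    start_from: int = 0,
    max_records: Optional[int] = None,
) -> Iterator[T]:
    """Skip start_from records, then yield at most max_records of the rest."""
    stop = None if max_records is None else start_from + max_records
    return islice(records, start_from, stop)


# Resume after the first 3 records and process the next 4.
print(list(select_records(range(10), start_from=3, max_records=4)))
```

Because `islice` is lazy, records past the requested window are never read from the source.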
# Database Configuration (required)
export DB_HOST="aurora-cluster.region.rds.amazonaws.com"
export DB_USER="openemr"
export DB_PASSWORD="password"
export DB_NAME="openemr"
# AWS Configuration
export AWS_REGION="us-west-2"
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

Problem: Warp cannot find credentials automatically.
Solution:
- Check Kubernetes secrets exist: `kubectl get secret openemr-db-credentials -n openemr`
- Verify Terraform outputs: `terraform output -json`
- Provide database credentials manually using `--db-host`, `--db-user`, `--db-password`
Problem: Cannot connect to OpenEMR database.
Solution:
# Test database connectivity
kubectl exec -n openemr <pod> -- mysql -h <db-host> -u <user> -p<password> -e "SELECT 1"
# Check network connectivity
kubectl exec -n openemr <pod> -- nc -zv <db-host> 3306

Problem: Import is slower than expected.
Solution:
- Experiment with different batch sizes (`--batch-size`)
- Experiment with different worker counts (`--workers`)
- Monitor database CPU/memory usage
- Verify network latency to database
Problem: Some records fail to import.
Solution:
- Enable verbose logging: `warp -v ccda_data_upload ...`
- Check logs for specific error messages
- Verify OMOP data format matches expected schema
- Review OpenEMR database constraints
- Check for duplicate patient IDs
Enable verbose logging for detailed debugging:
warp -v ccda_data_upload \
--data-source s3://bucket/path \
--max-records 10

# Install development dependencies
pip install -r requirements.txt
pip install pytest pytest-cov flake8 black mypy
# Install warp in development mode
pip install -e .

# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ -v --cov=warp --cov-report=html
# Run specific test file
pytest tests/test_omop_to_ccda.py -v

# Linting
flake8 warp/ --max-line-length=127
# Formatting
black warp/ tests/ --line-length 127
# Type checking
mypy warp/ --ignore-missing-imports

The project uses GitHub Actions for CI/CD (integrated into main CI/CD pipeline):
- Pinned versions test: Automatically validates that Python package versions match versions.yaml
- Automated testing with pytest
- Code quality checks (flake8, black, mypy)
- Security scanning (Trivy)
- Coverage reporting
The CI/CD pipeline includes a step (test-warp-pinned-versions.sh) that:
- Reads Python package versions from `versions.yaml`
- Installs exact pinned versions
- Verifies versions match expectations
- Runs all Warp tests with pinned versions
- Ensures consistency between versions.yaml and actual dependencies
This ensures that the versions specified in versions.yaml are always tested and validated before code is merged.
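
The verification step can be approximated in Python with `importlib.metadata`. This is a sketch of the idea rather than the contents of `test-warp-pinned-versions.sh`. It takes the pins as a plain dictionary (parsing `versions.yaml` is left out), and `installed_version` and `find_mismatches` are hypothetical names.

```python
from importlib import metadata
from typing import Callable, Dict, Mapping, Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pinned: Mapping[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs from its pin to the actual version."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, expected in pinned.items():
        actual = lookup(package)
        if actual != expected:
            mismatches[package] = actual
    return mismatches


# Example pins; the real values would come from versions.yaml.
print(find_mismatches({"pymysql": "1.1.0", "boto3": "1.28.0"}))
```

An empty result means every pinned package is installed at exactly the expected version. The `lookup` parameter exists so the check can be tested without installing anything.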
For advanced development topics, architecture details, and contributing guidelines, see DEVELOPER.md.
MIT License

