A self-contained Docker-based environment for exploring curated single-cell datasets using CellXGene. Researchers can browse available datasets through a web interface and launch CellXGene for interactive visualization.
- Dataset Catalog: Browse curated single-cell datasets with metadata
- One-Click Launch: Launch CellXGene viewer for any dataset with smart status polling
- Progress Tracking: Real-time loading progress bar with estimated completion times
- Smart Estimates: File size-based loading time predictions
- Status Indicators: Visual badges showing running/stopped container status
- Manual Control: Stop button to close containers on-demand
- Admin Panel: Monitor and manage all active containers with memory estimates
- Auto-Retry: Automatic retry mechanism for failed launches (OOM/timeout)
- Better Errors: Context-aware error messages with actionable recovery hints
- Docker-Based: Fully containerized for easy deployment
- Volume-Mounted Storage: Add datasets without rebuilding containers
- Extensible: Add additional services via Docker Compose
- High Concurrency: Dynamic container spawning supports multiple concurrent users
- Auto-Cleanup: Containers automatically close after 48 hours of inactivity
- Earlham Institute Branding: Custom styling with institutional brand colors
- Memory Management: 4GB per-container limits prevent OOM crashes
- Docker 20.10+ and Docker Compose 2.0+
- 48GB+ available RAM (recommended: 16 cores, 48GB RAM for 10-worker configuration)
- Linux host (Ubuntu 20.04+, CentOS 8+) or macOS with Docker Desktop
- Clone the repository:
git clone <repository-url>
cd cellxgene_stack
- Copy the environment template:
cp .env.example .env
- Add your datasets to the data directory:
mkdir -p data/datasets data/logs
# Copy your .h5ad files to data/datasets/
# Metadata is read directly from the h5ad files
- Start the services:
docker-compose up -d
- Access the landing page at http://localhost (or your configured port)
+-------------------------------------------------------------+
|                            Nginx                            |
|          (Reverse Proxy, Routing & Error Handling)          |
+---------+------------------+----------------+---------------+
          |                  |                |
    +-----v------+   +-------v--------+   +---v------------+
    |  Landing   |   |     Static     |   |    Dynamic     |
    |    Page    |   |   CellXGene    |   |   CellXGene    |
    |  (Flask +  |   |    Service     |   |   Containers   |
    |  APSched)  |   |   (Optional)   |   |  (On-demand)   |
    |            |   |                |   |                |
    | - Catalog  |   | - Port 5005    |   | - Ports 5006+  |
    | - API      |   +----------------+   | - Per dataset  |
    | - Container|                        | - Auto-cleanup |
    |   Manager  |                        |   48h timeout  |
    +-----+------+                        +----+-----------+
          |                                    |
          |        +---------------------------+
          |        |
    +-----v--------v-----+
    |   Docker Socket    |
    |  (Container Mgmt)  |
    +---------+----------+
              |
    +---------v----------+
    |    Volume Mount    |
    |   data/datasets/   |
    |   - *.h5ad files   |
    +--------------------+
Nginx: Reverse proxy with intelligent routing and error handling
- / → Landing page web interface
- /api/ → Landing page REST API
- /cellxgene-{dataset_id}/ → Dynamic per-dataset containers
- Custom error pages for closed containers
Landing Page Service: Python Flask application with container orchestration
- Scans data directory for h5ad files using memory-mapped reading
- Extracts embedded metadata from each file
- Provides REST API for dataset catalog and container status
- Manages dynamic CellXGene container lifecycle
- Background scheduler for automatic cleanup (48-hour inactivity)
- Status polling endpoint for smooth container startup
- Admin panel for monitoring active containers
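The 48-hour inactivity cleanup can be pictured as a periodic job that compares each container's last access time with the current time. The sketch below is only an illustration of that check: the `find_inactive` helper and the shape of the `last_accessed` mapping are assumptions, not the service's actual code.

```python
from datetime import datetime, timedelta

# The 48-hour window comes from this README; the real setting name is not known.
INACTIVITY_TIMEOUT = timedelta(hours=48)


def find_inactive(last_accessed, now, timeout=INACTIVITY_TIMEOUT):
    """Return the dataset IDs whose last access is at least `timeout` old.

    `last_accessed` maps a dataset ID to the datetime of its last access
    (a hypothetical structure used for illustration only).
    """
    return sorted(
        dataset_id
        for dataset_id, accessed_at in last_accessed.items()
        if now - accessed_at >= timeout
    )


now = datetime(2024, 1, 10, 12, 0)
last_accessed = {
    "pbmc3k": now - timedelta(hours=50),     # idle for longer than 48 hours
    "lung_atlas": now - timedelta(hours=2),  # used recently
}
print(find_inactive(last_accessed, now))  # ['pbmc3k']
```

A scheduler would run a check like this at a fixed interval and stop every container it returns; the keepalive endpoint would refresh the timestamp so that active sessions are never selected.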
Dynamic CellXGene Containers: On-demand instances
- Spawned automatically when dataset is launched
- Each dataset gets isolated container on unique port (5006-5100)
- 4GB memory limit per container (configurable via CELLXGENE_MEMORY_PER_WORKER_GB)
- Production spec: 10 workers on 16-core, 48GB RAM VM
- Automatic cleanup after 48 hours of inactivity
- Health checks confirm a container is ready before users are directed to it
- 180-second startup timeout for large files (4.5GB+)
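One way to give each dataset a unique port from the 5006-5100 range is to pick the lowest port that no running container holds. This is a minimal sketch under that assumption; `next_free_port` is a hypothetical helper, and the actual container manager may allocate ports differently.

```python
# Port range taken from the list above (5006-5100, inclusive).
PORT_RANGE = range(5006, 5101)


def next_free_port(ports_in_use, port_range=PORT_RANGE):
    """Return the lowest port in `port_range` that is not in `ports_in_use`.

    Raises RuntimeError when every port in the range is already taken.
    """
    used = set(ports_in_use)
    for port in port_range:
        if port not in used:
            return port
    raise RuntimeError("no free port left in the configured range")


print(next_free_port([]))            # 5006
print(next_free_port([5006, 5007]))  # 5008
print(len(PORT_RANGE))               # 95
```

The range holds 95 ports, so with a 4GB limit per container the host's memory, not the port range, is likely to be the first limit reached.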
- Dataset Cards: Grid view with metadata (organism, tissue, assay, cell/gene counts)
- Search & Filter: Find datasets by name, organism, tissue, or assay
- Sort Options: By name, cell count, or file size
- Launch Button: One-click launch with progress bar
- Stop Button: Appears after launch to manually close containers
- Status Badge: Green "Running" indicator for active containers
- Loading Progress: Real-time progress bar with estimated completion time
- Estimated Times: File size-based predictions (< 100MB: ~30s, > 3GB: ~3 mins)
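The file size-based estimate can be sketched as a small function. Only the two end points (under 100MB: about 30 seconds, over 3GB: about 3 minutes) come from this README; the linear interpolation between them is an assumption made for illustration, and `estimate_load_seconds` is a hypothetical name.

```python
MB = 1024 ** 2
GB = 1024 ** 3

SMALL_FILE = 100 * MB  # below this size: about 30 seconds
LARGE_FILE = 3 * GB    # above this size: about 3 minutes


def estimate_load_seconds(size_bytes):
    """Estimate the loading time in seconds from the file size in bytes."""
    if size_bytes < SMALL_FILE:
        return 30
    if size_bytes > LARGE_FILE:
        return 180
    # Assumption: interpolate linearly between the two documented end points.
    fraction = (size_bytes - SMALL_FILE) / (LARGE_FILE - SMALL_FILE)
    return round(30 + fraction * (180 - 30))


print(estimate_load_seconds(50 * MB))  # 30
print(estimate_load_seconds(4 * GB))   # 180
```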
Access at /admin to:
- View all active containers with status
- See dataset names, ports, file sizes
- Monitor last accessed time and inactive duration
- Estimate total memory usage
- Stop individual containers
- Auto-refreshes every 30 seconds
- GET /api/datasets - List all datasets
- GET /api/datasets/{id} - Get dataset details
- POST /api/datasets/{id}/launch - Launch container (returns URL and timeout info)
- GET /api/datasets/{id}/status - Check container status (ready/starting)
- POST /api/datasets/{id}/keepalive - Update access time
- POST /api/datasets/{id}/stop - Stop running container
- GET /api/admin/containers - List active containers (admin)
- GET /api/health - Health check
- GET /api/statistics - Get catalog statistics
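A client could combine the launch and status endpoints as in the sketch below. The endpoint paths are the ones listed above, but the response format is an assumption: the code expects the status endpoint to return a JSON object with a `"status"` key set to `"ready"` or `"starting"`. Check the API documentation for the actual schema. The `http_json` and `launch_and_wait` helpers are hypothetical.

```python
import json
import time
import urllib.request


def http_json(method, url):
    """Send a request without a body and decode the JSON response."""
    request = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def launch_and_wait(base_url, dataset_id, fetch=http_json,
                    timeout=180, interval=2, sleep=time.sleep):
    """Launch a dataset container, then poll its status until it is ready.

    Returns True once the status is "ready", or False if `timeout`
    seconds of polling pass first. The "status" key is an assumed
    field name, not a documented one.
    """
    fetch("POST", f"{base_url}/api/datasets/{dataset_id}/launch")
    waited = 0
    while waited <= timeout:
        status = fetch("GET", f"{base_url}/api/datasets/{dataset_id}/status")
        if status.get("status") == "ready":
            return True
        sleep(interval)
        waited += interval
    return False


# Example call against a running stack (not executed here):
# launch_and_wait("http://localhost", "pbmc3k")
```

The `fetch` and `sleep` parameters exist so the polling logic can be exercised without a running server; the 180-second default mirrors the startup timeout mentioned earlier.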
Edit .env to customize:
- Ports: Change NGINX_PORT, LANDING_PAGE_PORT, CELLXGENE_PORT
- Workers: Adjust CELLXGENE_WORKERS (default: 10 for production)
- Memory: Modify CELLXGENE_MEMORY_PER_WORKER_GB (default: 4GB per worker)
- Host Paths: Set HOST_DATA_DIRECTORY and HOST_LOG_DIRECTORY to absolute paths on your host machine
- Container Paths: Set DATA_DIRECTORY, LOG_DIRECTORY (internal container paths)
Production Configuration (16 cores, 48GB RAM):
CELLXGENE_WORKERS=10
CELLXGENE_MEMORY_PER_WORKER_GB=4
Development Configuration (8GB RAM):
CELLXGENE_WORKERS=2
CELLXGENE_MEMORY_PER_WORKER_GB=2
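To see how much headroom each configuration leaves, multiply the worker count by the per-worker memory limit. The sketch below does this for both configurations. It assumes that total container memory is simply workers × memory per worker, which the README implies but does not state outright; `parse_env` and `container_memory_gb` are hypothetical helpers.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def container_memory_gb(env):
    """Upper bound on the memory used by CellXGene containers, in GB."""
    workers = int(env["CELLXGENE_WORKERS"])
    per_worker = int(env["CELLXGENE_MEMORY_PER_WORKER_GB"])
    return workers * per_worker


production = parse_env("CELLXGENE_WORKERS=10\nCELLXGENE_MEMORY_PER_WORKER_GB=4\n")
development = parse_env("CELLXGENE_WORKERS=2\nCELLXGENE_MEMORY_PER_WORKER_GB=2\n")

print(container_memory_gb(production))   # 40 (of 48 GB on the production host)
print(container_memory_gb(development))  # 4 (of 8 GB on the development host)
```

Under this assumption the production settings leave about 8GB, and the development settings about 4GB, for the operating system, Nginx, and the landing page service.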
Important: Before deploying, copy .env.example to .env and update HOST_DATA_DIRECTORY and HOST_LOG_DIRECTORY with your actual paths.
- Place your .h5ad file in data/datasets/
- Ensure your h5ad file has metadata embedded in the .uns attribute
- Restart the services:
docker-compose restart landing-page
The system will automatically extract metadata from the h5ad file's .uns attribute.
Store the required metadata fields under adata.uns['metadata']:
- name: Dataset name
- description: Dataset description
- organism: Organism name
- tissue: Tissue type
- assay: Assay technology
Cell and gene counts are automatically extracted from the data dimensions.
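Before copying a file into data/datasets/, it can help to check that all five fields are present. The sketch below works on any dict-like object with the same layout as `adata.uns`; the `missing_metadata_fields` helper is hypothetical and is not part of the landing page service.

```python
# Field names taken from the list above.
REQUIRED_FIELDS = ("name", "description", "organism", "tissue", "assay")


def missing_metadata_fields(uns):
    """Return the required fields that are absent or empty in uns['metadata']."""
    metadata = uns.get("metadata") or {}
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


complete = {"metadata": {
    "name": "PBMC 3k Dataset",
    "description": "3k PBMCs from a Healthy Donor",
    "organism": "Homo sapiens",
    "tissue": "peripheral blood",
    "assay": "10x 3' v2",
}}
partial = {"metadata": {"name": "PBMC 3k Dataset", "organism": "Homo sapiens"}}

print(missing_metadata_fields(complete))  # []
print(missing_metadata_fields(partial))   # ['description', 'tissue', 'assay']
```

With a loaded AnnData object, the same check would be `missing_metadata_fields(adata.uns)`.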
Example of adding metadata to an h5ad file:
import anndata

# Load the existing dataset
adata = anndata.read_h5ad("your_dataset.h5ad")

# Attach the catalog metadata under the 'metadata' key of .uns
adata.uns['metadata'] = {
    "name": "PBMC 3k Dataset",
    "description": "3k PBMCs from a Healthy Donor",
    "organism": "Homo sapiens",
    "tissue": "peripheral blood",
    "assay": "10x 3' v2"
}

# Write the file back so the landing page can read the metadata
adata.write_h5ad("your_dataset.h5ad")

All documentation is located in the docs/ directory:
- Architecture - System design, components, and data flow
- Deployment - Remote VM deployment guide (Ubuntu/CentOS)
- API Documentation - REST API reference
- Adding Datasets - How to add new datasets
- Troubleshooting - Common issues and solutions
- Dynamic Containers - Container management details
- Inactivity Timeout - Auto-cleanup system
- Earlham Styling - Branding and design guidelines
- CI Fixes - CI/CD configuration notes
Run tests with:
# Unit tests
pytest services/landing-page/tests/unit/
# Integration tests
pytest services/landing-page/tests/integration/
# End-to-end tests
pytest tests/e2e/
Out of Memory (OOM) Errors
- Symptom: Containers exit with code 137 or crash after ~20 seconds
- Cause: Dataset too large for available RAM
- Solutions:
  - Close other containers via Admin Panel (/admin)
  - Increase Docker memory limit in Docker Desktop settings
  - Reduce number of concurrent containers
  - Increase per-container memory limit in container_manager.py
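Exit code 137 follows the common convention of reporting 128 plus the signal number when a process is killed by a signal: 137 - 128 = 9, which is SIGKILL, the signal the kernel sends when a process exceeds its memory limit. The sketch below decodes such codes with Python's standard `signal` module; `signal_from_exit_code` is a hypothetical helper for illustration.

```python
import signal


def signal_from_exit_code(exit_code):
    """Return the signal name behind a 128+N exit code, or None."""
    if exit_code <= 128:
        return None  # normal exit status, not a signal
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        return None  # no signal with that number on this platform


print(signal_from_exit_code(137))  # SIGKILL on Linux and macOS
print(signal_from_exit_code(1))    # None
```

SIGKILL alone does not prove an out-of-memory kill, since a manual `docker kill` produces the same code, so it is worth confirming against the container's memory limit and the host's logs.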
Slow Loading / Timeouts
- Large files (>4GB) may take 2-3 minutes to load
- The system will retry automatically (up to 2 retries)
- Progress bar shows estimated time
- Check Admin Panel to see if containers are stuck
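The retry behaviour described above (one initial attempt plus up to 2 retries) can be sketched as a small wrapper. This illustrates the pattern only: `launch_with_retries` is a hypothetical name, and the real service may wait between attempts or retry only on specific errors.

```python
def launch_with_retries(launch, max_retries=2):
    """Call `launch()` once, then retry up to `max_retries` times on failure.

    Returns the first successful result. Raises RuntimeError, chained to
    the last underlying error, if every attempt fails.
    """
    last_error = None
    for _attempt in range(1 + max_retries):
        try:
            return launch()
        except Exception as error:  # a real manager would catch narrower types
            last_error = error
    raise RuntimeError(
        f"launch failed after {1 + max_retries} attempts"
    ) from last_error


attempts = []


def flaky_launch():
    """Fail twice with a timeout, then succeed on the third attempt."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("container still loading")
    return "/cellxgene-pbmc3k/"


print(launch_with_retries(flaky_launch))  # /cellxgene-pbmc3k/
print(len(attempts))                      # 3
```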
Container Not Starting
- Check docker logs cellxgene-landing-page for errors
- Verify dataset file exists and is valid h5ad format
- Ensure Docker has sufficient resources
- Check if port range (5006-5100) is available
Can't Stop Container
- Container may have already stopped automatically
- Check Admin Panel for current status
- Use docker ps | grep cellxgene to verify
- Restart landing-page service if manager state is inconsistent
See docs/troubleshooting.md for more detailed solutions.
This project follows the constitution defined in .specify/memory/constitution.md:
- Unit Testing: 80%+ test coverage with pytest
- Modular Architecture: Containerized services with clear boundaries
- Code Clarity: Comprehensive documentation and comments
- Fail-Fast: Startup validation with explicit error messages
- Documentation: README, API docs, deployment guides, troubleshooting
- Accessibility: Designed for users with varying technical expertise
[Specify your license here]
For issues or questions, please open an issue or consult the documentation in the docs/ directory.