Darwin is an enterprise-grade, end-to-end machine learning platform. This repository handles deployment orchestration - building Docker images and deploying the entire platform to Kubernetes using Helm charts.
1. ./init.sh # Interactive use case selection (creates .setup/enabled-services.yaml)
2. ./setup.sh # Build images, create Kind cluster, push to local registry
3. ./start.sh # Deploy to Kubernetes via Helm

init.sh Modes:
| Mode | Command | Description |
|---|---|---|
| Default | `./init.sh` | Simplified preset selection (Training / Inference) |
| Dev Mode | `./init.sh --dev-mode` | Granular service-by-service selection |
| All | `./init.sh --all` | Enable all services without prompts |
Presets (Default Mode):
| Preset | Features Enabled | Use Case |
|---|---|---|
| Training | Compute + MLflow | Model training, experiments, distributed compute |
| Inference | Serve + MLflow | Model deployment, real-time predictions |
Other Flags:
`./setup.sh -y` - Skip prompts (auto-answer yes)

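For an unattended first run, the three scripts and the flags listed above can be chained. The sketch below is a minimal wrapper, assuming the scripts live in the directory passed as the first argument and that each exits non-zero on failure. The function name `darwin_bootstrap` is hypothetical and not part of the repository.

```bash
# Hypothetical helper: run the three setup stages in order and stop at the
# first failure. Assumes init.sh, setup.sh and start.sh sit in $1 (default: .).
darwin_bootstrap() {
    local repo_dir="${1:-.}"
    (
        cd "$repo_dir" || exit 1
        ./init.sh --all &&   # enable all services without prompts
        ./setup.sh -y &&     # build images and create the cluster, skipping prompts
        ./start.sh           # deploy to Kubernetes via Helm
    )
}
```

Calling `darwin_bootstrap` from the repository root runs all three stages. Replacing `--all` with an interactive `./init.sh` run beforehand is the safer choice when only one preset is needed.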
| Feature | Applications | Description |
|---|---|---|
| Compute | darwin-compute, darwin-cluster-manager | Ray cluster management & K8s orchestration |
| Workspace | darwin-workspace | Project & Jupyter environment management |
| Feature Store | darwin-ofs-v2, darwin-ofs-v2-admin, darwin-ofs-v2-consumer | Online feature serving (<10ms latency) |
| MLflow | darwin-mlflow, darwin-mlflow-app | Experiment tracking & model registry |
| Serve | ml-serve-app, artifact-builder | Model deployment & Docker image building |
| Catalog | darwin-catalog | Data asset discovery & lineage |
| Chronos | chronos, chronos-consumer | Event processing & metadata tracking |
| Workflow | darwin-workflow | ML pipeline orchestration (Airflow-based) |
| Datastore | Usage |
|---|---|
| MySQL | Metadata storage for all services |
| Cassandra | Feature Store values (high-throughput) |
| OpenSearch | Chronos events, Compute metadata |
| Kafka + Zookeeper | Event streaming, feature materialization |
| LocalStack | S3 emulation for artifacts |
| Airflow | Workflow DAG execution |
| Elasticsearch | Workflow search (alternative to OpenSearch) |
- KubeRay Operator (v1.1.0) - Ray cluster lifecycle management
- Nginx - Ingress controller
- Grafana - Monitoring dashboards
| File | Purpose |
|---|---|
| `init.sh` | Interactive service selection wizard (run first) |
| `setup.sh` | Creates cluster & builds all images |
| `start.sh` | Deploys platform via Helm with config overrides |
| `services.yaml` | Application registry - defines available services, datastores, operators |
| `service-dependencies.yaml` | Service-to-service and service-to-datastore dependencies |
| `.setup/config.env` | Runtime configuration (generated) |
| `.setup/enabled-services.yaml` | User-selected services config (generated by init.sh) |
| `helm/darwin/` | Main Helm umbrella chart |
| `kind/` | Local Kubernetes cluster config |
| `deployer/` | Base images (Python, Java, Go) and build scripts |

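One way to confirm that a checkout matches the file table above is to test for each listed path. The following is a minimal sketch, not a script shipped with the repository. The function name `darwin_check_layout` is hypothetical, and the generated `.setup/` files are deliberately skipped because they only appear after the setup scripts have run.

```bash
# Hypothetical helper: report which of the documented top-level paths are
# missing from a checkout. Returns non-zero if anything is missing.
darwin_check_layout() {
    local repo_dir="${1:-.}"
    local missing=0
    local path
    for path in init.sh setup.sh start.sh services.yaml \
                service-dependencies.yaml helm/darwin kind deployer; do
        if [ ! -e "$repo_dir/$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `darwin_check_layout .` from the repository root prints nothing when every path is present.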
- Run `init.sh` first - This creates `.setup/enabled-services.yaml` with the user's service selections
- Load prompts on-demand - Only read prompts relevant to the current task
- Check `.setup/enabled-services.yaml` - This is the source of truth for which services are enabled
- Check `services.yaml` - Defines available applications, datastores, and operators
- Check `service-dependencies.yaml` - Understand service dependencies before enabling/disabling
- Check `.setup/config.env` - Contains current KUBECONFIG and DOCKER_REGISTRY
- Respect `.odin/` conventions - Each submodule must have build.sh, setup.sh, start.sh

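Two of the checks above lend themselves to a small preflight script: loading `.setup/config.env` and confirming that a submodule ships its three scripts. This is a rough sketch under stated assumptions. Both function names are hypothetical, and the config file is assumed to be sourceable shell (the troubleshooting commands in this document source it the same way).

```bash
# Hypothetical helper: load the generated config and confirm it defines the
# two variables named in the guidelines.
darwin_load_config() {
    local config_file="${1:-.setup/config.env}"
    if [ ! -f "$config_file" ]; then
        echo "not found: $config_file" >&2
        return 1
    fi
    . "$config_file"
    if [ -z "${KUBECONFIG:-}" ] || [ -z "${DOCKER_REGISTRY:-}" ]; then
        echo "KUBECONFIG or DOCKER_REGISTRY is not set in $config_file" >&2
        return 1
    fi
}

# Hypothetical helper: check that a submodule directory contains the three
# scripts the conventions require.
darwin_check_submodule() {
    local module_dir="$1"
    local status=0
    local script
    for script in build.sh setup.sh start.sh; do
        if [ ! -f "$module_dir/$script" ]; then
            echo "$module_dir: missing $script"
            status=1
        fi
    done
    return "$status"
}
```

Both functions return non-zero on failure, so they can gate later steps, for example `darwin_load_config && ./start.sh`.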
kubectl get pods -n darwin # Darwin services
kubectl get pods -n ray # Ray clusters
kubectl get pods -n serve # Model serving pods

# Build and push image
sh deployer/scripts/image-builder.sh -a <app-name> -t <base-path> -p <path> -e <base-image> -r $DOCKER_REGISTRY
# Restart deployment
kubectl rollout restart deployment/<service-name> -n darwin

- Compute: `http://localhost/compute/*`
- Feature Store: `http://localhost/feature-store/*`
- MLflow UI: `http://localhost/mlflow-app/*`
- Chronos: `http://localhost/chronos/*`
- Catalog: `http://localhost/darwin-catalog/*`
- Workspace: `http://localhost/workspace/*`
- Workflow: `http://localhost/workflow/*`

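To see which of these routes respond after a deployment, each prefix can be probed with `curl`. The sketch below is illustrative only. The function name `darwin_probe_routes` is hypothetical, and requesting the bare prefix (for example `/compute/`) is an assumption, since the list gives wildcard prefixes rather than concrete endpoints. A 404 may therefore mean only that the prefix root has no handler.

```bash
# Hypothetical helper: print the HTTP status returned for each ingress prefix.
# CURL can be overridden (for example in tests); the base URL defaults to
# http://localhost as in the list above.
darwin_probe_routes() {
    local base_url="${1:-http://localhost}"
    local prefix code
    for prefix in compute feature-store mlflow-app chronos \
                  darwin-catalog workspace workflow; do
        # -w '%{http_code}' prints only the status; curl reports 000 when
        # the connection itself fails.
        code=$("${CURL:-curl}" -s -o /dev/null -w '%{http_code}' "$base_url/$prefix/") || true
        echo "$prefix $code"
    done
}
```

The output is one `prefix status` pair per line, which is easy to scan or to filter with `grep -v ' 200$'`.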
Adding a new service:
- Add entry to `services.yaml` under `applications:`
- Add dependencies to `service-dependencies.yaml`
- Create Helm subchart in `helm/darwin/charts/services/`
- Update `init.sh` if it's a new feature group
- Update `start.sh` with helm path mapping in `get_helm_path()`
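
The file-level steps for a new service can be checked mechanically before deploying. The sketch below is a rough aid rather than repository tooling. The function name `darwin_check_new_service` is hypothetical, the chart directory is assumed to be named after the service, and the check only searches for the service name, so it cannot confirm that an entry sits under the right YAML key.

```bash
# Hypothetical helper: verify that a new service is mentioned in the files
# the checklist touches. Makes no assumption about the YAML schema.
darwin_check_new_service() {
    local name="$1"
    local repo_dir="${2:-.}"
    local status=0
    local file
    for file in services.yaml service-dependencies.yaml start.sh; do
        if ! grep -qF -- "$name" "$repo_dir/$file" 2>/dev/null; then
            echo "$file: no mention of $name"
            status=1
        fi
    done
    # Assumption: the subchart directory carries the service name.
    if [ ! -d "$repo_dir/helm/darwin/charts/services/$name" ]; then
        echo "helm/darwin/charts/services/$name: chart directory not found"
        status=1
    fi
    return "$status"
}
```

For example, `darwin_check_new_service my-service` prints one line per missing piece and returns non-zero until all of them are in place.
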

Adding a new datastore:
- Add entry to `services.yaml` under `datastores:`
- Create templates in `helm/darwin/charts/datastores/templates/`
- Update `service-dependencies.yaml` for services that need it

darwin-compute → darwin-cluster-manager
darwin-workspace → darwin-compute
darwin-workflow → darwin-compute, darwin-cluster-manager
ml-serve-app → artifact-builder, darwin-cluster-manager, darwin-mlflow-app
darwin-mlflow-app → darwin-mlflow
darwin-ofs-v2 → darwin-ofs-v2-admin
darwin-ofs-v2-consumer → darwin-ofs-v2-admin
chronos-consumer → chronos
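
Reading each line above as "service → services it depends on" (an interpretation, not something the graph states), the full set of direct and indirect dependencies of a service can be computed with a breadth-first walk. The sketch below hard-codes the edges transcribed from the graph. The function name `darwin_deps_of` is hypothetical, and in practice `service-dependencies.yaml` remains the authoritative source.

```bash
# Hypothetical helper: list every direct and indirect dependency of a service.
# Edges are written as "service: dep1 dep2 ..." and copied from the graph above.
darwin_deps_of() {
    awk -v start="$1" '
        # Load each "service: deps" line into a lookup table.
        {
            svc = $1; sub(/:$/, "", svc); deps[svc] = ""
            for (i = 2; i <= NF; i++) deps[svc] = deps[svc] " " $i
        }
        END {
            # Breadth-first walk with a work queue; each service is visited once.
            queue[1] = start; head = 1; tail = 1; seen[start] = 1
            while (head <= tail) {
                n = split(deps[queue[head++]], d, " ")
                for (i = 1; i <= n; i++)
                    if (!(d[i] in seen)) {
                        seen[d[i]] = 1
                        queue[++tail] = d[i]
                        print d[i]
                    }
            }
        }
    ' <<'EOF' | LC_ALL=C sort
darwin-compute: darwin-cluster-manager
darwin-workspace: darwin-compute
darwin-workflow: darwin-compute darwin-cluster-manager
ml-serve-app: artifact-builder darwin-cluster-manager darwin-mlflow-app
darwin-mlflow-app: darwin-mlflow
darwin-ofs-v2: darwin-ofs-v2-admin
darwin-ofs-v2-consumer: darwin-ofs-v2-admin
chronos-consumer: chronos
EOF
}
```

With these edges, `darwin_deps_of ml-serve-app` prints `artifact-builder`, `darwin-cluster-manager`, `darwin-mlflow` and `darwin-mlflow-app`, one per line. The third entry is pulled in indirectly through `darwin-mlflow-app`.
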
Unified command-line interface for all Darwin services:
source .venv/bin/activate
darwin config set --env darwin-local
darwin serve configure
darwin serve create --name my-model --type api --space serve
darwin serve deploy-model --serve-name my-model --model-uri mlflow-artifacts:/...

📖 Full documentation: darwin-cli/README.md

| Image | Ray Version | Python | Spark |
|---|---|---|---|
| `ray:2.37.0` | 2.37.0 | 3.10 | - |
| `ray:2.53.0` | 2.53.0 | 3.10 | - |
| `ray:2.37.0-darwin-sdk` | 2.37.0 | 3.10 | 3.5.0 |

Darwin SDK Runtime includes Spark integration for distributed data processing.
| Document | Location |
|---|---|
| Main README | README.md |
| Darwin CLI | darwin-cli/README.md |
| Helm Umbrella Chart | helm/darwin/UMBRELLA_CHART.md |
| Deployment Order | helm/darwin/DEPLOYMENT_ORDER.md |
| Feature Store Architecture | feature-store/ARCHITECTURE.md |
Cluster not reachable:
source .setup/config.env
kubectl cluster-info

Service not starting:
kubectl describe pod <pod-name> -n darwin
kubectl logs <pod-name> -n darwin

Helm deployment failed:
helm status darwin -n darwin
helm history darwin -n darwin

LocalStack S3 issues:
kubectl port-forward svc/darwin-localstack -n darwin 4566:4566
AWS_ACCESS_KEY_ID=test AWS_SECRET_ACCESS_KEY=test aws s3 ls --endpoint-url=http://localhost:4566

| Task | Steps |
|---|---|
| First-time setup (simple) | Run ./init.sh → Select Training/Inference → ./setup.sh → ./start.sh |
| First-time setup (advanced) | Run ./init.sh --dev-mode → Select individual services → ./setup.sh → ./start.sh |
| Add new microservice | Edit services.yaml → Add Helm chart → Update init.sh/start.sh |
| Enable/disable service | Edit .setup/enabled-services.yaml → Run ./start.sh |
| Rebuild images | Run ./setup.sh -y |
| Debug pod | kubectl logs/describe → Check service dependencies |
| Deploy model | Use Darwin CLI (see darwin-cli/README.md#serve-commands) |
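
The "Debug pod" row combines several of the troubleshooting commands. The sketch below bundles them into one function as a convenience. The function name `darwin_diagnose` and the `--tail=50` limit are illustrative choices, not repository conventions. The namespace and release name default to `darwin`, as in the commands earlier in this document.

```bash
# Hypothetical helper: gather first-look diagnostics for one pod.
# KUBECTL and HELM can be overridden, for example to point at a wrapper.
darwin_diagnose() {
    local pod="$1"
    local ns="${2:-darwin}"
    echo "== describe $pod =="
    "${KUBECTL:-kubectl}" describe pod "$pod" -n "$ns"
    echo "== logs $pod (last 50 lines) =="
    "${KUBECTL:-kubectl}" logs "$pod" -n "$ns" --tail=50
    echo "== helm status =="
    "${HELM:-helm}" status darwin -n darwin
}
```

A typical call is `darwin_diagnose <pod-name>`. Pass a second argument such as `ray` or `serve` to inspect pods in the other namespaces.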