Darwin ML Platform - Agent Entry Point

Darwin is an enterprise-grade, end-to-end machine learning platform. This repository handles deployment orchestration - building Docker images and deploying the entire platform to Kubernetes using Helm charts.

🚀 Setup Workflow

1. ./init.sh      # Interactive use case selection (creates .setup/enabled-services.yaml)
2. ./setup.sh     # Build images, create Kind cluster, push to local registry
3. ./start.sh     # Deploy to Kubernetes via Helm

init.sh Modes:

Mode	Command	Description
Default	`./init.sh`	Simplified preset selection (Training / Inference)
Dev Mode	`./init.sh --dev-mode`	Granular service-by-service selection
All	`./init.sh --all`	Enable all services without prompts

Presets (Default Mode):

Preset	Features Enabled	Use Case
Training	Compute + MLflow	Model training, experiments, distributed compute
Inference	Serve + MLflow	Model deployment, real-time predictions

Other Flags:

./setup.sh -y - Skip prompts (auto-answer yes)
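
The three scripts can be chained for an unattended run. The following is a minimal sketch, not part of the repository: `darwin_bootstrap` is a hypothetical helper name, and it assumes `./start.sh` needs no interactive input, which this document does not confirm.

```shell
# Hypothetical helper: run the three setup steps in order, stopping at the
# first failure. Pass the repository root as the only argument.
darwin_bootstrap() (
  cd "${1:-.}" || exit 1
  ./init.sh --all &&   # enable all services without prompts
  ./setup.sh -y &&     # build images and create the Kind cluster, skipping prompts
  ./start.sh           # deploy via Helm (assumed to be non-interactive)
)

# Usage, from the repository root:
#   darwin_bootstrap .
```

The body runs in a subshell, so the `cd` does not change the caller's working directory.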

🏗️ Project Architecture

Platform Components

Feature	Applications	Description
Compute	darwin-compute, darwin-cluster-manager	Ray cluster management & K8s orchestration
Workspace	darwin-workspace	Project & Jupyter environment management
Feature Store	darwin-ofs-v2, darwin-ofs-v2-admin, darwin-ofs-v2-consumer	Online feature serving (<10ms latency)
MLflow	darwin-mlflow, darwin-mlflow-app	Experiment tracking & model registry
Serve	ml-serve-app, artifact-builder	Model deployment & Docker image building
Catalog	darwin-catalog	Data asset discovery & lineage
Chronos	chronos, chronos-consumer	Event processing & metadata tracking
Workflow	darwin-workflow	ML pipeline orchestration (Airflow-based)

Datastores

Datastore	Usage
MySQL	Metadata storage for all services
Cassandra	Feature Store values (high-throughput)
OpenSearch	Chronos events, Compute metadata
Kafka + Zookeeper	Event streaming, feature materialization
LocalStack	S3 emulation for artifacts
Airflow	Workflow DAG execution
Elasticsearch	Workflow search (alternative to OpenSearch)

Infrastructure Operators

KubeRay Operator (v1.1.0) - Ray cluster lifecycle management
Nginx - Ingress controller
Grafana - Monitoring dashboards

📁 Key Files Reference

File	Purpose
`init.sh`	Interactive service selection wizard (run first)
`setup.sh`	Creates cluster & builds all images
`start.sh`	Deploys platform via Helm with config overrides
`services.yaml`	Application registry - defines available services, datastores, operators
`service-dependencies.yaml`	Service-to-service and service-to-datastore dependencies
`.setup/config.env`	Runtime configuration (generated)
`.setup/enabled-services.yaml`	User-selected services config (generated by init.sh)
`helm/darwin/`	Main Helm umbrella chart
`kind/`	Local Kubernetes cluster config
`deployer/`	Base images (Python, Java, Go) and build scripts

🔧 Agent Instructions

Run init.sh first - This creates .setup/enabled-services.yaml with the user's service selections
Load prompts on-demand - Only read prompts relevant to the current task
Check .setup/enabled-services.yaml - This is the source of truth for which services are enabled
Check services.yaml - Defines available applications, datastores, and operators
Check service-dependencies.yaml - Understand service dependencies before enabling/disabling
Check .setup/config.env - Contains current KUBECONFIG and DOCKER_REGISTRY
Respect .odin/ conventions - Each submodule must have build.sh, setup.sh, start.sh
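
The file checks above can be run as a quick preflight before any other work. This is a sketch under assumptions: `darwin_preflight` is a hypothetical name, and it only tests that the files exist, not that their contents are valid.

```shell
# Hypothetical preflight check: confirm the files named in the instructions
# above exist. Pass the repository root as the only argument.
darwin_preflight() (
  cd "${1:-.}" || exit 1
  status=0
  for f in .setup/enabled-services.yaml .setup/config.env \
           services.yaml service-dependencies.yaml; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      status=1
    fi
  done
  exit "$status"
)

# Usage:
#   darwin_preflight . || echo "run ./init.sh first"
```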

Common Operations

Check Cluster Status

kubectl get pods -n darwin          # Darwin services
kubectl get pods -n ray             # Ray clusters
kubectl get pods -n serve           # Model serving pods
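
To check all three namespaces in one pass, the commands above can be wrapped in a loop. A minimal sketch; `darwin_pods` is a hypothetical helper name, not a command shipped with the repository.

```shell
# Hypothetical helper: list pods in each namespace named above.
darwin_pods() {
  for ns in darwin ray serve; do
    echo "== namespace: $ns =="
    kubectl get pods -n "$ns"
  done
}

# Usage:
#   darwin_pods
```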

Rebuild a Single Service

# Build and push image
sh deployer/scripts/image-builder.sh -a <app-name> -t <base-path> -p <path> -e <base-image> -r $DOCKER_REGISTRY

# Restart deployment
kubectl rollout restart deployment/<service-name> -n darwin
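
After a restart it is often useful to wait until the new pods are ready. The sketch below pairs the restart with `kubectl rollout status`; the helper name `darwin_restart` and the 120s default timeout are assumptions chosen for illustration.

```shell
# Hypothetical helper: restart a deployment in the darwin namespace and block
# until the rollout finishes or the timeout expires.
# Usage: darwin_restart <service-name> [timeout]
darwin_restart() {
  kubectl rollout restart "deployment/$1" -n darwin &&
    kubectl rollout status "deployment/$1" -n darwin --timeout="${2:-120s}"
}
```

A non-zero exit status means either the restart was rejected or the rollout did not complete within the timeout.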

Access Services (Local)

Compute: http://localhost/compute/*
Feature Store: http://localhost/feature-store/*
MLflow UI: http://localhost/mlflow-app/*
Chronos: http://localhost/chronos/*
Catalog: http://localhost/darwin-catalog/*
Workspace: http://localhost/workspace/*
Workflow: http://localhost/workflow/*
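
A quick way to see whether the ingress answers on each prefix is to request the prefix root and print the status code. This is a sketch only: `darwin_probe` is a hypothetical helper, and a 404 on a bare prefix does not necessarily mean the service is down, since the concrete endpoints behind each `/*` are not listed here. A code of `000` means curl could not connect at all.

```shell
# Route prefixes from the list above, without the trailing wildcard.
darwin_local_prefixes() {
  printf '%s\n' compute feature-store mlflow-app chronos darwin-catalog workspace workflow
}

# Hypothetical helper: print "<prefix> <http-status>" for each route prefix.
darwin_probe() {
  darwin_local_prefixes | while read -r p; do
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://localhost/$p/")
    echo "$p $code"
  done
}
```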

Adding New Services

1. Add entry to services.yaml under applications:
2. Add dependencies to service-dependencies.yaml
3. Create Helm subchart in helm/darwin/charts/services/
4. Update init.sh if it's a new feature group
5. Update start.sh with helm path mapping in get_helm_path()
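
The registration steps above can be spot-checked with plain text searches. This is a rough sketch, not a validator: `darwin_check_service` is a hypothetical helper, it matches the name as a substring (so `darwin-ofs-v2` also matches `darwin-ofs-v2-admin`), and it assumes every service appears in service-dependencies.yaml, which may not hold for a service with no dependencies. Treat its output as hints.

```shell
# Hypothetical check: report which registration steps still look incomplete.
# Usage: darwin_check_service <repo-root> <service-name>
darwin_check_service() (
  cd "$1" || exit 1
  name=$2
  status=0
  grep -qF -- "$name" services.yaml 2>/dev/null ||
    { echo "not found in services.yaml" >&2; status=1; }
  grep -qF -- "$name" service-dependencies.yaml 2>/dev/null ||
    { echo "not found in service-dependencies.yaml" >&2; status=1; }
  grep -qF -- "$name" start.sh 2>/dev/null ||
    { echo "not found in start.sh (get_helm_path mapping)" >&2; status=1; }
  exit "$status"
)
```

The Helm subchart directory is not checked because its name may differ from the service name, which is why start.sh carries an explicit path mapping.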

Adding New Datastores

1. Add entry to services.yaml under datastores:
2. Create templates in helm/darwin/charts/datastores/templates/
3. Update service-dependencies.yaml for services that need it

📦 Service Dependencies (Quick Reference)

darwin-compute         → darwin-cluster-manager
darwin-workspace       → darwin-compute
darwin-workflow        → darwin-compute, darwin-cluster-manager
ml-serve-app           → artifact-builder, darwin-cluster-manager, darwin-mlflow-app
darwin-mlflow-app      → darwin-mlflow
darwin-ofs-v2          → darwin-ofs-v2-admin
darwin-ofs-v2-consumer → darwin-ofs-v2-admin
chronos-consumer       → chronos
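
Dependencies are transitive: enabling darwin-workspace pulls in darwin-compute, which in turn needs darwin-cluster-manager. The sketch below walks the table above to list everything a service needs. The mapping is hand-copied from this quick reference rather than read from service-dependencies.yaml, and both function names are hypothetical.

```shell
# Direct dependencies, copied from the quick-reference table above.
direct_deps() {
  case "$1" in
    darwin-compute)         echo "darwin-cluster-manager" ;;
    darwin-workspace)       echo "darwin-compute" ;;
    darwin-workflow)        echo "darwin-compute darwin-cluster-manager" ;;
    ml-serve-app)           echo "artifact-builder darwin-cluster-manager darwin-mlflow-app" ;;
    darwin-mlflow-app)      echo "darwin-mlflow" ;;
    darwin-ofs-v2)          echo "darwin-ofs-v2-admin" ;;
    darwin-ofs-v2-consumer) echo "darwin-ofs-v2-admin" ;;
    chronos-consumer)       echo "chronos" ;;
  esac
}

# Print the service and everything it transitively depends on, one per line,
# in breadth-first order. Runs in a subshell so its variables stay private.
all_deps() (
  seen=" "
  queue="$1"
  while [ -n "$queue" ]; do
    set -- $queue            # split the queue into words
    current="$1"
    shift
    queue="$*"               # remaining items
    case "$seen" in
      *" $current "*) continue ;;   # already visited
    esac
    seen="$seen$current "
    echo "$current"
    deps=$(direct_deps "$current")
    queue="${queue:+$queue }$deps"
  done
)

# Usage:
#   all_deps darwin-workspace
```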

🛠️ CLI Tools

Darwin CLI

Unified command-line interface for all Darwin services:

source .venv/bin/activate
darwin config set --env darwin-local
darwin serve configure
darwin serve create --name my-model --type api --space serve
darwin serve deploy-model --serve-name my-model --model-uri mlflow-artifacts:/...

📖 Full documentation: darwin-cli/README.md

📊 Ray Runtimes

Image	Ray Version	Python	Spark
`ray:2.37.0`	2.37.0	3.10	-
`ray:2.53.0`	2.53.0	3.10	-
`ray:2.37.0-darwin-sdk`	2.37.0	3.10	3.5.0

Darwin SDK Runtime includes Spark integration for distributed data processing.

📚 Additional Documentation

Document	Location
Main README	`README.md`
Darwin CLI	`darwin-cli/README.md`
Helm Umbrella Chart	`helm/darwin/UMBRELLA_CHART.md`
Deployment Order	`helm/darwin/DEPLOYMENT_ORDER.md`
Feature Store Architecture	`feature-store/ARCHITECTURE.md`

🐛 Troubleshooting

Common Issues

Cluster not reachable:

source .setup/config.env
kubectl cluster-info

Service not starting:

kubectl describe pod <pod-name> -n darwin
kubectl logs <pod-name> -n darwin
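
For a pod that keeps restarting, the logs of the previous container instance and the recent namespace events are usually worth collecting as well. A minimal sketch; `darwin_debug_pod` is a hypothetical helper and the 100-line tail is an arbitrary choice.

```shell
# Hypothetical helper: gather first-pass diagnostics for one pod.
# Usage: darwin_debug_pod <pod-name>
darwin_debug_pod() {
  kubectl describe pod "$1" -n darwin
  kubectl logs "$1" -n darwin --tail=100
  # Logs of the previous container instance; an error here just means the
  # container has not restarted, so it is ignored.
  kubectl logs "$1" -n darwin --previous --tail=100 || true
  kubectl get events -n darwin --sort-by=.lastTimestamp
}
```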

Helm deployment failed:

helm status darwin -n darwin
helm history darwin -n darwin

LocalStack S3 issues:

kubectl port-forward svc/darwin-localstack -n darwin 4566:4566
AWS_ACCESS_KEY_ID=test AWS_SECRET_ACCESS_KEY=test aws s3 ls --endpoint-url=http://localhost:4566

🎯 Quick Task Navigation

Task	Steps
First-time setup (simple)	Run `./init.sh` → Select Training/Inference → `./setup.sh` → `./start.sh`
First-time setup (advanced)	Run `./init.sh --dev-mode` → Select individual services → `./setup.sh` → `./start.sh`
Add new microservice	Edit `services.yaml` → Add Helm chart → Update `init.sh`/`start.sh`
Enable/disable service	Edit `.setup/enabled-services.yaml` → Run `./start.sh`
Rebuild images	Run `./setup.sh -y`
Debug pod	`kubectl logs/describe` → Check service dependencies
Deploy model	Use Darwin CLI (see `darwin-cli/README.md#serve-commands`)
