Thank you for your interest in contributing to Darwin! This document provides guidelines and best practices for contributing to the Darwin ML Platform codebase.
- Getting Started for Contributors
- Development Guidelines
- Testing Expectations
- Adding New Features or Modules
- Security & Access
- Deploy & Verify Changes
- Communication Expectations
- Getting Help
Ensure you have the following installed on your system:
Required Tools:
- Git v2.30+
- Docker v20.10+ and Docker Compose
- Kind v0.11+ (for local Kubernetes)
- kubectl v1.24+
- Helm v3.9+
- yq v4.0+ (YAML processor, auto-installed by setup script)
Language-Specific Requirements:
| Component | Language | Version | Build Tool |
|---|---|---|---|
| Feature Store | Java | 11+ | Maven 3.8+ |
| Compute, MLflow, Workspace, ML Serve | Python | 3.9.7+ | pip, venv |
| Cluster Manager | Go | 1.18+ | Go modules |
System Requirements:
- OS: macOS (darwin) or Linux
- Architecture: x86_64 (amd64) or arm64
- RAM: Minimum 16GB (for running full platform)
- Disk: At least 20GB free space
# Fork the repository on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/darwin.git
cd darwin

# Interactive wizard to select components to enable
./init.sh
# Build base images and setup local Kind cluster
./setup.sh -y # Non-interactive, keeps existing data
./setup.sh -y --clean # Non-interactive, clean install (deletes cluster & data)
# Deploy Darwin platform to local cluster
./start.sh

What this does:
- Creates a Kind Kubernetes cluster
- Builds base Docker images (Java 11, Python 3.9.7, Go 1.18)
- Compiles and builds all enabled service images
- Deploys services via Helm to the local cluster
darwin/
├── darwin-compute/            # Ray cluster orchestration (Python)
│   ├── app_layer/             # FastAPI REST API
│   ├── core/                  # Business logic
│   ├── model/                 # Data models
│   ├── sdk/                   # Python SDK
│   └── script/                # Background jobs (status poller, auto-termination)
├── darwin-cluster-manager/    # Kubernetes orchestration (Go)
│   ├── services/              # Service layer
│   ├── rest/                  # HTTP handlers
│   └── charts/                # Helm chart templates
├── feature-store/             # Feature Store (Java/Vert.x)
│   ├── app/                   # Online serving
│   ├── admin/                 # Feature management
│   ├── consumer/              # Kafka consumer
│   ├── populator/             # Bulk ingestion
│   └── python/                # Python SDK
├── mlflow/                    # Experiment tracking (Python)
│   ├── app_layer/             # FastAPI wrapper
│   └── sdk/                   # MLflow client wrapper
├── ml-serve-app/              # Model serving (Python)
│   ├── app_layer/             # REST API
│   ├── core/                  # Deployment logic
│   ├── model/                 # Tortoise ORM models
│   └── runtime/               # Serving runtime template
├── artifact-builder/          # Docker image builder (Python)
├── chronos/                   # Event processing (Python)
├── workspace/                 # Project management (Python)
├── darwin-catalog/            # Data catalog (Java/Spring Boot)
├── hermes-cli/                # Serve CLI backend (Python/Typer)
├── darwin-cli/                # Unified CLI for all services (Python/Typer)
├── helm/                      # Helm charts
│   ├── darwin/                # Umbrella chart
│   ├── charts/datastores/     # MySQL, Cassandra, Kafka, etc.
│   └── charts/services/       # Application services
├── deployer/                  # Build infrastructure
│   ├── images/                # Base Docker images
│   └── scripts/               # Image builders
├── kind/                      # Local Kubernetes config
├── init.sh                    # Configuration wizard
├── setup.sh                   # Build and setup script
├── start.sh                   # Deployment script
└── services.yaml              # Service registry
# Example: Setting up darwin-compute
cd darwin-compute
# Create virtual environment
python3.9 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install all modules in editable mode
pip install -e app_layer/.
pip install -e core/.
pip install -e model/.
pip install -e sdk/.
pip install -e script/.
# Install development dependencies
pip install -r core/requirements_dev.txt

PyCharm Setup:
- Mark all `src` directories as "Sources Root" (Right-click → Mark Directory as → Sources Root)
- Mark all `tests` directories as "Test Sources Root"
- Configure the Python interpreter to use the virtual environment
- Install plugins: Black (code formatter), EnvFile (environment variables)
# Example: Setting up feature-store
cd feature-store
# Build the project
mvn clean compile package
# Run tests
mvn clean verify

IntelliJ IDEA Setup:
- Import as Maven project
- Set JDK to 11
- Enable annotation processing (for Lombok and MapStruct)
- Run `mvn clean compile` to generate sources
- Sync the Maven project: Right-click `pom.xml` → Maven → Reload project
# Example: Setting up darwin-cluster-manager
cd darwin-cluster-manager
# Download dependencies
go mod download
# Build the project
make build
# Run tests
make test

Style Guide: PEP 8 with line length 120 characters

Required Tools:
- Black (code formatter): `black -l 120 src/ tests/`
- isort (import sorting): `isort src/ tests/`
- mypy (type checking): `mypy src/`
- pylint (linting): `pylint src/`
- pytest (testing): `pytest tests/`
Type Hints: All functions must have type hints
from typing import Any, Dict, Optional

def fetch_cluster(cluster_id: str, user: Optional[str] = None) -> Dict[str, Any]:
    """Fetch cluster details by ID."""
    ...

Docstrings: Use Google-style docstrings
def create_cluster(cluster_def: ComputeClusterDefinition) -> dict:
    """Create a new Ray cluster.

    Args:
        cluster_def: Cluster configuration definition

    Returns:
        Dictionary containing cluster_id and status

    Raises:
        ValueError: If cluster definition is invalid
    """
    ...

Logging: Use structured logging
from loguru import logger
logger.info(f"Creating cluster: {cluster_id}")
logger.exception(f"Failed to create cluster: {cluster_id}")  # call inside an except block; logs the traceback

Style Guide: Google Java Style Guide
Required Tools:
- Spotless (code formatter): `mvn spotless:apply`
- Checkstyle (style checker): configured in `pom.xml`
- Lombok: use for boilerplate reduction (`@Data`, `@Builder`)
- MapStruct: use for DTO mappings
Code Formatting:
# Format code before committing
mvn spotless:apply

Naming Conventions:
- Classes: PascalCase (`FeatureGroupService`)
- Methods: camelCase (`createFeatureGroup`)
- Constants: UPPER_SNAKE_CASE (`DEFAULT_TIMEOUT`)
- Packages: lowercase (`com.dream11.app.service`)
Style Guide: Effective Go + Go Code Review Comments
Required Tools:
- gofmt (formatting): `gofmt -w .`
- golint (linting): `golint ./...`
- go vet (static analysis): `go vet ./...`
Naming Conventions:
- Exported identifiers: PascalCase (`CreateCluster`)
- Unexported identifiers: camelCase (`getClusterStatus`)
- Acronyms: all uppercase or all lowercase (`HTTPServer`, `httpServer`)
Every service submodule MUST contain .odin/{service-name}/ with these scripts:
service-name/
└── .odin/
    └── {service-name}/
        ├── build.sh         # Compile and prepare artifacts (REQUIRED)
        ├── setup.sh         # Install dependencies at Docker build time (REQUIRED)
        ├── start.sh         # Container entrypoint (REQUIRED)
        └── pre-deploy.sh    # Database migrations (OPTIONAL)
build.sh - Compiles application, outputs to target/:
#!/bin/bash
set -e
SERVICE_NAME="my-service"
mkdir -p target/$SERVICE_NAME
# Copy application files
cp -r src/* target/$SERVICE_NAME/
cp requirements.txt target/$SERVICE_NAME/
echo "Build completed for $SERVICE_NAME"

setup.sh - Runs at Docker build time:
#!/bin/bash
set -e
cd /app
pip install --no-cache-dir -r requirements.txt

start.sh - Container entrypoint:
#!/bin/bash
cd /app
exec uvicorn main:app --host 0.0.0.0 --port 8000

REST API Standards:
- Use FastAPI for Python services
- Use Spring Boot/Vert.x for Java services
- Follow RESTful conventions (GET, POST, PUT, DELETE)
- Use snake_case for JSON keys in Python services
- Use camelCase for JSON keys in Java services
- Include OpenAPI/Swagger documentation
Response Format (Python services):
{
"status": "SUCCESS" | "FAILURE",
"data": {...},
"message": "Optional message"
}

Error Handling:
from fastapi import HTTPException
raise HTTPException(
    status_code=404,
    detail={"error": "ClusterNotFound", "cluster_id": cluster_id},
)

MySQL Naming:
- Tables: snake_case (`compute_clusters`, `feature_groups`)
- Columns: snake_case (`cluster_id`, `created_at`)
- Indexes: `idx_{table}_{column}`
- Foreign keys: `fk_{table}_{referenced_table}`
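As a quick illustration, the index and foreign-key patterns above can be generated mechanically. The helper names below are hypothetical, not part of the codebase:

```python
# Hypothetical helpers illustrating the naming conventions above.

def index_name(table: str, column: str) -> str:
    """Build an index name following idx_{table}_{column}."""
    return f"idx_{table}_{column}"

def foreign_key_name(table: str, referenced_table: str) -> str:
    """Build a foreign key name following fk_{table}_{referenced_table}."""
    return f"fk_{table}_{referenced_table}"

print(index_name("compute_clusters", "cluster_id"))        # idx_compute_clusters_cluster_id
print(foreign_key_name("compute_clusters", "workspaces"))  # fk_compute_clusters_workspaces
```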
Schema Migrations:
- Feature Store: Flyway migrations in `resources/db/`
- Compute: SQL scripts in `resources/db/mysql/migrations/`
- ML Serve/Artifact Builder: Tortoise ORM auto-migration
Connection Pooling:
- Use connection pooling for all database access
- Configuration files live in `resources/config/mysql/`
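The idea behind pooling can be sketched with only the standard library; a real service would use a driver-level pool (e.g. `mysql.connector.pooling` or an ORM's built-in pool). `ConnectionPool` below is illustrative, with SQLite standing in for MySQL:

```python
# Minimal illustration of connection pooling using only the stdlib.
# SQLite stands in for MySQL; ConnectionPool is a hypothetical class.
import queue
import sqlite3

class ConnectionPool:
    def __init__(self, size: int = 5):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self):
        return self._pool.get()   # blocks if the pool is exhausted

    def release(self, conn):
        self._pool.put(conn)      # return the connection for reuse

pool = ConnectionPool(size=2)
conn = pool.acquire()
print(conn.execute("SELECT 1").fetchone())  # (1,)
pool.release(conn)
```

Reusing connections this way avoids paying the TCP and auth handshake cost on every request.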
Python Packages (installed in editable mode):
- `compute_model`: Shared models across Compute SDK and service
- `ml_serve_model`: Shared models for ML Serve
- `darwin_fs`: Feature Store Python SDK
- `darwin_mlflow`: MLflow wrapper SDK
- `darwin_compute`: Compute SDK
Java Modules (Maven multi-module):
- `core`: Shared domain logic
- `app`: Application service
- `admin`: Admin service
- `consumer`: Consumer service
Version Pinning: All dependencies must be pinned to specific versions
# requirements.txt
fastapi==0.104.1
ray==2.37.0
mlflow==2.12.2

Dependency Updates:
- Create a separate PR for dependency updates
- Run full test suite before merging
- Document breaking changes in PR description
Environment Variables:
- Use uppercase SNAKE_CASE (`MYSQL_HOST`, `ENV`)
- Document all environment variables in the service README
- Use `.env.example` files (NEVER commit actual `.env` files)
Configuration Files:
- YAML for Kubernetes/Helm configurations
- `.conf` files for database connections
- Store in the `resources/config/` directory
Secrets Management:
- Use Kubernetes Secrets in production
- Use environment variables in local development
- NEVER hardcode credentials in code
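A minimal sketch of the approved pattern: read credentials from the environment rather than source code. The `setdefault` calls below only simulate local-development values for the example; in practice they come from a `.env` file or a Kubernetes Secret:

```python
import os

# Simulate local-development values for this example only; real values
# come from a .env file or Kubernetes Secret, never from source code.
os.environ.setdefault("MYSQL_HOST", "localhost")
os.environ.setdefault("MYSQL_PASSWORD", "password")

mysql_host = os.environ["MYSQL_HOST"]          # fail fast if missing
mysql_password = os.environ["MYSQL_PASSWORD"]
print(mysql_host)
```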
| Service Type | Unit Tests | Integration Tests | E2E Tests |
|---|---|---|---|
| Python Services | ≥70% | Required | Optional |
| Java Services | ≥60% | Required | Required |
| Go Services | ≥60% | Required | Optional |
# Run all tests
pytest
# Run with coverage
pytest --cov=compute_core --cov-report=html
# Run specific test file
pytest tests/test_compute.py
# Run specific test
pytest tests/test_compute.py::test_create_cluster
# Run with verbose output
pytest -v
# Run integration tests only
pytest -m integration

Test Markers:
import pytest
@pytest.mark.unit
def test_cluster_validation():
    ...

@pytest.mark.integration
def test_dcm_integration():
    ...

@pytest.mark.slow
def test_long_running_job():
    ...

# Run all tests
mvn test
# Run with coverage
mvn clean verify
# Run specific test class
mvn test -Dtest=FeatureGroupServiceTest
# Skip tests (use sparingly)
mvn clean package -DskipTests

# Run all tests
go test ./...
# Run with coverage
go test -cover ./...
# Run specific package
go test ./services/clusterv2
# Run with verbose output
go test -v ./...

service/
└── tests/                    # Test directory
    ├── conftest.py           # Pytest fixtures (Python)
    ├── test_unit/            # Unit tests
    ├── test_integration/     # Integration tests
    └── test_e2e/             # End-to-end tests
Fixtures and Mocks:
- Store test data in `tests/fixtures/` or `tests/resources/`
- Use pytest fixtures for reusable test setup
- Mock external services (databases, APIs) in unit tests
- Use real services in integration tests (Docker Compose)
Example Pytest Fixture:
# conftest.py
import pytest
from compute_core.compute import Compute
@pytest.fixture
def compute_client():
    """Provide a Compute client for tests."""
    return Compute(env="test")

@pytest.fixture
def sample_cluster_definition():
    """Provide a sample cluster definition."""
    return {
        "name": "test-cluster",
        "runtime": "Ray2.37.0-Py310-CPU",
        "head_node": {"cores": 2, "memory": 4},
    }

- Test Naming: Use descriptive names (`test_create_cluster_with_valid_config`)
- AAA Pattern: Arrange, Act, Assert
- Isolation: Tests should not depend on each other
- Cleanup: Always clean up resources (clusters, databases) after tests
- Deterministic: Tests should produce consistent results
- Fast: Unit tests should run in milliseconds
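The AAA pattern from the list above, in a minimal sketch. `validate_cluster_name` is a hypothetical validator for illustration, not the real service logic:

```python
# Hypothetical validator used only to illustrate the AAA pattern.
def validate_cluster_name(name: str) -> bool:
    return bool(name) and len(name) <= 63 and not name.startswith("-")

def test_create_cluster_with_valid_config():
    # Arrange: prepare the input
    name = "test-cluster"
    # Act: run the unit under test
    result = validate_cluster_name(name)
    # Assert: check the outcome
    assert result is True

test_create_cluster_with_valid_config()
print("ok")
```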
- Check Existing Issues: Search for existing issues/discussions
- Create an Issue: Describe the feature, use case, and proposed approach
- Discuss: Wait for feedback from maintainers before implementation
- Design Document: For major features, create an RFC (see below)
We use a structured process for discussing changes based on their complexity:
| Change Type | Process | Example |
|---|---|---|
| Small fix/feature | Open Issue → PR | Fix typo, add config option |
| Medium feature | Open Issue → Discuss → PR | New API endpoint, refactor module |
| Large/Breaking change | RFC Issue → Design Review → PR | New service, breaking API change, architecture change |
For significant changes, use the RFC template:
1. Draft → Author creates an RFC issue with the [RFC] prefix
2. Discussion → Team reviews and comments (minimum 1 week)
3. Revision → Author addresses feedback
4. Decision → Team lead approves or rejects
5. Implement → Create a feature branch linked to the RFC
Create an RFC when:
- Adding a new service or major component
- Changing public APIs in breaking ways
- Introducing new dependencies or technologies
- Architectural changes affecting multiple services
- Changes requiring database migrations
When working on a feature:
- Create Issue/RFC first - Get alignment before coding
- Reference in branch name - `feat/123-add-gpu-support` (issue #123)
- Link PR to Issue - Use `Closes #123` in the PR description
- Update Issue with progress - Comment on blockers and decisions
| Topic | Location |
|---|---|
| Bug reports | GitHub Issues (Bug Report template) |
| Feature ideas | GitHub Issues (Feature Request template) |
| Design proposals | GitHub Issues (RFC template) |
| Implementation questions | PR comments |
| General Q&A | GitHub Discussions |
| Quick questions | Team chat (Slack/Discord) |
For RFCs and significant changes:
- Approval: 2+ team members approve with "LGTM"
- Changes Requested: Comment with specific feedback
- Blocking: Block with a clear reason (security, performance, etc.)
- Timeout: If no response in 1 week, the author can ping or escalate
We use trunk-based development - all work targets main directly (no develop branch).
# Update your fork
git checkout main
git pull upstream main
# Create a feature branch
git checkout -b feat/your-feature-name

Branch Naming Conventions:
- `feat/` - New features
- `fix/` - Bug fixes
- `hotfix/` - Critical production fixes (from release tags)
- `refactor/` - Code refactoring
- `docs/` - Documentation updates
- `chore/` - Maintenance tasks
Checklist before committing:
- Code follows style guidelines (run linters/formatters)
- All tests pass (`pytest`, `mvn test`, `go test`)
- New tests added for new functionality
- Documentation updated (README, docstrings, comments)
- No sensitive data committed (credentials, tokens, PII)
- Type hints added (Python) or proper types used (Java/Go)
- No debug statements or commented-out code
- Environment variables documented
Follow Conventional Commits format:
# Format: <type>(<scope>): <subject>
git commit -m "feat(compute): add auto-scaling policy support"
git commit -m "fix(feature-store): resolve Cassandra timeout issues"
git commit -m "docs(mlflow): update SDK usage examples"
git commit -m "refactor(serve): simplify deployment logic"
git commit -m "test(catalog): add lineage tracking tests"

Commit Types:
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Code style changes (formatting, no logic change)
- `refactor`: Code refactoring
- `test`: Adding or updating tests
- `chore`: Maintenance tasks (dependencies, build scripts)
- `perf`: Performance improvements
Commit Scope Examples:
- `compute`, `feature-store`, `mlflow`, `serve`, `catalog`, `chronos`
- `sdk`, `cli`, `helm`, `deploy`
# Push to your fork
git push origin feat/your-feature-name

Create a Pull Request on GitHub with this template:
## Description
Brief description of the changes.
## Type of Change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
## Related Issue
Closes #123
## Changes Made
- Change 1
- Change 2
- Change 3
## Testing
Describe the tests you ran:
- [ ] Unit tests pass
- [ ] Integration tests pass
- [ ] Manual testing performed
## Deployment Notes
Any special deployment considerations?
## Checklist
- [ ] My code follows the style guidelines
- [ ] I have performed a self-review
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally
- [ ] Any dependent changes have been merged and published
## Screenshots (if applicable)

Critical Paths:
- Cluster lifecycle management (create, start, stop, restart)
- DCM integration (cluster deployment)
- Ray cluster configuration generation
- Auto-termination policies
Testing Requirements:
- Mock DCM responses in unit tests
- Test cluster state transitions
- Validate YAML generation for Ray clusters
- Test auto-termination policy triggers
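Mocking DCM responses in a unit test can look like the following sketch; the `deploy_cluster` method and its payload are hypothetical names for illustration:

```python
# Sketch of mocking a DCM client in a unit test. DCM's real client
# interface may differ; deploy_cluster is an illustrative method name.
from unittest.mock import MagicMock

dcm = MagicMock()
dcm.deploy_cluster.return_value = {"status": "DEPLOYED", "cluster_id": "c-123"}

# Code under test would call the mocked client instead of the real DCM.
response = dcm.deploy_cluster(name="test-cluster")

assert response["status"] == "DEPLOYED"
dcm.deploy_cluster.assert_called_once_with(name="test-cluster")
print("mocked DCM call verified")
```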
Breaking Change Checklist:
- SDK backward compatibility maintained
- Database schema migrations included
- API versioning updated if needed
- Documentation updated
Critical Paths:
- Feature serving API (low-latency path)
- Cassandra read/write operations
- Feature group schema management
- Kafka consumer offset management
Testing Requirements:
- Performance tests for feature retrieval (< 10 ms)
- Integration tests with Cassandra
- Schema evolution tests
- Consumer lag monitoring
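A latency check for the serving budget might be sketched like this; `fetch_features` is a stand-in for the real Cassandra-backed retrieval call, and 10 ms is the budget stated above:

```python
# Sketch of a latency assertion for the feature-serving path.
# fetch_features is a hypothetical stand-in for the real retrieval call.
import time

def fetch_features(entity_id: str) -> dict:
    return {"entity_id": entity_id, "f1": 0.5}  # stub for a Cassandra read

start = time.perf_counter()
fetch_features("user-42")
elapsed_ms = (time.perf_counter() - start) * 1000

assert elapsed_ms < 10, f"feature retrieval took {elapsed_ms:.2f} ms"
print("latency within budget")
```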
Breaking Change Checklist:
- Python SDK updated (`darwin_fs`)
- API version bump
- Migration scripts for schema changes
- Backward compatibility for old schemas
Critical Paths:
- Serve deployment logic
- Artifact build integration
- DCM integration for deployment
- Model URI resolution (MLflow/S3)
Testing Requirements:
- Test deployment to multiple environments
- Mock artifact builder responses
- Test auto-scaling configuration
- Validate Helm values generation
Breaking Change Checklist:
- Darwin CLI updated
- Deployment configs migrated
- Active deployments not affected
Critical Paths:
- Experiment and run tracking
- Artifact storage (S3)
- Authentication and permissions
- Proxy to MLflow backend
Testing Requirements:
- Test user permissions
- Artifact upload/download
- Experiment CRUD operations
- Auth middleware
Breaking Change Checklist:
- SDK wrapper updated (`darwin_mlflow`)
- Migration for database schema
- Existing experiments accessible
NEVER Commit:
- Database credentials
- API keys and tokens
- AWS access keys
- Private keys or certificates
- User PII or sensitive data
- Internal hostnames or IPs
Approved Methods:
- Local Development: Use `.env` files (add to `.gitignore`)
- Kubernetes: Use Kubernetes Secrets
- CI/CD: Use GitHub Secrets or CI environment variables
Example .env file (NEVER commit):
# Database
MYSQL_HOST=localhost
MYSQL_USERNAME=darwin
MYSQL_PASSWORD=password
# AWS (LocalStack for local)
AWS_ACCESS_KEY_ID=test
AWS_SECRET_ACCESS_KEY=test
AWS_DEFAULT_REGION=us-east-1

Service Authentication:
- Use service accounts in Kubernetes
- Use IAM roles for AWS services
- Use token-based auth for service-to-service communication
User Authentication:
- Email-based authentication via headers (`email` header)
- Token-based authentication for CLI tools
- MLflow Basic Auth for experiment access
Implementing Auth in New Services:
from fastapi import Header, HTTPException
async def get_current_user(email: str = Header(...)):
    """Extract the user from the email header."""
    if not email:
        raise HTTPException(status_code=401, detail="Authentication required")
    return email

Reviewers should verify:
- No hardcoded credentials
- SQL injection prevention (parameterized queries)
- Input validation and sanitization
- Proper error handling (no sensitive data in error messages)
- Rate limiting on public endpoints
- CORS configured appropriately
- Dependencies have no known vulnerabilities
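To illustrate the SQL-injection item: a parameterized query treats hostile input as data, not SQL. SQLite is used here so the snippet is self-contained; MySQL drivers use `%s` placeholders instead of `?`:

```python
# Parameterized query example; SQLite stands in for MySQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compute_clusters (cluster_id TEXT, name TEXT)")
conn.execute("INSERT INTO compute_clusters VALUES ('c-1', 'test-cluster')")

cluster_id = "c-1' OR '1'='1"  # hostile input is bound as a value, not SQL
rows = conn.execute(
    "SELECT name FROM compute_clusters WHERE cluster_id = ?", (cluster_id,)
).fetchall()
print(rows)  # [] — the injection attempt matches nothing
```

Never build the query with string formatting (`f"... WHERE cluster_id = '{cluster_id}'"`); that is exactly the pattern reviewers should flag.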
# Clean rebuild
./setup.sh -y
# Redeploy with changes
./start.sh
# Check deployment status
kubectl get pods -n darwin
kubectl logs -f deployment/darwin-compute -n darwin

Option 1: Rebuild and Redeploy a Single Service
# Example: Rebuild darwin-compute
cd darwin-compute
# Rebuild Docker image
docker build -t darwin-compute:latest \
--build-arg BASE_IMAGE=darwin/python:3.9.7-pip-bookworm-slim \
-f ../deployer/images/Dockerfile ..
# Tag and push to local registry
docker tag darwin-compute:latest localhost:5000/darwin-compute:latest
docker push localhost:5000/darwin-compute:latest
# Restart deployment
kubectl rollout restart deployment/darwin-compute -n darwin
kubectl rollout status deployment/darwin-compute -n darwin

Option 2: Local Development (without Docker)
# Example: Run darwin-compute locally
cd darwin-compute/app_layer
# Set environment variables
export ENV=local
export VAULT_SERVICE_MYSQL_USERNAME=darwin
export VAULT_SERVICE_MYSQL_PASSWORD=password
# ... other env vars
# Run service
uvicorn src.compute_app_layer.main:app --reload --port 8000

Port Forwarding for Local Testing:
# Forward Darwin Compute
kubectl port-forward deployment/darwin-compute 8000:8000 -n darwin
# Forward MySQL
kubectl port-forward service/darwin-mysql 3306:3306 -n darwin
# Forward MLflow
kubectl port-forward deployment/darwin-mlflow-app 8080:8000 -n darwin

Test Endpoints:
# Health check
curl http://localhost:8000/health
# Create cluster (example)
curl -X POST http://localhost:8000/cluster \
-H "Content-Type: application/json" \
-H "msd-user: {\"email\": \"test@example.com\"}" \
-d @test-cluster-config.json

Before requesting review, verify:

Service Health:
- Service pod is running: `kubectl get pods -n darwin`
- Health endpoint responds: `curl http://service/health`
- Logs show no errors: `kubectl logs -f deployment/service -n darwin`
API Functionality:
- CRUD operations work as expected
- Authentication/authorization works
- Database connections successful
- External service integrations work (DCM, MLflow, Feature Store)
Performance:
- Response times are acceptable (<1s for most operations)
- No memory leaks (monitor pod memory usage)
- Database queries are optimized (check slow query logs)
Integration:
- Dependent services can communicate
- SDKs work with changes
- CLI commands function correctly
# 1. Create a cluster via REST API
curl --location 'http://localhost/compute/cluster' \
--header 'Content-Type: application/json' \
--data-raw '{
"cluster_name": "test-cluster",
"tags": ["test"],
"runtime": "Ray2.37.0-Py310-CPU",
"inactive_time": 30,
"head_node_config": {
"cores": 4,
"memory": 8
},
"worker_node_configs": [
{
"cores": 2,
"memory": 4,
"min_pods": 1,
"max_pods": 2
}
],
"user": "test@example.com"
}'
# 2. Verify cluster in Kubernetes
kubectl get rayclusters -n ray
# 3. Access Jupyter
# Get the cluster dashboard links via the API below, using the cluster_id
# returned in the create-cluster response
curl --location 'http://localhost/compute/cluster/{cluster_id}/dashboards'
# Access Jupyter notebook at the returned jupyter_lab_url
# 4. Run a job
# Submit job via Ray dashboard or SDK
# 5. Stop cluster
curl --location --request POST 'http://localhost/compute/cluster/stop-cluster/{cluster_id}' \
--header 'msd-user: {"email": "test@example.com"}'
# 6. Verify cleanup
kubectl get rayclusters -n ray  # Should be deleted

For complete Darwin CLI documentation, see darwin-cli/README.md#serve-commands
# 1. Setup Darwin CLI
source .venv/bin/activate
# 2. Configure environment and authentication
darwin config set --env darwin-local
darwin serve configure
# 3. Create environment (if not already created)
darwin serve environment create \
--name local \
--domain-suffix .local \
--cluster-name kind
# 4. Create serve
darwin serve create \
--name test-model \
--type api \
--space serve \
--description "Test model deployment"
# 5. Deploy model
darwin serve deploy-model \
--serve-name test-model \
--artifact-version v1 \
--model-uri mlflow-artifacts:/1/abc123/artifacts/model \
--cores 2 \
--memory 4 \
--node-capacity spot \
--min-replicas 1 \
--max-replicas 2
# 6. Verify deployment in Kubernetes
kubectl get deployments -n serve
kubectl get pods -n serve
kubectl get ingresses -n serve
# 7. Test inference endpoint
curl -X POST http://localhost/serve/test-model/predict \
-H "Content-Type: application/json" \
-d '{"features": [[1.0, 2.0, 3.0]]}'
# 8. Verify auto-scaling (optional)
kubectl get hpa -n serve
# 9. Cleanup (undeploy model)
darwin serve undeploy --name test-model --env darwin-local
# 10. Verify cleanup
kubectl get deployments -n serve  # test-model should be gone

Format: `<type>(<scope>): <subject>`
Good Examples:
feat(compute): add GPU node support for Ray clusters
fix(feature-store): resolve timeout in Cassandra batch reads
docs(sdk): add examples for cluster auto-scaling
refactor(serve): simplify artifact deployment logic
test(chronos): add integration tests for event transformers
perf(feature-store): optimize feature retrieval query
Bad Examples:
❌ fixed bug
❌ updates
❌ WIP
❌ more changes
Rules:
- Subject line max 72 characters
- Use imperative mood ("add" not "added" or "adds")
- No period at the end of subject line
- Body wraps at 72 characters (if needed)
- Reference issues/PRs in the body (`Closes #123`, `Relates to #456`)
Title Format: Same as commit messages
feat(compute): add GPU node support for Ray clusters
Description Requirements:
- What: Describe the changes made
- Why: Explain the motivation and context
- How: Explain implementation approach (if non-obvious)
- Testing: Describe how you tested the changes
- Breaking Changes: Highlight any breaking changes
- Screenshots: Include for UI changes
Size Guidelines:
- Aim for small, focused PRs (<500 lines changed)
- Split large features into multiple PRs
- Keep related changes together (don't mix features with refactoring)
Draft PRs:
- Use draft PRs for work-in-progress
- Request review only when ready for review
- Convert to ready when tests pass and you've self-reviewed
As an Author:
- Self-review before requesting review
- Ensure all checks pass (tests, linters)
- Add reviewers (1-2 reviewers recommended)
- Respond to feedback promptly
- Mark conversations as resolved when addressed
- Squash commits before merging (if requested)
As a Reviewer:
- Review within 2 business days
- Be respectful and constructive
- Ask questions rather than making demands
- Approve when satisfied
- Request changes if necessary
- Block if critical issues found
Review Checklist:
- Code follows style guidelines
- Logic is sound and correct
- Tests are adequate and pass
- Documentation is updated
- No security vulnerabilities
- Performance implications considered
- Error handling is appropriate
- Breaking changes are justified and documented
Feedback Examples:
Good Feedback:
Consider using a connection pool here to improve performance.
What do you think about using the `mysql.connector.pooling` module?
Poor Feedback:
This is wrong. Use connection pooling.
Bug Reports should include:
**Describe the bug**
A clear description of what the bug is.
**To Reproduce**
Steps to reproduce the behavior:
1. Create cluster with config '...'
2. Start cluster
3. See error
**Expected behavior**
What you expected to happen.
**Actual behavior**
What actually happened.
**Environment**
- Darwin version:
- Service affected:
- Kubernetes version:
- Browser (if UI bug):
**Logs**

**Screenshots**
If applicable, add screenshots.

**Feature Requests** should include:
**Problem Statement**
What problem does this solve?
**Proposed Solution**
How would you solve it?
**Alternatives Considered**
What other approaches did you consider?
**Additional Context**
Any other information.
- Documentation: Check service-specific READMEs in each submodule
- Existing Issues: Search GitHub issues for similar questions
- Code Examples: Check the `examples/` directory
- Darwin CLI: See `darwin-cli/README.md` for complete CLI documentation
Good Questions:
- Include context (what you're trying to do)
- Show what you've tried
- Include error messages and logs
- Specify your environment (local/dev/prod)
Where to Ask:
- GitHub Issues: For bugs and feature requests
- GitHub Discussions: For questions and general discussion
- Pull Request Comments: For specific code questions
- Internal Channels: [Specify your internal communication channels]
- Start with `README.md` for a high-level overview
- Read `.prompts/00-overview.md` for architecture details
- Explore individual service READMEs:
  - `darwin-compute/README.md`
  - `feature-store/README.md`
  - `mlflow/README.md`
  - `ml-serve-app/README.md`
  - `chronos/README.md`
- Review the Darwin CLI documentation: `darwin-cli/README.md`
Ray:
- Ray Documentation
- Ray version used: 2.37.0
MLflow:
- MLflow Documentation
- MLflow version used: 2.12.2
Kubernetes:
FastAPI:
When making changes to public APIs or SDKs:
- Maintain backward compatibility whenever possible
- Deprecate before removing (give users time to migrate)
- Version APIs if breaking changes are necessary
- Document migration path in release notes
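"Deprecate before removing" can be as simple as a stdlib warning on the old entry point. `old_create_cluster` and `create_cluster` below are illustrative names, not real SDK functions:

```python
# Sketch of deprecating an old entry point before removal.
import warnings

def create_cluster(name: str) -> dict:
    """The replacement API."""
    return {"name": name, "status": "CREATING"}

def old_create_cluster(name: str) -> dict:
    """Deprecated: kept for one release so callers can migrate."""
    warnings.warn(
        "old_create_cluster is deprecated; use create_cluster instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return create_cluster(name)

result = old_create_cluster("test-cluster")
print(result["status"])  # CREATING (with a DeprecationWarning emitted)
```

Pair the warning with a note in the release notes describing the migration path and the release in which the old name will be removed.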
- Profile code before and after changes
- Use connection pooling for databases
- Implement caching where appropriate
- Consider impact on high-traffic endpoints
- Load test significant changes
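Caching a read-mostly lookup can be sketched with `functools.lru_cache`; `get_runtime_config` here is a hypothetical expensive lookup, not a real Darwin function:

```python
# Caching sketch using functools.lru_cache. The counter stands in for
# an expensive DB/API call that we want to avoid repeating.
from functools import lru_cache

CALLS = {"count": 0}

@lru_cache(maxsize=128)
def get_runtime_config(runtime: str) -> str:
    CALLS["count"] += 1            # pretend this is an expensive lookup
    return f"config-for-{runtime}"

get_runtime_config("Ray2.37.0-Py310-CPU")
get_runtime_config("Ray2.37.0-Py310-CPU")  # served from cache
print(CALLS["count"])  # 1
```

Only cache values that are safe to serve stale, and size the cache against the hot-key set rather than the whole key space.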
- Update README when adding features
- Add docstrings to all public functions/classes
- Include code examples in documentation
- Document configuration options
- Keep API documentation (Swagger) up to date
Before submitting your PR:
- Code follows style guidelines (linters pass)
- All tests pass (`pytest`, `mvn test`, `go test`)
- New tests added for new functionality
- Documentation updated
- Commit messages follow conventions
- PR description is complete
- No sensitive data committed
- Self-review completed
- Local deployment tested
- Integration with other services verified
Thank you for contributing to the Darwin ML Platform! Your contributions help build better ML infrastructure for everyone. 🎉