From 88f638fcfbac94ca66c9547c5d075ea5864de50d Mon Sep 17 00:00:00 2001
From: Rob Holland
Date: Wed, 23 Jul 2025 12:18:40 +0100
Subject: [PATCH 01/25] Add migration doc draft.

---
 docs/README.md | 3 +
 docs/migration-guide.md | 631 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 634 insertions(+)
 create mode 100644 docs/migration-guide.md

diff --git a/docs/README.md b/docs/README.md
index 373d661c..eb856d40 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -11,6 +11,9 @@ This documentation structure is designed to support various types of technical d
 
 ## Index
 
+### [Migration Guide](migration-guide.md)
+Comprehensive guide for migrating from existing versioned worker deployment systems to the Temporal Worker Controller. Includes step-by-step instructions, configuration mapping, and common patterns.
+
 ### [Limits](limits.md)
 Technical constraints and limitations of the Temporal Worker Controller system, including maximum field lengths and other operational boundaries.
 
diff --git a/docs/migration-guide.md b/docs/migration-guide.md
new file mode 100644
index 00000000..7d96d5f3
--- /dev/null
+++ b/docs/migration-guide.md
@@ -0,0 +1,631 @@
+# Migrating to Temporal Worker Controller
+
+This guide helps teams migrate from their existing versioned worker deployment systems to the Temporal Worker Controller. It assumes you are already running versioned workers using Temporal's Worker Versioning feature and want to automate the management of these deployments.
+
+## Important Terminology
+
+To avoid confusion, this guide uses specific terminology:
+
+- **Temporal Worker Deployment**: A logical grouping in Temporal (e.g., "payment-processor", "notification-sender")
+- **`TemporalWorkerDeployment` CRD**: The Kubernetes custom resource that manages one Temporal Worker Deployment
+- **Kubernetes `Deployment`**: The actual k8s Deployment resources that run worker pods (multiple per Temporal Worker Deployment, one per version)
+
+**Key Relationship**: One `TemporalWorkerDeployment` CRD → Multiple Kubernetes `Deployment` resources (managed by controller)
+
+## Table of Contents
+
+1. [Prerequisites](#prerequisites)
+2. [Understanding the Differences](#understanding-the-differences)
+3. [Migration Strategy](#migration-strategy)
+4. [Step-by-Step Migration](#step-by-step-migration)
+5. [Configuration Mapping](#configuration-mapping)
+6. [Testing and Validation](#testing-and-validation)
+7. [Common Migration Patterns](#common-migration-patterns)
+8. [Troubleshooting](#troubleshooting)
+
+## Prerequisites
+
+Before starting the migration, ensure you have:
+
+- ✅ **Existing versioned workers**: Your workers are already using Temporal's Worker Versioning feature
+- ✅ **Kubernetes cluster**: Running Kubernetes 1.19+ with CustomResourceDefinition support
+- ✅ **Worker configuration**: Workers are configured with deployment names and build IDs
+- ✅ **Rainbow deployments**: Currently managing multiple concurrent versions manually
+- ✅ **Administrative access**: Ability to install Custom Resource Definitions and controllers
+
+### Environment Variables Your Workers Should Already Use
+
+Your workers should already be configured with these standard environment variables:
+
+```bash
+TEMPORAL_HOST_PORT=your-namespace.tmprl.cloud:7233
+TEMPORAL_NAMESPACE=your-temporal-namespace
+TEMPORAL_DEPLOYMENT_NAME=your-deployment-name
+WORKER_BUILD_ID=your-build-id
+```
+
+If you're using different variable names, you'll need to update your worker code during migration.
+
+## Understanding the Differences
+
+### Before: Manual Version Management
+
+```yaml
+# You currently manage multiple deployments manually
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: my-worker-v1
+spec:
+  replicas: 3
+  template:
+    spec:
+      containers:
+      - name: worker
+        image: my-worker:v1.2.3
+        env:
+        - name: WORKER_BUILD_ID
+          value: "v1.2.3"
+---
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: my-worker-v2
+spec:
+  replicas: 3
+  template:
+    spec:
+      containers:
+      - name: worker
+        image: my-worker:v1.2.4
+        env:
+        - name: WORKER_BUILD_ID
+          value: "v1.2.4"
+```
+
+### After: Controller-Managed Versions
+
+```yaml
+# Single resource manages all versions automatically
+apiVersion: temporal.io/v1alpha1
+kind: TemporalWorkerDeployment
+metadata:
+  name: my-worker
+spec:
+  replicas: 3
+  workerOptions:
+    connection: production-temporal
+    temporalNamespace: production
+  cutover:
+    strategy: Manual  # Use Manual during migration, then switch to Progressive
+  sunset:
+    scaledownDelay: 1h
+    deleteDelay: 24h
+  template:
+    spec:
+      containers:
+      - name: worker
+        image: my-worker:v1.2.4  # Update this image to import/deploy versions
+```
+
+**Key Differences:**
+- ✅ **Single CRD resource** manages multiple versions instead of multiple manual Kubernetes Deployments
+- ✅ **Controller creates Kubernetes Deployments** automatically (one per version)
+- ✅ **Automated lifecycle** handles registration, routing, and cleanup
+- ✅ **Version history** tracked in the CRD resource status
+- ✅ **Rollout strategies** automate traffic shifting between versions
+
+## Migration Strategy
+
+### The Single Resource Requirement
+
+⚠️ **CRITICAL**: For each **Temporal Worker Deployment**, you MUST create exactly **ONE** `TemporalWorkerDeployment` CRD resource and import all existing versions into it. This is not optional - it's how the system works.
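The single-resource rule can be checked mechanically before manifests are applied. The following is a minimal bash sketch, not part of the controller: `find_duplicate_owners` is a hypothetical helper that reads `<crd-name> <key>` pairs, where the key names the logical Temporal Worker Deployment (for example `<temporal-namespace>/<deployment-name>`), and reports every key claimed by more than one `TemporalWorkerDeployment`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-apply check (not part of the controller): enforce
# "one TemporalWorkerDeployment per Temporal Worker Deployment".
# Input on stdin: one "<crd-name> <key>" pair per line, where <key> names the
# logical Temporal Worker Deployment, e.g. "production/payment-processor".
# Prints "<key>: <crd> <crd> ..." for every key with more than one owner and
# returns 1 if any duplicate was found, 0 otherwise.
find_duplicate_owners() {
  awk '
    NF >= 2 {
      owners[$2] = owners[$2] " " $1
      count[$2]++
    }
    END {
      dup = 0
      for (key in count) {
        if (count[key] > 1) {
          print key ":" owners[key]
          dup = 1
        }
      }
      exit dup
    }
  '
}

# Example: two CRDs both claim production/payment-processor.
printf '%s\n' \
  'payment-processor production/payment-processor' \
  'payment-processor-v2 production/payment-processor' \
  'notification-sender production/notification-sender' |
  find_duplicate_owners || echo "duplicate owners found" >&2
```

Including the Temporal namespace in the key matters: Pattern 2 later in this guide reuses the CRD name `my-worker` in two Kubernetes namespaces that point at different Temporal namespaces, and those are two separate Temporal Worker Deployments, not a duplicate.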
+
+**Terminology Clarification:**
+- **Temporal Worker Deployment**: A logical grouping in Temporal (e.g., "payment-processor")
+- **`TemporalWorkerDeployment` CRD**: One Kubernetes custom resource per Temporal Worker Deployment
+- **Kubernetes `Deployment`**: Multiple k8s Deployments (one per version) managed by the controller
+
+**Key Principles:**
+- **One CRD resource per Temporal Worker Deployment** - Don't create multiple `TemporalWorkerDeployment` resources for the same logical worker
+- **Use Manual strategy during import** - Prevents unwanted automatic promotions
+- **Import versions sequentially** - Update the same CRD resource to import each existing version
+- **Enable automation last** - Switch to Progressive strategy only after migration is complete
+
+### Recommended Migration Steps
+
+1. **Install the controller** in your cluster
+2. **Choose a non-critical worker** to migrate first
+3. **Create a single TemporalWorkerDeployment** with Manual strategy
+4. **Import existing versions one by one** by updating the image spec
+5. **Clean up manual deployments** after confirming controller management
+6. **Enable automated rollouts** by switching to Progressive strategy
+7. **Repeat for remaining workers** one deployment at a time
+
+## Step-by-Step Migration
+
+### Step 1: Install the Temporal Worker Controller
+
+```bash
+# Install the controller using Helm
+helm install --repo FIXME temporal-worker-controller temporal-worker-controller
+```
+
+### Step 2: Create TemporalConnection Resources
+
+Define connection parameters to your Temporal server(s):
+
+```yaml
+apiVersion: temporal.io/v1alpha1
+kind: TemporalConnection
+metadata:
+  name: production-temporal
+  namespace: default
+spec:
+  hostPort: "production.abc123.tmprl.cloud:7233"
+  mutualTLSSecret: temporal-cloud-mtls  # If using mTLS
+---
+apiVersion: temporal.io/v1alpha1
+kind: TemporalConnection
+metadata:
+  name: staging-temporal
+  namespace: default
+spec:
+  hostPort: "staging.abc123.tmprl.cloud:7233"
+  mutualTLSSecret: temporal-cloud-mtls
+```
+
+### Step 3: Map Your Current Configuration
+
+For each existing **Temporal Worker Deployment** (logical grouping), identify all currently running versions. You'll create a single `TemporalWorkerDeployment` CRD resource and import each version one by one.
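Step 3 asks for every currently running version, and Step 4 imports them oldest first. A minimal bash sketch of that ordering, assuming your image tags sort as versions (`v1.5.1` before `v1.5.2`, `v1.9.0` before `v1.10.0`) and that `sort -V` is available (it is in GNU coreutils): `import_order` is a hypothetical helper that turns image references into an ordered, de-duplicated list of tags.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads container image references (one per line) and
# prints the distinct tags in version order, oldest first.
# Assumes tags are version-sortable and that `sort -V` is available.
import_order() {
  # 1. drop any "@sha256:..." digest
  # 2. keep only the last path component, so a registry port such as
  #    "registry.example.com:5000/" is not mistaken for a tag
  # 3. print the text after the last ":" (untagged images are skipped)
  # 4. version-sort and de-duplicate
  sed -e 's/@.*$//' -e 's#^.*/##' |
    awk -F: 'NF > 1 { print $NF }' |
    sort -V -u
}

# Images from the two manual Deployments in the Step 3 example.
printf '%s\n' payment-processor:v1.5.2 payment-processor:v1.5.1 | import_order
```

In a live cluster the input could come from something like `kubectl get deployment payment-processor-v1 payment-processor-v2 -o jsonpath='{range .items[*]}{.spec.template.spec.containers[0].image}{"\n"}{end}'`; treat that as a starting point, since it assumes the worker is the first container in each pod.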
+
+**Current Manual Kubernetes Deployments:**
+```yaml
+# Version 1 (older, still has running workflows)
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: payment-processor-v1
+spec:
+  replicas: 2
+  template:
+    spec:
+      containers:
+      - name: worker
+        image: payment-processor:v1.5.1
+        env:
+        - name: WORKER_BUILD_ID
+          value: "v1.5.1"
+---
+# Version 2 (current production version)
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: payment-processor-v2
+spec:
+  replicas: 5
+  template:
+    spec:
+      containers:
+      - name: worker
+        image: payment-processor:v1.5.2
+        env:
+        - name: WORKER_BUILD_ID
+          value: "v1.5.2"
+```
+
+**Single `TemporalWorkerDeployment` CRD Resource (CRITICAL: Use Manual strategy):**
+```yaml
+apiVersion: temporal.io/v1alpha1
+kind: TemporalWorkerDeployment
+metadata:
+  name: payment-processor
+  labels:
+    app: payment-processor
+spec:
+  replicas: 5
+  workerOptions:
+    connection: production-temporal
+    temporalNamespace: production
+  # IMPORTANT: Use Manual during migration to prevent unwanted promotions
+  cutover:
+    strategy: Manual
+  sunset:
+    scaledownDelay: 30m
+    deleteDelay: 2h
+  template:
+    metadata:
+      annotations:
+        prometheus.io/scrape: "true"
+    spec:
+      containers:
+      - name: worker
+        image: payment-processor:v1.5.1  # Start with oldest version
+        resources:
+          requests:
+            memory: "512Mi"
+            cpu: "250m"
+        # Note: TEMPORAL_* env vars are set automatically by controller
+```
+
+### Step 4: Import Existing Versions One by One
+
+⚠️ **CRITICAL**: Use a single `TemporalWorkerDeployment` CRD resource with `strategy: Manual` to import all existing versions.
+
+1. **Create the `TemporalWorkerDeployment` CRD with the oldest version:**
+   ```bash
+   kubectl apply -f payment-processor-migration.yaml
+   ```
+
+2. **Wait for the first version to be registered:**
+   ```bash
+   # Monitor until status shows the version is registered and current
+   kubectl get temporalworkerdeployment payment-processor -w
+   ```
+
+   The controller will create a Kubernetes `Deployment` resource (e.g., `payment-processor-v1.5.1`) for this version.
+
+3. **Import the next version by updating the CRD resource:**
+   ```bash
+   # Edit the TemporalWorkerDeployment CRD to use the next image version.
+   # Replace only the image: a --type='merge' patch would replace the whole
+   # containers list and drop the resources block defined in Step 3.
+   kubectl patch temporalworkerdeployment payment-processor --type='json' -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"payment-processor:v1.5.2"}]'
+   ```
+
+4. **Wait for the new version to be registered:**
+   ```bash
+   # Monitor until the new version appears in status.targetVersion
+   kubectl get temporalworkerdeployment payment-processor -o yaml
+   ```
+
+   You should see:
+   ```yaml
+   status:
+     currentVersion:
+       versionID: "payment-processor.v1.5.1"
+       status: Current
+     targetVersion:
+       versionID: "payment-processor.v1.5.2"
+       status: Inactive  # Will be Inactive because strategy is Manual
+   ```
+
+   The controller will create another Kubernetes `Deployment` resource (e.g., `payment-processor-v1.5.2`) for this new version.
+
+5. **Repeat for all existing versions** until all are imported under controller management.
+
+   Each version update will result in a new Kubernetes `Deployment` being created by the controller.
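`kubectl patch --type='merge'` sends a JSON merge patch (RFC 7386), which replaces arrays as a whole. A merge patch whose `containers` list holds only `name` and `image` therefore also discards every other container field, such as the `resources` requests in the Step 3 example. A JSON Patch (RFC 6902) `replace` operation touches only the image. The bash sketch below builds that document; `image_json_patch` is a hypothetical helper, and it assumes the worker is the container at the given index (default 0) and that the image string needs no JSON escaping.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a JSON Patch (RFC 6902) that replaces only the
# image of one container in a TemporalWorkerDeployment pod template.
#   $1 - new image reference, e.g. payment-processor:v1.5.2
#   $2 - container index (optional, defaults to 0)
# Assumes the image contains no characters that need JSON escaping.
image_json_patch() {
  local image="$1"
  local index="${2:-0}"
  printf '[{"op":"replace","path":"/spec/template/spec/containers/%s/image","value":"%s"}]\n' \
    "$index" "$image"
}

image_json_patch payment-processor:v1.5.2

# Against a cluster (not run here):
#   kubectl patch temporalworkerdeployment payment-processor --type='json' \
#     -p "$(image_json_patch payment-processor:v1.5.2)"
```

Generating the patch in one place keeps CI/CD pipelines from hand-writing JSON for every release.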
+
+### Step 5: Validate All Versions Are Imported
+
+Confirm all your existing versions are now managed by the controller:
+
+```bash
+# Check all versions are registered in the CRD status
+kubectl get temporalworkerdeployment payment-processor -o yaml
+
+# Verify controller-managed Kubernetes Deployments exist for all versions
+kubectl get deployments -l temporal.io/managed-by=temporal-worker-controller
+
+# Check pods are running for all versions
+kubectl get pods -l temporal.io/worker-deployment=payment-processor
+```
+
+You should see multiple Kubernetes `Deployment` resources created by the controller, one for each version you imported.
+
+### Step 6: Clean Up Manual Kubernetes Deployments
+
+⚠️ **Only after confirming all versions are imported and working:**
+
+```bash
+# Scale down your original manual Kubernetes Deployments
+kubectl scale deployment payment-processor-v1 --replicas=0
+kubectl scale deployment payment-processor-v2 --replicas=0
+
+# Monitor that the controller-managed Kubernetes Deployments are handling all traffic
+# Check Temporal UI to verify workflows are running on controller-managed versions
+
+# Delete the original manual Kubernetes Deployments
+kubectl delete deployment payment-processor-v1 payment-processor-v2
+```
+
+At this point, you should only have controller-managed Kubernetes `Deployment` resources running your workers.
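Before running the Step 6 commands, it can help to list exactly which Deployments lack the controller's label. A minimal bash sketch: `unmanaged_deployments` is a hypothetical helper that takes two files of Deployment names (all Deployments, and the controller-managed ones) and prints the names that appear only in the first, using `comm` on sorted input. The `temporal.io/managed-by=temporal-worker-controller` selector is the one this guide uses in Step 5; confirm it against the labels your controller version actually applies.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints Deployment names listed in file $1 (all
# Deployments) but not in file $2 (controller-managed Deployments).
# Both inputs are sorted first because comm requires sorted input.
unmanaged_deployments() {
  local all="$1" managed="$2"
  comm -23 <(sort -u "$all") <(sort -u "$managed")
}

# Self-contained example using temporary files.
tmp="$(mktemp -d)"
printf '%s\n' payment-processor-v1 payment-processor-v2 \
  payment-processor-v1.5.1 payment-processor-v1.5.2 > "$tmp/all"
printf '%s\n' payment-processor-v1.5.1 payment-processor-v1.5.2 > "$tmp/managed"
unmanaged_deployments "$tmp/all" "$tmp/managed"   # the two manual Deployments
rm -rf "$tmp"

# Against a cluster (not run here):
#   unmanaged_deployments \
#     <(kubectl get deployments -o name) \
#     <(kubectl get deployments -l temporal.io/managed-by=temporal-worker-controller -o name)
```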
+
+### Step 7: Enable Automated Rollouts
+
+Once migration is complete and all legacy versions are imported, update your deployment system to use automated rollouts:
+
+```bash
+# Update the TemporalWorkerDeployment CRD to use Progressive strategy for new deployments
+kubectl patch temporalworkerdeployment payment-processor --type='merge' -p='{
+  "spec": {
+    "cutover": {
+      "strategy": "Progressive",
+      "steps": [
+        {"rampPercentage": 5, "pauseDuration": "5m"},
+        {"rampPercentage": 25, "pauseDuration": "10m"},
+        {"rampPercentage": 50, "pauseDuration": "15m"}
+      ]
+    }
+  }
+}'
+```
+
+From this point forward:
+- **New worker versions** will automatically follow the Progressive rollout strategy
+- **Your CI/CD system** should update the `spec.template.spec.containers[0].image` field in the `TemporalWorkerDeployment` CRD to deploy new versions
+- **The controller** will automatically create new Kubernetes `Deployment` resources and handle version registration, traffic routing, and cleanup
+
+Example CI/CD update command:
+```bash
+# Your deployment pipeline should update only the CRD image field, like this
+# (a --type='merge' patch of the containers list would drop other container fields):
+kubectl patch temporalworkerdeployment payment-processor --type='json' -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"payment-processor:v1.6.0"}]'
+```
+
+The controller will automatically create a new Kubernetes `Deployment` (e.g., `payment-processor-v1.6.0`) and manage the rollout.
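The Step 7 steps also set a lower bound on how long a new version spends ramping: the pauses alone add up to 5m + 10m + 15m = 30 minutes (1800 seconds). A minimal bash sketch of that arithmetic, assuming each `pauseDuration` is served in full and uses a single `s`, `m`, or `h` unit (compound values such as `1h30m` are rejected); `to_seconds` and `total_pause` are hypothetical helpers, not controller code.

```bash
#!/usr/bin/env bash
# Hypothetical helper: converts a single-unit duration (90s, 5m, 1h) to seconds.
to_seconds() {
  if [[ "$1" =~ ^([0-9]+)([smh])$ ]]; then
    local n=$((10#${BASH_REMATCH[1]}))   # force base 10 so "08" is not octal
    case "${BASH_REMATCH[2]}" in
      s) echo "$n" ;;
      m) echo $(( n * 60 )) ;;
      h) echo $(( n * 3600 )) ;;
    esac
  else
    echo "unsupported duration: $1" >&2
    return 1
  fi
}

# Hypothetical helper: sums the pauseDuration values of a Progressive rollout.
total_pause() {
  local total=0 d secs
  for d in "$@"; do
    secs="$(to_seconds "$d")" || return 1
    total=$(( total + secs ))
  done
  echo "$total"
}

total_pause 5m 10m 15m   # the Step 7 steps: 1800 seconds
```

A figure like this is useful when choosing CI/CD timeouts, since a pipeline that waits for promotion must allow at least the summed pauses.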
+
+## Configuration Mapping
+
+### Rollout Strategies
+
+**Manual Strategy (Default Behavior):**
+```yaml
+cutover:
+  strategy: Manual
+# Requires manual intervention to promote versions
+```
+
+**Immediate Cutover:**
+```yaml
+cutover:
+  strategy: AllAtOnce
+# Immediately routes 100% traffic to new version when healthy
+```
+
+**Progressive Rollout:**
+```yaml
+cutover:
+  strategy: Progressive
+  steps:
+  - rampPercentage: 1
+    pauseDuration: 5m
+  - rampPercentage: 10
+    pauseDuration: 10m
+  - rampPercentage: 50
+    pauseDuration: 15m
+  gate:
+    workflowType: "HealthCheck"  # Optional validation workflow
+```
+
+### Sunset Configuration
+
+```yaml
+sunset:
+  scaledownDelay: 1h  # Wait 1 hour after draining before scaling to 0
+  deleteDelay: 24h    # Wait 24 hours after draining before deleting
+```
+
+### Resource Management
+
+**CPU and Memory:**
+```yaml
+template:
+  spec:
+    containers:
+    - name: worker
+      resources:
+        requests:
+          memory: "1Gi"
+          cpu: "500m"
+        limits:
+          memory: "2Gi"
+          cpu: "1"
+```
+
+**Pod Annotations and Labels:**
+```yaml
+template:
+  metadata:
+    annotations:
+      prometheus.io/scrape: "true"
+      prometheus.io/port: "9090"
+    labels:
+      team: payments
+      environment: production
+```
+
+## Testing and Validation
+
+### Pre-Migration Testing
+
+1. **Test in non-production environment:**
+   ```bash
+   # Create test TemporalWorkerDeployment
+   kubectl apply -f test-worker.yaml
+
+   # Verify worker registration
+   kubectl logs -l temporal.io/worker-deployment=test-worker
+   ```
+
+2. **Validate environment variables:**
+   ```bash
+   kubectl exec -it deployment/test-worker-v1 -- env | grep TEMPORAL
+   ```
+
+### Post-Migration Validation
+
+1. **Check version status:**
+   ```bash
+   kubectl get temporalworkerdeployment -o wide
+   ```
+
+2. **Monitor version transitions:**
+   ```bash
+   kubectl get events --field-selector involvedObject.kind=TemporalWorkerDeployment
+   ```
+
+3. **Validate workflow routing:**
+   - Start test workflows
+   - Verify they're routed to correct versions
+   - Check Temporal UI for version distribution
+
+## Common Migration Patterns
+
+⚠️ **Remember**: Each pattern below represents **separate Temporal Worker Deployments**. Each gets exactly **one** `TemporalWorkerDeployment` CRD resource.
+
+### Pattern 1: Microservices with Multiple Workers
+
+```yaml
+# payment-service workers (ONE CRD for this Temporal Worker Deployment)
+apiVersion: temporal.io/v1alpha1
+kind: TemporalWorkerDeployment
+metadata:
+  name: payment-processor
+spec:
+  workerOptions:
+    connection: production-temporal
+    temporalNamespace: payments
+  cutover:
+    strategy: Manual  # Use Manual during migration
+  # ... rest of config
+---
+# notification-service workers (separate Temporal Worker Deployment = separate CRD)
+apiVersion: temporal.io/v1alpha1
+kind: TemporalWorkerDeployment
+metadata:
+  name: notification-sender
+spec:
+  workerOptions:
+    connection: production-temporal
+    temporalNamespace: notifications
+  cutover:
+    strategy: Manual  # Use Manual during migration
+  # ... rest of config
+```
+
+### Pattern 2: Environment-Specific Deployments
+
+```yaml
+# Production (use Manual during migration, then switch to Progressive)
+apiVersion: temporal.io/v1alpha1
+kind: TemporalWorkerDeployment
+metadata:
+  name: my-worker
+  namespace: production
+spec:
+  workerOptions:
+    connection: production-temporal
+    temporalNamespace: production
+  cutover:
+    strategy: Manual  # Use Manual during migration, then switch to Progressive
+---
+# Staging (use Manual during migration, then switch to AllAtOnce)
+apiVersion: temporal.io/v1alpha1
+kind: TemporalWorkerDeployment
+metadata:
+  name: my-worker
+  namespace: staging
+spec:
+  workerOptions:
+    connection: staging-temporal
+    temporalNamespace: staging
+  cutover:
+    strategy: Manual  # Use Manual during migration, then switch to AllAtOnce
+```
+
+
+
+## Troubleshooting
+
+### Common Issues
+
+**1. Workers Not Registering with Temporal**
+
+*Symptoms:*
+```
+status:
+  targetVersion:
+    status: NotRegistered
+```
+
+*Solutions:*
+- Check worker logs for connection errors
+- Verify TemporalConnection configuration
+- Ensure TLS secrets are properly configured
+- Verify network connectivity to Temporal server
+
+**2. Version Stuck in Ramping State**
+
+*Symptoms:*
+```
+status:
+  targetVersion:
+    status: Ramping
+    rampPercentage: 5
+```
+
+*Solutions:*
+- Check if gate workflow is configured and completing successfully
+- Verify progressive rollout steps are reasonable
+- Check controller logs for errors
+
+**3. Old Versions Not Being Cleaned Up**
+
+*Symptoms:*
+- Multiple old Deployments still exist
+- Deprecated versions not transitioning to Drained
+
+*Solutions:*
+- Check if workflows are still running on old versions
+- Verify sunset configuration is reasonable
+- Check Temporal UI for workflow status
+
+### Debugging Commands
+
+```bash
+# Check controller logs
+kubectl logs -n temporal-worker-controller-system deployment/controller-manager
+
+# Check worker status
+kubectl describe temporalworkerdeployment my-worker
+
+# Check managed deployments
+kubectl get deployments -l temporal.io/managed-by=temporal-worker-controller
+
+# Check worker logs
+kubectl logs -l temporal.io/worker-deployment=my-worker
+
+# Check controller events
+kubectl get events --field-selector involvedObject.kind=TemporalWorkerDeployment
+```
+
+### Getting Help
+
+1. **Check controller logs** for error messages
+2. **Review TemporalWorkerDeployment status** for detailed state information
+3. **Verify Temporal server connectivity** from worker pods
+4. **File issues** at the project repository with logs and configuration
+
+## Migration Summary
+
+🎯 **Remember the core principle**:
+- **One `TemporalWorkerDeployment` CRD per Temporal Worker Deployment**
+- **Manual strategy during migration** to prevent unwanted promotions
+- **Import existing versions sequentially** by updating the same CRD resource
+- **Enable automation only after migration is complete**
+
+**Resource Relationship:**
+- **Before**: Multiple manual Kubernetes `Deployment` resources per worker
+- **After**: One `TemporalWorkerDeployment` CRD → Controller creates multiple Kubernetes `Deployment` resources (one per version)
+
+This approach ensures a smooth transition from manual version management to controller automation without disrupting running workflows.
+
+## Next Steps
+
+After successful migration:
+
+1. **Set up monitoring** for your TemporalWorkerDeployment resources
+2. **Update CI/CD pipelines** to patch TemporalWorkerDeployment image specs instead of managing Deployments directly
+3. **Configure alerting** on version transition failures
+4. **Train your team** on the new deployment process (single resource updates vs multiple Deployment management)
+5. **Document your specific configuration** patterns for future reference
+
+The Temporal Worker Controller should significantly reduce the operational overhead of managing versioned worker deployments while providing better automation and safety for your workflow deployments.
\ No newline at end of file

From 9bf4fd75262ec259dc747d318651c4b0e4a7b3cc Mon Sep 17 00:00:00 2001
From: Rob Holland
Date: Fri, 22 Aug 2025 16:19:58 +0100
Subject: [PATCH 02/25] Split concepts into separate file. Refocus migration guide on unversioned -> versioned.
---
 docs/concepts.md | 160 ++++++++++++
 docs/migration-guide.md | 559 ++++++++++++++++++++++++----------------
 2 files changed, 499 insertions(+), 220 deletions(-)
 create mode 100644 docs/concepts.md

diff --git a/docs/concepts.md b/docs/concepts.md
new file mode 100644
index 00000000..b19d6e04
--- /dev/null
+++ b/docs/concepts.md
@@ -0,0 +1,160 @@
+# Temporal Worker Controller Concepts
+
+This document defines key concepts and terminology used throughout the Temporal Worker Controller documentation.
+
+## Core Terminology
+
+### Temporal Worker Deployment
+A logical grouping in Temporal that represents a collection of workers that handle the same set of workflows and activities. Examples include "payment-processor", "notification-sender", or "data-pipeline-worker". This is a concept within Temporal itself, not specific to Kubernetes.
+
+**Key characteristics:**
+- Identified by a unique deployment name (e.g., "payment-processor")
+- Can have multiple concurrent versions running simultaneously
+- Versions are identified by Build IDs (e.g., "v1.5.1", "v1.5.2")
+- Temporal routes workflow executions to appropriate versions based on compatibility rules
+
+### `TemporalWorkerDeployment` CRD
+The Kubernetes Custom Resource Definition that manages one Temporal Worker Deployment. This is the primary resource you interact with when using the Temporal Worker Controller.
+
+**Key characteristics:**
+- One CRD resource per Temporal Worker Deployment
+- Manages the lifecycle of all versions for that deployment
+- Defines rollout strategies, resource requirements, and connection details
+- Controller creates and manages multiple Kubernetes `Deployment` resources based on this spec
+
+### Kubernetes `Deployment`
+The actual Kubernetes Deployment resources that run worker pods. The controller automatically creates these - you don't manage them directly.
+
+**Key characteristics:**
+- Multiple Kubernetes `Deployment` resources per `TemporalWorkerDeployment` CRD (one per version)
+- Named with the pattern: `{deployment-name}-{build-id}` (e.g., `payment-processor-v1.5.1`)
+- Managed entirely by the controller - created, updated, and deleted automatically
+- Each runs a specific version of your worker code
+
+### Key Relationship
+**One `TemporalWorkerDeployment` CRD → Multiple Kubernetes `Deployment` resources (managed by controller)**
+
+This is the fundamental architecture: you manage a single CRD resource, and the controller handles all the underlying Kubernetes `Deployment` resources for different versions.
+
+## Version States
+
+Worker deployment versions progress through various states during their lifecycle:
+
+### NotRegistered
+The version has been specified in the CRD but hasn't been registered with Temporal yet. This typically happens when:
+- The worker pods are still starting up
+- There are connectivity issues to Temporal
+- The worker code has errors preventing registration
+
+### Inactive
+The version is registered with Temporal but isn't receiving any new workflow executions. This is the initial state for new versions when using Manual rollout strategy.
+
+### Ramping
+The version is receiving a percentage of new workflow executions as part of a Progressive rollout. The percentage gradually increases according to the configured rollout steps.
+
+### Current
+The version is receiving 100% of new workflow executions. This is the "production" version that handles all new work.
+
+### Draining
+The version is no longer receiving new workflow executions but may still be processing existing workflows. The controller waits for all workflows on this version to complete.
+
+### Drained
+All workflows on this version have completed. The version is ready for cleanup according to the sunset configuration.
+
+## Rollout Strategies
+
+### Manual Strategy
+Requires explicit human intervention to promote versions. New versions remain in the `Inactive` state until manually promoted.
+
+**Use cases:**
+- During migration from manual deployment systems
+- High-risk production environments requiring human approval
+- Testing and validation scenarios
+
+### AllAtOnce Strategy
+Immediately routes 100% of new workflow executions to the new version once it's healthy and registered.
+
+**Use cases:**
+- Non-production environments
+- Low-risk deployments
+- When you want immediate cutover without gradual rollout
+
+### Progressive Strategy
+Gradually increases the percentage of new workflow executions routed to the new version according to configured steps.
+
+**Use cases:**
+- Production deployments where you want to validate new versions gradually
+- When you want automated rollouts with built-in safety checks
+- Deployments that benefit from canary analysis
+
+## Configuration Concepts
+
+### Worker Options
+Configuration that defines how workers connect to Temporal:
+- **connection**: Reference to a `TemporalConnection` resource
+- **temporalNamespace**: The Temporal namespace to connect to
+- **deploymentName**: The logical deployment name in Temporal (auto-generated if not specified)
+
+### Cutover Configuration
+Defines how new versions are promoted:
+- **strategy**: Manual, AllAtOnce, or Progressive
+- **steps**: For Progressive strategy, defines ramp percentages and pause durations
+- **gate**: Optional workflow that must succeed before promotion continues
+
+### Sunset Configuration
+Defines how old versions are cleaned up:
+- **scaledownDelay**: How long to wait after draining before scaling pods to zero
+- **deleteDelay**: How long to wait after draining before deleting the Kubernetes `Deployment`
+
+### Template
+The pod template used for all versions of this deployment. Similar to a standard Kubernetes Deployment template but managed by the controller.
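Both sunset delays are measured from the moment a version is drained, so they run in parallel rather than back to back. A minimal bash sketch of the resulting schedule, assuming single-unit durations and Unix timestamps; `sunset_schedule` and `to_seconds` are hypothetical helpers, not controller code. The sketch also warns when `deleteDelay` is shorter than `scaledownDelay`, since the Deployment would then be deleted before its scale-down point is reached.

```bash
#!/usr/bin/env bash
# Hypothetical helper: converts a single-unit duration (90s, 5m, 1h) to seconds.
to_seconds() {
  if [[ "$1" =~ ^([0-9]+)([smh])$ ]]; then
    local n=$((10#${BASH_REMATCH[1]}))   # force base 10 so "08" is not octal
    case "${BASH_REMATCH[2]}" in
      s) echo "$n" ;;
      m) echo $(( n * 60 )) ;;
      h) echo $(( n * 3600 )) ;;
    esac
  else
    echo "unsupported duration: $1" >&2
    return 1
  fi
}

# Hypothetical helper: given the Unix time a version became Drained and its
# scaledownDelay / deleteDelay, prints when the pods are scaled to zero and
# when the Kubernetes Deployment is deleted. Both delays count from the
# drain time, as described under Sunset Configuration.
sunset_schedule() {
  local drained_at="$1" scaledown delete
  scaledown="$(to_seconds "$2")" || return 1
  delete="$(to_seconds "$3")" || return 1
  if (( delete < scaledown )); then
    echo "warning: deleteDelay is shorter than scaledownDelay" >&2
  fi
  echo "scaledown_at=$(( drained_at + scaledown ))"
  echo "delete_at=$(( drained_at + delete ))"
}

# Delays from the Step 3 example in the migration guide: 30m and 2h.
sunset_schedule 1700000000 30m 2h
```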
+
+## Environment Variables
+
+The controller automatically sets these environment variables for all worker pods:
+
+### TEMPORAL_HOST_PORT
+The host and port of the Temporal server, derived from the `TemporalConnection` resource.
+
+### TEMPORAL_NAMESPACE
+The Temporal namespace the worker should connect to, from `spec.workerOptions.temporalNamespace`.
+
+### TEMPORAL_DEPLOYMENT_NAME
+The deployment name in Temporal, either from `spec.workerOptions.deploymentName` or auto-generated from the CRD name.
+
+### WORKER_BUILD_ID
+The build ID for this specific version, derived from the container image tag or explicitly set.
+
+## Resource Management Concepts
+
+### Rainbow Deployments
+The pattern of running multiple versions of the same worker simultaneously. This is essential for maintaining workflow determinism in Temporal, as running workflows must continue executing on the version they started with.
+
+### Version Lifecycle Management
+The automated process of:
+1. Registering new versions with Temporal
+2. Gradually routing traffic to new versions
+3. Draining old versions once they're no longer needed
+4. Cleaning up resources for drained versions
+
+### Controller-Managed Resources
+Resources that are created, updated, and deleted automatically by the controller:
+- Kubernetes `Deployment` resources for each version
+- ConfigMaps and Secrets as needed
+- Service accounts and RBAC resources
+- Labels and annotations for tracking and management
+
+## Migration Concepts
+
+### Import Process
+The process of bringing existing manually-managed worker deployments under controller management. This involves:
+1. Creating a `TemporalWorkerDeployment` CRD with Manual strategy
+2. Sequentially updating the image spec to register each existing version
+3. Cleaning up original manual Kubernetes `Deployment` resources
+4. Enabling automated rollouts
+
+### Single Resource Requirement
+The critical principle that each Temporal Worker Deployment must be managed by exactly one `TemporalWorkerDeployment` CRD resource. You cannot split a single logical deployment across multiple CRD resources.
+
+### Legacy Version Handling
+The process of ensuring that existing worker versions continue running during migration, maintaining workflow determinism while transitioning to controller management.
diff --git a/docs/migration-guide.md b/docs/migration-guide.md
index 7d96d5f3..0918df23 100644
--- a/docs/migration-guide.md
+++ b/docs/migration-guide.md
@@ -1,61 +1,70 @@
-# Migrating to Temporal Worker Controller
+# Migrating from Unversioned to Versioned Workflows with Temporal Worker Controller
 
-This guide helps teams migrate from their existing versioned worker deployment systems to the Temporal Worker Controller. It assumes you are already running versioned workers using Temporal's Worker Versioning feature and want to automate the management of these deployments.
+This guide helps teams migrate from unversioned Temporal workflows to versioned workflows using the Temporal Worker Controller. It assumes you are currently running workers without Temporal's Worker Versioning feature and want to adopt versioned deployments for safer, more controlled rollouts.
 
-## Important Terminology
+## Important Note
 
-To avoid confusion, this guide uses specific terminology:
+This guide uses specific terminology that is defined in the [Concepts](concepts.md) document. Please review the concepts document first to understand key terms like **Temporal Worker Deployment**, **`TemporalWorkerDeployment` CRD**, and **Kubernetes `Deployment`**, as well as the relationship between them.
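The Environment Variables section of `concepts.md` says `WORKER_BUILD_ID` is derived from the container image tag when it is not set explicitly. The exact rule is not spelled out there, so the bash sketch below is only one plausible reading, and the real controller may compute build IDs differently: `build_id_from_image` is a hypothetical helper that ignores any `@digest`, looks only at the final path component (so a registry port is not mistaken for a tag), and falls back to `latest` for an untagged image, which is itself an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "build ID derived from the container image tag".
# The real controller may compute build IDs differently.
build_id_from_image() {
  local ref="${1%%@*}"      # drop an "@sha256:..." digest, if present
  local last="${ref##*/}"   # keep only the final path component
  if [[ "$last" == *:* ]]; then
    echo "${last##*:}"      # text after the last ":" is the tag
  else
    echo "latest"           # assumption: untagged means the implicit :latest
  fi
}

build_id_from_image payment-processor:v1.5.2                         # v1.5.2
build_id_from_image registry.example.com:5000/team/payment-processor  # latest
```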
 
-- **Temporal Worker Deployment**: A logical grouping in Temporal (e.g., "payment-processor", "notification-sender")
-- **`TemporalWorkerDeployment` CRD**: The Kubernetes custom resource that manages one Temporal Worker Deployment
-- **Kubernetes `Deployment`**: The actual k8s Deployment resources that run worker pods (multiple per Temporal Worker Deployment, one per version)
+## Table of Contents
 
-**Key Relationship**: One `TemporalWorkerDeployment` CRD → Multiple Kubernetes `Deployment` resources (managed by controller)
+1. [Why Migrate to Versioned Workflows](#why-migrate-to-versioned-workflows)
+2. [Prerequisites](#prerequisites)
+3. [Understanding the Differences](#understanding-the-differences)
+4. [Migration Strategy](#migration-strategy)
+5. [Step-by-Step Migration](#step-by-step-migration)
+6. [Configuration Reference](#configuration-reference)
+7. [Testing and Validation](#testing-and-validation)
+8. [Common Migration Patterns](#common-migration-patterns)
+9. [Troubleshooting](#troubleshooting)
 
-## Table of Contents
+## Why Migrate to Versioned Workflows
+
+If you're currently running unversioned Temporal workflows, you may be experiencing challenges with deployments:
+
+- **Deployment Risk**: Code changes can break running workflows if they're not backward compatible
+- **Rollback Complexity**: Rolling back deployments can disrupt in-flight workflows
+- **Workflow Determinism Issues**: Changes to workflow logic can cause non-deterministic errors
 
-1. [Prerequisites](#prerequisites)
-2. [Understanding the Differences](#understanding-the-differences)
-3. [Migration Strategy](#migration-strategy)
-4. [Step-by-Step Migration](#step-by-step-migration)
-5. [Configuration Mapping](#configuration-mapping)
-6. [Testing and Validation](#testing-and-validation)
-7. [Common Migration Patterns](#common-migration-patterns)
-8. [Troubleshooting](#troubleshooting)
+Versioned workflows with the Temporal Worker Controller solve these problems by:
+
+- ✅ **Safe Deployments**: New versions run alongside old ones, ensuring running workflows complete successfully
+- ✅ **Automated Rollouts**: Progressive rollout strategies reduce risk of new deployments
+- ✅ **Easy Rollbacks**: Can instantly route new workflows back to previous versions
+- ✅ **Workflow Continuity**: Running workflows can continue on their original version until completion
 
 ## Prerequisites
 
 Before starting the migration, ensure you have:
 
-- ✅ **Existing versioned workers**: Your workers are already using Temporal's Worker Versioning feature
+- ✅ **Unversioned Temporal workers**: Currently running workers without Worker Versioning
 - ✅ **Kubernetes cluster**: Running Kubernetes 1.19+ with CustomResourceDefinition support
-- ✅ **Worker configuration**: Workers are configured with deployment names and build IDs
-- ✅ **Rainbow deployments**: Currently managing multiple concurrent versions manually
+- ✅ **Basic worker configuration**: Workers connect to Temporal with namespace and task queue configuration
 - ✅ **Administrative access**: Ability to install Custom Resource Definitions and controllers
+- ✅ **Deployment pipeline**: Existing CI/CD system that can be updated to use new deployment method
 
-### Environment Variables Your Workers Should Already Use
+### Current Environment Variables
 
-Your workers should already be configured with these standard environment variables:
+Your workers are likely configured with basic environment variables like:
 
 ```bash
 TEMPORAL_HOST_PORT=your-namespace.tmprl.cloud:7233
 TEMPORAL_NAMESPACE=your-temporal-namespace
-TEMPORAL_DEPLOYMENT_NAME=your-deployment-name
-WORKER_BUILD_ID=your-build-id
+# No TEMPORAL_DEPLOYMENT_NAME or WORKER_BUILD_ID yet
 ```
 
-If you're using different variable names, you'll need to update your worker code during migration.
+The controller will automatically add the versioning-related environment variables during migration. ## Understanding the Differences -### Before: Manual Version Management +### Before: Unversioned Workers ```yaml -# You currently manage multiple deployments manually +# Single deployment, all workers run the same code apiVersion: apps/v1 kind: Deployment metadata: - name: my-worker-v1 + name: my-worker spec: replicas: 3 template: @@ -64,29 +73,23 @@ spec: - name: worker image: my-worker:v1.2.3 env: - - name: WORKER_BUILD_ID - value: "v1.2.3" ---- -apiVersion: apps/v1 -kind: Deployment -metadata: - name: my-worker-v2 -spec: - replicas: 3 - template: - spec: - containers: - - name: worker - image: my-worker:v1.2.4 - env: - - name: WORKER_BUILD_ID - value: "v1.2.4" + - name: TEMPORAL_HOST_PORT + value: "production.tmprl.cloud:7233" + - name: TEMPORAL_NAMESPACE + value: "production" + # No versioning environment variables ``` -### After: Controller-Managed Versions +**Current Deployment Process:** +1. Build new worker image +2. Update Deployment with new image +3. Kubernetes rolls out new pods, terminating old ones +4. 
**Risk**: Running workflows may fail if code changes break compatibility + +### After: Versioned Workers with Controller ```yaml -# Single resource manages all versions automatically +# Single CRD manages multiple versions automatically apiVersion: temporal.io/v1alpha1 kind: TemporalWorkerDeployment metadata: @@ -97,7 +100,12 @@ spec: connection: production-temporal temporalNamespace: production cutover: - strategy: Manual # Use Manual during migration, then switch to Progressive + strategy: Progressive # Gradual rollout of new versions + steps: + - rampPercentage: 10 + pauseDuration: 5m + - rampPercentage: 50 + pauseDuration: 10m sunset: scaledownDelay: 1h deleteDelay: 24h @@ -105,42 +113,64 @@ spec: spec: containers: - name: worker - image: my-worker:v1.2.4 # Update this image to import/deploy versions + image: my-worker:v1.2.4 # Update this to deploy new versions + # Note: Controller automatically adds versioning environment variables: + # TEMPORAL_HOST_PORT, TEMPORAL_NAMESPACE, TEMPORAL_DEPLOYMENT_NAME, WORKER_BUILD_ID ``` -**Key Differences:** -- ✅ **Single CRD resource** manages multiple versions instead of multiple manual Kubernetes Deployments -- ✅ **Controller creates Kubernetes Deployments** automatically (one per version) -- ✅ **Automated lifecycle** handles registration, routing, and cleanup -- ✅ **Version history** tracked in the CRD resource status -- ✅ **Rollout strategies** automate traffic shifting between versions +**New Deployment Process:** +1. Build new worker image +2. Update `TemporalWorkerDeployment` CRD with new image +3. Controller creates new Kubernetes `Deployment` for the new version +4. Controller gradually routes new workflows to new version +5. Old version continues handling existing workflows until they complete +6. 
**Safety**: No disruption to running workflows, automated rollout control + +**Key Benefits:** +- ✅ **Zero-disruption deployments** - Running workflows continue on original version +- ✅ **Automated version management** - Controller handles registration and routing +- ✅ **Progressive rollouts** - Gradual traffic shifting with automatic pause points +- ✅ **Easy rollbacks** - Instantly route new workflows back to previous version +- ✅ **Workflow continuity** - Deterministic execution preserved across deployments ## Migration Strategy -### The Single Resource Requirement +### Safe Migration Approach -⚠️ **CRITICAL**: For each **Temporal Worker Deployment**, you MUST create exactly **ONE** `TemporalWorkerDeployment` CRD resource and import all existing versions into it. This is not optional - it's how the system works. - -**Terminology Clarification:** -- **Temporal Worker Deployment**: A logical grouping in Temporal (e.g., "payment-processor") -- **`TemporalWorkerDeployment` CRD**: One Kubernetes custom resource per Temporal Worker Deployment -- **Kubernetes `Deployment`**: Multiple k8s Deployments (one per version) managed by the controller +The migration from unversioned to versioned workflows requires careful planning to avoid disrupting running workflows. The key is to transition gradually while maintaining workflow continuity. **Key Principles:** -- **One CRD resource per Temporal Worker Deployment** - Don't create multiple `TemporalWorkerDeployment` resources for the same logical worker -- **Use Manual strategy during import** - Prevents unwanted automatic promotions -- **Import versions sequentially** - Update the same CRD resource to import each existing version -- **Enable automation last** - Switch to Progressive strategy only after migration is complete - -### Recommended Migration Steps - -1. **Install the controller** in your cluster -2. **Choose a non-critical worker** to migrate first -3. 
**Create a single TemporalWorkerDeployment** with Manual strategy -4. **Import existing versions one by one** by updating the image spec -5. **Clean up manual deployments** after confirming controller management -6. **Enable automated rollouts** by switching to Progressive strategy -7. **Repeat for remaining workers** one deployment at a time +- **Start with Manual strategy** - Prevents automatic promotions during initial setup +- **Migrate one worker deployment at a time** - Reduces risk and allows learning +- **Test thoroughly in non-production** - Validate the approach before production migration +- **Preserve running workflows** - Ensure in-flight workflows complete successfully +- **Enable automation gradually** - Move to Progressive rollouts only after validation + +### Migration Phases + +#### Phase 1: Preparation +1. **Install the controller** in non-production environments +2. **Update worker code** to support versioning (if needed) +3. **Test migration process** with non-critical workers +4. **Prepare CI/CD pipeline changes** for new deployment method + +#### Phase 2: Initial Migration +1. **Choose lowest-risk worker** to migrate first +2. **Create `TemporalWorkerDeployment` CRD** with Manual strategy +3. **Validate controller management** works correctly +4. **Update deployment pipeline** for this worker + +#### Phase 3: Gradual Rollout +1. **Migrate remaining workers** one at a time +2. **Enable Progressive rollouts** for validated workers +3. **Monitor and tune** rollout configurations +4. **Train team** on new deployment process + +### Recommended Migration Order + +1. **Background/batch processing workers** - Lower risk, easier to validate +2. **Internal service workers** - Limited external impact +3. 
**Customer-facing workers** - Highest risk, migrate last with most care ## Step-by-Step Migration @@ -148,7 +178,10 @@ spec: ```bash # Install the controller using Helm -helm install --repo FIXME temporal-worker-controller temporal-worker-controller +helm install -n temporal-system --create-namespace \ + temporal-worker-controller \ + oci://docker.io/temporalio/temporal-worker-controller + ``` ### Step 2: Create TemporalConnection Resources @@ -175,46 +208,55 @@ spec: mutualTLSSecret: temporal-cloud-mtls ``` -### Step 3: Map Your Current Configuration +### Step 3: Prepare Your Worker Code + +The controller injects the versioning environment variables into every pod it creates, but the Temporal SDK does not read them on its own. Update your worker initialization code so that it opts in to versioning using those values: + +**Before (Unversioned):** +```go +// Worker polls without versioning +w := worker.New(c, "my-task-queue", worker.Options{}) +``` + +**After (Versioned):** +```go +// Read the build ID that the controller injects into each pod +buildID := os.Getenv("WORKER_BUILD_ID") +workerOptions := worker.Options{} +if buildID != "" { + // Opt in to versioning only when running under the controller + workerOptions.BuildID = buildID + workerOptions.UseBuildIDForVersioning = true +} +w := worker.New(c, "my-task-queue", workerOptions) +``` + +> **Note**: A worker that ignores `WORKER_BUILD_ID` and `TEMPORAL_DEPLOYMENT_NAME` keeps polling as an unversioned worker, even though the controller sets these variables. The option names for deployment-based versioning vary by SDK and SDK version, so check your SDK's Worker Versioning documentation before copying this snippet. -For each existing **Temporal Worker Deployment** (logical grouping), identify all currently running versions. You'll create a single `TemporalWorkerDeployment` CRD resource and import each version one by one. +### Step 4: Create Your First TemporalWorkerDeployment -**Current Manual Kubernetes Deployments:** +Start with your lowest-risk worker.
Convert your existing unversioned Deployment to a `TemporalWorkerDeployment` CRD: + +**Current Unversioned Deployment:** ```yaml -# Version 1 (older, still has running workflows) apiVersion: apps/v1 kind: Deployment metadata: - name: payment-processor-v1 -spec: - replicas: 2 - template: - spec: - containers: - - name: worker - image: payment-processor:v1.5.1 - env: - - name: WORKER_BUILD_ID - value: "v1.5.1" ---- -# Version 2 (current production version) -apiVersion: apps/v1 -kind: Deployment -metadata: - name: payment-processor-v2 + name: payment-processor spec: - replicas: 5 + replicas: 3 template: spec: containers: - name: worker image: payment-processor:v1.5.2 env: - - name: WORKER_BUILD_ID - value: "v1.5.2" + - name: TEMPORAL_HOST_PORT + value: "production.tmprl.cloud:7233" + - name: TEMPORAL_NAMESPACE + value: "production" ``` -**Single `TemporalWorkerDeployment` CRD Resource (CRITICAL: Use Manual strategy):** +**New TemporalWorkerDeployment CRD (IMPORTANT: Use Manual strategy initially):** ```yaml apiVersion: temporal.io/v1alpha1 kind: TemporalWorkerDeployment @@ -223,11 +265,11 @@ metadata: labels: app: payment-processor spec: - replicas: 5 + replicas: 3 workerOptions: connection: production-temporal temporalNamespace: production - # IMPORTANT: Use Manual during migration to prevent unwanted promotions + # CRITICAL: Use Manual during initial migration cutover: strategy: Manual sunset: @@ -240,132 +282,156 @@ spec: spec: containers: - name: worker - image: payment-processor:v1.5.1 # Start with oldest version + image: payment-processor:v1.5.2 # Same image as current deployment resources: requests: memory: "512Mi" cpu: "250m" - # Note: TEMPORAL_* env vars are set automatically by controller + # Note: Controller automatically adds TEMPORAL_* env vars ``` -### Step 4: Import Existing Versions One by One - -⚠️ **CRITICAL**: Use a single `TemporalWorkerDeployment` CRD resource with `strategy: Manual` to import all existing versions. 
+### Step 5: Deploy the TemporalWorkerDeployment -1. **Create the `TemporalWorkerDeployment` CRD with the oldest version:** +1. **Create the `TemporalWorkerDeployment` CRD:** ```bash - kubectl apply -f payment-processor-migration.yaml + kubectl apply -f payment-processor-versioned.yaml ``` -2. **Wait for the first version to be registered:** +2. **Wait for the version to be registered:** ```bash # Monitor until status shows the version is registered and current kubectl get temporalworkerdeployment payment-processor -w ``` - The controller will create a Kubernetes `Deployment` resource (e.g., `payment-processor-v1.5.1`) for this version. + You should see the controller create a new Kubernetes `Deployment` resource (e.g., `payment-processor-v1.5.2`) for this version. -3. **Import the next version by updating the CRD resource:** +3. **Verify the versioned deployment is working:** ```bash - # Edit the TemporalWorkerDeployment CRD to use the next image version - kubectl patch temporalworkerdeployment payment-processor --type='merge' -p='{"spec":{"template":{"spec":{"containers":[{"name":"worker","image":"payment-processor:v1.5.2"}]}}}}' - ``` - -4. **Wait for the new version to be registered:** - ```bash - # Monitor until the new version appears in status.targetVersion - kubectl get temporalworkerdeployment payment-processor -o yaml - ``` - - You should see: - ```yaml - status: - currentVersion: - versionID: "payment-processor.v1.5.1" - status: Current - targetVersion: - versionID: "payment-processor.v1.5.2" - status: Inactive # Will be Inactive because strategy is Manual - ``` + # Check that controller-managed deployment exists + kubectl get deployments -l temporal.io/managed-by=temporal-worker-controller - The controller will create another Kubernetes `Deployment` resource (e.g., `payment-processor-v1.5.2`) for this new version. - -5. **Repeat for all existing versions** until all are imported under controller management. 
+ # Check pods are running + kubectl get pods -l temporal.io/worker-deployment=payment-processor - Each version update will result in a new Kubernetes `Deployment` being created by the controller. - -### Step 5: Validate All Versions Are Imported - -Confirm all your existing versions are now managed by the controller: - -```bash -# Check all versions are registered in the CRD status -kubectl get temporalworkerdeployment payment-processor -o yaml + # Check worker logs to verify versioning is active + kubectl logs -l temporal.io/worker-deployment=payment-processor + ``` -# Verify controller-managed Kubernetes Deployments exist for all versions -kubectl get deployments -l temporal.io/managed-by=temporal-worker-controller + You should see logs indicating the worker has registered with a Build ID. -# Check pods are running for all versions -kubectl get pods -l temporal.io/worker-deployment=payment-processor -``` +4. **Verify in Temporal UI:** + - Check the Workers page in Temporal UI + - You should see your worker deployment with version information + - New workflows should be routed to the versioned worker -You should see multiple Kubernetes `Deployment` resources created by the controller, one for each version you imported. +### Step 6: Transition from Unversioned to Versioned -### Step 6: Clean Up Manual Kubernetes Deployments +Now you need to carefully transition from your old unversioned deployment to the new versioned one: -⚠️ **Only after confirming all versions are imported and working:** +1. **Ensure both deployments are running:** + ```bash + # Check your original unversioned deployment + kubectl get deployment payment-processor + + # Check the new controller-managed versioned deployment + kubectl get deployments -l temporal.io/managed-by=temporal-worker-controller + ``` -```bash -# Scale down your original manual Kubernetes Deployments -kubectl scale deployment payment-processor-v1 --replicas=0 -kubectl scale deployment payment-processor-v2 --replicas=0 +2. 
**Monitor workflow routing:** + - In Temporal UI, check that new workflows are being routed to the versioned worker + - Existing workflows should continue on the unversioned worker until they complete -# Monitor that the controller-managed Kubernetes Deployments are handling all traffic -# Check Temporal UI to verify workflows are running on controller-managed versions +3. **Wait for existing workflows to complete:** + ```bash + # Monitor running workflows in Temporal UI + # Or use the Temporal CLI to list workflows that are still running + temporal workflow list --namespace production --query 'ExecutionStatus="Running"' + ``` -# Delete the original manual Kubernetes Deployments -kubectl delete deployment payment-processor-v1 payment-processor-v2 -``` +4. **Scale down the original unversioned deployment:** + ```bash + # Gradually reduce replicas of the original deployment + kubectl scale deployment payment-processor --replicas=1 + + # Monitor that workflows continue to work correctly + # Once confident, scale to zero + kubectl scale deployment payment-processor --replicas=0 + ``` -At this point, you should only have controller-managed Kubernetes `Deployment` resources running your workers. +5.
**Clean up the original deployment:** + ```bash + # Only after confirming all workflows are handled by versioned workers + kubectl delete deployment payment-processor + ``` ### Step 7: Enable Automated Rollouts -Once migration is complete and all legacy versions are imported, update your deployment system to use automated rollouts: +Once the initial migration is complete and validated, enable automated rollouts for future deployments: ```bash -# Update the TemporalWorkerDeployment CRD to use Progressive strategy for new deployments +# Update the TemporalWorkerDeployment CRD to use Progressive strategy kubectl patch temporalworkerdeployment payment-processor --type='merge' -p='{ "spec": { "cutover": { "strategy": "Progressive", "steps": [ - {"rampPercentage": 5, "pauseDuration": "5m"}, - {"rampPercentage": 25, "pauseDuration": "10m"}, - {"rampPercentage": 50, "pauseDuration": "15m"} + {"rampPercentage": 10, "pauseDuration": "5m"}, + {"rampPercentage": 50, "pauseDuration": "10m"} ] } } }' ``` -From this point forward: -- **New worker versions** will automatically follow the Progressive rollout strategy -- **Your CI/CD system** should update the `spec.template.spec.containers[0].image` field in the `TemporalWorkerDeployment` CRD to deploy new versions -- **The controller** will automatically create new Kubernetes `Deployment` resources, handle version registration, traffic routing, and cleanup +### Step 8: Update Your CI/CD Pipeline + +Modify your deployment pipeline to work with the new versioned approach: -Example CI/CD update command: +**Before (Unversioned):** ```bash -# Your deployment pipeline should update the CRD image field like this: +# Old pipeline updated Deployment directly +kubectl set image deployment/payment-processor worker=payment-processor:v1.6.0 +``` + +**After (Versioned):** +```bash +# New pipeline updates TemporalWorkerDeployment CRD +# A JSON patch replaces only the image; a merge patch would replace the whole containers list kubectl patch temporalworkerdeployment payment-processor --type='json'
-p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"payment-processor:v1.6.0"}]' ``` -The controller will automatically create a new Kubernetes `Deployment` (e.g., `payment-processor-v1.6.0`) and manage the rollout. +**What happens next:** +1. Controller detects the image change +2. Creates a new Kubernetes `Deployment` (e.g., `payment-processor-v1.6.0`) +3. Registers the new version with Temporal +4. Gradually routes traffic according to Progressive strategy +5. Scales down and deletes the old version's `Deployment` once it no longer has running workflows, following the configured `sunset` delays + +### Step 9: Test Your First Versioned Deployment + +Deploy a new version to validate the entire flow: -## Configuration Mapping +1. **Make a small, safe change** to your worker code +2. **Build and push** a new container image +3. **Update the TemporalWorkerDeployment:** + ```bash + kubectl patch temporalworkerdeployment payment-processor --type='json' -p='[{"op":"replace","path":"/spec/template/spec/containers/0/image","value":"payment-processor:v1.6.0"}]' + ``` +4. **Monitor the rollout:** + ```bash + # Watch the deployment progress + kubectl get temporalworkerdeployment payment-processor -w + + # Check that new deployment is created + kubectl get deployments -l temporal.io/managed-by=temporal-worker-controller + ``` +5. **Verify in Temporal UI** that traffic is gradually shifting to the new version + +## Configuration Reference ### Rollout Strategies +See the [Concepts](concepts.md) document for detailed explanations of rollout strategies. Here are the basic configuration patterns: **Manual Strategy (Default Behavior):** ```yaml cutover: @@ -469,68 +535,105 @@ template: ## Common Migration Patterns -⚠️ **Remember**: Each pattern below represents **separate Temporal Worker Deployments**. Each gets exactly **one** `TemporalWorkerDeployment` CRD resource.
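For the image update in Steps 8 and 9, a pipeline can generate the patch body instead of hand-quoting JSON in shell. The sketch below builds an RFC 6902 JSON Patch that replaces only the container image. A JSON merge patch (`--type='merge'`) replaces the entire `containers` list and so drops fields such as `env` and `resources`; a JSON patch leaves the rest of the pod template alone. It assumes the worker is the first container in `spec.template.spec.containers`, as in this guide's examples, and `imagePatch` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// patchOp is one RFC 6902 JSON Patch operation.
type patchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value"`
}

// imagePatch (hypothetical helper) returns a JSON Patch that replaces only the
// image of the container at containerIndex in a TemporalWorkerDeployment's
// pod template. It assumes the caller knows the worker container's position.
func imagePatch(containerIndex int, image string) (string, error) {
	ops := []patchOp{{
		Op:    "replace",
		Path:  fmt.Sprintf("/spec/template/spec/containers/%d/image", containerIndex),
		Value: image,
	}}
	b, err := json.Marshal(ops)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Take the new image from the first argument, defaulting to this guide's example.
	image := "payment-processor:v1.6.0"
	if len(os.Args) > 1 {
		image = os.Args[1]
	}
	p, err := imagePatch(0, image)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(p)
}
```

A pipeline could then run `kubectl patch temporalworkerdeployment payment-processor --type='json' -p "$(go run patchgen.go payment-processor:v1.6.0)"`, where `patchgen.go` is a hypothetical file name for the program above.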
+Here are common scenarios when migrating from unversioned to versioned workflows: -### Pattern 1: Microservices with Multiple Workers +### Pattern 1: Microservices Architecture + +If you have multiple services with their own workers, migrate each service independently: ```yaml -# payment-service workers (ONE CRD for this Temporal Worker Deployment) +# Payment service worker apiVersion: temporal.io/v1alpha1 kind: TemporalWorkerDeployment metadata: name: payment-processor + namespace: payments spec: workerOptions: connection: production-temporal temporalNamespace: payments cutover: - strategy: Manual # Use Manual during migration + strategy: Progressive # Conservative rollout for financial operations + steps: + - rampPercentage: 5 + pauseDuration: 10m + - rampPercentage: 25 + pauseDuration: 15m # ... rest of config --- -# notification-service workers (separate Temporal Worker Deployment = separate CRD) +# Notification service worker (separate service, different risk profile) apiVersion: temporal.io/v1alpha1 kind: TemporalWorkerDeployment metadata: name: notification-sender + namespace: notifications spec: workerOptions: connection: production-temporal temporalNamespace: notifications cutover: - strategy: Manual # Use Manual during migration + strategy: AllAtOnce # Lower risk, faster rollouts acceptable # ... 
rest of config ``` -### Pattern 2: Environment-Specific Deployments +### Pattern 2: Environment-Specific Strategies + +Use different rollout strategies based on environment risk: ```yaml -# Production (use Manual during migration, then switch to Progressive) +# Production - Conservative rollout apiVersion: temporal.io/v1alpha1 kind: TemporalWorkerDeployment metadata: - name: my-worker + name: order-processor namespace: production spec: workerOptions: connection: production-temporal temporalNamespace: production cutover: - strategy: Manual # Use Manual during migration, then switch to Progressive + strategy: Progressive + steps: + - rampPercentage: 10 + pauseDuration: 15m + - rampPercentage: 50 + pauseDuration: 30m + gate: + workflowType: "HealthCheck" # Validate new version before proceeding --- -# Staging (use Manual during migration, then switch to AllAtOnce) +# Staging - Fast rollout for testing apiVersion: temporal.io/v1alpha1 kind: TemporalWorkerDeployment metadata: - name: my-worker + name: order-processor namespace: staging spec: workerOptions: connection: staging-temporal temporalNamespace: staging cutover: - strategy: Manual # Use Manual during migration, then switch to AllAtOnce + strategy: AllAtOnce # Immediate cutover for faster iteration ``` +### Pattern 3: Gradual Team Migration + +Migrate teams/services based on their readiness and risk tolerance: + +**Phase 1: Low-Risk Services** +- Background processing workers +- Internal tooling workflows +- Non-customer-facing operations + +**Phase 2: Medium-Risk Services** +- Internal API workflows +- Data processing pipelines +- Administrative workflows + +**Phase 3: High-Risk Services** +- Customer-facing workflows +- Payment processing +- Critical business operations + ## Troubleshooting @@ -548,35 +651,50 @@ status: *Solutions:* - Check worker logs for connection errors -- Verify TemporalConnection configuration +- Verify TemporalConnection configuration - Ensure TLS secrets are properly configured - Verify 
network connectivity to Temporal server +- Check that worker code properly handles `WORKER_BUILD_ID` environment variable + +**2. New Workflows Still Going to Unversioned Workers** + +*Symptoms:* +- Temporal UI shows workflows executing on unversioned workers +- Versioned workers appear idle + +*Solutions:* +- Verify versioned workers are properly registered in Temporal UI +- Check that new workflow starts are using the correct task queue +- Ensure unversioned workers are scaled down gradually, not immediately +- Verify Temporal routing rules are working correctly -**2. Version Stuck in Ramping State** +**3. Existing Workflows Failing During Migration** + +*Symptoms:* +- Running workflows encounter errors during migration +- Workflow history shows non-deterministic errors + +*Solutions:* +- Ensure unversioned workers remain running until workflows complete +- Don't force-terminate unversioned workers with running workflows +- Check that worker code changes are backward compatible +- Monitor workflow completion before scaling down old workers + +**4. Version Stuck in Ramping State** *Symptoms:* ``` status: targetVersion: status: Ramping - rampPercentage: 5 + rampPercentage: 10 ``` *Solutions:* - Check if gate workflow is configured and completing successfully - Verify progressive rollout steps are reasonable - Check controller logs for errors - -**3. 
Old Versions Not Being Cleaned Up** - -*Symptoms:* -- Multiple old Deployments still exist -- Deprecated versions not transitioning to Drained - -*Solutions:* -- Check if workflows are still running on old versions -- Verify sunset configuration is reasonable -- Check Temporal UI for workflow status +- Ensure new version is healthy and processing workflows correctly ### Debugging Commands @@ -606,26 +724,27 @@ kubectl get events --field-selector involvedObject.kind=TemporalWorkerDeployment ## Migration Summary -🎯 **Remember the core principle**: -- **One `TemporalWorkerDeployment` CRD per Temporal Worker Deployment** -- **Manual strategy during migration** to prevent unwanted promotions -- **Import existing versions sequentially** by updating the same CRD resource -- **Enable automation only after migration is complete** +🎯 **Key principles for unversioned to versioned migration**: +- **Start with Manual strategy** to maintain control during initial migration +- **Run both unversioned and versioned workers** during transition period +- **Wait for existing workflows to complete** before scaling down unversioned workers +- **Enable Progressive rollouts** only after validating the migration process +- **Migrate one service at a time** to reduce risk and enable learning -**Resource Relationship:** -- **Before**: Multiple manual Kubernetes `Deployment` resources per worker -- **After**: One `TemporalWorkerDeployment` CRD → Controller creates multiple Kubernetes `Deployment` resources (one per version) +See the [Concepts](concepts.md) document for detailed explanations of the resource relationships and terminology. -This approach ensures a smooth transition from manual version management to controller automation without disrupting running workflows. +This approach ensures a safe transition from unversioned to versioned workflows without disrupting running workflows or introducing deployment risks. 
## Next Steps -After successful migration: +After successful migration to versioned workflows: -1. **Set up monitoring** for your TemporalWorkerDeployment resources -2. **Update CI/CD pipelines** to patch TemporalWorkerDeployment image specs instead of managing Deployments directly -3. **Configure alerting** on version transition failures -4. **Train your team** on the new deployment process (single resource updates vs multiple Deployment management) -5. **Document your specific configuration** patterns for future reference +1. **Set up monitoring** for your TemporalWorkerDeployment resources and version transitions +2. **Update CI/CD pipelines** to patch TemporalWorkerDeployment image specs instead of managing Deployments directly +3. **Configure alerting** on version transition failures and rollout issues +4. **Train your team** on the new versioned deployment process and rollback procedures +5. **Document your rollout strategies** and tune them based on your specific risk tolerance +6. **Plan for advanced features** like canary analysis and automated rollbacks +7. **Migrate remaining services** using the lessons learned from initial migrations -The Temporal Worker Controller should significantly reduce the operational overhead of managing versioned worker deployments while providing better automation and safety for your workflow deployments. \ No newline at end of file +The Temporal Worker Controller should significantly improve your deployment safety and reduce the risk of workflow disruptions while providing automated rollout capabilities that weren't possible with unversioned workflows. \ No newline at end of file From ed9a51067c21f605a926102b6dee47021981ed7a Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 22 Aug 2025 16:35:11 +0100 Subject: [PATCH 03/25] Remove prom annotations. 
--- docs/migration-guide.md | 6 ------ 1 file changed, 6 deletions(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 0918df23..8bc5210b 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -276,9 +276,6 @@ spec: scaledownDelay: 30m deleteDelay: 2h template: - metadata: - annotations: - prometheus.io/scrape: "true" spec: containers: - name: worker @@ -490,9 +487,6 @@ template: ```yaml template: metadata: - annotations: - prometheus.io/scrape: "true" - prometheus.io/port: "9090" labels: team: payments environment: production From e21e16eead975d885cfebe9193e6e4dcc9d32349 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 22 Aug 2025 16:36:28 +0100 Subject: [PATCH 04/25] Remove Next Steps which isn't really related to migration per se. --- docs/migration-guide.md | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 8bc5210b..7dfbe1e3 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -729,16 +729,4 @@ See the [Concepts](concepts.md) document for detailed explanations of the resour This approach ensures a safe transition from unversioned to versioned workflows without disrupting running workflows or introducing deployment risks. -## Next Steps - -After successful migration to versioned workflows: - -1. **Set up monitoring** for your TemporalWorkerDeployment resources and version transitions -2. **Update CI/CD pipelines** to patch TemporalWorkerDeployment image specs instead of managing Deployments directly -3. **Configure alerting** on version transition failures and rollout issues -4. **Train your team** on the new versioned deployment process and rollback procedures -5. **Document your rollout strategies** and tune them based on your specific risk tolerance -6. **Plan for advanced features** like canary analysis and automated rollbacks -7. 
**Migrate remaining services** using the lessons learned from initial migrations - The Temporal Worker Controller should significantly improve your deployment safety and reduce the risk of workflow disruptions while providing automated rollout capabilities that weren't possible with unversioned workflows. \ No newline at end of file From a88aa9adc3ce00657c2ca1670336409fa096b9bf Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Tue, 26 Aug 2025 12:21:34 +0100 Subject: [PATCH 05/25] Update doc reference. --- docs/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/README.md b/docs/README.md index eb856d40..fc8f6206 100644 --- a/docs/README.md +++ b/docs/README.md @@ -12,7 +12,7 @@ This documentation structure is designed to support various types of technical d ## Index ### [Migration Guide](migration-guide.md) -Comprehensive guide for migrating from existing versioned worker deployment systems to the Temporal Worker Controller. Includes step-by-step instructions, configuration mapping, and common patterns. +Comprehensive guide for migrating from existing unversioned worker deployment systems to the Temporal Worker Controller. Includes step-by-step instructions, configuration mapping, and common patterns. ### [Limits](limits.md) Technical constraints and limitations of the Temporal Worker Controller system, including maximum field lengths and other operational boundaries. From dd2fcc29274c89d2939c4c2488eadf9737f6a5e6 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Tue, 26 Aug 2025 12:37:58 +0100 Subject: [PATCH 06/25] Address some feedback. 
--- docs/concepts.md | 43 ++++++++++++++++++++++--------------------- 1 file changed, 22 insertions(+), 21 deletions(-) diff --git a/docs/concepts.md b/docs/concepts.md index b19d6e04..9cf1a45e 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -5,61 +5,62 @@ This document defines key concepts and terminology used throughout the Temporal ## Core Terminology ### Temporal Worker Deployment -A logical grouping in Temporal that represents a collection of workers that handle the same set of workflows and activities. Examples include "payment-processor", "notification-sender", or "data-pipeline-worker". This is a concept within Temporal itself, not specific to Kubernetes. +A logical grouping in Temporal that represents a collection of workers that are deployed together and should be versioned together. Examples include "payment-processor", "notification-sender", or "data-pipeline-worker". This is a concept within Temporal itself, not specific to Kubernetes. See https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning for more details. **Key characteristics:** -- Identified by a unique deployment name (e.g., "payment-processor") -- Can have multiple concurrent versions running simultaneously -- Versions are identified by Build IDs (e.g., "v1.5.1", "v1.5.2") +- Identified by a unique worker deployment name (e.g., "payment-processor/staging") +- Can have multiple concurrent worker versions running simultaneously +- Versions of a Worker Deployment are identified by Build IDs (e.g., "v1.5.1", "v1.5.2") +- Temporal routes workflow executions to appropriate worker versions based on the `RoutingConfig` of the Worker Deployment that the versions are in. - Temporal routes workflow executions to appropriate versions based on compatibility rules ### `TemporalWorkerDeployment` CRD The Kubernetes Custom Resource Definition that manages one Temporal Worker Deployment. 
This is the primary resource you interact with when using the Temporal Worker Controller. **Key characteristics:** -- One CRD resource per Temporal Worker Deployment -- Manages the lifecycle of all versions for that deployment +- One `TemporalWorkerDeployment` Custom Resource per Temporal Worker Deployment +- Manages the lifecycle of all versions for that worker deployment - Defines rollout strategies, resource requirements, and connection details - Controller creates and manages multiple Kubernetes `Deployment` resources based on this spec ### Kubernetes `Deployment` -The actual Kubernetes Deployment resources that run worker pods. The controller automatically creates these - you don't manage them directly. +The actual Kubernetes `Deployment` resources that run worker pods. The controller automatically creates these - you don't manage them directly. **Key characteristics:** -- Multiple Kubernetes `Deployment` resources per `TemporalWorkerDeployment` CRD (one per version) -- Named with the pattern: `{deployment-name}-{build-id}` (e.g., `payment-processor-v1.5.1`) +- Multiple Kubernetes `Deployment` resources per `TemporalWorkerDeployment` Custom Resource (one per version) +- Named with the pattern: `{worker-deployment-name}-{build-id}` (e.g., `payment-processor/staging-v1.5.1`) - Managed entirely by the controller - created, updated, and deleted automatically - Each runs a specific version of your worker code ### Key Relationship -**One `TemporalWorkerDeployment` CRD → Multiple Kubernetes `Deployment` resources (managed by controller)** +**One `TemporalWorkerDeployment` Custom Resource → Multiple Kubernetes `Deployment` resources (managed by controller)** -This is the fundamental architecture: you manage a single CRD resource, and the controller handles all the underlying Kubernetes `Deployment` resources for different versions.
+make changes to the spec of your `TemporalWorkerDeployment` Custom Resource, and the controller handles all the underlying Kubernetes `Deployment` resources for different versions. ## Version States Worker deployment versions progress through various states during their lifecycle: ### NotRegistered -The version has been specified in the CRD but hasn't been registered with Temporal yet. This typically happens when: +The version has been specified in the `TemporalWorkerDeployment` resource but hasn't been registered with Temporal yet. This typically happens when: - The worker pods are still starting up - There are connectivity issues to Temporal - The worker code has errors preventing registration ### Inactive -The version is registered with Temporal but isn't receiving any new workflow executions. This is the initial state for new versions when using Manual rollout strategy. +The version is registered with Temporal but isn't automatically receiving any new workflow executions through the Worker Deployment's `RoutingConfig`. This is the initial state for new versions before they are promoted via Versioning API calls. Inactive versions can receive workflow executions via `VersioningOverride` only. ### Ramping -The version is receiving a percentage of new workflow executions as part of a Progressive rollout. The percentage gradually increases according to the configured rollout steps. +The version is receiving a percentage of new workflow executions. If managed by a Progressive rollout, the percentage gradually increases according to the configured rollout steps. If the rollout is Manual, the user is responsible for setting the ramp percentage and ramping version. ### Current -The version is receiving 100% of new workflow executions. This is the "production" version that handles all new work. +The version is receiving 100% of new workflow executions. 
This is the "stable" version that handles all new workflows and all existing AutoUpgrade workflows running on the task queues in this Worker Deployment. ### Draining -The version is no longer receiving new workflow executions but may still be processing existing workflows. The controller waits for all workflows on this version to complete. +The version is no longer receiving new workflow executions but may still be processing existing workflows. ### Drained -All workflows on this version have completed. The version is ready for cleanup according to the sunset configuration. +All Pinned workflows on this version have completed. The version is ready for cleanup according to the sunset configuration. ## Rollout Strategies @@ -72,7 +73,7 @@ Requires explicit human intervention to promote versions. New versions remain in - Testing and validation scenarios ### AllAtOnce Strategy -Immediately routes 100% of new workflow executions to the new version once it's healthy and registered. +Immediately routes 100% of new workflow executions to the target version once it's healthy and registered. 
**Use cases:** - Non-production environments @@ -90,16 +91,16 @@ Gradually increases the percentage of new workflow executions routed to the new ## Configuration Concepts ### Worker Options -Configuration that defines how workers connect to Temporal: +Configuration that tells the controller how to connect to the same Temporal cluster and namespace that the worker is connected to: - **connection**: Reference to a `TemporalConnection` resource - **temporalNamespace**: The Temporal namespace to connect to - **deploymentName**: The logical deployment name in Temporal (auto-generated if not specified) -### Cutover Configuration +### Rollout Configuration Defines how new versions are promoted: - **strategy**: Manual, AllAtOnce, or Progressive - **steps**: For Progressive strategy, defines ramp percentages and pause durations -- **gate**: Optional workflow that must succeed before promotion continues +- **gate**: Optional workflow that must succeed on all task queues in the target Worker Deployment Version before promotion continues ### Sunset Configuration Defines how old versions are cleaned up: From f400ebc4ecff705b92e8203b56910f4331e002f1 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Tue, 26 Aug 2025 12:47:22 +0100 Subject: [PATCH 07/25] Apply suggestions from code review Co-authored-by: Carly de Frondeville --- docs/concepts.md | 27 +++++++++++++++------------ docs/migration-guide.md | 26 +++++++++++++------------- 2 files changed, 28 insertions(+), 25 deletions(-) diff --git a/docs/concepts.md b/docs/concepts.md index 9cf1a45e..5013820d 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -103,12 +103,12 @@ Defines how new versions are promoted: - **gate**: Optional workflow that must succeed on all task queues in the target Worker Deployment Version before promotion continues ### Sunset Configuration -Defines how old versions are cleaned up: -- **scaledownDelay**: How long to wait after draining before scaling pods to zero -- **deleteDelay**: How long to wait 
after draining before deleting the Kubernetes `Deployment` +Defines how Drained versions are cleaned up: +- **scaledownDelay**: How long to wait after a version has been Drained before scaling pods to zero +- **deleteDelay**: How long to wait after a version has been Drained before deleting the Kubernetes `Deployment` ### Template -The pod template used for all versions of this deployment. Similar to a standard Kubernetes Deployment template but managed by the controller. +The pod template used for the target version of this worker deployment. Similar to a standard Kubernetes Deployment template but managed by the controller. ## Environment Variables @@ -116,27 +116,30 @@ The controller automatically sets these environment variables for all worker pod ### TEMPORAL_HOST_PORT The host and port of the Temporal server, derived from the `TemporalConnection` resource. +The worker must connect to this Temporal endpoint, but since this is user provided and not controller generated, the user does not necessarily need to access this env var to get that endpoint if it already knows the endpoint another way. ### TEMPORAL_NAMESPACE The Temporal namespace the worker should connect to, from `spec.workerOptions.temporalNamespace`. +The worker must connect to this Temporal namespace, but since this is user provided and not controller generated, the user does not necessarily need to access this env var to get that namespace if it already knows the namespace another way. ### TEMPORAL_DEPLOYMENT_NAME -The deployment name in Temporal, either from `spec.workerOptions.deploymentName` or auto-generated from the CRD name. +The worker deployment name in Temporal, auto-generated from the `TemporalWorkerDeployment` name and Kubernetes namespace. +The worker *must* use this to configure its `worker.DeploymentOptions`. ### WORKER_BUILD_ID -The build ID for this specific version, derived from the container image tag or explicitly set. 
+The build ID for this specific version, derived from the container image tag and hash of the target pod template. +The worker *must* use this to configure its `worker.DeploymentOptions`. ## Resource Management Concepts ### Rainbow Deployments -The pattern of running multiple versions of the same worker simultaneously. This is essential for maintaining workflow determinism in Temporal, as running workflows must continue executing on the version they started with. +The pattern of running multiple versions of the same service simultaneously. Running multiple versions of your workers simultaneously is essential for supporting Pinned workflows in Temporal, as Pinned workflows must continue executing on the worker version they started on. ### Version Lifecycle Management The automated process of: 1. Registering new versions with Temporal 2. Gradually routing traffic to new versions -3. Draining old versions once they're no longer needed -4. Cleaning up resources for drained versions +3. Cleaning up resources for drained versions ### Controller-Managed Resources Resources that are created, updated, and deleted automatically by the controller: @@ -149,9 +152,9 @@ Resources that are created, updated, and deleted automatically by the controller ### Import Process The process of bringing existing manually-managed worker deployments under controller management. This involves: -1. Creating a `TemporalWorkerDeployment` CRD with Manual strategy -2. Sequentially updating the image spec to register each existing version -3. Cleaning up original manual Kubernetes `Deployment` resources +1. Creating a `TemporalWorkerDeployment` custom resource with Manual strategy +2. Sequentially updating the target pod template in the `TemporalWorkerDeployment` spec to prompt the controller to create a Kubernetes Deployment with that pod spec that is owned and tracked by the controller. Do this for each of your existing Deployments. +3. 
Cleaning up original non-worker-controller-managed Kubernetes `Deployment` resources 4. Enabling automated rollouts ### Single Resource Requirement diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 7dfbe1e3..81a44f18 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -1,6 +1,6 @@ # Migrating from Unversioned to Versioned Workflows with Temporal Worker Controller -This guide helps teams migrate from unversioned Temporal workflows to versioned workflows using the Temporal Worker Controller. It assumes you are currently running workers without Temporal's Worker Versioning feature and want to adopt versioned deployments for safer, more controlled rollouts. +This guide helps teams migrate from unversioned Temporal workflows to versioned workflows using the Temporal Worker Controller. It assumes you are currently running workers without Temporal's Worker Versioning feature and want to adopt versioned worker deployments for safer, more controlled rollouts. ## Important Note @@ -48,7 +48,7 @@ Before starting the migration, ensure you have: Your workers are likely configured with basic environment variables like: ```bash -TEMPORAL_HOST_PORT=your-namespace.tmprl.cloud:7233 +TEMPORAL_HOST_PORT=your-temporal-namespace.tmprl.cloud:7233 TEMPORAL_NAMESPACE=your-temporal-namespace # No TEMPORAL_DEPLOYMENT_NAME or WORKER_BUILD_ID yet ``` @@ -89,7 +89,7 @@ spec: ### After: Versioned Workers with Controller ```yaml -# Single CRD manages multiple versions automatically +# Single Custom Resource manages multiple versions of a worker deployment automatically apiVersion: temporal.io/v1alpha1 kind: TemporalWorkerDeployment metadata: @@ -122,8 +122,8 @@ spec: 1. Build new worker image 2. Update `TemporalWorkerDeployment` CRD with new image 3. Controller creates new Kubernetes `Deployment` for the new version -4. Controller gradually routes new workflows to new version -5. Old version continues handling existing workflows until they complete +4. 
Controller gradually routes new workflows and existing AutoUpgrade workflows to new version +5. Old version continues handling existing Pinned workflows until they complete 6. **Safety**: No disruption to running workflows, automated rollout control **Key Benefits:** @@ -236,7 +236,7 @@ worker := worker.New(client, "my-task-queue", workerOptions) Start with your lowest-risk worker. Convert your existing unversioned Deployment to a `TemporalWorkerDeployment` CRD: -**Current Unversioned Deployment:** +**Existing Unversioned Deployment:** ```yaml apiVersion: apps/v1 kind: Deployment @@ -270,7 +270,7 @@ spec: connection: production-temporal temporalNamespace: production # CRITICAL: Use Manual during initial migration - cutover: + rollout: strategy: Manual sunset: scaledownDelay: 30m @@ -279,7 +279,7 @@ spec: spec: containers: - name: worker - image: payment-processor:v1.5.2 # Same image as current deployment + image: payment-processor:v1.5.2 # Same image as current deployment to ensure no breaking changes resources: requests: memory: "512Mi" @@ -289,7 +289,7 @@ spec: ### Step 5: Deploy the TemporalWorkerDeployment -1. **Create the `TemporalWorkerDeployment` CRD:** +1. 
**Create the `TemporalWorkerDeployment` custom resource:** ```bash kubectl apply -f payment-processor-versioned.yaml ``` @@ -366,7 +366,7 @@ Now you need to carefully transition from your old unversioned deployment to the Once the initial migration is complete and validated, enable automated rollouts for future deployments: ```bash -# Update the TemporalWorkerDeployment CRD to use Progressive strategy +# Update the TemporalWorkerDeployment custom resource to use Progressive strategy kubectl patch temporalworkerdeployment payment-processor --type='merge' -p='{ "spec": { "cutover": { @@ -392,7 +392,7 @@ kubectl set image deployment/payment-processor worker=payment-processor:v1.6.0 **After (Versioned):** ```bash -# New pipeline updates TemporalWorkerDeployment CRD +# New pipeline updates TemporalWorkerDeployment custom resource kubectl patch temporalworkerdeployment payment-processor --type='merge' -p='{"spec":{"template":{"spec":{"containers":[{"name":"worker","image":"payment-processor:v1.6.0"}]}}}}' ``` @@ -547,7 +547,7 @@ spec: connection: production-temporal temporalNamespace: payments cutover: - strategy: Progressive # Conservative rollout for financial operations + strategy: Progressive steps: - rampPercentage: 5 pauseDuration: 10m @@ -658,7 +658,7 @@ status: *Solutions:* - Verify versioned workers are properly registered in Temporal UI -- Check that new workflow starts are using the correct task queue +- Check that new workflows are starting on the correct task queue - Ensure unversioned workers are scaled down gradually, not immediately - Verify Temporal routing rules are working correctly From a9c2b8b27c5ab1d4c41daa1f11b4fdde47aa99c3 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Thu, 4 Sep 2025 17:10:43 +0100 Subject: [PATCH 08/25] More consistent use of custom resource. 
--- docs/concepts.md | 8 ++++---- docs/migration-guide.md | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/concepts.md b/docs/concepts.md index 5013820d..542c3c8d 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -42,7 +42,7 @@ make changes to the spec of your `TemporalWorkerDeployment` Custom Resource, and Worker deployment versions progress through various states during their lifecycle: ### NotRegistered -The version has been specified in the `TemporalWorkerDeployment` resource but hasn't been registered with Temporal yet. This typically happens when: +The version has been specified in the `TemporalWorkerDeployment` custom resource but hasn't been registered with Temporal yet. This typically happens when: - The worker pods are still starting up - There are connectivity issues to Temporal - The worker code has errors preventing registration @@ -92,7 +92,7 @@ Gradually increases the percentage of new workflow executions routed to the new ### Worker Options Configuration that tells the controller how to connect to the same Temporal cluster and namespace that the worker is connected to: -- **connection**: Reference to a `TemporalConnection` resource +- **connection**: Reference to a `TemporalConnection` custom resource - **temporalNamespace**: The Temporal namespace to connect to - **deploymentName**: The logical deployment name in Temporal (auto-generated if not specified) @@ -115,7 +115,7 @@ The pod template used for the target version of this worker deployment. Similar The controller automatically sets these environment variables for all worker pods: ### TEMPORAL_HOST_PORT -The host and port of the Temporal server, derived from the `TemporalConnection` resource. +The host and port of the Temporal server, derived from the `TemporalConnection` custom resource. 
The worker must connect to this Temporal endpoint, but since this is user provided and not controller generated, the user does not necessarily need to access this env var to get that endpoint if it already knows the endpoint another way. ### TEMPORAL_NAMESPACE @@ -158,7 +158,7 @@ The process of bringing existing manually-managed worker deployments under contr 4. Enabling automated rollouts ### Single Resource Requirement -The critical principle that each Temporal Worker Deployment must be managed by exactly one `TemporalWorkerDeployment` CRD resource. You cannot split a single logical deployment across multiple CRD resources. +The critical principle that each Temporal Worker Deployment must be managed by exactly one `TemporalWorkerDeployment` custom resource. You cannot split a single logical deployment across multiple custom resources. ### Legacy Version Handling The process of ensuring that existing worker versions continue running during migration, maintaining workflow determinism while transitioning to controller management. diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 81a44f18..a7a09229 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -120,7 +120,7 @@ spec: **New Deployment Process:** 1. Build new worker image -2. Update `TemporalWorkerDeployment` CRD with new image +2. Update `TemporalWorkerDeployment` custom resource with new image 3. Controller creates new Kubernetes `Deployment` for the new version 4. Controller gradually routes new workflows and existing AutoUpgrade workflows to new version 5. Old version continues handling existing Pinned workflows until they complete @@ -156,7 +156,7 @@ The migration from unversioned to versioned workflows requires careful planning #### Phase 2: Initial Migration 1. **Choose lowest-risk worker** to migrate first -2. **Create `TemporalWorkerDeployment` CRD** with Manual strategy +2. **Create `TemporalWorkerDeployment` custom resource** with Manual strategy 3. 
**Validate controller management** works correctly 4. **Update deployment pipeline** for this worker @@ -234,7 +234,7 @@ worker := worker.New(client, "my-task-queue", workerOptions) ### Step 4: Create Your First TemporalWorkerDeployment -Start with your lowest-risk worker. Convert your existing unversioned Deployment to a `TemporalWorkerDeployment` CRD: +Start with your lowest-risk worker. Convert your existing unversioned Deployment to a `TemporalWorkerDeployment` custom resource: **Existing Unversioned Deployment:** ```yaml @@ -256,7 +256,7 @@ spec: value: "production" ``` -**New TemporalWorkerDeployment CRD (IMPORTANT: Use Manual strategy initially):** +**New TemporalWorkerDeployment custom resource (IMPORTANT: Use Manual strategy initially):** ```yaml apiVersion: temporal.io/v1alpha1 kind: TemporalWorkerDeployment From cd05bb619b7386840b662245bc0531580af45af9 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 5 Sep 2025 15:26:57 +0100 Subject: [PATCH 09/25] Amendments based on feedback. --- docs/concepts.md | 19 +------------------ 1 file changed, 1 insertion(+), 18 deletions(-) diff --git a/docs/concepts.md b/docs/concepts.md index 542c3c8d..7c53a31b 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -69,7 +69,6 @@ Requires explicit human intervention to promote versions. 
New versions remain in **Use cases:** - During migration from manual deployment systems -- High-risk production environments requiring human approval - Testing and validation scenarios ### AllAtOnce Strategy @@ -143,22 +142,6 @@ The automated process of: ### Controller-Managed Resources Resources that are created, updated, and deleted automatically by the controller: +- `TemporalWorkerDeployment` custom resources, to update their status - Kubernetes `Deployment` resources for each version -- ConfigMaps and Secrets as needed -- Service accounts and RBAC resources - Labels and annotations for tracking and management - -## Migration Concepts - -### Import Process -The process of bringing existing manually-managed worker deployments under controller management. This involves: -1. Creating a `TemporalWorkerDeployment` custom resource with Manual strategy -2. Sequentially updating the target pod template in the `TemporalWorkerDeployment` spec to prompt the controller to create a Kubernetes Deployment with that pod spec that is owned and tracked by the controller. Do this for each of your existing Deployments. -3. Cleaning up original non-worker-controller-managed Kubernetes `Deployment` resources -4. Enabling automated rollouts - -### Single Resource Requirement -The critical principle that each Temporal Worker Deployment must be managed by exactly one `TemporalWorkerDeployment` custom resource. You cannot split a single logical deployment across multiple custom resources. - -### Legacy Version Handling -The process of ensuring that existing worker versions continue running during migration, maintaining workflow determinism while transitioning to controller management. From fa035b806c2d04748a513e2108422303e5fcb613 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 5 Sep 2025 15:29:57 +0100 Subject: [PATCH 10/25] Add a bit more detail about the spec triggering deploys. 
--- docs/migration-guide.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index a7a09229..a9bdf04a 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -110,10 +110,10 @@ spec: scaledownDelay: 1h deleteDelay: 24h template: - spec: + spec: # Any changes to this spec will trigger the controller to deploy a new version. containers: - name: worker - image: my-worker:v1.2.4 # Update this to deploy new versions + image: my-worker:v1.2.4 # This is the most common value to change, as you roll out a new worker image. # Note: Controller automatically adds versioning environment variables: # TEMPORAL_HOST_PORT, TEMPORAL_NAMESPACE, TEMPORAL_DEPLOYMENT_NAME, WORKER_BUILD_ID ``` From 3e4843e83a4292312becef4bc19d5ddf91b5529b Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 5 Sep 2025 15:40:36 +0100 Subject: [PATCH 11/25] Don't imply things are optional. --- docs/migration-guide.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index a9bdf04a..6d949d9b 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -150,7 +150,7 @@ The migration from unversioned to versioned workflows requires careful planning #### Phase 1: Preparation 1. **Install the controller** in non-production environments -2. **Update worker code** to support versioning (if needed) +2. **Update worker code** to support versioning 3. **Test migration process** with non-critical workers 4. 
**Prepare CI/CD pipeline changes** for new deployment method @@ -208,9 +208,9 @@ spec: mutualTLSSecret: temporal-cloud-mtls ``` -### Step 3: Prepare Your Worker Code (If Needed) +### Step 3: Prepare Your Worker Code -Most workers will work without changes, but you may need to update your worker initialization code to properly handle versioning: +Update your worker initialization code to properly handle versioning: **Before (Unversioned):** ```go @@ -218,7 +218,7 @@ Most workers will work without changes, but you may need to update your worker i worker := worker.New(client, "my-task-queue", worker.Options{}) ``` -**After (Versioned - Optional Enhancement):** +**After (Versioned):** ```go // Worker can optionally use build ID from environment buildID := os.Getenv("WORKER_BUILD_ID") From 15d660fc28258c281d21d2e80a1fec72f10e1c89 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 5 Sep 2025 15:45:10 +0100 Subject: [PATCH 12/25] Update go worker code. --- docs/migration-guide.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 6d949d9b..027e97ea 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -220,18 +220,22 @@ worker := worker.New(client, "my-task-queue", worker.Options{}) **After (Versioned):** ```go -// Worker can optionally use build ID from environment +// Worker must use the build ID from environment, this is for the deployment by the controller buildID := os.Getenv("WORKER_BUILD_ID") +if buildID == "" { + // exit with an error +} workerOptions := worker.Options{} -if buildID != "" { - workerOptions.BuildID = buildID - workerOptions.UseBuildIDForVersioning = true +workerOptions.DeploymentOptions = worker.DeploymentOptions{ + UseVersioning: true, + Version: worker.WorkerDeploymentVersion{ + DeploymentName: mustGetEnv("TEMPORAL_DEPLOYMENT_NAME"), + BuildId: mustGetEnv("TEMPORAL_WORKER_BUILD_ID"), + }, } worker := worker.New(client, "my-task-queue", 
workerOptions) ``` -> **Note**: The controller automatically sets `WORKER_BUILD_ID` environment variable, so most workers will work without code changes. - ### Step 4: Create Your First TemporalWorkerDeployment Start with your lowest-risk worker. Convert your existing unversioned Deployment to a `TemporalWorkerDeployment` custom resource: From 270e35e7063fcd389ee5734b44fa406a25061ddd Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 5 Sep 2025 15:47:00 +0100 Subject: [PATCH 13/25] Cutover -> rollout. --- docs/migration-guide.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 027e97ea..76943cd0 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -99,7 +99,7 @@ spec: workerOptions: connection: production-temporal temporalNamespace: production - cutover: + rollout: strategy: Progressive # Gradual rollout of new versions steps: - rampPercentage: 10 @@ -373,7 +373,7 @@ Once the initial migration is complete and validated, enable automated rollouts # Update the TemporalWorkerDeployment custom resource to use Progressive strategy kubectl patch temporalworkerdeployment payment-processor --type='merge' -p='{ "spec": { - "cutover": { + "rollout": { "strategy": "Progressive", "steps": [ {"rampPercentage": 10, "pauseDuration": "5m"}, @@ -435,21 +435,21 @@ See the [Concepts](concepts.md) document for detailed explanations of rollout st **Manual Strategy (Default Behavior):** ```yaml -cutover: +rollout: strategy: Manual # Requires manual intervention to promote versions ``` -**Immediate Cutover:** +**Immediate Rollout:** ```yaml -cutover: +rollout: strategy: AllAtOnce # Immediately routes 100% traffic to new version when healthy ``` **Progressive Rollout:** ```yaml -cutover: +rollout: strategy: Progressive steps: - rampPercentage: 1 @@ -550,7 +550,7 @@ spec: workerOptions: connection: production-temporal temporalNamespace: payments - cutover: + rollout: 
strategy: Progressive steps: - rampPercentage: 5 @@ -569,7 +569,7 @@ spec: workerOptions: connection: production-temporal temporalNamespace: notifications - cutover: + rollout: strategy: AllAtOnce # Lower risk, faster rollouts acceptable # ... rest of config ``` @@ -589,7 +589,7 @@ spec: workerOptions: connection: production-temporal temporalNamespace: production - cutover: + rollout: strategy: Progressive steps: - rampPercentage: 10 @@ -609,8 +609,8 @@ spec: workerOptions: connection: staging-temporal temporalNamespace: staging - cutover: - strategy: AllAtOnce # Immediate cutover for faster iteration + rollout: + strategy: AllAtOnce # Immediate rollout for faster iteration ``` ### Pattern 3: Gradual Team Migration From 97c9da7dda32d6e78d51e006a995066e9b11e208 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 5 Sep 2025 15:54:40 +0100 Subject: [PATCH 14/25] Correct code. --- docs/migration-guide.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 76943cd0..b9b97528 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -220,17 +220,19 @@ worker := worker.New(client, "my-task-queue", worker.Options{}) **After (Versioned):** ```go -// Worker must use the build ID from environment, this is for the deployment by the controller -buildID := os.Getenv("WORKER_BUILD_ID") -if buildID == "" { +// Worker must use the build ID/deployment name from environment +// These are set on the deployment by the controller +buildID := os.Getenv("TEMPORAL_WORKER_BUILD_ID") +deploymentName := os.Getenv("TEMPORAL_DEPLOYMENT_NAME") +if buildID == "" || deploymentName == "" { // exit with an error } workerOptions := worker.Options{} workerOptions.DeploymentOptions = worker.DeploymentOptions{ UseVersioning: true, Version: worker.WorkerDeploymentVersion{ - DeploymentName: mustGetEnv("TEMPORAL_DEPLOYMENT_NAME"), - BuildId: mustGetEnv("TEMPORAL_WORKER_BUILD_ID"), + DeploymentName: 
deploymentName, + BuildId: buildID, }, } worker := worker.New(client, "my-task-queue", workerOptions) From 0a3867ac0481ed5b33dabcd2f6330f18ff7224e3 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 5 Sep 2025 15:56:32 +0100 Subject: [PATCH 15/25] Var cleanup. --- docs/migration-guide.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index b9b97528..201a282d 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -50,7 +50,7 @@ Your workers are likely configured with basic environment variables like: ```bash TEMPORAL_HOST_PORT=your-temporal-namespace.tmprl.cloud:7233 TEMPORAL_NAMESPACE=your-temporal-namespace -# No TEMPORAL_DEPLOYMENT_NAME or WORKER_BUILD_ID yet +# No TEMPORAL_DEPLOYMENT_NAME or TEMPORAL_WORKER_BUILD_ID yet ``` The controller will automatically add the versioning-related environment variables during migration. @@ -115,7 +115,7 @@ spec: - name: worker image: my-worker:v1.2.4 # This is the most common value to change, as you roll out a new worker image. # Note: Controller automatically adds versioning environment variables: - # TEMPORAL_HOST_PORT, TEMPORAL_NAMESPACE, TEMPORAL_DEPLOYMENT_NAME, WORKER_BUILD_ID + # TEMPORAL_HOST_PORT, TEMPORAL_NAMESPACE, TEMPORAL_DEPLOYMENT_NAME, TEMPORAL_WORKER_BUILD_ID ``` **New Deployment Process:** @@ -654,7 +654,7 @@ status: - Verify TemporalConnection configuration - Ensure TLS secrets are properly configured - Verify network connectivity to Temporal server -- Check that worker code properly handles `WORKER_BUILD_ID` environment variable +- Check that worker code properly handles `TEMPORAL_DEPLOYMENT_NAME` and `TEMPORAL_WORKER_BUILD_ID` environment variables **2. New Workflows Still Going to Unversioned Workers** From 1a22a3e83957bc3054c5b68096bb24eec53c28f0 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 5 Sep 2025 16:10:25 +0100 Subject: [PATCH 16/25] Clarity.
--- docs/migration-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 201a282d..081c8b56 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -572,7 +572,7 @@ spec: connection: production-temporal temporalNamespace: notifications rollout: - strategy: AllAtOnce # Lower risk, faster rollouts acceptable + strategy: AllAtOnce # Lower risk, faster rollouts desired # ... rest of config ``` From 81d58b95fe05873f361fdb5dcfa8389952f40600 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 5 Sep 2025 16:14:06 +0100 Subject: [PATCH 17/25] Clarity. --- docs/migration-guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 081c8b56..7d05a781 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -539,7 +539,7 @@ Here are common scenarios when migrating from unversioned to versioned workflows ### Pattern 1: Microservices Architecture -If you have multiple services with their own workers, migrate each service independently: +If you have multiple services with their own workers, and each service is versioned and patched separately, migrate each service independently: ```yaml # Payment service worker From ade47a108b839319b87ea1ab53f873b536caed84 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 5 Sep 2025 16:14:55 +0100 Subject: [PATCH 18/25] Remove core k8s concepts. 
--- docs/migration-guide.md | 26 -------------------------- 1 file changed, 26 deletions(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 7d05a781..6fd9d589 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -472,32 +472,6 @@ sunset: deleteDelay: 24h # Wait 24 hours after draining before deleting ``` -### Resource Management - -**CPU and Memory:** -```yaml -template: - spec: - containers: - - name: worker - resources: - requests: - memory: "1Gi" - cpu: "500m" - limits: - memory: "2Gi" - cpu: "1" -``` - -**Pod Annotations and Labels:** -```yaml -template: - metadata: - labels: - team: payments - environment: production -``` - ## Testing and Validation ### Pre-Migration Testing From 0126b9e74e0eaeb82a3a2f457b7d13fdee00c17a Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Fri, 5 Sep 2025 16:18:07 +0100 Subject: [PATCH 19/25] Scaling to 1 isn't useful. --- docs/migration-guide.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 6fd9d589..d2dc56ef 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -353,11 +353,7 @@ Now you need to carefully transition from your old unversioned deployment to the 4. **Scale down the original unversioned deployment:** ```bash - # Gradually reduce replicas of the original deployment - kubectl scale deployment payment-processor --replicas=1 - - # Monitor that workflows continue to work correctly - # Once confident, scale to zero + # Scale down the original deployment kubectl scale deployment payment-processor --replicas=0 ``` From 7f3f7e5303428634a58c0e997ddaa0eb0d4e5431 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Sun, 7 Sep 2025 11:26:44 +0100 Subject: [PATCH 20/25] Feedback. 
--- docs/concepts.md | 14 +++++--------- docs/migration-guide.md | 21 +++++---------------- 2 files changed, 10 insertions(+), 25 deletions(-) diff --git a/docs/concepts.md b/docs/concepts.md index 7c53a31b..9c804922 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -12,7 +12,6 @@ A logical grouping in Temporal that represents a collection of workers that are - Can have multiple concurrent worker versions running simultaneously - Versions of a Worker Deployment are identified by Build IDs (e.g., "v1.5.1", "v1.5.2") - Temporal routes workflow executions to appropriate worker versions based on the `RoutingConfig` of the Worker Deployment that the versions are in. -- Temporal routes workflow executions to appropriate versions based on compatibility rules ### `TemporalWorkerDeployment` CRD The Kubernetes Custom Resource Definition that manages one Temporal Worker Deployment. This is the primary resource you interact with when using the Temporal Worker Controller. @@ -24,18 +23,16 @@ The Kubernetes Custom Resource Definition that manages one Temporal Worker Deplo - Controller creates and manages multiple Kubernetes `Deployment` resources based on this spec The actual Kubernetes `Deployment` resources that run worker pods. The controller automatically creates these - you don't manage them directly. -The actual Kubernetes Deployment resources that run worker pods. The controller automatically creates these - you don't manage them directly. 
**Key characteristics:** - Multiple Kubernetes `Deployment` resources per `TemporalWorkerDeployment` Custom Resource (one per version) -- Named with the pattern: `{worker-deployment-name}-{build-id}` (e.g., `payment-processor/staging-v1.5.1`) -- Managed entirely by the controller - created, updated, and deleted automatically +- Named with the pattern: `{worker-deployment-name}-{build-id}` (e.g., `staging/payment-processor-v1.5.1`) - Each runs a specific version of your worker code ### Key Relationship **One `TemporalWorkerDeployment` Custom Resource → Multiple Kubernetes `Deployment` resources (managed by controller)** -make changes to the spec of your `TemporalWorkerDeployment` Custom Resource, and the controller handles all the underlying Kubernetes `Deployment` resources for different versions. +Make changes to the spec of your `TemporalWorkerDeployment` Custom Resource, and the controller handles all the underlying Kubernetes `Deployment` resources for different versions. ## Version States @@ -107,16 +104,15 @@ Defines how Drained versions are cleaned up: - **deleteDelay**: How long to wait after a version has been Drained before deleting the Kubernetes `Deployment` ### Template -The pod template used for the target version of this worker deployment. Similar to a standard Kubernetes Deployment template but managed by the controller. +The pod template used for the target version of this worker deployment. Similar to the pod template used in a standard Kubernetes `Deployment`, but managed by the controller. ## Environment Variables The controller automatically sets these environment variables for all worker pods: -### TEMPORAL_HOST_PORT +### TEMPORAL_ADDRESS The host and port of the Temporal server, derived from the `TemporalConnection` custom resource.
The worker must connect to this Temporal endpoint, but since this is user provided and not controller generated, the user does not necessarily need to access this env var to get that endpoint if it already knows the endpoint another way. - ### TEMPORAL_NAMESPACE The Temporal namespace the worker should connect to, from `spec.workerOptions.temporalNamespace`. The worker must connect to this Temporal namespace, but since this is user provided and not controller generated, the user does not necessarily need to access this env var to get that namespace if it already knows the namespace another way. @@ -125,7 +121,7 @@ The worker must connect to this Temporal namespace, but since this is user provi The worker deployment name in Temporal, auto-generated from the `TemporalWorkerDeployment` name and Kubernetes namespace. The worker *must* use this to configure its `worker.DeploymentOptions`. -### WORKER_BUILD_ID +### TEMPORAL_WORKER_BUILD_ID The build ID for this specific version, derived from the container image tag and hash of the target pod template. The worker *must* use this to configure its `worker.DeploymentOptions`. diff --git a/docs/migration-guide.md b/docs/migration-guide.md index d2dc56ef..39a86df7 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -20,18 +20,7 @@ This guide uses specific terminology that is defined in the [Concepts](concepts. 
## Why Migrate to Versioned Workflows -If you're currently running unversioned Temporal workflows, you may be experiencing challenges with deployments: - -- **Deployment Risk**: Code changes can break running workflows if they're not backward compatible -- **Rollback Complexity**: Rolling back deployments can disrupt in-flight workflows -- **Workflow Determinism Issues**: Changes to workflow logic can cause non-deterministic errors - -Versioned workflows with the Temporal Worker Controller solve these problems by: - -- ✅ **Safe Deployments**: New versions run alongside old ones, ensuring running workflows complete successfully -- ✅ **Automated Rollouts**: Progressive rollout strategies reduce risk of new deployments -- ✅ **Easy Rollbacks**: Can instantly route new workflows back to previous versions -- ✅ **Workflow Continuity**: Running workflows can continue on their original version until completion +If you're currently running unversioned Temporal workflows, you may be experiencing challenges with deployments. Versioned workflows with the Temporal Worker Controller can solve these problems. For details on the benefits of Worker Versioning, see the [Temporal documentation](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning). 
## Prerequisites @@ -48,7 +37,7 @@ Before starting the migration, ensure you have: Your workers are likely configured with basic environment variables like: ```bash -TEMPORAL_HOST_PORT=your-temporal-namespace.tmprl.cloud:7233 +TEMPORAL_ADDRESS=your-temporal-namespace.tmprl.cloud:7233 TEMPORAL_NAMESPACE=your-temporal-namespace # No TEMPORAL_DEPLOYMENT_NAME or TEMPORAL_WORKER_BUILD_ID yet ``` @@ -73,7 +62,7 @@ spec: - name: worker image: my-worker:v1.2.3 env: - - name: TEMPORAL_HOST_PORT + - name: TEMPORAL_ADDRESS value: "production.tmprl.cloud:7233" - name: TEMPORAL_NAMESPACE value: "production" @@ -115,7 +104,7 @@ spec: - name: worker image: my-worker:v1.2.4 # This is the most common value to change, as you roll out a new worker image. # Note: Controller automatically adds versioning environment variables: - # TEMPORAL_HOST_PORT, TEMPORAL_NAMESPACE, TEMPORAL_DEPLOYMENT_NAME, TEMPORAL_WORKER_BUILD_ID + # TEMPORAL_ADDRESS, TEMPORAL_NAMESPACE, TEMPORAL_DEPLOYMENT_NAME, TEMPORAL_WORKER_BUILD_ID ``` **New Deployment Process:** @@ -620,7 +609,7 @@ status: ``` *Solutions:* -- Check worker logs for connection errors +- Check worker logs for connection/initialization errors - Verify TemporalConnection configuration - Ensure TLS secrets are properly configured - Verify network connectivity to Temporal server From 1358194c77d983bcebc3e9edd7e79c3b8aa9f96b Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Sun, 7 Sep 2025 11:35:01 +0100 Subject: [PATCH 21/25] Recommend progressive rather than manual. --- docs/migration-guide.md | 46 +++++++++++++++++++++++++++-------------- 1 file changed, 30 insertions(+), 16 deletions(-) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 39a86df7..3e89904a 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -129,11 +129,11 @@ spec: The migration from unversioned to versioned workflows requires careful planning to avoid disrupting running workflows. 
The key is to transition gradually while maintaining workflow continuity. **Key Principles:** -- **Start with Manual strategy** - Prevents automatic promotions during initial setup +- **Start with Progressive strategy with conservative settings** - Experience the controller's main value while maintaining safety - **Migrate one worker deployment at a time** - Reduces risk and allows learning - **Test thoroughly in non-production** - Validate the approach before production migration - **Preserve running workflows** - Ensure in-flight workflows complete successfully -- **Enable automation gradually** - Move to Progressive rollouts only after validation +- **Use very conservative ramp percentages initially** - Start with 1-5% ramps to minimize risk ### Migration Phases @@ -251,7 +251,7 @@ spec: value: "production" ``` -**New TemporalWorkerDeployment custom resource (IMPORTANT: Use Manual strategy initially):** +**New TemporalWorkerDeployment custom resource (IMPORTANT: Use Progressive strategy with conservative settings initially):** ```yaml apiVersion: temporal.io/v1alpha1 kind: TemporalWorkerDeployment @@ -264,9 +264,16 @@ spec: workerOptions: connection: production-temporal temporalNamespace: production - # CRITICAL: Use Manual during initial migration + # Start with Progressive strategy using conservative ramp percentages rollout: - strategy: Manual + strategy: Progressive + steps: + - rampPercentage: 1 + pauseDuration: 10m + - rampPercentage: 5 + pauseDuration: 15m + - rampPercentage: 25 + pauseDuration: 20m sunset: scaledownDelay: 30m deleteDelay: 2h @@ -352,12 +359,12 @@ Now you need to carefully transition from your old unversioned deployment to the kubectl delete deployment payment-processor ``` -### Step 7: Enable Automated Rollouts +### Step 7: Optimize Rollout Settings -Once the initial migration is complete and validated, enable automated rollouts for future deployments: +Once the initial migration is complete and validated, you can optimize rollout 
settings for faster deployments: ```bash -# Update the TemporalWorkerDeployment custom resource to use Progressive strategy +# Update the TemporalWorkerDeployment custom resource to use faster Progressive rollout kubectl patch temporalworkerdeployment payment-processor --type='merge' -p='{ "spec": { "rollout": { @@ -420,11 +427,12 @@ Deploy a new version to validate the entire flow: See the [Concepts](concepts.md) document for detailed explanations of rollout strategies. Here are the basic configuration patterns: -**Manual Strategy (Default Behavior):** +**Manual Strategy (Advanced Use Cases):** ```yaml rollout: strategy: Manual # Requires manual intervention to promote versions +# Only recommended for special cases requiring full manual control ``` **Immediate Rollout:** @@ -434,17 +442,23 @@ rollout: # Immediately routes 100% traffic to new version when healthy ``` -**Progressive Rollout:** +**Progressive Rollout (Recommended):** ```yaml rollout: strategy: Progressive steps: + # Conservative initial migration settings - rampPercentage: 1 - pauseDuration: 5m - - rampPercentage: 10 pauseDuration: 10m - - rampPercentage: 50 + - rampPercentage: 5 pauseDuration: 15m + - rampPercentage: 25 + pauseDuration: 20m + # Can be optimized to faster ramps after validation: + # - rampPercentage: 10 + # pauseDuration: 5m + # - rampPercentage: 50 + # pauseDuration: 10m gate: workflowType: "HealthCheck" # Optional validation workflow ``` @@ -571,7 +585,7 @@ spec: connection: staging-temporal temporalNamespace: staging rollout: - strategy: AllAtOnce # Immediate rollout for faster iteration + strategy: AllAtOnce # Faster rollout ``` ### Pattern 3: Gradual Team Migration @@ -684,10 +698,10 @@ kubectl get events --field-selector involvedObject.kind=TemporalWorkerDeployment ## Migration Summary 🎯 **Key principles for unversioned to versioned migration**: -- **Start with Manual strategy** to maintain control during initial migration +- **Start with Progressive strategy using conservative 
ramp percentages** to experience the controller's value while maintaining safety - **Run both unversioned and versioned workers** during transition period - **Wait for existing workflows to complete** before scaling down unversioned workers -- **Enable Progressive rollouts** only after validating the migration process +- **Begin with very conservative ramp percentages (1-5%)** and optimize after validating the migration process - **Migrate one service at a time** to reduce risk and enable learning See the [Concepts](concepts.md) document for detailed explanations of the resource relationships and terminology. From 0d05f12035b41b362acbeddc766996a0c904b8d9 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Sun, 7 Sep 2025 12:28:42 +0100 Subject: [PATCH 22/25] Split configuration reference into its own file. --- docs/concepts.md | 3 +- docs/configuration.md | 341 ++++++++++++++++++++++++++++++++++++++++ docs/migration-guide.md | 57 ++----- 3 files changed, 353 insertions(+), 48 deletions(-) create mode 100644 docs/configuration.md diff --git a/docs/concepts.md b/docs/concepts.md index 9c804922..e4198a3d 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -51,7 +51,7 @@ The version is registered with Temporal but isn't automatically receiving any ne The version is receiving a percentage of new workflow executions. If managed by a Progressive rollout, the percentage gradually increases according to the configured rollout steps. If the rollout is Manual, the user is responsible for setting the ramp percentage and ramping version. ### Current -The version is receiving 100% of new workflow executions. This is the "stable" version that handles all new workflows and all existing AutoUpgrade workflows running on the task queues in this Worker Deployment. +The current version receives all new workflow executions except those routed to the Ramping version.
This is the "stable" version that handles the majority of traffic - all new workflows not being ramped to a newer version, plus all existing AutoUpgrade workflows running on the task queues in this Worker Deployment. ### Draining The version is no longer receiving new workflow executions but may still be processing existing workflows. @@ -113,6 +113,7 @@ The controller automatically sets these environment variables for all worker pod ### TEMPORAL_ADDRESS The host and port of the Temporal server, derived from the `TemporalConnection` custom resource. The worker must connect to this Temporal endpoint, but since this is user provided and not controller generated, the user does not necessarily need to access this env var to get that endpoint if it already knows the endpoint another way. + ### TEMPORAL_NAMESPACE The Temporal namespace the worker should connect to, from `spec.workerOptions.temporalNamespace`. The worker must connect to this Temporal namespace, but since this is user provided and not controller generated, the user does not necessarily need to access this env var to get that namespace if it already knows the namespace another way. diff --git a/docs/configuration.md b/docs/configuration.md new file mode 100644 index 00000000..dceca868 --- /dev/null +++ b/docs/configuration.md @@ -0,0 +1,341 @@ +# Configuration Reference + +This document provides comprehensive configuration options for the Temporal Worker Controller. + +## Table of Contents + +1. [Rollout Strategies](#rollout-strategies) +2. [Sunset Configuration](#sunset-configuration) +3. [Worker Options](#worker-options) +4. [Gate Configuration](#gate-configuration) +5. [Advanced Configuration](#advanced-configuration) + +## Rollout Strategies + +See the [Concepts](concepts.md) document for detailed explanations of rollout strategies. 
Here are the basic configuration patterns: + +### Manual Strategy (Advanced Use Cases) + +```yaml +rollout: + strategy: Manual +# Requires manual intervention to promote versions +# Only recommended for special cases requiring full manual control +``` + +Use Manual strategy when you need complete control over version promotions, such as: +- Complex validation processes that require human approval +- Coordinated deployments across multiple services +- Special compliance or regulatory requirements + +### AllAtOnce Strategy + +```yaml +rollout: + strategy: AllAtOnce +# Immediately routes 100% traffic to new version when healthy +``` + +Use AllAtOnce strategy for: +- Low-risk environments (development, staging) +- Services where fast deployment is more important than gradual rollout +- Background processing workers with minimal user impact + +### Progressive Strategy (Recommended) + +```yaml +rollout: + strategy: Progressive + steps: + # Conservative initial migration settings + - rampPercentage: 1 + pauseDuration: 10m + - rampPercentage: 5 + pauseDuration: 15m + - rampPercentage: 25 + pauseDuration: 20m + # Can be optimized to faster ramps after validation: + # - rampPercentage: 10 + # pauseDuration: 5m + # - rampPercentage: 50 + # pauseDuration: 10m + gate: + workflowType: "HealthCheck" # Optional validation workflow +``` + +Progressive strategy is recommended for most production deployments because it: +- Minimizes risk by gradually increasing traffic to new versions +- Provides automatic pause points for validation +- Allows for quick rollback if issues are detected +- Can be tuned for different risk tolerances + +#### Progressive Rollout Examples + +**Conservative Production Rollout:** +```yaml +rollout: + strategy: Progressive + steps: + - rampPercentage: 1 + pauseDuration: 15m + - rampPercentage: 5 + pauseDuration: 30m + - rampPercentage: 25 + pauseDuration: 45m + - rampPercentage: 75 + pauseDuration: 30m + gate: + workflowType: "ProductionHealthCheck" +``` + 
+**Faster Development Environment:** +```yaml +rollout: + strategy: Progressive + steps: + - rampPercentage: 25 + pauseDuration: 2m + - rampPercentage: 75 + pauseDuration: 3m +``` + +**Canary-Style Rollout:** +```yaml +rollout: + strategy: Progressive + steps: + - rampPercentage: 1 + pauseDuration: 30m # Long canary period + - rampPercentage: 100 + pauseDuration: 0s # Full rollout after canary validation +``` + +## Sunset Configuration + +Controls how old versions are scaled down and cleaned up after they're no longer receiving new traffic: + +```yaml +sunset: + scaledownDelay: 1h # Wait 1 hour after draining before scaling to 0 + deleteDelay: 24h # Wait 24 hours after draining before deleting +``` + +### Sunset Configuration Examples + +**Conservative Cleanup (Recommended for Production):** +```yaml +sunset: + scaledownDelay: 2h # Allow time for workflows to complete + deleteDelay: 48h # Keep resources for debugging/rollback +``` + +**Aggressive Cleanup (Development/Staging):** +```yaml +sunset: + scaledownDelay: 15m # Quick scaledown + deleteDelay: 2h # Minimal retention +``` + +**Long-Running Workflow Environment:** +```yaml +sunset: + scaledownDelay: 24h # Long-running workflows need time + deleteDelay: 168h # 1 week retention for analysis +``` + +## Worker Options + +Configure how workers connect to Temporal: + +```yaml +workerOptions: + connection: production-temporal # Reference to TemporalConnection + temporalNamespace: production # Temporal namespace + taskQueues: # Optional: explicit task queue list + - order-processing + - payment-processing +``` + +### Connection Configuration + +Reference a `TemporalConnection` resource that defines server details: + +```yaml +apiVersion: temporal.io/v1alpha1 +kind: TemporalConnection +metadata: + name: production-temporal +spec: + hostPort: "production.abc123.tmprl.cloud:7233" + mutualTLSSecret: temporal-cloud-mtls # Optional: for mTLS +``` + +## Gate Configuration + +Optional validation workflow that must succeed 
before proceeding with rollout: + +```yaml +rollout: + strategy: Progressive + steps: + - rampPercentage: 10 + pauseDuration: 5m + gate: + workflowType: "HealthCheck" + input: | + { + "version": "{{.Version}}", + "environment": "production" + } + timeout: 300s +``` + +### Gate Workflow Examples + +**Simple Health Check:** +```yaml +gate: + workflowType: "HealthCheck" + timeout: 60s +``` + +**Complex Validation with Input:** +```yaml +gate: + workflowType: "ValidationWorkflow" + input: | + { + "deploymentName": "{{.DeploymentName}}", + "buildId": "{{.BuildId}}", + "rampPercentage": {{.RampPercentage}}, + "environment": "{{.Environment}}" + } + timeout: 600s +``` + +## Advanced Configuration + +### Resource Limits and Requests + +Configure Kubernetes resource limits for worker pods: + +```yaml +template: + spec: + containers: + - name: worker + image: my-worker:latest + resources: + requests: + memory: "512Mi" + cpu: "250m" + limits: + memory: "1Gi" + cpu: "500m" +``` + +### Environment-Specific Configurations + +**Production Configuration:** +```yaml +apiVersion: temporal.io/v1alpha1 +kind: TemporalWorkerDeployment +metadata: + name: order-processor + namespace: production +spec: + replicas: 5 + workerOptions: + connection: production-temporal + temporalNamespace: production + rollout: + strategy: Progressive + steps: + - rampPercentage: 1 + pauseDuration: 15m + - rampPercentage: 10 + pauseDuration: 30m + - rampPercentage: 50 + pauseDuration: 45m + gate: + workflowType: "ProductionHealthCheck" + timeout: 300s + sunset: + scaledownDelay: 2h + deleteDelay: 48h +``` + +**Staging Configuration:** +```yaml +apiVersion: temporal.io/v1alpha1 +kind: TemporalWorkerDeployment +metadata: + name: order-processor + namespace: staging +spec: + replicas: 2 + workerOptions: + connection: staging-temporal + temporalNamespace: staging + rollout: + strategy: Progressive + steps: + - rampPercentage: 25 + pauseDuration: 5m + - rampPercentage: 100 + pauseDuration: 0s + sunset: + 
scaledownDelay: 30m + deleteDelay: 4h +``` + +### Labels and Annotations + +Add custom labels and annotations to managed resources: + +```yaml +template: + metadata: + labels: + app: my-worker + team: platform + environment: production + annotations: + deployment.kubernetes.io/revision: "1" + spec: + # ... container spec +``` + +### Multiple Task Queues + +Configure workers that handle multiple task queues: + +```yaml +workerOptions: + connection: production-temporal + temporalNamespace: production + taskQueues: + - order-processing + - payment-processing + - notification-sending +``` + +## Configuration Validation + +The controller validates configuration and will report errors in the resource status: + +```bash +# Check for configuration errors +kubectl describe temporalworkerdeployment my-worker + +# Look for validation errors in status +kubectl get temporalworkerdeployment my-worker -o yaml +``` + +Common validation errors: +- Invalid ramp percentages (must be 1-100) +- Invalid duration formats (use Go duration format: "5m", "1h", "30s") +- Missing required fields (connection, temporalNamespace) +- Invalid strategy combinations + +For more examples and patterns, see the [Migration Guide](migration-guide.md) and [Concepts](concepts.md) documentation. diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 3e89904a..d65b10ab 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -18,6 +18,8 @@ This guide uses specific terminology that is defined in the [Concepts](concepts. 8. [Common Migration Patterns](#common-migration-patterns) 9. [Troubleshooting](#troubleshooting) +For detailed configuration options, see the [Configuration Reference](configuration.md) document. + ## Why Migrate to Versioned Workflows If you're currently running unversioned Temporal workflows, you may be experiencing challenges with deployments. Versioned workflows with the Temporal Worker Controller can solve these problems. 
For details on the benefits of Worker Versioning, see the [Temporal documentation](https://docs.temporal.io/production-deployment/worker-deployments/worker-versioning). @@ -229,7 +231,7 @@ worker := worker.New(client, "my-task-queue", workerOptions) ### Step 4: Create Your First TemporalWorkerDeployment -Start with your lowest-risk worker. Convert your existing unversioned Deployment to a `TemporalWorkerDeployment` custom resource: +Start with your lowest-risk worker. Make a copy of your existing unversioned Deployment and convert it to a `TemporalWorkerDeployment` custom resource: **Existing Unversioned Deployment:** ```yaml @@ -423,53 +425,16 @@ Deploy a new version to validate the entire flow: ## Configuration Reference -### Rollout Strategies - -See the [Concepts](concepts.md) document for detailed explanations of rollout strategies. Here are the basic configuration patterns: - -**Manual Strategy (Advanced Use Cases):** -```yaml -rollout: - strategy: Manual -# Requires manual intervention to promote versions -# Only recommended for special cases requiring full manual control -``` - -**Immediate Rollout:** -```yaml -rollout: - strategy: AllAtOnce -# Immediately routes 100% traffic to new version when healthy -``` +For comprehensive configuration options including rollout strategies, sunset configuration, worker options, and advanced settings, see the [Configuration Reference](configuration.md) document. 
-**Progressive Rollout (Recommended):** -```yaml -rollout: - strategy: Progressive - steps: - # Conservative initial migration settings - - rampPercentage: 1 - pauseDuration: 10m - - rampPercentage: 5 - pauseDuration: 15m - - rampPercentage: 25 - pauseDuration: 20m - # Can be optimized to faster ramps after validation: - # - rampPercentage: 10 - # pauseDuration: 5m - # - rampPercentage: 50 - # pauseDuration: 10m - gate: - workflowType: "HealthCheck" # Optional validation workflow -``` +Key configuration patterns for migration: -### Sunset Configuration +- **Progressive Strategy (Recommended)**: Start with conservative ramp percentages (1%, 5%, 25%) for initial migrations +- **AllAtOnce Strategy**: For development/staging environments where speed is preferred over gradual rollout +- **Manual Strategy**: Only for advanced use cases requiring full manual control +- **Sunset Configuration**: Configure delays for scaling down and deleting old versions -```yaml -sunset: - scaledownDelay: 1h # Wait 1 hour after draining before scaling to 0 - deleteDelay: 24h # Wait 24 hours after draining before deleting -``` +See [Configuration Reference](configuration.md) for detailed examples and advanced configuration options. ## Testing and Validation @@ -607,8 +572,6 @@ Migrate teams/services based on their readiness and risk tolerance: - Payment processing - Critical business operations - - ## Troubleshooting ### Common Issues From 07313eb14b71183daa575a1a9de944e7305733a4 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Sun, 7 Sep 2025 12:55:13 +0100 Subject: [PATCH 23/25] Config reference iteration. 
--- docs/configuration.md | 36 ------------------------------------ 1 file changed, 36 deletions(-) diff --git a/docs/configuration.md b/docs/configuration.md index dceca868..2e4cfef4 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -215,25 +215,6 @@ gate: ## Advanced Configuration -### Resource Limits and Requests - -Configure Kubernetes resource limits for worker pods: - -```yaml -template: - spec: - containers: - - name: worker - image: my-worker:latest - resources: - requests: - memory: "512Mi" - cpu: "250m" - limits: - memory: "1Gi" - cpu: "500m" -``` - ### Environment-Specific Configurations **Production Configuration:** @@ -289,23 +270,6 @@ spec: deleteDelay: 4h ``` -### Labels and Annotations - -Add custom labels and annotations to managed resources: - -```yaml -template: - metadata: - labels: - app: my-worker - team: platform - environment: production - annotations: - deployment.kubernetes.io/revision: "1" - spec: - # ... container spec -``` - ### Multiple Task Queues Configure workers that handle multiple task queues: From 443575f882d6aafebc0cd6dc91f67108c0558de4 Mon Sep 17 00:00:00 2001 From: Rob Holland Date: Sun, 7 Sep 2025 12:59:54 +0100 Subject: [PATCH 24/25] Add TLS to example custom resource. 
--- docs/migration-guide.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/migration-guide.md b/docs/migration-guide.md index d65b10ab..541b385b 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -68,6 +68,10 @@ spec: value: "production.tmprl.cloud:7233" - name: TEMPORAL_NAMESPACE value: "production" + - name: TEMPORAL_TLS_CLIENT_CERT_PATH + value: "/path/to/temporal.cert" + - name: TEMPORAL_TLS_CLIENT_KEY_PATH + value: "/path/to/temporal.key" # No versioning environment variables ``` From 31ce01164ccc700c9dd12bdae7e928dbe528378c Mon Sep 17 00:00:00 2001 From: Carly de Frondeville Date: Mon, 8 Sep 2025 14:21:12 -0700 Subject: [PATCH 25/25] Recommend conservative Progressive instead of Manual initial rollout Clean up last few places that recommend Manual --- docs/concepts.md | 3 +-- docs/migration-guide.md | 7 +++---- 2 files changed, 4 insertions(+), 6 deletions(-) diff --git a/docs/concepts.md b/docs/concepts.md index e4198a3d..c793cb5e 100644 --- a/docs/concepts.md +++ b/docs/concepts.md @@ -65,8 +65,7 @@ All Pinned workflows on this version have completed. The version is ready for cl Requires explicit human intervention to promote versions. New versions remain in the `Inactive` state until manually promoted. **Use cases:** -- During migration from manual deployment systems -- Testing and validation scenarios +- Advanced deployment scenarios that are not supported by the other strategies (e.g., a user wants to run custom testing and validation before changing how workflow traffic is routed) ### AllAtOnce Strategy Immediately routes 100% of new workflow executions to the target version once it's healthy and registered. diff --git a/docs/migration-guide.md b/docs/migration-guide.md index 541b385b..b4563ce8 100644 --- a/docs/migration-guide.md +++ b/docs/migration-guide.md @@ -151,15 +151,14 @@ The migration from unversioned to versioned workflows requires careful planning #### Phase 2: Initial Migration 1.
**Choose lowest-risk worker** to migrate first -2. **Create `TemporalWorkerDeployment` custom resource** with Manual strategy +2. **Create `TemporalWorkerDeployment` custom resource** with a Progressive strategy (conservative intervals recommended) 3. **Validate controller management** works correctly 4. **Update deployment pipeline** for this worker #### Phase 3: Gradual Rollout 1. **Migrate remaining workers** one at a time -2. **Enable Progressive rollouts** for validated workers -3. **Monitor and tune** rollout configurations -4. **Train team** on new deployment process +2. **Monitor and tune** rollout configurations +3. **Train team** on new deployment process ### Recommended Migration Order