diff --git a/docs/evaluate/development-production-features/multi-tenant.mdx b/docs/evaluate/development-production-features/multi-tenant.mdx
index 4e86735040..9bbbd5518d 100644
--- a/docs/evaluate/development-production-features/multi-tenant.mdx
+++ b/docs/evaluate/development-production-features/multi-tenant.mdx
@@ -1,35 +1,48 @@
 ---
 id: multi-tenancy
 title: Multi-tenancy - Temporal feature
-description: Learn about Temporal Cloud's multi-tenant architecture and how it enhances scalability, efficiency, and cost-effectiveness.
+description: Learn about Temporal Cloud's multi-tenant architecture and how to build multi-tenant applications using Temporal.
 sidebar_label: Multi-tenancy
 tags:
 - Temporal Cloud
+- Multitenancy
 keywords:
 - multi-tenant
 - Temporal Cloud
-- cloud architecture
-- scalability
-- cost-effectiveness
-- noisy neighbor
-- database performance
-- high throughput
+- namespace isolation
+- multi-tenant applications
+- tenant isolation
 ---
 
 import { RelatedReadContainer, RelatedReadItem } from '@site/src/components';
 
-A Namespace is a unit of isolation within the Temporal Platform -- but even a single Namespace is still multi-tenant.
-Multi-tenancy ensures extra capacity is available for all customers during traffic spikes.
+Multi-tenancy in Temporal operates at two levels:
 
-However, multi-tenancy can also presents the challenge of "noisy neighbors", where high-traffic tenants consume excess resources, causing slower performance for other tenants.
-This is a common problem for database scaling.
+## Namespace isolation
 
-Temporal's write-heavy workload, where changes in execution state are constantly written to the persistence layer, demands a database that supports reliably high throughput with low latency for multiple customers, concurrently and fairly.
+**How Temporal Cloud isolates your namespaces from each other.**
 
-With Temporal Cloud, customers pay for consumption instead of entire sets of hardware, providing a cost-effective solution.
-Temporal Cloud's architecture scales to handle multiple tenants efficiently.
+The nature of Temporal workloads – write- and read-heavy, with strong consistency and latency requirements – can make multi-tenant operations challenging. Without careful design, 'noisy neighbor' problems arise when one customer's traffic spike degrades performance for others.
+
+Temporal Cloud is a multi-tenant service that isolates each namespace to provide cost-effective, consumption-based pricing while maintaining enterprise-grade security, reliability, and capacity to handle traffic spikes. Each namespace has:
+
+- **Independent authentication** via [API keys](/cloud/api-keys) or [mTLS certificates](/cloud/certificates)
+- **Separate [rate limits](/cloud/limits#namespace-level)** to prevent noisy neighbor problems
+- **Controlled inter-namespace communication** only through [Nexus](/evaluate/nexus)
-
-
+
+
+
+## Application multi-tenancy
+
+**How to build multi-tenant applications using Temporal.**
+
+Many organizations use Temporal Cloud to power their own multi-tenant SaaS application. Temporal provides patterns for isolating your tenants using:
+
+- **Task queues per tenant** for workload isolation
+- **Worker design patterns** for efficient resource utilization
+- **Search attributes** for tenant-specific queries
+
+The [best practices guide](/production-deployment/multi-tenant-patterns) includes detailed recommendations and factors to consider when architecting your multi-tenant application on Temporal Cloud.
diff --git a/docs/evaluate/temporal-cloud/security.mdx b/docs/evaluate/temporal-cloud/security.mdx
index 85f2962a10..16f7233106 100644
--- a/docs/evaluate/temporal-cloud/security.mdx
+++ b/docs/evaluate/temporal-cloud/security.mdx
@@ -10,6 +10,7 @@ keywords:
   - security
   - temporal cloud
 tags:
+  - Multitenancy
   - Security
   - Temporal Cloud
 ---
@@ -50,9 +51,34 @@ By deploying a [Codec Server](/production-deployment/data-encryption) you can se
 ### Namespace isolation
 
 The base unit of isolation in a Temporal environment is a [Namespace](/namespaces).
-Each Temporal Cloud account can have multiple Namespaces.
-A Namespace (regardless of account) cannot interact with other Namespaces.
-Each Namespace is available through a secure gRPC (mTLS) endpoint and an HTTPS (TLS) endpoint.
+Each Temporal Cloud account can have multiple Namespaces, and each Namespace is isolated to ensure your workloads remain secure and performant.
+
+#### Authentication
+
+Each Namespace is secured with your choice of authentication method:
+- **mTLS certificates** - Namespace-specific X.509 certificates for mutual TLS authentication
+- **API keys** - Namespace-scoped API keys for authentication
+
+See [API Keys](/cloud/api-keys) and [mTLS Certificates](/cloud/certificates) for more details on configuring authentication for your Namespace.
+
+#### Rate limiting
+
+Temporal Cloud protects each Namespace with separate rate limits to prevent noisy neighbor problems:
+- **Actions Per Second (APS)** - Limits the rate of [actions](/best-practices/managing-aps-limits) performed in your Workflows
+- **Operations Per Second (OPS)** - Limits the rate of all [operations](/references/operation-list) that create load on Temporal Server
+
+These per-Namespace rate limits ensure that one Namespace experiencing a traffic spike cannot impact the performance or reliability of other Namespaces, whether those Namespaces belong to a single Temporal Cloud account or separate ones.
+
+See [Rate limiting](/cloud/limits) for more information about Temporal Cloud limits, and [Monitoring trends against limits](/production-deployment/cloud/service-health#rps-aps-rate-limits) for monitoring best practices.
+
+#### Inter-Namespace communication
+
+Namespaces are isolated by default. The only way for Workflows in one Namespace to interact with Workflows in another Namespace is through [Temporal Nexus](/nexus), which provides controlled, secure cross-Namespace communication via Nexus Endpoints.
+
+See [Nexus Security](/nexus/security) for details on how Nexus enables secure inter-Namespace communication.
+
+#### Logical segregation
+
 Temporal Cloud is a multi-tenant service.
 Namespaces in the same environment are logically segregated.
 Namespaces do not share data processing or data storage across regional boundaries.
diff --git a/docs/production-deployment/multi-tenant-patterns.mdx b/docs/production-deployment/multi-tenant-patterns.mdx
new file mode 100644
index 0000000000..85037b0b85
--- /dev/null
+++ b/docs/production-deployment/multi-tenant-patterns.mdx
@@ -0,0 +1,282 @@
+---
+id: multi-tenant-patterns
+title: Multi-tenant application patterns
+sidebar_label: Multi-tenant patterns
+description: Learn how to build multi-tenant applications using Temporal with task queue isolation patterns, worker design, and best practices.
+slug: /production-deployment/multi-tenant-patterns
+toc_max_heading_level: 4
+keywords:
+  - multi-tenant
+  - task queues
+  - worker patterns
+  - SaaS
+tags:
+  - Multitenancy
+  - Best Practices
+---
+
+import { RelatedReadContainer, RelatedReadItem } from '@site/src/components';
+
+**How do I build a multi-tenant application using Temporal?**
+
+Many SaaS providers and large enterprise platform teams use a single Temporal [Namespace](/namespaces) with [per-tenant Task Queues](#1-task-queues-per-tenant-recommended) to power their multi-tenant applications. This approach maximizes resource efficiency while maintaining logical separation between tenants.
+
+This guide covers architectural patterns, design considerations, and practical examples for building multi-tenant applications with Temporal.
+
+## Architectural principles
+
+When designing a multi-tenant Temporal application, follow these principles:
+
+- **Define your tenant model** - Determine what constitutes a tenant in your business (customers, pricing tiers, teams, etc.)
+- **Prefer simplicity** - Start with the simplest pattern that meets your needs
+- **Understand Temporal limits** - Design within the constraints of your Temporal deployment
+- **Test at scale** - Performance testing must drive your capacity decisions
+- **Plan for growth** - Consider how you'll onboard new tenants and scale workers
+
+## Architectural patterns
+
+There are three main patterns for multi-tenant applications in Temporal, listed from most to least recommended:
+
+### 1. Task queues per tenant (Recommended)
+
+**Use different [Task Queues](/task-queue) for each tenant's [Workflows](/workflows) and [Activities](/activities).**
+
+This is the recommended pattern for most use cases. Each tenant gets dedicated Task Queue(s), with [Workers](/workers) polling multiple tenant Task Queues in a single process.
+
+**Pros:**
+- Strong isolation between tenants
+- Efficient resource utilization
+- Flexible worker scaling
+- Easy to add new tenants
+- Can handle thousands of tenants per [Namespace](/namespaces)
+
+**Cons:**
+- Requires worker configuration management
+- Potential for uneven resource distribution
+- Need to prevent "noisy neighbor" issues at the worker level
+
+
+
+
+
+### 2. Shared Workflow Task Queues, separate Activity Task Queues
+
+**Share [Workflow Task Queues](/task-queue) but use different [Activity Task Queues](/task-queue) per tenant.**
+
+Use this pattern when [Workflows](/workflows) are lightweight but [Activities](/activities) have heavy resource requirements or external dependencies that need isolation.
+
+**Pros:**
+- Easier worker management than full isolation
+- Activity-level tenant isolation
+- Good for compute-intensive Activities
+
+**Cons:**
+- Less isolation than pattern #1
+- Workflow visibility is shared
+- More complex to reason about
+
+### 3. Namespace per tenant
+
+**Use a separate [Namespace](/namespaces) for each tenant.**
+
+Only practical for a small number (fewer than 50) of high-value tenants due to operational overhead.
+
+**Pros:**
+- Complete isolation between tenants
+- Per-tenant rate limiting
+- Maximum security
+
+**Cons:**
+- Higher operational overhead
+- Credential and connectivity management per [Namespace](/namespaces)
+- Requires more [Workers](/workers) (minimum 2 per Namespace for HA)
+- Expensive at scale
+
+
+
+
+
+## Task Queue isolation pattern
+
+This section details the recommended pattern for most multi-tenant applications.
+
+### Worker design
+
+When a [Worker](/workers) starts up:
+
+1. **Load tenant configuration** - Retrieve the list of tenants this Worker should handle (from config file, API, or database)
+2. **Create [Task Queues](/task-queue)** - For each tenant, generate a unique Task Queue name (e.g., `customer-{tenant-id}`)
+3. **Register [Workflows](/workflows) and [Activities](/activities)** - Register your Workflow and Activity implementations once, passing the tenant-specific Task Queue name
+4. **Poll multiple Task Queues** - A single Worker process polls all assigned tenant Task Queues
+
+```go
+// Example: Go worker polling multiple tenant Task Queues
+// (c is a connected client.Client; assignedTenants comes from your tenant configuration)
+for _, tenant := range assignedTenants {
+    taskQueue := fmt.Sprintf("customer-%s", tenant.ID)
+
+    // Name the variable w so it does not shadow the worker package
+    w := worker.New(c, taskQueue, worker.Options{})
+    w.RegisterWorkflow(YourWorkflow)
+    w.RegisterActivity(YourActivity)
+
+    // Start polling without blocking so the loop can continue to the next tenant
+    if err := w.Start(); err != nil {
+        log.Fatalf("unable to start Worker for %s: %v", taskQueue, err)
+    }
+}
+```
+
+### Routing requests to Task Queues
+
+Your application needs to route [Workflow](/workflows) starts and other operations to the correct tenant [Task Queue](/task-queue):
+
+```go
+// Example: Starting a Workflow for a specific tenant
+taskQueue := fmt.Sprintf("customer-%s", tenantID)
+workflowOptions := client.StartWorkflowOptions{
+    ID:        workflowID,
+    TaskQueue: taskQueue,
+}
+```
+
+Consider creating an API or service that:
+- Maps tenant IDs to Task Queue names
+- Tracks which [Workers](/workers) are handling which tenants
+- Lets both your application and Workers read two mappings: tenant IDs to Task Queues, and Workers to tenants
+
+### Capacity planning
+
+Key questions to answer through performance testing:
+
+**[Namespace](/namespaces) capacity:**
+- How many concurrent [Task Queue](/task-queue) pollers can your Namespace support?
+- What are your [Actions Per Second (APS)](/cloud/limits#actions-per-second) limits?
+- What are your [Operations Per Second (OPS)](/references/operation-list) limits?
+
+**[Worker](/workers) capacity:**
+- How many tenants can a single Worker process handle?
+- What are the CPU and memory requirements per tenant?
+- How many concurrent [Workflow](/workflows) executions per tenant?
+- How many concurrent [Activity](/activities) executions per tenant?
+
+**SDK configuration to tune:**
+- `MaxConcurrentWorkflowTaskExecutionSize`
+- `MaxConcurrentActivityExecutionSize`
+- `MaxConcurrentWorkflowTaskPollers`
+- `MaxConcurrentActivityTaskPollers`
+- Worker replicas (in Kubernetes deployments)
+
+### Provisioning new tenants
+
+Automate tenant onboarding with a Temporal [Workflow](/workflows):
+
+1. Create a tenant onboarding Workflow that:
+   - Validates tenant information
+   - Provisions infrastructure
+   - Deploys/updates [Worker](/workers) configuration
+   - Triggers Worker restarts or scaling
+   - Verifies the tenant is operational
+
+2. Store tenant-to-Worker mappings in a database or configuration service
+
+3. Update Worker deployments to pick up new tenant assignments
+
+## Practical example
+
+**Scenario:** A SaaS company has 1,000 customers and expects to grow to 5,000 customers over 3 years. They have 2 [Workflows](/workflows) and ~25 [Activities](/activities) per Workflow. All customers are on the same tier (no segmentation yet).
+
+### Assumptions
+
+| Item | Value |
+|------|-------|
+| Current customers | 1,000 |
+| Workflow Task Queues per customer | 1 |
+| Activity Task Queues per customer | 1 |
+| Max Task Queue pollers per Namespace | 5,000 |
+| SDK concurrent Workflow task pollers | 5 |
+| SDK concurrent Activity task pollers | 5 |
+| Max concurrent Workflow executions | 200 |
+| Max concurrent Activity executions | 200 |
+
+### Capacity calculations
+
+**[Task Queue](/task-queue) poller limits:**
+- Each [Worker](/workers) uses 10 pollers per tenant (5 Workflow + 5 Activity)
+- Maximum Workers in [Namespace](/namespaces): 5,000 pollers ÷ 10 = **500 Workers**
+
+**Worker capacity:**
+- Each Worker can theoretically handle 200 [Workflows](/workflows) and 200 [Activities](/activities) concurrently
+- Conservative estimate: **250 tenants per Worker** (accounting for overhead)
+- For 1,000 customers: **4 Workers minimum** (plus replicas for HA)
+- For 5,000 customers: **20 Workers minimum** (plus replicas for HA)
+
+**Namespace capacity:**
+- At 250 tenants per Worker, need 2 Workers per group of tenants (for HA)
+- Maximum tenants in Namespace: (500 Workers ÷ 2) × 250 = **62,500 tenants**
+
+:::note
+These are theoretical calculations based on SDK defaults. **Always perform load testing** to determine actual capacity for your specific workload. Monitor CPU, memory, and Temporal metrics during testing.
+
+While testing, also pay attention to your [metrics capacity and cardinality](/production-deployment/cloud/metrics/openmetrics/api-reference#managing-high-cardinality).
+:::
+
+### Worker assignment strategies
+
+**Option 1: Static configuration**
+- Each [Worker](/workers) reads a config file listing assigned tenant IDs
+- Simple to implement
+- Requires deployment to add tenants
+
+**Option 2: Dynamic API**
+- Workers call an API on startup to get assigned tenants
+- Workers identified by static ID (1 to N)
+- API returns tenant list based on Worker ID
+- More flexible, no deployment needed for new tenants
+
+## Best practices
+
+### Monitoring
+
+Track these [metrics](/references/sdk-metrics) per tenant:
+- [Workflow completion](/production-deployment/cloud/metrics/openmetrics/metrics-reference#workflow-completion-metrics) rates
+- [Activity execution](/production-deployment/cloud/metrics/openmetrics/metrics-reference#task-queue-metrics) rates
+- [Task Queue backlog](/production-deployment/cloud/metrics/openmetrics/metrics-reference#task-queue-metrics)
+- [Worker resource utilization](/references/sdk-metrics#worker_task_slots_used)
+- [Workflow failure rates](/encyclopedia/detecting-workflow-failures)
+
+### Handling noisy neighbors
+
+Even with [Task Queue](/task-queue) isolation, monitor for tenants that:
+- Generate excessive load
+- Have high failure rates
+- Cause [Worker](/workers) resource exhaustion
+
+Strategies:
+- Implement per-tenant rate limiting in your application
+- Move problematic tenants to dedicated Workers
+- Use [Workflow](/workflows)/[Activity](/activities) timeouts aggressively
+
+### Tenant lifecycle
+
+Plan for:
+- **Onboarding** - Automated provisioning [Workflow](/workflows)
+- **Scaling** - When to add new [Workers](/workers) for growing tenants
+- **Offboarding** - Graceful tenant removal and data cleanup
+- **Rebalancing** - Redistributing tenants across Workers
+
+### Search Attributes
+
+Use [Search Attributes](/search-attribute) to enable tenant-scoped queries:
+```go
+// Add tenant ID as a Search Attribute
+searchAttributes := map[string]interface{}{
+    "TenantId": tenantID,
+}
+```
+
+This allows filtering [Workflows](/workflows) by tenant in the UI and SDK:
+```sql
+TenantId = 'customer-123' AND ExecutionStatus = 'Running'
+```
+
+## Related resources
+
+
+
+
+
+
diff --git a/sidebars.js b/sidebars.js
index 8e3fe05780..0db50a45b8 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -633,6 +633,7 @@ module.exports = {
         'best-practices/cloud-access-control',
         'best-practices/security-controls',
         'best-practices/worker',
+        'production-deployment/multi-tenant-patterns',
       ],
     },
     {