Commit 3eb7235: Merge branch 'main' into af-3393-ai-agents-optimize (2 parents: ada3cb8 + 226566f)
29 files changed: +1263 −372 lines changed

.github/renovate.json5

Lines changed: 6 additions & 0 deletions

```diff
@@ -60,5 +60,11 @@
     groupName: "node",
     automerge: true,
   },
+  {
+    // Automerge minor and patch updates of Maven/Java dependencies
+    matchManagers: ["maven"],
+    matchUpdateTypes: ["patch", "minor"],
+    automerge: true,
+  },
 ],
}
```
Lines changed: 91 additions & 0 deletions
---
id: sizing-benchmarks
title: Run benchmarks
tags:
  - Performance
  - Hardware
  - Sizing
  - Benchmarks
description: "Run your own benchmarks to validate Camunda 8 sizing for your specific workload."
---

Run your own benchmarks to validate [Camunda 8 sizing](./sizing-your-environment.md) for your specific workload.

## Reference benchmark scenario

The sizing recommendations for [SaaS](sizing-saas.md) and [Self-Managed](sizing-self-managed.md) are based on a reference benchmark scenario. Your actual workload may differ significantly, so running your own benchmarks is the most reliable way to validate that your chosen configuration meets your needs.

Camunda uses the following realistic benchmark scenario:

- **Process model:** [bankCustomerComplaintDisputeHandling.bpmn](https://github.com/camunda/camunda/blob/main/load-tests/load-tester/src/main/resources/bpmn/realistic/bankCustomerComplaintDisputeHandling.bpmn) (a credit card fraud dispute handling process from the [Camunda Marketplace blueprint](https://marketplace.camunda.com/en-US/apps/449510/credit-card-fraud-dispute-handling)).
- **Payload:** [realisticPayload.json](https://github.com/camunda/camunda/blob/main/load-tests/load-tester/src/main/resources/bpmn/realistic/realisticPayload.json) (~11 KB).
- This setup produces approximately **101 tasks per second at 1 PI/s** due to internal sub-process instantiation (50 sub-process instances per root instance).

:::note
The official sizing numbers on this page are produced using the [load-tester](https://github.com/camunda/camunda/tree/main/load-tests/load-tester) tool from the Camunda monorepo.
:::
## Run your own benchmarks

Use the [Camunda 8 Benchmark project (c8b)](https://github.com/camunda-community-hub/camunda-8-benchmark), a Spring Boot application, to run load tests against your cluster.

### Key features

- Starts process instances at a configurable rate and **automatically adjusts based on backpressure**.
- Completes tasks that appear in the process instances.
- **Bring your own BPMN process model and payload**, which can be provided as URLs, such as GitHub Gists.
- **Automatic job type discovery** from BPMN files.
- Configurable **task completion delay** to simulate real worker behavior.
- Built-in **Prometheus metrics and Grafana dashboards** for observability.

### Quick start

Run the following command against your cluster:

```bash
mvn spring-boot:run
```

With Docker:

```bash
docker run camundacommunityhub/camunda-8-benchmark:main
```
Customize the benchmark with your own process and payload by setting Spring Boot configuration properties, for example in `application.properties`:

```properties
benchmark.bpmnResource=url:https://your-gist-url/your-process.bpmn
benchmark.payloadPath=url:https://your-gist-url/your-payload.json
benchmark.processInstanceStartRate=25
benchmark.taskCompletionDelay=200
```

:::important
To run meaningful benchmarks, use a **properly sized environment**. SaaS trial clusters and local developer machines have limited resources and will hit bottlenecks too early. Use either a correctly sized Camunda SaaS cluster (with help from your Camunda representative) or a properly provisioned Self-Managed Kubernetes environment.
:::
## When to benchmark

Run your own benchmarks when:

- Your process models or payload sizes **differ significantly** from the reference scenario.
- **Latency or cycle time requirements** are critical to your use case.
- You are running Optimize with **payloads larger than the reference ~11 KB** or retention periods **exceeding 6 months**. Larger payloads and longer retention amplify Elasticsearch disk consumption and Optimize import times.
- You are **upgrading from a pre-8.8 version** and want to validate resource requirements.
- You are using **RDBMS (PostgreSQL) as secondary storage** and want to validate throughput differences.
## What to measure

When running benchmarks, focus on these key metrics:

- **Sustained throughput (tasks/second):** The rate your cluster can handle continuously without increasing backpressure.
- **Backpressure rate:** Should remain below 10% for sustainable operation.
- **Process instance latency (p99):** End-to-end time from instance creation to completion. The target depends on your SLO.
- **Elasticsearch disk growth rate:** Helps you forecast disk capacity needs.
- **Data availability latency:** The time between an event in the engine and its appearance in Operate/Tasklist. To measure this, compare the time at which an instance is started with the time it becomes available in the query APIs of the Orchestration Cluster REST API.
- **CPU usage and throttling:** High CPU usage or frequent throttling indicates a need for more CPU resources or additional brokers.
- **Memory usage:** Sustained high memory usage suggests the need for larger memory limits or additional nodes.

<!-- TODO: Define the exact SLO boundary used for "max throughput" in the official benchmark tables (e.g., "max sustainable throughput where backpressure remains below 10% and p99 process duration stays under 1 second"). If the exact boundary is not standardized, document the methodology. -->
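A benchmark run can be evaluated against these criteria with a few lines of scripting. The sketch below is illustrative only: the per-interval samples are assumed to be exported by you (for example, from the benchmark's Prometheus metrics); the function names are hypothetical, not part of any Camunda tooling. It averages the sampled task-completion rates and checks that backpressure stayed under the 10% threshold:

```python
# Sketch: evaluate a benchmark run against the criteria above.
# Inputs are per-interval samples you export yourself, e.g. from Prometheus:
# tasks completed per second, and the fraction of requests rejected by backpressure.

def evaluate_run(task_rates, backpressure_rates, max_backpressure=0.10):
    """Return (sustained throughput, whether backpressure stayed healthy).

    task_rates: tasks completed per second, one sample per interval.
    backpressure_rates: fraction of rejected requests per interval (0.0-1.0).
    """
    sustained = sum(task_rates) / len(task_rates)
    worst = max(backpressure_rates)
    healthy = worst < max_backpressure  # the < 10% rule of thumb above
    return sustained, healthy

# Example: a run hovering around 100 tasks/s with low backpressure.
rates = [98, 102, 101, 99, 100]
bp = [0.01, 0.03, 0.02, 0.04, 0.02]
throughput, ok = evaluate_run(rates, bp)  # → 100.0 tasks/s, healthy
```

In a real setup you would also track the p99 latency and disk-growth metrics listed above; this sketch only covers the throughput/backpressure pair.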
Lines changed: 71 additions & 0 deletions
---
id: sizing-saas
title: Size your SaaS cluster
tags:
  - Performance
  - Hardware
  - Sizing
  - SaaS
description: "Select the right Camunda 8 SaaS cluster size based on your needs."
---

import Tabs from "@theme/Tabs";
import TabItem from "@theme/TabItem";

Select the right Camunda 8 SaaS cluster size based on your needs. For an overview of the factors that influence sizing, see [Size your environment](./sizing-your-environment.md).

## Determine your cluster size

Camunda 8 defines four [cluster sizes](/components/concepts/clusters.md#cluster-size) (1x, 2x, 3x, and 4x) you can select after choosing your [cluster type](/components/concepts/clusters.md#cluster-type).

To find the right size, follow these steps:

1. Calculate your throughput and storage requirements using the guidance in [Size your environment](./sizing-your-environment.md).
2. Use the [sizing tables](#sizing-tables) to find the cluster size that meets your needs.

:::note
Contact your Customer Success Manager to increase the cluster size beyond 4x. This requires custom sizing and pricing.
:::
### Sizing tables

| Cluster size | 1x | 2x | 3x | 4x |
| :--------------------------------------------------- | ------------------------------: | ------------------------------: | ------------------------------: | ------------------------------: |
| Max Throughput **Tasks/day** **\*** | 9 M | 18 M | 27 M | 36 M |
| Max Throughput **Tasks/second** **\*** | 100 | 200 | 300 | 400 |
| Max Throughput **Process Instances/second** **\*\*** | 5 | 10 | 15 | 20 |
| Max Total Number of PI stored (in ES) **\*\*\*** | 200 k | 400 k | 600 k | 800 k |
| Approximate resources provisioned **\*\*\*\*** | 11 vCPU, 22 GB mem, 192 GB disk | 22 vCPU, 44 GB mem, 384 GB disk | 33 vCPU, 66 GB mem, 576 GB disk | 44 vCPU, 88 GB mem, 768 GB disk |

<!-- TODO: Validate "with Optimize" numbers against 8.9 benchmarks. The numbers above were measured with Camunda 8.8. Also confirm whether the "max throughput" boundary condition is defined as backpressure < 10% and p99 process duration < 1s, or another SLO. -->

:::note
The numbers in the tables were measured using Camunda 8 (version 8.8), [the benchmark project](https://github.com/camunda-community-hub/camunda-8-benchmark) running on its own Kubernetes cluster, and using a [realistic process](https://github.com/camunda/camunda/blob/main/load-tests/load-tester/src/main/resources/bpmn/realistic/bankCustomerComplaintDisputeHandling.bpmn) with this [payload](https://github.com/camunda/camunda/blob/main/load-tests/load-tester/src/main/resources/bpmn/realistic/reducedPayload.json) (~1.4 KB). To calculate day-based metrics, an equal distribution over 24 hours is assumed.
:::

**\*** Tasks (including service, send, and user tasks, among others) completed per day are the primary metric, as this is easy to measure and strongly influences resource consumption. This number assumes a constant load throughout the day. Tasks/day and Tasks/second scale linearly.

**\*\*** Because tasks are the primary resource driver, the number of process instances supported by a cluster is calculated assuming an average of 10 tasks per process. As a customer, you can calculate a more accurate process instance estimate using your anticipated number of tasks per process.

**\*\*\*** Maximum total number of historical process instances within the retention period. For active process instances, this is limited mostly by Zeebe resources; for historical instances, it is limited mostly by Elasticsearch resources. Calculated assuming a typical set of process variables per process instance. Note that it makes a difference whether you add one or two strings (requiring ~1 KB of space) to your process instances or attach a full JSON document containing 1 MB, as this data must be stored in various places, influencing memory and disk requirements. If this number increases, you can still retain the runtime throughput, but Tasklist, Operate, and/or Optimize may lag behind. The provisioned disk size is calculated as the sum of the disk size used by Zeebe and Elasticsearch.

**\*\*\*\*** These are the resource limits configured in the Kubernetes cluster and are subject to change.
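To illustrate how the table and its footnotes combine, the following sketch picks the smallest cluster size whose limits cover an anticipated workload, assuming 10 tasks per process instance (footnote \*\*) and using the Tasks/day and stored-PI limits from the table above. It is an estimation aid only, not an official calculator, and the retention-days input is a hypothetical example:

```python
# Illustrative sizing helper mirroring the table above:
# each size Nx supports N * 9M tasks/day and N * 200k stored PIs.
SIZES = {
    "1x": {"tasks_per_day": 9_000_000, "stored_pis": 200_000},
    "2x": {"tasks_per_day": 18_000_000, "stored_pis": 400_000},
    "3x": {"tasks_per_day": 27_000_000, "stored_pis": 600_000},
    "4x": {"tasks_per_day": 36_000_000, "stored_pis": 800_000},
}

def recommend_size(pis_per_day, tasks_per_pi=10, retention_days=30):
    """Pick the smallest size covering both throughput and ES storage."""
    tasks_per_day = pis_per_day * tasks_per_pi
    stored_pis = pis_per_day * retention_days  # completed PIs kept in ES
    for name, limits in SIZES.items():
        if tasks_per_day <= limits["tasks_per_day"] and stored_pis <= limits["stored_pis"]:
            return name
    return None  # beyond 4x: contact your Customer Success Manager

# Example: 15k PIs/day, 10 tasks each, 30 days retention.
# Throughput fits 1x, but 450k stored PIs requires 3x.
size = recommend_size(15_000)  # → "3x"
```

Note how storage, not throughput, can be the binding constraint once retention is factored in.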
## Data retention

The maximum throughput numbers should be considered peak loads. Also take the data retention configuration into account when defining how much data is kept for completed instances in your cluster. See [Camunda 8 SaaS data retention](/components/saas/data-retention.md) for the default retention times for Zeebe, Tasklist, Operate, and Optimize.

- If process instances are completed and older than the configured retention time for an application, their data is removed.
- If a process instance is older than the configured retention time but still active and incomplete, it continues to function at runtime and is _not_ removed.

Camunda can adjust data retention on request (up to certain limits). Consider retention time adjustments and/or storage capacity increases if you plan to complete more than \[max PI stored in ES\] / \[configured retention time\] process instances per day.

:::note Why is the total number of process instances stored that low?
This is related to the limited resources provided to Elasticsearch, which can cause performance problems when too much data is stored there. By increasing the available memory for Elasticsearch, you can also increase that number. At the same time, even with this rather low number, the throughput of the core workflow engine during peak loads is always guaranteed, as that performance is not affected. You can also increase memory for Elasticsearch later if needed.
:::
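As a concrete reading of the rule of thumb above, this small sketch computes the sustainable completion rate; the 200 k figure is the 1x limit from the sizing table, and the 30-day retention period is a hypothetical example input:

```python
# Sketch: [max PI stored in ES] / [configured retention time],
# i.e. the completion rate at which stored history just fits.

def max_sustainable_pis_per_day(max_stored_pis, retention_days):
    """Completed PIs/day before history exceeds the ES storage limit."""
    return max_stored_pis / retention_days

# Example: 1x cluster (200k stored PIs) with a 30-day retention period.
rate = max_sustainable_pis_per_day(200_000, 30)  # ≈ 6,667 completed PIs/day
```

If your expected completion rate exceeds this, shorten retention, increase storage, or move up a cluster size.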
## Next steps

Validate your chosen configuration by [running your own benchmarks](sizing-benchmarks.md).
