diff --git a/docs/apis-tools/testing/assertions.md b/docs/apis-tools/testing/assertions.md
index 31c380de090..557216c6589 100644
--- a/docs/apis-tools/testing/assertions.md
+++ b/docs/apis-tools/testing/assertions.md
@@ -444,6 +444,42 @@ assertThat(processInstance).hasLocalVariableSatisfies(
});
```
+### hasVariableSatisfiesJudge
+
+Assert that a process variable satisfies a natural language expectation using a configured LLM judge. The assertion
+fails if the LLM score is below the configured threshold (default: 0.5). Requires [judge configuration](configuration.md#judge-configuration).
+
+```java
+assertThat(processInstance)
+ .hasVariableSatisfiesJudge("result", "Contains a valid JSON response with status OK.");
+```
+
+### hasLocalVariableSatisfiesJudge
+
+Assert that a local variable in the scope of a given element satisfies a natural language expectation. Use the BPMN
+element ID or an [element selector](utilities.md#element-selector) to identify the element.
+
+```java
+assertThat(processInstance)
+ .hasLocalVariableSatisfiesJudge(
+ ElementSelectors.byName("Greet Customer"), "output",
+ "Contains a polite greeting addressed to the customer.");
+```
+
+### withJudgeConfig
+
+Override the global judge configuration for a single assertion chain.
+
+```java
+assertThat(processInstance)
+ .withJudgeConfig(config -> config.withThreshold(0.9))
+ .hasVariableSatisfiesJudge("result", "Contains a valid JSON response with status OK.");
+```
+
+:::tip
+For a complete walkthrough of testing agentic processes with judge assertions and conditional behavior, see [Testing agentic processes](testing-agentic-processes.md).
+:::
+
## Process instance message assertions
You can verify the message subscriptions of a process instance using `CamundaAssert.assertThat(processInstance)`.
diff --git a/docs/apis-tools/testing/configuration.md b/docs/apis-tools/testing/configuration.md
index 23d16930b77..cbccf3eab9f 100644
--- a/docs/apis-tools/testing/configuration.md
+++ b/docs/apis-tools/testing/configuration.md
@@ -10,6 +10,15 @@ import TabItem from "@theme/TabItem";
By default, CPT uses a runtime based on [Testcontainers](#testcontainers-runtime). You can customize the runtime to your
needs, or replace it with a [Remote runtime](#remote-runtime), for example, if you can't install a Docker runtime.
+Configuration is provided through `application.yml` (or `application.properties`) when using the Camunda Spring Boot
+Starter, or through a `camunda-container-runtime.properties` file when using the Java client.
+
+:::tip Environment variable resolution (Java client)
+When using the Java client, properties in `camunda-container-runtime.properties` support automatic environment variable resolution. If a property is not explicitly set, it is resolved from an environment variable. The variable name is derived by prepending the `CAMUNDA_PROCESSTEST_` prefix, replacing dots with underscores, removing hyphens, and converting everything to uppercase.
+
+For example, `judge.chatModel.apiKey` resolves to `CAMUNDA_PROCESSTEST_JUDGE_CHATMODEL_APIKEY`.
+:::
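The derivation rule above can be sketched as follows (a simplified illustration, not the actual CPT implementation):

```java
import java.util.Locale;

public class EnvVarNames {

    // Derive the environment variable name for a camunda-container-runtime.properties key:
    // prepend the CAMUNDA_PROCESSTEST_ prefix, replace dots with underscores,
    // remove hyphens, and convert everything to uppercase.
    static String toEnvVar(String propertyKey) {
        return "CAMUNDA_PROCESSTEST_"
            + propertyKey.replace(".", "_").replace("-", "").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Prints: CAMUNDA_PROCESSTEST_JUDGE_CHATMODEL_APIKEY
        System.out.println(toEnvVar("judge.chatModel.apiKey"));
    }
}
```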
+
## Testcontainers runtime
The default runtime of CPT is based on [Testcontainers](https://java.testcontainers.org/). It uses the Camunda Docker
@@ -816,3 +825,343 @@ for the following packages:
- `tc.camunda` - The Camunda Docker container (recommended level `error`)
- `tc.connectors` - The connectors Docker container (recommended level `error`)
- `org.testcontainers` - The Testcontainers framework (recommended level `warn`)
+
+## Judge configuration
+
+[Judge assertions](assertions.md#hasvariablesatisfiesjudge) use a configured LLM to score process variables against
+natural language expectations. This section covers how to set up the LLM provider and tune the judge behavior.
+
+### Prerequisites
+
+CPT provides an optional [LangChain4j](https://docs.langchain4j.dev/) integration module that ships with preconfigured
+support for major LLM providers: OpenAI, Anthropic, Amazon Bedrock, Azure OpenAI, and OpenAI-compatible APIs.
+LangChain4j requires Java 17+. You can provide your own LLM integration through a
+custom `ChatModelAdapter` instead (see [Custom ChatModelAdapter](#custom-chatmodeladapter)).
+
+:::tip
+For a guided walkthrough of setting up and testing agentic processes, see [Testing agentic processes](testing-agentic-processes.md).
+:::
+
+
+
+
+
+Camunda Process Test Spring includes the LangChain4j providers as a transitive dependency. No additional
+dependency is needed.
+
+
+
+
+
+Add the `camunda-process-test-langchain4j` dependency to your project:
+
+```xml
+<dependency>
+  <groupId>io.camunda</groupId>
+  <artifactId>camunda-process-test-langchain4j</artifactId>
+  <scope>test</scope>
+</dependency>
+```
+
+
+
+
+
+If you provide a custom `ChatModelAdapter` (see [Custom ChatModelAdapter](#custom-chatmodeladapter)), this dependency
+is not required.
+
+### Property reference
+
+All judge properties are nested under `camunda.process-test.judge` in Spring configuration. In Java properties files,
+use the `judge.` prefix with camelCase keys (for example, `judge.chat-model.api-key` becomes `judge.chatModel.apiKey`).
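This kebab-case to camelCase key mapping can be sketched as follows (illustrative only, not the actual CPT code):

```java
public class PropertyKeys {

    // Convert a Spring-style kebab-case key into the camelCase form used in
    // camunda-container-runtime.properties: drop each hyphen and uppercase
    // the character that follows it.
    static String toCamelCase(String kebabKey) {
        StringBuilder result = new StringBuilder();
        boolean upperNext = false;
        for (char c : kebabKey.toCharArray()) {
            if (c == '-') {
                upperNext = true;
            } else {
                result.append(upperNext ? Character.toUpperCase(c) : c);
                upperNext = false;
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Prints: judge.chatModel.apiKey
        System.out.println(toCamelCase("judge.chat-model.api-key"));
    }
}
```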
+
+For configuration examples, see [Set up an LLM provider](testing-agentic-processes.md#set-up-an-llm-provider).
+
+Each provider table below indicates which of its properties are required.
+
+#### Judge settings
+
+| Property | Type | Default | Description |
+| --------------------- | -------- | ------- | -------------------------------------------------------- |
+| `judge.threshold` | `double` | `0.5` | Confidence threshold (0.0 to 1.0) for the judge to pass. |
+| `judge.custom-prompt` | `string` | | Custom evaluation prompt replacing the default criteria. |
+
+The default threshold of `0.5` treats a response as acceptable when it is at least partially satisfied according to the
+judge rubric. This is a practical default for AI-generated output, where wording and level of detail may vary between
+runs even when the response is still useful. Increase the threshold when your assertion needs stricter semantic
+agreement.
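For example, to require stricter semantic agreement globally, raise the threshold in your Spring configuration (Java client equivalent: `judge.threshold=0.8`):

```yaml
camunda:
  process-test:
    judge:
      threshold: 0.8
```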
+
+#### Chat model settings
+
+
+
+
+
+| Property | Required | Type | Description |
+| ------------------------------ | -------- | ---------- | --------------------------------------------------------- |
+| `judge.chat-model.provider` | Yes | `string` | Set to `openai`. |
+| `judge.chat-model.model` | Yes | `string` | Model name (for example `gpt-4o`). |
+| `judge.chat-model.api-key` | Yes | `string` | API key. |
+| `judge.chat-model.timeout` | No | `duration` | Request timeout (ISO-8601 duration, for example `PT30S`). |
+| `judge.chat-model.temperature` | No | `double` | Temperature for response randomness (0.0 to 2.0). |
+
+
+
+
+
+| Property | Required | Type | Description |
+| ------------------------------ | -------- | ---------- | --------------------------------------------------------- |
+| `judge.chat-model.provider` | Yes | `string` | Set to `anthropic`. |
+| `judge.chat-model.model` | Yes | `string` | Model name (for example `claude-sonnet-4-20250514`). |
+| `judge.chat-model.api-key` | Yes | `string` | API key. |
+| `judge.chat-model.timeout` | No | `duration` | Request timeout (ISO-8601 duration, for example `PT30S`). |
+| `judge.chat-model.temperature` | No | `double` | Temperature for response randomness (0.0 to 2.0). |
+
+
+
+
+
+Supports Bedrock long-term API keys or AWS IAM credentials. If neither is configured, authentication falls back to the
+[AWS default credentials provider chain](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials-chain.html).
+
+| Property | Required | Type | Description |
+| ----------------------------------------- | ------------------------------ | ---------- | ---------------------------------------------------------------------------------------------- |
+| `judge.chat-model.provider` | Yes | `string` | Set to `amazon-bedrock`. |
+| `judge.chat-model.model` | Yes | `string` | Model name (for example `eu.anthropic.claude-haiku-4-5-20251001-v1:0`). |
+| `judge.chat-model.region` | No | `string` | AWS region (for example `eu-central-1`). |
+| `judge.chat-model.api-key` | No | `string` | Bedrock long-term API key. Optional if using IAM credentials or the default credentials chain. |
+| `judge.chat-model.credentials.access-key` | Conditionally, with secret key | `string` | AWS IAM access key. Optional if using an API key or the default credentials chain. |
+| `judge.chat-model.credentials.secret-key` | Conditionally, with access key | `string` | AWS IAM secret key. Optional if using an API key or the default credentials chain. |
+| `judge.chat-model.timeout` | No | `duration` | Request timeout (ISO-8601 duration, for example `PT30S`). |
+| `judge.chat-model.temperature` | No | `double` | Temperature for response randomness (0.0 to 2.0). |
+
+
+
+
+
+Supports API key authentication. If no API key is configured, authentication falls back to
+[`DefaultAzureCredential`](https://learn.microsoft.com/en-us/java/api/com.azure.identity.defaultazurecredential).
+
+| Property | Required | Type | Description |
+| ------------------------------ | -------- | ---------- | -------------------------------------------------------------------------------------------------------------------------- |
+| `judge.chat-model.provider` | Yes | `string` | Set to `azure-openai`. |
+| `judge.chat-model.model` | Yes | `string` | Azure [deployment name](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource#deploy-a-model). |
+| `judge.chat-model.endpoint` | Yes | `string` | Azure OpenAI resource URL (for example `https://my-resource.openai.azure.com/`). |
+| `judge.chat-model.api-key` | No | `string` | API key. Optional; if omitted, falls back to `DefaultAzureCredential`. |
+| `judge.chat-model.timeout` | No | `duration` | Request timeout (ISO-8601 duration, for example `PT30S`). |
+| `judge.chat-model.temperature` | No | `double` | Temperature for response randomness (0.0 to 2.0). |
+
+
+
+
+
+For local models (such as [Ollama](https://ollama.com/)) or any third-party API that implements the
+[OpenAI chat completions format](https://platform.openai.com/docs/api-reference/chat).
+
+| Property | Required | Type | Description |
+| ------------------------------ | -------- | ---------- | ------------------------------------------------------------------------ |
+| `judge.chat-model.provider` | Yes | `string` | Set to `openai-compatible`. |
+| `judge.chat-model.model` | Yes | `string` | Model name (for example `llama3`). |
+| `judge.chat-model.base-url` | Yes | `string` | Base URL for the API endpoint (for example `http://localhost:11434/v1`). |
+| `judge.chat-model.api-key` | No | `string` | API key. Optional for local providers. |
+| `judge.chat-model.headers.*` | No | `map` | Custom HTTP headers. |
+| `judge.chat-model.timeout` | No | `duration` | Request timeout (ISO-8601 duration, for example `PT30S`). |
+| `judge.chat-model.temperature` | No | `double` | Temperature for response randomness (0.0 to 2.0). |
+
+
+
+
+
+For providers not listed above, use a custom provider name and pass arbitrary properties. See
+[Custom ChatModelAdapter](#custom-chatmodeladapter) for implementation details.
+
+| Property | Required | Type | Description |
+| -------------------------------------- | -------- | ---------- | --------------------------------------------------------------------------------------------- |
+| `judge.chat-model.provider` | Yes | `string` | Custom provider name matching your SPI implementation. |
+| `judge.chat-model.model` | Yes | `string` | Model name. |
+| `judge.chat-model.custom-properties.*` | No | `map` | Arbitrary key-value pairs passed to SPI providers via `ProviderConfig.getCustomProperties()`. |
+| `judge.chat-model.timeout` | No | `duration` | Request timeout (ISO-8601 duration, for example `PT30S`). |
+| `judge.chat-model.temperature` | No | `double` | Temperature for response randomness (0.0 to 2.0). |
+
+
+
+
+
+### Custom prompt
+
+You can replace the default evaluation criteria with a custom prompt. The custom prompt replaces only the evaluation
+criteria (the "You are an impartial judge..." preamble). The system still controls the expectation and value injection,
+the scoring rubric, and the JSON output format.
+
+By default, CPT uses an internal prompt that instructs the model to act as an impartial judge, compare the provided
+value against the natural language expectation, apply the documented scoring rubric, and return the result in the
+expected JSON structure.
+
+
+
+
+
+```yaml
+camunda:
+ process-test:
+ judge:
+ custom-prompt: "You are a domain expert evaluating financial data accuracy."
+```
+
+
+
+
+
+```properties
+judge.customPrompt=You are a domain expert evaluating financial data accuracy.
+```
+
+Or programmatically:
+
+```java
+JudgeConfig.of(prompt -> myChatModelAdapter.generate(prompt))
+ .withCustomPrompt("You are a domain expert evaluating financial data accuracy.");
+```
+
+
+
+
+
+You can also override the custom prompt for a single assertion chain:
+
+```java
+assertThat(processInstance)
+ .withJudgeConfig(config -> config
+ .withCustomPrompt("You are a domain expert evaluating financial data accuracy."))
+ .hasVariableSatisfiesJudge("result", "Contains valid totals.");
+```
+
+### Custom ChatModelAdapter
+
+You can provide your own `ChatModelAdapter` implementation without depending on the `camunda-process-test-langchain4j`
+module. A `ChatModelAdapter` is a functional interface that takes a prompt string and returns a response string.
+
+
+
+
+
+If you have a single `ChatModelAdapter` bean and no `provider` property is set, CPT auto-detects and uses it:
+
+```java
+@TestConfiguration
+class JudgeTestConfig {
+
+ @Bean
+ ChatModelAdapter chatModelAdapter() {
+ return prompt -> myChatModelAdapter.generate(prompt);
+ }
+}
+```
+
+When you have multiple beans, set `provider` to the bean name you want to use. In Spring, the bean name defaults to
+the method name:
+
+```java
+@TestConfiguration
+class JudgeTestConfig {
+
+ @Bean
+ ChatModelAdapter openAiAdapter() { /* ... */ }
+
+ @Bean
+ ChatModelAdapter ollamaAdapter() { /* ... */ }
+}
+```
+
+```yaml
+camunda:
+ process-test:
+ judge:
+ chat-model:
+ provider: "ollamaAdapter" # matches the bean method name
+```
+
+:::note Resolution order
+When using `@CamundaSpringProcessTest`, CPT resolves the judge adapter in the following order:
+
+1. If a single `ChatModelAdapter` bean exists and no `provider` property is configured, that bean is used automatically.
+2. If the `provider` property is configured and a bean with a matching name exists, that bean is selected.
+3. If no matching bean is found, CPT falls back to the built-in LangChain4j implementations, provided that `camunda-process-test-langchain4j` is on the classpath.
+4. If a `provider` is configured but no matching implementation can be resolved at all, CPT throws an exception.
+
+:::
+
+Alternatively, you can configure the judge programmatically. Set the configuration globally
+using `CamundaAssert.setJudgeConfig()`:
+
+```java
+CamundaAssert.setJudgeConfig(
+ JudgeConfig.of(prompt -> myChatModelAdapter.generate(prompt))
+ .withThreshold(0.8));
+```
+
+
+
+
+
+Implement `ChatModelAdapterProvider` and register it through `META-INF/services`:
+
+```java
+public class MyCustomProvider implements ChatModelAdapterProvider {
+
+ @Override
+ public String getProviderName() {
+ return "my-provider";
+ }
+
+ @Override
+ public ChatModelAdapter create(ProviderConfig config) {
+ String endpoint = config.getCustomProperties().get("endpoint");
+ return prompt -> callEndpoint(endpoint, prompt);
+ }
+}
+```
+
+Register the provider in `META-INF/services/io.camunda.process.test.api.judge.ChatModelAdapterProvider`:
+
+```
+com.example.MyCustomProvider
+```
+
+Alternatively, you can configure the judge programmatically. Set the configuration globally
+using `CamundaAssert.setJudgeConfig()`:
+
+```java
+CamundaAssert.setJudgeConfig(
+ JudgeConfig.of(prompt -> myChatModelAdapter.generate(prompt))
+ .withThreshold(0.8));
+```
+
+Or register the JUnit extension manually with a judge configuration:
+
+```java
+@RegisterExtension
+CamundaProcessTestExtension extension = new CamundaProcessTestExtension()
+ .withJudgeConfig(JudgeConfig.of(prompt -> myChatModelAdapter.generate(prompt))
+ .withThreshold(0.8));
+```
+
+
+
+
diff --git a/docs/apis-tools/testing/testing-agentic-processes.md b/docs/apis-tools/testing/testing-agentic-processes.md
new file mode 100644
index 00000000000..8a74e222bbb
--- /dev/null
+++ b/docs/apis-tools/testing/testing-agentic-processes.md
@@ -0,0 +1,483 @@
+---
+id: testing-agentic-processes
+title: Testing agentic processes
+description: "A guide for testing non-deterministic, agentic Camunda processes with CPT."
+---
+
+import Tabs from "@theme/Tabs";
+import TabItem from "@theme/TabItem";
+
+Agentic processes use AI agents that decide at runtime which actions to take. This makes their execution path and output content non-deterministic, which requires a different testing approach than traditional BPMN processes. This guide walks through the CPT features that address these challenges.
+
+## Why agentic processes need a different testing approach
+
+Traditional BPMN processes follow a predictable path: given the same input, they execute the same sequence of tasks and produce the same output. Tests can assert on specific tasks in a known order and compare variable values with exact equality checks.
+
+Agentic processes break both of these assumptions. A process that uses the [AI Agent connector](/components/connectors/out-of-the-box-connectors/available-connectors-overview.md) inside an [ad-hoc sub-process](/components/modeler/bpmn/ad-hoc-subprocesses/ad-hoc-subprocesses.md) lets the AI agent decide at runtime which tools to invoke and in what order. The same prompt may lead to different execution paths across runs. On top of that, the agent produces free-text output whose exact wording varies every time.
+
+This creates two concrete problems for tests:
+
+- **Non-deterministic execution order.** Standard CPT assertions are blocking: they wait for a specific condition before the test continues. A test that blocks on one particular tool task will stall if the agent chooses a different tool first, or skips that tool entirely.
+- **Non-deterministic output content.** Equality-based variable assertions cannot reliably verify free-text responses. The agent may phrase the same correct answer differently on each run, causing exact-match checks to fail even when the response is valid.
+
+In practice, CPT addresses these two problems with two complementary features:
+
+- Use [conditional behavior](utilities.md#conditional-behavior) to react to whichever tool tasks or user tasks the agent activates, without hard-coding a single execution order.
+- Use [judge assertions](assertions.md#hasvariablesatisfiesjudge) to check AI-generated output with a judge LLM. Instead of comparing exact wording, the judge scores whether the response satisfies a natural language expectation, which makes assertions more robust for free-text output.
+
+The rest of this guide shows how to apply these features in CPT tests for agentic processes.
+
+## Prerequisites
+
+This guide requires Camunda 8.9+ with [CPT set up](getting-started.md).
+
+[Judge assertions](#judge-assertions) require an LLM provider. CPT provides an optional
+[LangChain4j](https://docs.langchain4j.dev/) integration module that ships with preconfigured support for
+[several providers](configuration.md#judge-configuration). LangChain4j requires Java 17+.
+
+
+
+
+
+Camunda Process Test Spring includes the LangChain4j providers as a transitive dependency. No additional
+dependency is needed.
+
+
+
+
+
+Add the `camunda-process-test-langchain4j` dependency to your project:
+
+```xml
+<dependency>
+  <groupId>io.camunda</groupId>
+  <artifactId>camunda-process-test-langchain4j</artifactId>
+  <scope>test</scope>
+</dependency>
+```
+
+
+
+
+
+If you provide a custom `ChatModelAdapter` instead (see
+[Custom ChatModelAdapter](configuration.md#custom-chatmodeladapter)), this module is not required.
+
+## Example process
+
+The examples in this guide test the **AI Agent Chat With Tools** process from the [Build your first AI agent](/guides/getting-started-agentic-orchestration.md) guide. See [About the example AI agent process](/guides/getting-started-agentic-orchestration.md#about-the-example-ai-agent-process) for the full process structure.
+
+The test scenario is "Send Ervin a joke." To fulfill this request, the agent could call `ListUsers` and `LoadUserByID` to find Ervin's email address, or call `Jokes_API` to fetch a joke, in any order. The agent then presents the email for human review via the `AskHumanToSendEmail` user task and, after it finishes, a `User_Feedback` task lets the user accept the result or ask for a follow-up. A test cannot predict which tools the agent picks or in what sequence, so the sections below show how to handle this.
+
+## Handle non-deterministic flows
+
+[Conditional behavior](utilities.md#conditional-behavior) lets you register background reactions that monitor the process state and execute actions as conditions are met, without blocking the test thread. Register behaviors before starting the process, and they react independently as the process progresses.
+
+Each behavior watches for a specific element to become active and then completes it with test data. If the agent never activates that element, the behavior simply never triggers and the test does not stall.
+
+#### Complete tool tasks
+
+Register a behavior for each tool task the agent might invoke. The following two behaviors provide mock responses for the user lookup tools:
+
+```java
+// given: complete ListUsers when the agent invokes it
+processTestContext
+ .when(
+ () -> assertThatProcessInstance(ProcessInstanceSelectors.byProcessId("ai-agent-chat"))
+ .hasActiveElements("ListUsers"))
+ .as("complete ListUsers")
+ .then(
+ () -> processTestContext.completeJob(
+ JobSelectors.byElementId("ListUsers"),
+ Map.of("toolCallResult",
+ List.of(
+ Map.of("id", 1, "name", "Leanne Graham"),
+ Map.of("id", 2, "name", "Ervin Howell")))));
+
+// given: complete LoadUserByID with Ervin's details
+processTestContext
+ .when(
+ () -> assertThatProcessInstance(ProcessInstanceSelectors.byProcessId("ai-agent-chat"))
+ .hasActiveElements("LoadUserByID"))
+ .as("complete LoadUserByID")
+ .then(
+ () -> processTestContext.completeJob(
+ JobSelectors.byElementId("LoadUserByID"),
+ Map.of("toolCallResult",
+ Map.of("id", 2,
+ "name", "Ervin Howell",
+ "email", "Shanna@melissa.tv"))));
+```
+
+### Complete user tasks
+
+The `AskHumanToSendEmail` user task requires human approval. Register a behavior that auto-approves the email when the task appears:
+
+```java
+// given: auto-approve the email when the human review task appears
+processTestContext
+ .when(
+ () -> assertThatProcessInstance(ProcessInstanceSelectors.byProcessId("ai-agent-chat"))
+ .hasActiveElements("AskHumanToSendEmail"))
+ .as("approve email")
+ .then(
+ () -> processTestContext.completeUserTask(
+ "AskHumanToSendEmail", Map.of("emailOk", true)));
+```
+
+:::important
+Each behavior's action should resolve the process state that the condition checks for. For example, if the condition checks for an active user task, the action should complete that task. Otherwise the behavior may execute repeatedly.
+:::
+
+### Chained actions for repeated invocations
+
+Use chained `.then()` calls when a behavior should produce different results on repeated invocations. The first action is consumed on the first invocation, and the last action repeats for all subsequent invocations.
+
+In this example, the first feedback rejection sends the agent back with a follow-up request, and the second feedback loop approves the result:
+
+```java
+// given: first reject, then approve in the feedback loop
+processTestContext
+ .when(
+ () -> assertThatProcessInstance(ProcessInstanceSelectors.byProcessId("ai-agent-chat"))
+ .hasActiveElements("User_Feedback"))
+ .as("feedback loop")
+ // 1) first invocation: reject and ask for a better joke
+ .then(
+ () -> processTestContext.completeUserTask(
+ "User_Feedback",
+ Map.of(
+ "userSatisfied", false,
+ "followUpInput", "This joke is bad, send Ervin a better joke")))
+ // 2) subsequent invocations: approve
+ .then(
+ () -> processTestContext.completeUserTask(
+ "User_Feedback", Map.of("userSatisfied", true)));
+```
+
+For the full conditional behavior API, see [Utilities](utilities.md#conditional-behavior).
+
+## Judge assertions
+
+Judge assertions send a process variable and a natural language expectation to a configured LLM, which scores how well they match. The assertion passes if the score meets a configurable threshold. This avoids brittle string-matching on free-text AI output.
+
+The judge evaluates matches using the following scoring scale:
+
+| Score | Meaning |
+| ----- | ---------------------------------------------------------------------------------------------------------------------- |
+| 1.0 | Fully satisfied semantically. Different wording or formatting that conveys the same meaning counts as fully satisfied. |
+| 0.75 | Satisfied in substance with only minor differences that do not affect correctness. |
+| 0.5 | Partially satisfied. Some required elements are present but others are missing or incorrect. |
+| 0.25 | Mostly not satisfied. Only marginal relevance. |
+| 0.0 | Not satisfied at all, or the actual value is empty. |
+
+The LLM may return any value between these anchor points (for example, 0.6 or 0.85). The default threshold is 0.5. This means the assertion passes when the response is at least partially satisfied according to the rubric, which is a practical default for AI-generated output that may vary in wording or completeness across runs. Use a higher threshold when the response must satisfy stricter semantic requirements. You can change the threshold globally in the [judge configuration](configuration.md#judge-configuration) or per assertion using `withJudgeConfig`.
+
+### Set up an LLM provider
+
+Configure the chat model provider using the tabs below.
+
+
+
+
+
+##### Spring Boot Starter
+
+```yaml
+camunda:
+ process-test:
+ judge:
+ chat-model:
+ provider: "openai"
+ model: "gpt-4o"
+ api-key: "your-api-key"
+```
+
+##### Java client
+
+```properties
+judge.chatModel.provider=openai
+judge.chatModel.model=gpt-4o
+judge.chatModel.apiKey=your-api-key
+```
+
+
+
+
+
+##### Spring Boot Starter
+
+```yaml
+camunda:
+ process-test:
+ judge:
+ chat-model:
+ provider: "anthropic"
+ model: "claude-sonnet-4-20250514"
+ api-key: "your-api-key"
+```
+
+##### Java client
+
+```properties
+judge.chatModel.provider=anthropic
+judge.chatModel.model=claude-sonnet-4-20250514
+judge.chatModel.apiKey=your-api-key
+```
+
+
+
+
+
+If no authentication properties are set, the provider defaults to the
+[AWS default credentials provider chain](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/credentials-chain.html).
+
+##### Spring Boot Starter
+
+```yaml
+camunda:
+ process-test:
+ judge:
+ chat-model:
+ provider: "amazon-bedrock"
+ model: "eu.anthropic.claude-haiku-4-5-20251001-v1:0"
+ region: "eu-central-1"
+ # Optional: authenticate with an API key
+ api-key: "your-api-key"
+ # Or use AWS credentials instead of api-key:
+ # credentials:
+ # access-key: "your-access-key"
+ # secret-key: "your-secret-key"
+```
+
+##### Java client
+
+```properties
+judge.chatModel.provider=amazon-bedrock
+judge.chatModel.model=eu.anthropic.claude-haiku-4-5-20251001-v1:0
+judge.chatModel.region=eu-central-1
+# Optional: authenticate with an API key
+judge.chatModel.apiKey=your-api-key
+# Or use AWS credentials instead of apiKey:
+# judge.chatModel.credentials.accessKey=your-access-key
+# judge.chatModel.credentials.secretKey=your-secret-key
+```
+
+
+
+
+
+Use this provider for local models (such as [Ollama](https://ollama.com/)) or any API that follows the OpenAI format.
+
+:::tip
+If both `api-key` and a custom `Authorization` header are set, the custom header takes precedence.
+:::
+
+##### Spring Boot Starter
+
+```yaml
+camunda:
+ process-test:
+ judge:
+ chat-model:
+ provider: "openai-compatible"
+ model: "llama3"
+ base-url: "http://localhost:11434/v1"
+ # api-key is optional for local providers
+ # Optional: custom HTTP headers
+ headers:
+ X-Custom-Header: "custom-value"
+```
+
+##### Java client
+
+```properties
+judge.chatModel.provider=openai-compatible
+judge.chatModel.model=llama3
+judge.chatModel.baseUrl=http://localhost:11434/v1
+# judge.chatModel.apiKey is optional for local providers
+# Optional: custom HTTP headers
+judge.chatModel.headers.X-Custom-Header=custom-value
+```
+
+
+
+
+
+The `model` property corresponds to your Azure
+[deployment name](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/create-resource#deploy-a-model).
+If no API key is provided, the provider falls back to
+[`DefaultAzureCredential`](https://learn.microsoft.com/en-us/java/api/com.azure.identity.defaultazurecredential).
+
+##### Spring Boot Starter
+
+```yaml
+camunda:
+ process-test:
+ judge:
+ chat-model:
+ provider: "azure-openai"
+ model: "my-gpt-4o-deployment"
+ endpoint: "https://my-resource.openai.azure.com/"
+ # api-key is optional; if omitted, DefaultAzureCredential is used
+ api-key: "your-api-key"
+```
+
+##### Java client
+
+```properties
+judge.chatModel.provider=azure-openai
+judge.chatModel.model=my-gpt-4o-deployment
+judge.chatModel.endpoint=https://my-resource.openai.azure.com/
+# api-key is optional; if omitted, DefaultAzureCredential is used
+judge.chatModel.apiKey=your-api-key
+```
+
+
+
+
+
+For providers not listed above, use a custom provider name and pass arbitrary properties. These properties are available
+to SPI implementations through `ProviderConfig.getCustomProperties()`. See
+[Custom ChatModelAdapter](configuration.md#custom-chatmodeladapter) for implementation details.
+
+##### Spring Boot Starter
+
+```yaml
+camunda:
+ process-test:
+ judge:
+ chat-model:
+ provider: "my-generic"
+ model: "custom-model"
+ custom-properties:
+ endpoint: "http://localhost:8080"
+ api-version: "2024-01"
+```
+
+##### Java client
+
+```properties
+judge.chatModel.provider=my-generic
+judge.chatModel.model=custom-model
+judge.chatModel.customProperties.endpoint=http://localhost:8080
+judge.chatModel.customProperties.api-version=2024-01
+```
+
+
+
+
+
+For the full property reference, see [judge configuration](configuration.md#judge-configuration).
+
+### Basic usage
+
+After the process completes, use a judge assertion to verify that the agent's output satisfies a natural language expectation. The following example checks the full "Send Ervin a joke" scenario, including tool usage, email content, and the feedback loop:
+
+```java
+@Test
+void shouldSendErvinAJoke() {
+ // given: register conditional behaviors for tool tasks, email approval, and feedback
+ // ... (see Handle non-deterministic flows above)
+
+ // when: start the process
+ ProcessInstanceEvent processInstance = client.newCreateInstanceCommand()
+ .bpmnProcessId("ai-agent-chat")
+ .latestVersion()
+ .variables(Map.of("prompt", "Send Ervin a joke"))
+ .send()
+ .join();
+
+ // then: the agent completed the full scenario correctly
+ assertThat(processInstance).isCompleted();
+ assertThat(processInstance)
+ .hasVariableSatisfiesJudge(
+ "agent",
+ """
+ The agent correctly identified Ervin by calling the following tools:
+ 1. ListUsers
+ 2. LoadUserByID with id=2.
+ Furthermore, the agent called AskHumanToSendEmail and the email
+ should have been sent successfully!
+ The mail must contain a joke.
+ After the user rejected the first joke and asked for another one, the
+ agent offered a second, different joke.
+ """);
+}
+```
+
+The expectation is a plain-text description of what the agent should have done. The judge does not compare strings literally. It evaluates whether the actual variable content satisfies the expectation semantically, so different phrasing or formatting in the agent's output does not cause false failures.
+
+If the assertion fails, for example because the agent never called `LoadUserByID` or sent the email to the wrong address, the judge returns a low score with an explanation of which parts of the expectation were not met. This gives you a clear, human-readable failure message instead of a generic assertion error.
+
+### Custom threshold
+
+Use `withJudgeConfig` to set a stricter threshold for individual assertions:
+
+```java
+assertThat(processInstance)
+ .withJudgeConfig(config -> config.withThreshold(0.8))
+ .hasVariableSatisfiesJudge(
+ "agent",
+ "The email body contains a joke addressed to Ervin.");
+```
+
+### Custom prompt
+
+You can replace the default evaluation criteria with a custom prompt. Only the evaluation criteria are replaced: the
+system still controls the expectation and value injection, the scoring rubric, and the JSON output format.
+
+Set a custom prompt globally in configuration:
+
+```yaml
+camunda:
+ process-test:
+ judge:
+ custom-prompt: "You are a domain expert evaluating financial data accuracy."
+```
+
+Or in a properties file:
+
+```properties
+judge.customPrompt=You are a domain expert evaluating financial data accuracy.
+```
+
+Or override the prompt for a single assertion:
+
+```java
+assertThat(processInstance)
+ .withJudgeConfig(config -> config
+ .withCustomPrompt("You are a domain expert evaluating financial data accuracy."))
+ .hasVariableSatisfiesJudge("result", "Contains valid totals.");
+```
+
+For the full assertion API, see [Assertions](assertions.md#hasvariablesatisfiesjudge).
+
+## Next steps
+
+- [Assertions](assertions.md) documents the full assertion API reference, including all judge assertion methods.
+- [Configuration](configuration.md#judge-configuration) provides the complete property reference for judge settings and chat model providers.
+- [Utilities](utilities.md#conditional-behavior) describes the full conditional behavior API, including chained actions and lifecycle details.
diff --git a/docs/apis-tools/testing/utilities.md b/docs/apis-tools/testing/utilities.md
index 02d13d1cd95..9ed1aa40717 100644
--- a/docs/apis-tools/testing/utilities.md
+++ b/docs/apis-tools/testing/utilities.md
@@ -283,6 +283,89 @@ void shouldMockDmnDecision() {
}
```
+## Conditional behavior
+
+The `when(condition).then(action)` API on `CamundaProcessTestContext` registers background behaviors that react to process state changes without blocking the test thread. This is useful for non-deterministic flows where the execution order is unknown. You can register multiple behaviors before starting the process, and they will react independently as the process progresses. Behaviors are cleared automatically after each test.
+
+:::tip
+For a guided walkthrough of using conditional behavior to test agentic processes, see [Testing agentic processes](testing-agentic-processes.md).
+:::
+
+```java
+@Test
+void shouldCompleteTaskAutomatically() {
+ // given: define a conditional behavior
+ processTestContext
+ .when(() -> assertThat(processInstance).hasActiveElements("approve_order"))
+ .then(() -> processTestContext.completeUserTask("approve_order",
+ Map.of("approved", true)));
+
+ // when: create a process instance
+ // then: verify that the process instance is completed
+}
+```
+
+:::important
+The action should resolve the process state that the condition checks for. After an action executes, the engine waits for the condition to become false again before re-evaluating. For example, if the condition asserts that a user task is active, the action should complete that user task. This advances the process flow so that the condition no longer holds. Otherwise, the same condition may not be detected again reliably.
+:::
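
The re-evaluation rule can be sketched in a few lines of plain Java. This is only an illustration of the documented behavior, not Camunda's actual implementation; the class name and polling shape are assumptions made for the sketch:

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch of the re-evaluation rule (not Camunda's implementation):
// after the action fires, the condition must become false once before another
// match is recognized.
class EdgeTriggeredBehavior {

  private final BooleanSupplier condition;
  private final Runnable action;
  private boolean armed = true;

  EdgeTriggeredBehavior(BooleanSupplier condition, Runnable action) {
    this.condition = condition;
    this.action = action;
  }

  // Called periodically by a background poller.
  void poll() {
    boolean holds = condition.getAsBoolean();
    if (armed && holds) {
      action.run();
      armed = false; // suppress further matches until the condition clears
    } else if (!holds) {
      armed = true; // condition became false again: re-arm
    }
  }
}
```

If the action completes the user task the condition checks for, the condition turns false, the behavior re-arms, and a later activation of the same element is detected again. If the action leaves the condition true, the behavior stays disarmed and never fires a second time.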
+
+If the same conditional behavior applies to multiple tests, you can define it in a `@BeforeEach` method:
+
+```java
+@BeforeEach
+void setupBehaviors() {
+ processTestContext
+ .when(() -> assertThat(processInstance).hasActiveElements("send_notification"))
+ .then(() -> processTestContext.completeJob("send-notification"));
+}
+
+@Test
+void shouldCompleteOrder() {
+ // given: the conditional behavior is defined in @BeforeEach
+
+ // when: create a process instance
+ // then: verify that the process instance completed the task
+}
+```
+
+### Chain multiple actions
+
+Actions are consumed in order on each condition match. The last action repeats indefinitely once all preceding actions are exhausted.
+
+```java
+@Test
+void shouldHandleRepeatedTask() {
+ // given: define a conditional behavior with chained actions
+ processTestContext
+ .when(() -> assertThat(processInstance).hasActiveElements("review_document"))
+ .then(() -> processTestContext.completeUserTask("review_document",
+ Map.of("approved", false, "comment", "Needs revision")))
+ .then(() -> processTestContext.completeUserTask("review_document",
+ Map.of("approved", true, "comment", "Looks good")));
+
+ // when: create a process instance
+ // then: verify that the process instance is completed
+}
+```
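
The consumption rule above can be sketched in plain Java. Again, this is an illustration of the documented semantics, not the actual implementation; the class name is hypothetical:

```java
import java.util.List;

// Illustrative sketch of the chained-action semantics (not Camunda's
// implementation): actions run in registration order, and the last action
// repeats once the earlier ones are exhausted.
class ActionChain {

  private final List<Runnable> actions;
  private int next = 0;

  ActionChain(List<Runnable> actions) {
    this.actions = actions;
  }

  // Invoked once per condition match.
  void onConditionMatch() {
    actions.get(next).run();
    if (next < actions.size() - 1) {
      next++; // advance until the last action, then stay there
    }
  }
}
```

In the review example, the first match rejects the document and every subsequent match approves it.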
+
+### Name a behavior
+
+You can assign a descriptive name to a conditional behavior using `.as()`. The name is used in log messages and diagnostics.
+
+```java
+@Test
+void shouldNameBehavior() {
+ // given: define a named conditional behavior
+ processTestContext
+ .when(() -> assertThat(processInstance).hasActiveElements("send_notification"))
+ .as("send-notification is active")
+ .then(() -> processTestContext.completeJob("send-notification"));
+
+ // when: create a process instance
+ // then: verify that the process instance completed the task
+}
+```
+
## Complete jobs
You can complete an active job to simulate the behavior of a job worker without invoking the actual worker.
diff --git a/docs/components/agentic-orchestration/ai-agents.md b/docs/components/agentic-orchestration/ai-agents.md
index 383bcf3fa8f..1f600510f0e 100644
--- a/docs/components/agentic-orchestration/ai-agents.md
+++ b/docs/components/agentic-orchestration/ai-agents.md
@@ -92,6 +92,7 @@ Learn more about this model in the [example AI Agent Sub-process connector integ
Learn more about building and integrating AI agents in Camunda 8:
- [Building Your First AI Agent in Camunda](https://camunda.com/blog/2025/05/step-by-step-guide-ai-task-agents-camunda/)
+- [Testing agentic processes](/apis-tools/testing/testing-agentic-processes.md)
- [Intelligent by Design: A Step-by-Step Guide to AI Task Agents in Camunda](https://camunda.com/blog/2025/05/step-by-step-guide-ai-task-agents-camunda/)
- [Artificial Intelligence (AI) Agents: What You Need to Know](https://camunda.com/blog/2024/08/ai-agents-what-you-need-to-know/)
- [Camunda AI agents](https://camunda.com/blog/tag/ai-agent/)
diff --git a/docs/guides/getting-started-agentic-orchestration.md b/docs/guides/getting-started-agentic-orchestration.md
index 1861237ae4f..07f9efb7f65 100644
--- a/docs/guides/getting-started-agentic-orchestration.md
+++ b/docs/guides/getting-started-agentic-orchestration.md
@@ -314,6 +314,7 @@ For example:
- Update the system prompt to adjust the AI agent's behavior.
- Experiment with different model providers and configurations in the AI Agent connector.
- [Monitor your AI agents](/components/agentic-orchestration/monitor-ai-agents.md).
+- [Test your agentic processes](/apis-tools/testing/testing-agentic-processes.md) with Camunda Process Test, including handling non-deterministic flows and verifying agent output with AI-powered assertions.
- Learn more about [Camunda agentic orchestration](/components/agentic-orchestration/agentic-orchestration-overview.md) and the [AI Agent connector](/components/connectors/out-of-the-box-connectors/agentic-ai-aiagent.md).
:::info Camunda Academy
diff --git a/sidebars.js b/sidebars.js
index 0a8da4886dd..912055ef3b1 100644
--- a/sidebars.js
+++ b/sidebars.js
@@ -1401,6 +1401,7 @@ module.exports = {
{
"Camunda Process Test": [
"apis-tools/testing/getting-started",
+ "apis-tools/testing/testing-agentic-processes",
"apis-tools/testing/configuration",
"apis-tools/testing/assertions",
"apis-tools/testing/utilities",