Commit 672b3d8

DEV-298: Add group.experiment() API reference page (#1530)
* DEV-298: Add group.experiment() API reference page

  Pure reference page for the experimentation API, split from Andy's PR #1444 using the Diataxis framework. Covers signature, properties, selection strategies, and observability metadata.

* DEV-299: Add how-to guide for running experiments in AI pipelines (#1532)

  Task-oriented how-to guide for group.experiment(), split from Andy's PR #1444 using the Diataxis framework. All examples framed around AI orchestration use cases (model comparison, prompt strategies, RAG vs single-shot) using step.run only.
1 parent b88e141 commit 672b3d8

2 files changed

Lines changed: 322 additions & 0 deletions

Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
import { Callout, CodeGroup } from "src/shared/Docs/mdx";

export const description = `Run experiments inside durable functions to compare AI models, prompt strategies, and pipeline approaches`

# Run experiments in AI pipelines

When you're building AI features with durable functions, you often need to compare approaches: which model produces better results, whether a refined prompt outperforms a simpler one, or how a RAG pipeline stacks up against a single-shot call. `group.experiment()` lets you run these comparisons inside your function, with durable memoization so the same variant is always selected on retries and replays.

```ts
import { experiment } from "inngest";
```

## Compare AI models

The simplest case: you want to test two models against each other. Define each model call as a variant and use `weighted` selection to control the traffic split.

```ts
import { experiment } from "inngest";

export default inngest.createFunction(
  {
    id: "summarize-document",
    triggers: { event: "document/uploaded" },
  },
  async ({ event, step, group }) => {
    const doc = await step.run("fetch-document", () =>
      fetchDocument(event.data.documentId)
    );

    const summary = await group.experiment("model-comparison", {
      variants: {
        gpt4o: () =>
          step.run("summarize-gpt4o", () =>
            callOpenAI({ model: "gpt-4o", prompt: `Summarize: ${doc.text}` })
          ),
        claude: () =>
          step.run("summarize-claude", () =>
            callAnthropic({ model: "claude-sonnet-4-20250514", prompt: `Summarize: ${doc.text}` })
          ),
      },
      select: experiment.weighted({ gpt4o: 50, claude: 50 }),
    });

    return summary;
  }
);
```

The variant selection is wrapped in a memoized step. If the function retries or replays, the same model is used every time.
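
To make the memoization guarantee concrete, here is a minimal in-memory sketch of the idea. It is not how the SDK implements it: `MemoStore` and `memoizedSelect` are hypothetical names, and a real durable step persists its result outside the process rather than in a `Map`.

```ts
// Illustrative only: an in-memory stand-in for durable step memoization.
type MemoStore = Map<string, string>;

// Runs `select` at most once per step ID; later calls replay the stored choice.
function memoizedSelect(
  store: MemoStore,
  stepId: string,
  select: () => string
): string {
  const cached = store.get(stepId);
  if (cached !== undefined) return cached;
  const chosen = select();
  store.set(stepId, chosen);
  return chosen;
}

// Simulate a first attempt and a retry of the same run.
const store: MemoStore = new Map();
let calls = 0;
const pick = (): string => {
  calls += 1;
  return calls === 1 ? "gpt4o" : "claude"; // would change if re-evaluated
};

const firstAttempt = memoizedSelect(store, "model-comparison", pick);
const retryAttempt = memoizedSelect(store, "model-comparison", pick);
console.log(firstAttempt, retryAttempt, calls); // gpt4o gpt4o 1
```

The selector runs once; the retry reads the stored choice instead of selecting again, which is the behavior the paragraph above describes.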

## Bucket users to a consistent model

When users interact with your AI features repeatedly, you usually want them to get consistent behavior. Use `experiment.bucket()` with the user ID so the same user always hits the same variant.

```ts
const response = await group.experiment("assistant-model", {
  variants: {
    current: () =>
      step.run("current-model", () =>
        generateResponse({ model: "gpt-4o", messages: conversation })
      ),
    candidate: () =>
      step.run("candidate-model", () =>
        generateResponse({ model: "gpt-4o-mini", messages: conversation })
      ),
  },
  select: experiment.bucket(event.data.userId, {
    weights: { current: 90, candidate: 10 },
  }),
});
```

The same user ID always maps to the same variant, even across different function runs. This prevents users from experiencing inconsistent quality between requests.

## Test prompt strategies with multi-step variants

Variant callbacks can contain multiple sequential steps. Each step is individually retried and memoized. This is useful when one approach involves more work than another, like comparing a single-shot prompt against a retrieval-augmented pipeline.

```ts
const answer = await group.experiment("prompt-strategy", {
  variants: {
    single_shot: () =>
      step.run("single-shot", () =>
        callLLM({ prompt: `Answer this question: ${question}` })
      ),
    rag_pipeline: async () => {
      const chunks = await step.run("retrieve-context", () =>
        searchVectorStore(question, { topK: 5 })
      );
      const context = chunks.map((c) => c.text).join("\n\n");
      return await step.run("generate-with-context", () =>
        callLLM({
          prompt: `Using this context:\n${context}\n\nAnswer: ${question}`,
        })
      );
    },
  },
  select: experiment.weighted({ single_shot: 70, rag_pipeline: 30 }),
});
```

## Get the selected variant name

Set `withVariant: true` to receive both the result and which variant was selected. This is useful for logging, analytics, or downstream decisions.

```ts
const outcome = await group.experiment("tone-test", {
  variants: {
    concise: () =>
      step.run("concise-prompt", () =>
        callLLM({ prompt: "Be brief. " + userQuery })
      ),
    detailed: () =>
      step.run("detailed-prompt", () =>
        callLLM({ prompt: "Be thorough and explain your reasoning. " + userQuery })
      ),
  },
  select: experiment.weighted({ concise: 50, detailed: 50 }),
  withVariant: true,
});

await step.run("log-experiment", () =>
  trackExperiment({
    experiment: "tone-test",
    variant: outcome.variant,
    responseLength: outcome.result.length,
  })
);
```
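
Once each run reports its `variant` and a measurement, comparing variants becomes an aggregation problem. The sketch below assumes a hypothetical `ExperimentRecord` shape, similar to what an application-level `trackExperiment()` helper might store, and computes the mean response length per variant. None of these names come from the SDK.

```ts
// Hypothetical record shape; not part of the Inngest SDK.
interface ExperimentRecord {
  experiment: string;
  variant: string;
  responseLength: number;
}

// Returns the mean response length for each variant name.
function meanLengthByVariant(
  records: ExperimentRecord[]
): Record<string, number> {
  const totals: Record<string, { sum: number; count: number }> = {};
  for (const r of records) {
    const t = totals[r.variant] ?? { sum: 0, count: 0 };
    t.sum += r.responseLength;
    t.count += 1;
    totals[r.variant] = t;
  }
  const means: Record<string, number> = {};
  for (const variant of Object.keys(totals)) {
    means[variant] = totals[variant].sum / totals[variant].count;
  }
  return means;
}

const records: ExperimentRecord[] = [
  { experiment: "tone-test", variant: "concise", responseLength: 100 },
  { experiment: "tone-test", variant: "concise", responseLength: 140 },
  { experiment: "tone-test", variant: "detailed", responseLength: 600 },
];
const means = meanLengthByVariant(records);
console.log(means); // { concise: 120, detailed: 600 }
```

Response length is only a stand-in metric here; in practice you would likely track whatever quality or cost signal the experiment is meant to move.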

## Run multiple experiments in one pipeline

You can run independent experiments in a single function. Use `experiment.bucket()` with a composite key so each experiment assigns variants independently.

```ts
export default inngest.createFunction(
  {
    id: "ai-document-pipeline",
    triggers: { event: "document/process" },
  },
  async ({ event, step, group }) => {
    const userId = event.data.userId;

    const extraction = await group.experiment("extraction-model", {
      variants: {
        structured: () =>
          step.run("structured-extract", () =>
            extractWithSchema(event.data.documentUrl)
          ),
        freeform: () =>
          step.run("freeform-extract", () =>
            extractFreeform(event.data.documentUrl)
          ),
      },
      select: experiment.bucket(`${userId}:extraction`),
      withVariant: true,
    });

    const summary = await group.experiment("summary-approach", {
      variants: {
        map_reduce: async () => {
          const chunks = await step.run("chunk-document", () =>
            chunkText(extraction.result)
          );
          const partials = await step.run("summarize-chunks", () =>
            Promise.all(chunks.map((c) => summarize(c)))
          );
          return await step.run("combine-summaries", () =>
            combineSummaries(partials)
          );
        },
        single_pass: () =>
          step.run("single-pass-summary", () =>
            summarize(extraction.result)
          ),
      },
      select: experiment.bucket(`${userId}:summary`),
      withVariant: true,
    });

    return { extraction, summary };
  }
);
```

By appending a feature-specific suffix to the bucket key (`userId:extraction` vs `userId:summary`), the same user can be independently assigned to different variants in each experiment.
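
One thing to watch with composite keys built from template literals: if the user ID is missing, JavaScript stringifies it, so the key becomes `"undefined:summary"` and every anonymous user shares one bucket. The sketch below shows a hypothetical `bucketKey` helper (not part of the SDK) that returns `undefined` instead, so a missing ID stays visible as a missing value.

```ts
// Hypothetical helper; not part of the Inngest SDK.
// Builds a per-experiment bucket key, or returns undefined when there is no
// user ID, so the missing value is not silently turned into "undefined:...".
function bucketKey(
  userId: string | null | undefined,
  experimentId: string
): string | undefined {
  if (userId === null || userId === undefined || userId === "") {
    return undefined;
  }
  return `${userId}:${experimentId}`;
}

console.log(bucketKey("user-42", "extraction")); // user-42:extraction
console.log(bucketKey("user-42", "summary")); // user-42:summary
console.log(bucketKey(undefined, "summary")); // undefined

// Without the guard, a template literal stringifies the missing ID:
const missing: string | undefined = undefined;
console.log(`${missing}:summary`); // undefined:summary
```

Passing the helper's result to `experiment.bucket()` relies on the documented handling of `null` and `undefined` values described in the reference page.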

<Callout>
Every variant callback must invoke at least one `step.*` tool (for example, `step.run()`). The SDK throws a `NonRetriableError` if a variant completes without calling any step tools.
</Callout>

For the full API surface, parameters, and selection strategy details, see the [`group.experiment()` reference](/docs/reference/typescript/v4/functions/group-experiment).
Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
import { Callout, Properties, Property, Row, Col, CodeGroup } from "src/shared/Docs/mdx";

export const description = `API reference for group.experiment() — run experiments within durable functions`

# `group.experiment()`

Selects and executes a single variant from a set of options. The selection is memoized as a durable step, so the same variant runs on retries and replays.

---

## `group.experiment(id, options): Promise`

<Row>
<Col>
<Properties>
<Property name="id" type="string" required>
A unique identifier for the experiment. Used in logs and to memoize the variant selection across retries and replays.
</Property>
<Property name="options" type="object" required>
Configuration for the experiment:
<Properties nested={true}>
<Property name="variants" type="Record<string, () => unknown>" required>
A map of variant names to callbacks. Each callback should contain one or more `step.*` calls. Only the selected variant's callback is executed.
</Property>
<Property name="select" type="ExperimentSelectFn" required>
A selection strategy that determines which variant to run. Use one of the built-in strategies: `experiment.fixed()`, `experiment.weighted()`, `experiment.bucket()`, or `experiment.custom()`.
</Property>
<Property name="withVariant" type="boolean">
When `true`, the return value includes the selected variant name alongside the result. Defaults to `false`.
</Property>
</Properties>
</Property>
</Properties>
</Col>
<Col>
<CodeGroup title="Basic usage">
```ts
const result = await group.experiment("my-experiment", {
  variants: {
    a: () => step.run("variant-a", () => doA()),
    b: () => step.run("variant-b", () => doB()),
  },
  select: experiment.weighted({ a: 50, b: 50 }),
});
```
</CodeGroup>
<CodeGroup title="With variant name returned">
```ts
const { result, variant } = await group.experiment("my-experiment", {
  variants: {
    a: () => step.run("variant-a", () => doA()),
    b: () => step.run("variant-b", () => doB()),
  },
  select: experiment.fixed("a"),
  withVariant: true,
});
// variant === "a"
```
</CodeGroup>
</Col>
</Row>

<Callout>
Every variant callback **must** invoke at least one `step.*` tool (e.g., `step.run()`). Code that runs outside of a step is not memoized and will re-execute on every replay. The SDK throws a `NonRetriableError` if a variant completes without calling any step tools.
</Callout>
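
The rule in the callout can be pictured as a counter around the step tools. The following is a simplified sketch under stated assumptions, not the SDK's implementation: `runVariant` and `CountingStep` are hypothetical, the `NonRetriableError` class is a local stand-in so the example runs on its own, and the step object is passed to the variant as an argument here, whereas real variant callbacks close over `step` from the function handler.

```ts
// Local stand-in for the SDK's NonRetriableError so the sketch is self-contained.
class NonRetriableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "NonRetriableError";
  }
}

// Hypothetical minimal step interface that only supports run().
interface CountingStep {
  run(id: string, fn: () => unknown): Promise<unknown>;
}

// Runs a variant with a counting step.run and rejects if no step was called.
async function runVariant(
  variant: (step: CountingStep) => unknown
): Promise<unknown> {
  let stepCalls = 0;
  const step: CountingStep = {
    run: async (_id, fn) => {
      stepCalls += 1;
      return await fn();
    },
  };
  const result = await variant(step);
  if (stepCalls === 0) {
    throw new NonRetriableError(
      "Variant completed without calling any step tools"
    );
  }
  return result;
}

// A variant that wraps its work in a step passes the check.
runVariant((step) => step.run("variant-a", () => "ok")).then((value) =>
  console.log(value) // ok
);

// A variant that does its work inline is rejected.
runVariant(() => "inline work").catch((err: Error) =>
  console.log(err.name) // NonRetriableError
);
```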

## Selection strategies

Import the `experiment` object from the `inngest` package:

```ts
import { experiment } from "inngest";
```

### `experiment.fixed(variantName)`

Always selects the specified variant. Useful for manual overrides or testing a specific code path.

```ts
select: experiment.fixed("control")
```

### `experiment.weighted(weights)`

Weighted random selection, seeded with the current Inngest run ID. Deterministic: the same run always gets the same variant, even across retries.

```ts
select: experiment.weighted({ control: 80, treatment: 20 })
```

Weights are relative, not percentages. `{ a: 1, b: 3 }` gives `a` a 25% chance and `b` a 75% chance.
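
The arithmetic behind relative weights can be sketched as follows. `toProbabilities` and `pickByPoint` are hypothetical helpers, and `point` stands in for the seeded random value; how the SDK derives that value from the run ID is not shown here.

```ts
// Converts relative weights into probabilities that sum to 1.
function toProbabilities(
  weights: Record<string, number>
): Record<string, number> {
  const names = Object.keys(weights);
  let total = 0;
  for (const name of names) total += weights[name];
  if (total <= 0) throw new Error("Weights must sum to a positive number");
  const probabilities: Record<string, number> = {};
  for (const name of names) probabilities[name] = weights[name] / total;
  return probabilities;
}

// Picks a variant for a point in [0, 1) by walking the cumulative probabilities.
function pickByPoint(weights: Record<string, number>, point: number): string {
  const probabilities = toProbabilities(weights);
  const names = Object.keys(probabilities);
  let cumulative = 0;
  for (const name of names) {
    cumulative += probabilities[name];
    if (point < cumulative) return name;
  }
  return names[names.length - 1]; // guards against floating-point rounding
}

console.log(toProbabilities({ a: 1, b: 3 })); // { a: 0.25, b: 0.75 }
console.log(pickByPoint({ a: 1, b: 3 }, 0.1)); // a
console.log(pickByPoint({ a: 1, b: 3 }, 0.5)); // b
```

Because the point is fixed for a given seed, the same run lands on the same variant each time, which matches the determinism described above.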

### `experiment.bucket(value, options?)`

Consistent hashing. The same input value always maps to the same variant. Useful for user-level bucketing where a user should see the same variant across multiple runs.

```ts
select: experiment.bucket(event.data.userId, {
  weights: { control: 70, treatment: 30 },
})
```

When `weights` are omitted, equal weights are derived from the variant names:

```ts
select: experiment.bucket(event.data.userId)
```

If `value` is `null` or `undefined`, the SDK hashes an empty string and attaches a warning to the step metadata.
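
A rough sketch of hash-based bucketing is shown below. It is an illustration under assumptions, not the SDK's algorithm: the FNV-1a style hash is a stand-in (the hash function the SDK uses is not specified here), and `bucketSketch`, `BucketResult`, and the warning text are hypothetical. It mirrors the documented behavior: equal weights when none are given, and an empty-string key plus a warning when the value is `null` or `undefined`.

```ts
// FNV-1a style 32-bit hash over UTF-16 code units (illustrative choice only).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 via shifts, then wrap to 32 bits.
    hash =
      (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash >>> 0;
}

interface BucketResult {
  variant: string;
  warning: string | undefined;
}

// Maps a value to a variant by hashing it to a point in [0, 1).
function bucketSketch(
  value: string | null | undefined,
  variantNames: string[],
  weights?: Record<string, number>
): BucketResult {
  if (variantNames.length === 0) throw new Error("At least one variant is required");
  const isMissing = value === null || value === undefined;
  const key = value === null || value === undefined ? "" : value;
  const warning = isMissing ? "bucket value was null or undefined" : undefined;
  const point = fnv1a32(key) / 4294967296; // 2^32, so point is in [0, 1)

  // Equal weights when none are given, mirroring the documented default.
  const effective: number[] = [];
  let total = 0;
  for (const name of variantNames) {
    const w = weights === undefined ? 1 : (weights[name] ?? 0);
    effective.push(w);
    total += w;
  }
  if (total <= 0) throw new Error("Weights must sum to a positive number");

  let cumulative = 0;
  for (let i = 0; i < variantNames.length; i++) {
    cumulative += effective[i] / total;
    if (point < cumulative) return { variant: variantNames[i], warning };
  }
  return { variant: variantNames[variantNames.length - 1], warning };
}

const variantNames = ["control", "treatment"];
const first = bucketSketch("user-42", variantNames);
const second = bucketSketch("user-42", variantNames);
console.log(first.variant === second.variant); // true
console.log(bucketSketch(undefined, variantNames).warning !== undefined); // true
```

Note that in any scheme like this, changing the weights or the set of variants can move existing users to a different variant, since the boundaries in `[0, 1)` shift.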

### `experiment.custom(fn)`

Provide your own selection logic. The function can be synchronous or asynchronous. Its result is memoized durably, so the function runs only once per function run.

```ts
select: experiment.custom(async () => {
  const flag = await getFeatureFlag("checkout-variant");
  return flag; // Must return a key from `variants`
})
```

<Callout>
The `custom` function must return a string that matches one of the keys in `variants`. If it returns an unknown variant name, the SDK throws a `NonRetriableError`.
</Callout>
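
Because an unknown name fails the run without retries, you may want to validate external values such as feature flags before returning them. The helper below is a hypothetical sketch (`toKnownVariant` is not part of the SDK) that falls back to a known variant when the candidate value is not one of the allowed names.

```ts
// Hypothetical guard: returns `candidate` if it is an allowed variant name,
// otherwise returns `fallback`.
function toKnownVariant<V extends string>(
  candidate: unknown,
  allowed: readonly V[],
  fallback: V
): V {
  if (
    typeof candidate === "string" &&
    (allowed as readonly string[]).indexOf(candidate) !== -1
  ) {
    return candidate as V;
  }
  return fallback;
}

const VARIANTS = ["control", "treatment"] as const;

console.log(toKnownVariant("treatment", VARIANTS, "control")); // treatment
console.log(toKnownVariant("treatmnet", VARIANTS, "control")); // control
console.log(toKnownVariant(undefined, VARIANTS, "control")); // control
```

Inside a `custom` selector, this could wrap the flag lookup, for example `return toKnownVariant(await getFeatureFlag("checkout-variant"), VARIANTS, "control")`, where `getFeatureFlag` is your own application code. Whether a silent fallback is preferable to a failed run depends on the experiment; a fallback hides misconfigured flags, so consider logging when it is used.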

## Observability

The selection step carries experiment metadata: the experiment name, selected variant, strategy, available variants, and weights. This metadata is visible in the Inngest dashboard.

Steps executed within the selected variant's callback also carry experiment context, so you can trace which experiment and variant produced each step in a run.
