DEV-298: Add group.experiment() API reference page (#1530)

Benanna2019 · web-flow · commit 672b3d8e7107 · 2026-04-17T15:12:49.000-04:00
* DEV-298: Add group.experiment() API reference page

  Pure reference page for the experimentation API, split from Andy's PR #1444 using the Diataxis framework. Covers signature, properties, selection strategies, and observability metadata.

* DEV-299: Add how-to guide for running experiments in AI pipelines (#1532)

  Task-oriented how-to guide for group.experiment(), split from Andy's PR #1444 using the Diataxis framework. All examples framed around AI orchestration use cases (model comparison, prompt strategies, RAG vs single-shot) using step.run only.
diff --git a/pages/docs/features/inngest-functions/steps-workflows/running-experiments.mdx b/pages/docs/features/inngest-functions/steps-workflows/running-experiments.mdx
@@ -0,0 +1,192 @@
+import { Callout, CodeGroup } from "src/shared/Docs/mdx";
+
+export const description = `Run experiments inside durable functions to compare AI models, prompt strategies, and pipeline approaches`
+
+# Run experiments in AI pipelines
+
+When you're building AI features with durable functions, you often need to compare approaches: which model produces better results, whether a refined prompt outperforms a simpler one, or how a RAG pipeline stacks up against a single-shot call. `group.experiment()` lets you run these comparisons inside your function, with durable memoization so the same variant is always selected on retries and replays.
+
+```ts
+import { experiment } from "inngest";
+```
+
+## Compare AI models
+
+The simplest case: you want to test two models against each other. Define each model call as a variant and use `weighted` selection to control the traffic split.
+
+```ts
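+import { experiment } from "inngest";
+// `inngest` (the client instance), `fetchDocument`, `callOpenAI`, and
+// `callAnthropic` are assumed to be defined elsewhere in your app.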
+
+export default inngest.createFunction(
+  {
+    id: "summarize-document",
+    triggers: { event: "document/uploaded" },
+  },
+  async ({ event, step, group }) => {
+    const doc = await step.run("fetch-document", () =>
+      fetchDocument(event.data.documentId)
+    );
+
+    const summary = await group.experiment("model-comparison", {
+      variants: {
+        gpt4o: () =>
+          step.run("summarize-gpt4o", () =>
+            callOpenAI({ model: "gpt-4o", prompt: `Summarize: ${doc.text}` })
+          ),
+        claude: () =>
+          step.run("summarize-claude", () =>
+            callAnthropic({ model: "claude-sonnet-4-20250514", prompt: `Summarize: ${doc.text}` })
+          ),
+      },
+      select: experiment.weighted({ gpt4o: 50, claude: 50 }),
+    });
+
+    return summary;
+  }
+);
+```
+
+The variant selection is wrapped in a memoized step. If the function retries or replays, the same model is used every time.
+
+## Bucket users to a consistent model
+
+When users interact with your AI features repeatedly, you usually want them to get consistent behavior. Use `experiment.bucket()` with the user ID so the same user always hits the same variant.
+
+```ts
+const response = await group.experiment("assistant-model", {
+  variants: {
+    current: () =>
+      step.run("current-model", () =>
+        generateResponse({ model: "gpt-4o", messages: conversation })
+      ),
+    candidate: () =>
+      step.run("candidate-model", () =>
+        generateResponse({ model: "gpt-4o-mini", messages: conversation })
+      ),
+  },
+  select: experiment.bucket(event.data.userId, {
+    weights: { current: 90, candidate: 10 },
+  }),
+});
+```
+
+The same user ID always maps to the same variant, even across different function runs. This prevents users from experiencing inconsistent quality between requests.
+
+## Test prompt strategies with multi-step variants
+
+Variant callbacks can contain multiple sequential steps. Each step is individually retried and memoized. This is useful when one approach involves more work than another, like comparing a single-shot prompt against a retrieval-augmented pipeline.
+
+```ts
+const answer = await group.experiment("prompt-strategy", {
+  variants: {
+    single_shot: () =>
+      step.run("single-shot", () =>
+        callLLM({ prompt: `Answer this question: ${question}` })
+      ),
+    rag_pipeline: async () => {
+      const chunks = await step.run("retrieve-context", () =>
+        searchVectorStore(question, { topK: 5 })
+      );
+      const context = chunks.map((c) => c.text).join("\n\n");
+      return await step.run("generate-with-context", () =>
+        callLLM({
+          prompt: `Using this context:\n${context}\n\nAnswer: ${question}`,
+        })
+      );
+    },
+  },
+  select: experiment.weighted({ single_shot: 70, rag_pipeline: 30 }),
+});
+```
+
+## Get the selected variant name
+
+Set `withVariant: true` to receive both the result and the name of the selected variant. This is useful for logging, analytics, or downstream decisions.
+
+```ts
+const outcome = await group.experiment("tone-test", {
+  variants: {
+    concise: () =>
+      step.run("concise-prompt", () =>
+        callLLM({ prompt: "Be brief. " + userQuery })
+      ),
+    detailed: () =>
+      step.run("detailed-prompt", () =>
+        callLLM({ prompt: "Be thorough and explain your reasoning. " + userQuery })
+      ),
+  },
+  select: experiment.weighted({ concise: 50, detailed: 50 }),
+  withVariant: true,
+});
+
+await step.run("log-experiment", () =>
+  trackExperiment({
+    experiment: "tone-test",
+    variant: outcome.variant,
+    responseLength: outcome.result.length,
+  })
+);
+```
+
+## Run multiple experiments in one pipeline
+
+You can run independent experiments in a single function. Use `experiment.bucket()` with a composite key so each experiment assigns variants independently.
+
+```ts
+export default inngest.createFunction(
+  {
+    id: "ai-document-pipeline",
+    triggers: { event: "document/process" },
+  },
+  async ({ event, step, group }) => {
+    const userId = event.data.userId;
+
+    const extraction = await group.experiment("extraction-model", {
+      variants: {
+        structured: () =>
+          step.run("structured-extract", () =>
+            extractWithSchema(event.data.documentUrl)
+          ),
+        freeform: () =>
+          step.run("freeform-extract", () =>
+            extractFreeform(event.data.documentUrl)
+          ),
+      },
+      select: experiment.bucket(`${userId}:extraction`),
+      withVariant: true,
+    });
+
+    const summary = await group.experiment("summary-approach", {
+      variants: {
+        map_reduce: async () => {
+          const chunks = await step.run("chunk-document", () =>
+            chunkText(extraction.result)
+          );
+          const partials = await step.run("summarize-chunks", () =>
+            Promise.all(chunks.map((c) => summarize(c)))
+          );
+          return await step.run("combine-summaries", () =>
+            combineSummaries(partials)
+          );
+        },
+        single_pass: () =>
+          step.run("single-pass-summary", () =>
+            summarize(extraction.result)
+          ),
+      },
+      select: experiment.bucket(`${userId}:summary`),
+      withVariant: true,
+    });
+
+    return { extraction, summary };
+  }
+);
+```
+
+Appending an experiment-specific suffix to the bucket key (`userId:extraction` vs `userId:summary`) lets the same user be assigned independently in each experiment, so the assignment in one experiment does not dictate the assignment in the other.
+
+<Callout>
+  Every variant callback must call at least one step tool, such as `step.run()`. The SDK throws a `NonRetriableError` if a variant completes without calling any step tools.
+</Callout>
+
+For the full API surface, parameters, and selection strategy details, see the [`group.experiment()` reference](/docs/reference/typescript/v4/functions/group-experiment).
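
Once variant names are logged as in the `withVariant` example above, comparing variants comes down to grouping the logged records. The following is a minimal sketch, not part of this PR or of the Inngest SDK: the `ExperimentRecord` shape and the `summarizeByVariant` helper are hypothetical, and the record fields simply mirror what the "tone-test" example passes to `trackExperiment`.

```typescript
// Hypothetical record shape, mirroring the fields logged in the "tone-test" example.
interface ExperimentRecord {
  experiment: string;
  variant: string;
  responseLength: number;
}

interface VariantStats {
  count: number;
  meanResponseLength: number;
}

// Group records by variant, then compute the run count and mean response length.
function summarizeByVariant(
  records: ExperimentRecord[]
): Record<string, VariantStats> {
  const totals: Record<string, { count: number; sum: number }> = {};
  for (const record of records) {
    const entry = totals[record.variant] ?? { count: 0, sum: 0 };
    entry.count += 1;
    entry.sum += record.responseLength;
    totals[record.variant] = entry;
  }

  const stats: Record<string, VariantStats> = {};
  for (const variant of Object.keys(totals)) {
    const { count, sum } = totals[variant];
    stats[variant] = { count, meanResponseLength: sum / count };
  }
  return stats;
}

// Illustrative data only; real records would come from your analytics store.
const sampleStats = summarizeByVariant([
  { experiment: "tone-test", variant: "concise", responseLength: 120 },
  { experiment: "tone-test", variant: "detailed", responseLength: 640 },
]);
console.log(sampleStats);
```

Response length is only a stand-in metric here; the same grouping works for any numeric field you choose to log alongside the variant name.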
diff --git a/pages/docs/reference/typescript/v4/functions/group-experiment.mdx b/pages/docs/reference/typescript/v4/functions/group-experiment.mdx
@@ -0,0 +1,130 @@
+import { Callout, Properties, Property, Row, Col, CodeGroup } from "src/shared/Docs/mdx";
+
+export const description = `API reference for group.experiment() — run experiments within durable functions`
+
+# `group.experiment()`
+
+Selects and executes a single variant from a set of options. The selection is memoized as a durable step, so the same variant runs on retries and replays.
+
+---
+
+## `group.experiment(id, options): Promise`
+
+<Row>
+  <Col>
+    <Properties>
+      <Property name="id" type="string" required>
+        A unique identifier for the experiment. Used in logs and to memoize the variant selection across retries and replays.
+      </Property>
+      <Property name="options" type="object" required>
+        Configuration for the experiment:
+        <Properties nested={true}>
+          <Property name="variants" type="Record<string, () => unknown>" required>
+            A map of variant names to callbacks. Each callback must contain at least one `step.*` call. Only the selected variant's callback is executed.
+          </Property>
+          <Property name="select" type="ExperimentSelectFn" required>
+            A selection strategy that determines which variant to run. Use one of the built-in strategies: `experiment.fixed()`, `experiment.weighted()`, `experiment.bucket()`, or `experiment.custom()`.
+          </Property>
+          <Property name="withVariant" type="boolean">
+            When `true`, the return value includes the selected variant name alongside the result. Defaults to `false`.
+          </Property>
+        </Properties>
+      </Property>
+    </Properties>
+  </Col>
+  <Col>
+    <CodeGroup title="Basic usage">
+    ```ts
+    const result = await group.experiment("my-experiment", {
+      variants: {
+        a: () => step.run("variant-a", () => doA()),
+        b: () => step.run("variant-b", () => doB()),
+      },
+      select: experiment.weighted({ a: 50, b: 50 }),
+    });
+    ```
+    </CodeGroup>
+    <CodeGroup title="With variant name returned">
+    ```ts
+    const { result, variant } = await group.experiment("my-experiment", {
+      variants: {
+        a: () => step.run("variant-a", () => doA()),
+        b: () => step.run("variant-b", () => doB()),
+      },
+      select: experiment.fixed("a"),
+      withVariant: true,
+    });
+    // variant === "a"
+    ```
+    </CodeGroup>
+  </Col>
+</Row>
+
+<Callout>
+  Every variant callback **must** invoke at least one `step.*` tool (e.g., `step.run()`). Code that runs outside of a step is not memoized and will re-execute on every replay. The SDK throws a `NonRetriableError` if a variant completes without calling any step tools.
+</Callout>
+
+## Selection strategies
+
+Import the `experiment` object from the `inngest` package:
+
+```ts
+import { experiment } from "inngest";
+```
+
+### `experiment.fixed(variantName)`
+
+Always selects the specified variant. Useful for manual overrides or testing a specific code path.
+
+```ts
+select: experiment.fixed("control")
+```
+
+### `experiment.weighted(weights)`
+
+Weighted random selection, seeded with the current Inngest run ID. Deterministic: the same run always gets the same variant, even across retries.
+
+```ts
+select: experiment.weighted({ control: 80, treatment: 20 })
+```
+
+Weights are relative, not percentages. `{ a: 1, b: 3 }` gives `a` a 25% chance and `b` a 75% chance.
+
+### `experiment.bucket(value, options?)`
+
+Consistent hashing. The same input value always maps to the same variant. Useful for user-level bucketing where a user should see the same variant across multiple runs.
+
+```ts
+select: experiment.bucket(event.data.userId, {
+  weights: { control: 70, treatment: 30 },
+})
+```
+
+When `weights` are omitted, equal weights are derived from the variant names:
+
+```ts
+select: experiment.bucket(event.data.userId)
+```
+
+If `value` is `null` or `undefined`, the SDK hashes an empty string and attaches a warning to the step metadata.
+
+### `experiment.custom(fn)`
+
+Provide your own selection logic. The function can be synchronous or asynchronous. Its result is memoized durably, so the function runs only once per function run.
+
+```ts
+select: experiment.custom(async () => {
+  const flag = await getFeatureFlag("checkout-variant");
+  return flag; // Must return a key from `variants`
+})
+```
+
+<Callout>
+  The `custom` function must return a string that matches one of the keys in `variants`. If it returns an unknown variant name, the SDK throws a `NonRetriableError`.
+</Callout>
+
+## Observability
+
+The selection step carries experiment metadata: the experiment name, selected variant, strategy, available variants, and weights. This metadata is visible in the Inngest dashboard.
+
+Steps executed within the selected variant's callback also carry experiment context, so you can trace which experiment and variant produced each step in a run.
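
The reference page above says that weights are relative and that selection is deterministic for a given seed (the run ID for `experiment.weighted()`, the supplied value for `experiment.bucket()`). The sketch below illustrates both ideas under stated assumptions. It is not the SDK's implementation: the FNV-1a-style hash, the sorted variant order, and the helper names `toProbabilities`, `hashString`, and `pickVariant` are all hypothetical choices made for illustration.

```typescript
type Weights = Record<string, number>;

// Convert relative weights into probabilities that sum to 1.
// Assumes every weight is non-negative.
function toProbabilities(weights: Weights): Record<string, number> {
  const names = Object.keys(weights);
  let total = 0;
  for (const name of names) total += weights[name];
  if (!(total > 0)) throw new Error("weights must sum to a positive number");

  const probabilities: Record<string, number> = {};
  for (const name of names) probabilities[name] = weights[name] / total;
  return probabilities;
}

// FNV-1a-style 32-bit hash over UTF-16 code units. Used here only as an
// example of a stable string hash; the SDK may hash differently.
function hashString(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map a key to a variant: hash the key to a point in [0, 1), then walk the
// cumulative probabilities until the point falls inside a variant's range.
function pickVariant(key: string, weights: Weights): string {
  const probabilities = toProbabilities(weights);
  const names = Object.keys(probabilities).sort(); // fixed order, independent of object key order
  const point = hashString(key) / 0x100000000;

  let cumulative = 0;
  for (const name of names) {
    cumulative += probabilities[name];
    if (point < cumulative) return name;
  }
  return names[names.length - 1]; // guards against floating-point rounding
}

console.log(toProbabilities({ a: 1, b: 3 })); // { a: 0.25, b: 0.75 }
console.log(pickVariant("user-123", { control: 70, treatment: 30 }));
```

Because the point is derived only from the key, calling `pickVariant` again with the same key and weights returns the same variant. That is the property both built-in strategies rely on.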