Commit 237201b

Aaron1011 and Copilot authored
Limit concurrency for evaluations/inferences in ui e2e tests (tensorzero#2948)
* Limit concurrency for evaluations/inferences in ui e2e tests

  When regenerating the fixtures, we need to run with an evaluations concurrency limit of 1, and only load one of the (duplicate) image inferences on the playground page. This prevents us from having multiple in-flight inferences at once, which can cause cache entries to get overwritten.

  While we might be able to increase the concurrency limit in 'normal' (non-regen) mode, we've had a lot of issues with this test. For now, let's have regen and non-regen modes run exactly the same logic.

* Fix typo

  Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>
1 parent 3457624 commit 237201b
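
The race described in the commit message can be illustrated with a small simulation. This is a sketch under assumptions, not the project's actual cache or evaluation runner: `runWithLimit`, `makeCachedInference`, and `countCacheWrites` are hypothetical names, and the cache is assumed to be keyed only by the request input. With two identical inputs in flight at once, both miss the cache and both write; with a limit of 1, the second request hits the entry written by the first.

```typescript
// Hypothetical helper (not from the commit): run tasks with at most `limit` in flight.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// Simulated cached inference: check the cache, "call the model" on a miss, then write.
// Assumption: the cache key is the input alone.
function makeCachedInference(cache: Map<string, string>, writes: string[]) {
  let callCount = 0;
  return async (input: string): Promise<string> => {
    const hit = cache.get(input);
    if (hit !== undefined) return hit;
    const id = ++callCount;
    // Stand-in for model latency.
    await new Promise<void>((resolve) => setTimeout(resolve, 5));
    // The response differs per call, like a non-deterministic model.
    const response = `response-${id}`;
    cache.set(input, response);
    writes.push(response);
    return response;
  };
}

// Counts how many cache writes happen for two datapoints with identical inputs.
async function countCacheWrites(limit: number): Promise<number> {
  const cache = new Map<string, string>();
  const writes: string[] = [];
  const infer = makeCachedInference(cache, writes);
  const inputs = ["same-image", "same-image"];
  await runWithLimit(inputs.map((input) => () => infer(input)), limit);
  return writes.length;
}
```

In this model, a limit of 2 lets the second write replace the first, so the surviving cache entry depends on timing; a limit of 1 yields a single write.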

3 files changed: +18 -6 lines changed

ui/app/routes/evaluations/LaunchEvaluationModal.tsx

Lines changed: 1 addition & 0 deletions
@@ -197,6 +197,7 @@ function EvaluationForm({
 type="number"
 id="concurrency_limit"
 name="concurrency_limit"
+data-testid="concurrency-limit"
 min="1"
 value={concurrencyLimit}
 onChange={(e) => setConcurrencyLimit(e.target.value)}

ui/e2e_tests/evaluations.spec.ts

Lines changed: 8 additions & 0 deletions
@@ -27,6 +27,10 @@ test("push the new run button, launch an evaluation", async ({ page }) => {
 await page.getByText("Select a variant").click();
 await page.waitForTimeout(500);
 await page.getByRole("option", { name: "gpt4o_mini_initial_prompt" }).click();
+// IMPORTANT - we need to set concurrency to 1 in order to prevent a race condition
+// when regenerating fixtures, as we intentionally have multiple datapoints with
+// identical inputs. See https://www.notion.so/tensorzerodotcom/Evaluations-cache-non-determinism-23a7520bbad3801f80fceaa7e859ce06
+await page.getByTestId("concurrency-limit").fill("1");
 await page.getByRole("button", { name: "Launch" }).click();

 await expect(
@@ -70,6 +74,10 @@ test("push the new run button, launch an image evaluation", async ({
 await page.getByText("Select a variant").click();
 await page.waitForTimeout(500);
 await page.getByRole("option", { name: "honest_answer" }).click();
+// IMPORTANT - we need to set concurrency to 1 in order to prevent a race condition
+// when regenerating fixtures, as we intentionally have multiple datapoints with
+// identical inputs. See https://www.notion.so/tensorzerodotcom/Evaluations-cache-non-determinism-23a7520bbad3801f80fceaa7e859ce06
+await page.getByTestId("concurrency-limit").fill("1");
 await page.getByRole("button", { name: "Launch" }).click();

 await expect(
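
Both hunks in this file add the same comment and the same `fill("1")` call before clicking "Launch". One possible way to keep the two tests in sync is a shared helper, sketched here. `launchWithSerialConcurrency`, `PageLike`, and `LocatorLike` are hypothetical names that are not part of this commit; the interfaces are minimal structural stand-ins for the Playwright methods the tests already call, so the sketch compiles without Playwright installed. A real Playwright `page` should be structurally compatible with `PageLike`, though that is an assumption worth checking against the installed version.

```typescript
// Minimal structural stand-ins for the Playwright Locator/Page methods used below.
interface LocatorLike {
  fill(value: string): Promise<void>;
  click(): Promise<void>;
}

interface PageLike {
  getByTestId(testId: string): LocatorLike;
  getByRole(role: "button", options: { name: string }): LocatorLike;
}

// Hypothetical helper (not in the commit): set the concurrency limit to 1 to avoid
// in-flight inferences with identical inputs, then launch the evaluation.
async function launchWithSerialConcurrency(page: PageLike): Promise<void> {
  await page.getByTestId("concurrency-limit").fill("1");
  await page.getByRole("button", { name: "Launch" }).click();
}
```

Each test would then call `await launchWithSerialConcurrency(page);` in place of the repeated lines, keeping the explanatory comment in one place.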

ui/e2e_tests/playground.spec.ts

Lines changed: 9 additions & 6 deletions
@@ -100,7 +100,10 @@ test("playground should work for extract_entities JSON function with 2 variants"
 test("playground should work for image_judger function with images in input", async ({
 page,
 }) => {
-await page.goto("/playground?limit=2");
+// We set 'limit=1' so that we don't make parallel inference requests
+// (two of the datapoints have the same input, and could trample on each other's
+// cache entries)
+await page.goto("/playground?limit=1");
 await expect(page.getByText("Select a function")).toBeVisible();

 // Select function 'image_judger' by typing in the combobox
@@ -122,14 +125,14 @@ test("playground should work for image_judger function with images in input", as
 await expect(page.getByText("baz")).toBeVisible();
 await expect(page.getByRole("link", { name: "honest_answer" })).toBeVisible();

-// Verify that there are 2 inputs and 2 reference outputs
-await expect(page.getByRole("heading", { name: "Input" })).toHaveCount(2);
+// Verify that there is 1 input and 1 reference output
+await expect(page.getByRole("heading", { name: "Input" })).toHaveCount(1);
 await expect(
 page.getByRole("heading", { name: "Reference Output" }),
-).toHaveCount(2);
+).toHaveCount(1);

-// Verify that images are rendered in the input elements
-await expect(page.locator("img")).toHaveCount(2);
+// Verify that the image is rendered in the input element
+await expect(page.locator("img")).toHaveCount(1);

 // Wait for at least one textbox containing "crab"
 // Wait for and assert at least one exists
