fix(evals): infer task input types from eval data by c-ehrlich · Pull Request #281 · axiomhq/ai

c-ehrlich · 2026-03-09T14:24:47Z

Overview

Eval() was letting scorer inference drive the eval input type, which meant a scorer that only used output and expected could collapse task input to unknown.

This change makes Eval() derive input and expected from the eval data source instead, then validates task and scorers against those types. It also adds type-level regression tests that cover the scorer-without-input case, a structured data example, and negative cases for mismatched task/scorer input.

We use the NoInfer utility type, which was added in TS 5.4 (March 2024), so I've added TypeScript as a peer dep. I could also see skipping the peer dep and just assuming our users are not on two-year-old versions of TS.
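For anyone unfamiliar with NoInfer, here is a minimal sketch of the idea, not the actual Eval() signature. `miniEval`, `Row`, and `ScorerFn` are hypothetical stand-ins invented for illustration. The point is that `data` is the only inference site for the input and expected types, while `scorers` is only checked against whatever was inferred:

```typescript
// Hypothetical, simplified stand-ins; these are NOT the real axiom types.
type Row<TInput, TExpected> = { input: TInput; expected: TExpected };

type ScorerFn<TInput, TExpected, TOutput> = (args: {
  input: TInput;
  expected: TExpected;
  output: TOutput;
}) => boolean;

// `data` drives inference of TInput/TExpected and `task` drives TOutput.
// NoInfer (TS >= 5.4) keeps `scorers` from contributing inference candidates,
// so scorers are validated against the inferred types instead of shaping them.
function miniEval<TInput, TExpected, TOutput>(config: {
  data: Row<TInput, TExpected>[];
  task: (args: { input: TInput; expected: TExpected }) => TOutput;
  scorers: ScorerFn<NoInfer<TInput>, NoInfer<TExpected>, NoInfer<TOutput>>[];
}): boolean[] {
  return config.data.map((row) => {
    const output = config.task({ input: row.input, expected: row.expected });
    // A row passes only if every scorer returns true.
    return config.scorers.every((score) =>
      score({ input: row.input, expected: row.expected, output }),
    );
  });
}

// A scorer that never mentions `input`, mirroring the case this PR fixes.
const queueMatch = ({
  expected,
  output,
}: {
  expected: { queue: string };
  output: { queue: string };
}) => expected.queue === output.queue;

const results = miniEval({
  data: [
    {
      input: { ticketId: 'ticket-123', customer: { tier: 'enterprise' as const } },
      expected: { queue: 'billing' as const },
    },
  ],
  // `input` is typed from `data`, even though the scorer omits it.
  task: ({ input }) => ({
    queue: input.customer.tier === 'enterprise' ? 'billing' : 'general',
  }),
  scorers: [queueMatch],
});

console.log(results);
```

In this sketch the scorer accepts a narrower argument object than it is handed, which is fine for function parameters, so a scorer that ignores `input` still type-checks against the dataset's shape.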

Example

Before this change, the scorer could influence Eval()'s generic inference enough that task would lose the dataset's input shape:

const queueMatchScorer = Scorer(
  'queue-match',
  ({ expected, output }: { expected: { queue: string }; output: { queue: string } }) =>
    expected.queue === output.queue,
);

Eval('route-support-ticket', {
  capability: 'support-routing',
  data: [
    {
      input: {
        ticketId: 'ticket-123',
        customer: { tier: 'enterprise' as const },
      },
      expected: { queue: 'billing' as const },
    },
  ],
  task: ({ input, expected }) => {
    // With this fix, these types come from `data` even though the scorer omits `input`.
    input.ticketId;
    input.customer.tier;
    expected.queue;

    return {
      queue: input.customer.tier === 'enterprise' ? 'billing' : 'general',
    };
  },
  scorers: [queueMatchScorer],
});

Testing

pnpm -C packages/ai typecheck
pnpm -C packages/ai exec vitest run test/evals/eval.types.test.ts
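As a rough illustration of what a compile-time regression test of this kind can look like, here is a self-contained sketch. It assumes hand-rolled `Equal`/`Expect` helpers and a hypothetical `inferFromData` function; the real test/evals/eval.types.test.ts may well use different helpers and exercises Eval() itself. The positive case asserts that the inferred type comes from `data`; the negative case uses `@ts-expect-error` so the file stops type-checking if a mismatched scorer is ever accepted:

```typescript
// Hand-rolled type-equality helpers (an assumption; not taken from the repo).
type Equal<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2) ? true : false;
type Expect<T extends true> = T;

// Hypothetical stand-in for an API whose input type must come from `data` only.
function inferFromData<TInput>(config: {
  data: { input: TInput }[];
  scorer: (args: { input: NoInfer<TInput> }) => boolean;
}): TInput[] {
  return config.data.map((row) => row.input);
}

// Positive case: a scorer that ignores `input` must not erase the data's shape.
const inputs = inferFromData({
  data: [{ input: { ticketId: 'ticket-123' } }],
  scorer: () => true,
});
type _InputComesFromData = Expect<Equal<typeof inputs, { ticketId: string }[]>>;

// Negative case: a scorer whose input type disagrees with `data` must be rejected.
inferFromData({
  data: [{ input: { ticketId: 'ticket-123' } }],
  // @ts-expect-error scorer declares a number input, but data provides an object
  scorer: ({ input }: { input: number }) => input > 0,
});

console.log(inputs);
```

Assertions like these are checked by the type checker rather than at runtime, which is why a typecheck step matters alongside the vitest run.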

Note

Medium Risk
Type-level changes to Eval()’s public API and a new typescript>=5.4 peer dependency can affect downstream consumers’ compilation/inference, though runtime behavior is unchanged.

Overview
Fixes Eval()’s type inference so TInput/TExpected are derived from the eval data source (using NoInfer on scorers) instead of being driven by scorer parameter types, preventing task inputs from collapsing to unknown when scorers omit input.

Adds compile-time regression tests covering scorer-without-input cases, structured inputs, and negative cases for mismatched task/scorer input types, and updates the eval builder to call Eval() without explicit generics. Declares typescript>=5.4 as an (optional) peer dependency to support NoInfer.

Written by Cursor Bugbot for commit ee92aa6. This will update automatically on new commits.

pkg-pr-new · 2026-03-09T14:35:40Z


npm i https://pkg.pr.new/axiom@281

commit: d397db8

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


packages/ai/package.json

c-ehrlich added 2 commits March 9, 2026 21:24

fix(evals): infer task input types from eval data

8ce7151

fix(evals): avoid scorer-driven input inference

cc49914

c-ehrlich added 3 commits March 9, 2026 22:55

add another test

367a019

add ts peer dep

9f8d283

another test

3331a1c

cursor bot reviewed Mar 9, 2026

packages/ai/package.json (resolved)

fix(ai): mark typescript peer dependency optional

ee92aa6

lukasmalkmus approved these changes Mar 10, 2026


c-ehrlich merged commit 4635813 into main Mar 10, 2026
10 checks passed

c-ehrlich deleted the better-eval-types branch March 10, 2026 10:49

axiom-automation mentioned this pull request Mar 10, 2026

chore(main): release axiom 0.49.1 #282

Open


c-ehrlich merged 6 commits into main from better-eval-types


2 participants
