
fix(evals): infer task input types from eval data#281

Merged
c-ehrlich merged 6 commits into main from better-eval-types on Mar 10, 2026


@c-ehrlich (Collaborator) commented Mar 9, 2026

Overview

Eval() let type inference from the scorers drive the eval's input type, so a scorer that only used output and expected could collapse the task's input to unknown.

This change makes Eval() derive input and expected from the eval data source instead, then validates task and scorers against those types. It also adds type-level regression tests that cover the scorer-without-input case, a structured data example, and negative cases for mismatched task/scorer input.
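The data-first inference described above can be sketched with a stripped-down generic. This is a minimal model under stated assumptions: miniEval, EvalCase, and ScorerFn are hypothetical names, and the real axiom Eval() signature has more options and different types. The data array is the only inference site for the input and expected types; NoInfer stops the task's parameter and the scorers from contributing candidates, so they are only checked against what data already fixed.

```typescript
// Hypothetical, simplified model of the pattern; NOT the axiom package's real types.
type EvalCase<TInput, TExpected> = { input: TInput; expected: TExpected };

type ScorerFn<TInput, TExpected, TOutput> = (args: {
  input: TInput;
  expected: TExpected;
  output: TOutput;
}) => boolean;

function miniEval<TInput, TExpected, TOutput>(opts: {
  // The only inference site for TInput and TExpected.
  data: ReadonlyArray<EvalCase<TInput, TExpected>>;
  // Parameter is checked against the data's types; only the return type infers TOutput.
  task: (args: { input: NoInfer<TInput>; expected: NoInfer<TExpected> }) => TOutput;
  // Scorers are checked against all three types but never drive inference.
  scorers: ReadonlyArray<ScorerFn<NoInfer<TInput>, NoInfer<TExpected>, NoInfer<TOutput>>>;
}): boolean[][] {
  return opts.data.map(({ input, expected }) => {
    const output = opts.task({ input, expected });
    return opts.scorers.map((score) => score({ input, expected, output }));
  });
}

// A scorer that declares only expected and output, as in the PR example.
const queueMatch = ({
  expected,
  output,
}: {
  expected: { queue: string };
  output: { queue: string };
}) => expected.queue === output.queue;

const results = miniEval({
  data: [
    {
      input: { ticketId: 'ticket-123', customer: { tier: 'enterprise' as const } },
      expected: { queue: 'billing' as const },
    },
  ],
  // input is typed from data, so these property accesses type-check.
  task: ({ input }) => ({
    queue: input.customer.tier === 'enterprise' ? 'billing' : 'general',
  }),
  scorers: [queueMatch],
});

// results holds one row per case and one boolean per scorer: [[true]]
console.log(results);
```

Because the scorer's parameter never feeds inference, omitting input from it cannot widen the task's input to unknown; it only has to be compatible with the types already taken from data.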

We use the NoInfer utility type, which was added in TypeScript 5.4 (March 2024), so I've added TypeScript as a peer dependency. Alternatively, we could skip the peer dependency and assume our users aren't on two-year-old versions of TypeScript.
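For readers unfamiliar with NoInfer, here is a small sketch adapted from the example in the TypeScript 5.4 release notes (createStreetLight is illustrative, not part of axiom). Wrapping a parameter type in NoInfer<C> means that argument is checked against C but is not used to infer C.

```typescript
// Only colors is an inference site for C; defaultColor is merely checked against it.
function createStreetLight<C extends string>(colors: C[], defaultColor?: NoInfer<C>): C {
  // Use the requested default, or fall back to the first color.
  return defaultColor ?? colors[0];
}

// C is inferred as "red" | "yellow" | "green" from the first argument alone.
const start = createStreetLight(['red', 'yellow', 'green'], 'red');
const fallback = createStreetLight(['red', 'yellow', 'green']);

// Never called: exists only so the compiler checks the negative case.
function typeOnlyNegativeCase(): void {
  // @ts-expect-error "blue" is not one of the colors inferred from the first argument
  createStreetLight(['red', 'yellow', 'green'], 'blue');
}
```

Without NoInfer, "blue" would become a second inference candidate and C would silently widen to include it, which is the same failure mode as a scorer widening the eval's input type.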

Example

Before this change, the scorer could influence Eval()'s generic inference enough that the task would lose the dataset's input shape:

```ts
// Import path assumed to be the axiom package's evals entry point.
import { Eval, Scorer } from 'axiom/ai/evals';

// This scorer only declares expected and output; it never mentions input.
const queueMatchScorer = Scorer(
  'queue-match',
  ({ expected, output }: { expected: { queue: string }; output: { queue: string } }) =>
    expected.queue === output.queue,
);

Eval('route-support-ticket', {
  capability: 'support-routing',
  data: [
    {
      input: {
        ticketId: 'ticket-123',
        customer: { tier: 'enterprise' as const },
      },
      expected: { queue: 'billing' as const },
    },
  ],
  task: ({ input, expected }) => {
    // With this fix, these types come from `data` even though the scorer omits `input`.
    input.ticketId;
    input.customer.tier;
    expected.queue;

    return {
      queue: input.customer.tier === 'enterprise' ? 'billing' : 'general',
    };
  },
  scorers: [queueMatchScorer],
});
```

Testing

  • pnpm -C packages/ai typecheck
  • pnpm -C packages/ai exec vitest run test/evals/eval.types.test.ts
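The type-level regression tests described in the overview can be sketched without any test framework: a positive case that only compiles when the task's input is typed, an exact-equality assertion on an inferred type, and a @ts-expect-error negative case. runCases, Equal, and Expect are hypothetical names for illustration; the PR's actual test file may be structured differently or use vitest's own type utilities.

```typescript
// Hypothetical subject under test: a stripped-down, data-first generic.
function runCases<TInput, TOutput>(
  data: ReadonlyArray<{ input: TInput }>,
  task: (input: NoInfer<TInput>) => TOutput,
): TOutput[] {
  return data.map((c) => task(c.input));
}

// Exact type-equality helper commonly used for compile-time assertions.
type Equal<A, B> =
  (<T>() => T extends A ? 1 : 2) extends (<T>() => T extends B ? 1 : 2) ? true : false;
type Expect<T extends true> = T;

// Positive case: the task's input keeps the data's shape.
const outputs = runCases([{ input: { id: 'a', n: 1 } }], (input) => {
  const id: string = input.id; // would not compile if input had collapsed to unknown
  return id.length + input.n;
});
type OutputsAreNumbers = Expect<Equal<typeof outputs, number[]>>;

// Negative case, never executed: a task whose input contradicts the data must not compile.
function typeOnlyNegativeCase(): void {
  // @ts-expect-error task expects { id: number }, but the data provides { id: string; n: number }
  runCases([{ input: { id: 'a', n: 1 } }], (input: { id: number }) => input.id);
}
```

These assertions are enforced by the type checker (for example, the typecheck command above), not at runtime; executing the file only runs the positive case.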

Note

Medium Risk
Type-level changes to Eval()’s public API and a new typescript>=5.4 peer dependency can affect downstream consumers’ compilation/inference, though runtime behavior is unchanged.

Overview
Fixes Eval()’s type inference so TInput/TExpected are derived from the eval data source (using NoInfer on scorers) instead of being driven by scorer parameter types, preventing task inputs from collapsing to unknown when scorers omit input.

Adds compile-time regression tests covering scorer-without-input cases, structured inputs, and negative cases for mismatched task/scorer input types, and updates the eval builder to call Eval() without explicit generics. Declares typescript>=5.4 as an (optional) peer dependency to support NoInfer.

Written by Cursor Bugbot for commit ee92aa6.

@pkg-pr-new

pkg-pr-new bot commented Mar 9, 2026


npm i https://pkg.pr.new/axiom@281

commit: d397db8


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


@c-ehrlich c-ehrlich merged commit 4635813 into main Mar 10, 2026
10 checks passed
@c-ehrlich c-ehrlich deleted the better-eval-types branch March 10, 2026 10:49