tests(docs): add component-level eval tracing and dataset IO tests #2433

Open
BloggerBust wants to merge 11 commits into confident-ai:main from BloggerBust:test/add-eval-e2e-tests

@BloggerBust
Contributor

  • Add component-level eval tests (shape snapshots and semantic assertions)
  • Add dataset JSON/CSV loader tests for EvaluationDataset
  • Refactor trace snapshot utilities into tests/utils/trace_assertions.py
  • Re-export trace snapshot utilities from tests/test_integrations/utils.py
  • Commit generated trace snapshot JSON fixtures
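
As a rough illustration of what a "shape snapshot" comparison can look like, the sketch below reduces a trace payload to its structure (keys and value types) so that volatile values such as UUIDs or latencies do not affect the comparison. The helper name `to_shape` and the sample trace layout are hypothetical and are not taken from tests/utils/trace_assertions.py.

```python
from typing import Any


def to_shape(value: Any) -> Any:
    """Replace leaf values with their type names, keeping dict/list structure."""
    if isinstance(value, dict):
        return {key: to_shape(item) for key, item in sorted(value.items())}
    if isinstance(value, list):
        return [to_shape(item) for item in value]
    return type(value).__name__


# Hypothetical trace payloads: same structure, different volatile values.
trace_a = {"uuid": "a1", "spans": [{"name": "retriever", "latency": 0.12}]}
trace_b = {"uuid": "b2", "spans": [{"name": "retriever", "latency": 0.98}]}

shape_a = to_shape(trace_a)
shape_b = to_shape(trace_b)
```

A shape comparison like this only catches structural drift; checks on actual values would still need separate semantic assertions.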

@greptile-apps
Contributor

greptile-apps bot commented Jan 14, 2026

Skipped: This PR was not opened by one of your configured authors: (tanayvaswani, trevor-cai, kritinv, ...)

@vercel

vercel bot commented Jan 14, 2026

@BloggerBust is attempting to deploy a commit to the Confident AI Team on Vercel.

A member of the Team first needs to authorize it.

@BloggerBust BloggerBust force-pushed the test/add-eval-e2e-tests branch from edfea09 to f4d420e Compare January 15, 2026 20:39
- Add coverage for tool spans, agent spans, metrics scoping, update_current_span last-write-wins behavior, and evals_iterator input mapping
- Move span/trace helpers into shared test helpers
- Add rooted app smoke test asserting agent/retriever/generator and metrics
- Add rooted app trace fixture snapshot
…pans

Filter observe kwargs to ToolSpan model fields and drop colliding keys before
constructing ToolSpan to avoid duplicate keyword errors.

Also adds doc-driven component-level tracing tests covering tool spans and
LLMTestCase tool call fields.
@trevor-cai trevor-cai force-pushed the test/add-eval-e2e-tests branch from 22fd071 to 940041e Compare January 19, 2026 02:31
ToolSpan now filters observe_kwargs to model fields and drops any keys that
would collide with explicit span_kwargs so reserved fields always win.
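
The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ToolSpanSketch` is a hypothetical dataclass standing in for the real span model, and `build_tool_span` is an invented helper name.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class ToolSpanSketch:
    """Hypothetical stand-in for a span model; not deepeval's ToolSpan."""

    name: str
    uuid: str
    description: Optional[str] = None


def build_tool_span(
    span_kwargs: Dict[str, Any], observe_kwargs: Dict[str, Any]
) -> ToolSpanSketch:
    # Keep only keys that are real model fields...
    allowed = {f.name for f in fields(ToolSpanSketch)}
    # ...and drop any key already set explicitly, so span_kwargs always wins.
    extra = {
        key: value
        for key, value in observe_kwargs.items()
        if key in allowed and key not in span_kwargs
    }
    return ToolSpanSketch(**span_kwargs, **extra)


span = build_tool_span(
    span_kwargs={"name": "search_tool", "uuid": "span-1"},
    observe_kwargs={"name": "colliding", "description": "web search", "unknown": 1},
)
```

Without the filter, unpacking both dicts into the constructor would raise a `TypeError` because `name` would be passed twice.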

- Rename tool span in component-level doc tests to avoid name collisions
- Assert observe(name) overrides function name, and update_current_span overrides observe
- Add checklist coverage for parent/child UUID relationships in nested spans
- Add regression test to ensure @observe(type="tool", name=...) does not crash due to a name collision
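
The name precedence and parent/child checks listed above could be expressed along these lines. This is only a sketch under stated assumptions: `resolve_span_name` and `orphaned_spans` are hypothetical helpers, and the `uuid` / `parent_uuid` dictionary keys are assumed field names rather than the library's confirmed schema.

```python
from typing import Any, Dict, List, Optional


def resolve_span_name(
    function_name: str,
    observe_name: Optional[str] = None,
    updated_name: Optional[str] = None,
) -> str:
    """Later sources win: runtime update > observe(name=...) > function name."""
    if updated_name is not None:
        return updated_name
    if observe_name is not None:
        return observe_name
    return function_name


def orphaned_spans(spans: List[Dict[str, Any]]) -> List[str]:
    """Return UUIDs of spans whose parent_uuid matches no span in the list."""
    known = {span["uuid"] for span in spans}
    return [
        span["uuid"]
        for span in spans
        if span["parent_uuid"] is not None and span["parent_uuid"] not in known
    ]


# Hypothetical nested spans: one root with a single child.
nested = [
    {"uuid": "root", "parent_uuid": None},
    {"uuid": "child", "parent_uuid": "root"},
]
```

A test could then assert that `orphaned_spans(nested)` is empty and that each precedence level resolves to the expected name.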
@trevor-cai trevor-cai force-pushed the test/add-eval-e2e-tests branch from 940041e to e7c7324 Compare January 19, 2026 02:33
@A-Vamshi A-Vamshi force-pushed the test/add-eval-e2e-tests branch from c58fa9e to b368a4b Compare January 20, 2026 18:36