Testing: web agent pipeline #11

@functor-flow

Description


Full spec at spec.

Status: Very limited.

Role. Provide module-level and end-to-end tests for the new web-search agent pipeline. Testing is an essential part of this work: the tests must run the pipeline on real Validator data and include cost benchmarking.

Responsibilities.

  • Add focused tests for each boundary (Input, Serper, Crawler, Embeddings, Nav, SearchHistory, Decision).
  • Add end-to-end tests that call the agent entrypoint with real historical predictions from the Validator, check both outcomes and evidence, and record cost metrics.
  • Make it easy to compare cost across runs (LLM tokens, SERP/Scraper usage) so budget and behavior regressions are visible.
  • Document key findings from the testing process (edge cases, failure patterns, cost surprises) and share them with the team in Discord so the pipeline can be tuned collaboratively.
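One way to make run-to-run cost comparison concrete is to store a small per-run cost record and diff it against a saved baseline. The sketch below is a minimal illustration that assumes a hypothetical `CostMetrics` record with token and request counters and hypothetical `dump_metrics` / `load_metrics` / `cost_regressions` helpers. None of these names come from the pipeline, and the field names and 10% tolerance are placeholders to adjust.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class CostMetrics:
    """Hypothetical per-run cost record; field names are illustrative only."""
    llm_prompt_tokens: int = 0
    llm_completion_tokens: int = 0
    serp_requests: int = 0
    scraper_requests: int = 0


def dump_metrics(metrics: CostMetrics) -> str:
    """Serialize a run's metrics so they can be stored next to test artifacts."""
    return json.dumps(asdict(metrics), sort_keys=True)


def load_metrics(text: str) -> CostMetrics:
    """Rebuild a CostMetrics record from a stored JSON baseline."""
    return CostMetrics(**json.loads(text))


def cost_regressions(
    baseline: CostMetrics, current: CostMetrics, tolerance: float = 0.10
) -> dict[str, tuple[int, int]]:
    """Return {metric: (baseline, current)} for every counter that grew by more
    than `tolerance` (a fraction) over the baseline. A zero baseline flags any
    increase."""
    regressions: dict[str, tuple[int, int]] = {}
    for f in fields(CostMetrics):
        old = getattr(baseline, f.name)
        new = getattr(current, f.name)
        if new > old * (1 + tolerance):
            regressions[f.name] = (old, new)
    return regressions


if __name__ == "__main__":
    baseline = load_metrics(dump_metrics(CostMetrics(1200, 300, 4, 6)))
    current = CostMetrics(1250, 300, 9, 6)
    print(cost_regressions(baseline, current))  # {'serp_requests': (4, 9)}
```

Keeping the comparison as a plain diff over named counters means a new cost source (for example, embedding calls) only needs one extra field, and the JSON baseline can be committed alongside the tests.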

TODO

  • Add small, focused tests around each module boundary so SearchTask → SearchHistory transitions remain stable as the pipeline evolves.
  • Add at least one end-to-end test that replays a small, fixed set of Validator predictions through the agent entrypoint and asserts on the resulting AgentResult shape, key evidence fields, and basic cost metrics (without depending on live APIs).
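The end-to-end TODO item can be sketched as an offline replay. A fixed list of prediction fixtures is run through the agent with a fake search client that returns canned results and counts calls. The sketch then checks the result's shape, its evidence and its request count. Everything here is hypothetical: `AgentResult`, `Evidence`, `FakeSerper`, `run_agent` and the fixtures are illustrative stand-ins, not the pipeline's real types or data. In a real test, `run_agent` would be replaced by the actual agent entrypoint with the fake client injected, and the fixtures by a frozen sample of Validator predictions.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    url: str
    snippet: str


@dataclass
class AgentResult:
    """Hypothetical result shape; align fields with the real AgentResult."""
    prediction_id: str
    outcome: str
    evidence: list[Evidence]
    serp_requests: int


class FakeSerper:
    """Offline stand-in for a Serper client: canned results, no network, call log."""

    def __init__(self, canned: dict[str, list[dict]]):
        self.canned = canned
        self.calls: list[str] = []

    def search(self, query: str) -> list[dict]:
        self.calls.append(query)
        return self.canned.get(query, [])


def run_agent(prediction: dict, serper: FakeSerper) -> AgentResult:
    """Toy placeholder for the agent entrypoint (hypothetical)."""
    hits = serper.search(prediction["question"])
    evidence = [Evidence(h["link"], h["snippet"]) for h in hits]
    outcome = "resolved" if evidence else "unresolved"
    return AgentResult(prediction["id"], outcome, evidence, len(serper.calls))


# Illustrative fixtures only; a real suite would load a frozen Validator sample.
FIXTURES = [
    {"id": "p1", "question": "example question A", "expected_outcome": "resolved"},
    {"id": "p2", "question": "example question B", "expected_outcome": "unresolved"},
]
CANNED = {
    "example question A": [
        {"link": "https://example.com/a", "snippet": "Source confirming A."},
    ],
}


def replay(fixtures: list[dict], canned: dict, max_serp: int = 3) -> list[AgentResult]:
    """Run every fixture through the agent and check shape, evidence and cost."""
    results = []
    for pred in fixtures:
        serper = FakeSerper(canned)  # fresh client so call counts don't leak between runs
        result = run_agent(pred, serper)
        assert isinstance(result, AgentResult)
        assert result.prediction_id == pred["id"]
        assert result.outcome == pred["expected_outcome"]
        if result.outcome == "resolved":
            assert result.evidence, "a resolved outcome should cite evidence"
        assert all(e.url.startswith("https://") for e in result.evidence)
        assert result.serp_requests <= max_serp  # basic cost guard
        results.append(result)
    return results


if __name__ == "__main__":
    for r in replay(FIXTURES, CANNED):
        print(r.prediction_id, r.outcome, len(r.evidence), r.serp_requests)
```

Creating a fresh fake client for each prediction keeps request counts scoped to that prediction, which is what a per-prediction cost budget needs, and the test never touches a live API.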

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests