Labels: enhancement (New feature or request)
Description
Testing: web agent pipeline
Full spec at spec.
Status: Very limited.
Role. Provide module-level and end-to-end tests for the new web-search agent pipeline. This is an essential part of the system: the tests must run the pipeline on real Validator data and include cost benchmarking.
Responsibilities.
- Add focused tests for each boundary (Input, Serper, Crawler, Embeddings, Nav, SearchHistory, Decision).
- Add end-to-end tests that call the agent entrypoint on real historical Validator predictions, check both outcomes and evidence, and record cost metrics.
- Make it easy to compare cost across runs (LLM tokens, SERP/Scraper usage) so budget and behavior regressions are visible.
- Document key findings from the testing process (edge cases, failure patterns, cost surprises) and share them with the team in Discord so the pipeline can be tuned collaboratively.
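One way to make cost comparable across runs is to record a fixed set of counters per run and flag any counter that grows past a tolerance relative to a baseline run. The sketch below assumes hypothetical field names (`llm_prompt_tokens`, `serp_queries`, `scraper_pages`, etc.) and a hypothetical `find_regressions` helper. Neither exists in the pipeline today, and the 10% default tolerance is an arbitrary illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CostMetrics:
    """Hypothetical per-run cost counters; field names are illustrative only."""
    llm_prompt_tokens: int = 0
    llm_completion_tokens: int = 0
    serp_queries: int = 0    # e.g. Serper calls
    scraper_pages: int = 0   # e.g. pages fetched by the crawler

    def __add__(self, other: "CostMetrics") -> "CostMetrics":
        # Sum field by field so per-prediction metrics can be rolled up per run.
        return CostMetrics(**{k: v + getattr(other, k) for k, v in asdict(self).items()})


def find_regressions(baseline: CostMetrics, current: CostMetrics,
                     tolerance: float = 0.10) -> dict[str, tuple[int, int]]:
    """Return {metric: (baseline, current)} for metrics that grew by more than `tolerance`."""
    regressions = {}
    base, cur = asdict(baseline), asdict(current)
    for name, old in base.items():
        new = cur[name]
        # Any increase from zero counts as a regression; otherwise compare against the ratio.
        if (old == 0 and new > 0) or (old > 0 and new > old * (1 + tolerance)):
            regressions[name] = (old, new)
    return regressions
```

For example, comparing `CostMetrics(1000, 200, 5, 12)` against `CostMetrics(1050, 260, 5, 20)` flags `llm_completion_tokens` and `scraper_pages` and leaves the others alone. Persisting each run's counters (e.g. as JSON next to the test report) would let the team diff runs over time.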
TODO
- Add small, focused tests around each module boundary so `SearchTask` → `SearchHistory` transitions remain stable as the pipeline evolves.
- Add at least one end-to-end test that replays a small, fixed set of Validator predictions through the agent entrypoint and asserts on the resulting `AgentResult` shape, key evidence fields, and basic cost metrics (without depending on live APIs).
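A replay test that avoids live APIs can inject a fake search client that returns canned results and counts calls, then assert on the result shape, evidence traceability, and a cost bound. The sketch below is a toy, self-contained stand-in. `SearchTask`, `SearchHistory`, and `AgentResult` are reduced to hypothetical minimal shapes. `FakeSerper`, `run_agent`, the outcome strings, the placeholder questions, and the limit of 3 SERP calls are all invented for illustration and do not reflect the real entrypoint or recorded Validator data.

```python
from dataclasses import dataclass, field


# --- Hypothetical stand-ins; the real pipeline's types and entrypoint will differ. ---
@dataclass
class SearchTask:
    query: str


@dataclass
class SearchHistory:
    tasks: list[SearchTask] = field(default_factory=list)
    urls: list[str] = field(default_factory=list)


@dataclass
class AgentResult:
    outcome: str               # hypothetical labels, e.g. "YES" / "UNKNOWN"
    evidence_urls: list[str]
    history: SearchHistory
    serp_queries: int          # basic cost metric


class FakeSerper:
    """Replays canned SERP results so the test never calls the live API."""

    def __init__(self, canned: dict[str, list[str]]):
        self.canned = canned
        self.calls = 0

    def search(self, query: str) -> list[str]:
        self.calls += 1
        return self.canned.get(query, [])


def run_agent(question: str, serper: FakeSerper) -> AgentResult:
    """Toy entrypoint: one SearchTask, recorded into SearchHistory, then a decision."""
    history = SearchHistory()
    task = SearchTask(query=question)
    history.tasks.append(task)
    urls = serper.search(task.query)
    history.urls.extend(urls)
    outcome = "YES" if urls else "UNKNOWN"
    return AgentResult(outcome, urls, history, serper.calls)


# --- Placeholder replay set; real cases would come from recorded Validator predictions. ---
REPLAY_CASES = [
    {"question": "placeholder question A", "expected": "YES"},
    {"question": "placeholder question B", "expected": "UNKNOWN"},
]
CANNED_SERP = {"placeholder question A": ["https://example.com/a"]}


def test_replay_validator_predictions():
    for case in REPLAY_CASES:
        serper = FakeSerper(CANNED_SERP)
        result = run_agent(case["question"], serper)
        # Shape and outcome.
        assert isinstance(result, AgentResult)
        assert result.outcome == case["expected"]
        # Evidence must be traceable to the recorded search history.
        assert set(result.evidence_urls) <= set(result.history.urls)
        assert [t.query for t in result.history.tasks] == [case["question"]]
        # Cost bound: a small, fixed number of SERP calls per prediction.
        assert result.serp_queries <= 3
```

Under pytest, `test_replay_validator_predictions` is collected automatically. Keeping the fake client injectable also lets the same fixtures drive the per-boundary tests.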