# why
Our existing screenshot service is naive: it triggers purely on a timer
and never in response to the agent's actions.
# what changed
- Added an image hash diff algorithm (a quick check with MSE, then
verification with SSIM) to detect whether the UI actually changed. A
screenshot is stored in the buffer only if it did.
- Added a screenshot interceptor that copies each screenshot the agent
takes into a buffer (if it differs enough from the previous screenshot)
for later use in evals.
- There's also a small refactor of the agent initialization config to
enable the screenshot collector service to be attached
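
The two-stage check and the buffering rule described above could look roughly like the sketch below. This is an illustration, not the code in this PR: it compares raw grayscale pixels rather than hashes, uses a simplified single-window SSIM, and every name (`hasUiChanged`, `ScreenshotBuffer`) and threshold value is a hypothetical stand-in.

```typescript
// Hypothetical thresholds; the PR description does not state the real values.
const MSE_THRESHOLD = 30;
const SSIM_THRESHOLD = 0.75;

// Assumption: screenshots are already decoded to grayscale, one byte per pixel.
type GrayImage = Uint8Array;

// Mean squared error between two equally sized images (cheap first pass).
function mse(a: GrayImage, b: GrayImage): number {
  if (a.length !== b.length) throw new Error("image size mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum / a.length;
}

// Global SSIM computed over the whole image as a single window.
// Real implementations usually average SSIM over sliding windows.
function ssim(a: GrayImage, b: GrayImage): number {
  if (a.length !== b.length) throw new Error("image size mismatch");
  const n = a.length;
  let meanA = 0;
  let meanB = 0;
  for (let i = 0; i < n; i++) {
    meanA += a[i];
    meanB += b[i];
  }
  meanA /= n;
  meanB /= n;
  let varA = 0;
  let varB = 0;
  let cov = 0;
  for (let i = 0; i < n; i++) {
    const da = a[i] - meanA;
    const db = b[i] - meanB;
    varA += da * da;
    varB += db * db;
    cov += da * db;
  }
  varA /= n;
  varB /= n;
  cov /= n;
  // Standard stabilizing constants for 8-bit images.
  const c1 = (0.01 * 255) * (0.01 * 255);
  const c2 = (0.03 * 255) * (0.03 * 255);
  return (
    ((2 * meanA * meanB + c1) * (2 * cov + c2)) /
    ((meanA * meanA + meanB * meanB + c1) * (varA + varB + c2))
  );
}

// Quick MSE check first; run SSIM only when MSE suggests a difference.
function hasUiChanged(prev: GrayImage, next: GrayImage): boolean {
  if (mse(prev, next) < MSE_THRESHOLD) return false;
  return ssim(prev, next) < SSIM_THRESHOLD;
}

// Keeps a screenshot only if it differs enough from the last stored one.
class ScreenshotBuffer {
  private frames: GrayImage[] = [];

  // Returns true if the frame was stored.
  intercept(frame: GrayImage): boolean {
    const last = this.frames[this.frames.length - 1];
    if (last !== undefined && !hasUiChanged(last, frame)) return false;
    this.frames.push(frame.slice()); // copy so later mutation cannot affect the buffer
    return true;
  }

  get size(): number {
    return this.frames.length;
  }
}
```

Running MSE first keeps the common case (no visible change) cheap, and SSIM is only computed for frames that already look different.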
# test plan
Tests pass locally
---------
Co-authored-by: Miguel <[email protected]>
Co-authored-by: miguel <[email protected]>
instructions: `You are a helpful assistant that must solve the task by browsing. At the end, produce a single line: "Final Answer: <answer>" summarizing the requested result (e.g., score, list, or text). Current page: ${await stagehand.page.title()}. ALWAYS OPERATE WITHIN THE PAGE OPENED BY THE USER, WHICHEVER TASK YOU ARE ATTEMPTING TO COMPLETE CAN BE ACCOMPLISHED WITHIN THE PAGE.`,
- question: `Did the agent successfully complete this task: "${params.confirmed_task}"?`,
+ question: `Did the agent successfully complete this task: "${params.confirmed_task}"? The task might be a bit outdated or impossible to complete, in those cases lean towards YES.`,