Infinite loop prevention, visual log, and a close window hazard #115
tralfamadude started this conversation in Ideas
A lot of agents have trouble with infinite (or at least indefinite) loops. I saw this with agent_s2, which uses pyautogui (on macOS 12.x with two monitors, with gpt-4o for both language and grounding). I asked it to find a browser window showing a camera and click the visible Reviews button. The visual analysis worked, but it did not like that I had many windows open, so it planned to close windows (not so nice). When pyautogui was unable to effect clicks on those windows (possibly because the multi-monitor setup does not work with it), the agent just kept trying over and over until I stopped it.
Image analysis worked (it found the button coordinates), and so did action verification. Admittedly, I did not let it run long enough to see whether it would eventually break out of the loop, since I did not want it to spend more than $1 on a simple operation.
To detect these kinds of loops, I suppose a meta-critic prompt is needed: one that looks across several steps for a pattern where something is tried repeatedly and always fails. I did ask it not to minimize windows, and that seemed to work, but at the same time I made sure a target window was nearly maximized on both monitors, so I'm not certain whether the extra instruction helped or whether simplifying the view did.
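A meta-critic prompt would be an LLM call; as a cheaper complement, a plain heuristic can flag the same pattern before more tokens are spent. The sketch below is hypothetical and not part of agent_s2: `StallDetector`, its thresholds, and the action tuples are illustrative assumptions, and `succeeded` is assumed to come from the agent's existing action-verification step. It checks two patterns: several failures in a row even when each step targets a different window (the close-one-window-after-another case), and the very same action failing repeatedly.

```python
from collections import Counter, deque
from collections.abc import Hashable
from typing import Deque, Tuple


class StallDetector:
    """Hypothetical helper: flags a run whose screen actions keep failing.

    Not part of agent_s2; the thresholds are illustrative defaults.
    """

    def __init__(self, window: int = 8, max_repeat_failures: int = 3,
                 max_consecutive_failures: int = 5) -> None:
        # Only the most recent `window` (action, succeeded) pairs are kept.
        self.history: Deque[Tuple[Hashable, bool]] = deque(maxlen=window)
        self.max_repeat_failures = max_repeat_failures
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def record(self, action: Hashable, succeeded: bool) -> None:
        """Log one step; `succeeded` is assumed to come from action verification."""
        self.history.append((action, succeeded))
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1

    def stalled(self) -> bool:
        # Pattern 1: nothing has taken effect for several steps in a row,
        # even if each step targets a different window.
        if self.consecutive_failures >= self.max_consecutive_failures:
            return True
        # Pattern 2: the very same action keeps failing within the recent window.
        failures = Counter(action for action, ok in self.history if not ok)
        return any(count >= self.max_repeat_failures for count in failures.values())


# Usage: five different windows, none of which could be closed.
detector = StallDetector()
for title in ["Mail", "Notes", "Slack", "Terminal", "Finder"]:
    detector.record(("close_window", title), succeeded=False)
print(detector.stalled())  # True: five failures in a row
```

When `stalled()` returns True, the agent could stop, ask the user, or hand the recent history to a meta-critic prompt to decide whether to continue.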
Total spent on a few runs: $2.75
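Given the $1 limit mentioned earlier and the $2.75 total here, spend could be tracked inside the agent itself. A minimal sketch, assuming the model client reports token counts per call (for example, an OpenAI Chat Completions response carries `usage.prompt_tokens` and `usage.completion_tokens`). `CostTracker` and its fields are hypothetical, and the per-token prices are placeholders to fill in from the provider's current price list, not actual gpt-4o rates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostTracker:
    """Hypothetical helper: running tally of model spend for one agent session."""

    input_usd_per_1k: float    # price per 1,000 input tokens (placeholder; use your provider's list)
    output_usd_per_1k: float   # price per 1,000 output tokens (placeholder)
    report_every: int = 10     # print the running total every N calls
    budget_usd: Optional[float] = None  # optional cap, e.g. 1.00
    total_usd: float = 0.0
    calls: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> float:
        """Add one model call to the tally and return its cost in USD."""
        cost = (input_tokens * self.input_usd_per_1k
                + output_tokens * self.output_usd_per_1k) / 1000
        self.total_usd += cost
        self.calls += 1
        if self.calls % self.report_every == 0:
            print(f"[cost] {self.calls} calls, ${self.total_usd:.2f} so far this session")
        return cost

    @property
    def over_budget(self) -> bool:
        """True once the session total reaches the optional budget."""
        return self.budget_usd is not None and self.total_usd >= self.budget_usd
```

Checking `over_budget` before each planning step would end a runaway loop at a fixed dollar amount instead of relying on a manual interrupt.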
FEATURE REQUEST: keep a running tally of the total spent since the start of the session and print it occasionally.
FEATURE REQUEST: add a meta-critic that detects when the agent is not making progress because screen interactions keep failing.
FEATURE REQUEST: an option to output PNG files alongside the *.log files, named e.g. ${BASENAME_OF_LOG}_${ISO_DATETIME_STAMP}.png, would make studying runs easier.
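For the PNG request: screenshots already appear in the log as base64 `data:image/png` payloads, so one option is to write them to disk next to the log. A minimal sketch, assuming each image is available as a `data:image/png;base64,...` data URL; the function names are hypothetical, and where to call them inside agent_s2's logging is left open. The timestamp uses the ISO 8601 basic format (no colons), since colons are not allowed in Windows file names and display oddly in the macOS Finder; a log `agent.log` yields names like `agent_20240501T153012.000000Z.png`.

```python
import base64
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

PNG_DATA_URL_PREFIX = "data:image/png;base64,"


def png_path_for(log_path: str, when: Optional[datetime] = None) -> Path:
    """Build <log basename>_<UTC ISO 8601 basic timestamp>.png next to the log file."""
    log = Path(log_path)
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    # Basic format (no colons); microseconds keep several shots per second distinct.
    stamp = when.strftime("%Y%m%dT%H%M%S.%fZ")
    return log.with_name(f"{log.stem}_{stamp}.png")


def save_png_next_to_log(data_url: str, log_path: str,
                         when: Optional[datetime] = None) -> Path:
    """Decode a base64 PNG data URL, write it alongside the log, and return the path."""
    if not data_url.startswith(PNG_DATA_URL_PREFIX):
        raise ValueError("expected a data:image/png;base64 URL")
    out = png_path_for(log_path, when)
    out.write_bytes(base64.b64decode(data_url[len(PNG_DATA_URL_PREFIX):]))
    return out
```

Writing the image and logging only its file name would also keep the log itself readable without the `grep -v` step.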
==Log excerpt showing it closing windows to reduce clutter, trying one window after another:==
(grep -v "data:image/png" was used to skip the base64 image requests)