Fixing openrouter pricing rate limit (#112)

TLSDC · gasse · xhluca · web-flow · commit 7a5b91e62056 · 2024-11-07T16:31:37.000-05:00
* Update unit_tests.yml (#101) * request is done once and then reused * Patching minor stuff (#69) * fixing sample_std for single experience * making gradio shared server non default * missing requirement for xray * Improve agent xray app (#70) * 0.2.2 Release (#67) * downgrading ubuntu version for github tests (#62) * Llm api update (#59) * getting rid of .invoke() * adding an AbstractChatModel * changing chat_api structure * Reproducibility again (#61) * core functions * switch to dask * removing joblib dependency and adding dask * fixing imports * handles multiple backends * ensure asyncio loop creation * more tests * setting dashboard address to None * minor * Finally found a way to make it work * initial reproducibility files * Seems to be superflus * adding a reproducibility journal * minor update * more robust * adding reproducibility tools * fix white listing * minor * minor * minor * minor * minor fix * more tests * more results yay * disabling this test * update * update * black * maybe fixing github workflow ? 
* make get_git_username great again * trigger change * new browsergym * GPT-4o result (and new comment column) * Seems like there was a change to 4o flags, trying these * minor comment * better xray * minor fix * addming a comment field * new agent * another test with GPT-4o * adding llama3 from openrouter * fix naming * unused import * new summary tools and remove "_args" from columns in results * add Llama * initial code for reproducibility agent * adjust inspect results * infer from benchmark * fix reproducibility agent * prevent the repro_dir to be an index variable * updating repro agent stats * Reproducibility agent * instructions to setup workarena * fixing tests * handles better a few edge cases * default progress function to None * minor formatting * minor * initial commit * refactoring with Study class * refactor to adapt for study class * minor * fix pricy test * fixing tests * tmp * print report * minor fix * refine little details about reproducibility * minor * no need for set_temp anymore * sanity check before running main * minor update * minor * new results with 4o on workarena.l1 * sharing is caring * add llama to main.py * new hournal entry * lamma 3 70B * minor * typo * black fix (wasn't configured) --------- Co-authored-by: Thibault Le Sellier de Chezelles <thibault.de.chezelles@gmail.com> * version bump --------- Co-authored-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com> * Make share=TRue into a environment variable, disabled by default for security * fix floating point issue with std_reward in agent xray * Update src/agentlab/analyze/inspect_results.py * Update src/agentlab/analyze/agent_xray.py --------- Co-authored-by: Thibault LSDC <78021491+ThibaultLSDC@users.noreply.github.com> Co-authored-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com> * added tmlr definitive config (#71) * downgrading gradio version (#77) * Study refactor (#73) * adapting to new Benchmark class * fixing tests * fix tests * typo * not ready for gradio 5 * study 
id and a few fixes * fixing pricy tests --------- Co-authored-by: ThibaultLSDC <thibault.de.chezelles@gmail.com> * adding message class and updating generic agent accordingly (#68) * adding message class and updating generic agent accordingly * updating tests * Reproducibility test before message class * Adding inspect_result.ipynb to reprod white list * Reproducibility test after message class * L1 before message class * L1 after message class * added append as method to the Discussion class, to make it totally similar to a list * changed to_markdown behavior * updated most_basic_agent * updated ReproAgent * Update src/agentlab/analyze/agent_xray.py * format * new journal entry * immutable as default kwarg * removing __add__ and __radd__ * added deprecation warning * updating tests * version bump * Updating generic_agent to fit use BGym's goal_object (#83) * updating generic agent to goal_object * fixing image markdown display * updating tests * fixing intruction BaseMessage * added merge text in discussion * added merge to discussion class * added tests * Minor revert (#86) * minor revert * revert tests too * Add tabs (#84) * add tabs * make sure it's not computed if not visible * Fix reproduce study (#87) * add tabs * this workaround is worst * bug fix * fix reproduce study * make sure it's not computed if not visible * upgrading gradio dependency (#88) * bgym update (#90) * Workarena TMLR experiments (#89) * new entry * adding llm configs * new journal entries * handling sequntial in VWA (#91) * handling sequntial in VWA * enable comments * format --------- Co-authored-by: ThibaultLSDC <thibault.de.chezelles@gmail.com> * Tmlr workarena (#92) * adding llm configs * new L1 entries * tmp * reformat * adding assistantbench to reproducibility_util.py * gitignore (#97) * Vision fix (#105) * changing content name * Update src/agentlab/llm/llm_utils.py --------- Co-authored-by: Maxime Gasse <maxime.gasse@gmail.com> * L2 tmlr (#93) * adding llm configs * L2 entries * 
claude L3 * claude vision support * miniwob results * 405b L1 entry * Replacing Dask with Ray (#100) * dask-dependencies * minor * replace with ray * adjust tests and move a few things * markdown report * automatic relaunch * add dependencies * reformat * fix unit-test * catch timeout * fixing bugs and making things work * adress comments and black format * new dependencies viewer * Update benchmark to use visualwebarena instead of webarena * Fix import and uncomment code in get_ray_url.py * Add ignore_dependencies option to Study and _agents_on_benchmark functions * Update load_most_recent method to include contains parameter * Update load_most_recent method to accept contains parameter and add warning for ignored dependencies in _agents_on_benchmark * Refactor backend preparation in Study class and improve logging for ignored dependencies * finallly some results with claude on webarena * Add warnings for Windows timeouts and clarify parallel backend options; update get_results method to conditionally save outputs * black * ensure timeout is int (For the 3rd time?) 
* Refactor timeout handling in context manager; update test to reduce avg_step_timeout and rename test function * black * Change parallel backend from "joblib" to "ray" in run_experiments function * Update src/agentlab/experiments/study.py Co-authored-by: Maxime Gasse <maxime.gasse@gmail.com> * Update src/agentlab/analyze/inspect_results.py Co-authored-by: Maxime Gasse <maxime.gasse@gmail.com> * Refactor logging initialization and update layout configurations in dependency graph plotting; adjust node size and font size for better visualization --------- Co-authored-by: Maxime Gasse <maxime.gasse@gmail.com> * switching to 2 for loops in _agents_on_benchmark (#107) * yet another way to kill timedout jobs (#108) * request is done once and then reused * switched to caching original function bc it doesnt break to tests * added a catch for some openrouter under-the-hood error --------- Co-authored-by: Maxime Gasse <maxime.gasse@gmail.com> Co-authored-by: Xing Han Lu <21180505+xhluca@users.noreply.github.com> Co-authored-by: Alexandre Lacoste <alex.lacoste.shmu@gmail.com>
diff --git a/.github/workflows/unit_tests.yml b/.github/workflows/unit_tests.yml
@@ -38,6 +38,9 @@ jobs:
       - name: Install Playwright
         run: playwright install chromium --with-deps
 
+      - name: Download WebArena / VisualWebArena resource files
+        run: python -c 'import nltk; nltk.download("punkt_tab")'
+
       - name: Fetch MiniWob
         uses: actions/checkout@v4
         with:
@@ -58,4 +61,4 @@ jobs:
       - name: Run AgentLab Unit Tests
         env:
           MINIWOB_URL: "http://localhost:8080/miniwob/"
-        run: pytest -n 5 --durations=10 -m 'not pricy' -v tests/
+        run: pytest -n 5 --durations=10 -m 'not pricy' -v tests/
diff --git a/src/agentlab/llm/chat_api.py b/src/agentlab/llm/chat_api.py
@@ -208,6 +208,10 @@ def handle_error(error, itr, min_retry_wait_time, max_retry):
     return error_type
 
 
+class OpenRouterError(openai.OpenAIError):
+    pass
+
+
 class ChatModel(AbstractChatModel):
     def __init__(
         self,
@@ -274,6 +278,12 @@ def __call__(self, messages: list[dict]) -> dict:
                     temperature=self.temperature,
                     max_tokens=self.max_tokens,
                 )
+
+                if completion.usage is None:
+                    raise OpenRouterError(
+                        "The completion object does not contain usage information. This is likely a bug in the OpenRouter API."
+                    )
+
                 self.success = True
                 break
             except openai.OpenAIError as e:
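Because `OpenRouterError` subclasses `openai.OpenAIError`, raising it inside the `try` block means the existing `except` clause catches it like any other API failure. A rough, dependency-free sketch of that pattern follows; `APIError`, `MissingUsageError`, `FakeCompletion`, and `call_with_retry` are hypothetical names standing in for the real classes and loop, so the example runs without the `openai` package.

```python
class APIError(Exception):
    """Stand-in for openai.OpenAIError."""


class MissingUsageError(APIError):
    """Stand-in for the OpenRouterError subclass added in the diff."""


class FakeCompletion:
    """Minimal object exposing only the `usage` attribute."""

    def __init__(self, usage):
        self.usage = usage


def call_with_retry(make_request, max_retry=3):
    """Call make_request until it returns a completion that has usage data."""
    errors = []
    for _ in range(max_retry):
        try:
            completion = make_request()
            if completion.usage is None:
                # Treat a response without usage as a failure so it is retried.
                raise MissingUsageError("completion has no usage information")
            return completion, errors
        except APIError as e:
            # The subclass is caught here together with other API errors.
            errors.append(e)
    raise RuntimeError(f"failed after {max_retry} attempts: {errors}")


# First response lacks usage, second one is complete.
responses = iter(
    [
        FakeCompletion(None),
        FakeCompletion({"prompt_tokens": 12, "completion_tokens": 3}),
    ]
)
completion, errors = call_with_retry(lambda: next(responses))
print(completion.usage["prompt_tokens"])  # 12
print(len(errors))                        # 1
```

Subclassing the already-handled exception type keeps the change small: no new `except` branch is needed, and the malformed response goes through whatever retry logic the handler already applies.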
diff --git a/src/agentlab/llm/tracking.py b/src/agentlab/llm/tracking.py
@@ -1,3 +1,4 @@
+from functools import cache
 import os
 import threading
 from contextlib import contextmanager
@@ -61,6 +62,7 @@ def wrapper(self, obs):
     return wrapper
 
 
+@cache
 def get_pricing_openrouter():
     api_key = os.getenv("OPENROUTER_API_KEY")
     assert api_key, "OpenRouter API key is required"