Commit 7a5b91e
TLSDC, gasse, xhluca, and recursix authored
Fixing openrouter pricing rate limit (#112)
* Update unit_tests.yml (#101)
* request is done once and then reused
* Patching minor stuff (#69)
* fixing sample_std for single experience
* making gradio shared server non default
* missing requirement for xray
* Improve agent xray app (#70)
* 0.2.2 Release (#67)
* downgrading ubuntu version for github tests (#62)
* Llm api update (#59)
* getting rid of .invoke()
* adding an AbstractChatModel
* changing chat_api structure
* Reproducibility again (#61)
* core functions
* switch to dask
* removing joblib dependency and adding dask
* fixing imports
* handles multiple backends
* ensure asyncio loop creation
* more tests
* setting dashboard address to None
* minor
* Finally found a way to make it work
* initial reproducibility files
* Seems to be superflus
* adding a reproducibility journal
* minor update
* more robust
* adding reproducibility tools
* fix white listing
* minor
* minor
* minor
* minor
* minor fix
* more tests
* more results yay
* disabling this test
* update
* update
* black
* maybe fixing github workflow ?
* make get_git_username great again
* trigger change
* new browsergym
* GPT-4o result (and new comment column)
* Seems like there was a change to 4o flags, trying these
* minor comment
* better xray
* minor fix
* addming a comment field
* new agent
* another test with GPT-4o
* adding llama3 from openrouter
* fix naming
* unused import
* new summary tools and remove "_args" from columns in results
* add Llama
* initial code for reproducibility agent
* adjust inspect results
* infer from benchmark
* fix reproducibility agent
* prevent the repro_dir to be an index variable
* updating repro agent stats
* Reproducibility agent
* instructions to setup workarena
* fixing tests
* handles better a few edge cases
* default progress function to None
* minor formatting
* minor
* initial commit
* refactoring with Study class
* refactor to adapt for study class
* minor
* fix pricy test
* fixing tests
* tmp
* print report
* minor fix
* refine little details about reproducibility
* minor
* no need for set_temp anymore
* sanity check before running main
* minor update
* minor
* new results with 4o on workarena.l1
* sharing is caring
* add llama to main.py
* new hournal entry
* lamma 3 70B
* minor
* typo
* black fix (wasn't configured)
---------
Co-authored-by: Thibault Le Sellier de Chezelles <[email protected]>
* version bump
---------
Co-authored-by: Alexandre Lacoste <[email protected]>
* Make share=TRue into a environment variable, disabled by default for security
* fix floating point issue with std_reward in agent xray
* Update src/agentlab/analyze/inspect_results.py
* Update src/agentlab/analyze/agent_xray.py
---------
Co-authored-by: Thibault LSDC <[email protected]>
Co-authored-by: Alexandre Lacoste <[email protected]>
* added tmlr definitive config (#71)
* downgrading gradio version (#77)
* Study refactor (#73)
* adapting to new Benchmark class
* fixing tests
* fix tests
* typo
* not ready for gradio 5
* study id and a few fixes
* fixing pricy tests
---------
Co-authored-by: ThibaultLSDC <[email protected]>
* adding message class and updating generic agent accordingly (#68)
* adding message class and updating generic agent accordingly
* updating tests
* Reproducibility test before message class
* Adding inspect_result.ipynb to reprod white list
* Reproducibility test after message class
* L1 before message class
* L1 after message class
* added append as method to the Discussion class, to make it totally similar to a list
* changed to_markdown behavior
* updated most_basic_agent
* updated ReproAgent
* Update src/agentlab/analyze/agent_xray.py
* format
* new journal entry
* immutable as default kwarg
* removing __add__ and __radd__
* added deprecation warning
* updating tests
* version bump
* Updating generic_agent to fit use BGym's goal_object (#83)
* updating generic agent to goal_object
* fixing image markdown display
* updating tests
* fixing intruction BaseMessage
* added merge text in discussion
* added merge to discussion class
* added tests
* Minor revert (#86)
* minor revert
* revert tests too
* Add tabs (#84)
* add tabs
* make sure it's not computed if not visible
* Fix reproduce study (#87)
* add tabs
* this workaround is worst
* bug fix
* fix reproduce study
* make sure it's not computed if not visible
* upgrading gradio dependency (#88)
* bgym update (#90)
* Workarena TMLR experiments (#89)
* new entry
* adding llm configs
* new journal entries
* handling sequntial in VWA (#91)
* handling sequntial in VWA
* enable comments
* format
---------
Co-authored-by: ThibaultLSDC <[email protected]>
* Tmlr workarena (#92)
* adding llm configs
* new L1 entries
* tmp
* reformat
* adding assistantbench to reproducibility_util.py
* gitignore (#97)
* Vision fix (#105)
* changing content name
* Update src/agentlab/llm/llm_utils.py
---------
Co-authored-by: Maxime Gasse <[email protected]>
* L2 tmlr (#93)
* adding llm configs
* L2 entries
* claude L3
* claude vision support
* miniwob results
* 405b L1 entry
* Replacing Dask with Ray (#100)
* dask-dependencies
* minor
* replace with ray
* adjust tests and move a few things
* markdown report
* automatic relaunch
* add dependencies
* reformat
* fix unit-test
* catch timeout
* fixing bugs and making things work
* adress comments and black format
* new dependencies viewer
* Update benchmark to use visualwebarena instead of webarena
* Fix import and uncomment code in get_ray_url.py
* Add ignore_dependencies option to Study and _agents_on_benchmark functions
* Update load_most_recent method to include contains parameter
* Update load_most_recent method to accept contains parameter and add warning for ignored dependencies in _agents_on_benchmark
* Refactor backend preparation in Study class and improve logging for ignored dependencies
* finallly some results with claude on webarena
* Add warnings for Windows timeouts and clarify parallel backend options; update get_results method to conditionally save outputs
* black
* ensure timeout is int (For the 3rd time?)
* Refactor timeout handling in context manager; update test to reduce avg_step_timeout and rename test function
* black
* Change parallel backend from "joblib" to "ray" in run_experiments function
* Update src/agentlab/experiments/study.py
Co-authored-by: Maxime Gasse <[email protected]>
* Update src/agentlab/analyze/inspect_results.py
Co-authored-by: Maxime Gasse <[email protected]>
* Refactor logging initialization and update layout configurations in dependency graph plotting; adjust node size and font size for better visualization
---------
Co-authored-by: Maxime Gasse <[email protected]>
* switching to 2 for loops in _agents_on_benchmark (#107)
* yet another way to kill timedout jobs (#108)
* request is done once and then reused
* switched to caching original function bc it doesnt break to tests
* added a catch for some openrouter under-the-hood error
---------
Co-authored-by: Maxime Gasse <[email protected]>
Co-authored-by: Xing Han Lu <[email protected]>
Co-authored-by: Alexandre Lacoste <[email protected]>
1 parent feda734 commit 7a5b91e

3 files changed: +16 -1 lines changed

.github/workflows/unit_tests.yml

Lines changed: 4 additions & 1 deletion
@@ -38,6 +38,9 @@ jobs:
       - name: Install Playwright
         run: playwright install chromium --with-deps

+      - name: Download WebArena / VisualWebArena ressource files
+        run: python -c 'import nltk; nltk.download("punkt_tab")'
+
       - name: Fetch MiniWob
         uses: actions/checkout@v4
         with:
@@ -58,4 +61,4 @@ jobs:
       - name: Run AgentLab Unit Tests
         env:
           MINIWOB_URL: "http://localhost:8080/miniwob/"
-        run: pytest -n 5 --durations=10 -m 'not pricy' -v tests/
+        run: pytest -n 5 --durations=10 -m 'not pricy' -v tests/

src/agentlab/llm/chat_api.py

Lines changed: 10 additions & 0 deletions
@@ -208,6 +208,10 @@ def handle_error(error, itr, min_retry_wait_time, max_retry):
     return error_type


+class OpenRouterError(openai.OpenAIError):
+    pass
+
+
 class ChatModel(AbstractChatModel):
     def __init__(
         self,
@@ -274,6 +278,12 @@ def __call__(self, messages: list[dict]) -> dict:
                     temperature=self.temperature,
                     max_tokens=self.max_tokens,
                 )
+
+                if completion.usage is None:
+                    raise OpenRouterError(
+                        "The completion object does not contain usage information. This is likely a bug in the OpenRouter API."
+                    )
+
                 self.success = True
                 break
             except openai.OpenAIError as e:
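
The second hunk raises `OpenRouterError` inside the `try` block. Because the new class subclasses `openai.OpenAIError`, the existing `except openai.OpenAIError` clause catches it like any other API failure, so a completion without usage data presumably goes through the same retry path. A minimal, self-contained sketch of that pattern follows. It assumes a simplified retry loop: `OpenAIError` is a local stand-in for `openai.OpenAIError` so the example runs without the SDK, and `call_with_retry` and `make_flaky_request` are hypothetical names that do not appear in the commit.

```python
from types import SimpleNamespace


class OpenAIError(Exception):
    """Local stand-in for openai.OpenAIError (assumption, keeps the sketch SDK-free)."""


class OpenRouterError(OpenAIError):
    """Raised when a completion comes back without usage information."""


def call_with_retry(make_request, max_retry=3):
    """Call make_request until a completion with usage data arrives.

    Hypothetical helper; assumes max_retry >= 1.
    """
    last_error = None
    for _ in range(max_retry):
        try:
            completion = make_request()
            if completion.usage is None:
                # Subclassing the base error routes this case into the same
                # except branch as any other API failure.
                raise OpenRouterError("completion has no usage information")
            return completion
        except OpenAIError as e:
            last_error = e  # a real loop would also wait before retrying
    raise last_error


def make_flaky_request():
    """Hypothetical request whose first two responses lack usage data."""
    responses = iter(
        [
            SimpleNamespace(usage=None),
            SimpleNamespace(usage=None),
            SimpleNamespace(usage={"prompt_tokens": 12, "completion_tokens": 5}),
        ]
    )
    return lambda: next(responses)


completion = call_with_retry(make_flaky_request())
print(completion.usage)
```

If every attempt lacks usage data, the last `OpenRouterError` propagates to the caller, so the failure is surfaced rather than silently producing a completion with no usage information.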

src/agentlab/llm/tracking.py

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,4 @@
+from functools import cache
 import os
 import threading
 from contextlib import contextmanager
@@ -61,6 +62,7 @@ def wrapper(self, obs):
     return wrapper


+@cache
 def get_pricing_openrouter():
     api_key = os.getenv("OPENROUTER_API_KEY")
     assert api_key, "OpenRouter API key is required"
