feat: browser-use tool migration [0.9.0]#352
shanyu-strix wants to merge 40 commits into usestrix:main
Greptile Summary
This PR replaces the Playwright-based browser tool with browser-use, adding a natural-language agent mode ( Key findings:
Confidence Score: 3/5
Important Files Changed
5c1a320 to 4ab0c8c
@greptileai please review
Enhances browser command robustness by gracefully handling cases where the DOM state is not yet available, returning an informative error instead of asserting. Updates the container healthcheck to directly probe Chromium's internal CDP port, simplifying the check by removing previous indirection.
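The "informative error instead of asserting" behaviour described in this commit message can be illustrated with a minimal sketch. The handler name `click_element`, the `dom_state` mapping, and the shape of the returned dict are all hypothetical and are not taken from the PR.

```python
from typing import Any, Optional


def click_element(dom_state: Optional[dict[int, Any]], index: int) -> dict[str, Any]:
    """Hypothetical command handler: report a missing DOM state instead of asserting."""
    if dom_state is None:
        # An `assert dom_state is not None` here would raise and abort the command;
        # returning an error lets the caller decide to wait and retry.
        return {"ok": False, "error": "DOM state not available yet; retry once the page has loaded."}
    if index not in dom_state:
        return {"ok": False, "error": f"No element with index {index}."}
    return {"ok": True, "element": dom_state[index]}
```

A caller can then branch on `result["ok"]` and surface `result["error"]` to the agent rather than handling an `AssertionError`.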
@greptileai fixed, please check
@greptileai review
@greptileai check please
@greptileai please check
pyproject.toml
Outdated
"strix/tools/browser/**/*.py" = [
    "ARG002",   # Unused method argument (interface methods may not use all args)
    "PLR0911",  # Too many return statements (dispatchers and complex browser logic)
    "PLR0912",  # Too many branches (dispatch functions)
    "PLR0915",  # Too many statements (complex browser task handling)
]
"strix/interface/tool_components/browser_renderer.py" = [
    "PLR0911",  # Too many return statements (action dispatcher)
    "PLR0912",  # Too many branches
    "PLR0915",  # Too many statements
]
"strix/telemetry/tracer.py" = [
    "PLR0912",  # Too many branches (save_run_data is legitimately complex)
    "PLR0915",  # Too many statements
]
pyproject.toml
Outdated
    "browser_use",
    "browser_use.*",
    "cdp_use",
    "cdp_use.*",
    "aiohttp",
    "aiohttp.*",
Why are these duplicated? Keep only the entries with *.
strix/tools/__init__.py
Outdated
from .python import *  # noqa: F403
from .terminal import *  # noqa: F403

if not DISABLE_BROWSER:
strix/tools/executor.py
Outdated
# Propagate agent_id so tools can scope per-agent resources (e.g. browser instances).
# NOTE: This is needed for browser_use tools to work correctly.
agent_id = getattr(agent_state, "agent_id", None) if agent_state else None
if agent_id:
    set_current_agent_id(agent_id)
Isn't this already done? I think we already provide the "state" object to all tools, which has the agent ID.
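For context on the question above, one common way to implement a setter like `set_current_agent_id` is a `contextvars.ContextVar`, which lets code deep in a call stack read the ID without it being passed as an argument. The PR's actual implementation is not shown in this thread, so the sketch below is an assumption: `_current_agent_id`, `get_current_agent_id`, `run_tool`, and `demo` are illustrative names.

```python
import asyncio
from contextvars import ContextVar
from typing import Optional

# Hypothetical storage for the current agent ID; the PR's real implementation is not shown here.
_current_agent_id: ContextVar[Optional[str]] = ContextVar("current_agent_id", default=None)


def set_current_agent_id(agent_id: str) -> None:
    """Bind agent_id to the current execution context."""
    _current_agent_id.set(agent_id)


def get_current_agent_id() -> Optional[str]:
    """Return the agent ID bound to this context, or None if unset."""
    return _current_agent_id.get()


async def run_tool(agent_id: str) -> Optional[str]:
    """Stand-in for a tool call: set the ID, yield to the loop, read it back."""
    set_current_agent_id(agent_id)
    await asyncio.sleep(0)  # let the other task run in between
    return get_current_agent_id()


async def demo() -> list:
    # gather() wraps each coroutine in a Task, and each Task gets its own context copy,
    # so the two agents do not overwrite each other's ID.
    return await asyncio.gather(run_tool("agent-a"), run_tool("agent-b"))
```

Whether this is preferable to reading the ID from the existing state object depends on whether every code path that needs the ID actually receives that object; the sketch only shows what the context-variable route would involve.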
def llm_supports_vision() -> bool:
    try:
        import litellm

        from strix.config.config import resolve_llm_config

        model, _, _ = resolve_llm_config()
        return bool(model and litellm.supports_vision(model))
    except Exception:  # noqa: BLE001
        return False
Not a good idea to make a function for this...
def _parse_usage(response: Any) -> ChatInvokeUsage | None:
    """Extract token usage from a litellm response."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return None

    prompt_tokens = getattr(usage, "prompt_tokens", 0) or 0
    completion_tokens = getattr(usage, "completion_tokens", 0) or 0

    prompt_cached = getattr(usage, "cache_read_input_tokens", None)
    cache_creation = getattr(usage, "cache_creation_input_tokens", None)

    if prompt_cached is None:
        details = getattr(usage, "prompt_tokens_details", None)
        if details:
            prompt_cached = getattr(details, "cached_tokens", None)

    return ChatInvokeUsage(
        prompt_tokens=prompt_tokens,
        prompt_cached_tokens=int(prompt_cached) if prompt_cached is not None else None,
        prompt_cache_creation_tokens=int(cache_creation)
        if cache_creation is not None
        else None,
        prompt_image_tokens=None,
        completion_tokens=completion_tokens,
        total_tokens=prompt_tokens + completion_tokens,
    )
Do we need to collect usage data when browser-use runs internally?
I think we can just get rid of it.
Don't we want to be tracking the token spend from the browser agent?
containers/browser-server.py
Outdated
def _strip_token(url: str) -> str:
    return re.sub(r"[?&]token=[^&]*", "", url).replace("?&", "?").rstrip("?")
Why are we parsing it manually? Why not do it like the tool server and use uvicorn/FastAPI for the whole file?
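Independently of the FastAPI question, the manual regex has an edge case: when `token` is the first of several parameters, `?token=abc&x=1` becomes `&x=1`, because the leading `?` is removed along with the token. A minimal sketch of a parser-based alternative using only the standard library is shown below. The name `strip_token` and its `param` argument are illustrative and not part of the PR.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_token(url: str, param: str = "token") -> str:
    """Return url with every `param` query parameter removed."""
    parts = urlsplit(url)
    # keep_blank_values=True so parameters such as "debug=" survive the round trip.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    # urlencode() re-encodes the remaining pairs, so their percent-encoding may be normalised.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With this version, `strip_token("ws://h:9222/devtools?token=abc&x=1")` gives `ws://h:9222/devtools?x=1`, and a URL whose only parameter is the token comes back with no trailing `?`.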
strix/runtime/docker_runtime.py
Outdated
        f"{CONTAINER_CAIDO_PORT}/tcp": self._caido_port,
    },
    cap_add=["NET_ADMIN", "NET_RAW"],
    shm_size="256m",
3fb0d9e to c52b199
… into feat/browseruse
feat: browser-use tool migration
TL;DR
Ripped out the old Playwright browser tool and replaced it with browser-use. Agents can now browse the web with natural language ("go to example.com and find the login form") or precise commands (click element #5, set a cookie, take a screenshot). Works both inside the sandbox container and against your local Chrome.
What's new
Two ways to use the browser:
- action="run" + a natural language task → browser-use Agent handles it end-to-end
- action="click", "screenshot", "cookies", etc. → direct CDP commands, no LLM needed

Two modes:
- use_local=True: attaches to your system Chrome, keeps your login sessions and cookies

Self-healing sessions:
TODO
Notable decisions
socat for CDP: Chromium only binds its debug port to 127.0.0.1, even if you tell it not to. Docker port mapping needs 0.0.0.0. socat forwards between the two. Standard workaround.
WebSocket URL rewriting: Chromium reports its WebSocket URL with the container-internal address. We rewrite it to the Docker-mapped host:port so connections from outside the container actually work.
LiteLLM adapter: browser-use expects a specific chat model interface. Rather than hardcoding a provider, ChatLiteLLM routes through litellm so any model string (anthropic/claude-sonnet-4-20250514, openrouter/google/gemini-2.0-flash-001, etc.) just works.
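The WebSocket URL rewriting described above could look roughly like the following. This is a sketch under assumptions, not the PR's code: `rewrite_ws_url` is a hypothetical helper, and the example URL shape (`ws://127.0.0.1:9222/devtools/browser/<id>`) is assumed to match what Chromium reports for its debugger endpoint.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_ws_url(ws_url: str, host: str, port: int) -> str:
    """Swap the host:port of a CDP WebSocket URL, keeping scheme, path, and query."""
    parts = urlsplit(ws_url)
    # Only the network location changes; the browser ID in the path must stay intact.
    # Note: an IPv6 literal would need to be passed already bracketed, e.g. "[::1]".
    return urlunsplit(parts._replace(netloc=f"{host}:{port}"))
```

For example, `rewrite_ws_url("ws://127.0.0.1:9222/devtools/browser/abc123", "localhost", 49153)` yields `ws://localhost:49153/devtools/browser/abc123`, where `49153` stands in for whatever host port Docker mapped.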