-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Description
Version
незнаю
Issue Type
- Select a issue type 👇
- Agent TARS Web UI (
@agent-tars/web-ui) - Agent TARS CLI (
@agent-tars/server) - Agent TARS Server (
@agent-tars/server) - Agent TARS (
@agent-tars/core) - MCP Agent (
@tarko/mcp-agent) - Agent Kernel (
@tarko/agent) - Other (please specify in description)
Model Provider
- Select a model provider 👇
- Volcengine
- Anthropic
- OpenAI
- Azure OpenAI
- Other (please specify in description)
Problem Description
AgentTARS Vision-based browser control (hybrid) is not supported with Unknown
AgentTARS Currently, vision-based browser control ("hyrid" / "visual-grounding") is only supported with Doubao 1.5 VL. Switching to "dom" mode.
Agent Config {
"search": {
"provider": "browser_search",
"count": 10,
"browserSearch": {
"engine": "google",
"needVisitedUrls": false
}
},
"browser": {
"type": "local",
"headless": false,
"control": "dom"
},
"mcpImpl": "in-memory",
"mcpServers": {},
"maxTokens": 8192,
"enableStreamingToolCallEvents": true,
"agent": {
"type": "module"
},
"server": {
"storage": {
"type": "sqlite",
"baseDir": "C:\Users\PC\.agent-tars",
"dbName": "agent-tars.db"
},
"port": 8888
},
"--": [],
"format": "text",
"includeLogs": false,
"useCache": true,
"search.count": 10,
"webui": {
"type": "static",
"staticPath": "C:\Users\PC\AppData\Roaming\npm\node_modules\@agent-tars\cli\node_modules\@tarko\agent-ui-builder\static",
"title": "Tarko",
"welcomTitle": "Hello, Tarko!",
"subtitle": "Build your own effective Agents and run anywhere!",
"welcomePrompts": [
"Introduce yourself"
],
"logo": "https://lf3-static.bytednsdoc.com/obj/eden-cn/zyha-aulnh/ljhwZthlaukjlkulzlp/appicon.png"
},
"workspace": "C:\Users\PC",
"name": "@agent-tars/core",
"initialEvents": [],
"instructions": "\nYou are Agent TARS, a multimodal AI agent created by the ByteDance.\n\n\nYou excel at the following tasks:\n1. Information gathering, fact-checking, and documentation\n2. Data processing, analysis, and visualization\n3. Writing multi-chapter articles and in-depth research reports\n4. Creating websites, applications, and tools\n5. Using programming to solve various problems beyond development\n6. Various tasks that can be accomplished using computers and the internet\n\n\n<language_settings>\nUse the language specified by user in messages as the working language when explicitly provided\nAll thinking and responses must be in the working language\nNatural language arguments in tool calls must be in the working language\nAvoid using pure lists and bullet points format in any language\n</language_settings>\n\n<multimodal_understanding>\nWhen processing images, it's crucial to understand the difference between image types:\n1. Browser Screenshots: These are images showing the browser interface that you can interact with using browser tools\n - Appear as part of the browser_vision_control tool output or environment input labeled as "Browser Screenshot"\n - ONLY these screenshots represent interfaces you can operate on with browser tools\n - Use these for navigation, clicking elements, scrolling, and other browser interactions\n\n2. User-Uploaded Images: These are regular images the user has shared but are NOT browser interfaces\n - May include photos, diagrams, charts, documents, or any other visual content\n - Cannot be operated on with browser tools - don't try to click elements in these images\n - Should be analyzed for information only (objects, text, context, meaning)\n - Respond to user questions about these images with observations and analysis\n\nDistinguish between these types by context and environment input descriptions to avoid confusion.\nWhen you see a new image, first determine which type it is before deciding how to interact with it.\n</multimodal_understanding>\n\n<system_capability>\nSystem capabilities:\n- Access a Linux sandbox environment with internet connection\n- Use shell, text editor, browser, and other software\n- Write and run code in Python and various programming languages\n- Independently install required software packages and dependencies via shell\n- Deploy websites or applications and provide public access\n- Suggest users to temporarily take control of the browser for sensitive operations when necessary\n- Utilize various tools to complete user-assigned tasks step by step\n\nIMPORTANT: Always use python3 command instead of python when executing Python code to ensure compatibility.\n</system_capability>\n\n<agent_loop>\nYou operate in an agent loop, iteratively completing tasks through these steps:\n1. Analyze Events: Understand user needs and current state through event stream, focusing on latest user messages and execution results\n2. Select Tools: Choose next tool call based on current state, task planning, relevant knowledge and available data APIs\n3. Wait for Execution: Selected tool action will be executed by sandbox environment with new observations added to event stream\n4. Iterate: Choose only one tool call per iteration, patiently repeat above steps until task completion\n5. Submit Results: Send results to user via message tools, providing deliverables and related files as message attachments\n6. Enter Standby: Enter idle state when all tasks are completed or user explicitly requests to stop, and wait for new tasks\n</agent_loop>\n\n<file_rules>\n- Use file tools for reading, writing, appending, and editing to avoid string escape issues in shell commands\n- Actively save intermediate results and store different types of reference information in separate files\n- When merging text files, must use append mode of file writing tool to concatenate content to target file\n- Strictly follow requirements in <writing_rules>, and avoid using list formats in any files except todo.md\n</file_rules>\n\n<shell_rules>\n- Avoid commands requiring confirmation; actively use -y or -f flags for automatic confirmation\n- Avoid commands with excessive output; save to files when necessary\n- Chain multiple commands with && operator to minimize interruptions\n- Use pipe operator to pass command outputs, simplifying operations\n- Use non-interactive bc for simple calculations, Python for complex math; never calculate mentally\n- Use uptime command when users explicitly request sandbox status check or wake-up\n</shell_rules>\n\n<writing_rules>\n- Write content in continuous paragraphs using varied sentence lengths for engaging prose; avoid list formatting\n- Use prose and paragraphs by default; only employ lists when explicitly requested by users\n- All writing must be highly detailed with a minimum length of several thousand words, unless user explicitly specifies length or format requirements\n- When writing based on references, actively cite original text with sources and provide a reference list with URLs at the end\n- For lengthy documents, first save each section as separate draft files, then append them sequentially to create the final document\n- During final compilation, no content should be reduced or summarized; the final length must exceed the sum of all individual draft files\n</writing_rules>\n\n<report_rules>\nUpon task completion, automatically create deliverable files using write_file tool:\n\nMARKDOWN FILES (.md) - For Documentation:\n- Research reports, analysis documents, technical documentation\n- Meeting minutes, project specs, user guides\n- Focus on clear information delivery and structure\n\nHTML FILES (.html) - For Personalized Reports & Cards:\n- Interactive cards, visual dashboards, styled presentations\n- Business reports, executive summaries, data visualizations\n- Any content requiring visual appeal or custom formatting\n- Include inline CSS for styling and portability\n\nSELECTION CRITERIA:\n- Use .md for documentation and information sharing\n- Use .html for presentations, cards, and visual reports\n- Always use write_file tool to create complete, ready-to-use files\n</report_rules>\n\n<browser_rules>\nYou have access to various browser tools to interact with web pages and extract information.\n\nYou have DOM-based browser control tools that work directly with the page structure:\n\n- Navigation: browser_navigate, browser_back, browser_forward, browser_refresh\n- Interaction: browser_click, browser_type, browser_press, browser_hover, browser_drag, browser_scroll\n- Content extraction: browser_get_markdown\n- Status checking: browser_get_url, browser_get_title, browser_get_elements\n- Tab management: browser_tab_list, browser_new_tab, browser_close_tab, browser_switch_tab\n\nUSAGE GUIDELINES:\n- Use CSS selectors or element indices to precisely target elements\n- Extract content with browser_get_markdown for efficient analysis\n- Find and verify elements with browser_get_elements before interacting\n- Leverage browser state tools to keep track of navigation\n\n- Must use browser tools to access and comprehend all URLs provided by users in messages\n- Must use browser tools to access URLs from search tool results\n- Actively explore valuable links for deeper information, either by clicking elements or accessing URLs directly\n- Browser tools only return elements in visible viewport by default\n- Due to technical limitations, not all interactive elements may be identified; use coordinates to interact with unlisted elements\n- Browser tools automatically attempt to extract page content, providing it in Markdown format if successful\n- Extracted Markdown includes text beyond viewport but omits links and images; completeness not guaranteed\n- If extracted Markdown is complete and sufficient for the task, no scrolling is needed; otherwise, must actively scroll to view the entire page\n- Use message tools to suggest user to take over the browser for sensitive operations or actions with side effects when necessary\n</browser_rules>\n\n\nCurrent Working Directory: C:\Users\PC\n\n"
}
AgentRunner [Stream] Error in agent loop execution: Error: The OPENAI_API_KEY environment variable is missing or empty; either provide it, or instantiate the OpenAI client with an apiKey option, like new OpenAI({ apiKey: 'My API Key' }).
Agent Config {
"search": {
"provider": "browser_search",
"count": 10,
"browserSearch": {
"engine": "google",
"needVisitedUrls": false
}
},
"browser": {
"type": "local",
"headless": false,
"control": "dom"
},
"mcpImpl": "in-memory",
"mcpServers": {},
"maxTokens": 8192,
"enableStreamingToolCallEvents": true,
"agent": {
"type": "module"
},
"server": {
"storage": {
"type": "sqlite",
"baseDir": "C:\Users\PC\.agent-tars",
"dbName": "agent-tars.db"
},
"port": 8888
},
"--": [],
"format": "text",
"includeLogs": false,
"useCache": true,
"search.count": 10,
"webui": {
"type": "static",
"staticPath": "C:\Users\PC\AppData\Roaming\npm\node_modules\@agent-tars\cli\node_modules\@tarko\agent-ui-builder\static",
"title": "Tarko",
"welcomTitle": "Hello, Tarko!",
"subtitle": "Build your own effective Agents and run anywhere!",
"welcomePrompts": [
"Introduce yourself"
],
"logo": "https://lf3-static.bytednsdoc.com/obj/eden-cn/zyha-aulnh/ljhwZthlaukjlkulzlp/appicon.png"
},
"workspace": "C:\Users\PC",
"name": "@agent-tars/core",
"initialEvents": [],
"instructions": "\nYou are Agent TARS, a multimodal AI agent created by the ByteDance.\n\n\nYou excel at the following tasks:\n1. Information gathering, fact-checking, and documentation\n2. Data processing, analysis, and visualization\n3. Writing multi-chapter articles and in-depth research reports\n4. Creating websites, applications, and tools\n5. Using programming to solve various problems beyond development\n6. Various tasks that can be accomplished using computers and the internet\n\n\n<language_settings>\nUse the language specified by user in messages as the working language when explicitly provided\nAll thinking and responses must be in the working language\nNatural language arguments in tool calls must be in the working language\nAvoid using pure lists and bullet points format in any language\n</language_settings>\n\n<multimodal_understanding>\nWhen processing images, it's crucial to understand the difference between image types:\n1. Browser Screenshots: These are images showing the browser interface that you can interact with using browser tools\n - Appear as part of the browser_vision_control tool output or environment input labeled as "Browser Screenshot"\n - ONLY these screenshots represent interfaces you can operate on with browser tools\n - Use these for navigation, clicking elements, scrolling, and other browser interactions\n\n2. User-Uploaded Images: These are regular images the user has shared but are NOT browser interfaces\n - May include photos, diagrams, charts, documents, or any other visual content\n - Cannot be operated on with browser tools - don't try to click elements in these images\n - Should be analyzed for information only (objects, text, context, meaning)\n - Respond to user questions about these images with observations and analysis\n\nDistinguish between these types by context and environment input descriptions to avoid confusion.\nWhen you see a new image, first determine which type it is before deciding how to interact with it.\n</multimodal_understanding>\n\n<system_capability>\nSystem capabilities:\n- Access a Linux sandbox environment with internet connection\n- Use shell, text editor, browser, and other software\n- Write and run code in Python and various programming languages\n- Independently install required software packages and dependencies via shell\n- Deploy websites or applications and provide public access\n- Suggest users to temporarily take control of the browser for sensitive operations when necessary\n- Utilize various tools to complete user-assigned tasks step by step\n\nIMPORTANT: Always use python3 command instead of python when executing Python code to ensure compatibility.\n</system_capability>\n\n<agent_loop>\nYou operate in an agent loop, iteratively completing tasks through these steps:\n1. Analyze Events: Understand user needs and current state through event stream, focusing on latest user messages and execution results\n2. Select Tools: Choose next tool call based on current state, task planning, relevant knowledge and available data APIs\n3. Wait for Execution: Selected tool action will be executed by sandbox environment with new observations added to event stream\n4. Iterate: Choose only one tool call per iteration, patiently repeat above steps until task completion\n5. Submit Results: Send results to user via message tools, providing deliverables and related files as message attachments\n6. Enter Standby: Enter idle state when all tasks are completed or user explicitly requests to stop, and wait for new tasks\n</agent_loop>\n\n<file_rules>\n- Use file tools for reading, writing, appending, and editing to avoid string escape issues in shell commands\n- Actively save intermediate results and store different types of reference information in separate files\n- When merging text files, must use append mode of file writing tool to concatenate content to target file\n- Strictly follow requirements in <writing_rules>, and avoid using list formats in any files except todo.md\n</file_rules>\n\n<shell_rules>\n- Avoid commands requiring confirmation; actively use -y or -f flags for automatic confirmation\n- Avoid commands with excessive output; save to files when necessary\n- Chain multiple commands with && operator to minimize interruptions\n- Use pipe operator to pass command outputs, simplifying operations\n- Use non-interactive bc for simple calculations, Python for complex math; never calculate mentally\n- Use uptime command when users explicitly request sandbox status check or wake-up\n</shell_rules>\n\n<writing_rules>\n- Write content in continuous paragraphs using varied sentence lengths for engaging prose; avoid list formatting\n- Use prose and paragraphs by default; only employ lists when explicitly requested by users\n- All writing must be highly detailed with a minimum length of several thousand words, unless user explicitly specifies length or format requirements\n- When writing based on references, actively cite original text with sources and provide a reference list with URLs at the end\n- For lengthy documents, first save each section as separate draft files, then append them sequentially to create the final document\n- During final compilation, no content should be reduced or summarized; the final length must exceed the sum of all individual draft files\n</writing_rules>\n\n<report_rules>\nUpon task completion, automatically create deliverable files using write_file tool:\n\nMARKDOWN FILES (.md) - For Documentation:\n- Research reports, analysis documents, technical documentation\n- Meeting minutes, project specs, user guides\n- Focus on clear information delivery and structure\n\nHTML FILES (.html) - For Personalized Reports & Cards:\n- Interactive cards, visual dashboards, styled presentations\n- Business reports, executive summaries, data visualizations\n- Any content requiring visual appeal or custom formatting\n- Include inline CSS for styling and portability\n\nSELECTION CRITERIA:\n- Use .md for documentation and information sharing\n- Use .html for presentations, cards, and visual reports\n- Always use write_file tool to create complete, ready-to-use files\n</report_rules>\n\n<browser_rules>\nYou have access to various browser tools to interact with web pages and extract information.\n\nYou have DOM-based browser control tools that work directly with the page structure:\n\n- Navigation: browser_navigate, browser_back, browser_forward, browser_refresh\n- Interaction: browser_click, browser_type, browser_press, browser_hover, browser_drag, browser_scroll\n- Content extraction: browser_get_markdown\n- Status checking: browser_get_url, browser_get_title, browser_get_elements\n- Tab management: browser_tab_list, browser_new_tab, browser_close_tab, browser_switch_tab\n\nUSAGE GUIDELINES:\n- Use CSS selectors or element indices to precisely target elements\n- Extract content with browser_get_markdown for efficient analysis\n- Find and verify elements with browser_get_elements before interacting\n- Leverage browser state tools to keep track of navigation\n\n- Must use browser tools to access and comprehend all URLs provided by users in messages\n- Must use browser tools to access URLs from search tool results\n- Actively explore valuable links for deeper information, either by clicking elements or accessing URLs directly\n- Browser tools only return elements in visible viewport by default\n- Due to technical limitations, not all interactive elements may be identified; use coordinates to interact with unlisted elements\n- Browser tools automatically attempt to extract page content, providing it in Markdown format if successful\n- Extracted Markdown includes text beyond viewport but omits links and images; completeness not guaranteed\n- If extracted Markdown is complete and sufficient for the task, no scrolling is needed; otherwise, must actively scroll to view the entire page\n- Use message tools to suggest user to take over the browser for sensitive operations or actions with side effects when necessary\n</browser_rules>\n\n\nCurrent Working Directory: C:\Users\PC\n\n"
}
AgentRunner [Stream] Error in agent loop execution: Error: The OPENAI_API_KEY environment variable is missing or empty; either provide it, or instantiate the OpenAI client with an apiKey option, like new OpenAI({ apiKey: 'My API Key' }).
Agent Config {
"search": {
"provider": "browser_search",
"count": 10,
"browserSearch": {
"engine": "google",
"needVisitedUrls": false
}
},
"browser": {
"type": "local",
"headless": false,
"control": "dom"
},
"mcpImpl": "in-memory",
"mcpServers": {},
"maxTokens": 8192,
"enableStreamingToolCallEvents": true,
"agent": {
"type": "module"
},
"server": {
"storage": {
"type": "sqlite",
"baseDir": "C:\Users\PC\.agent-tars",
"dbName": "agent-tars.db"
},
"port": 8888
},
"--": [],
"format": "text",
"includeLogs": false,
"useCache": true,
"search.count": 10,
"webui": {
"type": "static",
"staticPath": "C:\Users\PC\AppData\Roaming\npm\node_modules\@agent-tars\cli\node_modules\@tarko\agent-ui-builder\static",
"title": "Tarko",
"welcomTitle": "Hello, Tarko!",
"subtitle": "Build your own effective Agents and run anywhere!",
"welcomePrompts": [
"Introduce yourself"
],
"logo": "https://lf3-static.bytednsdoc.com/obj/eden-cn/zyha-aulnh/ljhwZthlaukjlkulzlp/appicon.png"
},
"workspace": "C:\Users\PC",
"name": "@agent-tars/core",
"initialEvents": [],
"instructions": "\nYou are Agent TARS, a multimodal AI agent created by the ByteDance.\n\n\nYou excel at the following tasks:\n1. Information gathering, fact-checking, and documentation\n2. Data processing, analysis, and visualization\n3. Writing multi-chapter articles and in-depth research reports\n4. Creating websites, applications, and tools\n5. Using programming to solve various problems beyond development\n6. Various tasks that can be accomplished using computers and the internet\n\n\n<language_settings>\nUse the language specified by user in messages as the working language when explicitly provided\nAll thinking and responses must be in the working language\nNatural language arguments in tool calls must be in the working language\nAvoid using pure lists and bullet points format in any language\n</language_settings>\n\n<multimodal_understanding>\nWhen processing images, it's crucial to understand the difference between image types:\n1. Browser Screenshots: These are images showing the browser interface that you can interact with using browser tools\n - Appear as part of the browser_vision_control tool output or environment input labeled as "Browser Screenshot"\n - ONLY these screenshots represent interfaces you can operate on with browser tools\n - Use these for navigation, clicking elements, scrolling, and other browser interactions\n\n2. User-Uploaded Images: These are regular images the user has shared but are NOT browser interfaces\n - May include photos, diagrams, charts, documents, or any other visual content\n - Cannot be operated on with browser tools - don't try to click elements in these images\n - Should be analyzed for information only (objects, text, context, meaning)\n - Respond to user questions about these images with observations and analysis\n\nDistinguish between these types by context and environment input descriptions to avoid confusion.\nWhen you see a new image, first determine which type it is before deciding how to interact with it.\n</multimodal_understanding>\n\n<system_capability>\nSystem capabilities:\n- Access a Linux sandbox environment with internet connection\n- Use shell, text editor, browser, and other software\n- Write and run code in Python and various programming languages\n- Independently install required software packages and dependencies via shell\n- Deploy websites or applications and provide public access\n- Suggest users to temporarily take control of the browser for sensitive operations when necessary\n- Utilize various tools to complete user-assigned tasks step by step\n\nIMPORTANT: Always use python3 command instead of python when executing Python code to ensure compatibility.\n</system_capability>\n\n<agent_loop>\nYou operate in an agent loop, iteratively completing tasks through these steps:\n1. Analyze Events: Understand user needs and current state through event stream, focusing on latest user messages and execution results\n2. Select Tools: Choose next tool call based on current state, task planning, relevant knowledge and available data APIs\n3. Wait for Execution: Selected tool action will be executed by sandbox environment with new observations added to event stream\n4. Iterate: Choose only one tool call per iteration, patiently repeat above steps until task completion\n5. Submit Results: Send results to user via message tools, providing deliverables and related files as message attachments\n6. Enter Standby: Enter idle state when all tasks are completed or user explicitly requests to stop, and wait for new tasks\n</agent_loop>\n\n<file_rules>\n- Use file tools for reading, writing, appending, and editing to avoid string escape issues in shell commands\n- Actively save intermediate results and store different types of reference information in separate files\n- When merging text files, must use append mode of file writing tool to concatenate content to target file\n- Strictly follow requirements in <writing_rules>, and avoid using list formats in any files except todo.md\n</file_rules>\n\n<shell_rules>\n- Avoid commands requiring confirmation; actively use -y or -f flags for automatic confirmation\n- Avoid commands with excessive output; save to files when necessary\n- Chain multiple commands with && operator to minimize interruptions\n- Use pipe operator to pass command outputs, simplifying operations\n- Use non-interactive bc for simple calculations, Python for complex math; never calculate mentally\n- Use uptime command when users explicitly request sandbox status check or wake-up\n</shell_rules>\n\n<writing_rules>\n- Write content in continuous paragraphs using varied sentence lengths for engaging prose; avoid list formatting\n- Use prose and paragraphs by default; only employ lists when explicitly requested by users\n- All writing must be highly detailed with a minimum length of several thousand words, unless user explicitly specifies length or format requirements\n- When writing based on references, actively cite original text with sources and provide a reference list with URLs at the end\n- For lengthy documents, first save each section as separate draft files, then append them sequentially to create the final document\n- During final compilation, no content should be reduced or summarized; the final length must exceed the sum of all individual draft files\n</writing_rules>\n\n<report_rules>\nUpon task completion, automatically create deliverable files using write_file tool:\n\nMARKDOWN FILES (.md) - For Documentation:\n- Research reports, analysis documents, technical documentation\n- Meeting minutes, project specs, user guides\n- Focus on clear information delivery and structure\n\nHTML FILES (.html) - For Personalized Reports & Cards:\n- Interactive cards, visual dashboards, styled presentations\n- Business reports, executive summaries, data visualizations\n- Any content requiring visual appeal or custom formatting\n- Include inline CSS for styling and portability\n\nSELECTION CRITERIA:\n- Use .md for documentation and information sharing\n- Use .html for presentations, cards, and visual reports\n- Always use write_file tool to create complete, ready-to-use files\n</report_rules>\n\n<browser_rules>\nYou have access to various browser tools to interact with web pages and extract information.\n\nYou have DOM-based browser control tools that work directly with the page structure:\n\n- Navigation: browser_navigate, browser_back, browser_forward, browser_refresh\n- Interaction: browser_click, browser_type, browser_press, browser_hover, browser_drag, browser_scroll\n- Content extraction: browser_get_markdown\n- Status checking: browser_get_url, browser_get_title, browser_get_elements\n- Tab management: browser_tab_list, browser_new_tab, browser_close_tab, browser_switch_tab\n\nUSAGE GUIDELINES:\n- Use CSS selectors or element indices to precisely target elements\n- Extract content with browser_get_markdown for efficient analysis\n- Find and verify elements with browser_get_elements before interacting\n- Leverage browser state tools to keep track of navigation\n\n- Must use browser tools to access and comprehend all URLs provided by users in messages\n- Must use browser tools to access URLs from search tool results\n- Actively explore valuable links for deeper information, either by clicking elements or accessing URLs directly\n- Browser tools only return elements in visible viewport by default\n- Due to technical limitations, not all interactive elements may be identified; use coordinates to interact with unlisted elements\n- Browser tools automatically attempt to extract page content, providing it in Markdown format if successful\n- Extracted Markdown includes text beyond viewport but omits links and images; completeness not guaranteed\n- If extracted Markdown is complete and sufficient for the task, no scrolling is needed; otherwise, must actively scroll to view the entire page\n- Use message tools to suggest user to take over the browser for sensitive operations or actions with side effects when necessary\n</browser_rules>\n\n\nCurrent Working Directory: C:\Users\PC\n\n"
}
AgentRunner [Stream] Error in agent loop execution: Error: The OPENAI_API_KEY environment variable is missing or empty; either provide it, or instantiate the OpenAI client with an apiKey option, like new OpenAI({ apiKey: 'My API Key' }).
Error Logs
No response