25 commits
886e2e3
Update _config.yml
wupujun Mar 24, 2025
1d5b79d
Create 2025-3-23-week1-python101.md
wupujun Mar 24, 2025
51ee0c5
Update _config.yml
wupujun Mar 24, 2025
f93c933
Update _config.yml
wupujun Mar 24, 2025
f2156ca
Update _config.yml
wupujun Mar 24, 2025
7d4acbc
Update 2025-3-23-week1-python101.md
wupujun Mar 24, 2025
837df45
Create 2025-3-23-week1-highlevel.md
wupujun Mar 24, 2025
b5c897f
Update 2025-3-23-week1-highlevel.md
wupujun Mar 24, 2025
2ad7188
Update 2025-3-23-week1-python101.md
wupujun Mar 24, 2025
f10e43e
Create 2025-3-24-1st Run and log analysis.md
wupujun Mar 25, 2025
ef582b3
Update 2025-3-24-1st Run and log analysis.md
wupujun Mar 25, 2025
d0c226e
Update 2025-3-24-1st Run and log analysis.md
wupujun Mar 25, 2025
620643a
Update 2025-3-24-1st Run and log analysis.md
wupujun Mar 25, 2025
767e9dc
Update 2025-3-24-1st Run and log analysis.md
wupujun Mar 25, 2025
171199e
Create 2025-3-27-Agents Analysis.md
wupujun Mar 28, 2025
6cf5856
Update 2025-3-24-1st Run and log analysis.md
wupujun Mar 28, 2025
5e626e4
Update 2025-3-24-1st Run and log analysis.md
wupujun Mar 29, 2025
dff8cc9
Update 2025-3-24-1st Run and log analysis.md
wupujun Mar 29, 2025
eb20a7f
Delete _posts/2014-3-3-Hello-World.md
wupujun Mar 29, 2025
c7bb9d6
Create 2025-3-28-执行分析
wupujun Mar 29, 2025
ff333dd
Update 2025-3-28-执行分析
wupujun Mar 29, 2025
87e57dd
change file names
wupujun Mar 30, 2025
48d7b95
Merge pull request #1 from wupujun:rename
wupujun Mar 30, 2025
1812f85
format change
wupujun Mar 30, 2025
512b936
submit changes
wupujun Mar 30, 2025
8 changes: 4 additions & 4 deletions _config.yml
@@ -3,13 +3,13 @@
#

# Name of your site (displayed in the header)
-name: Your Name
+name: Learning from openmanus

# Short bio or description (displayed in the header)
-description: Web Developer from Somewhere
+description: Personal site to share learning experiences of openmanus

# URL of your avatar or profile pic (you could use your GitHub profile pic)
-avatar: https://raw.githubusercontent.com/barryclark/jekyll-now/master/images/jekyll-logo.png
+avatar: https://avatars.githubusercontent.com/u/9513757?v=4

#
# Flags below are optional
@@ -41,7 +41,7 @@ google_analytics:

# Your website URL (e.g. http://barryclark.github.io or http://www.barryclark.co)
# Used for Sitemap.xml and your RSS feed
-url:
+url: wupujun.github.io

# If you're hosting your site at a Project repository on GitHub pages
# (http://yourusername.github.io/repository-name)
10 changes: 0 additions & 10 deletions _posts/2014-3-3-Hello-World.md

This file was deleted.

76 changes: 76 additions & 0 deletions _posts/2025-3-23-Python 类设计.md
@@ -0,0 +1,76 @@
# Overall flow


main()

--> asyncio.run  # starts the event loop on the main thread

--> read the prompt, then await agent.run
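
The flow above can be sketched in a few lines. This is only an illustration, not the OpenManus entry point: `EchoAgent` is a hypothetical stand-in for the real agent, and its `run` method simply echoes the prompt.

```python
import asyncio


class EchoAgent:
    """Hypothetical stand-in for the real agent; it only echoes the prompt."""

    async def run(self, prompt: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop once, as real I/O would
        return f"handled: {prompt}"


async def main(prompt: str) -> str:
    agent = EchoAgent()
    return await agent.run(prompt)


if __name__ == "__main__":
    # asyncio.run creates the event loop, runs main() to completion, then closes the loop
    print(asyncio.run(main("plan a trip")))
```

`asyncio.run` is called once at the top level; everything below it is a coroutine that gets awaited.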

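## Agent inheritance hierarchy

Manus Agent <--- BrowserAgent <--- ToolCallAgent <--- ReActAgent <--- BaseAgent

Each class inherits from the one to its right: BaseAgent is the root and Manus is the most derived class.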

### BaseAgent: inherits from pydantic BaseModel + ABC
Attributes:
name/description

system_prompt
next_step_prompt

Modules it depends on:
llm - wrapper around the external LLM; handles interaction with the configured model. Key methods: ask, ask_with_images, ask_tool, token_count...
max_steps/current_step
memory - stores the input prompt information
state - one of: idle, running, finished, error

Key methods:
async def run(...)
---> check the state --> update memory --> start the async loop: for each step up to max_steps, call self.step()

async def step: must be overridden by subclasses

is_stuck: checks whether the agent is stuck

handle_stuck - handles the stuck state by adding a prompt that asks for a change of strategy; this needs further study...
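
A minimal sketch of the `run` loop described above, under stated assumptions: it uses plain Python instead of pydantic, keeps memory as a list of strings, and marks the agent finished after the loop. `SketchBaseAgent` and `CountingAgent` are hypothetical names, and the real class may handle the state transitions differently.

```python
import asyncio
from abc import ABC, abstractmethod
from enum import Enum


class AgentState(Enum):
    IDLE = "idle"
    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"


class SketchBaseAgent(ABC):
    """Plain-Python sketch; the real BaseAgent also derives from pydantic's BaseModel."""

    def __init__(self, max_steps: int = 20) -> None:
        self.max_steps = max_steps
        self.current_step = 0
        self.state = AgentState.IDLE
        self.memory: list[str] = []  # simplified: the real memory holds message objects

    async def run(self, prompt: str) -> list[str]:
        # check the state --> update memory --> loop over steps
        if self.state is not AgentState.IDLE:
            raise RuntimeError(f"cannot run from state {self.state.value}")
        self.memory.append(prompt)
        self.state = AgentState.RUNNING
        results: list[str] = []
        try:
            while self.state is AgentState.RUNNING and self.current_step < self.max_steps:
                self.current_step += 1
                results.append(await self.step())
        except Exception:
            self.state = AgentState.ERROR
            raise
        self.state = AgentState.FINISHED  # assumption made for this sketch
        return results

    @abstractmethod
    async def step(self) -> str:
        """One unit of work; subclasses must override this."""


class CountingAgent(SketchBaseAgent):
    """Hypothetical subclass that just reports the step number."""

    async def step(self) -> str:
        return f"step {self.current_step}"
```

Running `asyncio.run(CountingAgent(max_steps=2).run("go"))` executes two steps and then stops because `max_steps` is reached.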

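## ReActAgent: inherits from BaseAgent + ABC

Key methods:

think() - abstract method; processes the current state and decides the next action. Subclasses must override it.

act() - abstract method; executes the action. Subclasses must override it.

step() - concrete template method; calls think to decide whether an action is needed, then calls self.act.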
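
The think/act split is an instance of the template-method pattern. The sketch below shows the idea with hypothetical classes (`SketchReActAgent`, `OneShotAgent`); the return strings are made up for illustration and are not taken from OpenManus.

```python
import asyncio
from abc import ABC, abstractmethod


class SketchReActAgent(ABC):
    @abstractmethod
    async def think(self) -> bool:
        """Decide on the next action; return True if there is something to act on."""

    @abstractmethod
    async def act(self) -> str:
        """Carry out the action chosen by think()."""

    async def step(self) -> str:
        # Template method: the think -> act order is fixed here,
        # while subclasses supply both halves.
        should_act = await self.think()
        if not should_act:
            return "no action needed"
        return await self.act()


class OneShotAgent(SketchReActAgent):
    """Hypothetical subclass with a single pending action."""

    def __init__(self) -> None:
        self.pending = ["say hello"]

    async def think(self) -> bool:
        return bool(self.pending)

    async def act(self) -> str:
        return f"did: {self.pending.pop()}"
```

The base class never needs to know what "thinking" or "acting" means for a given agent, which is what lets ToolCallAgent, BrowserAgent, and Manus each refine `think` without touching `step`.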

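## ToolCallAgent: inherits from ReActAgent

Attributes:
available_tools - the tools available to the agent: chatCompletion, Terminate
tool_calls

Methods:
think - builds the prompt, then asks the llm which tools to call and with what arguments
act - executes the tool calls, collects the results, and adds them to memory
execute_tool/_handle_special_tool - helper functions that carry out a tool call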
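
As a rough sketch of what `act` and `execute_tool` do, the following dispatches tool calls by name and treats `terminate` as a special tool that ends the run. The `ToolCall` dataclass, the `echo` tool, and the `TOOLS` registry are assumptions made for this example, not the OpenManus data structures.

```python
import asyncio
import json
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    arguments: str  # JSON-encoded arguments, as returned by the LLM


async def echo(text: str) -> str:
    return text


async def terminate(status: str) -> str:
    return f"terminated with status: {status}"


TOOLS = {"echo": echo, "terminate": terminate}  # hypothetical registry
SPECIAL_TOOLS = {"terminate"}


async def execute_tool(call: ToolCall) -> str:
    tool = TOOLS.get(call.name)
    if tool is None:
        return f"Error: unknown tool '{call.name}'"
    try:
        args = json.loads(call.arguments or "{}")
    except json.JSONDecodeError:
        return f"Error: invalid JSON arguments for '{call.name}'"
    return await tool(**args)


async def act(tool_calls: list[ToolCall], memory: list[str]) -> bool:
    """Run each call, record its result, and report whether a special tool ended the run."""
    finished = False
    for call in tool_calls:
        memory.append(await execute_tool(call))
        if call.name in SPECIAL_TOOLS:
            finished = True
    return finished
```

Errors are returned as strings rather than raised, so a bad tool name or malformed arguments end up in memory where the model can see them on the next step.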

## BrowserAgent: inherits from ToolCallAgent

Attributes:
available_tools - BrowserUseTool, Terminate

Methods:
get_browser_state - gets the state from the browser-use tool
think -
--- gets the browser-use state information, such as url, title, tabs, screenshot;
--- builds the next prompt
--- calls the parent class's think

## Manus Agent: inherits from BrowserAgent
Attributes:
available_tools - PythonExecute, BrowserUseTool, StrReplaceEditor, Terminate

Methods:
-- think: checks the last 3 messages in memory for browser activity, builds the next prompt, then calls the parent class's think
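
The "last 3 messages" check can be sketched as below. The `Msg` dataclass and the `browser_used_recently` helper are hypothetical; the only detail taken from the notes is that the window is three messages and that the tool of interest is named `browser_use`.

```python
from dataclasses import dataclass, field


@dataclass
class Msg:
    role: str
    content: str = ""
    tool_names: list[str] = field(default_factory=list)  # tools called in this message


def browser_used_recently(messages: list[Msg], window: int = 3) -> bool:
    """Return True if any of the last `window` messages called the browser tool."""
    recent = messages[-window:]
    return any("browser_use" in m.tool_names for m in recent)
```

When this returns True, the agent would switch to the browser-oriented next-step prompt before delegating to the parent's `think`.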

35 changes: 35 additions & 0 deletions _posts/2025-3-23-涉及的 Python 预备知识 .md
@@ -0,0 +1,35 @@
# Python prerequisites

Python prerequisites (relevant Python knowledge for a former C/C++ or Java engineer)

- Asynchronous programming:

*Essential things to know*
Make the most of a single thread to multitask over IO-bound work.
By default, all async calls run in a single thread (unless you explicitly hand work to other threads).
Event loop: this is the heart of asyncio. It's like a manager that runs all your async tasks.

Async functions: defined with async def; such a function can pause when it needs to, letting the event loop hand the processor to other tasks. Why: openmanus handles IO-bound tasks, so it uses async to improve multitasking within a single thread.

Await: tells the event loop, "Pause here, go do something else, and come back when this is ready." You use it with async functions or other "awaitable" things (like coroutines or tasks).

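- Type annotations: heavy use of Python's type hinting system (typing module) for better code clarity and static analysis.
Type annotations improve readability and maintainability by stating the expected types of variables and functions explicitly.
They also support static type checkers, which can catch potential type errors during development.
Why: makes code in a dynamic language easier to read.

- Pydantic: used extensively for data validation and settings management through the BaseModel and Field classes.

A Python library for data validation and serialization; it uses Python type annotations to define data models and automatically validates that input data matches them.
Why: declarative data validation and serialization via type annotations.

- OOP: ABC combined with pydantic; abstract classes, inheritance, polymorphism.

- Web scraping & search: uses libraries like BeautifulSoup, requests, and search-engine-specific packages (baidusearch, duckduckgo_search, googlesearch).
Handles downloading and parsing web content.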

- LLM Integration: Uses openai and tiktoken for language model interactions and token counting.

- Browser-use: uses the open-source browser-use agent to automate browser interaction.
Why: relying on an open-source agent simplifies development.
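
To make the asynchronous programming item in the list above concrete, here is a small standard-library sketch. `fake_io` is a hypothetical coroutine that stands in for a network or disk wait; three of them run concurrently on one thread, so the total time is close to the longest wait rather than the sum.

```python
import asyncio
import threading
import time


async def fake_io(name: str, delay: float) -> tuple[str, int]:
    await asyncio.sleep(delay)  # pause here and let the event loop run other tasks
    return name, threading.get_ident()


async def run_all() -> tuple[list[tuple[str, int]], float]:
    start = time.perf_counter()
    # gather schedules all three coroutines on the same event loop
    results = await asyncio.gather(
        fake_io("a", 0.2), fake_io("b", 0.2), fake_io("c", 0.2)
    )
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(run_all())
    thread_ids = {tid for _, tid in results}
    print(f"{len(results)} tasks on {len(thread_ids)} thread(s) in {elapsed:.2f}s")
```

For a C++ or Java engineer the key difference from thread pools is that nothing here runs in parallel; the coroutines only take turns at each `await`.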

137 changes: 137 additions & 0 deletions _posts/2025-3-24-第一次运行与日志分析.md

Large diffs are not rendered by default.

14 changes: 14 additions & 0 deletions _posts/2025-3-27-Agents Analysis.md
@@ -0,0 +1,14 @@

## ToolCallAgent

available_tools:

'python_execute': PythonExecute(name='python_execute', description='Executes Python code string. Note: Only print outputs are visible, function return values are not captured. Use print statements to see results.',

'browser_use': BrowserUseTool(name='browser_use', description="\nInteract with a web browser to perform various actions such as navigation, element interaction, content extraction, and tab management. This tool provides a comprehensive set of browser automation capabilities:\n\nNavigation:\n- 'go_to_url': Go to a specific URL in the current tab\n- 'go_back': Go back\n- 'refresh': Refresh the current page\n- 'web_search': Search the query in the current tab, the query should be a search query like humans search in web, concrete and not vague or super long. More the single most important items.\n\nElement Interaction:\n- 'click_element': Click an element by index\n- 'input_text': Input text into a form element\n- 'scroll_down'/'scroll_up': Scroll the page (with optional pixel amount)\n- 'scroll_to_text': If you dont find something which you want to interact with, scroll to it\n- 'send_keys': Send strings of special keys like Escape,Backspace, Insert, PageDown, Delete, Enter, Shortcuts such as `Control+o`, `Control+Shift+T` are supported as well. This gets used in keyboard.press.\n- 'get_dropdown_options': Get all options from a dropdown\n- 'select_dropdown_option': Select dropdown option for interactive element index by the text of the option you want to select\n\nContent Extraction:\n- 'extract_content': Extract page content to retrieve specific information from the page, e.g. all company names, a specifc description, all information about, links with companies in structured format or simply links\n\nTab Management:\n- 'switch_tab': Switch to a specific tab\n- 'open_tab': Open a new tab with a URL\n- 'close_tab': Close the current tab\n\nUtility:\n- 'wait': Wait for a specified number of seconds\n",


'str_replace_editor': StrReplaceEditor(name='str_replace_editor', description='Custom editing tool for viewing, creating and editing files\n* State is persistent across command calls and discussions with the user\n* If `path` is a file, `view` displays the result of applying `cat -n`. If `path` is a directory, `view` lists non-hidden files and directories up to 2 levels deep\n* The `create` command cannot be used if the specified `path` already exists as a file\n* If a `command` generates a long output, it will be truncated and marked with `<response clipped>`\n* The `undo_edit` command will revert the last edit made to the file at `path`\n\nNotes for using the `str_replace` command:\n* The `old_str` parameter should match EXACTLY one or more consecutive lines from the original file. Be mindful of whitespaces!\n* If the `old_str` parameter is not unique in the file, the replacement will not be performed. Make sure to include enough context in `old_str` to make it unique\n* The `new_str` parameter should contain the edited lines that should replace the `old_str`\n'


'terminate': Terminate(name='terminate', description='Terminate the interaction when the request is met OR if the assistant cannot proceed further with the task.\nWhen you have finished all the tasks, call this tool to end the work.'
143 changes: 143 additions & 0 deletions _posts/2025-3-28-调试分析 Qwen32B.md
@@ -0,0 +1,143 @@
# Debugging analysis

## Task:
I need a 7-day Japan itinerary for April 15-23 from Seattle, with a $2500-5000 budget for my fiancée and me. We love historical sites, hidden gems, and Japanese culture (kendo, tea ceremonies, Zen meditation). We want to see Nara's deer and explore cities on foot. I plan to propose during this trip and need a special location recommendation.
Please provide a detailed itinerary and a simple HTML travel handbook with maps, attraction descriptions, essential Japanese phrases, and travel tips we can reference throughout our journey.

LLM model:
model = "Qwen/QwQ-32B"
base_url = "https://api.siliconflow.cn/v1"

## Basic run loop:
step_no=0

while step_no < MAX_STEP_NO (= 20):

* #1 Manus.think()
Manus prompt = '\nBased on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps.\n'

* Check the 3 most recent messages to decide whether to invoke BrowserAgent
BrowserAgent prompt =
'\nWhat should I do next to achieve my goal?\n\nWhen you see [Current state starts here], focus on the following:\n- Current URL and page title{url_placeholder}\n- Available tabs{tabs_placeholder}\n- Interactive elements and their indices\n- Content above{content_above_placeholder} or below{content_below_placeholder} the viewport (if indicated)\n- Any action results or errors{results_placeholder}\n\nFor browser interactions:\n- To navigate: browser_use with action="go_to_url", url="..."\n- To click: browser_use with action="click_element", index=N\n- To type: browser_use with action="input_text", index=N, text="..."\n- To extract: browser_use with action="extract_content", goal="..."\n- To scroll: browser_use with action="scroll_down" or "scroll_up"\n\nConsider both what\'s visible and what might be beyond the current viewport.\nBe methodical - remember your progress and what you\'ve learned so far.\n'

* #2 BrowserAgent.think()
Replace the placeholders in the prompt


* ToolCallAgent.think()
Prepare the user prompt from the Manus or BrowserAgent prompt
Prepare the system prompt
Prepare available_tools
Call the llm
*
Parse the LLM response:
Prepare tool_calls
Update memory


*
Manus.act -> BrowserAgent.act() -> ToolCallAgent.act()

* step_no +=1

end while_loop
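
The way one `think` call travels down the class chain in the loop above can be sketched with `super()`. The three classes here are hypothetical skeletons that only record the call order; they do not talk to an LLM or a browser.

```python
import asyncio


class SketchToolCallAgent:
    def __init__(self) -> None:
        self.trace: list[str] = []

    async def think(self) -> bool:
        # In the real agent this is where the LLM is asked for tool calls
        self.trace.append("ToolCallAgent.think")
        return True


class SketchBrowserAgent(SketchToolCallAgent):
    async def think(self) -> bool:
        # Fill browser-state placeholders into the prompt, then delegate
        self.trace.append("BrowserAgent.think")
        return await super().think()


class SketchManus(SketchBrowserAgent):
    async def think(self) -> bool:
        # Inspect recent messages, pick the prompt, then delegate
        self.trace.append("Manus.think")
        return await super().think()
```

Each level adjusts the prompt and then hands over to its parent, so only the bottom of the chain needs to know how to call the model.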



## Agent execution steps:
Input: I need a 7-day Japan itinerary for April 15-23 from Seattle ...

step 1:

Think():
Two user messages:

msg1= Message(role='user', content="I need a 7-day Japan itinerary for April 15-23 from Seattle, with a $2500-5000 budget for my fiancée and me. We love historical sites, hidden gems, and Japanese culture (kendo, tea ceremonies, Zen meditation). We want to see Nara's deer and explore cities on foot. I plan to propose during this trip and need a special location recommendation. Please provide a detailed itinerary and a simple HTML travel handbook with maps, attraction descriptions, essential Japanese phrases, and travel tips we can reference throughout our journey.", tool_calls=None, name=None, tool_call_id=None, base64_image=None)

msg2=Message(role='user', content='\nBased on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps.\n', tool_calls=None, name=None, tool_call_id=None, base64_image=None)

LLM response: tool_call=none

content= "Okay, let's tackle this user's request. They want a 7-day Japan itinerary from April 15-23, starting from Seattle, with a budget between $2500 and $5000 for two people. They love historical sites, hidden gems, Japanese culture like kendo, tea ceremonies, and Zen meditation. They also want to see Nara's deer and explore cities on foot. Plus, they plan to propose and need a special location recommendation. Finally, they want a simple HTML travel handbook with maps, descriptions, phrases, and tips.

First, I need to plan the itinerary. Since they're starting from Seattle, the flight entry point is probably Tokyo or Osaka. Considering their interests in historical sites and culture, cities like Kyoto, Nara, and maybe Kanazawa come to mind. They want to explore on foot, so cities with walkable areas are better.

Day 1: Arrival in Tokyo. Maybe land in Haneda, then take a train to a central area. They might want to acclimate, but maybe a cultural experience like a tea ceremony in the afternoon.

Day 2: Tokyo day. Senso-ji Temple in Asakusa, which is historical. Then maybe a hidden gem like TeamLab Borderless for a modern cultural experience. Evening could be a kaiseki dinner.

Day 3: Day trip to Nikko. Historical shrines and natural beauty. That's a hidden gem and fits their interests.

Day 4: Travel to Kyoto. Arashiyama Bamboo Grove, then Nijo Castle. Evening tea ceremony.

Day 5: Nara. The deer park is a must. Then Todai-ji Temple. Maybe a Zen meditation session here.

Day 6: Kyoto again. Fushimi Inari for the torii gates. Then Gion district, maybe spot a geisha.

Day 7: Return to Tokyo for departure. Maybe a special proposal spot like Tokyo Tower or a quiet park.

Wait, but the user mentioned proposing, so maybe the proposal spot should be a meaningful place. Maybe during a tea ceremony or at a scenic spot like the bamboo grove or Nara's park. Need to suggest that.

Budget-wise, flights from Seattle to Tokyo are around $800-1200 per person round trip. Accommodation in Tokyo and Kyoto mid-range hotels would be around $150-200 per night. Activities might cost a few hundred more. Total should fit within $2500-5000 for two.

Next, the HTML handbook. Need to include maps, descriptions, phrases, tips. Using the str_replace_editor function to create the HTML file. Maybe structure it with sections for each day, maps using Google Maps links, essential phrases like "Konnichiwa", "Arigatou", etc. Tips on etiquette, transportation, etc.

Wait, but how to generate the HTML? The user wants it simple. Maybe create a basic HTML structure with placeholders, then fill in the content step by step. Use the str_replace_editor to create the file and edit it.

First, outline the itinerary in the response, then use the editor tool to build the HTML. Let me start by creating the itinerary as text, then proceed to build the HTML.

Wait, but the user might need the HTML as part of the answer. Since I can't directly send files, maybe generate the HTML code here and suggest they save it. Alternatively, use the str_replace_editor to create the file in the workspace.

The user's initial directory is C:\Users\grep\OpenManus\workspace. So I can create a file there. Let's plan to create an HTML file using the str_replace_editor's create command.

First, I'll need to outline the itinerary steps, then structure the HTML. Let me start by writing the itinerary details, then proceed to the HTML.

Wait, but the user also needs a special proposal location. Maybe suggest a quiet spot in Kyoto's Philosopher's Path or a scenic spot in Nara.

Putting it all together, the steps are:

1. Plan the 7-day itinerary with daily details.
2. Suggest a proposal location.
3. Create the HTML handbook using the editor tool, adding maps, descriptions, phrases, and tips.

Now, to implement this, first, I can outline the itinerary in the response. Then, use the str_replace_editor to create the HTML file. Let me start by writing the itinerary as a response, then call the function to create the HTML. "

Update memory, adding assistant_msg

Act():


step 2/3/4/5/6:
Add one Manus prompt to messages
LLM return value:

content: ''
tool_calls: ''
Update messages and memory

step 7:
LLM response=“ '**Itinerary Outline (April 15–23):**\n\n### **Day 1: Arrival in Tokyo (April 15)**\n- **Morning:** Fly from Seattle to Tokyo (Haneda Airport). \n- **Afternoon:** Check into a central Tokyo hotel (e.g., Asakusa or Shibuya). \n- **Evening:** Experience a **traditional tea ceremony** in Ueno Park. \n- **Dinner:** Kaiseki meal in Ginza.\n\n### **Day 2: Tokyo Cultural Immersion (April 16)**\n- **Morning:** Explore **Senso-ji Temple** in Asakusa (historical site). \n- **Afternoon:** Visit **TeamLab Borderless** (modern art hidden gem). \n- **Evening:** Walk through **Odaiba** for futuristic views and the Rainbow Bridge.\n\n### **Day 3: Day Trip to Nikko (April 17)**\n- **Morning:** Train to Nikko. Visit **Toshogu Shrine** (UNESCO site with carvings). \n- **Afternoon:** Hike in **Kegon Falls** (spring blooms!). \n- **Evening:** Return to Tokyo. \n\n### **Day 4: Travel to Kyoto (April 18)**\n- **Morning:** Shinkansen to Kyoto. \n- **Afternoon:** **Arashiyama Bamboo Grove** and **Nijo Castle** (feudal history). \n- **Evening:** Zen meditation session at **Tenryu-ji Temple**. \n\n### **Day 5: Nara & Hidden Gems (April 19)**\n- **Morning:** Day trip to Nara. Feed the **wild deer** in Nara Park. \n- **Afternoon:** Explore **Todai-ji Temple** (home to a giant Buddha statue). \n- **Evening:** Return to Kyoto. \n\n### **Day 6: Kyoto Deep Dive (April 20)**\n- **Morning:** **Fushimi Inari Shrine** (thousands of torii gates). \n- **Afternoon:** Stroll **Gion District** (geisha culture). \n- **Evening:** **Kendo demonstration** at a local dojo (book in advance). \n\n### **Day 7: Proposal & Departure (April 21–23)**\n- **April 22:** Return to Tokyo. **Proposal spot recommendation:** **Tokyo Tower observation deck** (iconic skyline views) or **Philosopher’s Path in Kyoto** (peaceful cherry blossom walk). \n- **April 23:** Fly back to Seattle. \n\n---\n\n### **Budget Breakdown (for 2 people):**\n- **Flights:** ~$1,600–$2,400 (Seattle-Tokyo roundtrip). 
\n- **Accommodation:** ~$1,000–$1,400 (mid-range hotels in Tokyo/Kyoto). \n- **Activities/Meals:** ~$600–$1,000. \n- **Total:** ~$3,200–$5,000. \n\n---\n\n### **HTML Travel Handbook Creation**\nI’ll now use the `str_replace_editor` tool to create a simple HTML file with your itinerary, maps, phrases, and tips. Here’s the first step:'"

tool_call=none
Create an assistant msg from content and update memory

Step 8-17: the agent appears to be idling ...
Add one Manus prompt to messages
LLM return value:

content: ''
tool_calls: ''
Update messages and memory
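
One way to spot this kind of idling while reading a log is to count trailing responses that carry neither content nor tool calls. The helper below is a hypothetical analysis aid, not part of OpenManus, and it assumes each response has been reduced to a dict with `content` and `tool_calls` keys.

```python
def count_trailing_empty(responses: list[dict]) -> int:
    """Count consecutive responses at the end with no content and no tool calls."""
    count = 0
    for response in reversed(responses):
        if response.get("content") or response.get("tool_calls"):
            break
        count += 1
    return count


def looks_idle(responses: list[dict], threshold: int = 3) -> bool:
    """Flag a run whose last `threshold` responses were all empty."""
    return count_trailing_empty(responses) >= threshold
```

The threshold of 3 is an arbitrary choice for illustration; a run like steps 8-17 above, with ten empty responses in a row, would be flagged with any small threshold.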


Step 18:

The LLM response returned a tool_call:
name:
arguments:
Function(arguments='{"status": "success"}', name='terminate')


🏁 Special tool 'terminate' has completed the task!

