barryclark · wupujun · Mar 24, 2025
diff --git a/_config.yml b/_config.yml
@@ -3,13 +3,13 @@
 #
 
 # Name of your site (displayed in the header)
-name: Your Name
+name: Learning from openmanus
 
 # Short bio or description (displayed in the header)
-description: Web Developer from Somewhere
+description: Personal site to share learning experiences of openmanus
 
 # URL of your avatar or profile pic (you could use your GitHub profile pic)
-avatar: https://raw.githubusercontent.com/barryclark/jekyll-now/master/images/jekyll-logo.png
+avatar: https://avatars.githubusercontent.com/u/9513757?v=4
 
 #
 # Flags below are optional
@@ -41,7 +41,7 @@ google_analytics:
 
 # Your website URL (e.g. http://barryclark.github.io or http://www.barryclark.co)
 # Used for Sitemap.xml and your RSS feed
-url:
+url: wupujun.github.io
 
 # If you're hosting your site at a Project repository on GitHub pages
 # (http://yourusername.github.io/repository-name)

diff --git a/_posts/2014-3-3-Hello-World.md b/_posts/2014-3-3-Hello-World.md
diff --git a/_posts/2025-3-23-Python 类设计.md b/_posts/2025-3-23-Python 类设计.md
@@ -0,0 +1,76 @@
+# Overall flow
+
+
+main()
+
+--> asyncio.run  # starts the event loop on the main thread
+
+--> read the prompt, then await agent.run
+
+## Agent inheritance
+
+Manus <- BrowserAgent <- ToolCallAgent <- ReActAgent <- BaseAgent (each class inherits from the one to its right)
+
+## BaseAgent (inherits pydantic BaseModel + ABC)
+Attributes:
+name/description
+
+system_prompt
+next_step_prompt
+
+Dependencies:
+llm - wrapper around the external LLM; handles interaction with the configured model. Key methods: ask, ask_with_images, ask_tool, token_count...
+max_steps/current_step
+memory - stores the input prompt messages
+state - one of: idle, running, finished, error
+
+Key methods:
+async def run(...)
+---> check state --> update memory --> start the async loop: for each step up to max_steps, call self.step()
+
+async def step: must be overridden by subclasses
+
+is_stuck: checks whether the agent is stuck
+
+handle_stuck - handles the stuck state by adding a prompt to change strategy (needs further study...)
+
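+## ReActAgent (inherits BaseAgent + ABC)
+
+Key methods:
+
+think() - abstract method; processes the current state and decides the next action. Subclasses must implement it.
+
+act() - abstract method; executes the action. Subclasses must implement it.
+
+step() - concrete template method: calls think() to decide whether to act, then calls self.act()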
+
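+## ToolCallAgent (inherits ReActAgent)
+
+Attributes:
+available_tools - the available tools: CreateChatCompletion, Terminate
+tool_calls
+
+Methods:
+think - builds the prompt, then asks the LLM which tools to call and with what arguments
+act - executes the tool calls and adds the results to memory
+execute_tool/_handle_special_tool - helper functions that perform the tool call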
+
+## BrowserAgent (inherits ToolCallAgent)
+
+Attributes:
+available_tools - BrowserUseTool, Terminate
+
+Methods:
+get_browser_state - gets the current state from the browser-use tool
+think -
+--- gets the browser-use state, e.g. url, title, tabs, screenshot;
+--- builds the next prompt
+--- calls the parent class's think
+
+## Manus (inherits BrowserAgent)
+Attributes:
+available_tools - PythonExecute, BrowserUseTool, StrReplaceEditor, Terminate
+
+Methods:
+-- think: checks the last 3 messages in memory for browser activity, builds the next prompt, then calls the parent class's think
+
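The class hierarchy described in the post above follows the template-method pattern: the base class owns the run loop, the ReAct layer fixes the think-then-act order, and leaf agents fill in the two hooks. The following is a minimal sketch of that shape, assuming plain `abc` classes instead of pydantic models; `EchoAgent` and every method body are hypothetical illustrations, not OpenManus code.

```python
import asyncio
from abc import ABC, abstractmethod


class BaseAgent(ABC):
    """Owns the run loop; subclasses supply step()."""

    def __init__(self, max_steps: int = 5) -> None:
        self.max_steps = max_steps
        self.current_step = 0
        self.state = "idle"
        self.memory = []

    async def run(self, prompt: str) -> list:
        # check state --> update memory --> loop over steps
        if self.state != "idle":
            raise RuntimeError(f"cannot run from state {self.state}")
        self.memory.append(prompt)
        self.state = "running"
        results = []
        while self.current_step < self.max_steps and self.state == "running":
            self.current_step += 1
            results.append(await self.step())
        if self.state == "running":
            self.state = "idle"  # step budget exhausted without finishing
        return results

    @abstractmethod
    async def step(self) -> str:
        """One unit of work; overridden by subclasses."""


class ReActAgent(BaseAgent, ABC):
    """Fixes the order think -> act; leaves both hooks abstract."""

    @abstractmethod
    async def think(self) -> bool:
        """Return True when an action should be executed."""

    @abstractmethod
    async def act(self) -> str:
        """Execute the chosen action and return its result."""

    async def step(self) -> str:
        if not await self.think():
            return "no action needed"
        return await self.act()


class EchoAgent(ReActAgent):
    """Hypothetical leaf agent: acts on odd steps, finishes on step 3."""

    async def think(self) -> bool:
        return self.current_step % 2 == 1

    async def act(self) -> str:
        if self.current_step >= 3:
            self.state = "finished"
        return f"step {self.current_step}: acted on {self.memory[0]!r}"
```

Calling `asyncio.run(EchoAgent().run("hi"))` drives the loop. Because `step()` lives in `ReActAgent`, a new agent only has to implement `think()` and `act()`.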
diff --git a/_posts/2025-3-23-涉及的 Python 预备知识 .md b/_posts/2025-3-23-涉及的 Python 预备知识 .md
@@ -0,0 +1,35 @@
+# Python prerequisites
+
+Python prerequisites (relevant Python knowledge for a former C/C++ or Java engineer)
+
+-  Asynchronous programming:
+
+*Essential Things to Know*
+Make the most of a single thread to multitask IO-bound work.
+All async calls run in a single thread by default (unless you explicitly run them in multiple threads).
+Event loop: This is the heart of asyncio. It’s like a manager that runs all your async tasks.
+
+Async functions: functions defined with async def can pause when needed, letting the event loop hand the processor to other tasks. Why: OpenManus handles IO-bound tasks, so it uses async to improve multitasking on a single thread.
+
+Await: tells the event loop, “Pause here, go do something else, and come back when this is ready.” You use it with async functions or other “awaitable” things (like coroutines or tasks).
+
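+- Type annotations: Heavy use of Python's type hinting system (typing module) for better code clarity and static analysis.
+Type annotations improve readability and maintainability by stating the expected types of variables and functions.
+They support static type checkers, which can catch potential type errors during development.
+Why: improves the readability of a dynamic language.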
+
+- Pydantic: Used extensively for data validation and settings management through BaseModel and Field classes.
+
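+A Python library for data validation and serialization; it uses Python type annotations to define data models, automatically validates that input data matches them, and handles serialization.
+Why: declarative data validation and serialization through type annotations.
+
+- OOP: ABC + pydantic; abstract classes, inheritance, polymorphism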
+
+- Web Scraping & Search: Utilizes libraries like BeautifulSoup, requests, and search engine specific packages (baidusearch, duckduckgo_search, googlesearch).
+  Downloads and parses web content
+
+- LLM Integration: Uses openai and tiktoken for language model interactions and token counting.
+
+- Browser-use: uses the open-source browser-use agent to automate browser interaction
+Why: an open-source agent simplifies development
+
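The async notes in the post above can be checked with a small experiment: two coroutines that wait on simulated IO overlap in time while sharing one thread. This is a minimal sketch using only the standard library; `fetch` and its delays are hypothetical stand-ins for real network calls.

```python
import asyncio
import threading


async def fetch(name: str, delay: float, log: list) -> str:
    """Pretend to do IO; record when we start, when we finish, and on which thread."""
    log.append((name, "start", threading.get_ident()))
    await asyncio.sleep(delay)  # yields control to the event loop while "waiting"
    log.append((name, "done", threading.get_ident()))
    return name


async def main() -> tuple:
    log = []
    # gather() schedules both coroutines on the same event loop
    results = await asyncio.gather(
        fetch("slow", 0.2, log),
        fetch("fast", 0.05, log),
    )
    return results, log
```

Running `asyncio.run(main())` shows that both tasks start before either finishes, that the shorter wait completes first, and that every log entry carries the same thread id. `gather` still returns results in the order the coroutines were passed in, not in completion order.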
diff --git a/_posts/2025-3-24-第一次运行与日志分析.md b/_posts/2025-3-24-第一次运行与日志分析.md
diff --git a/_posts/2025-3-27-Agents Analysis.md b/_posts/2025-3-27-Agents Analysis.md
@@ -0,0 +1,14 @@
+
+## ToolCallAgent
+
+available_tools:
+
+'python_execute': PythonExecute(name='python_execute', description='Executes Python code string. Note: Only print outputs are visible, function return values are not captured. Use print statements to see results.',  
+
+'browser_use': BrowserUseTool(name='browser_use', description="\nInteract with a web browser to perform various actions such as navigation, element interaction, content extraction, and tab management. This tool provides a comprehensive set of browser automation capabilities:\n\nNavigation:\n- 'go_to_url': Go to a specific URL in the current tab\n- 'go_back': Go back\n- 'refresh': Refresh the current page\n- 'web_search': Search the query in the current tab, the query should be a search query like humans search in web, concrete and not vague or super long. More the single most important items.\n\nElement Interaction:\n- 'click_element': Click an element by index\n- 'input_text': Input text into a form element\n- 'scroll_down'/'scroll_up': Scroll the page (with optional pixel amount)\n- 'scroll_to_text': If you dont find something which you want to interact with, scroll to it\n- 'send_keys': Send strings of special keys like Escape,Backspace, Insert, PageDown, Delete, Enter, Shortcuts such as `Control+o`, `Control+Shift+T` are supported as well. This gets used in keyboard.press.\n- 'get_dropdown_options': Get all options from a dropdown\n- 'select_dropdown_option': Select dropdown option for interactive element index by the text of the option you want to select\n\nContent Extraction:\n- 'extract_content': Extract page content to retrieve specific information from the page, e.g. all company names, a specifc description, all information about, links with companies in structured format or simply links\n\nTab Management:\n- 'switch_tab': Switch to a specific tab\n- 'open_tab': Open a new tab with a URL\n- 'close_tab': Close the current tab\n\nUtility:\n- 'wait': Wait for a specified number of seconds\n",
+
+
+'str_replace_editor': StrReplaceEditor(name='str_replace_editor', description='Custom editing tool for viewing, creating and editing files\n* State is persistent across command calls and discussions with the user\n* If `path` is a file, `view` displays the result of applying `cat -n`. If `path` is a directory, `view` lists non-hidden files and directories up to 2 levels deep\n* The `create` command cannot be used if the specified `path` already exists as a file\n* If a `command` generates a long output, it will be truncated and marked with `<response clipped>`\n* The `undo_edit` command will revert the last edit made to the file at `path`\n\nNotes for using the `str_replace` command:\n* The `old_str` parameter should match EXACTLY one or more consecutive lines from the original file. Be mindful of whitespaces!\n* If the `old_str` parameter is not unique in the file, the replacement will not be performed. Make sure to include enough context in `old_str` to make it unique\n* The `new_str` parameter should contain the edited lines that should replace the `old_str`\n' 
+
+
+'terminate': Terminate(name='terminate', description='Terminate the interaction when the request is met OR if the assistant cannot proceed further with the task.\nWhen you have finished all the tasks, call this tool to end the work.'
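The dump above shows `available_tools` as a mapping from tool name to a tool object carrying a `name` and a `description`. One plausible way to hold such a mapping and turn it into the function-calling payload sent to the model is sketched below. `Tool`, `ToolCollection`, and `to_param` here are simplified stand-ins written for illustration, and the shortened descriptions are placeholders, so treat the shapes as assumptions rather than the project's actual classes.

```python
from dataclasses import dataclass, field


@dataclass
class Tool:
    """Simplified stand-in for a tool: just the metadata the LLM sees."""
    name: str
    description: str
    parameters: dict = field(default_factory=dict)

    def to_param(self) -> dict:
        # OpenAI-style function-calling schema entry
        return {
            "type": "function",
            "function": {
                "name": self.name,
                "description": self.description,
                "parameters": self.parameters,
            },
        }


class ToolCollection:
    """Keeps tools addressable by name, in insertion order."""

    def __init__(self, *tools: Tool) -> None:
        self.tool_map = {tool.name: tool for tool in tools}

    def to_params(self) -> list:
        return [tool.to_param() for tool in self.tool_map.values()]

    def get(self, name: str):
        return self.tool_map.get(name)  # None when the LLM names an unknown tool


tools = ToolCollection(
    Tool("python_execute", "Executes Python code string."),
    Tool("terminate", "Terminate the interaction when the request is met."),
)
```

Keying the collection by `name` matters because the model refers to tools only by that string in its tool calls, so the lookup in `get` is the bridge from the response back to executable code.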
diff --git a/_posts/2025-3-28-调试分析 Qwen32B.md b/_posts/2025-3-28-调试分析 Qwen32B.md
@@ -0,0 +1,143 @@
+# Debugging analysis
+
+## Task:
+I need a 7-day Japan itinerary for April 15-23 from Seattle, with a $2500-5000 budget for my fiancée and me. We love historical sites, hidden gems, and Japanese culture (kendo, tea ceremonies, Zen meditation). We want to see Nara's deer and explore cities on foot. I plan to propose during this trip and need a special location recommendation.
+Please provide a detailed itinerary and a simple HTML travel handbook with maps, attraction descriptions, essential Japanese phrases, and travel tips we can reference throughout our journey.
+
+LLM model:
+model = "Qwen/QwQ-32B"
+base_url = "https://api.siliconflow.cn/v1"
+
+## Basic run loop:
+step_no = 0
+
+while step_no < MAX_STEP_NO (= 20):
+
+*   #1 Manus.think()
+   Manus prompt = '\nBased on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps.\n'
+
+*  Check the last 3 messages to decide whether to call BrowserAgent
+   BrowserAgent prompt =
+   '\nWhat should I do next to achieve my goal?\n\nWhen you see [Current state starts here], focus on the following:\n- Current URL and page title{url_placeholder}\n- Available tabs{tabs_placeholder}\n- Interactive elements and their indices\n- Content above{content_above_placeholder} or below{content_below_placeholder} the viewport (if indicated)\n- Any action results or errors{results_placeholder}\n\nFor browser interactions:\n- To navigate: browser_use with action="go_to_url", url="..."\n- To click: browser_use with action="click_element", index=N\n- To type: browser_use with action="input_text", index=N, text="..."\n- To extract: browser_use with action="extract_content", goal="..."\n- To scroll: browser_use with action="scroll_down" or "scroll_up"\n\nConsider both what\'s visible and what might be beyond the current viewport.\nBe methodical - remember your progress and what you\'ve learned so far.\n'
+
+*   #2 BrowserAgent.think()
+   Replace the placeholders in the prompt
+
+
+*   ToolCallAgent.think()
+   Prepare the user prompt from the Manus or BrowserAgent prompt
+   Prepare the system prompt
+   Prepare available_tools
+   Call the LLM
+*
+   Parse the LLM response:
+   Prepare tool_calls
+   Update memory
+
+
+*
+   Manus.act -> BrowserAgent.act() -> ToolCallAgent.act()
+
+*   step_no += 1
+
+end while
+
+
+
+## Agent execution steps:
+Input: I need a 7-day Japan itinerary for April 15-23 from Seattle ...
+
+step 1:
+
+Think():
+Two user messages:
+
+msg1= Message(role='user', content="I need a 7-day Japan itinerary for April 15-23 from Seattle, with a $2500-5000 budget for my fiancée and me. We love historical sites, hidden gems, and Japanese culture (kendo, tea ceremonies, Zen meditation). We want to see Nara's deer and explore cities on foot. I plan to propose during this trip and need a special location recommendation. Please provide a detailed itinerary and a simple HTML travel handbook with maps, attraction descriptions, essential Japanese phrases, and travel tips we can reference throughout our journey.", tool_calls=None, name=None, tool_call_id=None, base64_image=None)
+
+msg2=Message(role='user', content='\nBased on user needs, proactively select the most appropriate tool or combination of tools. For complex tasks, you can break down the problem and use different tools step by step to solve it. After using each tool, clearly explain the execution results and suggest the next steps.\n', tool_calls=None, name=None, tool_call_id=None, base64_image=None)
+
+LLM response:  tool_call=none
+
+content= "Okay, let's tackle this user's request. They want a 7-day Japan itinerary from April 15-23, starting from Seattle, with a budget between $2500 and $5000 for two people. They love historical sites, hidden gems, Japanese culture like kendo, tea ceremonies, and Zen meditation. They also want to see Nara's deer and explore cities on foot. Plus, they plan to propose and need a special location recommendation. Finally, they want a simple HTML travel handbook with maps, descriptions, phrases, and tips.  
+
+First, I need to plan the itinerary. Since they're starting from Seattle, the flight entry point is probably Tokyo or Osaka. Considering their interests in historical sites and culture, cities like Kyoto, Nara, and maybe Kanazawa come to mind. They want to explore on foot, so cities with walkable areas are better. 
+
+Day 1: Arrival in Tokyo. Maybe land in Haneda, then take a train to a central area. They might want to acclimate, but maybe a cultural experience like a tea ceremony in the afternoon. 
+
+Day 2: Tokyo day. Senso-ji Temple in Asakusa, which is historical. Then maybe a hidden gem like TeamLab Borderless for a modern cultural experience. Evening could be a kaiseki dinner. 
+
+Day 3: Day trip to Nikko. Historical shrines and natural beauty. That's a hidden gem and fits their interests.        
+
+Day 4: Travel to Kyoto. Arashiyama Bamboo Grove, then Nijo Castle. Evening tea ceremony. 
+
+Day 5: Nara. The deer park is a must. Then Todai-ji Temple. Maybe a Zen meditation session here. 
+
+Day 6: Kyoto again. Fushimi Inari for the torii gates. Then Gion district, maybe spot a geisha. 
+
+Day 7: Return to Tokyo for departure. Maybe a special proposal spot like Tokyo Tower or a quiet park. 
+
+Wait, but the user mentioned proposing, so maybe the proposal spot should be a meaningful place. Maybe during a tea ceremony or at a scenic spot like the bamboo grove or Nara's park. Need to suggest that. 
+
+Budget-wise, flights from Seattle to Tokyo are around $800-1200 per person round trip. Accommodation in Tokyo and Kyoto mid-range hotels would be around $150-200 per night. Activities might cost a few hundred more. Total should fit within $2500-5000 for two. 
+
+Next, the HTML handbook. Need to include maps, descriptions, phrases, tips. Using the str_replace_editor function to create the HTML file. Maybe structure it with sections for each day, maps using Google Maps links, essential phrases like "Konnichiwa", "Arigatou", etc. Tips on etiquette, transportation, etc. 
+
+Wait, but how to generate the HTML? The user wants it simple. Maybe create a basic HTML structure with placeholders, then fill in the content step by step. Use the str_replace_editor to create the file and edit it. 
+
+First, outline the itinerary in the response, then use the editor tool to build the HTML. Let me start by creating the itinerary as text, then proceed to build the HTML. 
+
+Wait, but the user might need the HTML as part of the answer. Since I can't directly send files, maybe generate the HTML code here and suggest they save it. Alternatively, use the str_replace_editor to create the file in the workspace. 
+
+The user's initial directory is C:\Users\grep\OpenManus\workspace. So I can create a file there. Let's plan to create an HTML file using the str_replace_editor's create command. 
+
+First, I'll need to outline the itinerary steps, then structure the HTML. Let me start by writing the itinerary details, then proceed to the HTML. 
+
+Wait, but the user also needs a special proposal location. Maybe suggest a quiet spot in Kyoto's Philosopher's Path or a scenic spot in Nara. 
+
+Putting it all together, the steps are:
+
+1. Plan the 7-day itinerary with daily details.
+2. Suggest a proposal location.
+3. Create the HTML handbook using the editor tool, adding maps, descriptions, phrases, and tips.
+
+Now, to implement this, first, I can outline the itinerary in the response. Then, use the str_replace_editor to create the HTML file. Let me start by writing the itinerary as a response, then call the function to create the HTML. "
+
+Update memory: add the assistant message
+
+Act(): 
+
+
+steps 2/3/4/5/6:
+Append one Manus prompt to messages
+LLM response:
+
+content: ''
+tool_calls: ''
+Update messages and memory
+
+step 7:
+LLM response = "'**Itinerary Outline (April 15–23):**\n\n### **Day 1: Arrival in Tokyo (April 15)**\n- **Morning:** Fly from Seattle to Tokyo (Haneda Airport). \n- **Afternoon:** Check into a central Tokyo hotel (e.g., Asakusa or Shibuya). \n- **Evening:** Experience a **traditional tea ceremony** in Ueno Park. \n- **Dinner:** Kaiseki meal in Ginza.\n\n### **Day 2: Tokyo Cultural Immersion (April 16)**\n- **Morning:** Explore **Senso-ji Temple** in Asakusa (historical site). \n- **Afternoon:** Visit **TeamLab Borderless** (modern art hidden gem). \n- **Evening:** Walk through **Odaiba** for futuristic views and the Rainbow Bridge.\n\n### **Day 3: Day Trip to Nikko (April 17)**\n- **Morning:** Train to Nikko. Visit **Toshogu Shrine** (UNESCO site with carvings). \n- **Afternoon:** Hike in **Kegon Falls** (spring blooms!). \n- **Evening:** Return to Tokyo. \n\n### **Day 4: Travel to Kyoto (April 18)**\n- **Morning:** Shinkansen to Kyoto. \n- **Afternoon:** **Arashiyama Bamboo Grove** and **Nijo Castle** (feudal history). \n- **Evening:** Zen meditation session at **Tenryu-ji Temple**. \n\n### **Day 5: Nara & Hidden Gems (April 19)**\n- **Morning:** Day trip to Nara. Feed the **wild deer** in Nara Park. \n- **Afternoon:** Explore **Todai-ji Temple** (home to a giant Buddha statue). \n- **Evening:** Return to Kyoto. \n\n### **Day 6: Kyoto Deep Dive (April 20)**\n- **Morning:** **Fushimi Inari Shrine** (thousands of torii gates). \n- **Afternoon:** Stroll **Gion District** (geisha culture). \n- **Evening:** **Kendo demonstration** at a local dojo (book in advance). \n\n### **Day 7: Proposal & Departure (April 21–23)**\n- **April 22:** Return to Tokyo. **Proposal spot recommendation:** **Tokyo Tower observation deck** (iconic skyline views) or **Philosopher’s Path in Kyoto** (peaceful cherry blossom walk). \n- **April 23:** Fly back to Seattle. \n\n---\n\n### **Budget Breakdown (for 2 people):**\n- **Flights:** ~$1,600–$2,400 (Seattle-Tokyo roundtrip). \n- **Accommodation:** ~$1,000–$1,400 (mid-range hotels in Tokyo/Kyoto). \n- **Activities/Meals:** ~$600–$1,000. \n- **Total:** ~$3,200–$5,000. \n\n---\n\n### **HTML Travel Handbook Creation**\nI’ll now use the `str_replace_editor` tool to create a simple HTML file with your itinerary, maps, phrases, and tips. Here’s the first step:'"
+
+tool_call=none
+Create an assistant message from content and update memory
+
+Steps 8-17: the agent appears to be idling ...
+Append one Manus prompt to messages
+LLM response:
+
+content: ''
+tool_calls: ''
+Update messages and memory
+
+
+Step 18: 
+
+The LLM response returned a tool call:
+name:
+arguments:
+Function(arguments='{"status": "success"}', name='terminate')
+Function(arguments='{"status": "success"}', name='terminate')
+
+
+ 🏁 Special tool 'terminate' has completed the task!
+
+
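The trace above spends steps 2-6 and 8-17 on responses with empty `content` and no tool calls before `terminate` finally arrives at step 18. A small replay harness makes that pattern easy to reason about. The sketch below is a hypothetical guard written for this analysis, not the project's own `is_stuck` logic: it stops early after `max_idle` consecutive empty responses and otherwise parses the JSON arguments of a `terminate` call.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Response:
    """Simplified LLM response: text plus (tool_name, json_arguments) pairs."""
    content: str = ""
    tool_calls: list = field(default_factory=list)


def run_loop(responses: list, max_steps: int = 20, max_idle: int = 3) -> tuple:
    """Replay responses; return (step_no, outcome)."""
    idle = 0
    step_no = 0
    for step_no, resp in enumerate(responses[:max_steps], start=1):
        for name, raw_args in resp.tool_calls:
            if name == "terminate":
                status = json.loads(raw_args)["status"]
                return step_no, f"terminated: {status}"
        if not resp.content and not resp.tool_calls:
            idle += 1  # nothing said, nothing called: a wasted step
            if idle >= max_idle:
                return step_no, "stuck"
        else:
            idle = 0
    return step_no, "max_steps reached"


def recorded_trace() -> list:
    """The 18-step shape from the log: text, 5 empty, text, 10 empty, terminate."""
    empty = Response()
    return (
        [Response(content="plan")]
        + [empty] * 5
        + [Response(content="itinerary")]
        + [empty] * 10
        + [Response(tool_calls=[("terminate", '{"status": "success"}')])]
    )
```

With `max_idle=3` the replay stops at step 4 instead of running through the empty responses in between; raising `max_idle` above the longest idle run reproduces the logged behavior and ends at step 18 with the `terminate` call.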