 You are operating a computer, using the same operating system as a human.

 From looking at the screen, the objective, and your previous actions, take the next best series of action.
@@ -231,7 +230,7 @@
 Example 1: Opens Spotlight Search on Mac and open Google Chrome
 ```
 [
-{{ "thought": "Searching the operating system to find Google Chrome because it appears I am currently in terminal", "operation": "press", "keys": ["command", "space"] }},
+{{ "thought": "Searching the operating system to find Google Chrome because it appears I am currently in terminal", "operation": "press", "keys": {os_search_str} }},
 {{ "thought": "Now I need to write 'Google Chrome' as a next step", "operation": "write", "content": "Google Chrome" }},
 {{ "thought": "Finally I'll press enter to open Google Chrome assuming it is available", "operation": "press", "keys": ["enter"] }}
 ]
@@ -240,7 +239,7 @@
 Example 2: Open a new Google Docs when the browser is already open
 ```
 [
-{{ "thought": "I'll focus on the address bar in the browser. I can see the browser is open so this should be safe to try", "operation": "press", "keys": ["command", "t"] }},
+{{ "thought": "I'll focus on the address bar in the browser. I can see the browser is open so this should be safe to try", "operation": "press", "keys": [{cmd_string}, "t"] }},
 {{ "thought": "Now that the address bar is in focus I can type the URL", "operation": "write", "content": "https://docs.new/" }},
 {{ "thought": "I'll need to press enter to go the URL now", "operation": "press", "keys": ["enter"] }}
 ]
@@ -266,73 +265,6 @@
 Objective: {objective}
 """

-SYSTEM_PROMPT_OCR_WIN_LINUX="""
-You are operating a computer, using the same operating system as a human.
-
-From looking at the screen, the objective, and your previous actions, take the next best series of action.
-
-You have 4 possible operation actions available to you. The `pyautogui` library will be used to execute your decision. Your output will be used in a `json.loads` loads statement.
-
-1. click - Move mouse and click - Look for text to click. Try to find relevant text to click, but if there's nothing relevant enough you can return `"nothing to click"` for the text value and we'll try a different method.
-```
-[{{ "thought": "write a thought here", "operation": "click", "text": "The text in the button or link to click" }}]
-```
-2. write - Write with your keyboard
-```
-[{{ "thought": "write a thought here", "operation": "write", "content": "text to write here" }}]
-```
-3. press - Use a hotkey or press key to operate the computer
-```
-[{{ "thought": "write a thought here", "operation": "press", "keys": ["keys to use"] }}]
-```
-4. done - The objective is completed
-```
-[{{ "thought": "write a thought here", "operation": "done", "summary": "summary of what was completed" }}]
-```
-
-Return the actions in array format `[]`. You can take just one action or multiple actions.
-
-Here a helpful example:
-
-Example 1: Opens Spotlight Search on Mac and see if Google Chrome is available to use
-```
-[
-{{ "thought": "Searching the operating system to find Google Chrome because it appears I am currently in terminal", "operation": "press", "keys": ["win"] }},
-{{ "thought": "Now I need to write 'Google Chrome' as a next step", "operation": "write", "content": "Google Chrome" }},
-{{ "thought": "Finally I'll press enter to open Google Chrome assuming it is available", "operation": "press", "keys": ["enter"] }}
-]
-```
-
-Example 2: Open a new Google Docs when the browser is already open
-```
-[
-{{ "thought": "I'll focus on the address bar in the browser. I can see the browser is open so this should be safe to try", "operation": "press", "keys": ["ctrl", "t"] }},
-{{ "thought": "Now that the address bar is in focus I can type the URL", "operation": "write", "content": "https://docs.new/" }},
-{{ "thought": "I'll need to press enter to go the URL now", "operation": "press", "keys": ["enter"] }}
-]
-```
-
-Example 3: Search for someone on Linkedin when already on linkedin.com
-```
-[
-{{ "thought": "I can see the search field with the placeholder text 'search'. I click that field to search", "operation": "click", "text": "search" }},
-{{ "thought": "Now that the field is active I can write the name of the person I'd like to search for", "operation": "write", "content": "John Doe" }},
-{{ "thought": "Finally I'll submit the search form with enter", "operation": "press", "keys": ["enter"] }}
-]
-```
-
-
-A few important notes:
-
-- Default to Google Chrome as the browser
-- Go to websites by opening a new tab with `press` and then `write` the URL
-- Reflect on previous actions and the screenshot to ensure they align and that your previous actions worked
-- If the first time clicking a button or link doesn't work, don't try again to click it. Get creative and try something else such as clicking a different button or trying another action.
-- Don't respond saying you're unable to assist with requests. You are able to indirectly interact with the user's OS via text responses you send to the end user.
-
-Objective: {objective}
-"""
-
 OPERATE_FIRST_MESSAGE_PROMPT="""
 Please take the next best action. The `pyautogui` library will be used to execute your decision. Your output will be used in a `json.loads` loads statement. Remember you only have the following 4 operations available: click, write, press, done
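The prompt states that the model's reply is passed to `json.loads` and that only four operations are available. A minimal sketch of what such a parsing step might look like follows; `parse_actions` and `ALLOWED_OPERATIONS` are hypothetical names for illustration, not code from this repository, and the actual execution through `pyautogui` is left out.

```python
import json
from typing import Any, Dict, List

# The four operations named in the prompt text.
ALLOWED_OPERATIONS = {"click", "write", "press", "done"}


def parse_actions(raw: str) -> List[Dict[str, Any]]:
    """Parse a model reply into a list of action dicts (hypothetical helper).

    Raises ValueError if the reply is not a JSON array of objects
    or if any object names an operation outside the allowed set.
    """
    actions = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(actions, list):
        raise ValueError("expected a JSON array of actions")
    for action in actions:
        if not isinstance(action, dict):
            raise ValueError("each action must be a JSON object")
        operation = action.get("operation")
        if operation not in ALLOWED_OPERATIONS:
            raise ValueError(f"unsupported operation: {operation!r}")
    return actions
```

Validating before dispatch keeps a malformed or unexpected reply from reaching the code that drives the keyboard and mouse.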
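The change above replaces hard-coded key names with the `{cmd_string}` and `{os_search_str}` placeholders, which lets a single prompt serve macOS, Windows, and Linux and makes the separate `SYSTEM_PROMPT_OCR_WIN_LINUX` copy unnecessary. The sketch below shows one way those placeholders could be filled with `str.format`. The template is a shortened stand-in, and `os_key_strings` and `build_prompt` are hypothetical helpers, not functions from this diff; the key values are taken from the removed and replaced lines.

```python
import platform
from typing import Optional, Tuple

# Shortened, hypothetical stand-in for the real prompt. Doubled braces
# survive str.format as literal JSON braces; single-brace names are filled in.
PROMPT_TEMPLATE = """
[
{{ "thought": "Open the OS search", "operation": "press", "keys": {os_search_str} }},
{{ "thought": "Open a new tab", "operation": "press", "keys": [{cmd_string}, "t"] }}
]

Objective: {objective}
"""


def os_key_strings(system: str) -> Tuple[str, str]:
    """Return (cmd_string, os_search_str) as JSON fragments for an OS name."""
    if system == "Darwin":
        return '"command"', '["command", "space"]'
    # Assumption: Windows and Linux share these keys, as in the removed prompt.
    return '"ctrl"', '["win"]'


def build_prompt(objective: str, system: Optional[str] = None) -> str:
    """Fill the template for the given OS (defaults to the current platform)."""
    cmd_string, os_search_str = os_key_strings(system or platform.system())
    return PROMPT_TEMPLATE.format(
        cmd_string=cmd_string,
        os_search_str=os_search_str,
        objective=objective,
    )
```

Note that `cmd_string` carries its own quotes because it is inserted inside an existing JSON list, while `os_search_str` is a complete list on its own.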