 - **Compatibility**: Designed for various multimodal models.
 - **Integration**: Currently integrated with **GPT-4v** as the default model.
 - **Future Plans**: Support for additional models.
-- **Accessibility**: Voice control thanks to [Whisper](https://github.com/mallorbc/whisper_mic) & [younesbram](https://github.com/younesbram)

-
-## Current Challenges
+### Current Challenges
 > **Note:** GPT-4V's error rate in estimating XY mouse click locations is currently quite high. This framework aims to track the progress of multimodal models over time, aspiring to achieve human-level performance in computer operation.

-## Ongoing Development
+### Ongoing Development
 At [HyperwriteAI](https://www.hyperwriteai.com/), we are developing Agent-1-Vision a multimodal model with more accurate click location predictions.

-## Agent-1-Vision Model API Access
+### Agent-1-Vision Model API Access
 We will soon be offering API access to our Agent-1-Vision model.

 If you're interested in gaining access to this API, sign up [here](https://othersideai.typeform.com/to/FszaJ1k8?typeform-source=www.hyperwriteai.com).
 If you want to contribute yourself, see [CONTRIBUTING.md](https://github.com/OthersideAI/self-operating-computer/blob/main/CONTRIBUTING.md).

-## Feedback
+### Feedback

 For any input on improving this project, feel free to reach out to [Josh](https://twitter.com/josh_bickett) on Twitter.

-## Join Our Discord Community
+### Join Our Discord Community

 For real-time discussions and community support, join our Discord server.
 - If you're already a member, join the discussion in [#self-operating-computer](https://discord.com/channels/877638638001877052/1181241785834541157).
 - If you're new, first [join our Discord Server](https://discord.gg/YqaKtyBEzM) and then navigate to the [#self-operating-computer](https://discord.com/channels/877638638001877052/1181241785834541157).

-## Follow HyperWriteAI for More Updates
+### Follow HyperWriteAI for More Updates

 Stay updated with the latest developments:
 - Follow HyperWriteAI on [Twitter](https://twitter.com/HyperWriteAI).
 - Follow HyperWriteAI on [LinkedIn](https://www.linkedin.com/company/othersideai/).

-## Compatibility
+### Compatibility
 - This project is compatible with Mac OS, Windows, and Linux (with X server installed).
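The compatibility line above makes an X server a requirement on Linux. A minimal pre-flight check might look like the sketch below. The helper `display_requirement_met` is hypothetical and not part of the project. Treating a non-empty `DISPLAY` environment variable as evidence of an X server is also an assumption: it shows a server is configured, not that it is reachable. Actually opening a connection (for example with `Xlib.display.Display()`, from the `Xlib.display` module that main.py imports) remains the definitive test.

```python
import os
import platform
from typing import Mapping, Optional


def display_requirement_met(
    env: Optional[Mapping[str, str]] = None,
    system: Optional[str] = None,
) -> bool:
    """Hypothetical pre-flight check for the README's Linux X server requirement.

    Assumption: on Linux, a non-empty DISPLAY variable is taken to mean an
    X server is configured. It does not prove the server is reachable.
    """
    # Default to the real environment and OS; parameters allow testing.
    env = os.environ if env is None else env
    system = platform.system() if system is None else system
    if system != "Linux":
        # The README lists Mac OS and Windows with no X server requirement.
        return True
    return bool(env.get("DISPLAY"))


if __name__ == "__main__":
    print(display_requirement_met())
```

Passing `env` and `system` explicitly keeps the check testable without touching the real environment.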
operate/main.py
Lines changed: 37 additions & 78 deletions
@@ -13,7 +13,7 @@
 import platform
 import Xlib.display
 import Xlib.X
-import Xlib.Xutil  # not sure if Xutil is necessary
+import Xlib.Xutil  # not sure if Xutil is necessary

 from prompt_toolkit import prompt
 from prompt_toolkit.shortcuts import message_dialog
@@ -23,7 +23,6 @@
 import matplotlib.font_manager as fm
 from openai import OpenAI
 import sys
-from whisper_mic import WhisperMic


 load_dotenv()
@@ -97,9 +96,7 @@
 Objective: {objective}
 """

-ACCURATE_PIXEL_COUNT = (
-    200  # mini_screenshot is ACCURATE_PIXEL_COUNT x ACCURATE_PIXEL_COUNT big
-)
+ACCURATE_PIXEL_COUNT = 200  # mini_screenshot is ACCURATE_PIXEL_COUNT x ACCURATE_PIXEL_COUNT big
 ACCURATE_MODE_VISION_PROMPT = """
 It looks like your previous attempted action was clicking on "x": {prev_x}, "y": {prev_y}. This has now been moved to the center of this screenshot.
 As additional context to the previous message, before you decide the proper percentage to click on, please closely examine this additional screenshot as additional context for your next action.
@@ -195,12 +192,10 @@ def supports_ansi():
     ANSI_BRIGHT_MAGENTA = ""


-def main(model, accurate_mode, voice_mode=False):
+def main(model, accurate_mode):
     """
     Main function for the Self-Operating Computer
     """
-    # Initialize WhisperMic if voice_mode is True if voice_mode is True
) # upscale the image so it's easier to see and percentage marks more visible
-screenshot.save(file_path)
+screenshot = screenshot.resize((screenshot.width * 2, screenshot.height * 2), Image.LANCZOS)  # upscale the image so it's easier to see and percentage marks more visible
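The `ACCURATE_PIXEL_COUNT` hunk in this diff sizes a `mini_screenshot` at `ACCURATE_PIXEL_COUNT` x `ACCURATE_PIXEL_COUNT` pixels, and `ACCURATE_MODE_VISION_PROMPT` tells the model that its previous click now sits at the center of that image. The sketch below shows one way such a crop region could be computed; it is not the project's code. Several pieces are hypothetical:

- the helper names `percent_to_pixels` and `crop_box`;
- the assumption that click positions arrive as fractions in [0, 1];
- the choice to shift the box inward at screen edges rather than pad it.

The returned tuple uses the `(left, upper, right, lower)` order that Pillow's `Image.crop` accepts.

```python
from typing import Tuple

# Value taken from the diff; everything else here is a hypothetical sketch.
ACCURATE_PIXEL_COUNT = 200


def percent_to_pixels(x_frac: float, y_frac: float, width: int, height: int) -> Tuple[int, int]:
    """Convert fractional screen coordinates (assumed to be in [0, 1]) to pixels."""
    # Clamp so that 1.0 maps to the last valid column/row instead of one past it.
    x = min(max(round(x_frac * width), 0), width - 1)
    y = min(max(round(y_frac * height), 0), height - 1)
    return x, y


def crop_box(
    center_x: int,
    center_y: int,
    width: int,
    height: int,
    size: int = ACCURATE_PIXEL_COUNT,
) -> Tuple[int, int, int, int]:
    """Return (left, upper, right, lower) for a size x size box around a point.

    Near an edge the box is shifted inward rather than shrunk or padded, so it
    always lies inside the width x height screenshot. If the screenshot is
    smaller than `size` in a dimension, the box covers that whole dimension.
    """
    box_w = min(size, width)
    box_h = min(size, height)
    left = min(max(center_x - box_w // 2, 0), width - box_w)
    upper = min(max(center_y - box_h // 2, 0), height - box_h)
    return left, upper, left + box_w, upper + box_h


if __name__ == "__main__":
    x, y = percent_to_pixels(0.5, 0.25, 1920, 1080)
    print((x, y))                      # (960, 270)
    print(crop_box(x, y, 1920, 1080))  # (860, 170, 1060, 370)
    print(crop_box(5, 5, 1920, 1080))  # (0, 0, 200, 200), shifted inward
```

Shifting keeps the crop a constant size, but near an edge the previous click is then no longer exactly centered, which would contradict the prompt's wording. Padding the screenshot before cropping is the alternative that preserves centering.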