- **Compatibility**: Designed for various multimodal models.
- **Integration**: Currently integrated with **GPT-4v** as the default model.
- **Future Plans**: Support for additional models.
+- **Accessibility**: Voice control thanks to [Whisper](https://github.com/mallorbc/whisper_mic) & [younesbram](https://github.com/younesbram)

-### Current Challenges
+
+## Current Challenges
> **Note:** GPT-4V's error rate in estimating XY mouse click locations is currently quite high. This framework aims to track the progress of multimodal models over time, aspiring to achieve human-level performance in computer operation.

-### Ongoing Development
+## Ongoing Development
At [HyperwriteAI](https://www.hyperwriteai.com/), we are developing Agent-1-Vision, a multimodal model with more accurate click location predictions.

-### Agent-1-Vision Model API Access
+## Agent-1-Vision Model API Access
We will soon be offering API access to our Agent-1-Vision model.

If you're interested in gaining access to this API, sign up [here](https://othersideai.typeform.com/to/FszaJ1k8?typeform-source=www.hyperwriteai.com).
If you want to contribute yourself, see [CONTRIBUTING.md](https://github.com/OthersideAI/self-operating-computer/blob/main/CONTRIBUTING.md).

-### Feedback
+## Feedback

For any input on improving this project, feel free to reach out to [Josh](https://twitter.com/josh_bickett) on Twitter.

-### Join Our Discord Community
+## Join Our Discord Community

For real-time discussions and community support, join our Discord server.
- If you're already a member, join the discussion in [#self-operating-computer](https://discord.com/channels/877638638001877052/1181241785834541157).
- If you're new, first [join our Discord Server](https://discord.gg/YqaKtyBEzM) and then navigate to the [#self-operating-computer](https://discord.com/channels/877638638001877052/1181241785834541157).

-### Follow HyperWriteAI for More Updates
+## Follow HyperWriteAI for More Updates

Stay updated with the latest developments:
- Follow HyperWriteAI on [Twitter](https://twitter.com/HyperWriteAI).
- Follow HyperWriteAI on [LinkedIn](https://www.linkedin.com/company/othersideai/).

-### Compatibility
+## Compatibility
- This project is compatible with Mac OS, Windows, and Linux (with X server installed).
operate/main.py: 78 additions & 37 deletions
@@ -13,7 +13,7 @@
import platform
import Xlib.display
import Xlib.X
-import Xlib.Xutil  # not sure if Xutil is necessary
+import Xlib.Xutil  # not sure if Xutil is necessary

from prompt_toolkit import prompt
from prompt_toolkit.shortcuts import message_dialog
@@ -23,6 +23,7 @@
import matplotlib.font_manager as fm
from openai import OpenAI
import sys
+from whisper_mic import WhisperMic


load_dotenv()
@@ -96,7 +97,9 @@
Objective: {objective}
"""

-ACCURATE_PIXEL_COUNT = 200  # mini_screenshot is ACCURATE_PIXEL_COUNT x ACCURATE_PIXEL_COUNT big
+ACCURATE_PIXEL_COUNT = (
+    200  # mini_screenshot is ACCURATE_PIXEL_COUNT x ACCURATE_PIXEL_COUNT big
+)
ACCURATE_MODE_VISION_PROMPT = """
It looks like your previous attempted action was clicking on "x": {prev_x}, "y": {prev_y}. This has now been moved to the center of this screenshot.
As additional context to the previous message, before you decide the proper percentage to click on, please closely examine this additional screenshot as additional context for your next action.
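The accurate-mode prompt above says the previous click has been moved to the center of a mini screenshot that is `ACCURATE_PIXEL_COUNT` pixels on each side. As a rough illustration of that idea, here is a minimal sketch that computes the crop box for such a mini screenshot. The helper name `mini_screenshot_box`, the assumption that the previous click arrives as percentages in the range 0 to 100, and the edge-clamping behavior are all hypothetical choices made for this example; they are not taken from the diff.

```python
ACCURATE_PIXEL_COUNT = 200  # side length of the square mini screenshot, as in the diff


def mini_screenshot_box(prev_x_pct, prev_y_pct, screen_w, screen_h,
                        size=ACCURATE_PIXEL_COUNT):
    """Return (left, top, right, bottom) for a size x size box around a click.

    Hypothetical helper: prev_x_pct and prev_y_pct are assumed to be
    percentages of the screen width and height in the range [0, 100].
    The screen is assumed to be at least `size` pixels in each dimension.
    """
    # Convert the percentage position to absolute pixel coordinates.
    center_x = round(screen_w * prev_x_pct / 100)
    center_y = round(screen_h * prev_y_pct / 100)

    half = size // 2
    # Shift the box back inside the screen when the click is near an edge.
    left = min(max(center_x - half, 0), max(screen_w - size, 0))
    top = min(max(center_y - half, 0), max(screen_h - size, 0))
    return (left, top, left + size, top + size)


if __name__ == "__main__":
    print(mini_screenshot_box(50, 50, 1920, 1080))
```

Note that clamping keeps the box on screen, so a click close to an edge will not sit exactly at the center of the crop. Whether the project clamps, pads, or handles edges some other way is not visible in this hunk.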
@@ -192,10 +195,12 @@ def supports_ansi():
    ANSI_BRIGHT_MAGENTA = ""


-def main(model, accurate_mode):
+def main(model, accurate_mode, voice_mode=False):
    """
    Main function for the Self-Operating Computer
    """
+    # Initialize WhisperMic if voice_mode is True
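The rest of this hunk is not shown, so the actual initialization code is unknown. As a hedged sketch of what conditional microphone setup could look like, the helper below returns a `WhisperMic` instance only when voice mode is requested. The function name `init_microphone` is hypothetical, and the lazy import inside the function is a deliberate swap for illustration: the diff itself imports `WhisperMic` at module level.

```python
def init_microphone(voice_mode=False):
    """Return a WhisperMic instance when voice_mode is on, otherwise None.

    Hypothetical helper for illustration; not the code from this commit.
    """
    if not voice_mode:
        return None
    try:
        # Imported lazily so text-only runs do not need the optional dependency.
        from whisper_mic import WhisperMic
    except ImportError as exc:
        raise RuntimeError(
            "Voice mode requires the 'whisper_mic' package to be installed."
        ) from exc
    return WhisperMic()


if __name__ == "__main__":
    mic = init_microphone(voice_mode=False)
    print(mic)
```

With voice mode enabled, the returned object would then be used to capture the spoken objective, for example through the `listen()` method described in the whisper_mic README. Keeping the import lazy is one way to let users without a microphone setup run the tool in text mode.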
screenshot = screenshot.resize((screenshot.width * 2, screenshot.height * 2), Image.LANCZOS)  # upscale the image so it's easier to see and percentage marks more visible