|
11 | 11 | <img src="https://github.com/OthersideAI/self-operating-computer/blob/main/readme/self-operating-computer.png" width="750" style="margin: 10px;"/> |
12 | 12 | </div> |
13 | 13 |
|
14 | | -### Key Features |
| 14 | +## Key Features |
15 | 15 | - **Compatibility**: Designed for various multimodal models. |
16 | 16 | - **Integration**: Currently integrated with **GPT-4V** as the default model. |
17 | 17 | - **Future Plans**: Support for additional models. |
18 | 18 |
|
19 | | -### Current Challenges |
| 19 | +## Current Challenges |
20 | 20 | > **Note:** GPT-4V's error rate in estimating XY mouse click locations is currently quite high. This framework aims to track the progress of multimodal models over time, aspiring to achieve human-level performance in computer operation. |
21 | 21 |
|
22 | | -### Ongoing Development |
| 22 | +## Ongoing Development |
23 | 23 | At [HyperwriteAI](https://www.hyperwriteai.com/), we are developing Agent-1-Vision, a multimodal model with more accurate click location predictions. |
24 | 24 |
|
25 | | -### Agent-1-Vision Model API Access |
| 25 | +## Agent-1-Vision Model API Access |
26 | 26 | We will soon be offering API access to our Agent-1-Vision model. |
27 | 27 |
|
28 | 28 | If you're interested in gaining access to this API, sign up [here](https://othersideai.typeform.com/to/FszaJ1k8?typeform-source=www.hyperwriteai.com). |
@@ -89,26 +89,47 @@ operate |
89 | 89 | <img src="https://github.com/OthersideAI/self-operating-computer/blob/main/readme/terminal-access-2.png" width="300" style="margin: 10px;"/> |
90 | 90 | </div> |
91 | 91 |
|
92 | | -### Contributions are Welcomed!: |
| 92 | +## Using `operate` Modes |
| 93 | + |
| 94 | +### Voice Mode |
| 95 | +- Install the additional requirements from `requirements-audio.txt`: |
| 96 | +``` |
| 97 | +pip install -r requirements-audio.txt |
| 98 | +``` |
| 99 | +**Install device requirements** |
| 100 | +- For macOS users: |
| 101 | +``` |
| 102 | +brew install portaudio |
| 103 | +``` |
| 104 | +- For Linux users: |
| 105 | +``` |
| 106 | +sudo apt install portaudio19-dev python3-pyaudio |
| 107 | +``` |
| 108 | +Run with voice mode |
| 109 | +``` |
| 110 | +operate --voice |
| 111 | +``` |
| 112 | + |
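The voice mode steps above can be collected into one small script. This is only a sketch: the function names `portaudio_install_cmd` and `voice_setup_steps` are hypothetical and not part of the project, the Linux branch assumes an apt-based distribution as in the instructions above, and the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): prints the voice mode
# setup commands for a given OS. It only echoes; nothing is installed.

# Map a `uname -s` value to the PortAudio install command listed above.
portaudio_install_cmd() {
  case "$1" in
    Darwin) echo "brew install portaudio" ;;
    Linux)  echo "sudo apt install portaudio19-dev python3-pyaudio" ;;  # assumes apt
    *)      return 1 ;;  # no instructions given for other platforms
  esac
}

# Print every step in order for the given OS (defaults to the current one).
voice_setup_steps() {
  os="${1:-$(uname -s)}"
  echo "pip install -r requirements-audio.txt"
  portaudio_install_cmd "$os" || return 1
  echo "operate --voice"
}

voice_setup_steps || echo "No PortAudio instructions for this platform." >&2
```

Running the script lists the commands for the current platform; calling `voice_setup_steps Darwin` or `voice_setup_steps Linux` lists them for a specific one.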
| 113 | +## Contributions are Welcomed!: |
93 | 114 |
|
94 | 115 | If you want to contribute yourself, see [CONTRIBUTING.md](https://github.com/OthersideAI/self-operating-computer/blob/main/CONTRIBUTING.md). |
95 | 116 |
|
96 | | -### Feedback |
| 117 | +## Feedback |
97 | 118 |
|
98 | 119 | For any input on improving this project, feel free to reach out to [Josh](https://twitter.com/josh_bickett) on Twitter. |
99 | 120 |
|
100 | | -### Join Our Discord Community |
| 121 | +## Join Our Discord Community |
101 | 122 |
|
102 | 123 | For real-time discussions and community support, join our Discord server. |
103 | 124 | - If you're already a member, join the discussion in [#self-operating-computer](https://discord.com/channels/877638638001877052/1181241785834541157). |
104 | 125 | - If you're new, first [join our Discord Server](https://discord.gg/YqaKtyBEzM) and then navigate to the [#self-operating-computer](https://discord.com/channels/877638638001877052/1181241785834541157) channel. |
105 | 126 |
|
106 | | -### Follow HyperWriteAI for More Updates |
| 127 | +## Follow HyperWriteAI for More Updates |
107 | 128 |
|
108 | 129 | Stay updated with the latest developments: |
109 | 130 | - Follow HyperWriteAI on [Twitter](https://twitter.com/HyperWriteAI). |
110 | 131 | - Follow HyperWriteAI on [LinkedIn](https://www.linkedin.com/company/othersideai/). |
111 | 132 |
|
112 | | -### Compatibility |
| 133 | +## Compatibility |
113 | 134 | - This project is compatible with macOS, Windows, and Linux (with X server installed). |
114 | 135 |
|