Commit 1f6c011

Add back ### Voice Mode to readme
1 parent d4aecc9 commit 1f6c011

1 file changed: +30 -9 lines changed

README.md

Lines changed: 30 additions & 9 deletions
@@ -11,18 +11,18 @@
 <img src="https://github.com/OthersideAI/self-operating-computer/blob/main/readme/self-operating-computer.png" width="750" style="margin: 10px;"/>
 </div>
 
-### Key Features
+## Key Features
 - **Compatibility**: Designed for various multimodal models.
 - **Integration**: Currently integrated with **GPT-4v** as the default model.
 - **Future Plans**: Support for additional models.
 
-### Current Challenges
+## Current Challenges
 > **Note:** GPT-4V's error rate in estimating XY mouse click locations is currently quite high. This framework aims to track the progress of multimodal models over time, aspiring to achieve human-level performance in computer operation.
 
-### Ongoing Development
+## Ongoing Development
 At [HyperwriteAI](https://www.hyperwriteai.com/), we are developing Agent-1-Vision a multimodal model with more accurate click location predictions.
 
-### Agent-1-Vision Model API Access
+## Agent-1-Vision Model API Access
 We will soon be offering API access to our Agent-1-Vision model.
 
 If you're interested in gaining access to this API, sign up [here](https://othersideai.typeform.com/to/FszaJ1k8?typeform-source=www.hyperwriteai.com).
@@ -89,26 +89,47 @@ operate
 <img src="https://github.com/OthersideAI/self-operating-computer/blob/main/readme/terminal-access-2.png" width="300" style="margin: 10px;"/>
 </div>
 
-### Contributions are Welcomed!:
+## Using `operate` Modes
+
+### Voice Mode
+- Install the additional `requirements-audio.txt`
+```
+pip install -r requirements-audio.txt
+```
+**Install device requirements**
+- For mac users:
+```
+brew install portaudio
+```
+- For Linux users:
+```
+sudo apt install portaudio19-dev python3-pyaudio
+```
+Run with voice mode
+```
+operate --voice
+```
+
+## Contributions are Welcomed!:
 
 If you want to contribute yourself, see [CONTRIBUTING.md](https://github.com/OthersideAI/self-operating-computer/blob/main/CONTRIBUTING.md).
 
-### Feedback
+## Feedback
 
 For any input on improving this project, feel free to reach out to [Josh](https://twitter.com/josh_bickett) on Twitter.
 
-### Join Our Discord Community
+## Join Our Discord Community
 
 For real-time discussions and community support, join our Discord server.
 - If you're already a member, join the discussion in [#self-operating-computer](https://discord.com/channels/877638638001877052/1181241785834541157).
 - If you're new, first [join our Discord Server](https://discord.gg/YqaKtyBEzM) and then navigate to the [#self-operating-computer](https://discord.com/channels/877638638001877052/1181241785834541157).
 
-### Follow HyperWriteAI for More Updates
+## Follow HyperWriteAI for More Updates
 
 Stay updated with the latest developments:
 - Follow HyperWriteAI on [Twitter](https://twitter.com/HyperWriteAI).
 - Follow HyperWriteAI on [LinkedIn](https://www.linkedin.com/company/othersideai/).
 
-### Compatibility
+## Compatibility
 - This project is compatible with Mac OS, Windows, and Linux (with X server installed).
