Commit 631a522

Add ### Set-of-Mark Prompting
1 parent b7cbd84 commit 631a522

1 file changed: +13 -0 lines changed

README.md

Lines changed: 13 additions & 0 deletions
@@ -123,6 +123,19 @@ Start `operate` with the Gemini model
```
operate -m gemini-pro-vision
```

### Set-of-Mark Prompting `-m gpt-4-with-som`
The Self-Operating Computer Framework now supports Set-of-Mark (SoM) Prompting through the `gpt-4-with-som` model option. SoM is a visual prompting method that improves the visual grounding of large multimodal models.

Learn more about SoM Prompting in the [arXiv paper](https://arxiv.org/abs/2310.11441).

For this initial version, a simple YOLOv8 model was trained for button detection, and its `best.pt` weights file is included under `model/weights/`. You are encouraged to swap in your own `best.pt` file to evaluate performance improvements. If your model outperforms the existing one, please contribute it by opening a pull request (PR).

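
As a rough illustration of how detected buttons could become marks for the model to refer to, here is a minimal sketch. It is not the framework's actual implementation: the `Box` type, the `assign_marks` helper, the `M1`, `M2`, ... label format, and the percentage-based click targets are all assumptions made for this example. The boxes stand in for whatever the button detector returns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixels (hypothetical detector output)."""
    x1: float
    y1: float
    x2: float
    y2: float


def assign_marks(boxes, image_width, image_height):
    """Label each box and map the label to its center point.

    Boxes are sorted top-to-bottom, then left-to-right, so labels follow
    reading order. Centers are returned as percentages of the screenshot
    size, rounded to two decimals.
    """
    ordered = sorted(boxes, key=lambda b: (b.y1, b.x1))
    marks = {}
    for index, box in enumerate(ordered, start=1):
        center_x = (box.x1 + box.x2) / 2
        center_y = (box.y1 + box.y2) / 2
        # "M<n>" is an illustrative label format, not the framework's.
        marks[f"M{index}"] = (
            round(100 * center_x / image_width, 2),
            round(100 * center_y / image_height, 2),
        )
    return marks


if __name__ == "__main__":
    detected = [Box(100, 200, 300, 260), Box(50, 40, 150, 80)]
    print(assign_marks(detected, image_width=1000, image_height=500))
```

With a mapping like this, the model only needs to answer with a label, and the caller can translate that label back into a screen position.
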
Start `operate` with the SoM model
```
operate -m gpt-4-with-som
```
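
Before opening a pull request with a replacement `best.pt`, it helps to have a number showing that the new weights detect buttons better. The sketch below is one possible way to get that number, not a procedure prescribed by the project: it assumes you have hand-labeled button boxes for a few screenshots and have already extracted each model's predicted boxes as `(x1, y1, x2, y2)` tuples. The `iou` and `precision_recall` helpers and the 0.5 threshold are choices made for this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predicted, ground_truth, threshold=0.5):
    """Greedily match each prediction to at most one ground-truth box.

    A prediction counts as a true positive when its best remaining
    ground-truth match has IoU >= threshold.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predicted:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= threshold:
            unmatched.remove(best)
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


if __name__ == "__main__":
    labeled = [(0, 0, 10, 10), (20, 20, 30, 30)]
    predicted = [(0, 0, 10, 10), (50, 50, 60, 60)]
    print(precision_recall(predicted, labeled))
```

Running the same labeled screenshots through the bundled weights and your own gives two precision/recall pairs that can be compared directly and quoted in the PR description.
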
### Voice Mode `--voice`
The framework supports voice inputs for the objective. Try voice by following the instructions below.
