We use vLLM to enable batched generation. First, install dependencies:
```bash
pip install vllm openai
```
## Start server
```bash
python -m vllm.entrypoints.openai.api_server \
--model YOUR_MODEL_NAME --port 8000
```
You can also start multiple servers on different ports to enable parallel generation. In `generate.py`, we scan ports 8000 to 8009 to find available servers; you can modify the code to use other ports.
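For example, here is a minimal sketch that launches one server per GPU on consecutive ports; the GPU count, port range, and model name are placeholders to adapt to your setup:

```bash
# Sketch: start one vLLM server per GPU on ports 8000-8003 (adjust to your hardware).
for i in 0 1 2 3; do
  CUDA_VISIBLE_DEVICES=$i python -m vllm.entrypoints.openai.api_server \
    --model YOUR_MODEL_NAME --port $((8000 + i)) &
done
```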
## Generate data
The following command makes the model continue the first prompt from each sample in `DATA_PATH`; this is suitable for models that can play both roles in a conversation (e.g., Zephyr 7B). If you want to use all prompts in each sample to repeatedly talk to the model, use `--chat` instead. `--chat` mode works for more models but may take longer to generate due to repeated computation (contributions of a better implementation are welcome).
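The exact flags are defined in `generate.py`; the invocation below is only a sketch, with `--data_path` and `--output_path` as assumed option names (check the script's arguments for the real interface):

```bash
# Sketch only: --model, --data_path, and --output_path are assumed flag names;
# --chat is the documented mode switch. Check generate.py for the actual arguments.
python generate.py \
  --model YOUR_MODEL_NAME \
  --data_path DATA_PATH \
  --output_path continuations.json
# Add --chat to repeatedly talk to the model using all prompts in each sample.
```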
When generating with `--chat`, the output file follows the ShareGPT format ([example](https://github.com/lm-sys/FastChat/blob/main/data/dummy_conversation.json)).
You can use the following command to convert text generated without `--chat` to the same format:
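A sketch of what that conversion step might look like, assuming a hypothetical `convert_to_sharegpt.py` helper; substitute the conversion script that ships with this repository:

```bash
# Hypothetical script name and flags; use the repository's actual conversion utility.
python convert_to_sharegpt.py \
  --input continuations.json \
  --output continuations_sharegpt.json
```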