Commit 394f505

update datasets.md

Signed-off-by: guangli.bao <[email protected]>
1 parent a347948 · commit 394f505

2 files changed: +25 -24 lines changed

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# ShareGPT Datasets

You can use ShareGPT_V3_unfiltered_cleaned_split.json as a benchmark dataset.
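The prepare script below handles the download for you, but if you prefer to fetch the raw file manually, it is hosted on Hugging Face (the URL below is the location commonly used in vLLM-style benchmarks; verify it matches your setup):

```bash
# Download the raw ShareGPT dataset file (assumed Hugging Face location)
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
```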
## Example Commands

Download and prepare the ShareGPT dataset. You can specify the proportion of data to process by passing a number between 0 and 1 as an argument to the script.
```bash
cd contrib/sharegpt_preprocess
pip install -r requirements.txt
bash prepare_sharegpt_data.sh 1
```
In this example, `1` indicates processing 100% of the dataset; you can adjust this value as needed. A Conda environment is recommended for installing the dependencies.
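For example, to process only 10% of the dataset (an illustrative fraction; any value between 0 and 1 works the same way):

```bash
# Process 10% of the ShareGPT dataset
bash prepare_sharegpt_data.sh 0.1
```

Once the data is prepared, run the benchmark against it: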
```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type "throughput" \
  --data-args '{"prompt_column": "value", "split": "train"}' \
  --max-requests 10 \
  --data "/${local_path}/ShareGPT.json"
```
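As a quick sanity check before benchmarking, you can count the records in the prepared file. This is a minimal sketch that assumes the prepare script emits a top-level JSON array, the same layout as the upstream ShareGPT_V3_unfiltered_cleaned_split.json:

```bash
# Print the number of records in the prepared file; a non-zero count
# confirms the preprocessing step produced usable data.
python3 -c 'import json, sys; print(len(json.load(open(sys.argv[1]))))' "/${local_path}/ShareGPT.json"
```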

docs/datasets.md

Lines changed: 0 additions & 24 deletions
@@ -220,27 +220,3 @@ benchmark_generative_text(data=data, ...)
- For lists of dictionaries, all items must have the same keys.
- For lists of items, all elements must be of the same type.
- A processor/tokenizer is only required if `GUIDELLM__PREFERRED_PROMPT_TOKENS_SOURCE="local"` or `GUIDELLM__PREFERRED_OUTPUT_TOKENS_SOURCE="local"` is set in the environment. In this case, the processor/tokenizer must be specified using the `--processor` argument. If not set, the processor/tokenizer will be set to the model passed in or retrieved from the server.

(removed in this commit:)

### ShareGPT Datasets

You can use ShareGPT_V3_unfiltered_cleaned_split.json as a benchmark dataset.

#### Example Commands

Download and prepare the ShareGPT dataset. You can specify the proportion of data to process by passing a number between 0 and 1 as an argument to the script.

```bash
cd contrib/sharegpt_preprocess && pip install -r requirements.txt && bash prepare_sharegpt_data.sh 1
```

In this example, `1` indicates processing 100% of the dataset; you can adjust this value as needed. A Conda environment is recommended for installing the dependencies.

```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type "throughput" \
  --data-args '{"prompt_column": "value", "split": "train"}' \
  --max-requests 10 \
  --data "/${local_path}/ShareGPT.json"
```
