Commit 394f505

update datasets.md

Signed-off-by: guangli.bao <[email protected]>
1 parent a347948 · commit 394f505

2 files changed: +25 -24 lines changed

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# ShareGPT Datasets

You can use ShareGPT_V3_unfiltered_cleaned_split.json as a benchmark dataset.
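The prepare script below handles the download for you, but if you prefer to fetch the raw file manually, it is hosted on Hugging Face (the URL below is the location commonly used in vLLM-style benchmarks; verify it matches your setup):

```bash
# Download the raw ShareGPT dataset file (assumed Hugging Face location)
wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
```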
## Example Commands

Download and prepare the ShareGPT dataset. You can specify the proportion of data to process by passing a number between 0 and 1 as an argument to the script.
```bash
cd contrib/sharegpt_preprocess
pip install -r requirements.txt
bash prepare_sharegpt_data.sh 1
```
In this example, `1` indicates processing 100% of the dataset; you can adjust this value as needed. A Conda environment is recommended for installing the dependencies.
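For example, to process only 10% of the dataset (an illustrative fraction; any value between 0 and 1 works the same way):

```bash
# Process 10% of the ShareGPT dataset
bash prepare_sharegpt_data.sh 0.1
```

Once the data is prepared, run the benchmark against it: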
```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type "throughput" \
  --data-args '{"prompt_column": "value", "split": "train"}' \
  --max-requests 10 \
  --data "/${local_path}/ShareGPT.json"
```
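As a quick sanity check before benchmarking, you can count the records in the prepared file. This is a minimal sketch that assumes the prepare script emits a top-level JSON array, the same layout as the upstream ShareGPT_V3_unfiltered_cleaned_split.json:

```bash
# Print the number of records in the prepared file; a non-zero count
# confirms the preprocessing step produced usable data.
python3 -c 'import json, sys; print(len(json.load(open(sys.argv[1]))))' "/${local_path}/ShareGPT.json"
```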

docs/datasets.md

Lines changed: 0 additions & 24 deletions
@@ -220,27 +220,3 @@ benchmark_generative_text(data=data, ...)
- For lists of dictionaries, all items must have the same keys.
- For lists of items, all elements must be of the same type.
- A processor/tokenizer is only required if `GUIDELLM__PREFERRED_PROMPT_TOKENS_SOURCE="local"` or `GUIDELLM__PREFERRED_OUTPUT_TOKENS_SOURCE="local"` is set in the environment. In this case, the processor/tokenizer must be specified using the `--processor` argument. If not set, the processor/tokenizer will be set to the model passed in or retrieved from the server.

(removed in this commit:)

### ShareGPT Datasets

You can use ShareGPT_V3_unfiltered_cleaned_split.json as a benchmark dataset.

#### Example Commands

Download and prepare the ShareGPT dataset. You can specify the proportion of data to process by passing a number between 0 and 1 as an argument to the script.

```bash
cd contrib/sharegpt_preprocess && pip install -r requirements.txt && bash prepare_sharegpt_data.sh 1
```

In this example, `1` indicates processing 100% of the dataset; you can adjust this value as needed. A Conda environment is recommended for installing the dependencies.

```bash
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type "throughput" \
  --data-args '{"prompt_column": "value", "split": "train"}' \
  --max-requests 10 \
  --data "/${local_path}/ShareGPT.json"
```
