
Commit a546a16

update code
Signed-off-by: guangli.bao <[email protected]>
Parent: d246dee

4 files changed: 16 additions & 22 deletions


docs/datasets.md

Lines changed: 14 additions & 20 deletions
````diff
@@ -221,32 +221,26 @@ benchmark_generative_text(data=data, ...)
 - For lists of items, all elements must be of the same type.
 - A processor/tokenizer is only required if `GUIDELLM__PREFERRED_PROMPT_TOKENS_SOURCE="local"` or `GUIDELLM__PREFERRED_OUTPUT_TOKENS_SOURCE="local"` is set in the environment. In this case, the processor/tokenizer must be specified using the `--processor` argument. If not set, the processor/tokenizer will be set to the model passed in or retrieved from the server.
 
-
 ### ShareGPT Datasets
 
 You can use ShareGPT_V3_unfiltered_cleaned_split.json as a benchmark dataset.
 
-1. Download and prepare the ShareGPT dataset
-You can specify the proportion of data to process by providing a number between 0 and 1 as an argument to the script.
+#### Example Commands
 
-```bash
-cd src/guidellm/utils
-pip install -r requirements.txt
-bash prepare_sharegpt_data.sh 1
-```
+Download and prepare the ShareGPT dataset. You can specify the proportion of data to process by providing a number between 0 and 1 as an argument to the script.
 
-In this example, 1 indicates processing 100% of the dataset. You can adjust this value as needed.
+```bash
+cd src/guidellm/utils && pip install -r requirements.txt && bash prepare_sharegpt_data.sh 1
+```
 
-Conda env Recommanded to install libs.
+In this example, 1 indicates processing 100% of the dataset. You can adjust this value as needed. A Conda environment is recommended for installing the required libraries.
 
-2. Run the benchmark
-Example:
-
-```bash
-guidellm benchmark \
-  --target "http://localhost:8000" \
-  --rate-type "throughput" \
-  --data-args '{"prompt_column": "value", "split": "train"}' \
-  --max-requests 10 \
-  --data "/${local_path}/ShareGPT.json"
-```
+```bash
+guidellm benchmark \
+  --target "http://localhost:8000" \
+  --rate-type "throughput" \
+  --data-args '{"prompt_column": "value", "split": "train"}' \
+  --max-requests 10 \
+  --data "/${local_path}/ShareGPT.json"
+```
````
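The context note above says a processor/tokenizer is only needed when the token sources are set to "local". A hedged sketch of that configuration, reusing the benchmark command from this diff; the `--processor` value is a placeholder model id, not something this commit prescribes:

```bash
# Only needed when token counts are computed locally (see the note above).
export GUIDELLM__PREFERRED_PROMPT_TOKENS_SOURCE="local"
export GUIDELLM__PREFERRED_OUTPUT_TOKENS_SOURCE="local"

guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type "throughput" \
  --data-args '{"prompt_column": "value", "split": "train"}' \
  --max-requests 10 \
  --processor "mistralai/Mistral-7B-Instruct-v0.3" \
  --data "/${local_path}/ShareGPT.json"
```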
src/guidellm/utils/prepare_sharegpt_data.sh

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
 #!/bin/bash
 
 wget https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered/resolve/main/ShareGPT_V3_unfiltered_cleaned_split.json
-python3 sharegpt_data_preprocessing.py --parse $1
+python3 preprocessing_sharegpt_data.py --parse $1
```
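Since the wrapper forwards its first positional argument to `--parse`, a fractional run follows directly from the docs above; a usage sketch (the 0.5 value is illustrative):

```bash
cd src/guidellm/utils
# Download ShareGPT and preprocess only half of the conversations.
bash prepare_sharegpt_data.sh 0.5
```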
src/guidellm/utils/requirements.txt

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
 tqdm
 pandas
 openai
-pyyaml
+pyyaml
```

The `pyyaml` entry is unchanged in content; the paired deletion and addition apparently reflect a newline fix at the end of the file.
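The docs above recommend a Conda environment for these libraries; a minimal sketch (the environment name and Python version are arbitrary choices, not from this commit):

```bash
# Isolated env for the ShareGPT preprocessing utilities.
conda create -n guidellm-prep python=3.11 -y
conda activate guidellm-prep
pip install -r src/guidellm/utils/requirements.txt  # tqdm, pandas, openai, pyyaml
```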
