Commit 4c9ac83

Update beginner documentation (#1822)
SUMMARY: While exploring the LLM-Compressor project, I noticed that several beginner-level examples in the documentation were out of date and no longer run as written. This PR fixes these small issues so the docs use non-deprecated code. A summary of the changes is below:

- Use `SamplingParams` as an input to `model.generate()`, since the old code no longer worked
- Align the CLI and `curl` examples: use "TinyLlama-1.1B-Chat-v1.0-INT8" consistently (removes the `./` prefix but keeps the model key consistent between `vllm serve` and `curl`)
- Update import paths as needed

These changes affect only documentation, not runtime code.

TEST PLAN: All changes here **only affect documentation**. All changes to the example code blocks were tested locally in a blank Python 3.9 conda environment with `llmcompressor` and `vllm` installed.

Signed-off-by: Rayan Syed <[email protected]>

3 files changed: +13 −7
docs/getting-started/deploy.md

Lines changed: 6 additions & 4 deletions

````diff
@@ -24,11 +24,13 @@ Before deploying your model, ensure you have the following prerequisites:
 vLLM provides a Python API for easy integration with your applications, enabling you to load and use your compressed model directly in your Python code. To test the compressed model, use the following code:
 
 ```python
-from vllm import LLM
+from vllm import LLM, SamplingParams
 
 model = LLM("./TinyLlama-1.1B-Chat-v1.0-INT8")
-output = model.generate("What is machine learning?", max_tokens=256)
-print(output)
+sampling_params = SamplingParams(max_tokens=256)
+outputs = model.generate("What is machine learning?", sampling_params)
+for output in outputs:
+    print(output.outputs[0].text)
 ```
 
 After running the above code, you should see the generated output from your compressed model. This confirms that the model is loaded and ready for inference.
````
````diff
@@ -39,7 +41,7 @@ vLLM also provides an HTTP server for serving your model via a RESTful API that
 To start the HTTP server, use the following command:
 
 ```bash
-vllm serve "./TinyLlama-1.1B-Chat-v1.0-INT8"
+vllm serve "TinyLlama-1.1B-Chat-v1.0-INT8"
 ```
 
 By default, the server will run on `localhost:8000`. You can change the host and port by using the `--host` and `--port` flags. Now that the server is running, you can send requests to it using any HTTP client. For example, you can use `curl` to send a request:
````
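The `curl` side of the alignment falls outside this hunk. As a hedged sketch, a request against vLLM's OpenAI-compatible completions endpoint with the aligned model key might look like the following; the endpoint path and payload follow vLLM's standard serving API, not this diff:

```bash
# Sketch only: assumes the server started above is running on the
# default localhost:8000 with vLLM's OpenAI-compatible API.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "TinyLlama-1.1B-Chat-v1.0-INT8",
    "prompt": "What is machine learning?",
    "max_tokens": 256
  }'
```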

docs/getting-started/install.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -38,7 +38,7 @@ If you need a specific version of LLM Compressor, you can specify the version number
 pip install llmcompressor==0.5.1
 ```
 
-Replace `0.1.0` with your desired version number.
+Replace `0.5.1` with your desired version number.
 
 ### Install from Source
 
````

docs/guides/saving_a_model.md

Lines changed: 6 additions & 2 deletions

````diff
@@ -69,7 +69,7 @@ If you need more control, you can wrap `save_pretrained` manually:
 
 ```python
 from transformers import AutoModelForCausalLM
-from llmcompressor.transformers.sparsification import modify_save_pretrained
+from llmcompressor.transformers.sparsification.compressed_tensors_utils import modify_save_pretrained
 
 # Load model
 model = AutoModelForCausalLM.from_pretrained("your-model")
````
````diff
@@ -88,7 +88,11 @@ model.save_pretrained(
 ### Saving with Custom Sparsity Configuration
 
 ```python
-from compressed_tensors.sparsification import SparsityCompressionConfig
+from transformers import AutoModelForCausalLM
+from compressed_tensors import SparsityCompressionConfig
+
+# Load model
+model = AutoModelForCausalLM.from_pretrained("your-model")
 
 # Create custom sparsity config
 custom_config = SparsityCompressionConfig(
````
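The hunk cuts off at the `SparsityCompressionConfig(` call. A hedged sketch of how it might continue: the `format` and `sparsity_structure` fields are assumptions drawn from the compressed-tensors API, and the `sparsity_config` kwarg from llmcompressor's wrapped `save_pretrained`; none of these appear in this diff.

```python
# Sketch only: field names and values are illustrative assumptions,
# not taken from the documentation this commit edits.
custom_config = SparsityCompressionConfig(
    format="sparse-bitmask",          # assumed compressed-tensors format name
    sparsity_structure="unstructured",
)

# Pass the custom config through the wrapped save_pretrained
model.save_pretrained(
    "your-model-compressed",
    save_compressed=True,
    sparsity_config=custom_config,
)
```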
