
Commit 665c94b

Update Blog “using-structured-outputs-in-vllm”
1 parent 6246b17 commit 665c94b

2 files changed: +16 -17 lines

content/blog/using-structured-outputs-in-vllm.md

Lines changed: 16 additions & 17 deletions
tags:
- opensource
- LLM
---
Generating predictable and reliable outputs from large language models (LLMs) can be challenging, especially when those outputs need to integrate seamlessly with downstream systems. Structured outputs solve this problem by enforcing specific formats, such as JSON, regex patterns, or even grammars. vLLM, an open source inference and serving engine for LLMs, has supported structured outputs for some time, but there was no documentation on how to use them, which is why I decided to contribute and write the [Structured Outputs documentation page](https://docs.vllm.ai/en/latest/usage/structured_outputs.html).

## Why structured outputs?
LLMs are incredibly powerful, but their outputs can be inconsistent when a specific format is required. Structured outputs address this issue by restricting the model’s generated text to adhere to predefined rules or formats, ensuring:

How do these tools work? The idea is to filter the list of possible next tokens so that we always generate a token that is valid for the desired output format.
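The token-filtering idea can be sketched in a few lines of standalone Python. This is a toy illustration with a made-up five-word vocabulary, not vLLM's actual implementation, which compiles the constraint into an efficient token-level automaton:

```python
import math

# Toy vocabulary and raw model scores (logits) for the next token.
vocab = ["positive", "negative", "neutral", "{", "hello"]
logits = [2.0, 1.5, 1.8, 0.3, 0.9]

# Constraint: the output must be one of these options (as in guided choice).
allowed = {"positive", "negative"}

# Mask: push the logit of every invalid token to -inf so its
# probability becomes exactly zero after the softmax.
masked = [x if tok in allowed else -math.inf for tok, x in zip(vocab, logits)]

# Softmax over the masked logits: only valid tokens can ever be sampled.
weights = [math.exp(x) for x in masked]
probs = [w / sum(weights) for w in weights]

for tok, p in zip(vocab, probs):
    print(f"{tok}: {p:.3f}")
```

Whatever the model "wanted" to say, every token outside the allowed set ends up with zero probability, so sampling can only ever produce valid output.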
![Structured outputs in vLLM](/img/structured_outputs_thumbnail.png "Structured outputs in vLLM")

## What is vLLM?

vLLM is a state-of-the-art, open-source inference and serving engine for LLMs. It’s built for performance and simplicity, offering:

* **PagedAttention:** An innovative memory management mechanism for efficient attention key-value handling.
* **Continuous batching:** Supports concurrent requests dynamically.
* **Advanced optimizations:** Includes features like quantization, speculative decoding, and CUDA graphs.

These optimizations make vLLM one of the fastest and most versatile engines for production environments.
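As a rough intuition for PagedAttention, here is a toy bookkeeping sketch (simplified assumptions, not vLLM's implementation): the KV cache is split into fixed-size blocks, and each sequence maps logical token positions to physical blocks on demand instead of reserving one large contiguous region up front:

```python
BLOCK_SIZE = 4                # tokens per KV-cache block (toy value)

free_blocks = list(range(8))  # pool of physical block ids
block_table = []              # logical block index -> physical block id

def block_for(position: int) -> int:
    """Return the physical block holding this token, allocating lazily."""
    logical = position // BLOCK_SIZE
    if logical == len(block_table):          # first token of a new block
        block_table.append(free_blocks.pop(0))
    return block_table[logical]

# Generating 10 tokens touches only 3 blocks; nothing is preallocated.
placement = [block_for(i) for i in range(10)]
print(block_table)  # [0, 1, 2]
```

Because blocks are claimed only as tokens are generated, memory that a naive contiguous allocation would reserve for the maximum sequence length stays free for other concurrent requests.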
vLLM extends the OpenAI API with additional parameters to enable structured outputs: `guided_choice`, `guided_regex`, `guided_json`, and `guided_grammar`.

Here’s how each works, along with example outputs:

### **1. Guided choice**

The simplest form of structured output, ensuring the response is one of a set of predefined options.
```python
# Reconstructed example (model name and prompt are illustrative);
# `guided_choice` restricts generation to one of the listed options.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {"role": "user", "content": "Classify the sentiment of this text: vLLM is wonderful!"}
    ],
    extra_body={"guided_choice": ["positive", "negative"]},
)
print(completion.choices[0].message.content)
```

**Example output:**

```
positive
```

### **2. Guided regex**

Constrains the generated text to match a regular expression, such as an email address pattern.

```python
# Reconstructed example (model, prompt, and pattern are illustrative);
# `guided_regex` forces the output to match the given pattern.
completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {"role": "user", "content": "Generate an example email address for Alan Turing."}
    ],
    extra_body={"guided_regex": r"\w+@\w+\.com\n"},
)
print(completion.choices[0].message.content)
```

**Example output:**
```
alan.turing@enigma.com
```

### **3. Guided JSON**

Enforces a JSON schema on the output, so responses can be parsed directly by downstream systems.

```python
# Reconstructed example (model, prompt, and schema are illustrative);
# `guided_json` accepts a JSON schema the output must follow.
from pydantic import BaseModel

class CarDescription(BaseModel):
    brand: str
    model: str
    car_type: str

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {"role": "user", "content": "Generate a JSON describing an iconic car from the 90's."}
    ],
    extra_body={"guided_json": CarDescription.model_json_schema()},
)
print(completion.choices[0].message.content)
```
131130

132-
**Example Output:**
131+
**Example output:**
133132

134133
```json
{
  ...
}
```
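Since a schema-constrained response is guaranteed to be valid JSON with the expected fields, downstream code can parse it directly without defensive error handling. A standalone sketch, where the field names and sample string are hypothetical stand-ins for a real model response:

```python
import json
from dataclasses import dataclass

@dataclass
class Car:
    brand: str
    model: str

# Stand-in for text returned by the model under a guided JSON schema.
raw = '{"brand": "Jeep", "model": "Wrangler"}'

# Parsing cannot fail on shape: the schema already guaranteed these keys.
car = Car(**json.loads(raw))
print(car.brand, car.model)  # Jeep Wrangler
```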
### **4. Guided grammar**

Uses an Extended Backus–Naur Form (EBNF) grammar to define complex output structures, such as SQL queries.
```python
# Reconstructed example (model, prompt, and grammar are illustrative);
# `guided_grammar` takes an EBNF grammar the output must follow.
simplified_sql_grammar = """
root ::= "SELECT " columns " FROM " table " WHERE " condition ";"
columns ::= "*"
table ::= [a-z]+
condition ::= [a-z]+ " > " [0-9]+
"""

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {"role": "user", "content": "Generate an SQL query selecting all users older than 30."}
    ],
    extra_body={"guided_grammar": simplified_sql_grammar},
)
print(completion.choices[0].message.content)
```
**Example output:**
```sql
SELECT * FROM users WHERE age > 30;
```
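Downstream code can still sanity-check grammar-constrained output before executing it. A standalone toy sketch using a regular expression that mirrors this simple SELECT shape (the pattern is an illustrative assumption, not part of the blog's grammar):

```python
import re

# Illustrative pattern for outputs shaped like the SQL query above.
SQL_SHAPE = re.compile(r"SELECT \* FROM [A-Za-z_]+ WHERE [A-Za-z_]+ > \d+;")

output = "SELECT * FROM users WHERE age > 30;"
print(bool(SQL_SHAPE.fullmatch(output)))  # True
```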
## **Next steps**
To start integrating structured outputs into your projects:

1. **Explore the documentation:** Check out the official documentation for more examples and detailed explanations.
2. **Install vLLM locally:** Set up the inference server on your local machine using the vLLM GitHub repository.
3. **Experiment with structured outputs:** Try out different formats (choice, regex, JSON, grammar) and observe how they can simplify your workflow.
4. **Deploy in production:** Once comfortable, deploy vLLM to your production environment and integrate it with your applications.

Structured outputs make LLMs not only powerful but also practical for real-world applications. Dive in and see what you can build!