Generating predictable and reliable outputs from large language models (LLMs) can be challenging, especially when those outputs need to integrate seamlessly with downstream systems. Structured outputs solve this problem by enforcing specific formats, such as JSON, regex patterns, or even grammars. vLLM has supported this for some time, but there was no documentation on how to use it, which is why I decided to contribute and write the Structured Outputs documentation page (https://docs.vllm.ai/en/latest/usage/structured_outputs.html).

### Why Structured Outputs?

LLMs are incredibly powerful, but their outputs can be inconsistent when a specific format is required. Structured outputs address this issue by restricting the model’s generated text to adhere to predefined rules or formats, ensuring:

1. **Reliability:** Outputs are predictable and machine-readable.
2. **Compatibility:** Seamless integration with APIs, databases, or other systems.
3. **Efficiency:** No need for extensive post-processing to validate or fix outputs.

Imagine we have an external system that receives a JSON payload with all the details needed to trigger an alert, and we want our LLM-based system to be able to use it. Of course, we can try to explain to the LLM what the output format should be and that it must be valid JSON, but LLMs are not deterministic, so we may still end up with invalid JSON. If you have tried something like this before, you have probably found yourself in this situation.
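
Structured outputs remove that failure mode. Here is a minimal sketch, assuming a vLLM OpenAI-compatible server already running on localhost:8000; the `Alert` schema, model name, and prompt are illustrative, not part of any real alerting system:

```python
# Minimal sketch: constrain the completion to a JSON Schema so the alert
# payload always parses. Assumes a local vLLM OpenAI-compatible server;
# the Alert schema, model name, and prompt are illustrative.
from openai import OpenAI
from pydantic import BaseModel


class Alert(BaseModel):
    name: str
    severity: str
    description: str


client = OpenAI(base_url="http://localhost:8000/v1", api_key="-")

completion = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[{
        "role": "user",
        "content": "Generate an alert for high CPU usage on host web-01.",
    }],
    extra_body={"guided_json": Alert.model_json_schema()},
)

# The generated text is constrained to conform to the schema, so parsing
# cannot fail on malformed JSON.
alert = Alert.model_validate_json(completion.choices[0].message.content)
print(alert)
```
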
How do these tools work? The idea is to filter the list of possible next tokens so that the model only ever generates a token that is valid for the desired output format.
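
To make that concrete, here is a toy illustration of the idea (not vLLM's actual implementation, which operates on token IDs and logit tensors): before each sampling step, every token that would break the desired format has its score masked to minus infinity, so it can never be chosen.

```python
# Toy sketch of constrained decoding: mask the scores of format-breaking
# tokens so only valid continuations can be sampled. Illustrative only.
import math


def mask_invalid(logits: dict[str, float], is_valid) -> dict[str, float]:
    """Set the score of every invalid token to -inf so it cannot be sampled."""
    return {tok: (score if is_valid(tok) else -math.inf)
            for tok, score in logits.items()}


# Suppose the format requires the next token to be a digit.
logits = {"7": 2.1, "hello": 3.5, '"': 1.0, "3": 0.4}
masked = mask_invalid(logits, str.isdigit)

# "hello" had the highest raw score, but it is invalid; the winner is now
# the highest-scoring *valid* token.
print(max(masked, key=masked.get))  # -> 7
```
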

### What is vLLM?

vLLM is a state-of-the-art, open-source inference and serving engine for LLMs. It’s built for performance and simplicity, offering:

- **PagedAttention:** An innovative memory management mechanism for efficient attention key-value handling.

To start integrating structured outputs into your projects:

1. **Explore the Documentation:** Check out the official documentation for more examples and detailed explanations.
2. **Install vLLM Locally:** Set up the inference server on your local machine using the vLLM GitHub repository.
3. **Experiment with Structured Outputs:** Try out the different formats (choice, regex, JSON, grammar) and observe how they can simplify your workflow; a sketch of two of them follows this list.
4. **Deploy in Production:** Once comfortable, deploy vLLM to your production environment and integrate it with your applications.
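
As a starting point for step 3, here is a minimal sketch of two of those formats against a local server (started, for example, with `vllm serve <model>`); the model name and prompts are illustrative:

```python
# Sketch of the guided_choice and guided_regex modes against a local
# vLLM OpenAI-compatible server. Model name and prompts are illustrative.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="-")

# choice: the model can only answer with one of the listed strings.
choice = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[{
        "role": "user",
        "content": "Is this review positive or negative? 'Great product!'",
    }],
    extra_body={"guided_choice": ["positive", "negative"]},
)
print(choice.choices[0].message.content)  # "positive" or "negative", nothing else

# regex: the output must match the given pattern, here a simple email shape.
email = client.chat.completions.create(
    model="Qwen/Qwen2.5-3B-Instruct",
    messages=[{
        "role": "user",
        "content": "Generate an example email address for Alan Turing at enigma.com.",
    }],
    extra_body={"guided_regex": r"\w+@enigma\.com\n"},
)
print(email.choices[0].message.content)
```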