The bug
In trying out the experimental `guidance.models.experimental.LiteLLM` model from this notebook, I get a sequence of `Warning: can't backtrack over ⟦',⟧; this may confuse the model` warnings.
In cases where I see this backtrack warning in the vLLM logs, guidance raises a `KeyError` in `models/_base/_model.py`.
To Reproduce
I'm working with a function that generates `n` items in a list. I've confirmed that the code below works with other guidance backends (llamacpp and transformers).
First, I start my vLLM server with:

```shell
vllm serve google/gemma-3-4b-it --host 0.0.0.0 \
    --port 8000 \
    --enable-prefix-caching \
    --guided-decoding-backend guidance \
    --max-model-len 1000
```
Then:

````python
import guidance
import typing as t
from guidance.chat import Gemma29BInstructChatTemplate

SHOW_BUG = True

if SHOW_BUG:
    lm = guidance.models.experimental.LiteLLM({
        "model_name": "google/gemma-3-4b-it",
        "litellm_params": {
            "model": "hosted_vllm/google/gemma-3-4b-it",
            "api_base": "http://localhost:8000/v1",  # change to your vLLM API base URL
        },
    }, echo=False)
else:
    # The below works
    lm = guidance.models.Transformers(
        "google/gemma-3-4b-it", device_map='auto', chat_template=Gemma29BInstructChatTemplate, echo=False
    )

@guidance(stateless=True, dedent=False)
def gen_list(
    lm,
    options: t.Optional[t.List[str]] = None,
):
    """Generate 3 quoted strings in a list"""
    if options:
        single_item = guidance.select(options, list_append=True, name="response")
    else:
        single_item = guidance.gen(
            max_tokens=100,
            # Stop at Python list item separators
            stop_regex="""(\n|',|",|']|"])""",
            list_append=True,
            name="response",
        )  # type: ignore
        single_item = guidance.select(["'" + single_item + "'", '"' + single_item + '"'])
    single_item += guidance.optional(", ")  # type: ignore
    return lm + "[" + guidance.sequence(single_item, min_length=3, max_length=3) + "]"

with guidance.user():
    lm += "Give me a Python list of 3 strings."
with guidance.assistant():
    lm += "```python\nl = " + gen_list() + "```"

print(str(lm))
print(lm['response'])
````

When `SHOW_BUG == False`:
````
<start_of_turn>user
Give me a Python list of 3 strings.<end_of_turn>
<start_of_turn>model
```python
l =["apple", "banana", "cherry"]```
['apple', 'banana', 'cherry']
````
When `SHOW_BUG == True`:

````
<start_of_turn>user
Give me a Python list of 3 strings.<end_of_turn>
<start_of_turn>model
```python
l =["apple"",""banana" , "cherry""]"]```
Traceback (most recent call last):
  File "/home/parkervg/miniconda3/envs/blendsql/lib/python3.10/site-packages/guidance/models/_base/_model.py", line 225, in __getitem__
    captures = self._interpreter.state.captures[key]
KeyError: 'response'
````
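Not part of the original report, but a possible temporary mitigation while the root cause is open: since the traceback shows `__getitem__` raising `KeyError` when the `response` capture was never recorded, callers can guard the lookup. A minimal sketch (the `get_capture` helper is hypothetical, not a guidance API):

```python
# Hypothetical helper, not part of guidance: fall back to a default
# instead of crashing when a named capture is missing.
def get_capture(lm, key, default=None):
    try:
        return lm[key]  # raises KeyError if the capture was never recorded
    except KeyError:
        return default

# Any mapping with KeyError-on-miss semantics behaves the same way:
print(get_capture({}, "response"))                       # -> None
print(get_capture({"response": ["apple"]}, "response"))  # -> ['apple']
```

This only papers over the symptom; the underlying issue is that the backtracking failure drops the capture entirely.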
System info
- Ubuntu 25.04
- cuda 13.0
- vLLM==0.10.2
- guidance==0.3.0
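For anyone triaging: the `stop_regex` in the repro is intended to stop generation at Python list-item separators (a newline, or a closing quote followed by `,` or `]`). A quick sanity check of the pattern with plain `re`, outside guidance, showing it matches the expected separators:

```python
import re

# Same pattern as in gen_list above: a newline, or a closing quote
# followed by a comma or a closing bracket.
stop_regex = """(\n|',|",|']|"])"""

# The first separator in a double-quoted list is the `",` after "apple".
m = re.search(stop_regex, '"apple", "banana"')
print(m.group(0))  # -> ",
```

So the pattern itself behaves as intended; the malformed `l =["apple"",""banana" , ...` output only appears with the LiteLLM/vLLM backend.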