'Can't backtrack' Error with Experimental hosted_vllm LiteLLM #1383

@parkervg

Description

The bug
While trying out the experimental guidance.models.experimental.LiteLLM model from this notebook, I get a stream of Warning: can't backtrack over ⟦',⟧; this may confuse the model warnings.

Whenever this backtrack warning shows up in the vLLM logs, guidance then raises a KeyError in models/_base/_model.py.

To Reproduce
I'm working with a function that generates n items in a list. I've confirmed the code below works with other guidance backends (llamacpp and transformers).

First, I start my vLLM server with:

vllm serve google/gemma-3-4b-it --host 0.0.0.0 \
--port 8000 \
--enable-prefix-caching \
--guided-decoding-backend guidance \
--max-model-len 1000

Then:

import guidance
import typing as t
from guidance.chat import Gemma29BInstructChatTemplate

SHOW_BUG = True
if SHOW_BUG:
    lm = guidance.models.experimental.LiteLLM({
        "model_name": "google/gemma-3-4b-it",
        "litellm_params": {
              "model": "hosted_vllm/google/gemma-3-4b-it",
              "api_base": "http://localhost:8000/v1",  # change to your vLLM API base URL
          },
    }, echo=False)
else:
    # The below works
    lm = guidance.models.Transformers(
        "google/gemma-3-4b-it", device_map='auto', chat_template=Gemma29BInstructChatTemplate, echo=False
    )

@guidance(stateless=True, dedent=False)
def gen_list(
    lm,
    options: t.Optional[t.List[str]] = None,
):
    """Generate 3 quoted strings in a list"""
    if options:
        single_item = guidance.select(options, list_append=True, name="response")
    else:
        single_item = guidance.gen(
            max_tokens=100,
            # Stop at Python list item separators
            stop_regex="""(\n|',|",|']|"])""",
            list_append=True,
            name="response",
        )  # type: ignore
    single_item = guidance.select(["'" + single_item + "'", '"' + single_item + '"'])
    single_item += guidance.optional(", ")  # type: ignore
    return lm + "[" + guidance.sequence(single_item, min_length=3, max_length=3) + "]"

with guidance.user():
    lm += "Give me a Python list of 3 strings."
with guidance.assistant():
    lm += "```python\nl = " + gen_list() + "```"

print(str(lm))
print(lm['response'])
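For what it's worth, the stop_regex above behaves as intended in isolation: it matches a newline, or a closing quote followed by a comma or closing bracket. A quick standalone sanity check with plain re, outside guidance:

```python
import re

# Same pattern as in gen_list(): newline, or a quote-plus-delimiter pair
# that ends a Python list item.
STOP_REGEX = """(\n|',|",|']|"])"""

for text in ["apple',", 'banana",', "cherry']", 'dates"]', "plain\n"]:
    m = re.search(STOP_REGEX, text)
    print(repr(text), "->", repr(m.group(0)))
```

So the separators the grammar expects to stop on are all matched correctly; the duplicated quotes in the buggy output don't come from the regex itself.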

When SHOW_BUG == False:

<start_of_turn>user
Give me a Python list of 3 strings.<end_of_turn>
<start_of_turn>model
```python
l =["apple", "banana", "cherry"]```
['apple', 'banana', 'cherry']

When SHOW_BUG == True:

<start_of_turn>user
Give me a Python list of 3 strings.<end_of_turn>
<start_of_turn>model
```python
l =["apple"",""banana" , "cherry""]"]```
Traceback (most recent call last):
  File "/home/parkervg/miniconda3/envs/blendsql/lib/python3.10/site-packages/guidance/models/_base/_model.py", line 225, in __getitem__
    captures = self._interpreter.state.captures[key]
KeyError: 'response'
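Until the underlying capture loss is fixed, guarding the lookup at least avoids the hard crash. A minimal sketch (safe_capture is a hypothetical helper, and the stub below only mimics the dict-style indexing shown in the traceback, not a real guidance model):

```python
# Hypothetical workaround: guard the capture lookup so a missing
# capture yields a default instead of an unhandled KeyError.
def safe_capture(lm, key, default=None):
    """Return lm[key], or `default` if the capture is missing."""
    try:
        return lm[key]
    except KeyError:
        return default

# Stub standing in for a model whose 'response' capture was lost.
class _BrokenModel:
    def __getitem__(self, key):
        raise KeyError(key)

print(safe_capture(_BrokenModel(), "response"))  # -> None
```

This only papers over the symptom, of course; the captured list is still lost whenever the backtrack warning fires.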

System info

  • Ubuntu 25.04
  • cuda 13.0
  • vLLM==0.10.2
  • guidance==0.3.0
