'Can't backtrack' Error with Experimental hosted_vllm LiteLLM #1383

@parkervg

Description

The bug
While trying out the experimental guidance.models.experimental.LiteLLM model from this notebook, I get a stream of Warning: can't backtrack over ⟦',⟧; this may confuse the model warnings.

Whenever this backtrack warning shows up in the vLLM logs, guidance then raises a KeyError in models/_base/_model.py.

To Reproduce
I'm working with a function that generates n items in a list. I've confirmed the code below works with other guidance backends (llamacpp and transformers).

First, I start my vLLM server with:

vllm serve google/gemma-3-4b-it --host 0.0.0.0 \
--port 8000 \
--enable-prefix-caching \
--guided-decoding-backend guidance \
--max-model-len 1000

Then:

import guidance
import typing as t
from guidance.chat import Gemma29BInstructChatTemplate

SHOW_BUG = True
if SHOW_BUG:
    lm = guidance.models.experimental.LiteLLM({
        "model_name": "google/gemma-3-4b-it",
        "litellm_params": {
              "model": "hosted_vllm/google/gemma-3-4b-it",
              "api_base": "http://localhost:8000/v1",  # change to your vLLM API base URL
          },
    }, echo=False)
else:
    # The below works
    lm = guidance.models.Transformers(
        "google/gemma-3-4b-it", device_map='auto', chat_template=Gemma29BInstructChatTemplate, echo=False
    )

@guidance(stateless=True, dedent=False)
def gen_list(
    lm,
    options: t.Optional[t.List[str]] = None,
):
    """Generate 3 quoted strings in a list"""
    if options:
        single_item = guidance.select(options, list_append=True, name="response")
    else:
        single_item = guidance.gen(
            max_tokens=100,
            # Stop at Python list item separators
            stop_regex="""(\n|',|",|']|"])""",
            list_append=True,
            name="response",
        )  # type: ignore
    single_item = guidance.select(["'" + single_item + "'", '"' + single_item + '"'])
    single_item += guidance.optional(", ")  # type: ignore
    return lm + "[" + guidance.sequence(single_item, min_length=3, max_length=3) + "]"

with guidance.user():
    lm += "Give me a Python list of 3 strings."
with guidance.assistant():
    lm += "```python\nl = " + gen_list() + "```"

print(str(lm))
print(lm['response'])
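For what it's worth, the stop_regex above behaves as intended in isolation: it matches a newline, or a closing quote followed by a comma or closing bracket. A quick standalone sanity check with plain re, outside guidance:

```python
import re

# Same pattern as in gen_list(): newline, or a quote-plus-delimiter pair
# that ends a Python list item.
STOP_REGEX = """(\n|',|",|']|"])"""

for text in ["apple',", 'banana",', "cherry']", 'dates"]', "plain\n"]:
    m = re.search(STOP_REGEX, text)
    print(repr(text), "->", repr(m.group(0)))
```

So the separators the grammar expects to stop on are all matched correctly; the duplicated quotes in the buggy output don't come from the regex itself.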

When SHOW_BUG == False:

<start_of_turn>user
Give me a Python list of 3 strings.<end_of_turn>
<start_of_turn>model
```python
l =["apple", "banana", "cherry"]```
['apple', 'banana', 'cherry']

When SHOW_BUG == True:

<start_of_turn>user
Give me a Python list of 3 strings.<end_of_turn>
<start_of_turn>model
```python
l =["apple"",""banana" , "cherry""]"]```
Traceback (most recent call last):
  File "/home/parkervg/miniconda3/envs/blendsql/lib/python3.10/site-packages/guidance/models/_base/_model.py", line 225, in __getitem__
    captures = self._interpreter.state.captures[key]
KeyError: 'response'
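Until the underlying capture loss is fixed, guarding the lookup at least avoids the hard crash. A minimal sketch (safe_capture is a hypothetical helper, and the stub below only mimics the dict-style indexing shown in the traceback, not a real guidance model):

```python
# Hypothetical workaround: guard the capture lookup so a missing
# capture yields a default instead of an unhandled KeyError.
def safe_capture(lm, key, default=None):
    """Return lm[key], or `default` if the capture is missing."""
    try:
        return lm[key]
    except KeyError:
        return default

# Stub standing in for a model whose 'response' capture was lost.
class _BrokenModel:
    def __getitem__(self, key):
        raise KeyError(key)

print(safe_capture(_BrokenModel(), "response"))  # -> None
```

This only papers over the symptom, of course; the captured list is still lost whenever the backtrack warning fires.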

System info

  • Ubuntu 25.04
  • cuda 13.0
  • vLLM==0.10.2
  • guidance==0.3.0
