Bug Report: VLLMModel breaks when using vllm > 0.10.1
Description
VLLMModel in smolagents breaks when using vllm version 0.10.1 or higher, due to API changes in vllm that removed the guided_decoding_backend parameter.
Steps to Reproduce
- Install vllm > 0.10.1
- Install smolagents 1.22.0
- Initialize a VLLMModel
- Create a CodeAgent with the VLLMModel
- Run GradioUI with the CodeAgent
- Chat with the agent
Code to Reproduce
from smolagents import CodeAgent, GradioUI, VLLMModel


def main():
    model = VLLMModel(
        model_id="HuggingFaceTB/SmolLM3-3B",
        model_kwargs={
            "max_model_len": 4096,
            "max_num_batched_tokens": 4096,
        },
    )
    agent = CodeAgent(model=model, tools=[])
    gradio_ui = GradioUI(agent)
    gradio_ui.launch()


if __name__ == "__main__":
    main()
Expected Behavior
The agent should work normally with vllm 0.10.1+.
Actual Behavior
The following exception is raised:
gradio.exceptions.Error: "Error in interaction: Error in generating model output:\nLLM.generate() got an unexpected keyword argument 'guided_options_request'"
Root Cause
Starting from vllm 0.10.1, guided_decoding_backend was removed in PR #21347. According to the vllm structured outputs documentation, the migration path is to drop guided_decoding_backend and instead pass structured_outputs (a StructuredOutputsParams instance) inside the sampling_params.
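For illustration, here is a minimal, hedged sketch of the API change for the JSON-schema case. The model name and schema are placeholders; the old call shape is inferred from the error above, and the new one follows the vllm structured outputs documentation:

from vllm import LLM, SamplingParams
from vllm.sampling_params import StructuredOutputsParams

schema = {"type": "object", "properties": {"answer": {"type": "string"}}}
llm = LLM(model="HuggingFaceTB/SmolLM3-3B")

# Old path (older vllm): smolagents forwarded the constraint through the
# now-removed guided_options_request keyword of LLM.generate(), hence the TypeError.

# New path: attach the constraint to SamplingParams via structured_outputs.
sampling_params = SamplingParams(
    max_tokens=128,
    structured_outputs=StructuredOutputsParams(json=schema),
)
out = llm.generate("Reply as JSON: what is 2 + 2?", sampling_params=sampling_params)
print(out[0].outputs[0].text)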
Proposed Solution
The VLLMModel.generate() method needs to be updated to convert the old guided_options_request format to the new structured_outputs format. Here's a potential fix:
# Import paths below match smolagents 1.22.0; adjust if your version exposes
# these elsewhere (e.g. smolagents.models / smolagents.monitoring).
from smolagents import ChatMessage, MessageRole, TokenUsage, VLLMModel


class PatchedVLLMModel(VLLMModel):
    def generate(
        self,
        messages,
        stop_sequences=None,
        response_format=None,
        tools_to_call_from=None,
        **kwargs,
    ) -> ChatMessage:
        # NOTE: This overrides smolagents' VLLMModel.generate to convert
        # the old 'guided_options_request' to the new 'structured_outputs' format.
        from vllm import SamplingParams  # type: ignore
        from vllm.sampling_params import StructuredOutputsParams  # type: ignore

        completion_kwargs = self._prepare_completion_kwargs(
            messages=messages,
            flatten_messages_as_text=(not self._is_vlm),
            stop_sequences=stop_sequences,
            tools_to_call_from=tools_to_call_from,
            **kwargs,
        )
        messages = completion_kwargs.pop("messages")
        prepared_stop_sequences = completion_kwargs.pop("stop", [])
        tools = completion_kwargs.pop("tools", None)
        completion_kwargs.pop("tool_choice", None)

        prompt = self.tokenizer.apply_chat_template(
            messages,
            tools=tools,
            add_generation_prompt=True,
            tokenize=False,
        )

        # Convert the old guided_options_request format to the new structured_outputs format.
        structured_outputs_params = None
        if response_format:
            if "json_schema" in response_format:
                # Extract the JSON schema from the response_format
                json_schema = response_format["json_schema"]["schema"]
                structured_outputs_params = StructuredOutputsParams(json=json_schema)
            elif "choice" in response_format:
                # Handle choice-based structured outputs
                structured_outputs_params = StructuredOutputsParams(choice=response_format["choice"])
            elif "regex" in response_format:
                # Handle regex-based structured outputs
                structured_outputs_params = StructuredOutputsParams(regex=response_format["regex"])
            elif "grammar" in response_format:
                # Handle grammar-based structured outputs
                structured_outputs_params = StructuredOutputsParams(grammar=response_format["grammar"])
            elif "structural_tag" in response_format:
                # Handle structural-tag-based structured outputs
                structured_outputs_params = StructuredOutputsParams(structural_tag=response_format["structural_tag"])
            else:
                print(f"WARNING: Unsupported response_format type: {response_format}")
                structured_outputs_params = None

        sampling_params = SamplingParams(
            n=kwargs.get("n", 1),
            temperature=kwargs.get("temperature", 0.7),
            max_tokens=kwargs.get("max_tokens", 64),
            stop=prepared_stop_sequences,
            structured_outputs=structured_outputs_params,
        )

        out = self.model.generate(
            prompt,
            sampling_params=sampling_params,
        )
        output_text = out[0].outputs[0].text
        return ChatMessage(
            role=MessageRole.ASSISTANT,
            content=output_text,
            raw={"out": output_text, "completion_kwargs": completion_kwargs},
            token_usage=TokenUsage(
                input_tokens=len(out[0].prompt_token_ids),
                output_tokens=len(out[0].outputs[0].token_ids),
            ),
        )
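With this override in place, the reproduction script above only needs to swap in the patched class; everything else stays the same:

from smolagents import CodeAgent, GradioUI


def main():
    # Same setup as the reproduction above, but using the patched model class.
    model = PatchedVLLMModel(
        model_id="HuggingFaceTB/SmolLM3-3B",
        model_kwargs={
            "max_model_len": 4096,
            "max_num_batched_tokens": 4096,
        },
    )
    agent = CodeAgent(model=model, tools=[])
    GradioUI(agent).launch()


if __name__ == "__main__":
    main()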
Environment
- Python version: 3.12.10
- smolagents version: 1.22.0
- vllm version: 0.11.0
- OS: macOS 15.6.1
Additional Context
This is a breaking change in vllm that affects backward compatibility. The fix should maintain compatibility with both older and newer versions of vllm if possible.
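One possible direction for that, sketched below under the assumption that the installed vllm version is checked at runtime: the helper names are illustrative, not existing smolagents API, and the exact cutoff version should be confirmed against the vllm changelog.

# Sketch: pick the structured-output mechanism based on the installed vllm version.
# Only the StructuredOutputsParams path is taken from the vllm docs referenced above;
# the old-path branch mirrors what smolagents 1.22.0 currently passes to LLM.generate().
from importlib.metadata import version

from packaging.version import Version


def supports_structured_outputs() -> bool:
    # Per this report, guided_options_request was removed starting with vllm 0.10.1.
    return Version(version("vllm")) >= Version("0.10.1")


def build_generate_kwargs(sampling_params, guided_options_request=None):
    """Return keyword arguments for LLM.generate() matching the installed vllm."""
    if supports_structured_outputs() or guided_options_request is None:
        # New path: the constraint already lives on SamplingParams.structured_outputs.
        return {"sampling_params": sampling_params}
    # Old path: keep forwarding guided_options_request on pre-0.10.1 releases.
    return {
        "sampling_params": sampling_params,
        "guided_options_request": guided_options_request,
    }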