Conversation

@Ki-Seki (Contributor) commented Dec 2, 2025

Fixes #1784

@Ki-Seki (Contributor, Author) commented Dec 4, 2025

@RobinPicard Since this PR turned out a bit larger than expected, I’ve put together a detailed change log to help streamline your review. 🤗

1. Best-Effort Use of Chat Completion

For the four local model backends (llamacpp, mlxlm, transformers, and vllm_offline), this PR implements the following logic proposed in issue #1784:

If a local model provides a chat template, we assume it expects us to use it — so we do. If not, we fall back to plain completion mode. If a backend does not support chat mode at all, we also fall back to plain completion mode.

Key changes:

  • Added a helper function _check_hf_chat_template in outlines/models/tokenizer.py to centralize Hugging Face chat template checks, since the logic is shared.
  • Introduced a new property has_chat_template in the TypeAdapters of the local models, typically determined by _check_hf_chat_template.
  • When has_chat_template is True, string-based model_input is converted to chat format whenever possible, usually via format_chat_input (see the sketch after this list).
  • Updated the common/stream/batch generate functions across all four local models accordingly.
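
To make this concrete, here is a minimal sketch of the pattern. The names _check_hf_chat_template, has_chat_template, and format_chat_input are the ones introduced by this PR; the ExampleTypeAdapter class and its format_input method are simplified placeholders rather than the actual adapter code.

# Minimal sketch of the best-effort chat logic; illustrative, not the exact implementation.
from typing import Union

def _check_hf_chat_template(tokenizer) -> bool:
    # A Hugging Face tokenizer advertises chat support via its
    # `chat_template` attribute (None when absent).
    return getattr(tokenizer, "chat_template", None) is not None

class ExampleTypeAdapter:  # placeholder, stands in for a model's TypeAdapter
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    @property
    def has_chat_template(self) -> bool:
        return _check_hf_chat_template(self.tokenizer)

    def format_chat_input(self, text: str) -> list:
        # Wrap a plain string as a single-turn user message.
        return [{"role": "user", "content": text}]

    def format_input(self, model_input: Union[str, list]):
        # Best effort: use chat formatting when a template exists,
        # otherwise fall back to plain completion.
        if isinstance(model_input, str) and self.has_chat_template:
            return self.format_chat_input(model_input)
        return model_input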

2. Special Case: vLLM Offline

  • Since vLLM internally uses TokenizerBase (a non-Hugging Face tokenizer class), additional checks were added for compatibility.
  • Previously, generate_batch was assumed not to support chat mode, and the following code was present:
if any(isinstance(item, Chat) for item in model_input):
    raise TypeError(
        "Batch generation is not available for the `Chat` input type."
    )
  • After re-verification, chat mode is supported (see vLLM documentation), so this restriction has been removed.
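
For illustration only, the extra compatibility check might look roughly like the sketch below. The attribute lookups used to reach an underlying Hugging Face tokenizer are assumptions for the example, not vLLM's documented API.

def _vllm_tokenizer_has_chat_template(tokenizer) -> bool:
    # Handle both a plain Hugging Face tokenizer and a TokenizerBase-style
    # wrapper that may hold one internally (illustrative sketch only).
    if getattr(tokenizer, "chat_template", None) is not None:
        return True
    # Assumed inner attribute name for illustration; the real wrapper layout
    # may differ between vLLM versions.
    inner = getattr(tokenizer, "tokenizer", None)
    return getattr(inner, "chat_template", None) is not None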

3. Special Case: LlamaCPP

  • LlamaCPP does not automatically check for chat templates; instead, it relies on a user-provided parameter.
  • This is because llama-cpp-python provides a default fallback chat template even when the user has not explicitly configured one (source).
  • To align with this behavior, users can now explicitly set the chat_mode parameter (default: True).

Note:
This is the only interface-level change in the PR. Compared to the previous implementation, where string inputs defaulted to plain text completion, they now default to chat completion whenever possible.
I believe this is reasonable: since llama-cpp-python itself encourages a fallback chat template, this change should feel natural to users familiar with LlamaCPP.
For strict backward compatibility, however, we could consider setting the default to chat_mode=False.
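
For context, usage would look roughly like the sketch below; it assumes the flag is forwarded through outlines.from_llamacpp, and the exact signature may differ from the final code.

import llama_cpp
import outlines

llm = llama_cpp.Llama("path/to/model.gguf")  # placeholder path

# Default after this PR: string prompts are rendered through the
# (possibly fallback) chat template provided by llama-cpp-python.
chat_model = outlines.from_llamacpp(llm)

# Opt out to restore the previous plain-completion behavior.
completion_model = outlines.from_llamacpp(llm, chat_mode=False)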

4. Other Changes

  • Updated documentation to reflect the LlamaCPP interface change.
  • Added comprehensive unit tests for all new functionality, all of which pass and maintain 100% coverage.
  • Ensured consistent code style throughout.

@Ki-Seki Ki-Seki marked this pull request as ready for review December 4, 2025 07:55
Copilot AI review requested due to automatic review settings December 4, 2025 07:55
Copilot finished reviewing on behalf of Ki-Seki December 4, 2025 08:00
Copilot AI left a comment

Pull request overview

This PR implements best-effort chat completion support across multiple model adapters (vLLM, Transformers, MLX-LM, and LlamaCpp) by automatically detecting whether a model's tokenizer has a chat template and conditionally formatting string inputs as chat messages.

Key changes:

  • Added automatic chat template detection that wraps plain string inputs as user messages when a chat template is available
  • Introduced a chat_mode parameter for LlamaCpp to allow users to explicitly disable chat-style formatting
  • Implemented _check_hf_chat_template() helper function to check for HuggingFace chat template availability

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Summary per file:

  • outlines/models/tokenizer.py: Added _check_hf_chat_template() helper function to detect chat template availability
  • outlines/models/vllm_offline.py: Updated VLLMOfflineTypeAdapter to conditionally format string inputs as chat messages based on template availability
  • outlines/models/transformers.py: Modified TransformersTypeAdapter to support chat template detection and conditional formatting
  • outlines/models/mlxlm.py: Updated MLXLMTypeAdapter with chat template support and conditional string input formatting
  • outlines/models/llamacpp.py: Added chat_mode parameter to LlamaCpp model to allow explicit control over chat-style input formatting
  • docs/features/models/llamacpp.md: Updated documentation to describe the new chat_mode parameter and its usage
  • tests/models/test_tokenizer.py: Added tests for the new chat template detection function
  • tests/models/test_vllm_offline_type_adapter.py: Added tests for string input formatting with and without chat templates
  • tests/models/test_transformers_type_adapter.py: Updated tests to cover chat template conditional behavior
  • tests/models/test_mlxlm_type_adapter.py: Added tests for chat template support using mocks
  • tests/models/test_llamacpp_type_adapter.py: Added tests for chat template conditional formatting
  • tests/models/test_llamacpp.py: Added test fixture and tests for non-chat mode, updated streaming tests to handle empty tokens


@RobinPicard (Contributor) commented:

Thanks a lot for the great description! I'll review it in the coming days

@Ki-Seki (Contributor, Author) commented Dec 4, 2025

> Thanks a lot for the great description! I'll review it in the coming days

No worries at all, Robin — no rush! Really happy to be working with you. 🥳

Development

Successfully merging this pull request may close these issues.

[RFC] Inconsistent handling of str vs. Chat inputs across model backends; And we may need to unify them.
