fix: Best-effort use of chat completion #1789
base: main
Conversation
…ream test assertions
Force-pushed from 13e2ecb to 06eba73
@RobinPicard Since this PR turned out a bit larger, I've put together a detailed change log to help streamline your review. 🤗

1. Best-Effort Use of Chat Completion

For the four local model backends (vLLM offline, Transformers, MLX-LM, and LlamaCpp), plain string inputs are now formatted as chat messages whenever the model's tokenizer provides a chat template.
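In user-facing terms, the intended effect is roughly the following (a minimal sketch: the model name, loading calls, and keyword arguments below are illustrative assumptions, not part of this diff):

```python
# Illustrative sketch only; the model and call signature are assumptions.
import outlines
import transformers

model_id = "HuggingFaceTB/SmolLM2-135M-Instruct"  # any HF model that ships a chat template
model = outlines.from_transformers(
    transformers.AutoModelForCausalLM.from_pretrained(model_id),
    transformers.AutoTokenizer.from_pretrained(model_id),
)

# Previously, this plain string was passed to the model verbatim (completion style).
# With this PR, the tokenizer's chat template is detected and the string is wrapped
# as a single user message before generation; models without a template keep the
# old completion-style behavior.
answer = model("What is the capital of France?", max_new_tokens=32)
```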
Key changes:
2. Special Case: vLLM Offline
if any(isinstance(item, Chat) for item in model_input):
    raise TypeError(
        "Batch generation is not available for the `Chat` input type."
    )
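With this guard, mixing `Chat` instances into a batched input on the vLLM offline backend fails fast with an explicit `TypeError` instead of surfacing a harder-to-diagnose error later at generation time.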
3. Special Case: LlamaCPP
4. Other Changes
Pull request overview
This PR implements best-effort chat completion support across multiple model adapters (vLLM, Transformers, MLX-LM, and LlamaCpp) by automatically detecting whether a model's tokenizer has a chat template and conditionally formatting string inputs as chat messages.
Key changes:
- Added automatic chat template detection that wraps plain string inputs as user messages when a chat template is available
- Introduced a `chat_mode` parameter for LlamaCpp to allow users to explicitly disable chat-style formatting
- Implemented a `_check_hf_chat_template()` helper function to check for HuggingFace chat template availability (see the sketch below)
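The detection-plus-formatting pattern looks roughly like this (a minimal sketch: `_check_hf_chat_template()` is the helper added by this PR, but its body and the surrounding adapter function shown here are assumptions rather than the actual diff):

```python
# Sketch, not the actual diff: the helper name comes from this PR, the body is assumed.
def _check_hf_chat_template(tokenizer) -> bool:
    """Return True if the HuggingFace tokenizer ships a chat template."""
    return getattr(tokenizer, "chat_template", None) is not None


def format_str_model_input(tokenizer, prompt: str) -> str:
    """Wrap a plain string as a single user message when a chat template exists."""
    if _check_hf_chat_template(tokenizer):
        # Chat-capable model: render the prompt through the chat template.
        return tokenizer.apply_chat_template(
            [{"role": "user", "content": prompt}],
            tokenize=False,
            add_generation_prompt=True,
        )
    # No template: fall back to the previous completion-style behavior.
    return prompt
```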
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `outlines/models/tokenizer.py` | Added `_check_hf_chat_template()` helper function to detect chat template availability |
| `outlines/models/vllm_offline.py` | Updated `VLLMOfflineTypeAdapter` to conditionally format string inputs as chat messages based on template availability |
| `outlines/models/transformers.py` | Modified `TransformersTypeAdapter` to support chat template detection and conditional formatting |
| `outlines/models/mlxlm.py` | Updated `MLXLMTypeAdapter` with chat template support and conditional string input formatting |
| `outlines/models/llamacpp.py` | Added a `chat_mode` parameter to the `LlamaCpp` model to allow explicit control over chat-style input formatting |
| `docs/features/models/llamacpp.md` | Updated documentation to describe the new `chat_mode` parameter and its usage |
| `tests/models/test_tokenizer.py` | Added tests for the new chat template detection function |
| `tests/models/test_vllm_offline_type_adapter.py` | Added tests for string input formatting with and without chat templates |
| `tests/models/test_transformers_type_adapter.py` | Updated tests to cover chat template conditional behavior |
| `tests/models/test_mlxlm_type_adapter.py` | Added tests for chat template support using mocks |
| `tests/models/test_llamacpp_type_adapter.py` | Added tests for chat template conditional formatting |
| `tests/models/test_llamacpp.py` | Added test fixture and tests for non-chat mode; updated streaming tests to handle empty tokens |
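The `chat_mode` parameter noted above for `outlines/models/llamacpp.py` would be used roughly as follows (a sketch: the repo and file names are placeholders, and exactly where `chat_mode` is accepted is inferred from the summary rather than copied from the diff):

```python
# Sketch of the new `chat_mode` flag; repo_id/filename are placeholders.
from llama_cpp import Llama
from outlines.models import LlamaCpp

llm = Llama.from_pretrained(
    repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
    filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
)

# Default: if the GGUF model exposes a chat template, plain string prompts are
# formatted as chat messages (best-effort chat completion).
chat_model = LlamaCpp(llm)

# chat_mode=False opts out: the string prompt is passed to the model as-is.
raw_model = LlamaCpp(llm, chat_mode=False)
```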
…ne.py so that no runtime errors occur
Thanks a lot for the great description! I'll review it in the coming days
No worries at all, Robin — no rush! Really happy to be working with you. 🥳
Fixes #1784