Chat Bricks supports using HuggingFace model repository IDs directly as templates. When you pass a HuggingFace repo ID (e.g., `Qwen/Qwen2.5-3B-Instruct`) to the `Chat` class, it automatically creates an `HFTemplate` that uses the tokenizer's built-in chat template. This lets you work with any model whose tokenizer defines a chat template, without manually configuring the template format.
!!! warning
    The current implementation only works with chat templates that append new content as more turns of messages are added. It may not work correctly with templates that modify previously rendered prompt content (e.g., Qwen3's template, which deletes previous thinking content).
The simplest way to use a HuggingFace template is to pass the repo ID directly to `Chat`:

```python
from chat_bricks import Chat

# Use a HuggingFace repo ID directly
chat = Chat(
    template="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {"role": "user", "content": "Hello, how are you?"},
        {"role": "assistant", "content": "I am fine, thank you."}
    ],
)

# Render the prompt
prompt = chat.prompt()
print(prompt)
```

When you pass a string that is not a registered built-in template name, Chat Bricks automatically:
- Loads the tokenizer from the HuggingFace repository
- Checks that the tokenizer has a `chat_template` attribute
- Creates an `HFTemplate` instance that uses the tokenizer's chat template
- Handles message formatting, masking, and tokenization automatically
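The dispatch rule above can be sketched in a few lines. This is a simplified illustration only; the registry contents and the function name are assumptions, not the actual Chat Bricks internals:

```python
# Simplified sketch of the template-dispatch rule: a string that is not a
# registered built-in template name is treated as a HuggingFace repo ID.
# The registry contents and function name here are illustrative assumptions.
BUILTIN_TEMPLATES = {"chatml", "llama3"}  # hypothetical built-in names

def resolve_template(template: str) -> tuple[str, str]:
    if template in BUILTIN_TEMPLATES:
        return ("builtin", template)
    # Not a registered name: fall back to loading an HF tokenizer
    return ("hf_repo", template)

print(resolve_template("chatml"))                    # ('builtin', 'chatml')
print(resolve_template("Qwen/Qwen2.5-3B-Instruct"))  # ('hf_repo', 'Qwen/Qwen2.5-3B-Instruct')
```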
```python
from chat_bricks import Chat
from chat_bricks.templates import HFTemplate

# These are equivalent:
chat1 = Chat(template="Qwen/Qwen2.5-3B-Instruct", messages=messages)

# Or explicitly create the HFTemplate
hf_template = HFTemplate("Qwen/Qwen2.5-3B-Instruct")
chat2 = Chat(template=hf_template, messages=messages)
```

`HFTemplate` supports standard chat message rendering:
```python
from chat_bricks import Chat

messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "Tell me more about it."}
]

chat = Chat(template="Qwen/Qwen2.5-3B-Instruct", messages=messages)
prompt = chat.prompt()
print(prompt)
```

System messages are automatically handled according to the model's chat template:
```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
]

chat = Chat(template="Qwen/Qwen2.5-3B-Instruct", messages=messages)
prompt = chat.prompt()
```

Add generation prompts for inference:
```python
chat = Chat(
    template="Qwen/Qwen2.5-3B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Add a generation prompt for inference
prompt = chat.prompt(add_generation_prompt=True)
print(prompt)
```

`HFTemplate` supports tool definitions and tool calls:
```python
messages = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": '{"city": "Paris"}'
                }
            }
        ]
    },
    {"role": "tool", "content": "Sunny, 22°C"}
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

chat = Chat(
    template="Qwen/Qwen2.5-3B-Instruct",
    messages=messages,
    tools=tools
)
prompt = chat.prompt(tools=tools)
print(prompt)
```

`HFTemplate` provides proper tokenization with masking for training:
```python
from transformers import AutoTokenizer
from chat_bricks import Chat

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")

messages = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I am fine, thank you."}
]

chat = Chat(template="Qwen/Qwen2.5-3B-Instruct", messages=messages)

# Tokenize with proper masking
inputs = chat.tokenize(
    tokenizer=tokenizer,
    add_generation_prompt=False  # Set to True for inference
)

print("Input IDs:", inputs["input_ids"])
print("Attention Mask:", inputs["attention_mask"])
print("Labels:", inputs["labels"])            # -100 for non-assistant tokens
print("Action Mask:", inputs["action_mask"])  # 1 for assistant tokens
```

The tokenization automatically:
- Masks non-assistant tokens (sets labels to -100)
- Creates action masks (1 for assistant tokens, 0 for others)
- Handles BOS tokens if needed
- Maintains proper alignment with the chat template
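These masking rules can be illustrated with a toy, library-independent sketch. The token ids and role spans below are made up for illustration and do not come from a real tokenizer:

```python
# Toy illustration of the masking rules described above.
# Each (token_id, role) pair stands in for one token of the rendered prompt.
tokens = [
    (101, "user"), (102, "user"),            # user turn
    (201, "assistant"), (202, "assistant"),  # assistant turn
]

input_ids = [tok for tok, _ in tokens]
attention_mask = [1] * len(tokens)
# Labels: -100 for non-assistant tokens, so the loss ignores them
labels = [tok if role == "assistant" else -100 for tok, role in tokens]
# Action mask: 1 for assistant tokens, 0 for everything else
action_mask = [1 if role == "assistant" else 0 for _, role in tokens]

print(labels)       # [-100, -100, 201, 202]
print(action_mask)  # [0, 0, 1, 1]
```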
You can visualize which parts of the prompt are masked:
```python
chat = Chat(
    template="Qwen/Qwen2.5-3B-Instruct",
    messages=[
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hi there!"}
    ]
)

# Get the prompt with color-coded masking
prompt_with_mask = chat.prompt_with_mask()
print(prompt_with_mask)  # Red = masked, Green = not masked
```

You can also create an `HFTemplate` directly:
```python
from chat_bricks.templates import HFTemplate
from chat_bricks import Chat

# Create the HFTemplate directly
hf_template = HFTemplate("Qwen/Qwen2.5-3B-Instruct")

# Use it with Chat
chat = Chat(template=hf_template, messages=messages)
prompt = chat.prompt()
```

The `HFTemplate` stores the loaded tokenizer, which you can access:
```python
from chat_bricks.templates import HFTemplate

hf_template = HFTemplate("Qwen/Qwen2.5-3B-Instruct")
tokenizer = hf_template.tokenizer

# Use the tokenizer directly
print(tokenizer.chat_template)
```

You can retrieve the original Jinja chat template:
```python
from chat_bricks.templates import HFTemplate

hf_template = HFTemplate("Qwen/Qwen2.5-3B-Instruct")
jinja_template = hf_template.jinja_template()
print(jinja_template)
```

`HFTemplate` works best with chat templates that append new content as the conversation grows. It may not work correctly with templates that modify previously rendered prompt content (e.g., Qwen3's template, which deletes previous thinking content).
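One way to check whether a template behaves in this append-only fashion: render the first k messages and verify the result is always a string prefix of rendering all of them. The sketch below uses a toy render function, not a real tokenizer chat template:

```python
# Sketch of an "append-only" check for a chat template: rendering k messages
# should always be a string prefix of rendering k+1 messages. The render
# function here is a toy stand-in, not a real tokenizer chat template.
def render(messages):
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>" for m in messages)

def is_append_only(messages):
    full = render(messages)
    return all(full.startswith(render(messages[:k])) for k in range(len(messages)))

msgs = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
print(is_append_only(msgs))  # True for this toy template
```

A template that rewrites earlier turns (e.g., stripping previous thinking content) would fail this prefix check, which is exactly why iterative-rendering-based masking breaks down there.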
For vision-language models, HFTemplate automatically detects if the model is a vision model and adjusts message formatting accordingly:
```python
# For vision models, messages are kept in vision format
chat = Chat(
    template="Qwen/Qwen2.5-VL-3B-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image", "image": "path/to/image.jpg"}
            ]
        }
    ]
)
```

If a tokenizer doesn't have a chat template, an error will be raised:
```python
try:
    hf_template = HFTemplate("some-model-without-chat-template")
except Exception as e:
    print(f"Error: {e}")
    # Error: Tokenizer from some-model-without-chat-template does not have a chat_template.
```

| Feature | Built-in Templates | HFTemplate |
|---|---|---|
| Setup | Pre-configured | Automatic from HF repo |
| Customization | Full control | Limited to tokenizer's template |
| Masking | Custom logic | Automatic via iterative rendering |
| Tools | Full support | Depends on tokenizer support |
| Vision | Built-in support | Auto-detected for vision models |
| Performance | Optimized | May be slower (loads tokenizer) |
- Use HFTemplate for: Quick prototyping, working with new models, ensuring compatibility with HuggingFace's chat templates
- Use Built-in Templates for: Production systems, custom requirements, better performance, fine-grained control
- Cache Tokenizers: If using the same model multiple times, consider caching the HFTemplate instance
- Verify Template: Always test that the generated prompts match your expectations
- Check Masking: Use `prompt_with_mask()` to verify masking behavior for training
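Caching can be as simple as memoizing the template constructor by repo ID. A minimal sketch using `functools.lru_cache`; the loader here is a hypothetical stand-in for `HFTemplate`, with a counter standing in for the expensive tokenizer load:

```python
from functools import lru_cache

# Hypothetical stand-in for HFTemplate: loading a tokenizer is expensive,
# so memoize by repo ID and reuse the same instance across Chat objects.
load_count = 0

@lru_cache(maxsize=None)
def get_template(repo_id: str):
    global load_count
    load_count += 1          # stands in for the expensive tokenizer load
    return {"repo_id": repo_id}

a = get_template("Qwen/Qwen2.5-3B-Instruct")
b = get_template("Qwen/Qwen2.5-3B-Instruct")
print(a is b, load_count)  # True 1 — the second call hits the cache
```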
If masking doesn't work as expected:
- Check if the template modifies previous content (not supported)
- Verify the template format with `prompt_with_mask()`
- Consider using a built-in template if available