Description
Hi, what is the recommended best practice for evaluating a base model with inspect on, for example, gsm8k?
For example, I'd like to evaluate Qwen/Qwen3-1.7B-Base. It appears that the execution will eventually reach
if self.tokenizer.chat_template is not None:
    chat = self.tokenizer.apply_chat_template(
        hf_messages,
        add_generation_prompt=True,
        tokenize=False,
        tools=tools_list if len(tools_list) > 0 else None,
        enable_thinking=self.enable_thinking,  # not all models use this, check if it is supported
    )
else:
    chat = ""
    for message in hf_messages:
        chat += f"{message.role}: {message.content}\n"
# return
return cast(str, chat)
in the hf provider.
However, since the tokenizer of the Qwen/Qwen3-1.7B-Base base model does ship a chat template, the text will nonetheless be formatted with that template (which results in gibberish).
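A minimal sketch of the two paths illustrates the problem: the plain-text fallback is only reached when `chat_template` is `None`, and a base model's tokenizer may still carry one. (`Message` and `FakeTokenizer` below are hypothetical stand-ins for the inspect/transformers types, not the real classes.)

```python
from typing import cast


class Message:
    def __init__(self, role: str, content: str):
        self.role = role
        self.content = content


class FakeTokenizer:
    # Stand-in for a HF tokenizer; a base model's tokenizer can still
    # ship a chat template, so chat_template is not None here.
    chat_template = "{% for m in messages %}...{% endfor %}"

    def apply_chat_template(self, messages, add_generation_prompt, tokenize):
        # Simplified rendering; real tokenizers execute the Jinja template above.
        out = "".join(
            f"<|im_start|>{m.role}\n{m.content}<|im_end|>\n" for m in messages
        )
        if add_generation_prompt:
            out += "<|im_start|>assistant\n"
        return out


def format_chat(tokenizer, hf_messages) -> str:
    # Same branching as the hf provider excerpt above (tools/thinking omitted).
    if tokenizer.chat_template is not None:
        chat = tokenizer.apply_chat_template(
            hf_messages, add_generation_prompt=True, tokenize=False
        )
    else:
        chat = ""
        for message in hf_messages:
            chat += f"{message.role}: {message.content}\n"
    return cast(str, chat)


# The chat-template path wraps the prompt in chat markers even for a base model:
print(format_chat(FakeTokenizer(), [Message("user", "What is 2+2?")]))
```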
Similarly, for the vllm provider, the text always seems to get formatted into chat messages, if I'm reading the execution of its `async def generate(` path correctly.
Is there a recommended workaround for generating input text without a chat template for base models with these two providers? I suppose you could check whether "base" is in the model name, but this feels rather hacky, especially since different model families follow different conventions for whether the name without "base" refers to the base or the instruct model (e.g., Qwen2.5 vs. Qwen3). Thanks for the help!