Congratulations on your paper being accepted at NAACL! This is a very meaningful benchmark, and I'd like to follow up on it. However, when attempting to replicate the zero-shot and fine-tuning results for Gemma-2-9b-it using trl.SFTTrainer, I obtained significantly lower results than those reported in the paper. For zero-shot replication with vLLM, I am using the following prompt format:
```python
problem_prompt = (
    "Provide me with the complete, valid problem PDDL file that "
    "describes the following planning problem directly without further "
    "explanations or texts."
)
domain_prompt = "The domain for the planning problem is:"

formatted_prompts = []
for nl, domain in zip(natural_language_texts, domain_texts):
    messages = [
        {
            "role": "user",
            "content": f"{problem_prompt} {nl} {domain_prompt} {domain}",
        },
    ]
    # Collect one chat-format conversation per (description, domain) pair.
    formatted_prompts.append(messages)
```
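For reference, here is a minimal sketch of how I pass these conversations to vLLM. This is my own replication attempt, not the authors' script: the `build_messages` helper and the greedy-decoding settings (`temperature=0.0`, `max_tokens=2048`) are my assumptions, and `generate_pddl` assumes a vLLM version that provides the `LLM.chat` API.

```python
def build_messages(problem_prompt, domain_prompt, nl, domain):
    # Build a single-turn chat conversation for one planning problem.
    return [
        {
            "role": "user",
            "content": f"{problem_prompt} {nl} {domain_prompt} {domain}",
        },
    ]


def generate_pddl(formatted_prompts, model="google/gemma-2-9b-it"):
    # Imports are deferred so build_messages can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model)
    # Greedy decoding is an assumption; the paper's settings may differ.
    params = SamplingParams(temperature=0.0, max_tokens=2048)
    outputs = llm.chat(formatted_prompts, params)
    return [o.outputs[0].text for o in outputs]
```

If the paper used a different decoding configuration or an additional system prompt, that alone might explain part of the gap.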
Could you share the relevant code for zero-shot evaluation and fine-tuning, along with the corresponding launch commands? I attempted to replicate the fine-tuning results using finetune.py, but was unsuccessful.
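In case it helps pinpoint the discrepancy, this is the rough shape of my trl.SFTTrainer setup. Everything here is illustrative: the hyperparameters in `sft_config_kwargs` are my guesses rather than the paper's values, and the `processing_class` argument assumes a recent trl release (older versions used `tokenizer` instead).

```python
def sft_config_kwargs():
    # Plain dict so the settings can be inspected without trl installed.
    # All values are illustrative guesses, not the paper's hyperparameters.
    return {
        "output_dir": "gemma2-9b-pddl-sft",
        "num_train_epochs": 3,
        "per_device_train_batch_size": 1,
        "gradient_accumulation_steps": 8,
        "learning_rate": 2e-5,
        "bf16": True,
    }


def finetune(train_dataset):
    # Imports deferred so the config above can be checked without GPU deps.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import SFTConfig, SFTTrainer

    model_id = "google/gemma-2-9b-it"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    trainer = SFTTrainer(
        model=model,
        args=SFTConfig(**sft_config_kwargs()),
        train_dataset=train_dataset,
        processing_class=tokenizer,
    )
    trainer.train()
```

Knowing the actual SFTConfig values (epochs, learning rate, batch size, packing) you used would make it much easier to tell whether my setup diverges from yours.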