
Commit 3e0eb0f

helunwencser and Lunwen He authored
Do not print eos (#4654)
* allow models to use customized token ids during export (#4649)

  Summary: Llama 3.1's [bos and eos](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/blob/main/tokenizer_config.json) differ from the token ids hardcoded in the code. This PR updates the export flow to read customized token ids instead of the hardcoded ones. It also deletes a few metadata entries that are not used by the runner.

  Pull Request resolved: #4649
  Differential Revision: D61044259
  Pulled By: helunwencser

* Do not print eos

  Summary: We don't want to print eos in the response because some eos tokens could be `<|end_of_text|>`.

  Differential Revision: D61048254

---------

Co-authored-by: Lunwen He <[email protected]>
1 parent 728a29d commit 3e0eb0f
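
A minimal sketch of the idea behind the first part of the change: prefer a token id carried in the exported model over a constant previously hardcoded in the runner. The helper name and the example values below are assumptions for illustration, not the actual ExecuTorch runner API.

#include <cstdint>
#include <cstdio>
#include <optional>

// Hypothetical helper: if the exported model provides its own token id
// (e.g. Llama 3.1's eos), use it; otherwise fall back to the old
// hardcoded default. Purely illustrative, not the real API.
int64_t resolve_token_id(std::optional<int64_t> exported_id,
                         int64_t hardcoded_default) {
  return exported_id.value_or(hardcoded_default);
}

int main() {
  // Older exports without the metadata entry keep the previous behavior...
  std::printf("fallback eos: %lld\n",
              static_cast<long long>(resolve_token_id(std::nullopt, 2)));
  // ...while an id read from the exported model wins when present
  // (128001 is used here purely as an example value).
  std::printf("exported eos: %lld\n",
              static_cast<long long>(resolve_token_id(128001, 2)));
  return 0;
}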

File tree

1 file changed: +7, -7 lines changed


examples/models/llama2/runner/runner.cpp

Lines changed: 7 additions & 7 deletions
@@ -228,19 +228,19 @@ Error Runner::generate(
       tokens_managed.resize({1, static_cast<int>(token_data.size())});
     }
 
-    // print the token as string, decode it with the Tokenizer object
-    wrapped_callback(ET_UNWRAP(tokenizer_->decode(prev_token, cur_token)));
-
-    if (shouldStop_) {
-      break;
-    }
-
     // data-dependent terminating condition: we have n_eos_ number of EOS
     if (pos >= num_prompt_tokens && cur_token == eos_id_) {
       printf("\n");
       ET_LOG(Info, "\nReached to the end of generation");
       break;
     }
+
+    // print the token as string, decode it with the Tokenizer object
+    wrapped_callback(ET_UNWRAP(tokenizer_->decode(prev_token, cur_token)));
+
+    if (shouldStop_) {
+      break;
+    }
   }
   stats_.inference_end_ms = util::time_in_ms();
   printf("\n");
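
The effect of the reordering, in a reduced, self-contained sketch: the EOS check now runs before the token is decoded and handed to the callback, so strings like `<|end_of_text|>` are never emitted as part of the response. The token ids and the toy vocabulary below are made up for illustration; the real runner calls tokenizer_->decode(prev_token, cur_token) and reads eos_id_ from the exported model.

#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

int main() {
  const uint64_t eos_id = 42; // illustrative EOS id, not a real Llama id
  const std::map<uint64_t, std::string> vocab = {
      {1, "Hello"}, {2, ", world"}, {42, "<|end_of_text|>"}};
  const uint64_t generated[] = {1, 2, 42};

  for (uint64_t cur_token : generated) {
    // Check for EOS before decoding, mirroring the moved block in the diff:
    // the "<|end_of_text|>" string never reaches the output.
    if (cur_token == eos_id) {
      std::printf("\n");
      break;
    }
    // Only non-EOS tokens are decoded and surfaced to the caller.
    std::printf("%s", vocab.at(cur_token).c_str());
  }
  return 0;
}

Running this prints "Hello, world" followed by a newline, with the EOS marker suppressed.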
