examples/models/llama/main.cpp (+10 lines changed: 10 additions & 0 deletions)
@@ -42,6 +42,16 @@ DEFINE_int32(
     -1,
     "Number of CPU threads for inference. Defaults to -1, which implies we'll use a heuristic to derive the # of performant cores for a specific device.");
 
+DEFINE_int32(
+    num_bos,
+    0,
+    "Number of BOS tokens to prepend to the prompt. Defaults to 0. If > 0, the prompt will be prepended with BOS tokens. This is useful for models that expect one or more BOS tokens at the start.");
+
+DEFINE_int32(
+    num_eos,
+    0,
+    "Number of EOS tokens to append to the prompt. Defaults to 0. If > 0, the prompt will be appended with EOS tokens. This is useful for models that expect one or more EOS tokens at the end.");
+
 DEFINE_bool(warmup, false, "Whether to run a warmup run.");