
Commit ecd78a5

Default max_seq_length to 128 for ExecuTorch export
With the current default behavior, performance for e.g. stories110M without custom SDPA is bad because the QKV tensors are long (8192 in the last dim). Limiting the max sequence length remedies this.

ghstack-source-id: e664a14
Pull Request resolved: #1170
1 parent 7ddfdf8 commit ecd78a5
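
To make the motivation concrete, here is a rough back-of-envelope sketch (not part of the commit; the config numbers are assumptions loosely based on stories110M: 12 layers, 12 heads, head dim 64) of how a statically allocated KV cache grows with max_seq_length:

    # Sketch only: cached K/V elements scale linearly with max_seq_length,
    # so a large default means attention matmuls over mostly-empty cache
    # slots when no fused/custom SDPA kernel is available.
    def kv_cache_elems(max_seq_length: int, n_layers: int = 12,
                       n_heads: int = 12, head_dim: int = 64) -> int:
        # One K and one V cache per layer, each [n_heads, max_seq_length, head_dim]
        return 2 * n_layers * n_heads * max_seq_length * head_dim

    for s in (8192, 300, 128):
        print(f"max_seq_length={s:5d}: {kv_cache_elems(s):,} cached elements")

Dropping the limit from 8192 to 128 shrinks the cache, and the per-token attention work over it, by 64x.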

File tree

1 file changed (+11 −7)

torchchat/export.py

Lines changed: 11 additions & 7 deletions
@@ -364,13 +364,17 @@ def main(args):
     except:
         tokenizer = None

-    if (
-        output_dso_path is not None
-        and builder_args.max_seq_length is None
-        and not builder_args.dynamic_shapes
-    ):
-        print("Setting max_seq_length to 300 for DSO export.")
-        builder_args.max_seq_length = 300
+    if builder_args.max_seq_length is None:
+        if (
+            output_dso_path is not None
+            and not builder_args.dynamic_shapes
+        ):
+            print("Setting max_seq_length to 300 for DSO export.")
+            builder_args.max_seq_length = 300
+        elif output_pte_path is not None:
+            # The value of 128 was chosen to match the ExecuTorch Llama example setup.
+            print("Setting max_seq_length to 128 for ExecuTorch export.")
+            builder_args.max_seq_length = 128

     model = _initialize_model(
         builder_args,
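
With this change, PTE exports get a 128-token cap unless the user sets one explicitly; DSO exports keep the existing 300 default, and dynamic-shape DSO exports are still left unset. If 128 is too small for a given model, the limit can presumably still be overridden at export time; a hypothetical invocation (the --max-seq-length flag name is an assumption, verify against torchchat export --help):

    # Hypothetical; flag names are assumptions, not confirmed by this commit.
    python3 torchchat.py export stories110M --output-pte-path stories110M.pte --max-seq-length 256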
