
[Feature request] Add support for external data file (.onnx_data) #105

@felladrin

Description

This is not really a bug with Transformers.js, but with the conversion script.

I got an error when trying to convert lmsys/fastchat-t5-3b-v1.0 with the text2text-generation-with-past task.
Using the task text2text-generation works fine, though.

Am I missing something?

Also, is there a way to run the model without it having been created with -with-past?
Currently, running const pipe = await pipeline("text2text-generation", "lmsys/fastchat-t5-3b-v1.0"); triggers:

Error: File not found. Could not locate "models/lmsys/fastchat-t5-3b-v1.0/seq2seq-lm-with-past/encoder_model.onnx".

The files inside models/lmsys/fastchat-t5-3b-v1.0/seq2seq-lm-with-past/ are:

added_tokens.json
config.json
generation_config.json
model.onnx
model.onnx_data
special_tokens_map.json
spiece.model
tokenizer.json
tokenizer_config.json
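
For context: the model is large enough that the ONNX exporter split it into model.onnx (the graph) and model.onnx_data (the weights), since a single protobuf file is capped at 2 GB. Both the onnx and onnxruntime Python packages resolve the side file automatically when given a path to the .onnx file; a minimal sketch (the path is the one from the listing above):

import onnx
import onnxruntime as ort

model_path = "models/lmsys/fastchat-t5-3b-v1.0/seq2seq-lm-with-past/model.onnx"

# onnx.load resolves external tensors (model.onnx_data) from the model's
# directory by default (load_external_data=True).
model = onnx.load(model_path)
print(sum(t.ByteSize() for t in model.graph.initializer), "bytes of initializers")

# onnxruntime likewise picks up model.onnx_data next to model.onnx when the
# session is created from a file path.
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
print([i.name for i in session.get_inputs()])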

How to reproduce

Run:

python -m scripts.convert --model_id lmsys/fastchat-t5-3b-v1.0 --from_hub --quantize --task text2text-generation-with-past

The command fails with output like this:

Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
        - use_cache -> True
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `input_ids`.
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `decoder_input_ids`.
/opt/homebrew/lib/python3.10/site-packages/transformers/modeling_utils.py:828: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_mask.shape[1] < attention_mask.shape[1]:
/opt/homebrew/lib/python3.10/site-packages/transformers/models/t5/modeling_t5.py:507: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  elif past_key_value.shape[2] != key_value_states.shape[1]:
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
In-place op on output of tensor.shape. See https://pytorch.org/docs/master/onnx.html#avoid-inplace-operations-when-using-tensor-shape-in-tracing-mode
================ Diagnostic Run torch.onnx.export version 2.0.0 ================
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

Saving external data to one file...
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `input_ids`.
Asked a sequence length of 16, but a sequence length of 1 will be used with use_past == True for `decoder_input_ids`.
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/[email protected]/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/[email protected]/3.10.11/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/user/Repositories/transformers.js/scripts/convert.py", line 310, in <module>
    main()
  File "/Users/user/Repositories/transformers.js/scripts/convert.py", line 282, in main
    _, onnx_outputs = export_models(
  File "/opt/homebrew/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 609, in export_models
    export(
  File "/opt/homebrew/lib/python3.10/site-packages/optimum/exporters/onnx/convert.py", line 722, in export
    config.fix_dynamic_axes(output, device=device, input_shapes=input_shapes, dtype=dtype)
  File "/opt/homebrew/lib/python3.10/site-packages/optimum/exporters/onnx/base.py", line 285, in fix_dynamic_axes
    outputs = session.run(None, onnx_inputs)
  File "/opt/homebrew/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Invalid Feed Input Name:past_key_values.9.encoder.value
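
The traceback points at a feed-name mismatch rather than at the external data itself: optimum's fix_dynamic_axes runs the freshly exported model with generated dummy inputs, and the feed contains a past_key_values.9.encoder.value key that the loaded graph does not declare. A diagnostic sketch to confirm that kind of mismatch (the feed_names set below is a hypothetical stand-in for optimum's generated inputs):

import onnxruntime as ort

session = ort.InferenceSession(
    "models/lmsys/fastchat-t5-3b-v1.0/seq2seq-lm-with-past/model.onnx",
    providers=["CPUExecutionProvider"],
)
declared = {i.name for i in session.get_inputs()}

# Hypothetical stand-in for the dummy feed optimum builds in fix_dynamic_axes;
# per the traceback, it contains past_key_values.*.encoder.* keys the exported
# graph does not declare as inputs.
feed_names = {"input_ids", "attention_mask", "past_key_values.9.encoder.value"}
print("fed but not declared:", sorted(feed_names - declared))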

Expected behavior

I was expecting it to work, the same way python -m scripts.convert --model_id lmsys/fastchat-t5-3b-v1.0 --from_hub --quantize --task text2text-generation did.
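
If support for external data lands in the conversion script, the quantization step presumably needs it too: onnxruntime's dynamic quantizer accepts use_external_data_format=True for models whose weights exceed the 2 GB protobuf limit. A minimal sketch of what that might look like (a hypothetical adaptation, not the script's current code):

from onnxruntime.quantization import QuantType, quantize_dynamic

# Hypothetical adaptation of the convert script's quantization step: with
# use_external_data_format=True the quantizer can read a model that ships a
# .onnx_data side file and write the quantized weights the same way.
quantize_dynamic(
    model_input="models/lmsys/fastchat-t5-3b-v1.0/seq2seq-lm-with-past/model.onnx",
    model_output="models/lmsys/fastchat-t5-3b-v1.0/seq2seq-lm-with-past/model_quantized.onnx",
    weight_type=QuantType.QUInt8,
    use_external_data_format=True,
)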

Environment

  • Transformers.js version: N/A
  • Browser (if applicable): N/A
  • Operating system (if applicable): macOS
