
Commit 1e4fe5b

Sub 0.1s cli calls (#68)
```bash
(optimum-onnx) (base) ilyas@hf-dgx-01:~/optimum-onnx$ time optimum-cli export onnx -h
usage: optimum-cli export onnx [-h] -m MODEL [--task TASK] [--opset OPSET] [--device DEVICE] [--dtype {fp32,fp16,bf16}]
                               [--optimize {O1,O2,O3,O4}] [--monolith] [--no-post-process] [--variant VARIANT] [--framework {pt}]
                               [--atol ATOL] [--cache_dir CACHE_DIR] [--trust-remote-code] [--pad_token_id PAD_TOKEN_ID]
                               [--library-name {transformers,diffusers,timm,sentence_transformers}] [--model-kwargs MODEL_KWARGS]
                               [--no-dynamic-axes] [--no-constant-folding] [--slim] [--batch_size BATCH_SIZE]
                               [--sequence_length SEQUENCE_LENGTH] [--num_choices NUM_CHOICES] [--width WIDTH] [--height HEIGHT]
                               [--num_channels NUM_CHANNELS] [--feature_size FEATURE_SIZE] [--nb_max_frames NB_MAX_FRAMES]
                               [--audio_sequence_length AUDIO_SEQUENCE_LENGTH] [--point_batch_size POINT_BATCH_SIZE]
                               [--nb_points_per_image NB_POINTS_PER_IMAGE] [--visual_seq_length VISUAL_SEQ_LENGTH]
                               output

options:
  -h, --help            show this help message and exit

Required arguments:
  -m MODEL, --model MODEL
                        Model ID on huggingface.co or path on disk to load model from.
  output                Path indicating the directory where to store the generated ONNX model.

Optional arguments:
  --task TASK           The task to export the model for. If not specified, the task will be auto-inferred from the model's
                        metadata or files. For decoder models, use `xxx-with-past` to export the model using past key values in
                        the decoder. Available tasks depend on the model, but are among the following list:
                        ['audio-classification', 'audio-frame-classification', 'audio-xvector', 'automatic-speech-recognition',
                        'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask',
                        'image-classification', 'image-segmentation', 'image-text-to-text', 'image-to-image', 'image-to-text',
                        'inpainting', 'keypoint-detection', 'mask-generation', 'masked-im', 'multiple-choice', 'object-detection',
                        'question-answering', 'reinforcement-learning', 'semantic-segmentation', 'sentence-similarity',
                        'text-classification', 'text-generation', 'text-to-audio', 'text-to-image', 'text2text-generation',
                        'time-series-forecasting', 'token-classification', 'visual-question-answering',
                        'zero-shot-image-classification', 'zero-shot-object-detection'].
  --opset OPSET         If specified, ONNX opset version to export the model with. Otherwise, the default opset for the given
                        model architecture will be used.
  --device DEVICE       The device to use to do the export. Defaults to "cpu".
  --dtype {fp32,fp16,bf16}
                        The floating point precision to use for the export. Supported options: fp32 (float32), fp16 (float16),
                        bf16 (bfloat16).
  --optimize {O1,O2,O3,O4}
                        Allows to run ONNX Runtime optimizations directly during the export. Some of these optimizations are
                        specific to ONNX Runtime, and the resulting ONNX will not be usable with other runtime as OpenVINO or
                        TensorRT. Possible options:
                            - O1: Basic general optimizations
                            - O2: Basic and extended general optimizations, transformers-specific fusions
                            - O3: Same as O2 with GELU approximation
                            - O4: Same as O3 with mixed precision (fp16, GPU-only, requires `--device cuda`)
  --monolith            Forces to export the model as a single ONNX file. By default, the ONNX exporter may break the model in
                        several ONNX files, for example for encoder-decoder models where the encoder should be run only once
                        while the decoder is looped over.
  --no-post-process     Allows to disable any post-processing done by default on the exported ONNX models. For example, the
                        merging of decoder and decoder-with-past models into a single ONNX model file to reduce memory usage.
  --variant VARIANT     Select a variant of the model to export.
  --framework {pt}      The framework to use for the ONNX export. If not provided, will attempt to use the local checkpoint's
                        original framework or what is available in the environment.
  --atol ATOL           If specified, the absolute difference tolerance when validating the model. Otherwise, the default atol
                        for the model will be used.
  --cache_dir CACHE_DIR
                        Path indicating where to store cache.
  --trust-remote-code   Allows to use custom code for the modeling hosted in the model repository. This option should only be set
                        for repositories you trust and in which you have read the code, as it will execute on your local machine
                        arbitrary code present in the model repository.
  --pad_token_id PAD_TOKEN_ID
                        This is needed by some models, for some tasks. If not provided, will attempt to use the tokenizer to
                        guess it.
  --library-name {transformers,diffusers,timm,sentence_transformers}
                        The library on the model. If not provided, will attempt to infer the local checkpoint's library
  --model-kwargs MODEL_KWARGS
                        Any kwargs passed to the model forward, or used to customize the export for a given model.
  --no-dynamic-axes     Disable dynamic axes during ONNX export
  --no-constant-folding
                        PyTorch-only argument. Disables PyTorch ONNX export constant folding.
  --slim                Enables onnxslim optimization.

Input shapes (if necessary, this allows to override the shapes of the input given to the ONNX exporter, that requires an example input).:
  --batch_size BATCH_SIZE
                        Text tasks only. Batch size to use in the example input given to the ONNX export.
  --sequence_length SEQUENCE_LENGTH
                        Text tasks only. Sequence length to use in the example input given to the ONNX export.
  --num_choices NUM_CHOICES
                        Text tasks only. Num choices to use in the example input given to the ONNX export.
  --width WIDTH         Image tasks only. Width to use in the example input given to the ONNX export.
  --height HEIGHT       Image tasks only. Height to use in the example input given to the ONNX export.
  --num_channels NUM_CHANNELS
                        Image tasks only. Number of channels to use in the example input given to the ONNX export.
  --feature_size FEATURE_SIZE
                        Audio tasks only. Feature size to use in the example input given to the ONNX export.
  --nb_max_frames NB_MAX_FRAMES
                        Audio tasks only. Maximum number of frames to use in the example input given to the ONNX export.
  --audio_sequence_length AUDIO_SEQUENCE_LENGTH
                        Audio tasks only. Audio sequence length to use in the example input given to the ONNX export.
  --point_batch_size POINT_BATCH_SIZE
                        For Segment Anything. It corresponds to how many segmentation masks we want the model to predict per
                        input point.
  --nb_points_per_image NB_POINTS_PER_IMAGE
                        For Segment Anything. It corresponds to the number of points per segmentation masks.
  --visual_seq_length VISUAL_SEQ_LENGTH
                        Visual sequence length

real    0m0,086s
user    0m0,082s
sys     0m0,004s
```
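The timing above can be sanity-checked from Python as well; a minimal sketch, assuming `optimum-cli` is installed and on `PATH`:

```python
# Rough reproduction of the `time optimum-cli export onnx -h` measurement above
# (includes process spawn overhead in addition to CLI startup).
import subprocess
import time

start = time.perf_counter()
subprocess.run(["optimum-cli", "export", "onnx", "-h"], check=True, capture_output=True)
print(f"optimum-cli export onnx -h took {time.perf_counter() - start:.3f}s")
```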
1 parent e543e2a commit 1e4fe5b

30 files changed: +74 −131 lines changed

.github/workflows/build_documentation.yml

Lines changed: 1 addition & 1 deletion
@@ -25,7 +25,7 @@ jobs:
       - uses: actions/checkout@v5
       - uses: actions/setup-node@v4
         with:
-          node-version: "18"
+          node-version: "20"
           cache-dependency-path: "kit/package-lock.json"
 
       - name: Set up Python

.github/workflows/build_pr_documentation.yml

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ jobs:
       - uses: actions/checkout@v5
       - uses: actions/setup-node@v4
         with:
-          node-version: "18"
+          node-version: "20"
           cache-dependency-path: "kit/package-lock.json"
 
       - name: Set up Python

README.md

Lines changed: 1 addition & 1 deletion
@@ -38,7 +38,7 @@ For more information on the ONNX export, please check the [documentation](https:
 
 #### Inference
 
-Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seemless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend:
+Once the model is exported to the ONNX format, we provide Python classes enabling you to run the exported ONNX model in a seamless manner using [ONNX Runtime](https://onnxruntime.ai/) in the backend:
 
 
 ```diff
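For context, the inference path that README sentence refers to typically looks like the following; a minimal sketch, assuming `ORTModelForSequenceClassification` from `optimum.onnxruntime` and an illustrative model ID:

```python
# Sketch only: load a Hub checkpoint, export it to ONNX on the fly, and run it
# with ONNX Runtime through the usual transformers pipeline API.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative
model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("ONNX Runtime inference through Optimum is seamless."))
```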

docs/source/onnx/usage_guides/export_a_model.mdx

Lines changed: 1 addition & 1 deletion
@@ -317,7 +317,7 @@ For tasks that require only a single ONNX file (e.g. encoder-only), an exported
 
 ### Customize the export of Transformers models with custom modeling
 
-Optimum supports the export of Transformers models with custom modeling that use [`trust_remote_code=True`](https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoModel.from_pretrained.trust_remote_code), not officially supported in the Transormers library but usable with its functionality as [pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) and [generation](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationMixin.generate).
+Optimum supports the export of Transformers models with custom modeling that use [`trust_remote_code=True`](https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoModel.from_pretrained.trust_remote_code), not officially supported in the Transformers library but usable with its functionality as [pipelines](https://huggingface.co/docs/transformers/main_classes/pipelines) and [generation](https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationMixin.generate).
 
 Examples of such models are [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b) and [mosaicml/mpt-30b](https://huggingface.co/mosaicml/mpt-30b).
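For context, exporting such a custom-modeling checkpoint can also be done programmatically; a minimal sketch, assuming `optimum.exporters.onnx.main_export` accepts `trust_remote_code` as in the documentation quoted above, using one of the repositories it mentions:

```python
# Sketch only: programmatic equivalent of `optimum-cli export onnx --trust-remote-code`.
from optimum.exporters.onnx import main_export

main_export(
    "THUDM/chatglm2-6b",            # custom modeling hosted on the Hub (see docs above)
    output="chatglm2_onnx",         # directory where the ONNX model is written
    task="text-generation-with-past",
    trust_remote_code=True,         # only for repositories whose code you have reviewed
)
```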

docs/source/onnxruntime/usage_guides/gpu.mdx

Lines changed: 1 addition & 1 deletion
@@ -126,7 +126,7 @@ Due to current limitations in ONNX Runtime, it is not possible to use quantized
 
 [IOBinding](https://onnxruntime.ai/docs/api/python/api_summary.html#iobinding) is an efficient way to avoid expensive data copying when using GPUs. By default, ONNX Runtime will copy the input from the CPU (even if the tensors are already copied to the targeted device), and assume that outputs also need to be copied back to the CPU from GPUs after the run. These data copying overheads between the host and devices are expensive, and __can lead to worse inference latency than vanilla PyTorch__ especially for the decoding process.
 
-To avoid the slowdown, 🤗 Optimum adopts the IOBinding to copy inputs onto GPUs and pre-allocate memory for outputs prior the inference. When instanciating the `ORTModel`, set the value of the argument `use_io_binding` to choose whether to turn on the IOBinding during the inference. `use_io_binding` is set to `True` by default, if you choose CUDA as execution provider.
+To avoid the slowdown, 🤗 Optimum adopts the IOBinding to copy inputs onto GPUs and pre-allocate memory for outputs prior the inference. When instantiating the `ORTModel`, set the value of the argument `use_io_binding` to choose whether to turn on the IOBinding during the inference. `use_io_binding` is set to `True` by default, if you choose CUDA as execution provider.
 
 And if you want to turn off IOBinding:
 ```python
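For context, `use_io_binding` is chosen at model-loading time; a minimal sketch, assuming a CUDA-enabled `onnxruntime-gpu` install and an illustrative model ID:

```python
# Sketch only: load an ORTModel on the CUDA execution provider and control IOBinding.
from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",  # illustrative
    export=True,
    provider="CUDAExecutionProvider",
    use_io_binding=True,  # already the default on CUDA; pass False to turn it off
)
```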

optimum/commands/export/onnx.py

Lines changed: 11 additions & 10 deletions
@@ -22,8 +22,8 @@
 from huggingface_hub.constants import HUGGINGFACE_HUB_CACHE
 
 from optimum.commands.base import BaseOptimumCLICommand, CommandInfo
-from optimum.exporters.tasks import TasksManager
-from optimum.utils import DEFAULT_DUMMY_SHAPES
+from optimum.utils.constant import ALL_TASKS
+from optimum.utils.input_generators import DEFAULT_DUMMY_SHAPES
 
 
 if TYPE_CHECKING:
@@ -38,14 +38,19 @@ def parse_args_onnx(parser):
     required_group.add_argument(
         "output", type=Path, help="Path indicating the directory where to store the generated ONNX model."
     )
+    # NOTE: why using a positional argument here ? should we deprecate in favor of -o/--output keyword argument ?
+    # required_group.add_argument(
+    #     "-o", "--output", type=Path, help="Path indicating the directory where to store the generated ONNX model."
+    # )
 
     optional_group = parser.add_argument_group("Optional arguments")
     optional_group.add_argument(
         "--task",
         default="auto",
         help=(
-            "The task to export the model for. If not specified, the task will be auto-inferred based on the model. Available tasks depend on the model, but are among:"
-            f" {TasksManager.get_all_tasks()}. For decoder models, use `xxx-with-past` to export the model using past key values in the decoder."
+            "The task to export the model for. If not specified, the task will be auto-inferred from the model's metadata or files. "
+            "For tasks that generate text, add the `xxx-with-past` suffix to export the model using past key values caching. "
+            f"Available tasks depend on the model, but are among the following list: {ALL_TASKS}."
         ),
     )
     optional_group.add_argument(
@@ -107,12 +112,8 @@ def parse_args_onnx(parser):
         "--framework",
         type=str,
         choices=["pt"],
-        default=None,
-        help=(
-            "The framework to use for the ONNX export."
-            " If not provided, will attempt to use the local checkpoint's original framework"
-            " or what is available in the environment."
-        ),
+        default="pt",
+        help="The framework to use for the export. Defaults to 'pt' for PyTorch.",
     )
     optional_group.add_argument(
         "--atol",

optimum/commands/onnxruntime/__init__.py

Lines changed: 0 additions & 10 deletions
This file was deleted.

optimum/commands/onnxruntime/base.py

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ class ONNXRuntimeCommand(BaseOptimumCLICommand):
         ),
         CommandInfo(
             name="quantize",
-            help="Dynammic quantization for ONNX models.",
+            help="Dynamic quantization for ONNX models.",
             subcommand_class=ONNXRuntimeQuantizeCommand,
         ),
     )

optimum/commands/register/register_export.py

Lines changed: 3 additions & 1 deletion
@@ -16,4 +16,6 @@
 from optimum.commands.export.onnx import ONNXExportCommand
 
 
-REGISTER_COMMANDS = [(ONNXExportCommand, ExportCommand)]
+REGISTER_COMMANDS = [
+    (ONNXExportCommand, ExportCommand),
+]

optimum/commands/register/register_onnxruntime.py

Lines changed: 4 additions & 2 deletions
@@ -13,7 +13,9 @@
 # limitations under the License.
 
 
-from optimum.commands.onnxruntime import ONNXRuntimeCommand
+from optimum.commands.onnxruntime.base import ONNXRuntimeCommand
 
 
-REGISTER_COMMANDS = [ONNXRuntimeCommand]
+REGISTER_COMMANDS = [
+    ONNXRuntimeCommand,
+]
