
Commit ca37f23

Refactor IO config to yaml in assets (#2311)
This pull request introduces a major refactor of the HuggingFace ONNX I/O configuration system in Olive. The changes move away from hardcoded Python class registries and instead adopt a YAML-driven approach for specifying input/output configurations for both tasks and diffusers pipelines. This enables easier extensibility and clearer, data-driven configuration management. The update also adds default dummy input shapes and rich configuration templates for tasks and diffusers components.

The most important changes are:

**YAML-based IO Configuration System:**
- Added new YAML files (`defaults.yaml`, `tasks.yaml`, `diffusers.yaml`) in `olive/assets/io_configs/` that define default dummy input shapes, task templates, and diffusers component/pipeline specifications for ONNX export. These files replace the previous Python-based registries and provide a more flexible and maintainable configuration system.

**Refactor of IO Config Python Interface:**
- Replaced the contents of `olive/common/hf/io_config/__init__.py` to remove all hardcoded class registries and mapping logic, importing the new YAML-driven utility functions (`generate_dummy_inputs`, `get_io_config`, `get_diffusers_io_config`, etc.) instead. This greatly simplifies the interface and shifts responsibility to the new YAML configuration system.

**Legal and Organizational Improvements:**
- Added copyright and license headers to `olive/assets/__init__.py` and `olive/assets/io_configs/__init__.py`.

## Checklist before requesting a review
- [ ] Add unit tests for this change.
- [ ] Make sure all tests can pass.
- [ ] Update documents if necessary.
- [ ] Lint and apply fixes to your code by running `lintrunner -a`
- [ ] Is this a user-facing change? If yes, give a description of this change to be included in the release notes.

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
1 parent 264c92b commit ca37f23

36 files changed: +2236 −5671 lines changed
docs/source/how-to/extending/how-to-add-new-task.md

Lines changed: 308 additions & 0 deletions
@@ -0,0 +1,308 @@
# How to Add a New Task or Diffusers Component for ONNX Export

This guide explains how to add IO configurations for a new HuggingFace task or diffusers component to enable ONNX model export.

Olive uses YAML-based IO configurations to define input/output specifications for ONNX export. These configurations specify tensor shapes, data types, and dynamic axes for each model input and output.

There are two types of configurations:
- **Task configs** (`tasks.yaml`): For HuggingFace transformers tasks like text-generation, text-classification, etc.
- **Diffusers component configs** (`diffusers.yaml`): For Stable Diffusion and similar diffusion model components like UNet, VAE, text encoders, etc.

## File Locations

IO config files are located in `olive/assets/io_configs/`:

```
olive/assets/io_configs/
├── tasks.yaml      # Task-based configurations
├── diffusers.yaml  # Diffusers component configurations
└── defaults.yaml   # Default dimension values and aliases
```

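Because these files ship inside the installed `olive` package, you can inspect them directly. Below is a minimal sketch, assuming PyYAML is available, that reads the packaged `tasks.yaml` with `importlib.resources`; it is only for exploration and not necessarily how Olive loads these files internally:

```python
import importlib.resources as resources

import yaml  # PyYAML

# Read the packaged tasks.yaml to see which tasks are defined.
tasks_file = resources.files("olive.assets.io_configs").joinpath("tasks.yaml")
with tasks_file.open("r", encoding="utf-8") as f:
    tasks = yaml.safe_load(f)

print(sorted(tasks))
```
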
## Task-based IO Configs (`tasks.yaml`)

### Format

Each task defines its input/output specifications:

```yaml
task-name:
  inputs:
    input_name:
      shape: [dim1, dim2, ...]      # Shape template for dummy input generation
      axes: {0: axis_name, 1: ...}  # Dynamic axes for ONNX export
      dtype: int64 | float          # Data type (default: int64)
      max_value: vocab_size         # Optional: max value for random input
      optional: true                # Optional: skip if not in model.forward()
  outputs:
    output_name:
      axes: {0: axis_name, ...}     # Dynamic axes for ONNX export
  with_past:                        # Optional: overrides for KV cache scenarios
    input_name:
      shape: [...]
      axes: {...}
```

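For `with_past` exports, dummy key/value cache tensors are needed in addition to the regular inputs. The helper below is an illustrative sketch only, assuming the common HuggingFace cache layout of `(batch_size, num_kv_heads, past_sequence_length, head_dim)`; it is not Olive's actual dummy-input code:

```python
import torch

def make_dummy_past_kv(num_layers, batch_size, num_kv_heads, past_len, head_dim):
    # One (key, value) pair of random tensors per decoder layer.
    # head_dim is typically hidden_size // num_attention_heads.
    shape = (batch_size, num_kv_heads, past_len, head_dim)
    return [(torch.rand(shape), torch.rand(shape)) for _ in range(num_layers)]

past_key_values = make_dummy_past_kv(num_layers=2, batch_size=2, num_kv_heads=4, past_len=16, head_dim=64)
```
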
### Field Descriptions

| Field | Description |
|-------|-------------|
| `shape` | List of dimension names or integers. Used to generate dummy inputs for ONNX export. Dimension names are resolved from the model config or defaults. |
| `axes` | Dict mapping axis index to axis name. Defines which dimensions are dynamic in the exported ONNX model. |
| `dtype` | Data type: `int64`, `int32`, or `float`. Defaults to `int64` for inputs. |
| `optional` | If `true`, the input is only included if it exists in the `model.forward()` signature. |
| `max_value` | Maximum value for random input generation (e.g., `vocab_size` for `input_ids`). |
| `with_past` | Alternative shapes/axes when using KV cache (`use_past_in_inputs=True`). |

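To make the `optional` behavior concrete, here is a hypothetical sketch of such a check using Python's `inspect` module; `filter_optional_inputs` and `input_specs` are illustrative names, not part of Olive's API:

```python
import inspect

def filter_optional_inputs(model, dummy_inputs, input_specs):
    # Keep an input marked `optional: true` only if model.forward() accepts it.
    forward_params = set(inspect.signature(model.forward).parameters)
    return {
        name: tensor
        for name, tensor in dummy_inputs.items()
        if not input_specs.get(name, {}).get("optional") or name in forward_params
    }
```
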
### Example: Adding a New Task

To add support for a new task, add an entry to `tasks.yaml`:

```yaml
# Custom task for a new model type
my-custom-task:
  inputs:
    input_ids:
      shape: [batch_size, sequence_length]
      axes: {0: batch_size, 1: sequence_length}
      dtype: int64
      max_value: vocab_size
    attention_mask:
      shape: [batch_size, sequence_length]
      axes: {0: batch_size, 1: sequence_length}
      dtype: int64
    custom_input:
      shape: [batch_size, custom_dim]
      axes: {0: batch_size, 1: custom_dim}
      dtype: float
      optional: true
  outputs:
    logits:
      axes: {0: batch_size, 1: sequence_length, 2: vocab_size}
    custom_output:
      axes: {0: batch_size, 1: hidden_size}
```

### Supported Tasks

Currently supported tasks include:
- `text-generation`
- `text-classification`
- `feature-extraction`
- `fill-mask`
- `token-classification`
- `question-answering`
- `multiple-choice`
- `text2text-generation`
- `image-classification`
- `object-detection`
- `semantic-segmentation`
- `audio-classification`
- `automatic-speech-recognition`
- `zero-shot-image-classification`

## Diffusers Component Configs (`diffusers.yaml`)

### Format

Diffusers configurations define components and pipelines:

```yaml
components:
  component_name:
    inputs:
      input_name:
        shape: [dim1, dim2, ...]
        axes: {0: axis_name, ...}
        dtype: int64 | float
    outputs:
      output_name:
        axes: {0: axis_name, ...}
    sdxl_inputs:                  # Optional: additional inputs for SDXL
      extra_input:
        shape: [...]
        axes: {...}
    optional_inputs:              # Optional: conditional inputs
      optional_input:
        shape: [...]
        axes: {...}
        condition: config_attr    # Only include if config.config_attr is True

pipelines:
  pipeline_name:
    - component_name
    - component_config:alias_name  # Use component_config with alias
```

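The `condition` field gates an optional input on an attribute of the component's config. A minimal sketch of that check, with `should_include` as a hypothetical helper name rather than Olive's implementation:

```python
def should_include(input_spec, config):
    # Include the optional input only when the named config attribute is truthy.
    condition = input_spec.get("condition")
    return condition is None or bool(getattr(config, condition, False))
```
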
### Example: Adding a New Diffusers Component

```yaml
components:
  my_custom_transformer:
    inputs:
      hidden_states:
        shape: [batch_size, in_channels, height, width]
        axes: {0: batch_size, 1: in_channels, 2: height, 3: width}
        dtype: float
      encoder_hidden_states:
        shape: [batch_size, sequence_length, hidden_size]
        axes: {0: batch_size, 1: sequence_length, 2: hidden_size}
        dtype: float
      timestep:
        shape: [batch_size]
        axes: {0: batch_size}
        dtype: float
    outputs:
      out_sample:
        axes: {0: batch_size, 1: in_channels, 2: height, 3: width}
    optional_inputs:
      guidance:
        shape: [batch_size]
        axes: {0: batch_size}
        dtype: float
        condition: guidance_embeds  # Only if config.guidance_embeds is True

pipelines:
  my_custom_pipeline:
    - text_encoder
    - my_custom_transformer:transformer
    - vae_encoder
    - vae_decoder
```

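Pipeline entries use the `component_config:alias_name` syntax shown above. Parsing it could look like the following sketch; `parse_pipeline_entry` is an illustrative name, not Olive's function:

```python
def parse_pipeline_entry(entry):
    # "my_custom_transformer:transformer" -> ("my_custom_transformer", "transformer")
    # "vae_encoder" -> ("vae_encoder", "vae_encoder")
    config_name, _, alias = entry.partition(":")
    return config_name, alias or config_name
```
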
### Supported Diffusers Components

Currently supported components include:
- `text_encoder`, `text_encoder_with_projection`, `t5_encoder`, `gemma2_text_encoder`
- `unet`, `sd3_transformer`, `flux_transformer`, `sana_transformer`
- `vae_encoder`, `vae_decoder`, `dcae_encoder`, `dcae_decoder`

Supported pipelines: `sd`, `sdxl`, `sd3`, `flux`, `sana`

## Default Values (`defaults.yaml`)

The `defaults.yaml` file defines:
1. **Aliases**: Alternative attribute names for the same concept across different models
2. **Default dimensions**: Fallback values used when dimensions can't be resolved from the model config

### Aliases

Aliases help resolve config attributes that have different names across models:

```yaml
aliases:
  num_layers: [num_hidden_layers, n_layer, n_layers]
  hidden_size: [dim, d_model, n_embd]
  num_attention_heads: [num_heads, n_head, n_heads, encoder_attention_heads]
  num_kv_heads: [num_key_value_heads]
  height: [sample_size, image_size, vision_config.image_size]
  width: [sample_size, image_size, vision_config.image_size]
  num_channels: [in_channels, vision_config.num_channels]
```

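Note that some aliases are dotted paths such as `vision_config.image_size`, which refer to nested config objects. A lookup that handles both plain and dotted names could be sketched as follows (`lookup_config_attr` is a hypothetical helper, not Olive's internal function):

```python
def lookup_config_attr(config, dotted_name):
    # Walk nested attributes, e.g. "vision_config.image_size".
    value = config
    for attr in dotted_name.split("."):
        value = getattr(value, attr, None)
        if value is None:
            return None
    return value
```
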
### Default Dimensions

Default values used when dimensions can't be resolved from the model config:

```yaml
batch_size: 2
sequence_length: 16
past_sequence_length: 16
vocab_size: 32000
height: 64
width: 64
num_channels: 3
```

### Adding New Defaults

If your model uses a dimension not already defined, add it to `defaults.yaml`:

```yaml
# Add a new dimension for your model
my_custom_dim: 128

# Add aliases if the same concept has different names
aliases:
  my_custom_dim: [custom_dim, my_dim]
```

## Dimension Resolution

When generating dummy inputs, dimensions in `shape` are resolved in this order (a sketch follows the list):

1. **Model config with aliases**: Check `config.attr_name` for each alias
2. **Computed dimensions**: Special dimensions like `height_latent = height // 8`
3. **Default values**: Fall back to the values in `defaults.yaml`

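The sketch below expresses this order in code. `ALIASES`, `COMPUTED`, and `DEFAULTS` stand in for the parsed contents of `defaults.yaml`, and `resolve_dim` is an illustrative helper, not Olive's internal function:

```python
# Hypothetical stand-ins for the parsed defaults.yaml contents.
ALIASES = {"hidden_size": ["dim", "d_model", "n_embd"]}
COMPUTED = {"height_latent": lambda dims: dims["height"] // 8}
DEFAULTS = {"batch_size": 2, "sequence_length": 16, "height": 64}

def resolve_dim(name, config, resolved):
    # 1. Model config, trying the name itself and then each alias.
    for attr in [name, *ALIASES.get(name, [])]:
        value = getattr(config, attr, None)
        if isinstance(value, int):
            return value
    # 2. Computed dimensions derived from already-resolved values.
    if name in COMPUTED:
        return COMPUTED[name](resolved)
    # 3. Fall back to defaults.yaml (dotted aliases omitted for brevity).
    return DEFAULTS[name]
```
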
## Usage in Olive Workflows

Once you've added your IO config, Olive will automatically use it during ONNX conversion.

### Task-based Models

For HuggingFace transformers models, specify the task in `HfModel`:

```yaml
# olive_config.yaml
input_model:
  type: HfModel
  model_path: my-model
  task: my-custom-task  # Uses the task config you defined

passes:
  conversion:
    type: OnnxConversion
```

### Diffusers Models

For diffusion models, use `DiffusersModel`. Olive automatically detects the pipeline type and exports all components using the IO configs defined in `diffusers.yaml`:

```yaml
# olive_config.yaml
input_model:
  type: DiffusersModel
  model_path: stabilityai/stable-diffusion-xl-base-1.0

passes:
  conversion:
    type: OnnxConversion
```

Olive will automatically:
1. Detect the pipeline type (e.g., `sdxl`; a detection sketch follows this list)
2. Identify exportable components (text_encoder, text_encoder_2, unet, vae_encoder, vae_decoder)
3. Use the corresponding IO configs from `diffusers.yaml` for each component

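Diffusers pipelines record their class name in `model_index.json`, so pipeline detection could plausibly work like the sketch below; the `PIPELINE_BY_CLASS` mapping and `detect_pipeline` helper are assumptions for illustration, not Olive's actual logic:

```python
import json
from pathlib import Path

# Assumed mapping from diffusers pipeline classes to the keys in diffusers.yaml.
PIPELINE_BY_CLASS = {
    "StableDiffusionPipeline": "sd",
    "StableDiffusionXLPipeline": "sdxl",
    "StableDiffusion3Pipeline": "sd3",
    "FluxPipeline": "flux",
    "SanaPipeline": "sana",
}

def detect_pipeline(model_dir):
    index = json.loads((Path(model_dir) / "model_index.json").read_text())
    return PIPELINE_BY_CLASS.get(index["_class_name"])
```
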
## Testing Your Config

After adding a new IO config, verify it works:

```python
from olive.common.hf.io_config import get_io_config, generate_dummy_inputs

# Test task config
io_config = get_io_config("my-model-path", task="my-custom-task")
print(io_config["input_names"])
print(io_config["output_names"])
print(io_config["dynamic_axes"])

# Generate dummy inputs
dummy_inputs = generate_dummy_inputs("my-model-path", task="my-custom-task")
for name, tensor in dummy_inputs.items():
    print(f"{name}: {tensor.shape}")
```

For diffusers:

```python
from olive.common.hf.io_config import get_diffusers_io_config, generate_diffusers_dummy_inputs

# Test diffusers config
io_config = get_diffusers_io_config("my_custom_transformer", config)
print(io_config["input_names"])

# Generate dummy inputs
dummy_inputs = generate_diffusers_dummy_inputs("my_custom_transformer", config)
```

docs/source/how-to/index.md

Lines changed: 2 additions & 0 deletions
@@ -32,6 +32,7 @@ The Olive CLI provides a set of primitives such as `quantize`, `finetune`, `onnx
 
 - [Olive design overview](extending/design)
 - [How to add a new Pass](extending/how-to-add-optimization-pass)
+- [How to add a new task for ONNX export](extending/how-to-add-new-task.md)
 - [How to add custom model evaluator](extending/custom-model-evaluator)
 - [How to add custom scripts to load datasets](extending/custom-scripts)
 
@@ -57,6 +58,7 @@ configure-workflows/systems
 configure-workflows/engine-configuration
 extending/design
 extending/how-to-add-optimization-pass
+extending/how-to-add-new-task
 extending/custom-model-evaluator
 extending/custom-scripts
 ```

olive/assets/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# -------------------------------------------------------------------------
olive/assets/io_configs/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# -------------------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# -------------------------------------------------------------------------
olive/assets/io_configs/defaults.yaml

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
# Default dimension values for dummy input generation
# These values are used when dimensions cannot be resolved from model config

# Attribute aliases (same concept, different naming across models)
aliases:
  # Layer count
  num_layers: [num_hidden_layers, n_layer, n_layers]
  # Hidden dimensions
  hidden_size: [dim, d_model, n_embd]
  num_attention_heads: [num_heads, n_head, n_heads, encoder_attention_heads]
  num_kv_heads: [num_key_value_heads]
  # Image dimensions
  height: [sample_size, image_size, vision_config.image_size]
  width: [sample_size, image_size, vision_config.image_size]
  num_channels: [in_channels, vision_config.num_channels]

# Common
batch_size: 2
sequence_length: 16
past_sequence_length: 16
num_choices: 4
vocab_size: 32000

# Image
width: 64
height: 64
num_channels: 3
point_batch_size: 3
nb_points_per_image: 2
visual_seq_length: 16

# Multimodal (CLIP, etc.)
text_batch_size: 2
image_batch_size: 2
projection_dim: 512

# Audio
feature_size: 80
nb_max_frames: 3000
audio_sequence_length: 16000
