Update genai API docs #26900
base: gh-pages
Conversation
Pull request overview
This PR updates the GenAI API documentation to reflect the latest code changes across multiple programming language bindings. The updates include new features, API refinements, and expanded configuration options for the ONNX Runtime GenAI library.
Key Changes:
- Added new configuration options for multi-modal models (image/video token IDs, vision processing parameters)
- Introduced batching engine support (dynamic and static batching configurations)
- Expanded API surface with new classes for Request/Engine-based generation patterns (see the sketch after this list)
- Enhanced tokenizer capabilities with new methods for token ID retrieval and option updates
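
The Request/Engine pattern referenced above is new in these docs. As a rough orientation, a minimal Python sketch might look like the following. The `Engine` and `Request` class names come from the PR summary, but every constructor and method on them below is an assumption for illustration, not the verified binding surface; `Model`, `Tokenizer`, and `GeneratorParams` are existing onnxruntime-genai classes.

```python
import onnxruntime_genai as og

# Existing onnxruntime-genai classes.
model = og.Model("path/to/model")
tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

# Everything below is a hypothetical sketch of the Request/Engine pattern;
# these class and method names are assumptions, not the verified API.
engine = og.Engine(model)                            # assumed: engine schedules batched requests
request = og.Request(params)                         # assumed: one request per prompt
request.add_tokens(tokenizer.encode("Hello there"))  # assumed input-feeding helper

engine.add_request(request)
while not request.is_done():                         # assumed completion check
    engine.step()                                    # assumed: one decode step across the batch

print(tokenizer.decode(request.get_generated_tokens()))  # assumed output accessor
```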
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/routes/onnx/+page.svelte | Removed deprecated model types link reference from UI |
| docs/genai/reference/config.md | Added engine/batching sections, multi-modal token IDs, session/run options, and hardware device filtering |
| docs/genai/api/python.md | Added Request/Engine classes, updated GeneratorParams, enhanced Config with model data methods, and tokenizer improvements |
| docs/genai/api/java.md | Restructured Model constructors, added close() methods, expanded Tokenizer/Generator APIs, and documented multi-modal processors |
| docs/genai/api/csharp.md | Added hardware device filtering to Config, updated Generator/Tokenizer APIs, documented chat client and multi-modal support |
| docs/genai/api/cpp.md | Added model data management, hardware device options, Request/Engine classes, and enhanced processor batch support |
| docs/genai/api/c.md | Comprehensive updates including Request/Engine API, enhanced tokenizer functions, multi-modal processing, and execution provider registration |
```
Path to the vision adapter file.

- **spatial_merge_size**: *(int, optional)*
  Patch merge size used by some models (for example, Qwen2.5-VL). Defaults to 2.
```
**Copilot AI** commented on Jan 5, 2026:
The example reference "for example, Qwen2.5-VL" should use consistent spacing. The version number format "2.5" appears to use a period separator which is standard, but verify this matches the official model name capitalization.
Suggested change:
```diff
- Patch merge size used by some models (for example, Qwen2.5-VL). Defaults to 2.
+ Patch merge size used by some models (such as Qwen2.5-VL). Defaults to 2.
```
```
Name of the image features output tensor.

- **pipeline**: *(array, optional)*
  Ordered list of sub-models for vision pipelines (for example, patch embedding, attention, merge).
```
**Copilot AI** commented on Jan 5, 2026:
The description states "for example, patch embedding, attention, merge" but should have consistent list formatting. Consider using "e.g., patch embedding, attention, and merge" for better readability.
Suggested change:
```diff
- Ordered list of sub-models for vision pipelines (for example, patch embedding, attention, merge).
+ Ordered list of sub-models for vision pipelines (e.g., patch embedding, attention, and merge).
```
```
- **outputs**: *(array of string)*
  Graph output names.
- **run_on_cpu**: *(bool, optional)*
  If true, forces CPU EP when multiple EPs are configured.
```
**Copilot AI** commented on Jan 5, 2026:
The description "If true, forces CPU EP when multiple EPs are configured" uses inconsistent abbreviations. Consider expanding "EP" to "execution provider" on first use or consistently using the full term for clarity, especially since this is user-facing documentation.
Suggested change:
```diff
- If true, forces CPU EP when multiple EPs are configured.
+ If true, forces use of the CPU execution provider when multiple execution providers are configured.
```
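
To make the placement of these fields concrete, here is a minimal sketch of a vision pipeline entry, written as a Python dict so it can carry annotations. The enclosing structure and the `filename` key are assumptions based on the field names in this hunk, not a verified rendering of the genai_config.json schema.

```python
# Hypothetical vision pipeline entry; the enclosing structure and the
# "filename" key are assumptions, not a verified genai_config.json schema.
vision_pipeline = {
    "pipeline": [
        {
            "filename": "vision_patch_embedding.onnx",  # assumed sub-model entry
            "outputs": ["image_features"],              # graph output names
            "run_on_cpu": True,  # force the CPU execution provider for this stage
        }
    ]
}
```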
```diff
- - `apply_chat_template(template_str: str, messages: str, tools: str = None, add_generation_prompt: bool = False) -> str`
+ - `apply_chat_template(messages: str, template_str: str | None = None, tools: str | None = None, add_generation_prompt: bool = True) -> str`
```
**Copilot AI** commented on Jan 5, 2026:
The parameter order in the function signature has changed from the old version. The old signature had "template_str" as the first parameter, but the new signature has "messages" first and "template_str" as optional second parameter. Ensure this change is intentional and matches the actual API implementation.
```diff
- chat = tokenizer.apply_chat_template("{user}: {message}", messages="Hi!", add_generation_prompt=True)
+ chat = tokenizer.apply_chat_template(messages="Hi!", template_str="{user}: {message}", add_generation_prompt=True)
```
**Copilot AI** commented on Jan 5, 2026:
The example usage still shows the old parameter order with "template_str" first: tokenizer.apply_chat_template(messages="Hi!", template_str="{user}: {message}", add_generation_prompt=True). However, if the new signature has "messages" as first parameter and "template_str" as optional second parameter, verify this example is consistent with the signature shown on line 299.
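
For reference, calls consistent with the new signature (messages first, template_str optional) might look like the sketch below; the JSON shape of the messages payload is an assumption based on common chat-template conventions, not something this diff confirms.

```python
# Calls consistent with the new signature shown above (messages first,
# template_str optional). The messages payload shape is an assumption.
messages = '[{"role": "user", "content": "Hi!"}]'

# Rely on the model's default chat template (template_str omitted):
chat = tokenizer.apply_chat_template(messages=messages, add_generation_prompt=True)

# Override the template explicitly:
chat = tokenizer.apply_chat_template(messages=messages, template_str="{user}: {message}")
```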
```python
result = processor(prompt="Describe this image", images=onnxruntime_genai.Images.open("image.png"))
batched = processor(["Describe first", "Describe second"], images=onnxruntime_genai.Images.open("image1.png", "image2.png"))
```
**Copilot AI** commented on Jan 5, 2026:
The example shows batched = processor(["Describe first", "Describe second"], images=onnxruntime_genai.Images.open("image1.png", "image2.png")) but Images.open() is documented as a static load method in other parts of the docs. Verify that Images.open() supports multiple file paths as arguments, or if it should be Images.open(["image1.png", "image2.png"]) with a list.
Suggested change:
```diff
- batched = processor(["Describe first", "Describe second"], images=onnxruntime_genai.Images.open("image1.png", "image2.png"))
+ batched = processor(["Describe first", "Describe second"], images=onnxruntime_genai.Images.open(["image1.png", "image2.png"]))
```
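
To put the two candidate call shapes side by side: the docs excerpt passes multiple paths as varargs, while the suggestion assumes a single list argument. Published onnxruntime-genai examples appear to use the varargs form (`og.Images.open(*paths)`), which would favor the original excerpt, but the reviewer's question is exactly whether that holds, so treat both forms below as unverified.

```python
import onnxruntime_genai as og

# Two mutually exclusive call shapes; only one presumably matches the binding,
# and verifying which is the point of the review comment above.
images_varargs = og.Images.open("image1.png", "image2.png")  # form in the docs excerpt
images_list = og.Images.open(["image1.png", "image2.png"])   # form in the suggestion
```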
```
### OgaShutdown

Cleanly shutdown the genai library and its ONNX Runtime usage on process exit.
```
**Copilot AI** commented on Jan 5, 2026:
The description states "Cleanly shutdown the genai library" but should be "Cleans up and shuts down the GenAI library" or "Cleanly shuts down the GenAI library" for proper grammar. "shutdown" as a verb should be two words "shut down".
Suggested change:
```diff
- Cleanly shutdown the genai library and its ONNX Runtime usage on process exit.
+ Cleanly shuts down the GenAI library and its ONNX Runtime usage on process exit.
```
Description
Update the API docs with the latest code changes