Conversation

@thewh1teagle (Contributor) commented Oct 4, 2025

@thewh1teagle (Contributor Author)

Happy to add more examples if they're welcome. For example, this one is hugely useful for fine-tuned models with LoRA:

"""Simple example: Export Gemma3 270M with LoRA adapter to ONNX and generate text.

Usage:
    uv pip install onnxruntime peft
    uv run examples/gemma3.py
"""

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

from optimum.exporters.onnx import onnx_export_from_model
from optimum.onnxruntime import ORTModelForCausalLM
import time


# Load base model and merge with LoRA adapter
base_model_id = "google/gemma-3-270m-it"  # The base model for your LoRA
adapter_id = "thewh1teagle/gemma3-heb-g2p"

base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
model = PeftModel.from_pretrained(base_model, adapter_id)
model = model.merge_and_unload()  # Merge LoRA weights into base model

tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Export merged model to ONNX
print("Exporting to ONNX...")
output_dir = "gemma3_onnx"
onnx_export_from_model(
    model=model,
    output=output_dir,
    task="text-generation-with-past"
)

# Save tokenizer to the same directory
tokenizer.save_pretrained(output_dir)

# Load the exported ONNX model
ort_model = ORTModelForCausalLM.from_pretrained(output_dir)

# Chat with instruction-tuned model
system_message = """Given the following Hebrew sentence, convert it to IPA phonemes.
Input Format: A Hebrew sentence.
Output Format: A string of IPA phonemes.
"""

user_prompt = "אז מה דעתך, האם אתה יודע לדבר עברית גם כמו שאני יודע לדבר או שאתה לא?"

conversation = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_prompt}
]

prompt = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")

# Generate with sampling parameters similar to the working Ollama script
start_time = time.time()
outputs = ort_model.generate(
    **inputs,
    do_sample=True,  # needed for temperature/top_p/top_k to take effect
    max_new_tokens=150,
    temperature=0.9,
    top_p=0.95,
    top_k=64,
    pad_token_id=tokenizer.eos_token_id,
    eos_token_id=tokenizer.convert_tokens_to_ids(["<end_of_turn>", "</s>"]),
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)

# Extract only the model's response (after the last "model" turn)
if "<start_of_turn>model" in response:
    response = response.split("<start_of_turn>model")[-1].strip()
    # Remove any end tokens
    for end_token in ["<end_of_turn>", "</s>"]:
        response = response.replace(end_token, "")

print(response.strip())

print(f"Time taken: {time.time() - start_time:.2f} seconds")

@bil-ash commented Oct 5, 2025

Looking forward to gemma3n multimodal support

@IlyasMoutawwakil (Member) commented Oct 5, 2025

Thanks for the addition! I don't think an example script is the best way; it would be better to add the snippet to the documentation under a relevant section, or turning it into a notebook could also be very useful!
Can you please add the model types gemma3 and gemma3_text to the testing files (tests/onnxruntime/test_decoder.py, tests/onnxruntime/testing_utils.py and tests/exporters/onnx), along with a tiny model id from the hub for testing.
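
For reference, a minimal sketch of what that registration could look like, assuming the test files keep a model-type to tiny-checkpoint mapping and a supported-architectures list as in similar optimum test suites (the exact layout in this repo may differ, and the tiny model ids below are placeholders, not verified hub repos):

# Hypothetical sketch only; check tests/onnxruntime/testing_utils.py and
# tests/onnxruntime/test_decoder.py for the actual structure.

# tests/onnxruntime/testing_utils.py: map model type -> tiny test checkpoint
MODEL_NAMES = {
    # ... existing entries ...
    "gemma3": "hf-internal-testing/tiny-random-Gemma3ForCausalLM",       # placeholder id
    "gemma3_text": "hf-internal-testing/tiny-random-Gemma3ForCausalLM",  # placeholder id
}

# tests/onnxruntime/test_decoder.py: enable the new architectures in the ORT tests
SUPPORTED_ARCHITECTURES = [
    # ... existing entries ...
    "gemma3",
    "gemma3_text",
]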

CohereRotaryEmbedding.forward = self.original_forward


class Gemma3LMModelPatcher(DecoderModelPatcher):

Member:

Did you try exporting without this patcher? (It might not be necessary for text-only generation.)

Contributor Author:

Yup, you're right. I removed it and the tests are still passing.

uv.lock Outdated

Member:

no need for the lock file 🤗

Contributor Author:

fixed

@thewh1teagle (Contributor Author)

@IlyasMoutawwakil
I think a lot of people would benefit from an examples folder with sample scripts. It's a big win for dev experience, so why not add it?
I don't have much time to keep working on this PR, so just checking in advance: does it require any more work beyond what you mentioned in your last comment?

@IlyasMoutawwakil (Member) commented Oct 16, 2025

@thewh1teagle

I think a lot of people would benefit from an examples folder with sample scripts.

Yeah, of course, feel free! My suggestion is simply to put it in the docs for better visibility.

does it require any more work beyond what you mentioned in your last comment?

Yes, you need to add testing (you can see how it's done in https://github.com/huggingface/optimum-onnx/pull/43/files 🤗).

@fosple commented Oct 16, 2025

@thewh1teagle
I just found this. Maybe this helps with the implementation:
Convert_Gemma_3_270M_to_ONNX.ipynb

Build script from Xenova:
build_gemma.py

@thewh1teagle (Contributor Author)

@IlyasMoutawwakil
Since optimum-onnx doesn't have an official website (apart from some docs on HF), having an examples folder is great, and it gets good visibility: in my experience, people tend to look for one.

@thewh1teagle (Contributor Author)

Added tests and verified with:

uv run --extra tests --extra onnxruntime pytest tests/onnxruntime/test_decoder.py -k "gemma3" -v

@thewh1teagle (Contributor Author)

@IlyasMoutawwakil
Also, I added a mention of the examples folder to the main README. Feel free to modify anything as you prefer.

@IlyasMoutawwakil (Member)

Since optimum-onnx doesn't have official website (maybe somewhat docs in HF) having examples folder is great and it does have great visibility. people tend to look for it in my experience.

We do have docs 😥 The repo's main page links to them right under the description, and also in the README if you click on "Documentation":
https://huggingface.co/docs/optimum-onnx/en/quickstart

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@thewh1teagle (Contributor Author) commented Oct 19, 2025

@IlyasMoutawwakil
Maybe it's not about optimum-onnx itself; the docs template on Hugging Face is just not friendly in general.

[two screenshots of the documentation page navigation]

Many unclear navigation buttons.

@IlyasMoutawwakil (Member)

@thewh1teagle What's not clear exactly? Btw, it's open source, so contributions are welcome: https://github.com/huggingface/doc-builder

@IlyasMoutawwakil (Member)

@bot \style

@geraldstanje1

Hi @thewh1teagle, does this also work with the Gemma3 4B model?

@thewh1teagle (Contributor Author)

@IlyasMoutawwakil My feedback was just to let you know so you can potentially improve it, not a complaint! :) I really appreciate the work you're doing on this open source library (and on HF's open source projects in general).

@register_tasks_manager_onnx("gemma3", *COMMON_TEXT_GENERATION_TASKS)
@register_tasks_manager_onnx("gemma3_text", *COMMON_TEXT_GENERATION_TASKS)
class Gemma3OnnxConfig(GemmaOnnxConfig):
    MIN_TRANSFORMERS_VERSION = version.parse("4.52.0")

@IlyasMoutawwakil (Member) commented Oct 22, 2025:

I bumped it to 4.53.0 and added a comment explaining why.
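
For context, a sketch of the resulting pin (the in-code comment in the PR may be worded differently):

from packaging import version

class Gemma3OnnxConfig(GemmaOnnxConfig):
    # Bumped from 4.52.0 so the minimal supported transformers version passes
    # all tests (assumption: paraphrasing the comment added in the PR).
    MIN_TRANSFORMERS_VERSION = version.parse("4.53.0")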


@register_tasks_manager_onnx("gemma", *[*COMMON_TEXT_GENERATION_TASKS, "text-classification"])
class GemmaOnnxConfig(LlamaOnnxConfig):
class GemmaOnnxConfig(TextDecoderOnnxConfig):

Member:

I discovered that Gemma models in general don't need the position_ids argument.
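
A quick way to double-check this on an exported model; a minimal sketch, assuming an export directory like the gemma3_onnx one from the example above:

import onnx

# Load only the graph structure, skipping external weight data.
model = onnx.load("gemma3_onnx/model.onnx", load_external_data=False)
input_names = [inp.name for inp in model.graph.input]
print(input_names)  # expected: input_ids, attention_mask, past_key_values.* and no position_ids
assert "position_ids" not in input_names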

@IlyasMoutawwakil (Member) commented Oct 22, 2025:

@echarlaix wdyt? This also removes the need for position_ids from gpt_oss and nemotron.

@IlyasMoutawwakil (Member) left a review:

Thanks a lot for the contribution! I made some changes to make sure the minimal transformers version passes all tests!

@IlyasMoutawwakil merged commit f5df6b5 into huggingface:main on Oct 22, 2025
29 of 36 checks passed