
Conversation

@NickLucche (Collaborator) commented Aug 27, 2025

This PR enables Gemma3n for use with the audio-specific endpoints (transcriptions/translations).

I've also added a "soft" interface change: a to_language parameter on the API, which I found helps somewhat with translation.
The rationale is to keep these changes lightweight for now, since we're only slightly steering away from the original OAI whisper-only spec, and to see where the broader audio community wants this to go.

No chunking for now, as I believe a long-audio capability assessment is in order for this model.
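
For reference, here is a minimal sketch of how the new parameter can be exercised through the official OpenAI client (the model name, port, and audio file are placeholders; extra_body is used because to_language is not part of the upstream client spec):

```python
# Minimal sketch: calling the translations endpoint with the new
# `to_language` parameter. Model name, port, and file are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("sample_audio.wav", "rb") as f:
    result = client.audio.translations.create(
        model="google/gemma-3n-E2B-it",
        file=f,
        # Not in the OAI spec; forwarded via extra_body since the
        # official client has no dedicated argument for it.
        extra_body={"to_language": "en"},
    )
print(result.text)
```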

A list of additional minor changes:

  • conftest.py for audio entrypoints tests
  • seed params for translations
  • whisper+gemma3n audio tests with a module-level server fixture (see the sketch after this list)
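
For illustration, the module-level fixture is roughly of this shape (a sketch only: RemoteOpenAIServer mirrors the helper used in vLLM's entrypoint tests, and the model list and server args here are assumptions, not the exact test code):

```python
# Sketch of a module-scoped server fixture parameterized over both
# models; names and arguments are illustrative, not the exact test code.
import pytest

from tests.utils import RemoteOpenAIServer  # vLLM test helper

MODELS = ["openai/whisper-large-v3", "google/gemma-3n-E2B-it"]

@pytest.fixture(scope="module", params=MODELS)
def server(request):
    # One server per model for the whole module rather than per test,
    # which keeps the added Gemma3n coverage from multiplying startup cost.
    with RemoteOpenAIServer(request.param, ["--enforce-eager"]) as s:
        yield s
```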

I also plan to follow up with revamped benchmark+evaluation scripts to better cover these models.

# pre 
python -m pytest tests/entrypoints/openai/test_translation_validation.py  146.99s user 22.77s system 154% cpu 1:50.18 total

# post
python -m pytest tests/entrypoints/openai/test_translation_validation.py  243.81s user 31.39s system 146% cpu 3:07.72 total

Signed-off-by: NickLucche <[email protected]>
@NickLucche (Collaborator, Author)

cc @DarkLight1337

@gemini-code-assist (bot) left a comment:

Code Review

This pull request enables Gemma3n for audio transcription and translation endpoints, which is a great addition. The changes include a soft API modification to add a to_language parameter, which will be useful for future enhancements. The tests have been updated to cover Gemma3n, including parameterization over different models, which is good practice. I've found one issue regarding input validation for the new model implementation that should be addressed.

if task_type == "transcribe" and full_lang_name:
    prompt += f" into {full_lang_name}"
elif task_type == "translate":
    if full_lang_name:
Member:

We should validate that both languages are valid when doing translation

Collaborator (Author):

I am assuming languages are validated beforehand, here: https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/speech_to_text.py#L91.
Do you have some extra checks in mind?
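
For readers following along, the upstream check referenced above is roughly of this shape (illustrative only; the actual code lives in speech_to_text.py at the linked line and may differ):

```python
# Illustrative shape of the upstream language validation referenced
# above; the real check is in speech_to_text.py and may differ.
ISO639_1_SUPPORTED_LANGS = {"en": "English", "it": "Italian"}  # excerpt

def validate_language(lang: str | None) -> str | None:
    # Reject unknown ISO 639-1 codes before any prompt is built.
    if lang and lang not in ISO639_1_SUPPORTED_LANGS:
        raise ValueError(f"Unsupported language: {lang!r}. "
                         f"Must be one of {set(ISO639_1_SUPPORTED_LANGS)}.")
    return lang
```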

Member:

I see, in that case perhaps we should pass the full_lang_name directly into the method?

Member:

I also think that we should have a separate function for each task to reduce branching

Collaborator (Author):

> separate function for each task to reduce branching

I think this would cause duplication for the other models as of now.
The change makes sense; I just wanted to wait until a few more models are supported here before changing the interface.
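
For context, the proposed split would look roughly like this (method names and prompt wording are hypothetical, sketched from the snippet quoted above):

```python
# Hypothetical per-task prompt builders, replacing the single branching
# method discussed above. Names and wording are illustrative.
def get_transcription_prompt(full_lang_name: str | None = None) -> str:
    # Transcription keeps the source language; naming it is optional.
    prompt = "Transcribe this audio"
    if full_lang_name:
        prompt += f" into {full_lang_name}"
    return prompt

def get_translation_prompt(full_lang_name: str | None = None) -> str:
    # Translation always targets a language (validated upstream).
    prompt = "Translate this audio"
    if full_lang_name:
        prompt += f" into {full_lang_name}"
    return prompt
```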

Member:

I see, we can merge this PR for now then, thanks

Signed-off-by: NickLucche <[email protected]>
@pratapyash (Contributor)

@NickLucche Can we expect LoRA support for the text and audio modules for Gemma3n? (relevant issue: #21746)

@NickLucche (Collaborator, Author)

@pratapyash check #24003 out

@pratapyash (Contributor)

I'm facing a bug when running multimodal inference (specifically audio) with Gemma3n; it's relevant to this PR: #24006.

Summary:

  • In gemma3n_mm.py::_process_audio_input we call audio_input["input_features"].squeeze(1).
  • For batched audio requests, input_features arrives as a Python list, so this raises AttributeError: 'list' object has no attribute 'squeeze' and EngineCore dies.
  • Result: repeated HTTP 500s on /v1/chat/completions and an NCCL shutdown warning.
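
A minimal defensive fix along these lines would coerce the list into a tensor before squeezing (a sketch against the reported failure mode, assuming equal-length feature tensors; not the actual patch):

```python
# Sketch of a guard for the reported crash: batched requests can hand
# `input_features` over as a Python list of tensors rather than one
# stacked tensor, so `.squeeze(1)` blows up. Assumes equal shapes.
import torch

def _coerce_input_features(input_features) -> torch.Tensor:
    if isinstance(input_features, list):
        input_features = torch.stack(input_features)
    return input_features.squeeze(1)
```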

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 1, 2025
Signed-off-by: NickLucche <[email protected]>
@NickLucche (Collaborator, Author)

@DarkLight1337 things look green here

@DarkLight1337 DarkLight1337 merged commit d46934b into vllm-project:main Sep 1, 2025
43 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025

Labels: frontend, ready
