fix: Whisper stt model #4352

shaohuzhang1 · 2025-11-12T03:31:30Z

fix: Whisper stt model

f2c-ci-robot · 2025-11-12T03:31:35Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

f2c-ci-robot · 2025-11-12T03:31:40Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shaohuzhang1 · 2025-11-12T03:31:55Z

apps/models_provider/impl/vllm_model_provider/model/whisper_sst.py

+                'file': buf,
                'language': 'zh',
            }
            result = client.audio.transcriptions.create(


The provided code has a minor issue with how the file parameter is handled in the API call: audio_file.read() reads the entire file into memory before it is passed to the API, which can be inefficient for large files.

Here’s a corrected version of your function:

from google.cloud import speech

def speech_to_text(self, audio_file):
    # Google Cloud Speech-to-Text v1 client; credentials are supplied by the model instance
    client = speech.SpeechClient(credentials=self.credentials)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44_100,  # must match the sample rate of the uploaded audio
        language_code='zh-CN',
        enable_word_confidence=True,
    )
    # read() loads the whole file; buf keeps the bytes available for reuse
    audio = speech.RecognitionAudio(content=(buf := audio_file.read()))
    # long_running_recognize returns an operation; block until it completes
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=300)  # timeout in seconds (arbitrary choice)
    # Join the top alternative of each result into one transcript
    return ' '.join(r.alternatives[0].transcript for r in response.results if r.alternatives)
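A single read() call returns the whole file, whether or not the result is bound with :=. If bounded memory per step is the goal, the file has to be consumed with repeated read(n) calls. A minimal sketch of chunked reading, assuming a binary file-like object; read_in_chunks and the 64 KiB default are illustrative choices, not part of the PR or of any speech API.

```python
import io
from typing import BinaryIO, Iterator


def read_in_chunks(audio_file: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield successive chunks of at most chunk_size bytes until EOF."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # iter(callable, sentinel) keeps calling read(chunk_size) until it returns b""
    for chunk in iter(lambda: audio_file.read(chunk_size), b""):
        yield chunk


if __name__ == "__main__":
    data = bytes(range(256)) * 1000  # 256,000 bytes of sample data
    chunks = list(read_in_chunks(io.BytesIO(data), chunk_size=100_000))
    print([len(c) for c in chunks])  # [100000, 100000, 56000]
```

Chunking only pays off when the consumer accepts a stream; a request field that needs the whole payload at once, such as RecognitionAudio.content, still requires the full bytes in memory.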

Key Changes:

Buffer Read: (buf := audio_file.read()) is an assignment expression (the walrus operator), not a generator expression. It binds the file's bytes to buf inline, but read() with no argument still loads the entire file into memory, so this change alone does not process the file in chunks or reduce memory use.

Configuration Updates:

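Added the RecognitionConfig.AudioEncoding.LINEAR16 encoding (uncompressed 16-bit linear PCM). The encoding field is required for most audio formats; it is optional for WAV and FLAC files, whose headers carry this information.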

Set 'sample_rate_hertz' to 44_100 Hz. This value must match the actual sample rate of the uploaded audio; it is not a setting specific to Chinese speech recognition.

Transcription Parameters: The configuration (config) and audio data (audio) are passed together to client.long_running_recognize. This asynchronous method is required for audio longer than about one minute (the synchronous recognize method covers shorter clips), and it returns a long-running operation whose result() must be called to obtain the transcript.
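Rather than hard-coding 44100 Hz, the sample rate can be read from the upload itself, and the clip's duration can decide between the synchronous and asynchronous methods. A minimal sketch using only the standard-library wave module, assuming the upload is an uncompressed PCM WAV file; describe_wav and make_silent_wav are hypothetical helpers, and the 60-second cut-off approximates the synchronous recognition limit.

```python
import io
import wave
from typing import BinaryIO, NamedTuple

# Approximate duration limit for the synchronous recognize() method
SYNC_LIMIT_SECONDS = 60.0


class WavInfo(NamedTuple):
    sample_rate_hertz: int
    channels: int
    duration_seconds: float
    method: str  # "recognize" or "long_running_recognize"


def describe_wav(audio_file: BinaryIO) -> WavInfo:
    """Read a PCM WAV header and choose a recognition method by duration."""
    start = audio_file.tell()
    with wave.open(audio_file, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    audio_file.seek(start)  # rewind so the same bytes can still be uploaded
    method = "recognize" if duration <= SYNC_LIMIT_SECONDS else "long_running_recognize"
    return WavInfo(rate, channels, duration, method)


def make_silent_wav(seconds: float, rate: int = 16_000) -> io.BytesIO:
    """Build an in-memory mono 16-bit PCM (LINEAR16) WAV of silence for testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    info = describe_wav(make_silent_wav(2.0))
    print(info.sample_rate_hertz, info.duration_seconds, info.method)  # 16000 2.0 recognize
```

Reading the rate from the header keeps sample_rate_hertz consistent with the audio; for compressed formats such as MP3 the wave module does not apply and another probe would be needed.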

These changes align the request with the Google Cloud Speech-to-Text API. Memory use, however, still scales with the file size, because the whole file is read and sent inline in the request.
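The diff under review itself calls client.audio.transcriptions.create on an OpenAI-compatible client (the vLLM provider), not on a Google Cloud client. A minimal sketch of assembling that call's keyword arguments, assuming the openai Python SDK's documented option of passing an upload as a (filename, contents) tuple; build_transcription_kwargs, EXCLUDED_KEYS, and the "audio.mp3" filename are hypothetical, while the 'zh' fallback mirrors the credential form's tooltip.

```python
import io
from typing import Any, BinaryIO, Dict, Mapping

# Hypothetical: provider-level settings that should not reach the endpoint
EXCLUDED_KEYS = {"model_id", "use_local", "streaming"}
# Mirrors the form tooltip: "If not passed, the default value is 'zh'"
DEFAULT_LANGUAGE = "zh"


def build_transcription_kwargs(model: str, audio_file: BinaryIO,
                               params: Mapping[str, Any],
                               filename: str = "audio.mp3") -> Dict[str, Any]:
    """Assemble keyword arguments for client.audio.transcriptions.create()."""
    buf = audio_file.read()  # the whole payload goes into one multipart upload
    extra = {k: v for k, v in params.items() if k not in EXCLUDED_KEYS}
    language = extra.pop("language", None) or DEFAULT_LANGUAGE
    # Required keys are written last so stray params cannot override them
    return {
        **extra,
        "model": model,
        "file": (filename, buf),  # the filename hints at the audio format
        "language": language,
    }


if __name__ == "__main__":
    kwargs = build_transcription_kwargs(
        "whisper-large-v3", io.BytesIO(b"fake audio"), {"model_id": "x", "temperature": 0.0}
    )
    print(sorted(kwargs))  # ['file', 'language', 'model', 'temperature']
    # With a real server (not run here):
    # client = openai.OpenAI(api_key=..., base_url=...)
    # text = client.audio.transcriptions.create(**kwargs).text
```

Keeping request assembly separate from the network call makes the parameter filtering and the language default testable without a running server.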

shaohuzhang1 · 2025-11-12T03:32:02Z

apps/models_provider/impl/vllm_model_provider/credential/whisper_stt.py

+        TooltipLabel(_('language'),
                     _("If not passed, the default value is 'zh'")),
        required=True,
        default_value='zh',


TooltipLabel should be replaced with TooltipText. Here's the corrected version of your code:

@@ -13,7 +13,7 @@ class VLLMWhisperModelParams(BaseForm):
     Language = forms.TextInputField(
-        TooltipLabel(_('Language'),
+        TooltipText(_('language'),
                      _("If not passed, the default value is 'zh'")),
         required=True,
         default_value='zh',
     )

Explanation:

Error: Passing TooltipLabel as the first argument of forms.TextInputField would result in an error, because forms.TextInputField does not support it.

Change: Replace TooltipLabel with TooltipText, which appears to be the intended class for providing tooltips in these form definitions.

Corrected Code Snippet: The snippet now sets up the tooltip for the "Language" input field using TooltipText.

This change ensures that the tooltip will display correctly when rendering the form in the user interface.
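To make the structure under discussion concrete, a minimal sketch of a field whose first positional argument bundles a label with tooltip text, and whose default_value fills in a missing entry. LabelWithTooltip and TextField are hypothetical stand-ins, not the project's TooltipLabel, TooltipText, or forms.TextInputField.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class LabelWithTooltip:
    """Hypothetical: a display label plus hover text."""
    label: str
    tooltip: str


@dataclass(frozen=True)
class TextField:
    """Hypothetical text input with a required flag and a default value."""
    label: LabelWithTooltip
    required: bool = False
    default_value: Optional[str] = None

    def resolve(self, name: str, submitted: Mapping[str, Any]) -> Optional[str]:
        # An empty or missing submission falls back to the default
        value = submitted.get(name) or self.default_value
        if self.required and not value:
            raise ValueError(f"{self.label.label} is required")
        return value


language_field = TextField(
    LabelWithTooltip("language", "If not passed, the default value is 'zh'"),
    required=True,
    default_value="zh",
)
```

In this sketch, required=True only fails when there is neither a submitted value nor a default; whether the real form behaves the same way depends on its implementation.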

fix: Whisper stt model

4c9c248

f2c-ci-robot bot added the do-not-merge/release-note-label-needed label Nov 12, 2025

zhanweizhang7 merged commit a568cfe into v2 Nov 12, 2025
3 of 5 checks passed

zhanweizhang7 deleted the pr@v2@fix_whisper_stt branch November 12, 2025 03:31

3 participants
