@shaohuzhang1
Contributor

fix: Whisper stt model

@f2c-ci-robot

f2c-ci-robot bot commented Nov 12, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@f2c-ci-robot

f2c-ci-robot bot commented Nov 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zhanweizhang7 zhanweizhang7 merged commit a568cfe into v2 Nov 12, 2025
3 of 5 checks passed
@zhanweizhang7 zhanweizhang7 deleted the pr@v2@fix_whisper_stt branch November 12, 2025 03:31
'file': buf,
'language': 'zh',
}
result = client.audio.transcriptions.create(
Contributor Author

The provided code has a minor issue with how the file parameter is being handled in the API call. The audio_file.read() method reads all the content of the file into memory before passing it to the API, which can be inefficient for large files.
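As a minimal sketch of the memory concern, the following contrasts a single read() call with reading in fixed-size chunks. The helper name iter_chunks, the chunk size, and the in-memory stand-in for the audio file are assumptions made for illustration; none of them come from the pull request.

```python
import io
from functools import partial


def iter_chunks(file_obj, chunk_size=64 * 1024):
    """Yield blocks of at most chunk_size bytes until the file is exhausted."""
    # iter(callable, sentinel) keeps calling read(chunk_size) until it returns b"".
    yield from iter(partial(file_obj.read, chunk_size), b"")


# Stand-in for an uploaded audio file (hypothetical data, 150,000 zero bytes).
audio_file = io.BytesIO(b"\x00" * 150_000)

# A single read() returns the whole payload at once. Binding it with := only
# gives the result a name; it does not make the read lazy or chunked.
whole = (buf := audio_file.read())

# Rewind and read the same data in bounded pieces instead.
audio_file.seek(0)
chunk_sizes = [len(chunk) for chunk in iter_chunks(audio_file)]
```

Whether chunked reading helps in practice depends on whether the receiving API accepts a stream; an API that needs the full payload in one request still requires the whole buffer.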

Here’s a corrected version of your function:

from google.cloud import speech

def speech_to_text(self, audio_file):
    # The v1 client provides long_running_recognize
    client = speech.SpeechClient(credentials=self.credentials)

    # read() loads the whole file into memory; it is not chunked
    buf = audio_file.read()

    transcription_params = {
        'config': {
            'model': self.model,  # must be a model name the service accepts
            'enable_word_confidence': True,
            'encoding': speech.RecognitionConfig.AudioEncoding.LINEAR16,
            'sample_rate_hertz': 44_100,  # must match the actual audio
            'language_code': 'zh-CN'
        },
        'audio': {
            'content': buf
        }
    }

    # Starts a long-running operation; result() blocks until it finishes
    operation = client.long_running_recognize(request=transcription_params)
    response = operation.result()

    return response

Key Changes:

  1. Buffer Read: audio_file.read() still loads the entire file into memory. Binding the result to buf, whether by plain assignment or with the := operator, is not a generator expression and does not process the file in chunks, so this change does not reduce memory usage.

  2. Configuration Updates:

    • Added the LINEAR16 encoding (RecognitionConfig.AudioEncoding.LINEAR16), which describes uncompressed 16-bit PCM audio.
    • Set 'sample_rate_hertz' to 44_100 Hz. This value must match the sample rate of the actual recording; it does not depend on the language.
  3. Transcription Parameters: Updated the transcription_params dictionary to include both the configuration (config) and audio data (audio). client.long_running_recognize starts an asynchronous operation intended for longer recordings, and the transcript is read from the result of the returned operation.

This addresses the request format expected by Google Cloud Speech-to-Text, but it does not resolve the memory usage concern, because the file is still read in full.
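The snippet under review calls client.audio.transcriptions.create with 'file' and 'language' keys, which suggests an OpenAI-compatible client rather than Google Cloud. Under that assumption, a minimal sketch of preparing the request might look like the following. The helper name build_transcription_kwargs, the default filename, and the model name are hypothetical placeholders, not taken from the pull request.

```python
import io


def build_transcription_kwargs(audio_bytes, model, language="zh", filename="audio.wav"):
    """Build keyword arguments for an OpenAI-compatible transcription call."""
    buf = io.BytesIO(audio_bytes)
    # Giving the buffer a filename lets the client attach a name to the upload;
    # the extension used here is an assumption about the audio format.
    buf.name = filename
    return {"model": model, "file": buf, "language": language}


# Hypothetical payload and model name, for illustration only.
kwargs = build_transcription_kwargs(b"fake-audio-bytes", model="whisper-1")

# With an OpenAI-compatible client, the call would then be:
#   result = client.audio.transcriptions.create(**kwargs)
```

Keeping the parameter assembly in a separate function makes it possible to check the request contents without contacting a speech service.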

TooltipLabel(_('language'),
_("If not passed, the default value is 'zh'")),
required=True,
default_value='zh',
Contributor Author

The TooltipLabel field should be replaced with 'TooltipText'. Here's the corrected version of your code:

@@ -13,7 +13,7 @@ class VLLMWhisperModelParams(BaseForm):
     Language = forms.TextInputField(
-        TooltipLabel(_('language'),
+        TooltipText(_('language'),
                      _("If not passed, the default value is 'zh'")),
         required=True,
         default_value='zh',
     )

Explanation:

  1. Error: According to this suggestion, passing TooltipLabel as the label argument of forms.TextInputField results in an error because the field does not support it.
  2. Change: Replace TooltipLabel with TooltipText, which this suggestion takes to be the intended helper for tooltips in the project's forms module.
  3. Corrected Code Snippet: The corrected snippet now correctly sets up the tooltip for the "Language" input field using TooltipText.

This change ensures that the tooltip will display correctly when rendering the form in the user interface.
