fix: Whisper stt model #4352
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing
```python
    'file': buf,
    'language': 'zh',
}
result = client.audio.transcriptions.create(
```
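The diff context above assembles the transcription request as a dict of keyword arguments before calling `client.audio.transcriptions.create`. A minimal sketch of that pattern, under stated assumptions: the helper name `build_transcription_kwargs`, the `NON_API_KEYS` set (borrowed from the `filter_params` exclusions in the review below), and the `audio.mp3` placeholder filename are all hypothetical, and it assumes the client accepts a `(filename, bytes)` tuple for `file`, as the `openai` Python package does.

```python
from typing import Any

# Hypothetical: keys that configure the local model wrapper rather than the
# transcription API (same exclusions as filter_params in the review below).
NON_API_KEYS = {'model_id', 'use_local', 'streaming'}


def build_transcription_kwargs(model: str, audio: bytes, params: dict[str, Any],
                               filename: str = 'audio.mp3') -> dict[str, Any]:
    """Assemble keyword arguments for client.audio.transcriptions.create()."""
    # Drop wrapper-only settings so they are not sent to the API.
    kwargs: dict[str, Any] = {k: v for k, v in params.items() if k not in NON_API_KEYS}
    # Fall back to 'zh', the default documented by the form field.
    kwargs.setdefault('language', 'zh')
    kwargs['model'] = model
    # Assumption: a (filename, bytes) tuple lets the client name the upload.
    kwargs['file'] = (filename, audio)
    return kwargs
```

Usage would then be `client.audio.transcriptions.create(**build_transcription_kwargs(model, buf, params))`. Keeping the filtering in one helper means a new wrapper-only setting has to be excluded in exactly one place.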
The provided code has a minor issue with how the `file` parameter is handled in the API call: `audio_file.read()` reads the entire file into memory before passing it to the API, which can be inefficient for large files.

Here's a revised version of the function:
```python
from google.cloud import speech


def speech_to_text(self, audio_file):
    # Assumes self.credentials holds google-auth credentials and self.model
    # names a Speech-to-Text model.
    client = speech.SpeechClient(credentials=self.credentials)
    config = speech.RecognitionConfig(
        model=self.model,
        enable_word_confidence=True,
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=44_100,  # must match the audio's actual sample rate
        language_code='zh-CN',
    )
    # Inline content still holds the whole file in memory.
    audio = speech.RecognitionAudio(content=audio_file.read())
    # long_running_recognize returns an operation; block until it completes
    # (the 300-second timeout is an arbitrary choice).
    operation = client.long_running_recognize(config=config, audio=audio)
    return operation.result(timeout=300)
```

Key Changes:
- Buffer read: `audio_file.read()` without a size argument loads the entire file into memory, whether or not its result is bound with an assignment expression such as `(buf := audio_file.read())`. That expression is not a generator and does not read in chunks. Inline `content` needs the full byte string, so avoiding the full read would require a streaming request (for example `streaming_recognize`) or referencing the audio by a Cloud Storage URI.
- Configuration updates:
  - Added the `LINEAR16` encoding (uncompressed 16-bit signed little-endian PCM), which tells the API how to decode raw audio; it must match the format of the uploaded audio.
  - Set `sample_rate_hertz` to 44,100 Hz. This value must match the sample rate of the audio being sent; it does not depend on the recognition language.
- Transcription parameters: the request carries both the recognition configuration (`config`) and the audio data (`audio`). It uses `client.long_running_recognize`, the asynchronous method, because synchronous `recognize` requests are limited to about one minute of audio.
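Because `sample_rate_hertz` has to match the audio rather than the language, one option is to read it from the file header instead of hard-coding 44,100 Hz. A minimal sketch, assuming the upload is an uncompressed PCM WAV file; the helper name `wav_linear16_params` is hypothetical, and it uses only the standard-library `wave` module.

```python
import io
import wave


def wav_linear16_params(data: bytes) -> tuple[int, int]:
    """Return (sample_rate_hertz, channels) for 16-bit PCM WAV data."""
    with wave.open(io.BytesIO(data), 'rb') as wav:
        # LINEAR16 means 2-byte samples; reject anything else.
        if wav.getsampwidth() != 2:
            raise ValueError('audio is not 16-bit PCM, so LINEAR16 does not apply')
        return wav.getframerate(), wav.getnchannels()
```

The returned rate can be passed as `sample_rate_hertz`, and the channel count as `audio_channel_count` if the audio is multi-channel.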
This revision keeps the request in the shape the Google Cloud Speech-to-Text API expects; reducing memory usage for large files would additionally require a chunked or streaming upload.
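Reading in chunks requires passing a size to `read()` and looping until it returns empty bytes. A minimal sketch of that pattern; `iter_chunks` and the 64 KiB size are illustrative choices, not part of any API. It only saves memory if the consumer accepts chunks one at a time (such as a streaming request) rather than joining them back into a single bytes object.

```python
from typing import BinaryIO, Iterator

CHUNK_SIZE = 64 * 1024  # 64 KiB; an arbitrary illustrative size


def iter_chunks(audio_file: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks until read() returns an empty bytes object."""
    # Unlike a bare read(), each call loads at most chunk_size bytes.
    while chunk := audio_file.read(chunk_size):
        yield chunk
```

Here the assignment expression does bind one chunk per loop iteration; the memory saving comes from the size argument and the generator, not from the `:=` syntax itself.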
```python
TooltipLabel(_('language'),
             _("If not passed, the default value is 'zh'")),
required=True,
default_value='zh',
```
`TooltipLabel` should be replaced with `TooltipText`. Here's the corrected version of your code:
```diff
@@ -13,7 +13,7 @@ class VLLMWhisperModelParams(BaseForm):
     Language = forms.TextInputField(
-        TooltipLabel(_('language'),
+        TooltipText(_('language'),
                      _("If not passed, the default value is 'zh'")),
         required=True,
         default_value='zh',
     )
```

Explanation:
- Error: passing `TooltipLabel` as the label argument of `forms.TextInputField` would result in an error because the field does not support it.
- Change: replace `TooltipLabel` with `TooltipText`, which appears to be the intended way to attach a tooltip to a field in the project's `forms` module.
- Corrected code: the snippet now sets up the tooltip for the "Language" input field using `TooltipText`.

With this change, the tooltip should display correctly when the form is rendered in the user interface.