-
Notifications
You must be signed in to change notification settings - Fork 2.6k
feat: STT model params #4106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: STT model params #4106
Conversation
|
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| 'codec': params.get('codec', self.codec) | ||
| } | ||
| } | ||
| return req |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In your VolcanicEngineSpeechToText class, there are several areas that could be improved or checked:
-
Parameter Default Values: In the constructor and request construction method, parameters like
nbest,workflow, etc., have fallbacks using the instance variables (self). Ensure these defaults align with expected behavior. -
Consistent Parameter Handling: In the request construction method, ensure consistent handling of optional parameters. Mixing between direct access to instance variables and parameter dictionary checks can lead to confusion and errors.
-
Code Organization: Consider organizing constants or settings at the beginning of the file for better readability and maintainability.
-
Error Handling: Add basic error handling around network requests (e.g., exceptions) to improve robustness.
-
Logging Configuration: If you want logging in this class, configure it correctly to capture relevant information without cluttering logs with irrelevant details.
Here’s an optimized version with some suggestions applied:
class VolcanicEngineSpeechToText(MaxKBBaseModel, BaseSpeechToText):
volcanic_cluster: str
volcanic_api_url: str
volcano_token: str
params: dict
DEFAULT_PARAMS = {
'nbest': 3,
'workflow': 1,
'show_language': True,
'show_utterances': False,
'result_type': 'full',
'sequence': 1
}
def __init__(self, **kwargs):
super().__init__(**kwargs)
# Update with provided keys first, then fall back to default
self.volcanic_api_url = kwargs.get('volcanic_api_url')
self.volcano_token = kwargs.get('volcanic_token')
self.volcano_app_id = kwargs.get('volcano_app_id')
self.volcanic_cluster = kwargs.get('volcanic_cluster')
self.params = {**{k: v for k, v in kwargs.items() if k != 'params'}, **self.DEFAULT_PARAMS}
@staticmethod
def is_cache_model():
return True
@classmethod
def new_instance(cls, model_type, model_name, model_credential: Dict[str, object], **model_kwargs):
return cls(
volcanic_api_url=model_credential.get('volcanic_api_url'),
volcano_token=model_credential.get('volcanic_token'),
volcano_app_id=model_credential.get('volcano_app_id'),
volcanic_cluster=model_credential.get('volcanic_cluster'),
*([model_credential.get(k)] + ([model_kwarg] if isinstance(value, tuple) else [value]) for (
k, value) in model_kwargs.items()),
)
def construct_request(self, reqid):
+
params = {**self.params} # Make a copy before modifying
user_params = {k: self.params.pop(k, None) for k in ['uid']}
req = {
'app': {
'appid': self.volcanic_app_id,
'cluster': self.volcanic_cluster,
'token': self.volcano_token,
},
'user': user_params,
'request': {
'reqid': reqid,
**params
},
'audio': {
'format': self.params.pop('format', None),
'rate': self.params.pop('rate', None),
'language': self.params.pop('language', None),
'bits': self.params.pop('bits', None),
'channel': self.params.pop('channel', None),
'codec': self.params.pop('codec', None),
}
}
return reqKey Improvements:
- Moved default parameter values to
DEFAULT_PARAMS. - Used dictionary unpacking to simplify the initialization process.
- Created an explicit function to extract optional user parameters from all input parameters.
- Passed both named arguments and tuples as positional arguments when creating instances.
- Ensured clean use of
.pop()to safely remove processed parameters from the dictionary.
|
|
||
| def get_model_params_setting_form(self, model_name): | ||
| pass | ||
| return XunFeiSTTModelParams() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The provided code has a few issues that you might want to address:
-
Import Statements: The
default_valueparameter was removed fromforms.TextInputFieldwhich is deprecated. However, the current version of Flask-WTF (v2.3.1) does allow this parameter with the correct casing ("defaultValue"). -
Class Naming Convention: Using PascalCase for class names can be more consistent and readable.
-
TooltipLabel usage: It looks like
TooltipLabelneeds to be properly imported into your file. Ensure it's defined elsewhere in your package. -
Function Definition: There seems to be an inconsistency in handling exceptions within
is_valid. If you're catching all exceptions globally, it should consider logging or propagating them appropriately rather than usingtraceback.print_exc(), which may lead to output being displayed in places where console output isn't desired.
Here’s how you could make these adjustments:
# Import statements
from common import forms
import traceback
from common.exception.app_exception import AppApiException, ValidationException
from models_provider.base_model_provider import BaseModelCredential, ValidCode
# Class renaming convention
class XunFeiSpeechToTextModelParams(forms.Form):
language = forms.TextInputField(
label=_('language'),
help_text=_('If not passed, the default value is zh_cn'),
validators=[forms.validators.DataRequired()]
)
domain = forms.TextInputField(
label=_('domain'),
help_text=_('If not passed, the default value is iat'),
validators=[forms.validators.DataRequired()]
)
accent = forms.TextInputField(
label=_('accent'),
help_text=_('If not passed, the default value is Mandarin'),
validators=[forms.validators.DataRequired()]
)
class XunFeiSpeechToTextModelCredential(BaseForm, BaseModelCredential):
spark_api_url = forms.TextInputField(
"API URL",
validators=[forms.validators.DataRequired()],
default="wss://iat-api.xfyun.cn/v2/iat"
)
spark_app_id = forms.TextInputField("APP ID", validators=[forms.validators.DataRequired()])
def is_valid(model_type: str, model_name, model_credential: dict = None, *args, **kwargs):
"""
Validate if the given credentials are valid according to predefined rules.
"""
if not isinstance(model_credential, dict):
raise ValidationException("Invalid credential type")
# Your logic here ...
try:
model = provider.get_model(model_type=model_type, model_name=model_name, model_credential=model_credential, **kwargs)
model.check_auth()
except Exception as e:
raise ApplicationException(str(e))
def encryption_dict(model: dict):
"""
Encrypt some sensitive information in the model dictionary.
"""
return {**model, 'spark_api_secret': super().encryption(model.get('spark_api_secret', ''))}
def get_model_params_setting_form(model_name):
# Pass necessary params to create form based on specific requirements
form_class = globals()[f"{model_name.capitalize()}ModelParams"]
return form_class(*args, **kwargs)Notes:
- Adjusted import statements for consistency and clarity.
- Ensured proper validation usage across different fields.
- Used clear naming conventions and added docstrings for better understanding.
- Fixed function parameters to accept keyword arguments correctly in
get_model_params_setting_form.
These changes aim to improve readability, maintainability, and functionality as per standard Python coding practices.
|
|
||
| with tempfile.NamedTemporaryFile(delete=False) as temp_file: | ||
| # 将上传的文件保存到临时文件中 | ||
| temp_file.write(audio_file.read()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There appears to be an issue in the speech_to_text method. The line:
dashscope.api_key = self.api_keyShould likely be changed to:
dashscope.API_KEY = self.api_keyAssuming that dashscope is a global variable or class attribute used throughout your script, this error would prevent it from using the appropriate API key for authentication during processing.
Additionally, you have two instances of printing the parameters dictionary (print(recognition_params)). It might not be necessary unless debugging purposes; if these prints can simply be removed without affecting functionality, they should indeed be excluded for cleaner code.
Here's the adjusted part relevant to fixing the auth issue (and optionally removing unnecessary print statements):
def speech_to_text(self, audio_file):
recognition_params = {
'model': self.model,
'format': 'mp3',
'sample_rate': 16000,
'callback': None,
**self.params
}
dashscope.API_KEY = self.api_key
with tempfile.NamedTemporaryFile(delete=False) as temp_file:
# Ensure correct context for file handle operations
with open(temp_file.name, "wb"):
temp_file.write(audio_file.read())
# Optionally remove or comment out the following
# print("Recognition Parameters:", recognition_params)This fix ensures proper authorization handling while maintaining clarity within your application code.
feat: STT model params