Skip to content

Conversation

@shaohuzhang1
Copy link
Contributor

feat: STT model params

@f2c-ci-robot
Copy link

f2c-ci-robot bot commented Sep 25, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@f2c-ci-robot
Copy link

f2c-ci-robot bot commented Sep 25, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

'codec': params.get('codec', self.codec)
}
}
return req
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In your VolcanicEngineSpeechToText class, there are several areas that could be improved or checked:

  1. Parameter Default Values: In the constructor and request construction method, parameters like nbest, workflow, etc., have fallbacks using the instance variables (self). Ensure these defaults align with expected behavior.

  2. Consistent Parameter Handling: In the request construction method, ensure consistent handling of optional parameters. Mixing between direct access to instance variables and parameter dictionary checks can lead to confusion and errors.

  3. Code Organization: Consider organizing constants or settings at the beginning of the file for better readability and maintainability.

  4. Error Handling: Add basic error handling around network requests (e.g., exceptions) to improve robustness.

  5. Logging Configuration: If you want logging in this class, configure it correctly to capture relevant information without cluttering logs with irrelevant details.

Here’s an optimized version with some suggestions applied:

class VolcanicEngineSpeechToText(MaxKBBaseModel, BaseSpeechToText):
    volcanic_cluster: str
    volcanic_api_url: str
    volcano_token: str
    params: dict

    DEFAULT_PARAMS = {
        'nbest': 3,
        'workflow': 1,
        'show_language': True,
        'show_utterances': False,
        'result_type': 'full',
        'sequence': 1
    }

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
       
        # Update with provided keys first, then fall back to default
        self.volcanic_api_url = kwargs.get('volcanic_api_url')
        
        self.volcano_token = kwargs.get('volcanic_token')
                
        self.volcano_app_id = kwargs.get('volcano_app_id')
                    
        self.volcanic_cluster = kwargs.get('volcanic_cluster')

        self.params = {**{k: v for k, v in kwargs.items() if k != 'params'}, **self.DEFAULT_PARAMS}

    
    @staticmethod
    def is_cache_model():
        return True
    
    @classmethod
    def new_instance(cls, model_type, model_name, model_credential: Dict[str, object], **model_kwargs):
        return cls(
            volcanic_api_url=model_credential.get('volcanic_api_url'),
            volcano_token=model_credential.get('volcanic_token'),
            volcano_app_id=model_credential.get('volcano_app_id'),
            volcanic_cluster=model_credential.get('volcanic_cluster'),
             *([model_credential.get(k)] + ([model_kwarg] if isinstance(value, tuple) else [value]) for (
                             k, value) in model_kwargs.items()),  
        )

    def construct_request(self, reqid):
+       
        params = {**self.params}  # Make a copy before modifying
        
        user_params = {k: self.params.pop(k, None) for k in ['uid']}

        req = {
            'app': {
                'appid': self.volcanic_app_id,
                'cluster': self.volcanic_cluster,
                'token': self.volcano_token,
            },
            'user': user_params,
            'request': {
                'reqid': reqid,
                **params
            },
            'audio': {
                'format': self.params.pop('format', None),
                'rate': self.params.pop('rate', None),
                'language': self.params.pop('language', None),
                'bits': self.params.pop('bits', None),
                'channel': self.params.pop('channel', None),
                'codec': self.params.pop('codec', None),
            }
        }
        return req

Key Improvements:

  • Moved default parameter values to DEFAULT_PARAMS.
  • Used dictionary unpacking to simplify the initialization process.
  • Created an explicit function to extract optional user parameters from all input parameters.
  • Passed both named arguments and tuples as positional arguments when creating instances.
  • Ensured clean use of .pop() to safely remove processed parameters from the dictionary.


def get_model_params_setting_form(self, model_name):
pass
return XunFeiSTTModelParams()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code has a few issues that you might want to address:

  1. Import Statements: The default_value parameter was removed from forms.TextInputField which is deprecated. However, the current version of Flask-WTF (v2.3.1) does allow this parameter with the correct casing ("defaultValue").

  2. Class Naming Convention: Using PascalCase for class names can be more consistent and readable.

  3. TooltipLabel usage: It looks like TooltipLabel needs to be properly imported into your file. Ensure it's defined elsewhere in your package.

  4. Function Definition: There seems to be an inconsistency in handling exceptions within is_valid. If you're catching all exceptions globally, it should consider logging or propagating them appropriately rather than using traceback.print_exc(), which may lead to output being displayed in places where console output isn't desired.

Here’s how you could make these adjustments:

# Import statements

from common import forms
import traceback
from common.exception.app_exception import AppApiException, ValidationException
from models_provider.base_model_provider import BaseModelCredential, ValidCode


# Class renaming convention
class XunFeiSpeechToTextModelParams(forms.Form):
    language = forms.TextInputField(
        label=_('language'),
        help_text=_('If not passed, the default value is zh_cn'),
        validators=[forms.validators.DataRequired()]
    )
    
    domain = forms.TextInputField(
        label=_('domain'),
        help_text=_('If not passed, the default value is iat'),
        validators=[forms.validators.DataRequired()]
    )

    accent = forms.TextInputField(
        label=_('accent'),
        help_text=_('If not passed, the default value is Mandarin'),
        validators=[forms.validators.DataRequired()]
    )


class XunFeiSpeechToTextModelCredential(BaseForm, BaseModelCredential):
    spark_api_url = forms.TextInputField(
        "API URL",
        validators=[forms.validators.DataRequired()],
        default="wss://iat-api.xfyun.cn/v2/iat"
    )
    
    spark_app_id = forms.TextInputField("APP ID", validators=[forms.validators.DataRequired()])
    

def is_valid(model_type: str, model_name, model_credential: dict = None, *args, **kwargs):
    """
    Validate if the given credentials are valid according to predefined rules.
    """

    if not isinstance(model_credential, dict):
        raise ValidationException("Invalid credential type")

    # Your logic here ...

    try:
        model = provider.get_model(model_type=model_type, model_name=model_name, model_credential=model_credential, **kwargs)
        model.check_auth()
    except Exception as e:
        raise ApplicationException(str(e))


def encryption_dict(model: dict):
    """
    Encrypt some sensitive information in the model dictionary.
    """

    return {**model, 'spark_api_secret': super().encryption(model.get('spark_api_secret', ''))}


def get_model_params_setting_form(model_name):
    # Pass necessary params to create form based on specific requirements
    form_class = globals()[f"{model_name.capitalize()}ModelParams"]
    return form_class(*args, **kwargs)

Notes:

  • Adjusted import statements for consistency and clarity.
  • Ensured proper validation usage across different fields.
  • Used clear naming conventions and added docstrings for better understanding.
  • Fixed function parameters to accept keyword arguments correctly in get_model_params_setting_form.

These changes aim to improve readability, maintainability, and functionality as per standard Python coding practices.


with tempfile.NamedTemporaryFile(delete=False) as temp_file:
# 将上传的文件保存到临时文件中
temp_file.write(audio_file.read())
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There appears to be an issue in the speech_to_text method. The line:

dashscope.api_key = self.api_key

Should likely be changed to:

dashscope.API_KEY = self.api_key

Assuming that dashscope is a global variable or class attribute used throughout your script, this error would prevent it from using the appropriate API key for authentication during processing.

Additionally, you have two instances of printing the parameters dictionary (print(recognition_params)). It might not be necessary unless debugging purposes; if these prints can simply be removed without affecting functionality, they should indeed be excluded for cleaner code.

Here's the adjusted part relevant to fixing the auth issue (and optionally removing unnecessary print statements):

def speech_to_text(self, audio_file):
    recognition_params = {
        'model': self.model,
        'format': 'mp3',
        'sample_rate': 16000,
        'callback': None,
        **self.params
    }
    
    dashscope.API_KEY = self.api_key
    
    with tempfile.NamedTemporaryFile(delete=False) as temp_file:
        # Ensure correct context for file handle operations
        with open(temp_file.name, "wb"):
            temp_file.write(audio_file.read())

# Optionally remove or comment out the following
# print("Recognition Parameters:", recognition_params)

This fix ensures proper authorization handling while maintaining clarity within your application code.

@zhanweizhang7 zhanweizhang7 merged commit e8c36a6 into v2 Sep 25, 2025
4 of 6 checks passed
@zhanweizhang7 zhanweizhang7 deleted the pr@v2@feat_stt_model_params branch September 25, 2025 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants