Skip to content

Conversation

@shaohuzhang1
Copy link
Contributor

feat: Support tag in knowledge_workflow

@f2c-ci-robot
Copy link

f2c-ci-robot bot commented Dec 9, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@f2c-ci-robot
Copy link

f2c-ci-robot bot commented Dec 9, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

batch_add_document_tag(document_tag_id_map)

problem_model_list, problem_paragraph_mapping_list = (
ProblemParagraphManage(problem_paragraph_object_list, knowledge_id).to_problem_model_list()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided Python code is generally well-structured and follows Django best practices. However, there are a few areas where improvements could be made:

  1. Imports: The typing.Any import can often lead to type-checking errors. It might be better to specify more precise types where possible.

  2. Method Names: Method names like convert_uuid_to_str, link_file, etc., are good for clarity but consider using underscores consistently (snake_case) instead of hyphens (-).

  3. Error Handling: Ensure that error handling is robust enough to manage any exceptions that may arise during file operations or database interactions.

  4. Database Operations: For larger datasets, ensure that transactions are used to group multiple SQL queries into one transaction block, which can improve performance and consistency.

  5. Serialization: Consider simplifying the serialization process by using Django's built-in serializers where appropriate, which can reduce boilerplate code.

  6. Functionality: Verify that all functions, particularly those related to data insertion, handle edge cases and invalid inputs gracefully.

  7. Code Comments: Add comments explaining complex logic or sections of the code, especially around multi-line comments or blocks of repeated code.

Here are some specific suggestions:

# Example refinement: Using snake_case for method names
def convert_uuid_to_str(obj):
    # Implementation...
    pass

# Adding error handling
try:
    link_file(source_file_id, document_id)
except Exception as e:
    print(f"Error linking file: {e}")

# Simplified serialization with Django SerializerUtils if applicable
from django.core.exceptions import ValidationError

def create_serializer(data):
    try:
        serializer = MyDjangoModelSerializer(data=data)
        serializer.is_valid(raise_exception=True)
        return serializer.validated_data
    except ValidationError as e:
        raise SerialzationException(str(e))

# Improved function for tag management
@transaction.atomic
def save_knowledge_tags(concerned_document_id, concerned_tags):
    old_tag_models = DocumentTag.objects.filter(document_id__in=[concerned_document_id])
    
    new_tag_list = []
    existing_tag_dict = {}

    for tag in concerned_tags:
        key_value_pair = (tag['key'], tag['value'])
        if key_value_pair not in existing_tag_dict.values():
            new_tag = DocmumentTag(document_id=concerned_document_id, **key_value_pair)
            new_tag_list.append(new_tag)
            existing_tag_dict[key_value_pair] = f"{new_tag.id}"

    if new_tag_list:
        DocumentTag.objects.bulk_create(new_tag_list)

# Refactoring batch_add_document_tag
@transaction.atomic
def batch_add_document_tag(relation_map):
    existing_relations = set(QuerySet(...).filter(...).values_list('...'))
    new_entries = []

    for doc_id in relation_map:
        for tag_id in relation_map[doc_id]:
            if (doc_id, tag_id) not in existing_relations:
                entry = {...}
                new_entries.append(entry)
   
    if new_entries:
       _queryset.objects.bulk_create(..., new_entries)

These changes make the code cleaner, more maintainable, and safer from potential runtime errors when interacting with databases or external services.

{'label': _('Real-time speech recognition - Fun-ASR/Paraformer'),
'value': 'other'}
])
api_url = forms.TextInputField(_('API URL'), required=True, relation_show_field_dict={'type': ['qwen', 'omni']})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks mostly clean, but here are some suggestions:

  1. The option_list should be indented under the type field definition.

Here's the corrected version with indentation added:

class AliyunBaiLianDefaultSTTModelCredential(BaseForm, BaseModelCredential):
    type = forms.SingleSelect(
        _("API"), 
        required=True,
        text_field='label',
        default_value='qwen', 
        provider='', method='',
        value_field='value',
        option_list=[
            {'label': _('Audio file recognition - Tongyi Qwen'),
                'value': 'qwen'},
            {'label': _('Qwen-Omni'),
                'value': 'omni'},
            {'label': _('Real-time speech recognition - Fun-ASR/Paraformer'),
                'value': 'other'}
        ]
    )
    api_url = forms.TextInputField(
        _('API URL'), 
        required=True,
        relation_show_field_dict={'type': ['qwen', 'omni']}
    )
  1. Consider renaming the api_url to something more descriptive like speech_recognition_endpoint, as it suggests that this field is related to speech recognition rather than simply an API URL.

  2. Although not strictly necessary from a technical perspective, ensuring consistent coding style (e.g., using consistent spacing around operators and closing braces) can improve readability.

These changes make the code more readable and maintainable while maintaining its functional correctness.

extra_body = {'enable_thinking': False, **self.params},
)
result = []
for chunk in completion:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code checks for certain irregularities that need addressing:

Irregularity/Issue

  1. Missing self Parameter: The extra_body parameter is missing the self. prefix before it. This will raise an AttributeError because Python needs to access class attributes using the instance self.

  •            extra_body=self.params
    
  •            extra_body=self.extra_body
    

### Optimization Suggestion (if applicable)

There doesn't appear to be any immediate optimization opportunities within this function, as it primarily involves processing audio and sending requests.

### Additional Considerations

- **Exception Handling**: Ensure there's proper error handling around HTTP requests if they might fail unexpectedly.
  
  ```python
  except Exception as e:
      print(f"An error occurred in speech_to_text: {e}")
  • Logging: Implement logging statements to track the flow of execution and errors more effectively.

    import logging
    
    logger = logging.getLogger(__name__)
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
    
    handler = logging.StreamHandler()
    handler.setFormatter(formatter)
    
    logger.addHandler(handler)
    
    try:
        completion_stream = openai.Completion.create(
            ...
            additional_body={'enable_thinking': False, **self.params}
        )
    except Exception as e:
        logger.error(f"Error generating text from AI: {e}")
        return None

These changes should help ensure that the code functions correctly without syntax errors and improves robustness through some basic error handling practices.

@zhanweizhang7 zhanweizhang7 merged commit 6e87cef into v2 Dec 9, 2025
3 of 6 checks passed
@zhanweizhang7 zhanweizhang7 deleted the pr@v2@feat_support branch December 9, 2025 04:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants