-
Notifications
You must be signed in to change notification settings - Fork 2.6k
feat: Support tag in knowledge_workflow #4468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
| batch_add_document_tag(document_tag_id_map) | ||
|
|
||
| problem_model_list, problem_paragraph_mapping_list = ( | ||
| ProblemParagraphManage(problem_paragraph_object_list, knowledge_id).to_problem_model_list() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The provided Python code is generally well-structured and follows Django best practices. However, there are a few areas where improvements could be made:
-
Imports: The
typing.Anyimport can often lead to type-checking errors. It might be better to specify more precise types where possible. -
Method Names: Method names like
convert_uuid_to_str,link_file, etc., are good for clarity but consider using underscores consistently (snake_case) instead of hyphens (-). -
Error Handling: Ensure that error handling is robust enough to manage any exceptions that may arise during file operations or database interactions.
-
Database Operations: For larger datasets, ensure that transactions are used to group multiple SQL queries into one transaction block, which can improve performance and consistency.
-
Serialization: Consider simplifying the serialization process by using Django's built-in serializers where appropriate, which can reduce boilerplate code.
-
Functionality: Verify that all functions, particularly those related to data insertion, handle edge cases and invalid inputs gracefully.
-
Code Comments: Add comments explaining complex logic or sections of the code, especially around multi-line comments or blocks of repeated code.
Here are some specific suggestions:
# Example refinement: Using snake_case for method names
def convert_uuid_to_str(obj):
# Implementation...
pass
# Adding error handling
try:
link_file(source_file_id, document_id)
except Exception as e:
print(f"Error linking file: {e}")
# Simplified serialization with Django SerializerUtils if applicable
from django.core.exceptions import ValidationError
def create_serializer(data):
try:
serializer = MyDjangoModelSerializer(data=data)
serializer.is_valid(raise_exception=True)
return serializer.validated_data
except ValidationError as e:
raise SerialzationException(str(e))
# Improved function for tag management
@transaction.atomic
def save_knowledge_tags(concerned_document_id, concerned_tags):
old_tag_models = DocumentTag.objects.filter(document_id__in=[concerned_document_id])
new_tag_list = []
existing_tag_dict = {}
for tag in concerned_tags:
key_value_pair = (tag['key'], tag['value'])
if key_value_pair not in existing_tag_dict.values():
new_tag = DocmumentTag(document_id=concerned_document_id, **key_value_pair)
new_tag_list.append(new_tag)
existing_tag_dict[key_value_pair] = f"{new_tag.id}"
if new_tag_list:
DocumentTag.objects.bulk_create(new_tag_list)
# Refactoring batch_add_document_tag
@transaction.atomic
def batch_add_document_tag(relation_map):
existing_relations = set(QuerySet(...).filter(...).values_list('...'))
new_entries = []
for doc_id in relation_map:
for tag_id in relation_map[doc_id]:
if (doc_id, tag_id) not in existing_relations:
entry = {...}
new_entries.append(entry)
if new_entries:
_queryset.objects.bulk_create(..., new_entries)These changes make the code cleaner, more maintainable, and safer from potential runtime errors when interacting with databases or external services.
| {'label': _('Real-time speech recognition - Fun-ASR/Paraformer'), | ||
| 'value': 'other'} | ||
| ]) | ||
| api_url = forms.TextInputField(_('API URL'), required=True, relation_show_field_dict={'type': ['qwen', 'omni']}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks mostly clean, but here are some suggestions:
- The
option_listshould be indented under thetypefield definition.
Here's the corrected version with indentation added:
class AliyunBaiLianDefaultSTTModelCredential(BaseForm, BaseModelCredential):
type = forms.SingleSelect(
_("API"),
required=True,
text_field='label',
default_value='qwen',
provider='', method='',
value_field='value',
option_list=[
{'label': _('Audio file recognition - Tongyi Qwen'),
'value': 'qwen'},
{'label': _('Qwen-Omni'),
'value': 'omni'},
{'label': _('Real-time speech recognition - Fun-ASR/Paraformer'),
'value': 'other'}
]
)
api_url = forms.TextInputField(
_('API URL'),
required=True,
relation_show_field_dict={'type': ['qwen', 'omni']}
)-
Consider renaming the
api_urlto something more descriptive likespeech_recognition_endpoint, as it suggests that this field is related to speech recognition rather than simply an API URL. -
Although not strictly necessary from a technical perspective, ensuring consistent coding style (e.g., using consistent spacing around operators and closing braces) can improve readability.
These changes make the code more readable and maintainable while maintaining its functional correctness.
| extra_body = {'enable_thinking': False, **self.params}, | ||
| ) | ||
| result = [] | ||
| for chunk in completion: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The provided code checks for certain irregularities that need addressing:
Irregularity/Issue
-
Missing
selfParameter: Theextra_bodyparameter is missing theself.prefix before it. This will raise an AttributeError because Python needs to access class attributes using the instanceself.
-
extra_body=self.params
-
extra_body=self.extra_body
### Optimization Suggestion (if applicable)
There doesn't appear to be any immediate optimization opportunities within this function, as it primarily involves processing audio and sending requests.
### Additional Considerations
- **Exception Handling**: Ensure there's proper error handling around HTTP requests if they might fail unexpectedly.
```python
except Exception as e:
print(f"An error occurred in speech_to_text: {e}")
-
Logging: Implement logging statements to track the flow of execution and errors more effectively.
import logging logger = logging.getLogger(__name__) logger.setLevel(logging.INFO) formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s') handler = logging.StreamHandler() handler.setFormatter(formatter) logger.addHandler(handler) try: completion_stream = openai.Completion.create( ... additional_body={'enable_thinking': False, **self.params} ) except Exception as e: logger.error(f"Error generating text from AI: {e}") return None
These changes should help ensure that the code functions correctly without syntax errors and improves robustness through some basic error handling practices.
feat: Support tag in knowledge_workflow