fix: Filter special character #4500
@@ -340,3 +340,13 @@ def generate_uuid(tag: str):
 
 def filter_workspace(query_list):
     return [q for q in query_list if q.name != "workspace_id"]
+
+
+def filter_special_character(_str):
+    """
+    Filter special characters
+    """
+    s_list = ["\\u0000"]
+    for t in s_list:
+        _str = _str.replace(t, '')
+    return _str
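To make the new helper's behavior concrete, here is a minimal sketch that reuses the function from the diff. It shows that the pattern `"\\u0000"` is the six-character escaped text (a backslash followed by `u0000`), not the actual NUL character. The `strip_nul` variant is a hypothetical addition for illustration and is not part of this PR.

```python
def filter_special_character(_str):
    """Remove the escaped text \\u0000 from a string (same logic as the diff)."""
    s_list = ["\\u0000"]
    for t in s_list:
        _str = _str.replace(t, '')
    return _str


# The pattern is a backslash followed by "u0000": six characters, not one.
assert len("\\u0000") == 6

escaped = "abc\\u0000def"   # contains the six-character escape text
real_nul = "abc\x00def"     # contains an actual NUL code point

cleaned_escaped = filter_special_character(escaped)   # escape text is removed
cleaned_nul = filter_special_character(real_nul)      # real NUL is left as-is


def strip_nul(text):
    """Hypothetical variant: drop both the escaped text and the real NUL."""
    return text.replace("\\u0000", "").replace("\x00", "")
```

Whether the real NUL character also needs removing depends on where the strings come from, which the diff does not show.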
The code is mostly clean but could use a couple of minor improvements and corrections.
Here's the revised version of your code:

# Define the main UUID generation function
def generate_uuid(tag: str):
    ...  # Implement your logic here


# Drop query items whose name is "workspace_id"
def filter_workspace(query_list):
    """
    Filter out all query items with 'workspace_id' as their name.

    Parameters:
    - query_list: A list of objects with a 'name' attribute.

    Returns:
    - A new list of query objects excluding those named 'workspace_id'.
    """
    return [q for q in query_list if q.name != "workspace_id"]


# Example usage (your_query_list_here is a placeholder):
# filtered_queries = filter_workspace(your_query_list_here)


# Helper function to remove special characters from _str, specifically "\\u0000"
def filter_special_character(_str):
    """
    Remove all occurrences of the '\\u0000' special character from the string `_str`.

    Parameters:
    - _str: Input string to be processed.

    Returns:
    - A new string without the '\\u0000' special character.
    """
    s_list = ["\\u0000"]
    for t in s_list:
        _str = _str.replace(t, '')
    return _str

...  # Rest of the code remains unchanged

These suggestions should help make the code clearer. They only add comments and docstrings, so behavior and performance are unchanged.
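As a quick illustration of what `filter_workspace` does, here is a minimal sketch. The `QueryItem` class is a hypothetical stand-in for whatever objects the project actually passes in; the only assumption is that each item has a `name` attribute.

```python
from dataclasses import dataclass


@dataclass
class QueryItem:
    # Hypothetical stand-in; the real objects only need a `name` attribute.
    name: str
    value: str


def filter_workspace(query_list):
    # Same logic as in the diff: keep everything not named "workspace_id".
    return [q for q in query_list if q.name != "workspace_id"]


queries = [
    QueryItem("workspace_id", "default"),
    QueryItem("title", "report"),
    QueryItem("tag", "draft"),
]

remaining = filter_workspace(queries)
remaining_names = [q.name for q in remaining]
```

The function returns a new list and leaves `queries` itself untouched.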
The provided code appears to be part of a flow processing system that updates and manages knowledge records in a database. Below are some points of attention and potential improvements:
- Imports: Consider Pythonic practices such as using the typing module for type hints.
- Method Names and Comments: Method names are slightly awkward and could benefit from more descriptive comments or renaming.
- String Formatting: Use f-strings or string formatting methods instead of concatenation (+); they are generally considered more readable and safer.
- Code Readability: Functions like batch_add_document_tag, save_knowledge_tags, etc., are deeply nested inside each other. Consider refactoring these operations into individual functions.
- Bulk Create Operations: Although these work correctly, it may make sense to add validation before calling bulk_create.
- Special Character Filtering: The filtering operation on chunk content seems necessary, but it should also ensure that the filtered version is suitable for further processing (e.g., no special characters causing issues).
Here's an improved version based on these considerations:
This version adds a static method, _validate_content, for consistent handling of string validations, improves readability through clearer function definitions, uses f-strings where appropriate, and provides initial scaffolding for saving new tags and mapping them to documents.
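As a rough illustration of the filtering and validation points in this comment, here is a minimal sketch that cleans chunk content before a bulk insert. The `Chunk` class and the `prepare_chunks` helper are hypothetical and not taken from the PR. The sketch also assumes that empty content should be skipped, which the review does not state.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Chunk:
    # Hypothetical record type, used here only to keep the sketch runnable.
    document_id: str
    content: str


def filter_special_character(_str: str) -> str:
    # Same behavior as the helper added in this PR.
    return _str.replace("\\u0000", "")


def prepare_chunks(document_id: str, contents: Iterable[object]) -> List[Chunk]:
    """Clean each piece of content and skip anything that is not usable."""
    prepared = []
    for raw in contents:
        if not isinstance(raw, str):
            continue  # skip non-string values instead of failing later
        cleaned = filter_special_character(raw).strip()
        if cleaned:  # assumption: empty content should not be stored
            prepared.append(Chunk(document_id=document_id, content=cleaned))
    return prepared


chunks = prepare_chunks("doc-1", ["hello\\u0000 world", "\\u0000", "  ", None, "ok"])
summary = f"{len(chunks)} chunks ready for document {chunks[0].document_id}"
```

In a Django project, a list prepared this way would be the kind of input passed to `bulk_create`, so invalid rows are dropped before the database call rather than causing it to fail.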