fix: The image uploaded from the workflow knowledge base zip file cannot be parsed #4505
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -216,20 +216,18 @@ def get_content(self, file, save_image): | |
| real_name = get_file_name(zf.name) | ||
| except Exception: | ||
| real_name = zf.name | ||
||
| # Provide a re-readable file-like object for split_handle | ||
| zf.name = real_name | ||
| get_buffer = FileBufferHandle().get_buffer | ||
| for split_handle in split_handles: | ||
| # Prepare a simple get_buffer callback that returns the current raw | ||
| get_buffer = FileBufferHandle().get_buffer | ||
| if split_handle.support(zf, get_buffer): | ||
| row = get_buffer(zf) | ||
| md_text = split_handle.get_content(io.BytesIO(row), save_image) | ||
| file_content_list.append({'content': md_text, 'name': real_name}) | ||
| break | ||
| for file_content in file_content_list: | ||
| _image_list, content = get_image_list_by_content(file_content.get('name'), file_content.get("content"), | ||
| files) | ||
| files) | ||
| content_parts.append(content) | ||
| for image in _image_list: | ||
| image_list.append(image) | ||
|
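One way to read this hunk is that the `get_buffer` callback is now created once per archive member instead of once per handler. The sketch below illustrates why that could matter. It is not the project's code: `CachedBufferHandle` is a hypothetical stand-in that assumes `FileBufferHandle` reads the file once and caches the bytes, and `SniffingHandle`, `MarkdownHandle`, `extract`, and `make_member` are invented for illustration.

```python
import io


class CachedBufferHandle:
    """Hypothetical stand-in for FileBufferHandle: read once, then reuse the bytes."""

    def __init__(self):
        self._buffer = None

    def get_buffer(self, file):
        if self._buffer is None:
            self._buffer = file.read()  # consumes the stream exactly once
        return self._buffer


class SniffingHandle:
    """Hypothetical handler that inspects the bytes before deciding."""

    def support(self, file, get_buffer):
        return get_buffer(file).startswith(b"%PDF")

    def get_content(self, file, save_image):
        return "<pdf>"


class MarkdownHandle:
    """Hypothetical handler that decides by file name only."""

    def support(self, file, get_buffer):
        return file.name.lower().endswith(".md")

    def get_content(self, file, save_image):
        return file.read().decode("utf-8")


def extract(file, handles, share_buffer=True):
    """Return the content from the first handler that supports the file."""
    shared = CachedBufferHandle().get_buffer
    for handle in handles:
        # share_buffer=False mimics creating a fresh callback per handler
        get_buffer = shared if share_buffer else CachedBufferHandle().get_buffer
        if handle.support(file, get_buffer):
            raw = get_buffer(file)
            return handle.get_content(io.BytesIO(raw), None)
    return None


def make_member(data, name):
    """Build an in-memory file with a name, similar to an opened zip member."""
    member = io.BytesIO(data)
    member.name = name
    return member
```

Under these assumptions, a handler that peeks at the bytes in `support` exhausts the stream. A later handler only sees the full content if it shares the same cached callback; with a fresh callback per handler it reads an empty stream.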
Contributor
Author
The provided code snippet looks well-structured but contains a few potential issues and areas for optimization.

Here's an optimized version of the code with these improvements:

```python
def get_content(self, file, save_image):
    file_content_list = []
    # Prepare a simple get_buffer callback, returning current raw
    get_buffer = FileBufferHandle().get_buffer
    for zf in zip_files:
        try:
            real_name = get_file_name(zf.name)
        except Exception:
            real_name = zf.name
        for split_handle in split_handles:
            if split_handle.support(zf, get_buffer):
                row = get_buffer(zf)
                md_text = split_handle.get_content(io.BytesIO(row), save_image)
                file_content_list.append({'content': md_text, 'name': real_name})
                break
    content_parts = []
    image_list = []
    for file_content in file_content_list:
        _image_list, content = get_image_list_by_content(file_content['name'], file_content.get("content"), files)
        content_parts.append(content)
        for image in _image_list:
            image_list.append(image)
```

This version ensures clarity by avoiding redundancy and improving readability through proper formatting and naming conventions.
In the provided code, there are several potential issues that need attention:

1. Magic Number Usage: The line `if file_name.endswith(".md")` directly compares strings using `.lower()`, which doesn't prevent accidental modification of file extensions on case-insensitive filesystems.
2. Unfiltered File Extension Check: Adding `'.TXT'` and `'MD'` duplicates cases with existing checks (e.g., `'md'`). This could lead to redundancy.
3. Buffer Loading Logic Issues:
   - `get_buffer(file)` should be placed inside a block or method scope where it's actually used.
   - After `detect(buffer)`, you overwrite `buffer`. Depending on what `detect(buffer)` returns, this might not be intentional.
4. Encoding Detection: There's no further processing done after checking encoding confidence. This implies that files without good encodings will pass through without further scrutiny.
5. Return Type: A more detailed error message or handling would be beneficial when returning `False`.
6. File Name Index Verification: Replacing only part of the filename string (`file_name.index('.') > 0`) can mask other issues related to different file formats or content.

Possible Improvements:

- Simplify Encoding Checks:
- Move Buffer Loading Inside Use Case:

These changes improve robustness and readability while addressing some common coding issues.
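The two suggested improvements might look roughly like the sketch below. It is a minimal illustration, not the handler under review. `TEXT_EXTENSIONS`, `has_text_extension`, `support`, and the zero-argument `load_buffer` callback are all hypothetical names. A strict UTF-8 decode is swapped in for the detector-based confidence check that the comment refers to.

```python
import os

# Assumed set of extensions treated as plain text; the real list may differ.
TEXT_EXTENSIONS = {".md", ".txt"}


def has_text_extension(file_name):
    """Case-insensitive extension check, so '.MD' and '.md' need no separate entries."""
    _, ext = os.path.splitext(file_name)
    return ext.lower() in TEXT_EXTENSIONS


def support(file_name, load_buffer):
    """Decide whether a file looks like text, loading bytes only when required."""
    if has_text_extension(file_name):
        return True  # decided by name alone; the buffer is never loaded
    data = load_buffer()  # hypothetical zero-argument callback returning bytes
    try:
        data.decode("utf-8")  # strict decode stands in for a detector confidence check
    except UnicodeDecodeError:
        return False
    return True
```

Lower-casing the extension once removes the need for duplicate upper-case entries, and deferring `load_buffer()` keeps the read next to the only branch that uses it.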