fix: The image uploaded from the workflow knowledge base zip file cannot be parsed by shaohuzhang1 · Pull Request #4505 · 1Panel-dev/MaxKB

shaohuzhang1 · 2025-12-12T06:01:09Z

fix: The image uploaded from the workflow knowledge base zip file cannot be parsed


f2c-ci-robot · 2025-12-12T06:01:12Z

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected. Please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

f2c-ci-robot · 2025-12-12T06:01:18Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shaohuzhang1 · 2025-12-12T06:01:32Z

apps/common/handle/impl/text/zip_split_handle.py

+                                                                 files)
                content_parts.append(content)
                for image in _image_list:
                    image_list.append(image)


The provided code snippet looks well-structured but contains a few potential issues and areas for optimization:

Issues:

Redundant split_handle.support(zf, get_buffer) Check: The call to if split_handle.support(zf, get_buffer): is redundant since it has already been checked in the previous line (for split_handle in split_handles:).

Variable Shadowing: There is variable shadowing with get_buffer. This can cause confusion and bugs if not handled carefully.

Optimization Suggestions:

Avoid Redundant Calls: Remove the unnecessary assignment of zf.name = real_name before entering the loop.

Use Proper Variable Names: Rename variables like _file_list, _content_parts, and _image_list to be more descriptive and avoid shadowing.

Here's an optimized version of the code with these improvements:

    def get_content(self, file, save_image):
        file_content_list = []
        # Prepare a simple get_buffer callback, returning current raw
        get_buffer = FileBufferHandle().get_buffer
        for zf in zip_files:
            try:
                real_name = get_file_name(zf.name)
            except Exception:
                real_name = zf.name
            for split_handle in split_handles:
                if split_handle.support(zf, get_buffer):
                    row = get_buffer(zf)
                    md_text = split_handle.get_content(io.BytesIO(row), save_image)
                    file_content_list.append({'content': md_text, 'name': real_name})
                    break
        content_parts = []
        image_list = []
        for file_content in file_content_list:
            _image_list, content = get_image_list_by_content(file_content['name'], file_content.get("content"), files)
            content_parts.append(content)
            for image in _image_list:
                image_list.append(image)

This version ensures clarity by avoiding redundancy and improving readability through proper formatting and naming conventions.
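The loop in the suggested get_content follows a first-match dispatch: each zip entry is offered to the handlers in order, and the first whose support() accepts it produces the content. A self-contained sketch of that pattern with the standard-library zipfile module, assuming a simplified, hypothetical handler interface (TextHandle, collect_contents, and the support(name, data) signature are illustrative, not MaxKB's API):

```python
import io
import zipfile


class TextHandle:
    """Hypothetical handler that accepts .md and .txt entries as UTF-8 text."""

    def support(self, name: str, data: bytes) -> bool:
        return name.lower().endswith((".md", ".txt"))

    def get_content(self, stream: io.BytesIO) -> str:
        return stream.read().decode("utf-8")


def collect_contents(archive: bytes, handles: list) -> list:
    """Return [{'name': ..., 'content': ...}] for every entry some handler accepts."""
    contents = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            data = zf.read(info)  # read once; reused by support() and get_content()
            for handle in handles:
                if handle.support(info.filename, data):
                    text = handle.get_content(io.BytesIO(data))
                    contents.append({"name": info.filename, "content": text})
                    break  # first matching handler wins; later ones are skipped
    return contents
```

Reading each entry's bytes once and passing them to both calls avoids fetching the same buffer twice, which is the concern behind the get_buffer discussion above. Entries no handler accepts (an image, in this sketch) are silently skipped.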

shaohuzhang1 · 2025-12-12T06:01:47Z

apps/common/handle/impl/text/text_split_handle.py

+        buffer = get_buffer(file)
        result = detect(buffer)
        if result['encoding'] is not None and result['confidence'] is not None and result['encoding'] != 'ascii' and \
                result['confidence'] > 0.5:


In the provided code, there are several potential issues that need attention:

Magic String Usage: The line if file_name.endswith(".md") compares against hard-coded extension strings after calling .lower(), which normalizes case but does not guard against other unexpected extension variants on case-insensitive filesystems.

Unfiltered File Extension Check: Adding '.TXT' and 'MD' duplicates existing checks (e.g., 'md') and adds redundancy.

Buffer Loading Logic Issues:

Potential Buffer Leak: The logic for loading the buffer (get_buffer(file)) should be placed inside a block or method scope where it's actually used.

Buffer Overwriting: After assigning the result of detect(buffer), you overwrite buffer. Depending on what detect(buffer) returns, this might not be intentional.

Encoding Detection: There's no further processing done after checking encoding confidence. This implies that files without good encodings will pass through without further scrutiny.

Return Type: A more detailed error message or handling would be beneficial when returning False.

File Name Index Verification: Relying on only part of the filename string (file_name.index('.') > 0) can mask other issues related to different file formats or content. Note also that str.index raises ValueError when the name contains no '.'.
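The index('.') concern can be made concrete: str.index raises ValueError for a name without a dot, while os.path.splitext simply returns an empty extension and treats a leading dot (a hidden file such as '.md') as part of the name. A minimal sketch, assuming the handler should accept only .md and .txt regardless of case (TEXT_EXTENSIONS and has_text_extension are hypothetical names):

```python
import os

# Hypothetical set of extensions a text handler might accept
TEXT_EXTENSIONS = {".md", ".txt"}


def has_text_extension(file_name: str) -> bool:
    """Case-insensitive extension check that never raises.

    os.path.splitext returns ('README', '') for a dotless name instead of
    raising, and returns ('.md', '') for a hidden file named '.md'.
    """
    _, ext = os.path.splitext(file_name)
    return ext.lower() in TEXT_EXTENSIONS
```

Only the last extension is considered, so 'archive.tar.gz' yields '.gz' and is rejected.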

Possible Improvements:

Simplify Encoding Checks:

    supported_encodings = {'utf-8', 'latin1', 'unicode_escape'}  # Example supported encodings
    if result['encoding'] and result['encoding'].lower() in supported_encodings and result['confidence'] > 0.5:
        return True
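One caveat with comparing result['encoding'].lower() against a fixed set of strings: detectors report names such as 'ISO-8859-1', which will not match 'latin1' as plain text even though they name the same codec. A sketch that normalizes both sides through codecs.lookup, assuming the detection result is a dict with 'encoding' and 'confidence' keys as in the diff above (is_supported and SUPPORTED_ENCODINGS are illustrative names; 'unicode_escape' is left out because it is a Python escape codec rather than a file encoding a detector would report):

```python
import codecs

# Canonical codec names, e.g. 'latin1' -> 'iso8859-1'
SUPPORTED_ENCODINGS = {codecs.lookup(name).name for name in ("utf-8", "latin1")}


def is_supported(result: dict, threshold: float = 0.5) -> bool:
    """Accept a detection result only if its encoding is known and confident enough."""
    encoding = result.get("encoding")
    confidence = result.get("confidence") or 0.0
    if not encoding or confidence <= threshold:
        return False
    try:
        canonical = codecs.lookup(encoding).name
    except LookupError:  # detector returned a name Python does not recognise
        return False
    return canonical in SUPPORTED_ENCODINGS
```

Because codecs.lookup resolves aliases, 'UTF-8', 'utf8', and 'utf-8' all compare equal, and an unknown name is rejected instead of raising.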

Move Buffer Loading Inside Use Case:

    class TextSplitHandle(BaseSplitHandle):
        def support(self, file, get_buffer):
            file_name: str = file.name.lower()
            if file_name.endswith(('.md', '.txt')):
                return True
            if file_name.startswith('.'):  # Skip hidden files
                return False
            # Load the buffer only when the name checks are inconclusive
            buffer = get_buffer(file)
            result = detect(buffer)  # detect() returns a dict, not a string
            encoding = result['encoding']
            # supported_encodings as defined in the previous snippet
            if encoding and encoding.lower() in supported_encodings and result['confidence'] > 0.5:
                return True
            raise ValueError("Unsupported file format or encoding")

These changes improve robustness and readability while addressing some common coding issues.
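The "Move Buffer Loading Inside Use Case" point is easiest to see when the buffer is fetched only after the cheap name checks. A minimal sketch, assuming a get_buffer callable that takes a file name and returns bytes, and substituting a strict UTF-8 decode for the charset detector so it runs on the standard library alone (LazyTextSupport is a hypothetical name, not MaxKB's TextSplitHandle):

```python
from typing import Callable


class LazyTextSupport:
    """Hypothetical support() check that fetches file bytes only when needed."""

    def __init__(self, get_buffer: Callable[[str], bytes]):
        self.get_buffer = get_buffer

    def support(self, file_name: str) -> bool:
        name = file_name.lower()
        if name.endswith((".md", ".txt")):
            return True  # decided by name alone; buffer never loaded
        if name.rsplit("/", 1)[-1].startswith("."):
            return False  # hidden file; skip without reading it
        buffer = self.get_buffer(file_name)  # loaded only for the ambiguous case
        try:
            buffer.decode("utf-8")  # stdlib stand-in for a charset detector
        except UnicodeDecodeError:
            return False
        return True
```

This sketch returns False rather than raising. In a first-match loop like the one in get_content, an exception from support() would propagate out of the loop before later handlers get a chance to accept the file.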

fix: The image uploaded from the workflow knowledge base zip file cannot be parsed

7343c39

f2c-ci-robot bot added the do-not-merge/release-note-label-needed label Dec 12, 2025

shaohuzhang1 merged commit 621fbd3 into v2 Dec 12, 2025
3 of 5 checks passed

shaohuzhang1 deleted the pr@v2@fiw_workflow branch December 12, 2025 06:01

shaohuzhang1 commented Dec 12, 2025

liuruibin pushed a commit that referenced this pull request Dec 12, 2025

fix: The image uploaded from the workflow knowledge base zip file cannot be parsed (#4505)

47cea10

shaohuzhang1 merged 1 commit into v2 from pr@v2@fiw_workflow
