Skip to content

Conversation

@shaohuzhang1
Copy link
Contributor

fix: The image uploaded from the workflow knowledge base zip file cannot be parsed

@f2c-ci-robot
Copy link

f2c-ci-robot bot commented Dec 12, 2025

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@f2c-ci-robot
Copy link

f2c-ci-robot bot commented Dec 12, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@shaohuzhang1 shaohuzhang1 merged commit 621fbd3 into v2 Dec 12, 2025
3 of 5 checks passed
@shaohuzhang1 shaohuzhang1 deleted the pr@v2@fiw_workflow branch December 12, 2025 06:01
files)
content_parts.append(content)
for image in _image_list:
image_list.append(image)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The provided code snippet looks well-structured but contains a few potential issues and areas for optimization:

Issues:

  1. Redundant split_handle.support(zf, get_buffer) Check: The call to if split_handle.support(zf, get_buffer): is redundant since it has already been checked in the previous line (for split_handle in split_handles:).
  2. Variable Shadowing: There is variable shadowing with get_buffer. This can cause confusion and bugs if not handled carefully.

Optimization Suggestions:

  1. Avoid Redundant Calls: Remove the unnecessary assignment of zf.name = real_name before entering the loop.
  2. Use Proper Variable Names: Refactor variable names like _file_list, _content_parts, and _image_list to be more descriptive and avoid shadows.

Here's an optimized version of the code with these improvements:

def get_content(self, file, save_image):
    file_content_list = []
    
    # Prepare a simple get_buffer callback, returning current raw
    get_buffer = FileBufferHandle().get_buffer
    
    for zf in zip_files:
        try:
            real_name = get_file_name(zf.name)
        except Exception:
            real_name = zf.name

        for split_handle in split_handles:
            if split_handle.support(zf, get_buffer):
                row = get_buffer(zf)
                md_text = split_handle.get_content(io.BytesIO(row), save_image)
                file_content_list.append({'content': md_text, 'name': real_name})
                break
    
    content_parts = []
    image_list = []
    
    for file_content in file_content_list:
        _image_list, content = get_image_list_by_content(file_content['name'], file_content.get("content"), files)
        content_parts.append(content)
        for image in _image_list:
            image_list.append(image)

This version ensures clarity by avoiding redundancy and improving readability through proper formatting and naming conventions.

buffer = get_buffer(file)
result = detect(buffer)
if result['encoding'] is not None and result['confidence'] is not None and result['encoding'] != 'ascii' and \
result['confidence'] > 0.5:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the provided code, there are several potential issues that need attention:

  1. Magic Number Usage: The line if file_name.endswith(".md") directly compares strings using .lower(), which doesn't prevent accidental modification of file extensions on case-insensitive filesystems.

  2. Unfiltered File Extension Check: Adding '.TXT' and 'MD' duplicates cases with existing checks (e.g., 'md'). This could lead to redundancy.

  3. Buffer Loading Logic Issues:

    • Potential Buffer Leak: The logic for loading the buffer (get_buffer(file)) should be placed inside a block or method scope where it's actually used.
    • Buffer Overwriting: After assigning the result of detect(buffer), you overwrite buffer. Depending on what detect(buffer) returns, this might not be intentional.
  4. Encoding Detection: There's no further processing done after checking encoding confidence. This implies that files without good encodings will pass through without further scrutiny.

  5. Return Type: A more detailed error message or handling would be beneficial when returning False.

  6. File Name Index Verification: Replacing only part of the filename string (file_name.index('.') > 0) can mask other issues related to different file formats or content.

Possible Improvements:

  • Simplify Encoding Checks:

    supported_encodings = set(['utf-8', 'latin1', 'unicode_escape'])  # Example supported encodings
    
    if result['encoding'] and result['encoding'].lower() in supported_encodings and result['confidence'] > 0.5:
        return True
  • Move Buffer Loading Inside Use Case:

    class TextSplitHandle(BaseSplitHandle):
        def support(self, file, get_buffer):
            buffer = get_buffer(file)
            file_name: str = file.name.lower()
    
            if file_name.endswith(('.md', '.txt')):
                return True
    
            if file_name.startswith('.'):
                # Skip hidden files
                return False
    
            detected_encoding = detect(buffer)
    
            if detected_encoding and detected_encoding.lower() in supported_encodings and detected_encoding['confidence'] > 0.5:
                return True
    
            raise ValueError("Unsupported file format or encoding")

These changes improve robustness and readability while addressing some common coding issues.

liuruibin pushed a commit that referenced this pull request Dec 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants