fix: [xxx](xxx) render as xxx](xxx) #81
base: greptile_combined-20260114-qodo-grep-copilot_base_fix_xxxxxx_render_as_xxxxxx_pr275
Greptile Summary
This PR fixes markdown link rendering by implementing a placeholder-based approach that preserves markdown links and images during URL removal. The fix prevents markdown links like [xxx](xxx) from rendering as xxx](xxx), the symptom named in the PR title.
Key changes:
Critical issues found:
Confidence Score: 2/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant User
participant CleanProcessor
participant RegexEngine
User->>CleanProcessor: clean(text, process_rule)
Note over CleanProcessor: Apply default cleaning
CleanProcessor->>RegexEngine: Remove <| and |>
CleanProcessor->>RegexEngine: Remove control characters
alt remove_urls_emails enabled
Note over CleanProcessor: Protect markdown links/images
CleanProcessor->>RegexEngine: Match markdown links [text](url)
RegexEngine-->>CleanProcessor: Replace with placeholders
CleanProcessor->>RegexEngine: Match markdown images ![alt](url)
RegexEngine-->>CleanProcessor: Replace with placeholders
Note over CleanProcessor: Remove emails and URLs
CleanProcessor->>RegexEngine: Remove email addresses
CleanProcessor->>RegexEngine: Remove remaining URLs
Note over CleanProcessor: Restore markdown
CleanProcessor->>CleanProcessor: Restore links from placeholders
CleanProcessor->>CleanProcessor: Restore images from placeholders
end
alt remove_extra_spaces enabled
CleanProcessor->>RegexEngine: Reduce multiple newlines to 2
CleanProcessor->>RegexEngine: Reduce multiple spaces to 1
end
CleanProcessor-->>User: Return cleaned text
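
The flow in the diagram can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: `clean_sketch`, the simplified regexes, and the single combined link/image pattern are all assumptions made for the example.

```python
import re

# Hypothetical, simplified patterns (assumptions, not the project's regexes).
MD_LINK_OR_IMAGE = re.compile(r"!?\[[^\]]*\]\(https?://[^)\s]+\)")
EMAIL = re.compile(r"[A-Za-z0-9_.+-]+@[A-Za-z0-9-]+\.[A-Za-z0-9.-]+")
URL = re.compile(r"https?://\S+")


def clean_sketch(text: str, remove_urls_emails: bool = True,
                 remove_extra_spaces: bool = True) -> str:
    # Default cleaning: drop <| and |> markers and control characters.
    text = text.replace("<|", "").replace("|>", "")
    text = re.sub(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]", "", text)

    if remove_urls_emails:
        placeholders: list[str] = []

        def protect(match: re.Match) -> str:
            # Swap the whole markdown link/image for an opaque token.
            placeholders.append(match.group(0))
            return f"__MARKDOWN_PLACEHOLDER_{len(placeholders) - 1}__"

        text = MD_LINK_OR_IMAGE.sub(protect, text)
        # With links and images hidden, strip emails and bare URLs.
        text = EMAIL.sub("", text)
        text = URL.sub("", text)
        # Restore the protected markdown verbatim.
        for i, original in enumerate(placeholders):
            text = text.replace(f"__MARKDOWN_PLACEHOLDER_{i}__", original)

    if remove_extra_spaces:
        text = re.sub(r"\n{3,}", "\n\n", text)   # at most two newlines
        text = re.sub(r"[ \t]{2,}", " ", text)   # at most one space
    return text


sample = "See [docs](https://example.com/a) or https://example.com/b, mail me@example.com  now"
cleaned = clean_sketch(sample)
```

Storing the full match and restoring it verbatim sidesteps the alt-text problem raised in the review comments, at the cost of not distinguishing links from images in the placeholder list.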
5 files reviewed, 3 comments
# Replace the ENTIRE markdown link/image with a single placeholder to protect
# the link text (which might also be a URL) from being removed
markdown_link_pattern = r"\[([^\]]*)\]\((https?://[^)]+)\)"
markdown_image_pattern = r"!\[.*?\]\((https?://[^)]+)\)"
syntax: pattern missing capture group for alt text - line 51 calls match.group(2) but this pattern only has one capture group (the URL)
- markdown_image_pattern = r"!\[.*?\]\((https?://[^)]+)\)"
+ markdown_image_pattern = r"!\[(.*?)\]\((https?://[^)]+)\)"
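
The failure described in this comment can be reproduced in isolation. The snippet below is a small sketch using only the standard `re` module; the sample string and variable names are made up for illustration.

```python
import re

# Pattern as submitted: only the URL is captured.
one_group = re.compile(r"!\[.*?\]\((https?://[^)]+)\)")
# Suggested pattern: alt text and URL are both captured.
two_groups = re.compile(r"!\[(.*?)\]\((https?://[^)]+)\)")

sample = "![diagram](https://example.com/d.png)"  # hypothetical input

raised = False
try:
    # Asking for group 2 on a one-group pattern raises IndexError.
    one_group.search(sample).group(2)
except IndexError:
    raised = True

match = two_groups.search(sample)
alt_text, url = match.group(1), match.group(2)
```

`Pattern.groups` reports the number of capture groups, which makes a cheap sanity check when a callback indexes into a match.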
Path: api/core/rag/cleaner/clean_processor.py
Line: 38:38

def replace_image_with_placeholder(match, placeholders=placeholders):
    link_type = "image"
    url = match.group(2)
    placeholder = f"__MARKDOWN_PLACEHOLDER_{len(placeholders)}__"
    placeholders.append((link_type, "image", url))
    return placeholder
logic: hardcoding "image" as alt text instead of preserving the original - should use match.group(1) after fixing the pattern
- def replace_image_with_placeholder(match, placeholders=placeholders):
-     link_type = "image"
-     url = match.group(2)
-     placeholder = f"__MARKDOWN_PLACEHOLDER_{len(placeholders)}__"
-     placeholders.append((link_type, "image", url))
-     return placeholder
+ def replace_image_with_placeholder(match, placeholders=placeholders):
+     link_type = "image"
+     alt_text = match.group(1)
+     url = match.group(2)
+     placeholder = f"__MARKDOWN_PLACEHOLDER_{len(placeholders)}__"
+     placeholders.append((link_type, alt_text, url))
+     return placeholder
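
To see why the hardcoded label matters, here is a rough round-trip sketch. The `restore` helper is an assumption (the PR's restore step is not shown in this review), and `protect_images` with its `keep_alt_text` flag is a hypothetical wrapper used only to compare the two behaviours.

```python
import re

IMAGE = re.compile(r"!\[(.*?)\]\((https?://[^)]+)\)")


def protect_images(text, keep_alt_text):
    """Replace markdown images with placeholders; return (text, placeholders)."""
    placeholders = []

    def replace(match):
        # keep_alt_text=False mimics the reviewed code, which stores "image".
        alt_text = match.group(1) if keep_alt_text else "image"
        token = f"__MARKDOWN_PLACEHOLDER_{len(placeholders)}__"
        placeholders.append(("image", alt_text, match.group(2)))
        return token

    return IMAGE.sub(replace, text), placeholders


def restore(text, placeholders):
    # Assumed restore step: rebuild the markdown from the stored tuple.
    for i, (link_type, label, url) in enumerate(placeholders):
        prefix = "!" if link_type == "image" else ""
        text = text.replace(f"__MARKDOWN_PLACEHOLDER_{i}__", f"{prefix}[{label}]({url})")
    return text


original = "Logo: ![Acme logo](https://example.com/logo.png)"
lossy = restore(*protect_images(original, keep_alt_text=False))
faithful = restore(*protect_images(original, keep_alt_text=True))
```

Under these assumptions, `lossy` comes back with the alt text replaced by `image`, while `faithful` is identical to the input.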
Path: api/core/rag/cleaner/clean_processor.py
Line: 49:54

def __init__(self, default_rules: dict | None = None):
    """Initialize CleanProcessor with optional default rules."""
    self.default_rules = default_rules or {}
style: __init__ added but default_rules is never used - the clean method is a classmethod that doesn't access instance attributes
Path: api/core/rag/cleaner/clean_processor.py
Line: 5:7
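
A minimal sketch of the situation this comment describes, assuming `clean` is a classmethod as the reviewer states. `CleanProcessorSketch` and its trivial `clean` body are hypothetical stand-ins, not the project's class.

```python
from typing import Optional


class CleanProcessorSketch:
    def __init__(self, default_rules: Optional[dict] = None):
        # Stored on the instance only.
        self.default_rules = default_rules or {}

    @classmethod
    def clean(cls, text: str, process_rule: Optional[dict] = None) -> str:
        # `cls` is the class itself, so no instance (and no
        # `default_rules`) is available inside this method.
        return text.strip()


# Typical call site: no instance is ever created, so __init__ never runs.
result = CleanProcessorSketch.clean("  hello  ")
class_has_rules = hasattr(CleanProcessorSketch, "default_rules")
```

Either dropping `__init__` or turning `clean` into an instance method that reads `self.default_rules` would make the two consistent; which one fits depends on how callers use the class.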
Benchmark PR from qodo-benchmark#275