-
Notifications
You must be signed in to change notification settings - Fork 22
saga4/fix duplication suggestion issue #284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Explore these optional code suggestions:
|
…/fix_duplication_suggestion_issue`) Here's a faster version of your program. The main bottleneck is the list comprehension calling a nested function for every line, leading to relatively high overhead. Avoiding the inner function, only iterating once, and not building an intermediate list (since we only need the count) can significantly improve performance. You do **not** need to build the list of lines; simply scan and count qualifying lines in a single pass. Optimized version. **Changes made:** - Inlined the `is_diff_line` logic for better performance (function call overhead avoided). - Used a running integer (`count`) instead of a list, so memory use and processing are reduced. - Avoided creating an unnecessary list and removed the nested function. **This is as fast and memory-efficient as this logic gets in idiomatic Python.**
| lines = diff_output.split("\n") | ||
|
|
||
| def is_diff_line(line: str) -> bool: | ||
| return line.startswith(("+", "-")) and not line.startswith(("+++", "---")) | ||
|
|
||
| diff_lines = [line for line in lines if is_diff_line(line)] | ||
| return len(diff_lines) | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
⚡️Codeflash found 16% (0.16x) speedup for get_diff_lines_count in codeflash/code_utils/formatter.py
⏱️ Runtime : 2.65 milliseconds → 2.29 milliseconds (best of 263 runs)
📝 Explanation and details
Here's a faster version of your program. The main bottleneck is the list comprehension calling a nested function for every line, leading to relatively high overhead. Avoiding the inner function, only iterating once, and not building an intermediate list (since we only need the count) can significantly improve performance.
You do not need to build the list of lines; simply scan and count qualifying lines in a single pass.
Optimized version.
Changes made:
- Inlined the
is_diff_linelogic for better performance (function call overhead avoided). - Used a running integer (
count) instead of a list, so memory use and processing are reduced. - Avoided creating an unnecessary list and removed the nested function.
This is as fast and memory-efficient as this logic gets in idiomatic Python.
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | ✅ 61 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 1 Passed |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests Details
from __future__ import annotations
# imports
import pytest # used for our unit tests
from codeflash.code_utils.formatter import get_diff_lines_count
# unit tests
# ------------------------
# Basic Test Cases
# ------------------------
def test_empty_string_returns_zero():
# No lines at all
codeflash_output = get_diff_lines_count("")
def test_no_diff_lines():
# No lines starting with + or -
diff = " context line 1\n context line 2"
codeflash_output = get_diff_lines_count(diff)
def test_single_addition_line():
# One line added
diff = "+added line"
codeflash_output = get_diff_lines_count(diff)
def test_single_deletion_line():
# One line deleted
diff = "-deleted line"
codeflash_output = get_diff_lines_count(diff)
def test_mixed_add_and_delete():
# Mixed + and - lines
diff = "+added\n-context\n+another add\n-context2\n-deleted"
codeflash_output = get_diff_lines_count(diff) # 2 additions, 2 deletions
def test_ignores_diff_headers():
# Ignore lines like --- and +++
diff = "--- a/file.txt\n+++ b/file.txt\n+added\n-context\n-deleted"
codeflash_output = get_diff_lines_count(diff) # +added, -deleted
def test_only_diff_headers():
# Only diff headers, no actual changes
diff = "--- a/file.txt\n+++ b/file.txt"
codeflash_output = get_diff_lines_count(diff)
def test_context_lines_with_leading_spaces():
# Lines with spaces before + or - are not counted
diff = " +not counted\n -not counted\n+counted\n-counted"
codeflash_output = get_diff_lines_count(diff)
def test_multiple_changes_and_headers():
# Headers and multiple diff lines
diff = (
"--- a/foo.txt\n"
"+++ b/foo.txt\n"
"@@ -1,3 +1,4 @@\n"
" line 1\n"
"+added 1\n"
" line 2\n"
"-deleted 1\n"
" line 3\n"
"+added 2\n"
"-deleted 2"
)
codeflash_output = get_diff_lines_count(diff) # +added 1, -deleted 1, +added 2, -deleted 2
# ------------------------
# Edge Test Cases
# ------------------------
def test_lines_with_plus_and_minus_in_middle():
# Lines with + or - not at start should not be counted
diff = "context + not counted\ncontext - not counted"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_only_plus_and_minus():
# Lines that are just "+" or "-" are counted
diff = "+\n-\n---\n+++"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_trailing_spaces():
# Lines with trailing spaces after + or - are counted
diff = "+added \n-deleted \n context"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_multiple_plus_or_minus():
# Lines starting with ++ or -- but not +++ or ---
diff = "++double plus\n--double minus\n+++triple plus\n---triple minus"
codeflash_output = get_diff_lines_count(diff) # only ++double plus, --double minus
def test_lines_with_tabs():
# Lines with tabs after + or - are counted
diff = "+\tadded with tab\n-\tdeleted with tab"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_unicode_characters():
# Lines with unicode characters
diff = "+añadido\n-удалено\n context"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_empty_lines_between():
# Empty lines should not affect the count
diff = "+add1\n\n-add2\n\n\n+add3"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_only_newlines():
# Only newlines, no diff lines
diff = "\n\n\n"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_mixed_line_endings():
# Test with mixed \n and \r\n endings
diff = "+add1\r\n-context\r\n-add2\n+add3"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_leading_and_trailing_whitespace():
# Only lines that start with + or - (no spaces before) are counted
diff = " +not counted\n- counted\n+counted\n-counted"
codeflash_output = get_diff_lines_count(diff)
# ------------------------
# Large Scale Test Cases
# ------------------------
def test_large_number_of_additions():
# 1000 addition lines
diff = "\n".join(["+line{}".format(i) for i in range(1000)])
codeflash_output = get_diff_lines_count(diff)
def test_large_number_of_deletions():
# 1000 deletion lines
diff = "\n".join(["-line{}".format(i) for i in range(1000)])
codeflash_output = get_diff_lines_count(diff)
def test_large_mixed_diff():
# 500 additions, 500 deletions, 500 context lines, 2 headers
additions = ["+add{}".format(i) for i in range(500)]
deletions = ["-del{}".format(i) for i in range(500)]
context = [" context{}".format(i) for i in range(500)]
headers = ["--- a/file.txt", "+++ b/file.txt"]
diff = "\n".join(headers + additions + deletions + context)
codeflash_output = get_diff_lines_count(diff)
def test_large_diff_with_headers_and_noise():
# 400 additions, 400 deletions, 100 context, 100 lines starting with +++ or ---
additions = ["+add{}".format(i) for i in range(400)]
deletions = ["-del{}".format(i) for i in range(400)]
context = [" context{}".format(i) for i in range(100)]
noise = ["+++ noise{}".format(i) for i in range(50)] + ["--- noise{}".format(i) for i in range(50)]
diff = "\n".join(noise + additions + deletions + context)
codeflash_output = get_diff_lines_count(diff)
def test_large_diff_with_varied_line_types():
# 300 additions, 300 deletions, 200 context, 100 lines with leading spaces
additions = ["+add{}".format(i) for i in range(300)]
deletions = ["-del{}".format(i) for i in range(300)]
context = [" context{}".format(i) for i in range(200)]
leading_spaces = [" +notcounted{}".format(i) for i in range(50)] + [" -notcounted{}".format(i) for i in range(50)]
diff = "\n".join(additions + deletions + context + leading_spaces)
codeflash_output = get_diff_lines_count(diff)
# ------------------------
# Additional Edge Cases for Robustness
# ------------------------
def test_diff_with_only_headers_and_context():
# Only headers and context lines
diff = "--- a/file.txt\n+++ b/file.txt\n context1\n context2"
codeflash_output = get_diff_lines_count(diff)
def test_diff_with_plus_and_minus_in_middle_of_line():
# + or - in the middle should not count
diff = "context + not counted\ncontext - not counted\n+counted\n-counted"
codeflash_output = get_diff_lines_count(diff)
def test_diff_with_empty_lines_and_whitespace():
# Empty lines and whitespace-only lines
diff = "\n \n\t\n+add\n-del\n"
codeflash_output = get_diff_lines_count(diff)
def test_diff_with_non_ascii_and_binary_like_lines():
# Non-ASCII and binary-like content
diff = "+\x00\x01\n-\x02\x03\n context"
codeflash_output = get_diff_lines_count(diff)
def test_diff_with_at_at_lines():
# Lines starting with @@ should not be counted
diff = "@@ -1,3 +1,4 @@\n+add\n-del"
codeflash_output = get_diff_lines_count(diff)
def test_diff_with_tricky_headers():
# Lines like ---- or ++++ (more than 3) are not headers and should be counted
diff = "---- not header\n++++ not header\n+++ header\n--- header"
codeflash_output = get_diff_lines_count(diff) # ---- not header, ++++ not header
def test_diff_with_mixed_case_headers():
# Headers are always lowercase (per diff), so +++A is not a header
diff = "+++A\n---B\n+add\n-del"
codeflash_output = get_diff_lines_count(diff)
def test_diff_with_windows_line_endings():
# Windows CRLF endings
diff = "+add1\r\n-context\r\n-add2\r\n+add3\r\n"
codeflash_output = get_diff_lines_count(diff)
def test_diff_with_leading_newlines():
# Leading newlines before diff lines
diff = "\n\n+add\n-del"
codeflash_output = get_diff_lines_count(diff)
def test_diff_with_trailing_newlines():
# Trailing newlines after diff lines
diff = "+add\n-del\n\n\n"
codeflash_output = get_diff_lines_count(diff)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from __future__ import annotations
# imports
import pytest # used for our unit tests
from codeflash.code_utils.formatter import get_diff_lines_count
# unit tests
# -------------------- BASIC TEST CASES --------------------
def test_empty_string_returns_zero():
# No diff lines in empty input
codeflash_output = get_diff_lines_count("")
def test_no_diff_lines():
# Input contains lines but none are diff lines
diff = "context line\nanother context line"
codeflash_output = get_diff_lines_count(diff)
def test_single_added_line():
# Single added line
diff = "+added line"
codeflash_output = get_diff_lines_count(diff)
def test_single_removed_line():
# Single removed line
diff = "-removed line"
codeflash_output = get_diff_lines_count(diff)
def test_mixed_diff_lines():
# Mixed added and removed lines
diff = "+added\n-removed\n context"
codeflash_output = get_diff_lines_count(diff)
def test_ignores_diff_headers():
# Should not count diff headers like '+++', '---'
diff = "--- a/file.txt\n+++ b/file.txt\n+added\n-removed\n context"
codeflash_output = get_diff_lines_count(diff)
def test_multiple_diff_lines_with_context():
# Multiple diff lines interleaved with context
diff = (
" context1\n"
"+added1\n"
" context2\n"
"-removed1\n"
"+added2\n"
" context3\n"
"-removed2"
)
codeflash_output = get_diff_lines_count(diff)
def test_diff_lines_with_leading_spaces():
# Only lines starting *exactly* with '+' or '-' count
diff = " +not counted\n-added\n -not counted\n+added"
codeflash_output = get_diff_lines_count(diff)
# -------------------- EDGE TEST CASES --------------------
def test_only_diff_headers():
# Only diff headers, should return 0
diff = "--- a/file\n+++ b/file"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_multiple_plus_minus():
# Lines starting with multiple '+' or '-' but not '+++' or '---'
diff = "++not header\n--not header\n+++ header\n--- header"
# '++not header' and '--not header' should count, '+++ header' and '--- header' should not
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_only_plus_minus():
# Lines that are just '+' or '-'
diff = "+\n-\n context"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_tabs_and_spaces():
# Tabs or spaces after + or - are fine, but not before
diff = "+\tadded with tab\n- removed with space"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_unicode_characters():
# Unicode should not affect counting
diff = "+äöü\n-你好\n context"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_trailing_newlines():
# Trailing newlines should not affect count
diff = "+added\n-removed\n"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_only_newlines():
# Only newlines, no diff lines
diff = "\n\n\n"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_mixed_line_endings():
# Mixed \n and \r\n endings
diff = "+added\r\n-removed\n context\r\n"
codeflash_output = get_diff_lines_count(diff.replace('\r\n', '\n'))
def test_lines_with_just_plus_or_minus_and_spaces():
# Lines like '+ ' or '- ' should count
diff = "+ \n- \n context"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_plus_minus_in_middle():
# Only lines starting with + or - count
diff = "context +added\ncontext -removed"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_plus_minus_but_not_at_start():
# Should not count if + or - is not at the start
diff = " context+added\n context-removed"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_plus_minus_but_too_short_for_header():
# Lines like '++', '--' should count, but '+++' and '---' should not
diff = "++\n--\n+++\n---"
codeflash_output = get_diff_lines_count(diff)
def test_lines_with_plus_minus_and_numbers():
# Should count as diff lines
diff = "+1\n-2\n context"
codeflash_output = get_diff_lines_count(diff)
# -------------------- LARGE SCALE TEST CASES --------------------
def test_large_number_of_added_lines():
# 1000 added lines
diff = "\n".join(["+line{}".format(i) for i in range(1000)])
codeflash_output = get_diff_lines_count(diff)
def test_large_number_of_removed_lines():
# 1000 removed lines
diff = "\n".join(["-line{}".format(i) for i in range(1000)])
codeflash_output = get_diff_lines_count(diff)
def test_large_mixed_diff_and_context():
# 500 added, 500 removed, 500 context lines
added = ["+a{}".format(i) for i in range(500)]
removed = ["-r{}".format(i) for i in range(500)]
context = [" context{}".format(i) for i in range(500)]
# Interleave them
lines = []
for i in range(500):
lines.append(added[i])
lines.append(removed[i])
lines.append(context[i])
diff = "\n".join(lines)
codeflash_output = get_diff_lines_count(diff)
def test_large_with_headers_and_diff_lines():
# 100 diff headers, 800 diff lines, 100 context lines
headers = ["--- a/file{}".format(i) for i in range(50)] + ["+++ b/file{}".format(i) for i in range(50)]
diff_lines = ["+add{}".format(i) for i in range(400)] + ["-rem{}".format(i) for i in range(400)]
context = [" context{}".format(i) for i in range(100)]
lines = headers + diff_lines + context
diff = "\n".join(lines)
codeflash_output = get_diff_lines_count(diff)
def test_large_all_headers():
# 1000 header lines, should return 0
headers = ["+++ file{}".format(i) for i in range(500)] + ["--- file{}".format(i) for i in range(500)]
diff = "\n".join(headers)
codeflash_output = get_diff_lines_count(diff)
def test_large_randomized_diff_lines():
# Mix of diff, context, and header lines
lines = []
for i in range(333):
lines.append("+add{}".format(i))
lines.append(" context{}".format(i))
lines.append("--- file{}".format(i))
lines.append("-rem{}".format(i))
lines.append("+++ file{}".format(i))
diff = "\n".join(lines)
# Only +add and -rem lines count, so 333*2 = 666
codeflash_output = get_diff_lines_count(diff)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
from codeflash.code_utils.formatter import get_diff_lines_count
def test_get_diff_lines_count():
get_diff_lines_count('')To test or edit this optimization locally git merge codeflash/optimize-pr284-2025-06-04T20.10.23
| lines = diff_output.split("\n") | |
| def is_diff_line(line: str) -> bool: | |
| return line.startswith(("+", "-")) and not line.startswith(("+++", "---")) | |
| diff_lines = [line for line in lines if is_diff_line(line)] | |
| return len(diff_lines) | |
| # Count only the diff lines as per is_diff_line logic, but avoid unnecessary function call and list allocation. | |
| count = 0 | |
| for line in diff_output.split("\n"): | |
| if line and line[0] in "+-" and not (line.startswith("+++") or line.startswith("---")): | |
| count += 1 | |
| return count | |
User description
PR Type
Enhancement, Tests
Description
Add Black dev dependency and safe formatting checks
Add sample files for formatting scenarios (none/few/many)
Add tests for skip/apply formatting scenarios
Improve function blocklist skipping logic
Changes walkthrough 📝
4 files
Add sample file with few formatting errorsAdd sample file with many formatting errorsAdd sample file with no formatting errorsAdd tests for formatting diff scenarios1 files
Add diff-based safe formatting checks1 files
Improve blocklist skipping logic1 files
Add Black as development dependency