Codeflash demo 15 #827

misrasaurabh1 · 2025-10-16T18:36:33Z

PR Type

Enhancement, Tests

Description

Introduce common tags utility
Add unit tests for utility
Use configurable base URL for staging

Diagram Walkthrough

flowchart LR
  common["Add common tags function"] -- used by --> tests["Unit tests for common tags"]
  optimizer["Function optimizer"] -- use CFAPI base URL --> staging["Staging link generation"]

File Walkthrough

Relevant files

Enhancement

function_optimizer.py: Use `CFAPI_BASE_URL` for staging link (+2/-2)
codeflash/optimization/function_optimizer.py
Import `CFAPI_BASE_URL` from `cfapi`.
Build staging URL using `CFAPI_BASE_URL` instead of hardcoded domain.

common_tags.py: Add common tags computation utility (+11/-0)
codeflash/result/common_tags.py
Add `find_common_tags` function.
Computes intersection of tags across article dicts.
Handles empty input by returning empty set.

Tests

test_common_tags.py: Add tests for common tags utility (+22/-0)
tests/test_common_tags.py
Add tests for `find_common_tags`.
Verify common tags for multiple article lists.

Signed-off-by: Saurabh Misra <[email protected]>

github-actions · 2025-10-16T18:37:30Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Algorithm Choice

The current approach does repeated list filtering, resulting in O(nmk) behavior, and loses tag multiplicity. Consider using set intersection across articles' tags for clarity and performance, e.g., initialize with the first set and intersect iteratively.

    def find_common_tags(articles: list[dict[str, list[str]]]) -> set[str]:
        if not articles:
            return set()
        common_tags = articles[0].get("tags", [])
        for article in articles[1:]:
            common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
        return set(common_tags)

URL Construction

The f-string mixes slicing and conditional concatenation; verify that the constructed path is correct for both experiment and non-experiment cases and that CFAPI_BASE_URL already excludes a trailing slash to avoid double slashes.

    staging_url = f"{CFAPI_BASE_URL}/review-optimizations/{self.function_trace_id[:-4] + exp_type if self.experiment_id else self.function_trace_id}"
    console.print(
        Panel(
            f"[bold green]✅ Staging created:[/bold green]\n[link={staging_url}]{staging_url}[/link]",
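The precedence question raised under URL Construction can be checked in isolation. The sketch below is illustrative only: `build_review_path` and its parameters are hypothetical stand-ins for the attributes used in `function_optimizer.py`, and the sample IDs are made up. It shows that Python parses `a[:-4] + b if cond else a` as `(a[:-4] + b) if cond else a`, so the slice-and-append applies only when an experiment ID is set.

```python
from __future__ import annotations


def build_review_path(function_trace_id: str, exp_type: str, experiment_id: str | None) -> str:
    """Hypothetical helper mirroring the expression inside the f-string."""
    # The conditional expression binds more loosely than "+", so this reads as
    # (function_trace_id[:-4] + exp_type) if experiment_id else function_trace_id
    return function_trace_id[:-4] + exp_type if experiment_id else function_trace_id


# Made-up values for illustration only.
trace_id = "abcd-1234-XXXX"
with_experiment = build_review_path(trace_id, "EXP0", "exp-1")
without_experiment = build_review_path(trace_id, "EXP0", None)
print(with_experiment)
print(without_experiment)
```

Adding explicit parentheses in the real expression would not change behavior, but it might make the intent easier to verify during review.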

github-actions · 2025-10-16T18:37:44Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Suggestion: Use set intersection for tags
Impact: Medium

Avoid quadratic list filtering by using set intersections, which is both clearer and faster. This also normalizes to sets early and handles duplicates naturally.

codeflash/result/common_tags.py [8-11]

-common_tags = articles[0].get("tags", [])
+common_tags = set(articles[0].get("tags", []))
 for article in articles[1:]:
-    common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
-return set(common_tags)
+    common_tags &= set(article.get("tags", []))
+return common_tags

Suggestion importance[1-10]: 7
Why: Replacing list filtering with set intersections is correct here and improves performance and clarity, while preserving behavior and matching the tests; it's a solid maintainability/performance improvement but not critical.

Category: Possible issue
Suggestion: Normalize base URL before join
Impact: Low

Ensure `CFAPI_BASE_URL` has no trailing slash to prevent double slashes in `staging_url`. Normalize the base URL at use to build a correct path reliably.

codeflash/optimization/function_optimizer.py [1478]

-staging_url = f"{CFAPI_BASE_URL}/review-optimizations/{self.function_trace_id[:-4] + exp_type if self.experiment_id else self.function_trace_id}"
+base_url = CFAPI_BASE_URL.rstrip("/")
+staging_url = f"{base_url}/review-optimizations/{self.function_trace_id[:-4] + exp_type if self.experiment_id else self.function_trace_id}"

Suggestion importance[1-10]: 6
Why: The suggestion is accurate and guards against double slashes if `CFAPI_BASE_URL` includes a trailing slash; it's a minor robustness enhancement with low risk and moderate impact.
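A minimal sketch of the trailing-slash normalization suggested above. The helper name `join_review_url`, the `example.invalid` host, and the trace ID are assumptions made for illustration; the real code builds the URL inline from `CFAPI_BASE_URL`.

```python
def join_review_url(base_url: str, trace_id: str) -> str:
    """Hypothetical helper: join base URL and review path with exactly one slash."""
    # rstrip("/") removes any trailing slashes, so the literal "/" below is the only separator.
    return f"{base_url.rstrip('/')}/review-optimizations/{trace_id}"


# Placeholder host and trace ID, not the real service values.
url_plain = join_review_url("https://example.invalid", "trace-0001")
url_slash = join_review_url("https://example.invalid/", "trace-0001")
print(url_plain)
print(url_slash)
```

Both calls should produce the same URL, which is the property the suggestion is meant to guarantee.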

codeflash-ai · 2025-10-16T18:41:51Z

codeflash/result/common_tags.py

+    common_tags = articles[0].get("tags", [])
+    for article in articles[1:]:
+        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
+    return set(common_tags)


⚡️Codeflash found 7,954% (79.54x) speedup for find_common_tags in codeflash/result/common_tags.py

⏱️ Runtime : 577 milliseconds → 7.16 milliseconds (best of 93 runs)

📝 Explanation and details

The optimized version achieves a 79x speedup by making three key changes:

1. Using sets instead of lists: The original code used `[tag for tag in common_tags if tag in article.get("tags", [])]`, which costs O(n×m) per article because every `in` check scans the article's tag list linearly. The optimized version uses `set.intersection_update()`, which relies on hash lookups and on average runs in time roughly linear in the length of the article's tag list (the O(min(n,m)) bound applies when both operands are already sets).

2. Early termination: Added `if not common_tags: break` to exit the loop as soon as no common tags remain. This prevents unnecessary processing of the remaining articles when the result is already known to be empty.

3. Eliminating the final set conversion: The original code maintained a list and converted it to a set at the end with `return set(common_tags)`. The optimized version works with sets throughout, avoiding the conversion overhead.

The performance gains are most dramatic for large datasets: the line profiler shows that the bottleneck line (the list comprehension), which accounted for 99.6% of execution time, is eliminated entirely. Test cases with large tag sets improve by about 5,400% (test_large_number_of_tags) and about 11,000% (the large-scale tests), while smaller datasets still mostly gain 15-50%. Early termination is particularly effective when articles share no tags, as in the "no common tags" test cases, which show speedups of about 25%.
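The comparison described above can be reproduced with a small script. This is a sketch, not the benchmark harness Codeflash uses: the function names `find_common_tags_original` and `find_common_tags_optimized` are made up here, the optimized body assumes the empty-input guard is kept unchanged, and timings will vary by machine.

```python
from __future__ import annotations

import timeit


def find_common_tags_original(articles: list[dict[str, list[str]]]) -> set[str]:
    if not articles:
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        # Each "in" check scans a list, so this line is O(n*m) per article.
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def find_common_tags_optimized(articles: list[dict[str, list[str]]]) -> set[str]:
    if not articles:
        return set()
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        # Hash-based intersection, updated in place.
        common_tags.intersection_update(article.get("tags", []))
        if not common_tags:
            break  # nothing left to intersect
    return common_tags


# Same shape as test_large_number_of_tags: two articles with 1000 tags each.
articles = [
    {"tags": [f"tag{i}" for i in range(1000)]},
    {"tags": [f"tag{i}" for i in range(500, 1500)]},
]
result_original = find_common_tags_original(articles)
result_optimized = find_common_tags_optimized(articles)

t_original = timeit.timeit(lambda: find_common_tags_original(articles), number=5)
t_optimized = timeit.timeit(lambda: find_common_tags_optimized(articles), number=5)
print(f"original: {t_original:.4f}s, optimized: {t_optimized:.4f}s")
```

The two results should be equal sets; only the timings are expected to differ.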

✅ Correctness verification report:

Test Status

⚙️ Existing Unit Tests ✅ 2 Passed

🌀 Generated Regression Tests ✅ 29 Passed

⏪ Replay Tests 🔘 None Found

🔎 Concolic Coverage Tests ✅ 2 Passed

📊 Tests Coverage 100.0%

⚙️ Existing Unit Tests and Runtime

Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup

test_common_tags.py::test_common_tags_1 5.68μs 4.23μs 34.3%✅

🌀 Generated Regression Tests and Runtime

# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests

def test_single_article():
    # Single article should return its tags
    articles = [{"tags": ["python", "coding", "tutorial"]}]
    codeflash_output = find_common_tags(articles)  # 1.63μs -> 1.43μs (14.0% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_with_common_tags():
    # Multiple articles with common tags should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python", "data"]},
        {"tags": ["python", "machine learning"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.83μs -> 2.46μs (15.5% faster)
    # Outputs were verified to be equal to the original implementation

def test_empty_list_of_articles():
    # Empty list of articles should return an empty set
    articles = []
    codeflash_output = find_common_tags(articles)  # 601ns -> 491ns (22.4% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_no_common_tags():
    # Articles with no common tags should return an empty set
    articles = [
        {"tags": ["python"]},
        {"tags": ["java"]},
        {"tags": ["c++"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.37μs -> 1.89μs (25.4% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_empty_tag_lists():
    # Articles with some empty tag lists should return an empty set
    articles = [
        {"tags": []},
        {"tags": ["python"]},
        {"tags": ["python", "java"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.97μs -> 1.67μs (17.9% faster)
    # Outputs were verified to be equal to the original implementation

def test_all_articles_with_empty_tag_lists():
    # All articles with empty tag lists should return an empty set
    articles = [
        {"tags": []},
        {"tags": []},
        {"tags": []}
    ]
    codeflash_output = find_common_tags(articles)  # 1.92μs -> 1.59μs (20.7% faster)
    # Outputs were verified to be equal to the original implementation

def test_tags_with_special_characters():
    # Tags with special characters should be handled correctly
    articles = [
        {"tags": ["python!", "coding"]},
        {"tags": ["python!", "data"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.23μs -> 2.06μs (8.28% faster)
    # Outputs were verified to be equal to the original implementation

def test_case_sensitivity():
    # Tags with different cases should not be considered the same
    articles = [
        {"tags": ["Python", "coding"]},
        {"tags": ["python", "data"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.07μs -> 1.87μs (10.7% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_number_of_articles():
    # Large number of articles with a common tag should return that tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(1000)]
    codeflash_output = find_common_tags(articles)  # 229μs -> 154μs (48.2% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_number_of_tags():
    # Large number of tags with some common tags should return the common tags
    articles = [
        {"tags": [f"tag{i}" for i in range(1000)]},
        {"tags": [f"tag{i}" for i in range(500, 1500)]}
    ]
    expected = {f"tag{i}" for i in range(500, 1000)}
    codeflash_output = find_common_tags(articles)  # 4.38ms -> 78.6μs (5474% faster)
    # Outputs were verified to be equal to the original implementation

def test_mixed_length_of_tag_lists():
    # Articles with mixed length of tag lists should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python"]},
        {"tags": ["python", "coding", "tutorial"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.65μs -> 2.22μs (19.4% faster)
    # Outputs were verified to be equal to the original implementation

def test_tags_with_different_data_types():
    # Tags with different data types should only consider strings
    articles = [
        {"tags": ["python", 123]},
        {"tags": ["python", "123"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.23μs -> 1.97μs (13.2% faster)
    # Outputs were verified to be equal to the original implementation

def test_performance_with_large_data():
    # Performance with large data should return the common tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(10000)]
    codeflash_output = find_common_tags(articles)  # 2.27ms -> 1.52ms (48.8% faster)
    # Outputs were verified to be equal to the original implementation

def test_scalability_with_increasing_tags():
    # Scalability with increasing tags should return the common tag
    articles = [{"tags": ["common_tag"] + [f"tag{i}" for i in range(j)]} for j in range(1, 1001)]
    codeflash_output = find_common_tags(articles)  # 392μs -> 259μs (51.2% faster)
    # Outputs were verified to be equal to the original implementation

#------------------------------------------------
# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests

def test_empty_input_list():
    # Test with an empty list
    codeflash_output = find_common_tags([])  # 561ns -> 551ns (1.81% faster)
    # Outputs were verified to be equal to the original implementation

def test_single_article():
    # Test with a single article with tags
    codeflash_output = find_common_tags([{"tags": ["python", "coding", "development"]}])  # 1.44μs -> 1.28μs (12.4% faster)
    # Test with a single article with no tags
    codeflash_output = find_common_tags([{"tags": []}])  # 591ns -> 510ns (15.9% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_some_common_tags():
    # Test with multiple articles having some common tags
    articles = [
        {"tags": ["python", "coding", "development"]},
        {"tags": ["python", "development", "tutorial"]},
        {"tags": ["python", "development", "guide"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.88μs -> 2.44μs (18.1% faster)
    articles = [
        {"tags": ["tech", "news"]},
        {"tags": ["tech", "gadgets"]},
        {"tags": ["tech", "reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.57μs -> 1.15μs (36.5% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_no_common_tags():
    # Test with multiple articles having no common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["development", "tutorial"]},
        {"tags": ["guide", "learning"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.29μs -> 2.00μs (14.5% faster)
    articles = [
        {"tags": ["apple", "banana"]},
        {"tags": ["orange", "grape"]},
        {"tags": ["melon", "kiwi"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.23μs -> 972ns (26.7% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_duplicate_tags():
    # Test with articles having duplicate tags
    articles = [
        {"tags": ["python", "python", "coding"]},
        {"tags": ["python", "development", "python"]},
        {"tags": ["python", "guide", "python"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.83μs -> 2.41μs (17.0% faster)
    articles = [
        {"tags": ["tech", "tech", "news"]},
        {"tags": ["tech", "tech", "gadgets"]},
        {"tags": ["tech", "tech", "reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.59μs -> 1.17μs (35.8% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_mixed_case_tags():
    # Test with articles having mixed case tags
    articles = [
        {"tags": ["Python", "Coding"]},
        {"tags": ["python", "Development"]},
        {"tags": ["PYTHON", "Guide"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.23μs -> 1.90μs (17.4% faster)
    articles = [
        {"tags": ["Tech", "News"]},
        {"tags": ["tech", "Gadgets"]},
        {"tags": ["TECH", "Reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.06μs -> 901ns (17.9% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_non_string_tags():
    # Test with articles having non-string tags
    articles = [
        {"tags": ["python", 123, "coding"]},
        {"tags": ["python", "development", 123]},
        {"tags": ["python", "guide", 123]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.85μs -> 2.52μs (13.1% faster)
    articles = [
        {"tags": [None, "news"]},
        {"tags": ["tech", None]},
        {"tags": [None, "reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.62μs -> 1.20μs (35.0% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_scale_test_cases():
    # Test with large scale input where all tags should be common
    articles = [
        {"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(100)
    ]
    expected_output = {"tag" + str(i) for i in range(1000)}
    codeflash_output = find_common_tags(articles)  # 380ms -> 3.44ms (10974% faster)
    # Test with large scale input where no tags should be common
    articles = [
        {"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(50)
    ] + [{"tags": ["unique_tag"]}]
    codeflash_output = find_common_tags(articles)  # 188ms -> 1.66ms (11249% faster)
    # Outputs were verified to be equal to the original implementation

#------------------------------------------------
from codeflash.result.common_tags import find_common_tags

def test_find_common_tags():
    find_common_tags([{}, {}])

def test_find_common_tags_2():
    find_common_tags([])

🔎 Concolic Coverage Tests and Runtime

Test File::Test Function Original ⏱️ Optimized ⏱️ Speedup

codeflash_concolic_g9hfh7kd/tmp2gmm179f/test_concolic_coverage.py::test_find_common_tags 1.96μs 1.80μs 8.93%✅

codeflash_concolic_g9hfh7kd/tmp2gmm179f/test_concolic_coverage.py::test_find_common_tags_2 671ns 501ns 33.9%✅

To test or edit this optimization locally:

    git merge codeflash/optimize-pr827-2025-10-16T18.41.45

Suggested change

-    common_tags = articles[0].get("tags", [])
-    for article in articles[1:]:
-        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
-    return set(common_tags)
+    common_tags = set(articles[0].get("tags", []))
+    for article in articles[1:]:
+        common_tags.intersection_update(article.get("tags", []))
+        if not common_tags:
+            break
+    return common_tags
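One detail worth noting when comparing this suggested change with the earlier `&=` suggestion: the method form `intersection_update()` accepts any iterable, so the tag list does not need to be wrapped in `set()`, whereas the operator form requires a set on both sides. The snippet below is a small sketch with made-up tag lists to illustrate the difference; it also shows that an article without a `"tags"` key falls back to an empty list.

```python
tags_a = ["python", "coding", "python"]  # made-up sample data
tags_b = ["python", "data"]

# Method form: the argument may be a plain list.
common = set(tags_a)
common.intersection_update(tags_b)

# Operator form: a list on the right-hand side raises TypeError.
try:
    set(tags_a) & tags_b
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

# An article dict without "tags" contributes an empty collection.
missing = set({}.get("tags", []))

print(common, operator_accepts_list, missing)
```

Either form gives the same result once both operands are sets; the method form simply avoids building a temporary set for each article.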

misrasaurabh1 added 2 commits October 15, 2025 17:32

add a incomplete Dockerfile

6e1da35

Signed-off-by: Saurabh Misra <[email protected]>

find common tags

9bafa0e

Signed-off-by: Saurabh Misra <[email protected]>

github-actions bot added the Review effort 2/5 label Oct 16, 2025

codeflash-ai bot reviewed Oct 16, 2025

misrasaurabh1 closed this Oct 17, 2025
