find common tags #820

misrasaurabh1 · 2025-10-15T18:40:52Z

PR Type

Enhancement, Tests

Description

Add common tags computation utility
Introduce unit tests for intersection

Diagram Walkthrough

flowchart LR
  A["common_tags.py (utility)"] -- "used by" --> B["test_common_tags.py (tests)"]

File Walkthrough

Relevant files

Enhancement

common_tags.py: Implement common tags intersection utility
codeflash/result/common_tags.py (+11/-0)
Add `find_common_tags` to compute shared tags
Handles empty input returning empty set
Iteratively intersects tags across articles

Tests

test_common_tags.py: Add unit tests for common tags utility
tests/test_common_tags.py (+22/-0)
Add tests for `find_common_tags`
Validate results for 3 and 4-article inputs

Signed-off-by: Saurabh Misra <[email protected]>
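
For orientation, the utility described in the walkthrough can be sketched as below. The loop body is taken from the diff quoted later in this thread; the function signature, type hints, and the exact form of the empty-input guard are assumptions rather than the committed code.

```python
from __future__ import annotations


def find_common_tags(articles: list[dict]) -> set:
    """Return the tags shared by every article (assumed signature)."""
    # Empty input: nothing can be common.
    if not articles:
        return set()

    # Start from the first article's tags and keep only those that
    # also appear in each subsequent article.
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


articles = [
    {"title": "Article 1", "tags": ["Python", "AI", "ML"]},
    {"title": "Article 2", "tags": ["Python", "Data Science", "AI"]},
]
print(find_common_tags(articles))
```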

github-actions · 2025-10-15T18:42:00Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Performance
The intersection is computed using list filtering repeatedly, resulting in O(n*m) membership checks per step; converting to sets and using set intersection would be more efficient and simpler, especially for large tag lists.

    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)

Duplicates Handling
The algorithm preserves duplicates during interim list filtering and only removes them at the end by converting to a set; if duplicates in the first article are present, they cause redundant work and could affect expected behavior if order or counts matter. Consider normalizing to sets at the start.

    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)

Missing Edge Cases
Tests do not cover empty input, missing 'tags' key, empty tags lists, case sensitivity, or duplicate tags; adding these would strengthen correctness guarantees.

    def test_common_tags_1() -> None:
        articles_1 = [
            {"title": "Article 1", "tags": ["Python", "AI", "ML"]},
            {"title": "Article 2", "tags": ["Python", "Data Science", "AI"]},
            {"title": "Article 3", "tags": ["Python", "AI", "Big Data"]},
        ]
        expected = {"Python", "AI"}
        assert find_common_tags(articles_1) == expected

        articles_2 = [
            {"title": "Article 1", "tags": ["Python", "AI", "ML"]},
            {"title": "Article 2", "tags": ["Python", "Data Science", "AI"]},
            {"title": "Article 3", "tags": ["Python", "AI", "Big Data"]},
            {"title": "Article 4", "tags": ["Python", "AI", "ML"]},
        ]
        assert find_common_tags(articles_2) == expected
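
The missing edge cases listed above could be covered along these lines. The test names are hypothetical, and a local stand-in for `find_common_tags` is defined so the sketch runs on its own; in the repository the function would presumably be imported from `codeflash/result/common_tags.py` instead.

```python
from __future__ import annotations


def find_common_tags(articles: list[dict]) -> set:
    # Stand-in mirroring the implementation under review (an assumption,
    # reconstructed from the snippets quoted in this thread).
    if not articles:
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def test_empty_input() -> None:
    assert find_common_tags([]) == set()


def test_missing_tags_key() -> None:
    # An article without a "tags" key contributes no tags, so nothing is common.
    assert find_common_tags([{"tags": ["Python"]}, {"title": "untagged"}]) == set()


def test_empty_tag_list() -> None:
    assert find_common_tags([{"tags": []}, {"tags": ["Python"]}]) == set()


def test_case_sensitivity() -> None:
    # "Python" and "python" are distinct under exact string comparison.
    assert find_common_tags([{"tags": ["Python"]}, {"tags": ["python"]}]) == set()


def test_duplicate_tags() -> None:
    # Duplicates collapse because the result is a set.
    articles = [{"tags": ["AI", "AI", "ML"]}, {"tags": ["AI", "AI"]}]
    assert find_common_tags(articles) == {"AI"}
```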

github-actions · 2025-10-15T18:42:15Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: General
Suggestion: Use set intersection
Impact: Medium

Convert to set intersection to avoid O(n^2) list scans and to ensure deduplication throughout. This improves performance and reduces memory churn while keeping behavior identical.

codeflash/result/common_tags.py [8-11]

-common_tags = articles[0].get("tags", [])
+common_tags = set(articles[0].get("tags", []))
 for article in articles[1:]:
-    common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
-return set(common_tags)
+    common_tags &= set(article.get("tags", []))
+return common_tags

Suggestion importance [1-10]: 8
Why: Replacing list filtering with set intersections on lines 8-11 is correct, preserves behavior, deduplicates naturally, and improves time complexity significantly; the improved_code matches the existing snippet's intent.

Category: Possible issue
Suggestion: Guard against invalid inputs
Impact: Low

Handle missing or non-list `tags` defensively to prevent runtime errors when inputs deviate (e.g., `None` or strings). Normalize to empty sets for safety.

codeflash/result/common_tags.py [5-6]

+if not articles:
+    return set()

Suggestion importance [1-10]: 2
Why: The suggestion proposes additional validation but provides no actual change (improved_code equals existing_code) and doesn't point to concrete issues in the PR; impact is minor without implementation details.
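
The "Guard against invalid inputs" suggestion names the failure modes but shows no code. One possible shape for such a guard is sketched below; `_normalize_tags` and `find_common_tags_defensive` are hypothetical names, and treating `None`, a bare string, or any other non-collection value as "no tags" is an assumption about the desired behavior, not something the PR specifies.

```python
from __future__ import annotations


def _normalize_tags(article: dict) -> set:
    """Hypothetical helper: coerce an article's "tags" value to a set."""
    tags = article.get("tags")
    # Accept only real collections of tags. A bare string would otherwise be
    # iterated character by character, and None is not iterable at all.
    if not isinstance(tags, (list, tuple, set, frozenset)):
        return set()
    return set(tags)


def find_common_tags_defensive(articles: list[dict]) -> set:
    if not articles:
        return set()
    common_tags = _normalize_tags(articles[0])
    for article in articles[1:]:
        common_tags &= _normalize_tags(article)
    return common_tags


print(find_common_tags_defensive([{"tags": ["a", "b"]}, {"tags": None}]))
```

With the list-filtering code quoted in this thread, `"tags": None` would raise a `TypeError` at the `in` check, and a string value would be matched by substring rather than by tag.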

codeflash-ai · 2025-10-15T18:51:35Z

codeflash/result/common_tags.py

+    common_tags = articles[0].get("tags", [])
+    for article in articles[1:]:
+        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
+    return set(common_tags)


⚡️Codeflash found 7,937% (79.37x) speedup for find_common_tags in codeflash/result/common_tags.py

⏱️ Runtime : 577 milliseconds → 7.18 milliseconds (best of 74 runs)

📝 Explanation and details

The optimization replaces repeated list filtering (a list comprehension whose `in` check scans a list) with set operations, resulting in a 79x speedup (7,937% improvement).

Key optimizations:

Set-based intersection instead of list comprehension: The original code used [tag for tag in common_tags if tag in article.get("tags", [])] which creates O(n*m) operations per article. The optimized version uses set.intersection_update() which performs O(n+m) set intersection operations.

Early termination: Added if not common_tags: break to exit the loop immediately when no common tags remain, avoiding unnecessary processing of remaining articles.

Direct set initialization: Converts the first article's tags directly to a set, eliminating the final set() conversion and enabling efficient set operations from the start.

Performance impact by test case:

Small datasets (2-3 articles): 18-43% faster due to reduced overhead

Large tag lists: Up to 5316% faster (test_large_number_of_tags) where set operations dramatically outperform nested list operations

Large article counts: Up to 11131% faster (large_scale_test_cases), where the O(n+m) versus O(n*m) cost per article, repeated across many articles, produces the largest gains

The optimization is particularly effective for scenarios with many articles or large tag lists, where the O(n*m) complexity of membership testing in lists becomes prohibitive compared to O(1) average-case set membership testing.
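
The complexity argument above can be checked with a rough local timing sketch. The names `common_tags_list` and `common_tags_set` are hypothetical labels for the two versions, the empty-input guard is assumed from the snippets in this thread, and absolute timings will differ from the figures reported here.

```python
from __future__ import annotations

import timeit


def common_tags_list(articles: list[dict]) -> set:
    # List-filtering version: each `in` check scans a list.
    if not articles:
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def common_tags_set(articles: list[dict]) -> set:
    # Set-based version: hash lookups plus early exit when nothing is left.
    if not articles:
        return set()
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        common_tags.intersection_update(article.get("tags", []))
        if not common_tags:
            break
    return common_tags


# Two articles with 1000 tags each, half of them shared
# (the same shape as test_large_number_of_tags).
articles = [
    {"tags": [f"tag{i}" for i in range(1000)]},
    {"tags": [f"tag{i}" for i in range(500, 1500)]},
]

list_time = timeit.timeit(lambda: common_tags_list(articles), number=20)
set_time = timeit.timeit(lambda: common_tags_set(articles), number=20)
print(f"list filtering: {list_time:.4f}s, set intersection: {set_time:.4f}s")
```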

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	✅ 2 Passed
🌀 Generated Regression Tests	✅ 29 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	✅ 2 Passed
📊 Tests Coverage	100.0%

⚙️ Existing Unit Tests and Runtime

Test File::Test Function	Original ⏱️	Optimized ⏱️	Speedup
`test_common_tags.py::test_common_tags_1`	5.00μs	4.09μs	22.3%✅

🌀 Generated Regression Tests and Runtime

# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests

def test_single_article():
    # Single article should return its tags
    articles = [{"tags": ["python", "coding", "tutorial"]}]
    codeflash_output = find_common_tags(articles) # 1.86μs -> 1.40μs (32.8% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_with_common_tags():
    # Multiple articles with common tags should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python", "data"]},
        {"tags": ["python", "machine learning"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.99μs -> 2.42μs (23.1% faster)
    # Outputs were verified to be equal to the original implementation

def test_empty_list_of_articles():
    # Empty list of articles should return an empty set
    articles = []
    codeflash_output = find_common_tags(articles) # 661ns -> 460ns (43.7% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_no_common_tags():
    # Articles with no common tags should return an empty set
    articles = [
        {"tags": ["python"]},
        {"tags": ["java"]},
        {"tags": ["c++"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.42μs -> 1.90μs (27.3% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_empty_tag_lists():
    # Articles with some empty tag lists should return an empty set
    articles = [
        {"tags": []},
        {"tags": ["python"]},
        {"tags": ["python", "java"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.07μs -> 1.70μs (21.8% faster)
    # Outputs were verified to be equal to the original implementation

def test_all_articles_with_empty_tag_lists():
    # All articles with empty tag lists should return an empty set
    articles = [
        {"tags": []},
        {"tags": []},
        {"tags": []}
    ]
    codeflash_output = find_common_tags(articles) # 2.09μs -> 1.58μs (32.3% faster)
    # Outputs were verified to be equal to the original implementation

def test_tags_with_special_characters():
    # Tags with special characters should be handled correctly
    articles = [
        {"tags": ["python!", "coding"]},
        {"tags": ["python!", "data"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.33μs -> 1.96μs (18.9% faster)
    # Outputs were verified to be equal to the original implementation

def test_case_sensitivity():
    # Tags with different cases should not be considered the same
    articles = [
        {"tags": ["Python", "coding"]},
        {"tags": ["python", "data"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.17μs -> 1.84μs (18.0% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_number_of_articles():
    # Large number of articles with a common tag should return that tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(1000)]
    codeflash_output = find_common_tags(articles) # 228μs -> 150μs (51.5% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_number_of_tags():
    # Large number of tags with some common tags should return the common tags
    articles = [
        {"tags": [f"tag{i}" for i in range(1000)]},
        {"tags": [f"tag{i}" for i in range(500, 1500)]}
    ]
    expected = {f"tag{i}" for i in range(500, 1000)}
    codeflash_output = find_common_tags(articles) # 4.40ms -> 81.2μs (5316% faster)
    # Outputs were verified to be equal to the original implementation

def test_mixed_length_of_tag_lists():
    # Articles with mixed length of tag lists should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python"]},
        {"tags": ["python", "coding", "tutorial"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.67μs -> 2.21μs (20.8% faster)
    # Outputs were verified to be equal to the original implementation

def test_tags_with_different_data_types():
    # Tags with different data types should only consider strings
    articles = [
        {"tags": ["python", 123]},
        {"tags": ["python", "123"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.11μs -> 1.92μs (9.93% faster)
    # Outputs were verified to be equal to the original implementation

def test_performance_with_large_data():
    # Performance with large data should return the common tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(10000)]
    codeflash_output = find_common_tags(articles) # 2.26ms -> 1.51ms (50.2% faster)
    # Outputs were verified to be equal to the original implementation

def test_scalability_with_increasing_tags():
    # Scalability with increasing tags should return the common tag
    articles = [{"tags": ["common_tag"] + [f"tag{i}" for i in range(j)]} for j in range(1, 1001)]
    codeflash_output = find_common_tags(articles) # 412μs -> 282μs (46.2% faster)
    # Outputs were verified to be equal to the original implementation

#------------------------------------------------
# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests

def test_empty_input_list():
    # Test with an empty list
    codeflash_output = find_common_tags([]) # 561ns -> 481ns (16.6% faster)
    # Outputs were verified to be equal to the original implementation

def test_single_article():
    # Test with a single article with tags
    codeflash_output = find_common_tags([{"tags": ["python", "coding", "development"]}]) # 1.44μs -> 1.37μs (5.10% faster)
    # Test with a single article with no tags
    codeflash_output = find_common_tags([{"tags": []}]) # 571ns -> 541ns (5.55% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_some_common_tags():
    # Test with multiple articles having some common tags
    articles = [
        {"tags": ["python", "coding", "development"]},
        {"tags": ["python", "development", "tutorial"]},
        {"tags": ["python", "development", "guide"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.88μs -> 2.56μs (12.5% faster)
    articles = [
        {"tags": ["tech", "news"]},
        {"tags": ["tech", "gadgets"]},
        {"tags": ["tech", "reviews"]}
    ]
    codeflash_output = find_common_tags(articles) # 1.53μs -> 1.21μs (26.5% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_no_common_tags():
    # Test with multiple articles having no common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["development", "tutorial"]},
        {"tags": ["guide", "learning"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.24μs -> 1.94μs (15.4% faster)
    articles = [
        {"tags": ["apple", "banana"]},
        {"tags": ["orange", "grape"]},
        {"tags": ["melon", "kiwi"]}
    ]
    codeflash_output = find_common_tags(articles) # 1.24μs -> 922ns (34.7% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_duplicate_tags():
    # Test with articles having duplicate tags
    articles = [
        {"tags": ["python", "python", "coding"]},
        {"tags": ["python", "development", "python"]},
        {"tags": ["python", "guide", "python"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.71μs -> 2.38μs (13.9% faster)
    articles = [
        {"tags": ["tech", "tech", "news"]},
        {"tags": ["tech", "tech", "gadgets"]},
        {"tags": ["tech", "tech", "reviews"]}
    ]
    codeflash_output = find_common_tags(articles) # 1.64μs -> 1.26μs (30.2% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_mixed_case_tags():
    # Test with articles having mixed case tags
    articles = [
        {"tags": ["Python", "Coding"]},
        {"tags": ["python", "Development"]},
        {"tags": ["PYTHON", "Guide"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.36μs -> 1.91μs (23.6% faster)
    articles = [
        {"tags": ["Tech", "News"]},
        {"tags": ["tech", "Gadgets"]},
        {"tags": ["TECH", "Reviews"]}
    ]
    codeflash_output = find_common_tags(articles) # 1.15μs -> 1.03μs (11.6% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_non_string_tags():
    # Test with articles having non-string tags
    articles = [
        {"tags": ["python", 123, "coding"]},
        {"tags": ["python", "development", 123]},
        {"tags": ["python", "guide", 123]}
    ]
    codeflash_output = find_common_tags(articles) # 3.12μs -> 2.50μs (24.8% faster)
    articles = [
        {"tags": [None, "news"]},
        {"tags": ["tech", None]},
        {"tags": [None, "reviews"]}
    ]
    codeflash_output = find_common_tags(articles) # 1.63μs -> 1.32μs (23.4% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_scale_test_cases():
    # Test with large scale input where all tags should be common
    articles = [
        {"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(100)
    ]
    expected_output = {"tag" + str(i) for i in range(1000)}
    codeflash_output = find_common_tags(articles) # 380ms -> 3.43ms (10995% faster)
    # Test with large scale input where no tags should be common
    articles = [
        {"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(50)
    ] + [{"tags": ["unique_tag"]}]
    codeflash_output = find_common_tags(articles) # 188ms -> 1.68ms (11131% faster)
    # Outputs were verified to be equal to the original implementation

#------------------------------------------------
from codeflash.result.common_tags import find_common_tags

def test_find_common_tags():
    find_common_tags([{}, {}])

def test_find_common_tags_2():
    find_common_tags([])

🔎 Concolic Coverage Tests and Runtime

Test File::Test Function	Original ⏱️	Optimized ⏱️	Speedup
`codeflash_concolic_4ui61_cv/tmpr9nwzabd/test_concolic_coverage.py::test_find_common_tags`	2.18μs	2.20μs	-0.907%⚠️
`codeflash_concolic_4ui61_cv/tmpr9nwzabd/test_concolic_coverage.py::test_find_common_tags_2`	632ns	501ns	26.1%✅

To test or edit this optimization locally:
git merge codeflash/optimize-pr820-2025-10-15T18.51.29

Suggested change

-    common_tags = articles[0].get("tags", [])
-    for article in articles[1:]:
-        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
-    return set(common_tags)
+    common_tags = set(articles[0].get("tags", []))
+    for article in articles[1:]:
+        common_tags.intersection_update(article.get("tags", []))
+        if not common_tags:
+            break
+    return common_tags
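
One behavioral difference may be worth checking before applying the suggestion: building a set up front requires every tag in the first article to be hashable, whereas the list version only hashes the tags that survive filtering. The sketch below illustrates this with an unhashable tag (a nested list). Whether such input can occur in practice is an assumption, and the function names are hypothetical labels for the two versions.

```python
from __future__ import annotations


def common_tags_list(articles: list[dict]) -> set:
    # Original list-filtering version.
    if not articles:
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def common_tags_set(articles: list[dict]) -> set:
    # Suggested set-based version.
    if not articles:
        return set()
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        common_tags.intersection_update(article.get("tags", []))
        if not common_tags:
            break
    return common_tags


articles = [
    {"tags": ["python", ["nested"]]},  # second tag is an unhashable list
    {"tags": ["python"]},
]

# The unhashable tag is filtered out before the final set() call.
print(common_tags_list(articles))

try:
    common_tags_set(articles)
except TypeError as exc:
    print(f"set version raised TypeError: {exc}")
```

For inputs made only of hashable tags, such as the strings, integers, and `None` values used in the generated tests, the two versions return the same set.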

find common tags

7a13235

Signed-off-by: Saurabh Misra <[email protected]>

github-actions bot added the Review effort 2/5 label Oct 15, 2025

codeflash-ai bot reviewed Oct 15, 2025

misrasaurabh1 closed this Oct 17, 2025
