@misrasaurabh1 misrasaurabh1 commented Oct 15, 2025

PR Type

Enhancement, Tests


Description

  • Add common tags utility function

  • Implement tests for tag intersection


Diagram Walkthrough

flowchart LR
  A["Articles list"] -- "extract 'tags'" --> B["Iterative intersection"]
  B["Iterative intersection"] -- "return" --> C["Set of common tags"]
  D["Unit tests"] -- "validate" --> C

File Walkthrough

Relevant files
Enhancement
common_tags.py
Add common tags computation helper                                             

codeflash/result/common_tags.py

  • Introduce find_common_tags function.
  • Handle empty input returning empty set.
  • Compute tag intersection across articles' tags.
  • Return result as a set.
+11/-0   
Tests
test_common_tags.py
Tests for common tags utility                                                       

tests/test_common_tags.py

  • Add tests for find_common_tags.
  • Validate common tags across multiple article lists.
  • Assert consistent results with added article.
+22/-0   
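
For orientation, the helper described in this walkthrough can be sketched roughly as follows. It is assembled from the code fragments quoted in the review comments on this PR; the signature and type hints are assumptions, and the actual 11-line file may differ in detail.

```python
def find_common_tags(articles: list[dict]) -> set:
    # Empty input: nothing to intersect, so return an empty set.
    if not articles:
        return set()

    # Start from the first article's tags (a missing "tags" key yields []).
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        # Keep only the tags that also appear in this article.
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)
```

Called with `[{"tags": ["python", "coding"]}, {"tags": ["python", "data"]}]`, this sketch returns `{"python"}`.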

Signed-off-by: Saurabh Misra <[email protected]>
@github-actions

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Performance

Iterative list filtering yields O(n*m*k) behavior and repeated membership checks in lists. Converting to set-based intersection would be clearer and more efficient, especially for larger tag lists.

common_tags = articles[0].get("tags", [])
for article in articles[1:]:
    common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
Type Robustness

Assumes 'tags' is always a list of strings. Consider defensive handling (e.g., default to an empty list when the key is missing, and coerce to a set) or validate the input, so that unexpected types or duplicate tags do not affect the intermediate list before it is converted to a set.

common_tags = articles[0].get("tags", [])
for article in articles[1:]:
    common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
return set(common_tags)

@github-actions

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Use set intersections

Convert tag lists to sets and use set intersection to avoid O(n*m) list membership
checks and handle duplicates reliably. This also prevents order-dependent behavior
and improves performance on large inputs.

codeflash/result/common_tags.py [8-11]

-common_tags = articles[0].get("tags", [])
+common_tags = set(articles[0].get("tags", []))
 for article in articles[1:]:
-    common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
-return set(common_tags)
+    common_tags &= set(article.get("tags", []))
+return common_tags
Suggestion importance[1-10]: 8

Why: Replacing list filtering with set intersections is accurate here, improves performance, and simplifies logic; it also deduplicates tags, matching the set return type.

Impact: Medium
Possible issue
Guard against invalid first tags

Validate that the first article actually provides a tags iterable to avoid
propagating a non-list value into set logic. Default to an empty set if tags is
missing or not a list of strings.

codeflash/result/common_tags.py [5-8]

 if not articles:
     return set()
+first_tags = articles[0].get("tags", [])
+common_tags = set(first_tags) if isinstance(first_tags, (list, set, tuple)) else set()
Suggestion importance[1-10]: 6

Why: Adding a type check for the initial tags prevents propagating bad input and aligns with subsequent set logic, though input validation scope is limited and not critical.

Impact: Low

Comment on lines +8 to +11
common_tags = articles[0].get("tags", [])
for article in articles[1:]:
    common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
return set(common_tags)

⚡️Codeflash found 7,684% (76.84x) speedup for find_common_tags in codeflash/result/common_tags.py

⏱️ Runtime : 580 milliseconds → 7.46 milliseconds (best of 82 runs)

📝 Explanation and details

The optimization replaces list-based filtering with efficient set operations, yielding a 77x speedup.

Key Changes:

  1. Initial conversion to set: common_tags = set(articles[0].get("tags", [])) instead of keeping it as a list
  2. Set intersection instead of list comprehension: common_tags.intersection_update(article.get("tags", [])) replaces the expensive [tag for tag in common_tags if tag in article.get("tags", [])]
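
One detail of the second change is that `intersection_update()` accepts any iterable, so each article's tag list can be passed without first wrapping it in `set(...)`. The `&=` operator, by contrast, requires a set on the right-hand side. A minimal illustration with made-up tag values:

```python
common_tags = {"python", "coding", "tutorial"}

# intersection_update accepts any iterable, so a list can be passed as-is.
common_tags.intersection_update(["python", "data", "coding"])

# The &= operator only works between sets; a list operand raises TypeError.
other = {"python"}
try:
    other &= ["python"]
    operator_accepts_list = True
except TypeError:
    operator_accepts_list = False

print(common_tags, operator_accepts_list)
```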

Why This Is Faster:

  • The original code performs O(n×m) operations for each article comparison, where n is the number of current common tags and m is the number of tags in each article
  • List comprehension with if tag in article.get("tags", []) requires linear search through the article's tag list for each tag
  • Set operations use hash-based lookups instead of linear searches: intersecting two sets is O(min(len(set1), len(set2))) on average, and intersection_update() with a list argument makes a single pass over that list
  • intersection_update() modifies the set in-place, avoiding memory allocation for intermediate results
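
The difference described above can be checked locally with a small timing sketch. This is not the Codeflash benchmark harness; the function names `common_tags_list` and `common_tags_set` are placeholders, and absolute timings will vary by machine.

```python
import timeit


def common_tags_list(articles):
    # List-based filtering, as in the original code under review.
    if not articles:
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def common_tags_set(articles):
    # Set-based version using intersection_update.
    if not articles:
        return set()
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        common_tags.intersection_update(article.get("tags", []))
    return common_tags


# Two articles with 1,000 tags each, 500 of them shared.
articles = [
    {"tags": [f"tag{i}" for i in range(1000)]},
    {"tags": [f"tag{i}" for i in range(500, 1500)]},
]

list_time = timeit.timeit(lambda: common_tags_list(articles), number=20)
set_time = timeit.timeit(lambda: common_tags_set(articles), number=20)
print(f"list-based: {list_time:.4f}s  set-based: {set_time:.4f}s (20 runs each)")
```

The input mirrors the shape of the test_large_number_of_tags case, where each `tag in list` check scans up to 1,000 entries.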

Performance Gains by Test Case:

  • Small inputs (few articles/tags): roughly 5-50% faster due to reduced overhead
  • Large tag lists: 5,274% faster (test_large_number_of_tags) where set operations excel
  • Large-scale tests: 10,000%+ faster demonstrating the algorithm scales much better

The optimization is particularly effective when articles have many tags or when processing many articles, where the O(n²) behavior of the original becomes prohibitive.

Correctness verification report:

Test | Status
⚙️ Existing Unit Tests | 2 Passed
🌀 Generated Regression Tests | 29 Passed
⏪ Replay Tests | 🔘 None Found
🔎 Concolic Coverage Tests | 2 Passed
📊 Tests Coverage | 100.0%
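
The idea behind the regression check, comparing the optimized function's output with the original's on the same inputs, can be sketched as a small randomized differential test. This is an illustration only and not Codeflash's verification harness; `original`, `optimized`, and the tag vocabulary are assumptions.

```python
import random


def original(articles):
    if not articles:
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def optimized(articles):
    if not articles:
        return set()
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        common_tags.intersection_update(article.get("tags", []))
    return common_tags


rng = random.Random(0)  # fixed seed so the run is reproducible
vocabulary = [f"tag{i}" for i in range(20)]

mismatches = 0
for _ in range(200):
    # 0-5 articles, each with 0-10 distinct tags drawn from the vocabulary.
    articles = [
        {"tags": rng.sample(vocabulary, k=rng.randint(0, 10))}
        for _ in range(rng.randint(0, 5))
    ]
    if original(articles) != optimized(articles):
        mismatches += 1

print(f"mismatches: {mismatches}")
```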
⚙️ Existing Unit Tests and Runtime
Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup
test_common_tags.py::test_common_tags_1 | 6.13μs | 4.09μs | 50.0%✅
🌀 Generated Regression Tests and Runtime
# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests

def test_single_article():
    # Single article should return its tags
    articles = [{"tags": ["python", "coding", "tutorial"]}]
    codeflash_output = find_common_tags(articles) # 1.75μs -> 1.34μs (30.7% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_with_common_tags():
    # Multiple articles with common tags should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python", "data"]},
        {"tags": ["python", "machine learning"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.80μs -> 2.29μs (21.9% faster)
    # Outputs were verified to be equal to the original implementation

def test_empty_list_of_articles():
    # Empty list of articles should return an empty set
    articles = []
    codeflash_output = find_common_tags(articles) # 752ns -> 511ns (47.2% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_no_common_tags():
    # Articles with no common tags should return an empty set
    articles = [
        {"tags": ["python"]},
        {"tags": ["java"]},
        {"tags": ["c++"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.41μs -> 2.05μs (17.5% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_empty_tag_lists():
    # Articles with some empty tag lists should return an empty set
    articles = [
        {"tags": []},
        {"tags": ["python"]},
        {"tags": ["python", "java"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.10μs -> 1.91μs (9.98% faster)
    # Outputs were verified to be equal to the original implementation

def test_all_articles_with_empty_tag_lists():
    # All articles with empty tag lists should return an empty set
    articles = [
        {"tags": []},
        {"tags": []},
        {"tags": []}
    ]
    codeflash_output = find_common_tags(articles) # 1.91μs -> 1.77μs (7.95% faster)
    # Outputs were verified to be equal to the original implementation

def test_tags_with_special_characters():
    # Tags with special characters should be handled correctly
    articles = [
        {"tags": ["python!", "coding"]},
        {"tags": ["python!", "data"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.23μs -> 1.81μs (23.2% faster)
    # Outputs were verified to be equal to the original implementation

def test_case_sensitivity():
    # Tags with different cases should not be considered the same
    articles = [
        {"tags": ["Python", "coding"]},
        {"tags": ["python", "data"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.12μs -> 1.70μs (24.7% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_number_of_articles():
    # Large number of articles with a common tag should return that tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(1000)]
    codeflash_output = find_common_tags(articles) # 224μs -> 148μs (50.8% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_number_of_tags():
    # Large number of tags with some common tags should return the common tags
    articles = [
        {"tags": [f"tag{i}" for i in range(1000)]},
        {"tags": [f"tag{i}" for i in range(500, 1500)]}
    ]
    expected = {f"tag{i}" for i in range(500, 1000)}
    codeflash_output = find_common_tags(articles) # 4.38ms -> 81.5μs (5274% faster)
    assert codeflash_output == expected
    # Outputs were verified to be equal to the original implementation

def test_mixed_length_of_tag_lists():
    # Articles with mixed length of tag lists should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python"]},
        {"tags": ["python", "coding", "tutorial"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.60μs -> 2.09μs (23.9% faster)
    # Outputs were verified to be equal to the original implementation

def test_tags_with_different_data_types():
    # Tags are compared by equality, so the int 123 and the string "123" do not match; only "python" is common
    articles = [
        {"tags": ["python", 123]},
        {"tags": ["python", "123"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.25μs -> 1.74μs (29.3% faster)
    # Outputs were verified to be equal to the original implementation

def test_performance_with_large_data():
    # Performance with large data should return the common tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(10000)]
    codeflash_output = find_common_tags(articles) # 2.24ms -> 1.49ms (50.6% faster)
    # Outputs were verified to be equal to the original implementation

def test_scalability_with_increasing_tags():
    # Scalability with increasing tags should return the common tag
    articles = [{"tags": ["common_tag"] + [f"tag{i}" for i in range(j)]} for j in range(1, 1001)]
    codeflash_output = find_common_tags(articles) # 497μs -> 364μs (36.4% faster)
    # Outputs were verified to be equal to the original implementation
#------------------------------------------------
# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests

def test_empty_input_list():
    # Test with an empty list
    codeflash_output = find_common_tags([]) # 681ns -> 651ns (4.61% faster)
    # Outputs were verified to be equal to the original implementation

def test_single_article():
    # Test with a single article with tags
    codeflash_output = find_common_tags([{"tags": ["python", "coding", "development"]}]) # 1.59μs -> 1.42μs (11.9% faster)
    # Test with a single article with no tags
    codeflash_output = find_common_tags([{"tags": []}]) # 601ns -> 511ns (17.6% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_some_common_tags():
    # Test with multiple articles having some common tags
    articles = [
        {"tags": ["python", "coding", "development"]},
        {"tags": ["python", "development", "tutorial"]},
        {"tags": ["python", "development", "guide"]}
    ]
    codeflash_output = find_common_tags(articles) # 3.15μs -> 2.83μs (11.4% faster)

    articles = [
        {"tags": ["tech", "news"]},
        {"tags": ["tech", "gadgets"]},
        {"tags": ["tech", "reviews"]}
    ]
    codeflash_output = find_common_tags(articles) # 1.56μs -> 1.14μs (36.9% faster)
    # Outputs were verified to be equal to the original implementation

def test_multiple_articles_no_common_tags():
    # Test with multiple articles having no common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["development", "tutorial"]},
        {"tags": ["guide", "learning"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.33μs -> 2.11μs (10.4% faster)

    articles = [
        {"tags": ["apple", "banana"]},
        {"tags": ["orange", "grape"]},
        {"tags": ["melon", "kiwi"]}
    ]
    codeflash_output = find_common_tags(articles) # 1.24μs -> 1.05μs (18.1% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_duplicate_tags():
    # Test with articles having duplicate tags
    articles = [
        {"tags": ["python", "python", "coding"]},
        {"tags": ["python", "development", "python"]},
        {"tags": ["python", "guide", "python"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.77μs -> 2.25μs (22.7% faster)

    articles = [
        {"tags": ["tech", "tech", "news"]},
        {"tags": ["tech", "tech", "gadgets"]},
        {"tags": ["tech", "tech", "reviews"]}
    ]
    codeflash_output = find_common_tags(articles) # 1.49μs -> 1.14μs (30.7% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_mixed_case_tags():
    # Test with articles having mixed case tags
    articles = [
        {"tags": ["Python", "Coding"]},
        {"tags": ["python", "Development"]},
        {"tags": ["PYTHON", "Guide"]}
    ]
    codeflash_output = find_common_tags(articles) # 2.42μs -> 2.09μs (15.8% faster)

    articles = [
        {"tags": ["Tech", "News"]},
        {"tags": ["tech", "Gadgets"]},
        {"tags": ["TECH", "Reviews"]}
    ]
    codeflash_output = find_common_tags(articles) # 1.19μs -> 1.05μs (13.3% faster)
    # Outputs were verified to be equal to the original implementation

def test_articles_with_non_string_tags():
    # Test with articles having non-string tags
    articles = [
        {"tags": ["python", 123, "coding"]},
        {"tags": ["python", "development", 123]},
        {"tags": ["python", "guide", 123]}
    ]
    codeflash_output = find_common_tags(articles) # 3.08μs -> 2.35μs (31.0% faster)

    articles = [
        {"tags": [None, "news"]},
        {"tags": ["tech", None]},
        {"tags": [None, "reviews"]}
    ]
    codeflash_output = find_common_tags(articles) # 1.59μs -> 1.20μs (32.5% faster)
    # Outputs were verified to be equal to the original implementation

def test_large_scale_test_cases():
    # Test with large scale input where all tags should be common
    articles = [
        {"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(100)
    ]
    expected_output = {"tag" + str(i) for i in range(1000)}
    codeflash_output = find_common_tags(articles) # 383ms -> 3.56ms (10676% faster)

    # Test with large scale input where no tags should be common
    articles = [
        {"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(50)
    ] + [{"tags": ["unique_tag"]}]
    codeflash_output = find_common_tags(articles) # 188ms -> 1.77ms (10606% faster)
    # Outputs were verified to be equal to the original implementation
#------------------------------------------------
from codeflash.result.common_tags import find_common_tags

def test_find_common_tags():
    find_common_tags([{}, {}])

def test_find_common_tags_2():
    find_common_tags([])
🔎 Concolic Coverage Tests and Runtime
Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup
codeflash_concolic_32dt0ivc/tmpa7da_27h/test_concolic_coverage.py::test_find_common_tags | 2.39μs | 1.87μs | 27.7%✅
codeflash_concolic_32dt0ivc/tmpa7da_27h/test_concolic_coverage.py::test_find_common_tags_2 | 721ns | 511ns | 41.1%✅

To test or edit this optimization locally, run: git merge codeflash/optimize-pr821-2025-10-15T19.02.22

Suggested change
-common_tags = articles[0].get("tags", [])
-for article in articles[1:]:
-    common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
-return set(common_tags)
+common_tags = set(articles[0].get("tags", []))
+for article in articles[1:]:
+    common_tags.intersection_update(article.get("tags", []))
+return common_tags
