codeflash-ai bot commented Dec 23, 2025

📄 28% (0.28x) speedup for find_common_tags in src/algorithms/string.py

⏱️ Runtime: 7.81 milliseconds → 6.08 milliseconds (best of 101 runs)
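Codeflash appears to measure speedup relative to the optimized runtime. The per-test figures further down fit this reading: for example, 333 ns → 292 ns is listed as 14.0% faster, and (333 - 292)/292 ≈ 0.140. Under that assumption, the headline figure works out as follows:

```math
\text{speedup} = \frac{t_{\text{orig}} - t_{\text{opt}}}{t_{\text{opt}}} = \frac{7.81\,\text{ms} - 6.08\,\text{ms}}{6.08\,\text{ms}} \approx 0.2845
```

This is reported as 28% (0.28x). Put another way, the optimized version takes about 6.08/7.81 ≈ 78% of the original runtime, so it is roughly 1.28 times as fast.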

📝 Explanation and details

Optimization notes:

  • Pre-extract each article's "tags" list once, avoiding repeated dict lookups inside the loop.
  • Sort the tag lists by length so the intersection starts from the smallest set. The running intersection can only shrink, so starting from the shortest list bounds the candidate set from the first step and discards non-common tags as early as possible.
  • Return early if the first tag list is empty. After sorting, this is the shortest list, so no tag can be common to all articles.
  • All behavior and types are preserved, and the signature and return value are unchanged.
  • As instructed, no attribute-lookup micro-optimizations were applied.
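The notes above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR. It assumes each article is a dict with a "tags" list and that the function returns a `set`, which is what the generated tests suggest. The function name `find_common_tags_sketch` is hypothetical. The `break` once the running set becomes empty is an extra assumption of this sketch, because the notes only mention the check on the first (shortest) list.

```python
from __future__ import annotations


def find_common_tags_sketch(articles: list[dict[str, list[str]]]) -> set[str]:
    """Hypothetical illustration of the optimization notes; not the PR's actual code."""
    if not articles:
        return set()

    # Pre-extract each article's tag list once instead of indexing article["tags"] in the loop.
    tag_lists = [article["tags"] for article in articles]

    # Shortest list first: the running intersection can only shrink, so start from the smallest set.
    tag_lists.sort(key=len)

    # Early exit: if the shortest list is empty, no tag can be common to every article.
    if not tag_lists[0]:
        return set()

    common = set(tag_lists[0])
    for tags in tag_lists[1:]:
        common.intersection_update(tags)
        # Assumption of this sketch: once nothing is left, skip the remaining lists.
        if not common:
            break
    return common


print(find_common_tags_sketch([{"tags": ["python", "coding"]}, {"tags": ["python"]}]))  # {'python'}
```

Sorting and building the extra list add a fixed cost to every call. That is one plausible reason the report shows small inputs running slower, while the large input with one very short list gains the most.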

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 2 Passed |
| 🌀 Generated Regression Tests | 29 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
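The tests below state that outputs were verified to be equal to the original implementation. The general idea, differential testing, can be sketched as follows. This is not Codeflash's actual harness. Both functions are hypothetical stand-ins: a straightforward baseline that intersects tags in input order, and a candidate that sorts by length first. The harness runs both on the same random inputs and fails on the first mismatch.

```python
from __future__ import annotations

import random


def baseline_common_tags(articles: list[dict[str, list[str]]]) -> set[str]:
    # Hypothetical baseline: intersect tag sets in input order.
    if not articles:
        return set()
    common = set(articles[0]["tags"])
    for article in articles[1:]:
        common &= set(article["tags"])
    return common


def candidate_common_tags(articles: list[dict[str, list[str]]]) -> set[str]:
    # Hypothetical candidate: sort tag lists by length, intersect starting from the shortest.
    if not articles:
        return set()
    tag_lists = sorted((article["tags"] for article in articles), key=len)
    return set(tag_lists[0]).intersection(*tag_lists[1:])


def random_articles(rng: random.Random) -> list[dict[str, list[str]]]:
    # Small vocabulary so that shared tags, empty lists, and duplicates all occur often.
    vocab = ["python", "java", "data", "tutorial"]
    return [
        {"tags": [rng.choice(vocab) for _ in range(rng.randint(0, 4))]}
        for _ in range(rng.randint(0, 4))
    ]


def check_equivalence(trials: int = 1000, seed: int = 0) -> int:
    # Feed identical inputs to both versions and stop at the first differing output.
    rng = random.Random(seed)
    for _ in range(trials):
        articles = random_articles(rng)
        expected = baseline_common_tags(articles)
        actual = candidate_common_tags(articles)
        assert actual == expected, (articles, expected, actual)
    return trials


print(check_equivalence())  # 1000
```

Random inputs complement the hand-written cases. They are cheap to generate in bulk, and a fixed seed keeps any failure reproducible.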
⚙️ Existing Unit Tests

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| test_common_tags.py::test_common_tags_1 | 3.25μs | 3.58μs | -9.29% ⚠️ |
🌀 Generated Regression Tests
# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests


def test_single_article():
    # Single article should return its tags
    articles = [{"tags": ["python", "coding", "tutorial"]}]
    codeflash_output = find_common_tags(articles)  # 833ns -> 1.17μs (28.6% slower)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_with_common_tags():
    # Multiple articles with common tags should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python", "data"]},
        {"tags": ["python", "machine learning"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.46μs -> 1.79μs (18.6% slower)
    # Outputs were verified to be equal to the original implementation


def test_empty_list_of_articles():
    # Empty list of articles should return an empty set
    articles = []
    codeflash_output = find_common_tags(articles)  # 333ns -> 292ns (14.0% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_no_common_tags():
    # Articles with no common tags should return an empty set
    articles = [{"tags": ["python"]}, {"tags": ["java"]}, {"tags": ["c++"]}]
    codeflash_output = find_common_tags(articles)  # 1.08μs -> 1.62μs (33.4% slower)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_empty_tag_lists():
    # Articles with some empty tag lists should return an empty set
    articles = [{"tags": []}, {"tags": ["python"]}, {"tags": ["python", "java"]}]
    codeflash_output = find_common_tags(articles)  # 1.04μs -> 1.08μs (3.79% slower)
    # Outputs were verified to be equal to the original implementation


def test_all_articles_with_empty_tag_lists():
    # All articles with empty tag lists should return an empty set
    articles = [{"tags": []}, {"tags": []}, {"tags": []}]
    codeflash_output = find_common_tags(articles)  # 1.04μs -> 1.00μs (4.10% faster)
    # Outputs were verified to be equal to the original implementation


def test_tags_with_special_characters():
    # Tags with special characters should be handled correctly
    articles = [{"tags": ["python!", "coding"]}, {"tags": ["python!", "data"]}]
    codeflash_output = find_common_tags(articles)  # 1.25μs -> 1.58μs (21.1% slower)
    # Outputs were verified to be equal to the original implementation


def test_case_sensitivity():
    # Tags with different cases should not be considered the same
    articles = [{"tags": ["Python", "coding"]}, {"tags": ["python", "data"]}]
    codeflash_output = find_common_tags(articles)  # 1.12μs -> 1.58μs (28.9% slower)
    # Outputs were verified to be equal to the original implementation


def test_large_number_of_articles():
    # Large number of articles with a common tag should return that tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(1000)]
    codeflash_output = find_common_tags(articles)  # 117μs -> 144μs (18.7% slower)
    # Outputs were verified to be equal to the original implementation


def test_large_number_of_tags():
    # Large number of tags with some common tags should return the common tags
    articles = [
        {"tags": [f"tag{i}" for i in range(1000)]},
        {"tags": [f"tag{i}" for i in range(500, 1500)]},
    ]
    expected = {f"tag{i}" for i in range(500, 1000)}
    codeflash_output = find_common_tags(articles)  # 66.0μs -> 62.9μs (4.90% faster)
    assert codeflash_output == expected
    # Outputs were verified to be equal to the original implementation


def test_mixed_length_of_tag_lists():
    # Articles with mixed length of tag lists should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python"]},
        {"tags": ["python", "coding", "tutorial"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.38μs -> 1.71μs (19.5% slower)
    # Outputs were verified to be equal to the original implementation


def test_tags_with_different_data_types():
    # Tags are compared by equality: the int 123 and the str "123" are distinct, so only "python" is common
    articles = [{"tags": ["python", 123]}, {"tags": ["python", "123"]}]
    codeflash_output = find_common_tags(articles)  # 1.08μs -> 1.58μs (31.5% slower)
    # Outputs were verified to be equal to the original implementation


def test_performance_with_large_data():
    # Performance with large data should return the common tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(10000)]
    codeflash_output = find_common_tags(articles)  # 1.17ms -> 1.40ms (16.7% slower)
    # Outputs were verified to be equal to the original implementation


def test_scalability_with_increasing_tags():
    # Scalability with increasing tags should return the common tag
    articles = [
        {"tags": ["common_tag"] + [f"tag{i}" for i in range(j)]} for j in range(1, 1001)
    ]
    codeflash_output = find_common_tags(articles)  # 404μs -> 452μs (10.7% slower)
    # Outputs were verified to be equal to the original implementation
# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests


def test_empty_input_list():
    # Test with an empty list
    codeflash_output = find_common_tags([])  # 375ns -> 292ns (28.4% faster)
    # Outputs were verified to be equal to the original implementation


def test_single_article():
    # Test with a single article with tags
    codeflash_output = find_common_tags(
        [{"tags": ["python", "coding", "development"]}]
    )  # 750ns -> 1.21μs (37.9% slower)
    # Test with a single article with no tags
    codeflash_output = find_common_tags([{"tags": []}])  # 375ns -> 458ns (18.1% slower)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_some_common_tags():
    # Test with multiple articles having some common tags
    articles = [
        {"tags": ["python", "coding", "development"]},
        {"tags": ["python", "development", "tutorial"]},
        {"tags": ["python", "development", "guide"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.54μs -> 1.92μs (19.6% slower)

    articles = [
        {"tags": ["tech", "news"]},
        {"tags": ["tech", "gadgets"]},
        {"tags": ["tech", "reviews"]},
    ]
    codeflash_output = find_common_tags(articles)  # 792ns -> 958ns (17.3% slower)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_no_common_tags():
    # Test with multiple articles having no common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["development", "tutorial"]},
        {"tags": ["guide", "learning"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.08μs -> 1.58μs (31.6% slower)

    articles = [
        {"tags": ["apple", "banana"]},
        {"tags": ["orange", "grape"]},
        {"tags": ["melon", "kiwi"]},
    ]
    codeflash_output = find_common_tags(articles)  # 584ns -> 792ns (26.3% slower)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_duplicate_tags():
    # Test with articles having duplicate tags
    articles = [
        {"tags": ["python", "python", "coding"]},
        {"tags": ["python", "development", "python"]},
        {"tags": ["python", "guide", "python"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.46μs -> 1.83μs (20.5% slower)

    articles = [
        {"tags": ["tech", "tech", "news"]},
        {"tags": ["tech", "tech", "gadgets"]},
        {"tags": ["tech", "tech", "reviews"]},
    ]
    codeflash_output = find_common_tags(articles)  # 833ns -> 1.00μs (16.7% slower)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_mixed_case_tags():
    # Test with articles having mixed case tags
    articles = [
        {"tags": ["Python", "Coding"]},
        {"tags": ["python", "Development"]},
        {"tags": ["PYTHON", "Guide"]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.08μs -> 1.62μs (33.4% slower)

    articles = [
        {"tags": ["Tech", "News"]},
        {"tags": ["tech", "Gadgets"]},
        {"tags": ["TECH", "Reviews"]},
    ]
    codeflash_output = find_common_tags(articles)  # 583ns -> 792ns (26.4% slower)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_non_string_tags():
    # Test with articles having non-string tags
    articles = [
        {"tags": ["python", 123, "coding"]},
        {"tags": ["python", "development", 123]},
        {"tags": ["python", "guide", 123]},
    ]
    codeflash_output = find_common_tags(articles)  # 1.54μs -> 1.88μs (17.8% slower)

    articles = [
        {"tags": [None, "news"]},
        {"tags": ["tech", None]},
        {"tags": [None, "reviews"]},
    ]
    codeflash_output = find_common_tags(articles)  # 875ns -> 1.08μs (19.2% slower)
    # Outputs were verified to be equal to the original implementation


def test_large_scale_test_cases():
    # Test with large scale input where all tags should be common
    articles = [{"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(100)]
    expected_output = {"tag" + str(i) for i in range(1000)}
    codeflash_output = find_common_tags(articles)  # 4.06ms -> 3.96ms (2.55% faster)
    assert codeflash_output == expected_output

    # Test with large scale input where no tags should be common
    articles = [{"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(50)] + [
        {"tags": ["unique_tag"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.96ms -> 22.3μs (8695% faster)
    # Outputs were verified to be equal to the original implementation
from src.algorithms.string import find_common_tags

def test_find_common_tags():
    find_common_tags([{'\x00\x00\x00\x00': [], 'tags': (v1 := ['', ''])}, {'\x00\x00\x00\x00': [], 'tags': v1}])

def test_find_common_tags_2():
    find_common_tags([])

def test_find_common_tags_3():
    find_common_tags([{}, {}])

To edit these changes, run `git checkout codeflash/optimize-find_common_tags-mjhym39x` and push.


codeflash-ai bot requested a review from KRRT7 December 23, 2025 02:22
codeflash-ai bot added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) Dec 23, 2025
KRRT7 closed this Dec 23, 2025
codeflash-ai bot deleted the codeflash/optimize-find_common_tags-mjhym39x branch December 23, 2025 05:48