find common tags #820
Signed-off-by: Saurabh Misra <[email protected]>
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)
⚡️Codeflash found 7,937% (79.37x) speedup for find_common_tags in codeflash/result/common_tags.py
⏱️ Runtime : 577 milliseconds → 7.18 milliseconds (best of 74 runs)
📝 Explanation and details
The optimization replaces list comprehensions that rescan each article's tag list with set operations, which accounts for the roughly 79x speedup reported above.
Key optimizations:
- Set-based intersection instead of list comprehension: The original code used `[tag for tag in common_tags if tag in article.get("tags", [])]`, which creates O(n*m) operations per article. The optimized version uses `set.intersection_update()`, which performs O(n+m) set intersection operations.
- Early termination: Added `if not common_tags: break` to exit the loop immediately when no common tags remain, avoiding unnecessary processing of remaining articles.
- Direct set initialization: Converts the first article's tags directly to a set, eliminating the final `set()` conversion and enabling efficient set operations from the start.
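The three changes above can be sketched side by side. This is a minimal reconstruction from the diff lines quoted in this comment, not code copied from the repository: the names `find_common_tags_original` and `find_common_tags_optimized` are hypothetical, and the empty-input guard is an assumption (the diff does not show it, but the tests expect an empty set for an empty list).

```python
from __future__ import annotations


def find_common_tags_original(articles: list[dict]) -> set:
    # Assumed guard: not visible in the diff, inferred from the empty-list tests.
    if not articles:
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        # Every `tag in <list>` check scans the article's tag list linearly.
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def find_common_tags_optimized(articles: list[dict]) -> set:
    # Same assumed guard as above.
    if not articles:
        return set()
    # Direct set initialization from the first article's tags.
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        # In-place intersection; the argument can be any iterable of hashable items.
        common_tags.intersection_update(article.get("tags", []))
        # Early termination: an empty set can never regain elements.
        if not common_tags:
            break
    return common_tags


articles = [
    {"tags": ["python", "coding"]},
    {"tags": ["python", "data"]},
    {"tags": ["python", "machine learning"]},
]
print(find_common_tags_original(articles))   # {'python'}
print(find_common_tags_optimized(articles))  # {'python'}
```

One difference to keep in mind: the set-based version hashes every tag of the first article up front, so it raises `TypeError` for unhashable tags (for example nested lists) even in cases where the list-based version would have filtered them out before its final `set()` call.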
Performance impact by test case:
- Small datasets (2-3 articles): 18-43% faster due to reduced overhead
- Large tag lists: Up to 5316% faster (test_large_number_of_tags) where set operations dramatically outperform nested list operations
- Large article counts: Up to 11131% faster (test_large_scale_test_cases), where early termination and the O(n+m) vs O(n*m) per-article cost compound across many articles
The optimization is particularly effective for scenarios with many articles or large tag lists, where repeated linear-time membership tests against lists (O(n*m) per article) become prohibitive compared to O(1) average-case set membership testing.
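To see the complexity argument on a concrete workload, a rough timing comparison can be run with `timeit`. This is only a sketch: the function names `original` and `optimized`, the empty-input guard, and the workload size are assumptions chosen for illustration, and absolute timings will differ from the figures Codeflash reports.

```python
import timeit


def original(articles):
    # List-based version reconstructed from the diff (empty-input guard assumed).
    if not articles:
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def optimized(articles):
    # Set-based version reconstructed from the suggested change.
    if not articles:
        return set()
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        common_tags.intersection_update(article.get("tags", []))
        if not common_tags:
            break
    return common_tags


# Hypothetical workload: every article carries the same 300 tags, so nothing is
# filtered out and the list version pays a linear scan for each membership test.
articles = [{"tags": [f"tag{i}" for i in range(300)]} for _ in range(20)]

# Take the best of three repeats, each timing three calls.
t_original = min(timeit.repeat(lambda: original(articles), number=3, repeat=3))
t_optimized = min(timeit.repeat(lambda: optimized(articles), number=3, repeat=3))

print(f"original:  {t_original:.4f}s for 3 calls")
print(f"optimized: {t_optimized:.4f}s for 3 calls")
```

The gap should widen as the tag lists grow, since the list version's cost per article scales with the product of the two list lengths while the set version's scales with their sum.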
✅ Correctness verification report:
| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | ✅ 2 Passed |
| 🌀 Generated Regression Tests | ✅ 29 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | ✅ 2 Passed |
| 📊 Tests Coverage | 100.0% |
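A comparable equivalence check can be reproduced locally with randomized inputs. The sketch below is not how Codeflash verifies outputs; it is a hypothetical harness (`random_articles`, `check_equivalence`, and the tag vocabulary are all made up for illustration) that compares reconstructed versions of the two implementations, again assuming an empty-input guard.

```python
import random


def original(articles):
    # List-based version reconstructed from the diff (empty-input guard assumed).
    if not articles:
        return set()
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)


def optimized(articles):
    # Set-based version reconstructed from the suggested change.
    if not articles:
        return set()
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        common_tags.intersection_update(article.get("tags", []))
        if not common_tags:
            break
    return common_tags


def random_articles(rng, max_articles=6, max_tags=8):
    # Small hashable vocabulary so that overlaps and duplicates are common.
    vocabulary = ["python", "coding", "data", "tutorial", "guide", 123, None]
    articles = []
    for _ in range(rng.randint(0, max_articles)):
        tags = [rng.choice(vocabulary) for _ in range(rng.randint(0, max_tags))]
        # Occasionally omit the "tags" key to exercise the .get() default.
        articles.append({"tags": tags} if rng.random() < 0.9 else {})
    return articles


def check_equivalence(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        articles = random_articles(rng)
        # Report the failing input if the two versions ever disagree.
        assert original(articles) == optimized(articles), articles
    return trials


print(check_equivalence())  # 1000
```

The vocabulary is restricted to hashable values on purpose, because the set-based version cannot accept unhashable tags.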
⚙️ Existing Unit Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| test_common_tags.py::test_common_tags_1 | 5.00μs | 4.09μs | 22.3%✅ |
🌀 Generated Regression Tests and Runtime
# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests


def test_single_article():
    # Single article should return its tags
    articles = [{"tags": ["python", "coding", "tutorial"]}]
    codeflash_output = find_common_tags(articles)  # 1.86μs -> 1.40μs (32.8% faster)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_with_common_tags():
    # Multiple articles with common tags should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python", "data"]},
        {"tags": ["python", "machine learning"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.99μs -> 2.42μs (23.1% faster)
    # Outputs were verified to be equal to the original implementation


def test_empty_list_of_articles():
    # Empty list of articles should return an empty set
    articles = []
    codeflash_output = find_common_tags(articles)  # 661ns -> 460ns (43.7% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_no_common_tags():
    # Articles with no common tags should return an empty set
    articles = [
        {"tags": ["python"]},
        {"tags": ["java"]},
        {"tags": ["c++"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.42μs -> 1.90μs (27.3% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_empty_tag_lists():
    # Articles with some empty tag lists should return an empty set
    articles = [
        {"tags": []},
        {"tags": ["python"]},
        {"tags": ["python", "java"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.07μs -> 1.70μs (21.8% faster)
    # Outputs were verified to be equal to the original implementation


def test_all_articles_with_empty_tag_lists():
    # All articles with empty tag lists should return an empty set
    articles = [
        {"tags": []},
        {"tags": []},
        {"tags": []}
    ]
    codeflash_output = find_common_tags(articles)  # 2.09μs -> 1.58μs (32.3% faster)
    # Outputs were verified to be equal to the original implementation


def test_tags_with_special_characters():
    # Tags with special characters should be handled correctly
    articles = [
        {"tags": ["python!", "coding"]},
        {"tags": ["python!", "data"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.33μs -> 1.96μs (18.9% faster)
    # Outputs were verified to be equal to the original implementation


def test_case_sensitivity():
    # Tags with different cases should not be considered the same
    articles = [
        {"tags": ["Python", "coding"]},
        {"tags": ["python", "data"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.17μs -> 1.84μs (18.0% faster)
    # Outputs were verified to be equal to the original implementation


def test_large_number_of_articles():
    # Large number of articles with a common tag should return that tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(1000)]
    codeflash_output = find_common_tags(articles)  # 228μs -> 150μs (51.5% faster)
    # Outputs were verified to be equal to the original implementation


def test_large_number_of_tags():
    # Large number of tags with some common tags should return the common tags
    articles = [
        {"tags": [f"tag{i}" for i in range(1000)]},
        {"tags": [f"tag{i}" for i in range(500, 1500)]}
    ]
    expected = {f"tag{i}" for i in range(500, 1000)}
    codeflash_output = find_common_tags(articles)  # 4.40ms -> 81.2μs (5316% faster)
    # Outputs were verified to be equal to the original implementation


def test_mixed_length_of_tag_lists():
    # Articles with mixed length of tag lists should return the common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["python"]},
        {"tags": ["python", "coding", "tutorial"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.67μs -> 2.21μs (20.8% faster)
    # Outputs were verified to be equal to the original implementation


def test_tags_with_different_data_types():
    # Tags with different data types should only consider strings
    articles = [
        {"tags": ["python", 123]},
        {"tags": ["python", "123"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.11μs -> 1.92μs (9.93% faster)
    # Outputs were verified to be equal to the original implementation


def test_performance_with_large_data():
    # Performance with large data should return the common tag
    articles = [{"tags": ["common_tag", f"tag{i}"]} for i in range(10000)]
    codeflash_output = find_common_tags(articles)  # 2.26ms -> 1.51ms (50.2% faster)
    # Outputs were verified to be equal to the original implementation


def test_scalability_with_increasing_tags():
    # Scalability with increasing tags should return the common tag
    articles = [{"tags": ["common_tag"] + [f"tag{i}" for i in range(j)]} for j in range(1, 1001)]
    codeflash_output = find_common_tags(articles)  # 412μs -> 282μs (46.2% faster)
    # Outputs were verified to be equal to the original implementation
#------------------------------------------------
# imports
# function to test
from __future__ import annotations

import pytest  # used for our unit tests
from codeflash.result.common_tags import find_common_tags

# unit tests


def test_empty_input_list():
    # Test with an empty list
    codeflash_output = find_common_tags([])  # 561ns -> 481ns (16.6% faster)
    # Outputs were verified to be equal to the original implementation


def test_single_article():
    # Test with a single article with tags
    codeflash_output = find_common_tags([{"tags": ["python", "coding", "development"]}])  # 1.44μs -> 1.37μs (5.10% faster)
    # Test with a single article with no tags
    codeflash_output = find_common_tags([{"tags": []}])  # 571ns -> 541ns (5.55% faster)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_some_common_tags():
    # Test with multiple articles having some common tags
    articles = [
        {"tags": ["python", "coding", "development"]},
        {"tags": ["python", "development", "tutorial"]},
        {"tags": ["python", "development", "guide"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.88μs -> 2.56μs (12.5% faster)
    articles = [
        {"tags": ["tech", "news"]},
        {"tags": ["tech", "gadgets"]},
        {"tags": ["tech", "reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.53μs -> 1.21μs (26.5% faster)
    # Outputs were verified to be equal to the original implementation


def test_multiple_articles_no_common_tags():
    # Test with multiple articles having no common tags
    articles = [
        {"tags": ["python", "coding"]},
        {"tags": ["development", "tutorial"]},
        {"tags": ["guide", "learning"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.24μs -> 1.94μs (15.4% faster)
    articles = [
        {"tags": ["apple", "banana"]},
        {"tags": ["orange", "grape"]},
        {"tags": ["melon", "kiwi"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.24μs -> 922ns (34.7% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_duplicate_tags():
    # Test with articles having duplicate tags
    articles = [
        {"tags": ["python", "python", "coding"]},
        {"tags": ["python", "development", "python"]},
        {"tags": ["python", "guide", "python"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.71μs -> 2.38μs (13.9% faster)
    articles = [
        {"tags": ["tech", "tech", "news"]},
        {"tags": ["tech", "tech", "gadgets"]},
        {"tags": ["tech", "tech", "reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.64μs -> 1.26μs (30.2% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_mixed_case_tags():
    # Test with articles having mixed case tags
    articles = [
        {"tags": ["Python", "Coding"]},
        {"tags": ["python", "Development"]},
        {"tags": ["PYTHON", "Guide"]}
    ]
    codeflash_output = find_common_tags(articles)  # 2.36μs -> 1.91μs (23.6% faster)
    articles = [
        {"tags": ["Tech", "News"]},
        {"tags": ["tech", "Gadgets"]},
        {"tags": ["TECH", "Reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.15μs -> 1.03μs (11.6% faster)
    # Outputs were verified to be equal to the original implementation


def test_articles_with_non_string_tags():
    # Test with articles having non-string tags
    articles = [
        {"tags": ["python", 123, "coding"]},
        {"tags": ["python", "development", 123]},
        {"tags": ["python", "guide", 123]}
    ]
    codeflash_output = find_common_tags(articles)  # 3.12μs -> 2.50μs (24.8% faster)
    articles = [
        {"tags": [None, "news"]},
        {"tags": ["tech", None]},
        {"tags": [None, "reviews"]}
    ]
    codeflash_output = find_common_tags(articles)  # 1.63μs -> 1.32μs (23.4% faster)
    # Outputs were verified to be equal to the original implementation


def test_large_scale_test_cases():
    # Test with large scale input where all tags should be common
    articles = [
        {"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(100)
    ]
    expected_output = {"tag" + str(i) for i in range(1000)}
    codeflash_output = find_common_tags(articles)  # 380ms -> 3.43ms (10995% faster)
    # Test with large scale input where no tags should be common
    articles = [
        {"tags": ["tag" + str(i) for i in range(1000)]} for _ in range(50)
    ] + [{"tags": ["unique_tag"]}]
    codeflash_output = find_common_tags(articles)  # 188ms -> 1.68ms (11131% faster)
    # Outputs were verified to be equal to the original implementation
#------------------------------------------------
from codeflash.result.common_tags import find_common_tags


def test_find_common_tags():
    find_common_tags([{}, {}])


def test_find_common_tags_2():
    find_common_tags([])

🔎 Concolic Coverage Tests and Runtime
| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| codeflash_concolic_4ui61_cv/tmpr9nwzabd/test_concolic_coverage.py::test_find_common_tags | 2.18μs | 2.20μs | -0.907% |
| codeflash_concolic_4ui61_cv/tmpr9nwzabd/test_concolic_coverage.py::test_find_common_tags_2 | 632ns | 501ns | 26.1%✅ |
To test or edit this optimization locally, run `git merge codeflash/optimize-pr820-2025-10-15T18.51.29`.
Suggested change, original lines:
    common_tags = articles[0].get("tags", [])
    for article in articles[1:]:
        common_tags = [tag for tag in common_tags if tag in article.get("tags", [])]
    return set(common_tags)
Suggested change, replacement lines:
    common_tags = set(articles[0].get("tags", []))
    for article in articles[1:]:
        common_tags.intersection_update(article.get("tags", []))
        if not common_tags:
            break
    return common_tags
PR Type
Enhancement, Tests
Description
Add common tags computation utility
Introduce unit tests for intersection
File Walkthrough
- common_tags.py (codeflash/result/common_tags.py): Implement common tags intersection utility; `find_common_tags` to compute shared tags
- test_common_tags.py (tests/test_common_tags.py): Add unit tests for common tags utility (`find_common_tags`)