-
Notifications
You must be signed in to change notification settings - Fork 2
feat(strings): similar string groups #104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughAdds a Union-Find implementation and re-exports, introduces a new "Similar String Groups" module with two DSU-based implementations, tests and README, and updates package navigation in DIRECTORY.md. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller as num_similar_groups()
participant DSU as DisjointSetUnion
participant Check as are_similar()
Caller->>DSU: __init__(n)
Note right of DSU: parent[], rank[], count initialized
loop for each pair (i, j)
Caller->>Check: are_similar(strs[i], strs[j])
alt similar
Check-->>Caller: True
Caller->>DSU: union(i, j)
Note right of DSU: merge roots, update count
else not similar
Check-->>Caller: False
end
end
Caller->>DSU: get_count()
DSU-->>Caller: group_count
Caller-->>Caller: return group_count
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes
Possibly related PRs
Suggested labels
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
pystrings/similar_string_groups/__init__.py (1)
67-77: Implementation works but uses different counting approach.This implementation correctly groups similar strings. However, there's a subtle difference in how it counts groups compared to
num_similar_groups:
- This version counts unique roots via set comprehension (line 76)
num_similar_groupsusesDisjointSetUnion.get_count()Both approaches are correct, but the set-based counting approach is less efficient as it performs O(n) find operations after the union operations complete.
Consider documenting why two implementations exist or consolidating them:
def num_similar_groups_2(strs: List[str]) -> int: """ Alternative implementation that demonstrates counting groups via unique roots instead of maintaining a count during unions. """ n = len(strs) uf = DisjointSetUnion(n) for i in range(n): for j in range(i + 1, n): if are_similar(strs[i], strs[j]): uf.union(i, j) # Alternative: return uf.get_count() for better performance roots = {uf.find(i) for i in range(n)} return len(roots)datastructures/sets/union_find/__init__.py (1)
25-26: Remove debug comments.Lines 25-26 contain inline comments (
# 0,# 1) that appear to be debug artifacts from development. These should be removed for cleaner code.- root_i = self.find(i) # 0 - root_j = self.find(j) # 1 + root_i = self.find(i) + root_j = self.find(j)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (10)
pystrings/similar_string_groups/images/similar_string_groups_example_1.pngis excluded by!**/*.pngpystrings/similar_string_groups/images/similar_string_groups_example_2.pngis excluded by!**/*.pngpystrings/similar_string_groups/images/similar_string_groups_example_3.pngis excluded by!**/*.pngpystrings/similar_string_groups/images/similar_string_groups_solution_1.pngis excluded by!**/*.pngpystrings/similar_string_groups/images/similar_string_groups_solution_2.pngis excluded by!**/*.pngpystrings/similar_string_groups/images/similar_string_groups_solution_3.pngis excluded by!**/*.pngpystrings/similar_string_groups/images/similar_string_groups_solution_4.pngis excluded by!**/*.pngpystrings/similar_string_groups/images/similar_string_groups_solution_5.pngis excluded by!**/*.pngpystrings/similar_string_groups/images/similar_string_groups_solution_6.pngis excluded by!**/*.pngpystrings/similar_string_groups/images/similar_string_groups_solution_7.pngis excluded by!**/*.png
📒 Files selected for processing (7)
DIRECTORY.md(1 hunks)datastructures/__init__.py(1 hunks)datastructures/sets/__init__.py(1 hunks)datastructures/sets/union_find/__init__.py(1 hunks)pystrings/similar_string_groups/README.md(1 hunks)pystrings/similar_string_groups/__init__.py(1 hunks)pystrings/similar_string_groups/test_similar_string_groups.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
pystrings/similar_string_groups/test_similar_string_groups.py (1)
pystrings/similar_string_groups/__init__.py (2)
num_similar_groups(5-52)num_similar_groups_2(67-77)
datastructures/sets/__init__.py (1)
datastructures/sets/union_find/__init__.py (2)
DisjointSetUnion(1-45)UnionFind(48-62)
pystrings/similar_string_groups/__init__.py (1)
datastructures/sets/union_find/__init__.py (7)
DisjointSetUnion(1-45)UnionFind(48-62)union(20-41)union(58-62)get_count(43-45)find(12-18)find(52-56)
datastructures/__init__.py (1)
datastructures/sets/union_find/__init__.py (2)
DisjointSetUnion(1-45)UnionFind(48-62)
🪛 LanguageTool
pystrings/similar_string_groups/README.md
[style] ~3-~3: ‘exactly the same’ might be wordy. Consider a shorter alternative.
Context: ...e considered similar if they are either exactly the same or can be made identical by swapping at...
(EN_WORDINESS_PREMIUM_EXACTLY_THE_SAME)
[style] ~38-~38: ‘exactly the same’ might be wordy. Consider a shorter alternative.
Context: ...o strings are similar — that is, either exactly the same or differ at exactly two positions (me...
(EN_WORDINESS_PREMIUM_EXACTLY_THE_SAME)
🪛 markdownlint-cli2 (0.18.1)
DIRECTORY.md
732-732: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
733-733: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
pystrings/similar_string_groups/README.md
54-54: Inconsistent indentation for list items at the same level
Expected: 3; Actual: 4
(MD005, list-indent)
56-56: Reference links and images should use a label that is defined
Missing link or image reference definition: "::-1"
(MD052, reference-links-images)
🔇 Additional comments (6)
DIRECTORY.md (1)
732-733: LGTM! Documentation entry properly placed.The Similar String Groups entry is correctly positioned under Pystrings and follows the existing indentation pattern used throughout the file.
datastructures/__init__.py (1)
1-6: LGTM! Clean package organization.The re-export pattern properly exposes the DSU classes at the package level, making them conveniently accessible via
from datastructures import DisjointSetUnion, UnionFind.pystrings/similar_string_groups/test_similar_string_groups.py (1)
1-60: LGTM! Comprehensive test coverage.The test suite effectively validates both implementations with identical test cases covering:
- Multiple distinct groups (test_1: 3 groups)
- All strings in a single group (test_2: 1 group)
- Partial grouping (test_3, test_4: 2 groups)
- Longer strings for performance validation
The parallel test structure ensures both
num_similar_groupsandnum_similar_groups_2maintain consistent behavior.datastructures/sets/__init__.py (1)
1-3: LGTM! Proper module organization.Clean re-export pattern that maintains the package hierarchy while making the Union-Find classes accessible at the
setsmodule level.pystrings/similar_string_groups/README.md (1)
1-113: Excellent documentation!The README provides comprehensive coverage of the problem with:
- Clear problem statement and constraints
- Visual examples for understanding
- Detailed solution walkthrough
- Accurate time complexity analysis: O(n²·m) dominated by string comparisons
- Space complexity analysis: O(n) for Union-Find and tracking structures
The explanation of the Union-Find approach is well-articulated and helps readers understand both the algorithm and data structure choice.
pystrings/similar_string_groups/__init__.py (1)
55-65: LGTM! Correct similarity check.The
are_similarfunction correctly implements the similarity logic:
- Returns True for identical strings (0 differences)
- Returns True for strings differing at exactly 2 positions where the character pairs are symmetric
- Early exits when more than 2 differences detected
The symmetry check
diff[0] == diff[1][::-1]properly validates that swapping would make the strings equal.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (1)
datastructures/sets/union_find/__init__.py (1)
48-74: Consider consolidating or documenting the purpose of having two Union-Find implementations.Both
UnionFindandDisjointSetUnionserve the same purpose but with different optimizations:
DisjointSetUnion: Uses union by rank + path compression, tracks set count → O(α(n)) amortized per operationUnionFind: Uses only path compression → can degrade to O(log n) per operationThis creates potential confusion about which implementation to use. Consider one of the following:
- Consolidate: Remove
UnionFindand use onlyDisjointSetUnion(recommended for best performance)- Document differences: Add clear docstring guidance on when to use each (e.g., "Use
UnionFindfor simpler cases where minimal memory overhead is preferred; useDisjointSetUnionfor optimal performance")- Enhance UnionFind: Add union by rank and
get_count()to matchDisjointSetUnion's API and performanceBased on learnings from the past review comment, option 1 (consolidation) was suggested. If both are needed, consider adding this to the
UnionFinddocstring:class UnionFind: """ A minimal Union-Find data structure with path compression. Note: This is a simpler alternative to DisjointSetUnion without union by rank optimization. For performance-critical applications, prefer DisjointSetUnion. """
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
pystrings/similar_string_groups/__init__.py (1)
17-41: Code duplication between is_similar and are_similar.The nested
is_similarfunction duplicates the logic of the module-levelare_similarfunction (lines 56-66). This was flagged in previous reviews and remains unaddressed.Consider removing the nested
is_similarand usingare_similardirectly:- # All strings have the same length, per constraints - word_len = len(strs[0]) - # Initialize Union-Find with n elements, one for each string. # The initial count is n (each string is its own group). uf = DisjointSetUnion(strs_len) - def is_similar(s1: str, s2: str) -> bool: - """ - Checks if two strings are similar. - Similar means identical (0 diffs) or 1 swap (2 diffs). - """ - diff_count = 0 - positions_that_differ = [] - for k in range(word_len): - if s1[k] != s2[k]: - positions_that_differ.append(k) - diff_count += 1 - - # Optimization: If more than 2 differences, - # they can't be similar. - if diff_count > 2: - return False - - if diff_count == 2: - i = positions_that_differ[0] - j = positions_that_differ[1] - return s1[i] == s2[j] and s1[j] == s2[i] - - # At this point, diff_count is either 0 or 1 - # Only 0 differences (identical strings) are similar - return diff_count == 0 - # Iterate over all unique pairs of strings for i in range(strs_len): for j in range(i + 1, strs_len): # If the strings are similar, merge their groups. # The union() method handles decrementing the count # only if they were in different groups. - if is_similar(strs[i], strs[j]): + if are_similar(strs[i], strs[j]): uf.union(i, j)
🧹 Nitpick comments (3)
pystrings/similar_string_groups/__init__.py (3)
2-2: Remove unused import.
UnionFindis imported but never used in this file.Apply this diff to remove the unused import:
-from datastructures import DisjointSetUnion, UnionFind +from datastructures import DisjointSetUnion
56-56: Add type hints for consistency.The
are_similarfunction is missing type hints for parameters and return type, which is inconsistent with the rest of the codebase.Apply this diff to add type hints:
-def are_similar(s1, s2): +def are_similar(s1: str, s2: str) -> bool:
77-78: Consider using get_count() for consistency.
num_similar_groups_2manually counts unique roots by creating a set, whilenum_similar_groupsuses the more efficientuf.get_count()method. Both approaches are correct, but usingget_count()is more efficient (O(1) vs O(n)) and consistent with the DSU API.Apply this diff to use
get_count():- roots = {uf.find(i) for i in range(n)} - return len(roots) + return uf.get_count()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
pystrings/similar_string_groups/__init__.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
pystrings/similar_string_groups/__init__.py (1)
datastructures/sets/union_find/__init__.py (7)
DisjointSetUnion(1-45)UnionFind(48-74)union(20-41)union(64-74)get_count(43-45)find(12-18)find(57-62)
🔇 Additional comments (1)
pystrings/similar_string_groups/__init__.py (1)
39-41: Bug fix verified.The logic issue from the previous review has been correctly addressed. The function now properly rejects strings with exactly 1 difference, returning
Trueonly for identical strings whendiff_count != 2.
Describe your change:
Algorithm to find similar string groups
Checklist:
Fixes: #{$ISSUE_NO}.Summary by CodeRabbit
New Features
Documentation
Tests