# Similar String Groups

Two strings x and y are considered similar if they are identical, or if swapping two characters at different positions
in x makes it equal to y.

A similarity group is a set of strings in which every string is similar to at least one other string in the group (a
string that is similar to no other string forms a group on its own). A string does not need to be directly similar to
every other member of its group; it only needs to be connected to them through a chain of similar strings.

Given a list of strings strs, where every string is an anagram of every other string, determine how many similarity
groups the list contains.

Constraints:

- 1 ≤ strs.length ≤ 300
- 1 ≤ strs[i].length ≤ 300
- strs[i] consists of lowercase letters only.
- All words in strs have the same length and are anagrams of each other.

---

## Examples

---

## Solution

This problem can be modeled as a graph connectivity problem. Each string is a node, and an edge connects two nodes if
their strings are similar. The answer is the number of connected components in this graph.
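To make the graph view concrete, here is a minimal Python sketch that builds the similarity graph explicitly and counts
its connected components with an iterative depth-first search. This swaps in DFS in place of the Union-Find approach
described next, and the names `are_similar` and `count_components_dfs` are hypothetical, chosen for illustration only.

```python
from typing import List


def are_similar(s1: str, s2: str) -> bool:
    # Collect mismatched character pairs; stop as soon as there are more than two.
    diff = []
    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            diff.append((c1, c2))
            if len(diff) > 2:
                return False
    # Similar if identical, or if exactly two mismatches mirror each other.
    return len(diff) == 0 or (len(diff) == 2 and diff[0] == diff[1][::-1])


def count_components_dfs(strs: List[str]) -> int:
    # Hypothetical DFS alternative to Union-Find, shown to illustrate the graph view.
    n = len(strs)
    # Build an explicit adjacency list: one edge per similar pair.
    graph = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if are_similar(strs[i], strs[j]):
                graph[i].append(j)
                graph[j].append(i)

    visited = [False] * n
    components = 0
    for start in range(n):
        if visited[start]:
            continue
        # Each unvisited start node begins a new connected component.
        components += 1
        visited[start] = True
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in graph[node]:
                if not visited[neighbor]:
                    visited[neighbor] = True
                    stack.append(neighbor)
    return components


print(count_components_dfs(["tars", "rats", "arts", "star"]))  # 2
```

Storing the adjacency list explicitly can take O(n²) extra space when many pairs are similar, which is one reason to
prefer Union-Find, which only needs a parent array.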

We solve this problem with the Union-Find (Disjoint Set Union) data structure, which groups similar strings efficiently.
Initially, each string is placed in its own group. We then iterate over all pairs of strings. For each pair at indexes i
and j, we check whether the two strings are similar: either identical, or differing at exactly two positions. Because
all strings are anagrams of each other, two strings that differ at exactly two positions always become equal after
swapping those two characters, so this check is sufficient. If the strings are similar and currently belong to
different groups (i.e., their roots in the Union-Find structure differ), we perform a union operation to merge their
groups. Each successful union reduces the number of distinct groups by one. Finally, we count the number of unique
roots in the Union-Find structure, which equals the number of similar string groups.
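A minimal sketch of the Union-Find structure in Python, assuming a class named `UnionFind` (a hypothetical name). It
uses path compression in `find` and, as an added assumption not stated in the text, union by rank in `union`; `union`
returns whether a merge actually happened.

```python
class UnionFind:
    """Disjoint Set Union with path compression and union by rank (rank is an assumption here)."""

    def __init__(self, n: int) -> None:
        # Every element starts as the root of its own group.
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x: int) -> int:
        # First pass: walk up to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the path at the root.
        while x != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a: int, b: int) -> bool:
        # Returns True if two different groups were merged, False otherwise.
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[root_a] < self.rank[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a
        if self.rank[root_a] == self.rank[root_b]:
            self.rank[root_a] += 1
        return True


uf = UnionFind(4)
uf.union(0, 1)
uf.union(1, 2)
print(uf.find(2) == uf.find(0))  # True
print(uf.find(3) == uf.find(0))  # False
```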

Here’s the step-by-step explanation of the solution:

1. Initialize n = len(strs).
2. Create a Union-Find (DSU) structure with n elements, where each element is initially its own parent.
3. Define a function areSimilar(s1, s2) that returns TRUE if s1 and s2 are similar according to the given
   condition:
   - Initialize an empty list diff = [] to record mismatched character pairs.
   - Loop through both strings in parallel using zip.
   - Whenever the characters at a position differ, append the pair of mismatched characters to diff.
   - If diff holds more than two pairs, exit early and return FALSE.
   - After the loop completes, evaluate the result:
      - len(diff) == 0 means the strings are identical.
      - len(diff) == 2 and diff[0] == diff[1][::-1] means there are exactly two mismatches and the character pairs are
        mirror images of each other, so swapping those two characters makes the strings equal.
4. Loop over all pairs (i, j) such that 0 ≤ i < j < n.
5. For each pair, use areSimilar to check whether strs[i] and strs[j] are similar.
6. If they are similar, use find(i) and find(j) to get their roots.
7. If the roots differ, merge the two groups using union(i, j).
8. After processing all pairs, call find(i) for every index i from 0 to n - 1 to get its root.
9. Add each root to a set to track unique groups.
10. Return the size of the set as the number of similarity groups.
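Putting steps 1 to 10 together, a minimal Python sketch might look like the following. The function name
`num_similar_groups` and the snake_case helper `are_similar` (standing in for areSimilar) are assumptions for
illustration; `find` uses path halving, one common form of path compression, and `union` simply links one root under
the other.

```python
from typing import List


def num_similar_groups(strs: List[str]) -> int:
    n = len(strs)                  # Step 1
    parent = list(range(n))        # Step 2: each element is its own parent

    def find(x: int) -> int:
        # Path halving: each visited node skips to its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        # Link the root of b's group under the root of a's group.
        parent[find(b)] = find(a)

    def are_similar(s1: str, s2: str) -> bool:  # Step 3
        diff = []
        for c1, c2 in zip(s1, s2):
            if c1 != c2:
                diff.append((c1, c2))
                if len(diff) > 2:
                    return False  # early exit: more than two mismatches
        return len(diff) == 0 or (len(diff) == 2 and diff[0] == diff[1][::-1])

    for i in range(n):                          # Step 4
        for j in range(i + 1, n):
            if are_similar(strs[i], strs[j]):   # Step 5
                if find(i) != find(j):          # Step 6
                    union(i, j)                 # Step 7

    roots = {find(i) for i in range(n)}         # Steps 8 and 9
    return len(roots)                           # Step 10


print(num_similar_groups(["tars", "rats", "arts", "star"]))  # 2
print(num_similar_groups(["omv", "ovm"]))                    # 1
```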

Let’s look at the following illustration to get a better understanding of the solution:

### Time Complexity

Let's break the time complexity down into two major components:

#### **Comparing all pairs of strings**

To check whether two strings are similar, we compare them character by character, which takes O(m) time, where m is
the length of each string. With n strings there are n(n − 1)/2 = O(n²) pairs to compare. Therefore, the total time
spent on comparisons is O(n²·m).
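As a quick sanity check on the O(n²·m) bound, the following sketch computes the worst-case number of character
comparisons at the maximum constraints (n = m = 300). It is only an upper bound, because the early exit in areSimilar
usually stops a comparison before the end of the strings. The function name is hypothetical.

```python
def worst_case_char_comparisons(n: int, m: int) -> int:
    # Every unordered pair (i, j) with i < j is compared exactly once...
    pairs = n * (n - 1) // 2
    # ...and each comparison scans at most m characters.
    return pairs * m


print(worst_case_char_comparisons(300, 300))  # 13455000
```

About 13.5 million character comparisons is comfortably fast, which is why the quadratic pairwise approach is
acceptable under these constraints.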

#### **Union-Find operations (find and union)**

For each similar pair, we perform two find operations and possibly a union. With path compression combined with union
by rank (or size), each operation runs in amortized O(α(n)) time, where α(n) is the inverse Ackermann function and is
effectively constant in practice. Since there can be up to O(n²) similar pairs, the total time for Union-Find
operations is O(n²·α(n)).
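To illustrate what path compression does, this sketch runs a recursive `find` (a hypothetical standalone function, not
the document's own code) on a worst-case chain of parents. After a single call, every node on the path points directly
at the root, so later finds on those nodes take constant time. The recursion depth equals the path length, which is
well within Python's default limit for n ≤ 300.

```python
from typing import List


def find(parent: List[int], x: int) -> int:
    # Full path compression: on the way back from the recursion,
    # every visited node is re-pointed directly at the root.
    if parent[x] != x:
        parent[x] = find(parent, parent[x])
    return parent[x]


# A worst-case chain 4 -> 3 -> 2 -> 1 -> 0, where node 0 is the root.
parent = [0, 0, 1, 2, 3]
print(find(parent, 4))  # 0
print(parent)           # [0, 0, 0, 0, 0]
```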

The comparison step dominates: each comparison costs O(m), while each Union-Find operation costs only O(α(n)), which
grows extremely slowly. Therefore, the overall time complexity is O(n²·m).

### Space Complexity

The space complexity of the algorithm comes from the following components:

#### **Union-Find parent array**

Requires O(n) space to store the parent of each node (one per string).

#### **Temporary storage in the areSimilar() function**

Uses O(1) space: the diff list holds only a constant number of mismatched character pairs, because the function exits
early as soon as more than two differences are found.

#### **Set to store unique groups (roots)**

Requires O(n) space in the worst case, when all strings are in separate groups and each has a unique root.

The total space complexity is O(n): the parent array and the set of roots each use O(n) space, and the temporary
storage in areSimilar() uses only O(1).
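An alternative to collecting roots in a set is to keep a running group count: start at n and decrement it on every
union that merges two different roots. The sketch below (the function name `num_similar_groups_counter` is
hypothetical) avoids the extra set, although the parent array still keeps the overall space at O(n).

```python
from typing import List


def num_similar_groups_counter(strs: List[str]) -> int:
    n = len(strs)
    parent = list(range(n))
    groups = n  # every string starts in its own group

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def are_similar(s1: str, s2: str) -> bool:
        diff = []
        for c1, c2 in zip(s1, s2):
            if c1 != c2:
                diff.append((c1, c2))
                if len(diff) > 2:
                    return False
        return len(diff) == 0 or (len(diff) == 2 and diff[0] == diff[1][::-1])

    for i in range(n):
        for j in range(i + 1, n):
            if are_similar(strs[i], strs[j]):
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i
                    groups -= 1  # each successful union merges two groups into one
    return groups


print(num_similar_groups_counter(["tars", "rats", "arts", "star"]))  # 2
```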