| 1 | +# Union Find / Disjoint Set |
| 2 | + |
| 3 | +A disjoint-set structure, also known as a union-find or merge-find set, is a data structure that |
| 4 | +keeps track of a partition of a set into disjoint (non-overlapping) subsets. In CS2040s, this |
| 5 | +is primarily used to check for dynamic connectivity. For instance, Kruskal's algorithm for finding |
| 6 | +the minimum spanning tree of a graph uses a disjoint set to efficiently query whether two nodes |
| 7 | +are already connected by a path. <br> |
| 8 | +It supports 2 main operations: |
| 9 | +1. Union: Join two subsets into a single subset |
| 10 | +2. Find: Determine which subset a particular element is in. In practice, this is often done to check |
| 11 | +if two elements are in the same subset or component. |
| 12 | + |
| 13 | +The Disjoint Set structure is often introduced in 3 parts, with each iteration improving on the |
| 14 | +time complexity of the previous one while using the same O(n) space. Below is a brief overview: |
| 15 | + |
| 16 | +## Quick Find |
| 17 | +Every object will be assigned a component identity. The implementation of Quick Find often involves |
| 18 | +an underlying array that tracks the component identity of each object. |
| 19 | + |
| 20 | +**Union**: Between the two components, pick one identity d to represent the combined set. Let the other |
| 21 | +component's identity be d'. Iterate over the component identifier array and, for every element with |
| 22 | +identity d', reassign it to d. |
| 23 | + |
| 24 | +**Find**: Simply use the component identifier array to query for the component identity of the two elements |
| 25 | +and check if they are equal. This is why this implementation is known as "Quick Find". |
| 26 | + |
| 27 | +#### Analysis |
| 28 | +Let n be the number of elements in consideration. |
| 29 | + |
| 30 | +**Time**: O(n) for Union and O(1) for Find operations |
| 31 | + |
| 32 | +**Space**: O(n) auxiliary space for the component identifier |
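
The Quick Find description above can be sketched roughly as follows. This is a minimal illustration rather than a reference implementation: the class name `QuickFind`, the `ids` array, and the `connected` helper are assumptions made for this example, and elements are assumed to be the integers 0 to n-1.

```python
class QuickFind:
    """Quick Find sketch: ids[i] holds the component identity of element i."""

    def __init__(self, n: int) -> None:
        # Every element starts in its own component, identified by its own index.
        self.ids = list(range(n))

    def find(self, p: int) -> int:
        # O(1): a single array lookup.
        return self.ids[p]

    def connected(self, p: int, q: int) -> bool:
        # Two elements are in the same component iff their identities match.
        return self.ids[p] == self.ids[q]

    def union(self, p: int, q: int) -> None:
        d, d_prime = self.ids[p], self.ids[q]
        if d == d_prime:
            return  # already in the same component
        # O(n): relabel every element carrying identity d' with identity d.
        for i in range(len(self.ids)):
            if self.ids[i] == d_prime:
                self.ids[i] = d


qf = QuickFind(5)
qf.union(0, 1)
qf.union(3, 4)
same_component = qf.connected(0, 1)       # True
different_component = qf.connected(1, 3)  # False
```

The full scan inside `union` is what makes Union O(n), while `find` stays a constant-time lookup.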
| 33 | + |
| 34 | + |
| 35 | +## Quick Union |
| 36 | +Here, we consider a completely different approach based on trees. Every element can be |
| 37 | +thought of as a tree node and starts off in its own component. Under this representation, at any |
| 38 | +given point we will typically have a forest of trees, and that's perfectly fine. The root node of each tree |
| 39 | +serves as the representative of its component, i.e. the set of all elements in that tree. <br> |
| 40 | +Note that the trees here are not necessarily binary trees. In fact, more often than not, we will have nodes |
| 41 | +with multiple child nodes. |
| 42 | + |
| 43 | +**Union**: Between the two components, decide on the component to represent the combined set as before. |
| 44 | +Union then simply makes the root node of one tree a child of the root node of the other, hence the name. |
| 45 | +Note that identifying the component of an object still involves traversing up to the root node of its |
| 46 | +tree. |
| 47 | + |
| 48 | +**Find**: For each of the two nodes, traverse up the tree from that node until the root is reached. Then check if the |
| 49 | +two roots are the same. |
| 50 | + |
| 51 | +#### Analysis |
| 52 | +**Time**: O(n) for Union and Find operations. While union-ing is indeed quick, it is possibly undermined |
| 53 | +by O(n) traversal in the case of a degenerate tree. Note that at this stage, there is nothing to ensure the trees |
| 54 | +are balanced. |
| 55 | + |
| 56 | +**Space**: O(n), implementation still involves wrapping the n elements with some structure / wrapper. |
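
A possible sketch of Quick Union is shown below. It assumes the trees are stored in a parent-pointer array (one common choice, not necessarily the one used in this repository), with a root being its own parent; the names `QuickUnion`, `parent`, and `root` are illustrative.

```python
class QuickUnion:
    """Quick Union sketch: parent[i] is the parent of i; a root is its own parent."""

    def __init__(self, n: int) -> None:
        # Every element starts as the root of its own single-node tree.
        self.parent = list(range(n))

    def root(self, p: int) -> int:
        # Walk up parent pointers until reaching a root. O(height of the tree).
        while self.parent[p] != p:
            p = self.parent[p]
        return p

    def connected(self, p: int, q: int) -> bool:
        return self.root(p) == self.root(q)

    def union(self, p: int, q: int) -> None:
        root_p, root_q = self.root(p), self.root(q)
        if root_p != root_q:
            # Linking the roots is O(1); finding them is the expensive part.
            self.parent[root_q] = root_p


# An unlucky sequence of unions produces a degenerate, chain-like tree.
qu = QuickUnion(4)
for i in range(3):
    qu.union(i + 1, i)  # the old root always ends up below the new element
# qu.parent is now [1, 2, 3, 3]: element 0 is three hops away from its root.
```

The last few lines show why both operations are O(n) in the worst case: nothing stops the tree from degenerating into a chain.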
| 57 | + |
| 58 | + |
| 59 | +## Weighted Union |
| 60 | +Now, we augment and improve upon the Quick Union structure by ensuring the trees constructed are 'balanced'. Balanced |
| 61 | +trees have the nice property that their height is upper-bounded by O(log(n)). This considerably speeds |
| 62 | +up both Union and Find operations, since each involves traversing to a root. <br> |
| 63 | +We additionally track the size of each tree and ensure that whenever two trees are united, the root of the smaller |
| 64 | +tree becomes a child of the root of the larger tree. A node's depth grows only when its tree is merged into one at least as large, so the size of its tree at least doubles each time; this can happen at most log2(n) times, which bounds the height by O(log(n)). |
| 65 | + |
| 66 | +#### Analysis |
| 67 | +**Time**: O(log(n)) for Union and Find operations. |
| 68 | + |
| 69 | +**Space**: Remains at O(n) |
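
As a rough sketch of union-by-size, the example below extends the parent-pointer representation with a `size` array. The names `WeightedUnion`, `parent`, `size`, and `root` are assumptions made for illustration.

```python
class WeightedUnion:
    """Weighted Union sketch: always attach the smaller tree under the larger one."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # a root is its own parent
        self.size = [1] * n           # size[r] is only meaningful when r is a root

    def root(self, p: int) -> int:
        # O(log n), because union-by-size keeps the trees shallow.
        while self.parent[p] != p:
            p = self.parent[p]
        return p

    def connected(self, p: int, q: int) -> bool:
        return self.root(p) == self.root(q)

    def union(self, p: int, q: int) -> None:
        root_p, root_q = self.root(p), self.root(q)
        if root_p == root_q:
            return
        # Make root_p refer to the larger tree, then hang the smaller one under it.
        if self.size[root_p] < self.size[root_q]:
            root_p, root_q = root_q, root_p
        self.parent[root_q] = root_p
        self.size[root_p] += self.size[root_q]


# A sequence of unions that would build a chain without weighting stays flat here.
wu = WeightedUnion(8)
for i in range(7):
    wu.union(i + 1, i)
```

Because the smaller tree always goes under the larger one, every element in this example ends up at most one hop from the root.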
| 70 | + |
| 71 | + |
| 72 | +### Path Compression |
| 73 | +We can further improve on the time complexity of Weighted Union by introducing path compression. Specifically, during |
| 74 | +the traversal of a node up to the root, we re-assign each node's parent to be the root (or, as shown in CS2040s, |
| 75 | +assigning it to its grandparent actually suffices and yields the same big-O upper bound! This allows path compression to be |
| 76 | +done in a single pass). By doing so, we greatly reduce the height of the trees formed. |
| 77 | + |
| 78 | +#### Analysis |
| 79 | +The analysis is a bit trickier here and involves the inverse Ackermann function. Interested readers can find out more |
| 80 | +[here](https://dl.acm.org/doi/pdf/10.1145/321879.321884). |
| 81 | + |
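| 82 | +**Time**: Amortized O(α(n)) per Union and Find operation, where α is the inverse Ackermann function. α(n) grows so slowly that this is effectively constant for any practical n. |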
| 83 | + |
| 84 | +**Space**: O(n) |
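
The grandparent variant of path compression described above (often called path halving) might look like the sketch below, combined with union-by-size. The class name `UnionFind` and its fields are assumptions for illustration only.

```python
class UnionFind:
    """Weighted union with single-pass path compression (path halving)."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # a root is its own parent
        self.size = [1] * n           # size[r] is only meaningful when r is a root

    def root(self, p: int) -> int:
        while self.parent[p] != p:
            # Point p at its grandparent, then jump there. A root's parent is
            # itself, so this is safe even when p's parent is already the root.
            self.parent[p] = self.parent[self.parent[p]]
            p = self.parent[p]
        return p

    def connected(self, p: int, q: int) -> bool:
        return self.root(p) == self.root(q)

    def union(self, p: int, q: int) -> None:
        root_p, root_q = self.root(p), self.root(q)
        if root_p == root_q:
            return
        # Attach the smaller tree under the larger one, as in Weighted Union.
        if self.size[root_p] < self.size[root_q]:
            root_p, root_q = root_q, root_p
        self.parent[root_q] = root_p
        self.size[root_p] += self.size[root_q]


uf = UnionFind(6)
uf.union(0, 1)
uf.union(2, 3)
uf.union(1, 3)
linked = uf.connected(0, 2)    # True
unlinked = uf.connected(0, 5)  # False
```

Each call to `root` roughly halves the length of the path it walks, so repeated queries on the same elements get progressively cheaper.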