# Union Find / Disjoint Set

A disjoint-set structure, also known as a union-find or merge-find set, is a data structure that
keeps track of a partition of a set into disjoint (non-overlapping) subsets. In CS2040s, this
is primarily used to check for dynamic connectivity. For instance, Kruskal's algorithm for finding
a minimum spanning tree of a graph uses a disjoint set to efficiently query whether two nodes
are already connected. <br>
It supports 2 main operations:
1. Union: Join two subsets into a single subset
2. Find: Determine which subset a particular element is in. In practice, this is often done to check
if two elements are in the same subset or component.
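
As a rough sketch (not from the original notes), the two operations can be captured by a small interface. The names `DisjointSet`, `union`, `find` and `connected` below are illustrative assumptions, not a prescribed API:

```java
// A minimal, illustrative interface for a disjoint set over elements labelled 0 to n-1.
public interface DisjointSet {
    // Merge the subsets containing p and q into a single subset.
    void union(int p, int q);

    // Return the identifier of the subset containing p.
    int find(int p);

    // Convenience check: are p and q in the same subset / component?
    default boolean connected(int p, int q) {
        return find(p) == find(q);
    }
}
```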

The Disjoint Set structure is often introduced in 3 parts, with each iteration improving on the
previous, primarily in time complexity. Below is a brief overview:

## Quick Find
Every object is assigned a component identity. The implementation of Quick Find often involves
an underlying array that tracks the component identity of each object.

**Union**: Between the two components, decide on a component identity d to represent the combined set, and let the other
component's identity be d'. Simply iterate over the component identifier array, and for any element with
identity d', reassign it to d.

**Find**: Simply use the component identifier array to query for the component identities of the two elements
and check if they are equal. This is why this implementation is known as "Quick Find".
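
A minimal sketch of Quick Find in Java, assuming elements are labelled 0 to n-1 and that each element's own label serves as its initial component identity (class and method names are illustrative):

```java
// Quick Find: id[i] stores the component identity of element i.
public class QuickFind {
    private final int[] id;

    public QuickFind(int n) {
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i; // every element starts in its own component
        }
    }

    // Find: a single O(1) array lookup.
    public int find(int p) {
        return id[p];
    }

    // Union: O(n) scan that relabels every element of identity d' to identity d.
    public void union(int p, int q) {
        int d = id[p];       // identity chosen to represent the combined set
        int dPrime = id[q];  // identity being absorbed
        if (d == dPrime) return;
        for (int i = 0; i < id.length; i++) {
            if (id[i] == dPrime) {
                id[i] = d;
            }
        }
    }

    // Check if two elements are in the same component.
    public boolean connected(int p, int q) {
        return find(p) == find(q);
    }
}
```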

#### Analysis
Let n be the number of elements in consideration.

**Time**: O(n) for Union and O(1) for Find operations

**Space**: O(n) auxiliary space for the component identifier array

## Quick Union
Here, we consider a completely different approach: the use of trees. Every element can be
thought of as a tree node and starts off in its own component. Under this representation, it is likely
that at any given point we will have a forest of trees, and that's perfectly fine. The root node of each tree
simply represents the component / set of all elements in that tree. <br>
Note that the trees here are not necessarily binary trees. In fact, more often than not, we will have nodes
with multiple child nodes.

**Union**: Between the two components, decide on the component to represent the combined set as before.
Union is then simply assigning the root node of one tree to be the child of the root node of the other, hence the name.
One thing to note is that identifying the component of an object involves traversing up to the root node of its
tree.

**Find**: For each of the two nodes, we traverse up the tree from the current node until the root, then check if the
two roots are the same.
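
A possible Quick Union sketch, assuming a parent array in which a root node is its own parent (names are again illustrative):

```java
// Quick Union: parent[i] is the parent of element i; a root is its own parent.
public class QuickUnion {
    private final int[] parent;

    public QuickUnion(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // every element starts as the root of its own tree
        }
    }

    // Walk up the tree until reaching a root (a node that is its own parent).
    public int find(int p) {
        while (parent[p] != p) {
            p = parent[p];
        }
        return p;
    }

    // Union: make the root of p's tree a child of the root of q's tree.
    public void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        parent[rootP] = rootQ;
    }

    // Check if two elements share the same root, i.e. the same component.
    public boolean connected(int p, int q) {
        return find(p) == find(q);
    }
}
```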

#### Analysis
**Time**: O(n) for Union and Find operations. While the union step itself is indeed quick, it can be undermined
by the O(n) traversal to the root in the case of a degenerate tree. Note that at this stage, there is nothing to ensure the trees
are balanced.

**Space**: O(n); the implementation still involves wrapping the n elements with some structure / wrapper.

## Weighted Union
Now, we augment and improve upon the Quick Union structure by ensuring the trees constructed are 'balanced'. Balanced
trees have the nice property that their height is upper-bounded by O(log(n)). This considerably speeds
up both Union and Find operations. <br>
We additionally track the size of each tree and ensure that whenever there is a union between 2 elements, the smaller
tree becomes the child of the larger tree. It can be shown mathematically that the height of each tree is then bounded by
O(log(n)), since the size of the tree containing any given node at least doubles each time that node's depth increases by one.
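
A sketch of union by size on top of the parent-array representation; the size[] bookkeeping below is an assumed implementation detail (union by rank or height is an equally common variant):

```java
// Weighted Quick Union: always attach the smaller tree under the root of the larger tree.
public class WeightedQuickUnion {
    private final int[] parent;
    private final int[] size; // size[r] = number of elements in the tree rooted at r

    public WeightedQuickUnion(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    // Same root traversal as Quick Union, but now the path length is O(log n).
    public int find(int p) {
        while (parent[p] != p) {
            p = parent[p];
        }
        return p;
    }

    // Union by size: the smaller tree's root becomes a child of the larger tree's root.
    public void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }

    public boolean connected(int p, int q) {
        return find(p) == find(q);
    }
}
```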

#### Analysis
**Time**: O(log(n)) for Union and Find operations.

**Space**: Remains at O(n)

### Path Compression
We can further improve on the time complexity of Weighted Union by introducing path compression. Specifically, during
the traversal from a node up to the root, we re-assign each node's parent to be the root (or, as shown in CS2040s,
re-assigning each node's parent to its grandparent actually suffices and yields the same big-O upper bound! This allows path
compression to be done in a single pass). By doing so, we greatly reduce the height of the trees formed.
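
A sketch of the single-pass grandparent variant described above, layered on top of the weighted structure (the class name and the choice of this variant over re-pointing every node directly to the root in a second pass are illustrative):

```java
// Weighted Quick Union with path compression (single-pass, grandparent re-assignment).
public class WeightedQuickUnionPathCompression {
    private final int[] parent;
    private final int[] size;

    public WeightedQuickUnionPathCompression(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    // While walking up to the root, point each visited node to its grandparent,
    // roughly halving the length of the traversed path.
    public int find(int p) {
        while (parent[p] != p) {
            parent[p] = parent[parent[p]]; // grandparent re-assignment
            p = parent[p];
        }
        return p;
    }

    // Union by size, unchanged from Weighted Union.
    public void union(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        } else {
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
    }

    public boolean connected(int p, int q) {
        return find(p) == find(q);
    }
}
```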

#### Analysis
The analysis is a bit trickier here and involves the inverse Ackermann function. Interested readers can find out more
[here](https://dl.acm.org/doi/pdf/10.1145/321879.321884).

**Time**: O(α(n)) amortized per operation, where α is the inverse Ackermann function

**Space**: O(n)
