Commit aa48b9c

merging
Merge branch 'main' into branch-quickSort
2 parents 3253330 + 7834f3a

25 files changed: +1380 −207 lines

src/algorithms/patternFinding/README.md

Lines changed: 11 additions & 5 deletions
@@ -6,6 +6,12 @@ in text editors when searching for a pattern, in computational biology sequence
 in NLP problems, and even for looking for file patterns for effective file management.
 It is hence crucial that we develop an efficient algorithm.
 
+![KMP](../../../assets/kmp.png)
+Image Source: GeeksforGeeks
+
+## Analysis
+**Time complexity**:
+
 Naively, we can look for patterns in a given sequence in O(nk) where n is the length of the sequence and k
 is the length of the pattern. We do this by iterating over every character of the sequence and looking at the
 immediate k-1 characters that come after it. This is not a big issue if k is known to be small, but there's
@@ -15,9 +21,9 @@ KMP does this in O(n+k) by making use of previously identified sub-patterns. It
 by first processing the pattern input in O(k) time, allowing identification of patterns in
 O(n) traversal of the sequence. More details found in the src code.
 
-![KMP](../../../assets/kmp.png)
-Image Source: GeeksforGeeks
-
+**Space complexity**: O(k) auxiliary space to store the suffix that matches with a prefix of the pattern string
 
-If you have trouble understanding the implementation,
-here is a good [video](https://www.youtube.com/watch?v=EL4ZbRF587g).
+## Notes
+A detailed illustration of how the algorithm works is shown in the code.
+But if you have trouble understanding the implementation,
+here is a good [video](https://www.youtube.com/watch?v=EL4ZbRF587g) as well.
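
To make the O(k) preprocessing idea concrete, here is a minimal KMP sketch (the `computeLps` and `search` names are illustrative, not necessarily what the repo's src code uses; that code remains the authoritative implementation):

```java
// Minimal KMP sketch: lps[i] is the length of the longest proper prefix of
// pattern[0..i] that is also a suffix of pattern[0..i].
public class KmpSketch {
    static int[] computeLps(String pattern) {
        int k = pattern.length();
        int[] lps = new int[k];
        int len = 0; // length of the currently matched prefix
        for (int i = 1; i < k; i++) {
            // fall back through shorter prefix-suffixes until characters match
            while (len > 0 && pattern.charAt(i) != pattern.charAt(len)) {
                len = lps[len - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(len)) {
                len++;
            }
            lps[i] = len;
        }
        return lps;
    }

    // Returns the start index of the first occurrence of pattern in text, or -1.
    static int search(String text, String pattern) {
        int[] lps = computeLps(pattern); // O(k) preprocessing
        int matched = 0;
        for (int i = 0; i < text.length(); i++) {
            while (matched > 0 && text.charAt(i) != pattern.charAt(matched)) {
                matched = lps[matched - 1]; // reuse previously identified sub-pattern
            }
            if (text.charAt(i) == pattern.charAt(matched)) {
                matched++;
            }
            if (matched == pattern.length()) {
                return i - pattern.length() + 1; // full match found in O(n) scan
            }
        }
        return -1;
    }
}
```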

src/algorithms/sorting/bubbleSort/BubbleSort.java

Lines changed: 1 addition & 17 deletions
@@ -15,23 +15,7 @@
 *
 * At the kth iteration of the outer loop, we only require (n-k) adjacent comparisons to get the kth largest
 * element to its correct position.
-*
-* Complexity Analysis:
-* Time:
-* - Worst case (reverse sorted array): O(n^2)
-* - Average case: O(n^2)
-* - Best case (sorted array): O(n)
-* In the worst case, during each iteration of the outer loop, the number of adjacent comparisons is upper-bounded
-* by n. Since BubbleSort requires (n-1) iterations of the outer loop to sort the entire array, the total number
-* of comparisons performed can be upper-bounded by (n-1) * n ≈ n^2.
-*
-* This implementation of BubbleSort terminates the outer loop once there are no swaps within one iteration of the
-* outer loop. This improves the best case time complexity to O(n) for an already sorted array.
-*
-* Space:
-* - O(1) since sorting is done in-place
 */
-
 public class BubbleSort {
 /**
 * Sorts the given array in-place in non-decreasing order.
@@ -40,7 +24,7 @@ public class BubbleSort {
 */
 public static int[] sort(int[] arr) {
 int n = arr.length;
-boolean swapped; //tracks of the presence of swaps within one iteration of the outer loop to
+boolean swapped; // tracks the presence of swaps within one iteration of the outer loop to
 // facilitate early termination
 for (int i = 0; i < n - 1; i++ ) { //outer loop which supports the invariant
 swapped = false;
src/algorithms/sorting/bubbleSort/README.md

Lines changed: 21 additions & 1 deletion
@@ -1 +1,21 @@
-![bubble sort img](../../../../assets/BubbleSort.jpeg)
+# Bubble Sort
+Bubble sort is one of the more intuitive comparison-based sorting algorithms.
+It makes repeated comparisons between neighbouring elements, 'bubbling' the largest (or smallest)
+element in the unsorted region to the sorted region (often the front or the back) via successive side-by-side swaps.
+
+![bubble sort img](../../../../assets/BubbleSort.jpeg)
+
+## Complexity Analysis
+**Time**:
+- Worst case (reverse sorted array): O(n^2)
+- Average case: O(n^2)
+- Best case (sorted array): O(n)
+
+In the worst case, during each iteration of the outer loop, the number of adjacent comparisons is upper-bounded
+by n. Since BubbleSort requires (n-1) iterations of the outer loop to sort the entire array, the total number
+of comparisons performed can be upper-bounded by (n-1) * n ≈ n^2.
+
+This implementation of BubbleSort terminates the outer loop once there are no swaps within one iteration of the
+outer loop. This improves the best case time complexity to O(n) for an already sorted array.
+
+**Space**: O(1) since sorting is done in-place
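
A minimal sketch of the early-terminating variant the README describes (illustrative only; BubbleSort.java in this commit is the actual implementation):

```java
// Bubble sort with early termination: stop once a full pass makes no swaps.
public class BubbleSortSketch {
    public static void sort(int[] arr) {
        int n = arr.length;
        for (int i = 0; i < n - 1; i++) {
            boolean swapped = false;
            // after i passes, the last i elements are already in their final places
            for (int j = 0; j < n - 1 - i; j++) {
                if (arr[j] > arr[j + 1]) {
                    int tmp = arr[j];
                    arr[j] = arr[j + 1];
                    arr[j + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) {
                return; // no swaps in this pass: array is sorted (best case O(n))
            }
        }
    }
}
```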

src/algorithms/sorting/countingSort/CountingSort.java

Lines changed: 6 additions & 17 deletions
@@ -1,45 +1,34 @@
 package src.algorithms.sorting.countingSort;
 
 /**
-* Stable implementation of Counting Sort.
-*
-* <p></p>
+* <p></p> Stable implementation of Counting Sort.
 *
-* Brief Description: <br>
+* <p></p> Brief Description: <br>
 * Counting sort is a non-comparison based sorting algorithm and isn't bounded by the O(nlogn) lower-bound
 * of most sorting algorithms. <br>
 * It first obtains the frequency map of all elements (ie counting the occurrence of every element), then
 * computes the prefix sum for the map. This prefix map tells us which position an element should be inserted. <br>
 * Ultimately, each group of elements will be placed together, and the groups in succession, in the sorted output.
 *
-* <p></p>
-*
-* Assumption for use: <br>
+* <p></p> Assumption for use: <br>
 * To perform counting sort, the elements must first have total ordering and their rank must be known.
 *
-* <p></p>
-*
-* Implementation Invariant: <br>
+* <p></p> Implementation Invariant: <br>
 * At the end of the ith iteration, the ith element from the back will be placed in its rightful position.
 *
 * <p></p>
-*
 * COMMON MISCONCEPTION: Counting sort does not require total ordering of elements since it is non-comparison based.
 * This is incorrect. It requires total ordering of elements to determine their relative positions in the sorted output.
 * In fact, in conventional implementation, the total ordering property is reflected by virtue of the structure
 * of the frequency map.
 *
-* <p></p>
-*
-* Complexity Analysis: <br>
+* <p></p> Complexity Analysis: <br>
 * Time: O(k+n)=O(max(k,n)) where k is the value of the largest element and n is the number of elements. <br>
 * Space: O(k+n)=O(max(k,n)) <br>
 * Counting sort is most efficient if the range of input values do not exceed the number of input values. <br>
 * Counting sort is NOT AN IN-PLACE algorithm. For one, it requires additional space to store freq map. <br>
 *
-* <p></p>
-*
-* Note: Implementation deals with integers but the idea is the same and can be generalised to other objects,
+* <p></p> Note: Implementation deals with integers but the idea is the same and can be generalised to other objects,
 * as long as what was discussed above remains true.
 */
 public class CountingSort {
src/algorithms/sorting/countingSort/README.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+# Counting Sort
+
+Counting sort is a non-comparison-based sorting algorithm and isn't bounded by the O(nlogn) lower-bound
+of most sorting algorithms. <br>
+It first obtains the frequency map of all elements (i.e. counting the occurrence of every element), then
+computes the prefix sum for the map. This prefix map tells us the position at which an element should be inserted.
+Ultimately, each group of elements will be placed together, and the groups in succession, in the sorted output.
+
+## Complexity Analysis
+Time: O(k+n)=O(max(k,n)) where k is the value of the largest element and n is the number of elements. <br>
+Space: O(k+n)=O(max(k,n)) <br>
+Counting sort is most efficient if the range of input values does not exceed the number of input values. <br>
+Counting sort is NOT AN IN-PLACE algorithm. For one, it requires additional space to store the freq map. <br>
+
+## Notes
+COMMON MISCONCEPTION: Counting sort does not require total ordering of elements since it is non-comparison based.
+This is incorrect. It requires total ordering of elements to determine their relative positions in the sorted output.
+In fact, in conventional implementation, the total ordering property is reflected by virtue of the structure
+of the frequency map.
+
+Supplementary: Here is a [video](https://www.youtube.com/watch?v=OKd534EWcdk) if you are still having trouble.
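
A minimal stable counting-sort sketch following the frequency-map-plus-prefix-sum description above (assumes non-negative integers; names are illustrative, not the repo's):

```java
// Stable counting sort for non-negative integers.
public class CountingSortSketch {
    public static int[] sort(int[] arr) {
        int k = 0;
        for (int x : arr) {
            k = Math.max(k, x); // largest value determines the count array size
        }
        int[] count = new int[k + 1];
        for (int x : arr) {
            count[x]++; // frequency map
        }
        for (int i = 1; i <= k; i++) {
            count[i] += count[i - 1]; // prefix sums: count[v] = #elements <= v
        }
        int[] out = new int[arr.length];
        // iterate from the back so equal elements keep their relative order
        // (stability: the ith element from the back lands in its rightful slot)
        for (int i = arr.length - 1; i >= 0; i--) {
            out[--count[arr[i]]] = arr[i];
        }
        return out;
    }
}
```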

src/algorithms/sorting/insertionSort/InsertionSort.java

Lines changed: 3 additions & 23 deletions
@@ -3,33 +3,13 @@
 /** Here, we are implementing InsertionSort where we sort the array in increasing (or more precisely, non-decreasing)
 * order.
 *
-* Brief Description:
-* InsertionSort is a simple comparison-based sorting algorithm that builds the final sorted array one element at a
-* time. It works by repeatedly taking an element from the unsorted portion of the array and inserting it into its
-* correct position within the sorted portion. At the kth iteration, we take the element arr[k] and insert
-* it into arr[0, k-1] following sorted order, returning us arr[0, k] in sorted order.
-*
 * Implementation Invariant:
 * The loop invariant is: at the end of kth iteration, the first (k+1) items in the array are in sorted order.
 * At the end of the (n-1)th iteration, all n items in the array will be in sorted order.
-* (Note: the loop invariant here slightly differs from the lecture slides as we are using 0-based indexing.)
-*
-* Complexity Analysis:
-* Time:
-* - Worst case (reverse sorted array): O(n^2)
-* - Average case: O(n^2)
-* - Best case (sorted array): O(n)
-*
-* In the worst case, inserting an element into the sorted array of length m requires us to iterate through the
-* entire array, requiring O(m) time. Since InsertionSort does this insertion (n - 1) times, the time complexity
-* of InsertionSort in the worst case is 1 + 2 + ... + (n-2) + (n-1) = O(n^2).
-*
-* In the best case of an already sorted array, inserting an element into the sorted array of length m requires
-* O(1) time as we insert it directly behind the first position of the pointer in the sorted array. Since InsertionSort
-* does this insertion (n-1) times, the time complexity of InsertionSort in the best case is O(1) * (n-1) = O(n).
 *
-* Space:
-* - O(1) since sorting is done in-place
+* Note:
+* 1. the loop invariant here slightly differs from the lecture slides as we are using 0-based indexing
+* 2. Insertion into the sorted portion is done by 'bubbling' elements as in bubble sort
 */
 
 public class InsertionSort {
src/algorithms/sorting/insertionSort/README.md

Lines changed: 33 additions & 0 deletions
@@ -1,3 +1,36 @@
+# Insertion Sort
+
+Insertion sort is a comparison-based sorting algorithm that builds the final sorted array one element at a
+time. It works by repeatedly taking an element from the unsorted portion of the array and
+inserting it into the sorted portion so that the portion remains sorted. Note that the position is not final,
+since subsequent elements from the unsorted portion may displace previously inserted elements. What's important is
+that the sorted region remains sorted. More succinctly: <br>
+At the kth iteration, we take the element arr[k] and insert
+it into arr[0, k-1] following sorted order, giving us arr[0, k] in sorted order.
+
 ![InsertionSort](../../../../assets/InsertionSort.png)
 
+## Complexity Analysis
+**Time**:
+- Worst case (reverse sorted array): O(n^2)
+- Average case: O(n^2)
+- Best case (sorted array): O(n)
+
+In the worst case, inserting an element into the sorted array of length m requires us to iterate through the
+entire array, requiring O(m) time. Since InsertionSort does this insertion (n - 1) times, the time complexity
+of InsertionSort in the worst case is 1 + 2 + ... + (n-2) + (n-1) = O(n^2).
+
+In the best case of an already sorted array, inserting an element into the sorted array of length m requires
+O(1) time as we insert it directly behind the first position of the pointer in the sorted array. Since InsertionSort
+does this insertion (n-1) times, the time complexity of InsertionSort in the best case is O(1) * (n-1) = O(n).
+
+**Space**: O(1) since sorting is done in-place
+
+## Notes
+### Common Misconception
+Insertion sort's invariant is often confused with selection sort's. In selection sort, an element in the unsorted region
+is immediately placed in its correct and final position as it would be in the sorted array. This is not the case
+for insertion sort. However, it is this 'looser' invariant that allows for a better best case time complexity
+for insertion sort.
+
 Image Source: https://www.hackerrank.com/challenges/correctness-invariant/problem
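
A minimal sketch matching the description above, with insertion done by 'bubbling' as noted in InsertionSort.java (illustrative only):

```java
// Insertion sort: grow a sorted prefix, inserting arr[k] by bubbling it leftwards.
public class InsertionSortSketch {
    public static void sort(int[] arr) {
        for (int k = 1; k < arr.length; k++) {
            // invariant: arr[0..k-1] is sorted; swap arr[k] leftwards until in place
            for (int j = k; j > 0 && arr[j - 1] > arr[j]; j--) {
                int tmp = arr[j - 1];
                arr[j - 1] = arr[j];
                arr[j] = tmp;
            }
        }
    }
}
```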
src/algorithms/sorting/selectionSort/README.md

Lines changed: 19 additions & 0 deletions
@@ -1,3 +1,22 @@
+# Selection Sort
+
+Selection sort is another intuitive comparison-based sorting algorithm. It works similarly to other sorting algorithms
+like bubble and insertion sort in the sense that it maintains a sorted and an unsorted region. It does so by repeatedly
+finding the smallest (or largest) element in the unsorted region and placing it in the correct and final position as it
+would be in the sorted array.
+
 ![SelectionSort](../../../../assets/SelectionSort.png)
 
+## Complexity Analysis
+**Time**:
+- Worst case: O(n^2)
+- Average case: O(n^2)
+- Best case: O(n^2)
+
+Regardless of how sorted the input array is, selectionSort will run the minimum element finding algorithm (n-1)
+times. For an input array of length m, finding the minimum element necessarily takes O(m) time. Therefore, the
+time complexity of selectionSort is n + (n-1) + (n-2) + ... + 2 = O(n^2).
+
+**Space**: O(1) since sorting is done in-place
+
 Image Source: https://www.hackerearth.com/practice/algorithms/sorting/selection-sort/tutorial/
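
A minimal selection sort sketch matching the invariant above (illustrative only; SelectionSort.java is the actual implementation):

```java
// Selection sort: repeatedly select the minimum of the unsorted suffix
// and swap it into its correct and final position.
public class SelectionSortSketch {
    public static void sort(int[] arr) {
        int n = arr.length;
        for (int i = 0; i < n - 1; i++) {
            int min = i;
            for (int j = i + 1; j < n; j++) {
                if (arr[j] < arr[min]) {
                    min = j; // index of the smallest element in arr[i..n-1]
                }
            }
            int tmp = arr[i];
            arr[i] = arr[min];
            arr[min] = tmp; // arr[0..i] now holds the i+1 smallest elements, sorted
        }
    }
}
```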

src/algorithms/sorting/selectionSort/SelectionSort.java

Lines changed: 5 additions & 16 deletions
@@ -3,26 +3,15 @@
 /** Here, we are implementing SelectionSort where we sort the array in increasing (or more precisely, non-decreasing)
 * order.
 *
-* Brief Description and Implementation Invariant:
-* Let the array to be sorted be A of length n. SelectionSort works by finding the minimum element A[j] in A[i...n],
-* then swapping A[i] with A[j], for i in [0, n-1). The loop invariant is: at the end of the kth iteration, the
-* smallest k items are correctly sorted in the first k positions of the array.
+* Implementation Invariant:
+* Let the array of length n to be sorted be A.
+* The loop invariant is:
+* At the end of the kth iteration, the smallest k items are correctly sorted in the first k positions of the array.
 *
-* At the end of the (n-1)th iteration of the loop, the smallest (n-1) items are correctly sorted in the first (n-1)
+* So, at the end of the (n-1)th iteration of the loop, the smallest (n-1) items are correctly sorted in the first (n-1)
 * positions of the array, leaving the last item correctly positioned in the last index of the array. Therefore,
 * (n-1) iterations of the loop is sufficient.
 *
-* Complexity Analysis:
-* Time:
-* - Worst case: O(n^2)
-* - Average case: O(n^2)
-* - Best case: O(n^2)
-* Regardless of how sorted the input array is, selectionSort will run the minimum element finding algorithm (n-1)
-* times. For an input array of length m, finding the minimum element necessarily takes O(m) time. Therefore, the
-* time complexity of selectionSort is n + (n-1) + (n-2) + ... + 2 = O(n^2)
-*
-* Space:
-* - O(1) since sorting is done in-place
 */
 
 public class SelectionSort {
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+# Union Find / Disjoint Set
+
+A disjoint-set structure, also known as a union-find or merge-find set, is a data structure that
+keeps track of a partition of a set into disjoint (non-overlapping) subsets. In CS2040s, this
+is primarily used to check for dynamic connectivity. For instance, Kruskal's algorithm
+for finding the minimum spanning tree of a graph utilizes a disjoint set to efficiently
+query if there exists a path between 2 nodes. <br>
+It supports 2 main operations:
+1. Union: Join two subsets into a single subset
+2. Find: Determine which subset a particular element is in. In practice, this is often done to check
+if two elements are in the same subset or component.
+
+The Disjoint Set structure is often introduced in 3 parts, with each iteration being better than the
+previous in terms of time and space complexity. Below is a brief overview:
+
+## Quick Find
+Every object will be assigned a component identity. The implementation of Quick Find often involves
+an underlying array that tracks the component identity of each object.
+
+**Union**: Between the two components, decide on a component d to represent the combined set. Let the other
+component's identity be d'. Simply iterate over the component identifier array, and for any element with
+identity d', assign it to d.
+
+**Find**: Simply use the component identifier array to query for the component identity of the two elements
+and check if they are equal. This is why this implementation is known as "Quick Find".
+
+#### Analysis
+Let n be the number of elements in consideration.
+
+**Time**: O(n) for Union and O(1) for Find operations
+
+**Space**: O(n) auxiliary space for the component identifier
+
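
A minimal Quick Find sketch consistent with the array-based description above (class and method names are illustrative):

```java
// Quick Find: ids[i] is the component identity of element i.
public class QuickFind {
    private final int[] ids;

    public QuickFind(int n) {
        ids = new int[n];
        for (int i = 0; i < n; i++) {
            ids[i] = i; // each element starts in its own component
        }
    }

    // O(1): just compare component identities.
    public boolean find(int p, int q) {
        return ids[p] == ids[q];
    }

    // O(n): relabel every element of p's component (identity d') to q's (identity d).
    public void union(int p, int q) {
        int dPrime = ids[p];
        int d = ids[q];
        for (int i = 0; i < ids.length; i++) {
            if (ids[i] == dPrime) {
                ids[i] = d;
            }
        }
    }
}
```

For example, after `new QuickFind(4)` and `union(0, 1)`, the call `find(0, 1)` returns true.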
+## Quick Union
+Here, we consider a completely different approach: the use of trees. Every element can be
+thought of as a tree node and starts off in its own component. Under this representation, it is likely
+that at any given point, we might have a forest of trees, and that's perfectly fine. The root node of each tree
+simply represents the component / set of all elements in the same set. <br>
+Note that the trees here are not necessarily binary trees. In fact, more often than not, we will have nodes
+with multiple child nodes.
+
+**Union**: Between the two components, decide on the component to represent the combined set as before.
+Now, union is simply assigning the root node of one tree to be the child of the root node of the other. Hence, its name.
+One thing to note is that identifying the component of an object involves traversing to the root node of the
+tree.
+
+**Find**: For each of the two nodes, we traverse up the tree from the current node until the root. Check if the
+two roots are the same.
+
+#### Analysis
+**Time**: O(n) for Union and Find operations. While union-ing is indeed quick, it is possibly undermined
+by the O(n) traversal in the case of a degenerate tree. Note that at this stage, there is nothing to ensure the trees
+are balanced.
+
+**Space**: O(n), the implementation still involves wrapping the n elements with some structure / wrapper.
+
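
A minimal Quick Union sketch of the tree-based approach (illustrative names; no balancing yet, so `root` can take O(n) on a degenerate tree):

```java
// Quick Union: parent[i] points to the parent of i; roots point to themselves.
public class QuickUnion {
    private final int[] parent;

    public QuickUnion(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i; // each element is the root of its own tree
        }
    }

    // Climb until we reach the root; O(tree height), O(n) in the worst case.
    private int root(int p) {
        while (parent[p] != p) {
            p = parent[p];
        }
        return p;
    }

    public boolean find(int p, int q) {
        return root(p) == root(q); // same root means same component
    }

    public void union(int p, int q) {
        parent[root(p)] = root(q); // attach one root under the other
    }
}
```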
+## Weighted Union
+Now, we augment and improve upon the Quick Union structure by ensuring the trees constructed are 'balanced'. Balanced
+trees have the nice property that the height of the tree is upper-bounded by O(log(n)). This considerably speeds
+up Union and Find operations. <br>
+We additionally track the size of each tree and ensure that whenever there is a union between 2 elements, the smaller
+tree becomes the child of the larger tree. It can be mathematically shown that the height of the tree is bounded by O(log(n)).
+
+#### Analysis
+**Time**: O(log(n)) for Union and Find operations.
+
+**Space**: Remains at O(n)
+
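
A sketch of the size-tracking variant described above (assumes union-by-size, matching the README's description; names are illustrative):

```java
// Weighted Union: like Quick Union, but the smaller tree is attached
// under the larger tree's root, keeping tree heights O(log n).
public class WeightedUnion {
    private final int[] parent;
    private final int[] size; // size[r] = number of nodes in the tree rooted at r

    public WeightedUnion(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    private int root(int p) {
        while (parent[p] != p) {
            p = parent[p]; // O(log n) since trees are kept balanced
        }
        return p;
    }

    public boolean find(int p, int q) {
        return root(p) == root(q);
    }

    public void union(int p, int q) {
        int rp = root(p);
        int rq = root(q);
        if (rp == rq) {
            return; // already in the same component
        }
        if (size[rp] < size[rq]) { // smaller tree goes under the larger one
            parent[rp] = rq;
            size[rq] += size[rp];
        } else {
            parent[rq] = rp;
            size[rp] += size[rq];
        }
    }
}
```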
+### Path Compression
+We can further improve on the time complexity of Weighted Union by introducing path compression. Specifically, during
+the traversal of a node up to the root, we re-assign each node's parent to be the root (or, as shown in CS2040s,
+re-assigning each node to its grandparent actually suffices and yields the same big-O upper bound! This allows path
+compression to be done in a single pass). By doing so, we greatly reduce the height of the trees formed.
+
+#### Analysis
+The analysis is a bit trickier here and involves the inverse Ackermann function. Interested readers can find out more
+[here](https://dl.acm.org/doi/pdf/10.1145/321879.321884)
+
+**Time**: O(α(n)) amortized, where α is the inverse Ackermann function
+
+**Space**: O(n)
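
A sketch of the single-pass grandparent re-assignment ('path halving') mentioned above, written as a drop-in replacement for the `root` method in the weighted-union sketch (an assumption for illustration, not necessarily how the repo implements it):

```java
// Path halving: while climbing to the root, point each visited node
// at its grandparent, roughly halving the path length in one pass.
private int root(int p) {
    while (parent[p] != p) {
        parent[p] = parent[parent[p]]; // re-assign to grandparent
        p = parent[p];
    }
    return p;
}
```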
