Commit 0cff81c (2 parents: e4895d8 + 4e6c240)

Merge pull request #77 from 4ndrelim/branch-RefactorAVL

docs: Improve clarity for Radix and complete AVL docs

File tree: 10 files changed, +170 −66 lines

docs/assets/images/AvlTree.png (43.1 KB → 95.8 KB)

docs/assets/images/RadixSort.png (37.4 KB → 83.7 KB)
Lines changed: 52 additions & 39 deletions

@@ -1,68 +1,81 @@
 # Radix Sort
 
 ## Background
-
 Radix Sort is a non-comparison based, stable sorting algorithm that conventionally uses counting sort as a subroutine.
 
 Radix Sort performs counting sort several times on the numbers. It sorts starting with the least-significant segment
-to the most-significant segment.
+to the most-significant segment. What a 'segment' refers to is explained below.
 
-### Segments
-The definition of a 'segment' is user defined and defers from implementation to implementation.
-It is most commonly defined as a bit chunk.
+### Idea
+The definition of a 'segment' is user-defined and could vary depending on implementation.
 
-For example, if we aim to sort integers, we can sort each element
-from the least to most significant digit, with the digits being our 'segments'.
+Let's consider sorting an array of integers. We interpret the integers in base-10 as shown below.
+Here, we treat each digit as a 'segment' and sort (with counting sort as the sub-routine here) the elements
+from the least significant digit (right) to the most significant digit (left). In other words, the sub-routine sort
+focuses on just 1 digit at a time.
 
-Within our implementation, we take the binary representation of the elements and
-partition it into 8-bit segments. An integer is represented in 32 bits,
-this gives us 4 total segments to sort through.
+<div align="center">
+  <img src="../../../../../../docs/assets/images/RadixSort.png" width="65%">
+  <br>
+  Credits: Level Up Coding
+</div>
 
-Note that the number of segments is flexible and can range up to the number of digits in the binary representation.
-(In this case, sub-routine sort is done on every digit from right to left)
+The astute reader would note that a **stable version of counting sort** has to be used here, otherwise the relative
+ordering based on previous segments might get disrupted when sorting with subsequent segments.
 
-![Radix Sort](https://miro.medium.com/v2/resize:fit:661/1*xFnpQ4UNK0TvyxiL8r1svg.png)
+### Segment Size
+Naturally, the choice of using just 1 digit in base-10 for segmenting is an arbitrary one. The concept of Radix Sort
+remains the same regardless of the segment size, allowing for flexibility in its implementation.
 
-We place each element into a queue based on the number of possible segments that could be generated.
-Suppose the values of our segments are in base-10, (limited to a value within range *[0, 9]*),
-we get 10 queues. We can also see that radix sort is stable since
-they are enqueued in a manner where the first observed element remains at the head of the queue
+In practice, numbers are often interpreted in their binary representation, with the 'segment' commonly defined as a
+bit chunk of a specified size (usually 8 bits/1 byte, though this number could vary for optimization).
 
-*Source: Level Up Coding*
+For our implementation, we utilize the binary representation of elements, partitioning them into 8-bit segments.
+Given that an integer is typically represented in 32 bits, this results in four segments per integer.
+By applying the sorting subroutine to each segment across all integers, we can efficiently sort the array.
+This method requires sorting the array four times in total, once for each 8-bit segment.
 
 ### Implementation Invariant
+At the end of the *ith* iteration, the elements are sorted based on their numeric value up till the *ith* segment.
 
-At the start of the *i-th* segment we are sorting on, the array has already been sorted on the
-previous *(i - 1)-th* segments.
-
-### Common Misconceptions
-
-While Radix Sort is non-comparison based,
-the that total ordering of elements is still required.
-This total ordering is needed because once we assigned a element to a order based on a segment,
-the order *cannot* change unless deemed by a segment with a higher significance.
-Hence, a stable sort is required to maintain the order as
-the sorting is done with respect to each of the segments.
+### Common Misconception
+While Radix Sort is a non-comparison-based algorithm,
+it still necessitates a form of total ordering among the elements to be effective.
+Although it does not involve direct comparisons between elements, Radix Sort achieves ordering by processing elements
+based on individual segments or digits. This process depends on Counting Sort, which organizes elements into a
+frequency map according to a **predefined, ascending order** of those segments.
 
 ## Complexity Analysis
-Let b-bit words be broken into r-bit pieces. Let n be the number of elements to sort.
+Let b-bit words be broken into r-bit pieces. Let n be the number of elements.
 
 *b/r* represents the number of segments and hence the number of counting sort passes. Note that each pass
-of counting sort takes *(2^r + n)* (O(k+n) where k is the range which is 2^r here).
+of counting sort takes *(2^r + n)* (or, more commonly, O(k+n) where k is the range, which is 2^r here).
 
 **Time**: *O((b/r) * (2^r + n))*
 
-**Space**: *O(n + 2^r)*
+**Space**: *O(2^r + n)* <br>
+Note that our implementation has a slight space optimization: we create one auxiliary array at the start and then
+repeatedly recycle the original and the copy (saves space!),
+writing the results of each iteration of the sub-routine into whichever of the two is free.
 
 ### Choosing r
-Previously we said the number of segments is flexible. Indeed, it is but for more optimised performance, r needs to be
+Previously we said the number of segments is flexible. Indeed, it is, but for more optimised performance, r needs to be
 carefully chosen. The optimal choice of r is slightly smaller than logn, which can be justified with differentiation.
 
-Briefly, r=lgn --> Time complexity can be simplified to (b/lgn)(2n). <br>
-For numbers in the range of 0 - n^m, b = mlgn and so the expression can be further simplified to *O(mn)*.
+Briefly, r=logn --> the time complexity can be simplified to (b/logn)(2n). <br>
+For numbers in the range of 0 - n^m, b = number of bits = log(n^m) = mlogn, <br>
+and so the expression can be further simplified to *O(mn)*.
 
 ## Notes
-- Radix sort's time complexity is dependent on the maximum number of digits in each element,
-hence it is ideal to use it on integers with a large range and with little digits.
-- This could mean that Radix Sort might end up performing worst on small sets of data
-if any one given element has a in-proportionate amount of digits.
+- Radix Sort doesn't compare elements against each other, which can make it faster than comparative sorting algorithms
+like QuickSort or MergeSort for large datasets with a small range of key values.
+- It is useful for large sets of numeric data, especially if stability is important.
+- It also works well for data that can be divided into segments of equal size, with the ordering between elements known.
+
+- Radix Sort's efficiency is closely tied to the number of digits in the largest element. So, its performance
+might not be optimal on small datasets that include elements with a significantly higher number of digits compared to
+others. This scenario could introduce more sorting passes than desired, diminishing the algorithm's overall efficiency.
+- Avoid it for datasets with sparse data.
+
+- Our implementation uses bit masking. If you are unsure about binary representations, do check
+[this](https://cheever.domains.swarthmore.edu/Ref/BinaryMath/NumSys.html) out.
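The approach described in the README (8-bit segments, a stable counting sort per pass, and two recycled arrays) can be sketched end-to-end as follows. This is an illustrative standalone version, not the repository's `RadixSort.java`: the class name is hypothetical, and it assumes non-negative `int` keys.

```java
public class RadixSortSketch {
    private static final int NUM_BITS = 8;          // segment size in bits
    private static final int NUM_SEGMENTS = 4;      // 32-bit int / 8-bit segments

    // Extract the value of the given 8-bit segment (segment 0 = least significant).
    private static int getSegmentMasked(int num, int segment) {
        int mask = (1 << NUM_BITS) - 1;             // 0xFF
        return (num >> (segment * NUM_BITS)) & mask;
    }

    // Sorts non-negative integers in place using LSD radix sort
    // with a stable counting sort on each 8-bit segment.
    public static void radixSort(int[] arr) {
        int[] cur = arr;
        int[] buf = new int[arr.length];            // single auxiliary array, recycled each pass
        for (int seg = 0; seg < NUM_SEGMENTS; seg++) {
            int[] freq = new int[1 << NUM_BITS];    // frequency map over segment values
            for (int num : cur) {
                freq[getSegmentMasked(num, seg)]++;
            }
            // prefix sums: first output index for each segment value
            int[] start = new int[1 << NUM_BITS];
            for (int v = 1; v < start.length; v++) {
                start[v] = start[v - 1] + freq[v - 1];
            }
            // stable placement: equal segment values keep their relative order
            for (int num : cur) {
                buf[start[getSegmentMasked(num, seg)]++] = num;
            }
            int[] tmp = cur;                        // swap roles of the two arrays
            cur = buf;
            buf = tmp;
        }
        // NUM_SEGMENTS is even, so after the final swap 'cur' is 'arr' again
    }

    public static void main(String[] args) {
        int[] a = {170, 45, 75, 90, 802, 24, 2, 66};
        radixSort(a);
        System.out.println(java.util.Arrays.toString(a));
        // prints [2, 24, 45, 66, 75, 90, 170, 802]
    }
}
```

Because the segments are processed least significant first and each counting-sort pass is stable, every pass preserves the order established by the previous passes, which is exactly the implementation invariant stated in the README.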

src/main/java/algorithms/sorting/radixSort/RadixSort.java

Lines changed: 4 additions & 4 deletions

@@ -16,9 +16,9 @@ public class RadixSort {
      * @return The value of the digit in the number at the given segment.
      */
     private static int getSegmentMasked(int num, int segment) {
-        // Bit masking here to extract each segment from the integer.
-        int mask = ((1 << NUM_BITS) - 1) << (segment * NUM_BITS);
-        return (num & mask) >> (segment * NUM_BITS);
+        // bit masking here to extract each segment from the integer.
+        int mask = (1 << NUM_BITS) - 1;
+        return (num >> (segment * NUM_BITS)) & mask; // we do a right-shift on num to focus on the desired segment
     }
 
     /**
@@ -28,7 +28,7 @@ private static int getSegmentMasked(int num, int segment) {
      * @param sorted output array.
      */
     private static void radixSort(int[] arr, int[] sorted) {
-        // sort the N numbers by segments, starting from left-most segment
+        // Code in the loop is essentially counting sort; sort the N numbers by segments, starting from right-most
         for (int i = 0; i < NUM_SEGMENTS; i++) {
             int[] freqMap = new int[1 << NUM_BITS]; // at most this number of elements

src/main/java/dataStructures/avlTree/AVLTree.java

Lines changed: 13 additions & 19 deletions

@@ -55,7 +55,7 @@ public int height(T key) {
     }
 
     /**
-     * Update height of node in avl tree during rebalancing.
+     * Update height of node in avl tree for re-balancing.
      *
      * @param n node whose height is to be updated
      */
@@ -372,6 +372,10 @@ private T successor(Node<T> node) {
         return null;
     }
 
+    // ---------------------------------------------- NOTE ------------------------------------------------------------
+    // METHODS BELOW ARE NOT NECESSARY; JUST FOR VISUALISATION PURPOSES
+
     /**
      * prints in order traversal of the entire tree.
      */
@@ -390,13 +394,9 @@ private void printInorder(Node<T> node) {
         if (node == null) {
             return;
         }
-        if (node.getLeft() != null) {
-            printInorder(node.getLeft());
-        }
+        printInorder(node.getLeft());
         System.out.print(node + " ");
-        if (node.getRight() != null) {
-            printInorder(node.getRight());
-        }
+        printInorder(node.getRight());
     }
 
     /**
@@ -408,7 +408,6 @@ public void printPreorder() {
         System.out.println();
     }
 
-
     /**
      * Prints out pre-order traversal of tree rooted at node
      *
@@ -419,12 +418,8 @@ private void printPreorder(Node<T> node) {
             return;
         }
         System.out.print(node + " ");
-        if (node.getLeft() != null) {
-            printPreorder(node.getLeft());
-        }
-        if (node.getRight() != null) {
-            printPreorder(node.getRight());
-        }
+        printPreorder(node.getLeft());
+        printPreorder(node.getRight());
     }
 
     /**
@@ -442,12 +437,11 @@ public void printPostorder() {
      * @param node node which the tree is rooted at
      */
     private void printPostorder(Node<T> node) {
-        if (node.getLeft() != null) {
-            printPostorder(node.getLeft());
-        }
-        if (node.getRight() != null) {
-            printPostorder(node.getRight());
+        if (node == null) {
+            return;
         }
+        printPostorder(node.getLeft());
+        printPostorder(node.getRight());
         System.out.print(node + " ");
     }

src/main/java/dataStructures/avlTree/Node.java

Lines changed: 3 additions & 4 deletions

@@ -15,10 +15,9 @@ public class Node<T extends Comparable<T>> {
     private Node<T> parent;
     private int height;
     /*
-     * Can insert more properties here.
-     * If key is not unique, introduce a value property
-     * so when nodes are being compared, a distinction
-     * can be made
+     * Can insert more properties here for augmentation,
+     * e.g. if key is not unique, introduce a value property as a tie-breaker,
+     * or a weight property for order statistics.
      */
 
     public Node(T key) {
Lines changed: 91 additions & 0 deletions

@@ -0,0 +1,91 @@
+# AVL Trees
+
+## Background
+Is the fastest way to search for data to store it in an array, sort it and perform binary search? No. This will
+incur a minimum O(nlogn) sorting cost, and O(n) cost per insertion to maintain sorted order.
+
+We have seen binary search trees (BSTs), which always maintain data in sorted order. This allows us to avoid the
+overhead of sorting before we search. However, we also learnt that unbalanced BSTs can be incredibly inefficient for
+insertion, deletion and search operations, which are O(h) in time complexity (in the case of degenerate trees,
+operations can go up to O(n)).
+
+Here we discuss a type of self-balancing BST, known as the AVL tree, that avoids the worst-case O(n) performance
+across these operations by carefully updating the tree's structure whenever there is a change
+(e.g. insert or delete).
+
+### Definition of Balanced Trees
+Balanced trees are a special subset of trees with **height in the order of log(n)**, where n is the number of nodes.
+This choice is not an arbitrary one. It can be mathematically shown that a binary tree of n nodes has height of at
+least log(n) (in the case of a complete binary tree). So, it makes intuitive sense to give trees whose heights are
+roughly in the order of log(n) the desirable 'balanced' label.
+
+<div align="center">
+  <img src="../../../../../docs/assets/images/BalancedProof.png" width="40%">
+  <br>
+  Credits: CS2040s Lecture 9
+</div>
+
+### Height-Balanced Property of AVL Trees
+There are several ways to achieve a balanced tree. Red-black trees, B-trees, Scapegoat trees and AVL trees each ensure
+balance differently. Each of them relies on some underlying 'good' property to maintain balance - a careful segmenting
+of nodes in the case of RB-trees, and the enforcing of a depth constraint for B-trees. Go check them out in the other
+folders! <br>
+What is important is that this **'good' property holds even after every change** (insert/update/delete).
+
+The 'good' property in AVL trees is the **height-balanced** property. A node is height-balanced if the
+**difference in height between its left and right child nodes is not more than 1**. <br>
+We say the tree is height-balanced if every node in the tree is height-balanced. Be careful not to conflate
+the concept of a "balanced tree" with the "height-balanced" property. They are not the same; the latter is used to
+achieve the former.
+
+<details>
+<summary> <b>Ponder..</b> </summary>
+Consider any two nodes (they need not have the same immediate parent node) in the tree. Is the difference in height
+between the two nodes <= 1 too?
+</details>
+
+It can be mathematically shown that a **height-balanced tree with n nodes has height <= 2log(n)**
+(in fact, using the golden ratio, we can achieve a tighter bound of ~1.44log(n)).
+Therefore, following the definition of a balanced tree, AVL trees are balanced.
+
+<div align="center">
+  <img src="../../../../../docs/assets/images/AvlTree.png" width="40%">
+  <br>
+  Credits: CS2040s Lecture 9
+</div>
+
+## Complexity Analysis
+**Search, Insertion, Deletion, Predecessor & Successor queries Time**: O(height) = O(logn)
+
+**Space**: O(n) <br>
+where n is the number of elements (whatever the structure, it must store at least n nodes)
+
+## Operations
+Minimally, an implementation of an AVL tree must support the standard **insert**, **delete**, and **search**
+operations. **Update** can be simulated by searching for the old key, deleting it, and then inserting a node with the
+new key.
+
+Naturally, with insertions and deletions, the structure of the tree will change, and it may no longer satisfy the
+height-balanced property of the AVL tree. Without this property, we may lose our O(log(n)) run-time guarantee.
+Hence, we need some re-balancing operations. To do so, tree rotation operations are introduced. Below is one example.
+
+<div align="center">
+  <img src="../../../../../docs/assets/images/TreeRotation.png" width="40%">
+  <br>
+  Credits: CS2040s Lecture 10
+</div>
+
+Prof Seth explains it best! Go re-visit his slides (Lecture 10) for the operations :P <br>
+Here is a [link](https://www.youtube.com/watch?v=dS02_IuZPes&list=PLgpwqdiEMkHA0pU_uspC6N88RwMpt9rC8&index=9)
+to prof's lecture on trees. <br>
+_We may add a summary in the near future._
+
+## Application
+While AVL trees offer excellent lookup, insertion, and deletion times due to their strict balancing,
+the overhead of maintaining this balance can make them less preferred for applications
+where insertions and deletions are significantly more frequent than lookups. As a result, AVL trees are often
+over-shadowed in practical use by counterparts like RB-trees,
+which boast a relatively simple implementation and lower overhead, or B-trees, which are ideal for optimizing disk
+accesses in databases.
+
+That said, the AVL tree is conceptually simple and often used as the base template for further augmentation to tackle
+niche problems. Orthogonal Range Searching and Interval Trees can be implemented with some minor augmentation to
+an existing AVL tree.
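To make the height-balanced property and rotations concrete, here is a minimal illustrative sketch. It is not the repository's `AVLTree.java` (which is generic and tracks parents); the names are hypothetical, and only insertion with the four standard rebalancing cases is shown.

```java
public class AvlSketch {
    static class Node {
        int key;
        int height = 1;     // convention: a leaf has height 1
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static int height(Node n) { return n == null ? 0 : n.height; }

    static void updateHeight(Node n) {
        n.height = 1 + Math.max(height(n.left), height(n.right));
    }

    // balance factor: > 1 means left-heavy, < -1 means right-heavy
    static int balance(Node n) { return height(n.left) - height(n.right); }

    // right rotation about n: the left child becomes the new subtree root
    static Node rotateRight(Node n) {
        Node l = n.left;
        n.left = l.right;
        l.right = n;
        updateHeight(n);    // n is now lower in the tree, so update it first
        updateHeight(l);
        return l;
    }

    // left rotation about n: the right child becomes the new subtree root
    static Node rotateLeft(Node n) {
        Node r = n.right;
        n.right = r.left;
        r.left = n;
        updateHeight(n);
        updateHeight(r);
        return r;
    }

    // BST insert followed by rebalancing on the way back up,
    // restoring the height-balanced property at every ancestor
    static Node insert(Node n, int key) {
        if (n == null) {
            return new Node(key);
        }
        if (key < n.key) {
            n.left = insert(n.left, key);
        } else {
            n.right = insert(n.right, key);
        }
        updateHeight(n);
        if (balance(n) > 1) {                   // left-heavy
            if (key > n.left.key) {             // left-right case
                n.left = rotateLeft(n.left);
            }
            return rotateRight(n);              // left-left case
        }
        if (balance(n) < -1) {                  // right-heavy
            if (key < n.right.key) {            // right-left case
                n.right = rotateRight(n.right);
            }
            return rotateLeft(n);               // right-right case
        }
        return n;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k = 1; k <= 7; k++) {          // sorted input: worst case for a plain BST
            root = insert(root, k);
        }
        System.out.println(root.key + " " + root.height);   // prints "4 3"
    }
}
```

Inserting the sorted keys 1..7 into a plain BST would produce a degenerate chain of height 7; the rotations keep the tree at height 3 with 4 at the root, illustrating the O(log n) height guarantee described above.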

src/test/java/algorithms/sorting/radixSort/RadixSortTest.java

Lines changed: 7 additions & 0 deletions

@@ -31,14 +31,21 @@ public void test_radixSort_shouldReturnSortedArray() {
         int[] fourthResult = Arrays.copyOf(fourthArray, fourthArray.length);
         RadixSort.radixSort(fourthResult);
 
+        int[] fifthArray =
+            new int[] {157394, 93495939, 495839239, 485384, 38439958, 3948585, 39585939, 6000999, 111111111, 98162};
+        int[] fifthResult = Arrays.copyOf(fifthArray, fifthArray.length);
+        RadixSort.radixSort(fifthResult);
+
         Arrays.sort(firstArray);
         Arrays.sort(secondArray);
         Arrays.sort(thirdArray);
         Arrays.sort(fourthArray);
+        Arrays.sort(fifthArray);
 
         assertArrayEquals(firstResult, firstArray);
         assertArrayEquals(secondResult, secondArray);
         assertArrayEquals(thirdResult, thirdArray);
         assertArrayEquals(fourthResult, fourthArray);
+        assertArrayEquals(fifthResult, fifthArray);
     }
 }
