Commit c6bba36

docs: Complete Trie README
1 parent 806c15e commit c6bba36

3 files changed: +76 -1 lines changed

src/main/java/algorithms/patternFinding/README.md

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@ in text editors when searching for a pattern, in computational biology sequence
66
in NLP problems, and even for looking for file patterns for effective file management.
77
It is hence crucial that we develop an efficient algorithm.
88

9+
Typically, the algorithm returns a list of indices that denote the start of each occurrence of the pattern string.
10+
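As a minimal sketch of that return contract (not the KMP algorithm itself), the following uses the standard `String.indexOf` to collect every start index. The class name `PatternIndicesSketch`, the method name `findAll`, and the choice to return no matches for an empty pattern are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PatternIndicesSketch {
    /** Returns the start index of every (possibly overlapping) occurrence of pattern in text. */
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> starts = new ArrayList<>();
        if (pattern.isEmpty()) {
            return starts; // assumption: an empty pattern yields no matches
        }
        int from = text.indexOf(pattern);
        while (from != -1) {
            starts.add(from);
            // Resume one position later so that overlapping occurrences are found too.
            from = text.indexOf(pattern, from + 1);
        }
        return starts;
    }
}
```

For example, `PatternIndicesSketch.findAll("abababa", "aba")` yields the indices 0, 2 and 4.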
911
![KMP](../../../../../docs/assets/images/kmp.png)
1012
Image Source: GeeksforGeeks
1113

src/main/java/dataStructures/disjointSet/weightedUnion/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
11
# Quick Union
2-
2+
To skip ahead, jump to [weighted union](#Weighted-Union).
33
## Background
44
Here, we consider a completely different approach. We consider the use of trees. Every element can be
55
thought of as a tree node and starts off in its own component. Under this representation, it is likely
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
# Trie
2+
3+
## Background
4+
A trie (pronounced 'try'), also known as a prefix tree, is often used for handling textual data, especially in
5+
scenarios involving prefixes. In fact, the term 'trie' comes from the word 'retrieval'.
6+
7+
Like most trees, a trie is composed of nodes and edges. Unlike a binary tree, however, a trie node can have more than
8+
two children. A trie stores words by breaking them down into characters and organising these characters within a hierarchical
9+
tree. Each node represents a single character, except the root, which does not represent any character
10+
but acts as a starting point for all the words stored. A path in the trie, which is a sequence of connected nodes
11+
from the root, represents a prefix or a whole word. Shared prefixes of different words are represented by common paths.
12+
To distinguish complete words from prefixes within the trie, nodes are often implemented with a boolean flag.
13+
This flag is set to true for nodes that correspond to the final character of a complete word and false otherwise.
14+
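The node described above might be sketched as follows. This is a minimal illustration rather than the repository's implementation: the names `TrieNode`, `children` and `isEnd` are assumptions, as is the choice of a `HashMap` to hold the children.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * One node of a trie. The character a node represents is the key under
 * which its parent stores it, so the root represents no character.
 */
class TrieNode {
    // Children keyed by character; a node may have any number of them.
    final Map<Character, TrieNode> children = new HashMap<>();
    // True only if the path from the root to this node spells a complete word.
    boolean isEnd = false;
}
```

A fixed-size array of 26 slots is a common alternative when only lowercase English letters are stored; the map trades a little speed for support of arbitrary characters.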
15+
<div align="center">
16+
<img src="../../../../../docs/assets/images/Trie.png" alt="Trie" style="width:80%"/>
17+
<br/>
18+
<em>Source: <a href="https://java2blog.com/trie-data-structure-in-java/">Java2Blog</a></em>
19+
</div>
20+
21+
## Complexity Analysis
22+
Let the length of the longest word be _L_ and the number of words be _N_.
23+
24+
**Time**: O(_L_)
25+
This is an upper bound for typical trie operations like insert, delete, and search,
26+
since each operation iterates over at most every character of the word.
27+
28+
**Space**: O(_N*L_)
29+
In the worst case, we can have minimal overlap between words and every character of every word needs to be captured
30+
with a node.
31+
32+
A trie can be space-intensive. For a very large corpus of words, under the naive assumption that any character can
33+
occur in any position, another naive estimate of the size of the tree is O(_26^l_), where _l_ is
34+
the average length of a word. Note that 26 is used since there are only 26 letters in the English alphabet.
35+
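As a rough illustration of that estimate, assume a 26-letter alphabet and that every possible string of length up to _l_ is stored. The number of nodes in such a fully populated trie, root included, is then a geometric sum:

$$
\sum_{i=0}^{l} 26^{i} = \frac{26^{l+1} - 1}{25} = O(26^{l})
$$

For _l_ = 2 this gives 1 + 26 + 676 = 703 nodes.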
36+
## Operations
37+
Here we briefly discuss the typical operations supported by a trie.
38+
39+
### Insert
40+
Starting at the root, iterate over the characters and move down the trie to the respective nodes, creating missing
41+
ones in the process. Once the end of the word is reached, set the boolean flag of the node representing the last
42+
character to true.
43+
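The insert steps above can be sketched as follows. The class and member names (`InsertSketch`, `Node`, `root`, `isEnd`) are hypothetical and chosen only for this example.

```java
import java.util.HashMap;
import java.util.Map;

class InsertSketch {
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    final Node root = new Node();

    void insert(String word) {
        Node curr = root;
        for (char c : word.toCharArray()) {
            // Follow the child for c, creating it first if it is missing.
            curr = curr.children.computeIfAbsent(c, k -> new Node());
        }
        // curr is now the node for the last character: mark a complete word.
        curr.isEnd = true;
    }
}
```

Inserting "cat" and then "car" creates the shared path c, a only once; the two words diverge at the last character.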
44+
### Search
45+
Starting at the root, iterate over the characters and move down the trie to the respective nodes.
46+
If at any point the required character node is missing, return false. Otherwise, continue traversing until the end of
47+
the word and check if the current node has its boolean flag set to true. If not, the word is not captured in the trie.
48+
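A minimal sketch of search under the same assumptions (hypothetical names, map-based children) is shown here; `insert` is repeated only so that the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;

class SearchSketch {
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    final Node root = new Node();

    void insert(String word) {
        Node curr = root;
        for (char c : word.toCharArray()) {
            curr = curr.children.computeIfAbsent(c, k -> new Node());
        }
        curr.isEnd = true;
    }

    boolean search(String word) {
        Node curr = root;
        for (char c : word.toCharArray()) {
            curr = curr.children.get(c);
            if (curr == null) {
                return false; // required character node is missing
            }
        }
        // The path exists, but it is a stored word only if the flag is set.
        return curr.isEnd;
    }
}
```

With only "card" inserted, searching for "car" follows an existing path but returns false, because the node for 'r' is not flagged.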
49+
### Delete
50+
Starting at the root, iterate over the characters and move down the trie to the respective nodes.
51+
If at any point the required character node is missing, then the word does not exist in the trie and the process
52+
is terminated. Otherwise, continue traversing until the end of the word and reset the boolean flag of the current node
53+
to false.
54+
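The following sketch illustrates this simple form of delete, again with hypothetical names. Note that it only clears the flag and leaves every node in place.

```java
import java.util.HashMap;
import java.util.Map;

class DeleteSketch {
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    final Node root = new Node();

    void insert(String word) {
        Node curr = root;
        for (char c : word.toCharArray()) {
            curr = curr.children.computeIfAbsent(c, k -> new Node());
        }
        curr.isEnd = true;
    }

    boolean search(String word) {
        Node node = find(word);
        return node != null && node.isEnd;
    }

    void delete(String word) {
        Node node = find(word);
        if (node != null) {
            node.isEnd = false; // un-mark the word; the nodes themselves remain
        }
    }

    // Returns the node at the end of the path for s, or null if the path breaks.
    private Node find(String s) {
        Node curr = root;
        for (char c : s.toCharArray()) {
            curr = curr.children.get(c);
            if (curr == null) {
                return null;
            }
        }
        return curr;
    }
}
```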
55+
### Delete With Pruning
56+
Sometimes, a trie can become huge. Deleting old words would still leave redundant nodes hanging around. These can
57+
accumulate over time, so it is crucial we prune away unused nodes.
58+
59+
Continuing from the delete operation, trace the path back to the root and remove any redundant nodes found (nodes
60+
that do not mark the end of a word and have no children), stopping at the first node that is still needed.
61+
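One way to trace the path back is to let recursion do it: each call reports to its caller whether the node it handled has become redundant. This is a sketch under assumed names (`PruneSketch`, `deleteAndPrune`, `remove`), not necessarily how the repository implements it.

```java
import java.util.HashMap;
import java.util.Map;

class PruneSketch {
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
    }

    final Node root = new Node();

    void insert(String word) {
        Node curr = root;
        for (char c : word.toCharArray()) {
            curr = curr.children.computeIfAbsent(c, k -> new Node());
        }
        curr.isEnd = true;
    }

    void deleteAndPrune(String word) {
        remove(root, word, 0); // the root itself is never detached
    }

    // Returns true if node is now redundant and the caller should detach it.
    private boolean remove(Node node, String word, int i) {
        if (i == word.length()) {
            if (!node.isEnd) {
                return false; // word is not stored; leave the trie untouched
            }
            node.isEnd = false;
        } else {
            char c = word.charAt(i);
            Node child = node.children.get(c);
            if (child == null) {
                return false; // word is not stored
            }
            if (remove(child, word, i + 1)) {
                node.children.remove(c);
            }
        }
        // Redundant: does not end a word and has no children left.
        return !node.isEnd && node.children.isEmpty();
    }
}
```

With "car" and "card" stored, deleting "card" removes only the node for 'd', since the node for 'r' still marks the end of "car".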
62+
### Augmentation
63+
Just as orthogonal range searching can be done by augmenting a balanced BST, a trie can be augmented
64+
with additional variables captured in the TrieNode to speed up queries of a certain kind. For instance, if one wishes
65+
to quickly find out how many complete words stored in a trie have a given prefix, one can track at each node the
66+
number of nodes in its subtree (the node itself included) whose boolean flag is set to true.
67+
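A sketch of this augmentation is given below. The field `wordCount` and the method `countWithPrefix` are hypothetical names; the count at each node includes the node itself, and re-inserting an existing word is ignored so that the counts stay exact.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixCountSketch {
    static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
        int wordCount; // complete words whose path passes through or ends at this node
    }

    final Node root = new Node();

    void insert(String word) {
        Node existing = find(word);
        if (existing != null && existing.isEnd) {
            return; // already stored; do not count it twice
        }
        Node curr = root;
        curr.wordCount++;
        for (char c : word.toCharArray()) {
            curr = curr.children.computeIfAbsent(c, k -> new Node());
            curr.wordCount++;
        }
        curr.isEnd = true;
    }

    int countWithPrefix(String prefix) {
        Node node = find(prefix);
        return node == null ? 0 : node.wordCount;
    }

    // Returns the node at the end of the path for s, or null if the path breaks.
    private Node find(String s) {
        Node curr = root;
        for (char c : s.toCharArray()) {
            curr = curr.children.get(c);
            if (curr == null) {
                return null;
            }
        }
        return curr;
    }
}
```

The query then costs one walk down the prefix instead of a traversal of the whole subtree. A delete operation would need to decrement the same counters along the path.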
68+
## Notes
69+
### Applications
70+
- [auto-completion](https://medium.com/geekculture/how-to-effortlessly-implement-an-autocomplete-data-structure-in-javascript-using-a-trie-ea87a7d5a804)
71+
- [spell-checker](https://medium.com/@vithusha.ravirajan/enhancing-spell-checking-with-trie-data-structure-eb649ee0b1b5)
72+
- [prefix matching](https://medium.com/@shenchenlei/how-to-implement-a-prefix-matcher-using-trie-tree-1aea9a01013)
73+
- sorting large datasets of textual data
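
As an illustration of the last application, the sketch below inserts words into a trie whose children are kept in a `TreeMap` and then reads them back with a pre-order traversal, which yields lexicographic order. The names are hypothetical, and because the trie here stores only a boolean flag, duplicate words collapse into one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TrieSortSketch {
    static class Node {
        // TreeMap keeps children ordered by character.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isEnd;
    }

    /** Returns the distinct words in lexicographic (char-by-char) order. */
    static List<String> sortedDistinct(List<String> words) {
        Node root = new Node();
        for (String word : words) {
            Node curr = root;
            for (char c : word.toCharArray()) {
                curr = curr.children.computeIfAbsent(c, k -> new Node());
            }
            curr.isEnd = true;
        }
        List<String> out = new ArrayList<>();
        collect(root, new StringBuilder(), out);
        return out;
    }

    // Pre-order traversal: a word is emitted before any longer word that extends it.
    private static void collect(Node node, StringBuilder prefix, List<String> out) {
        if (node.isEnd) {
            out.add(prefix.toString());
        }
        for (Map.Entry<Character, Node> entry : node.children.entrySet()) {
            char c = entry.getKey();
            prefix.append(c);
            collect(entry.getValue(), prefix, out);
            prefix.deleteCharAt(prefix.length() - 1);
        }
    }
}
```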
