# Trie

## Background
A trie (pronounced 'try'), also known as a prefix tree, is often used for handling textual data, especially in
scenarios involving prefixes. In fact, the term 'trie' comes from the word 'retrieval'.

Like other trees, a trie is composed of nodes and edges. Unlike a binary tree, however, each node can have more than
two children. A trie stores words by breaking them down into characters and organising these characters within a
hierarchical tree. Each node represents a single character, except the root, which does not represent any character
but acts as the starting point for all the words stored. A path in the trie, which is a sequence of connected nodes
starting from the root, represents a prefix or a whole word. Different words that share a prefix share the same path.
To distinguish complete words from prefixes within the trie, nodes are often implemented with a boolean flag.
This flag is set to true for nodes that correspond to the final character of a complete word and false otherwise.
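
A minimal Python sketch of the node layout described above follows. The names `TrieNode`, `children`, `is_end`, and `count_nodes` are illustrative assumptions and do not come from an existing implementation. A dictionary keyed by character stands in for the child edges; a fixed array of 26 slots is a common alternative when only lowercase English letters are stored. The sketch builds the paths for "car" and "cat" by hand to show the shared prefix.

```python
class TrieNode:
    """One trie node (hypothetical layout); the root holds no character."""

    def __init__(self):
        self.children = {}   # character -> child TrieNode
        self.is_end = False  # True if the path from the root spells a complete word


def count_nodes(node):
    """Count this node and every node below it."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Build the paths for "car" and "cat" by hand: they share the prefix "ca".
root = TrieNode()
c = root.children.setdefault("c", TrieNode())
a = c.children.setdefault("a", TrieNode())
a.children.setdefault("r", TrieNode()).is_end = True
a.children.setdefault("t", TrieNode()).is_end = True

print(count_nodes(root))  # 5: root, c, a, r, t
```

The two words need only five nodes instead of seven, because the prefix "ca" is stored once.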

<div align="center">
 <img src="../../../../../docs/assets/images/Trie.png" alt="Trie" style="width:80%"/>
 <br/>
 <em>Source: <a href="https://java2blog.com/trie-data-structure-in-java/">Java2Blog</a></em>
</div>

## Complexity Analysis
Let the length of the longest word be _L_ and the number of words be _N_.
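
**Time**: O(_L_) for insert, delete, and search.
Each of these operations visits at most one node per character of the word. Its cost is therefore bounded by the
word's length and is independent of _N_. This assumes each child lookup takes constant time, e.g. via an array of
26 children or a hash map.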

**Space**: O(_N*L_).
In the worst case, words share few or no prefixes, so every character of every word needs its own node.

A trie can be space-intensive. Consider a very large corpus and make the naive assumption that any character is
equally likely to occur in any position. The trie then approaches a complete 26-ary tree, which gives another rough
estimate of O(_26^l_) nodes, where _l_ is the average length of a word. The base 26 comes from assuming that words use
only the 26 letters of the English alphabet.
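
To see where this estimate comes from, assume the trie is a complete 26-ary tree of depth _l_, i.e. every node above depth _l_ has all 26 children. The node count is then a geometric series dominated by its last term:

```math
\sum_{i=0}^{l} 26^{i} = \frac{26^{l+1} - 1}{25} = O\left(26^{l}\right)
```

For example, _l_ = 3 gives 1 + 26 + 676 + 17576 = 18279 nodes.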

## Operations
Here we briefly discuss the typical operations supported by a trie.

### Insert
Starting at the root, iterate over the characters and move down the trie to the respective nodes, creating missing
ones in the process. Once the end of the word is reached, set the boolean flag of the node representing the last
character to true.
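
The insert steps can be sketched in Python as follows. The sketch assumes a hypothetical `TrieNode` with a `children` dictionary and an `is_end` flag; these names are illustrative only.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child TrieNode
        self.is_end = False  # True if a complete word ends here


def insert(root, word):
    """Add `word` to the trie rooted at `root`, creating missing nodes."""
    node = root
    for ch in word:
        child = node.children.get(ch)
        if child is None:  # missing node: create it on the way down
            child = TrieNode()
            node.children[ch] = child
        node = child
    # The node for the last character marks the end of a complete word.
    node.is_end = True


root = TrieNode()
for w in ["car", "cart", "cat"]:
    insert(root, w)
print(sorted(root.children["c"].children["a"].children))  # ['r', 't']
```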

### Search
Starting at the root, iterate over the characters and move down the trie to the respective nodes.
If at any point the required character node is missing, return false. Otherwise, continue traversing until the end of
the word and check whether the current node has its boolean flag set to true. If not, the word is not stored in the
trie even though its path exists. For example, it may only be a prefix of a longer stored word.
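
Here is a matching Python sketch of search, under the same assumptions (hypothetical `TrieNode`, `insert`, and `search` names). The final check on `is_end` is what separates a stored word such as "cart" from a mere prefix such as "car".

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child TrieNode
        self.is_end = False  # True if a complete word ends here


def insert(root, word):
    node = root
    for ch in word:
        # Reuse the existing child or create it if missing.
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True


def search(root, word):
    """Return True only if `word` was inserted as a complete word."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:  # required character node is missing
            return False
    # Reaching the node is not enough: it may only be a prefix of longer words.
    return node.is_end


root = TrieNode()
insert(root, "cart")
print(search(root, "cart"), search(root, "car"), search(root, "cab"))  # True False False
```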

### Delete
Starting at the root, iterate over the characters and move down the trie to the respective nodes.
If at any point the required character node is missing, then the word does not exist in the trie and the process
is terminated. Otherwise, continue traversing until the end of the word and set the boolean flag of the current node
to false. The nodes along the path are left in place.
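
A Python sketch of this flag-only delete follows, assuming the same hypothetical `TrieNode`. Returning a boolean to report whether the word was present is an illustrative choice and is not part of the description above.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child TrieNode
        self.is_end = False  # True if a complete word ends here


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True


def search(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end


def delete(root, word):
    """Unmark `word` as a complete word; the nodes themselves are kept.

    Returns True if the word was stored (an illustrative choice).
    """
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:  # word does not exist: terminate early
            return False
    was_stored = node.is_end
    node.is_end = False  # only the flag changes
    return was_stored


root = TrieNode()
for w in ["car", "cart"]:
    insert(root, w)
delete(root, "car")
print(search(root, "car"), search(root, "cart"))  # False True
```

The node for "r" survives the deletion because "cart" still passes through it. Without pruning, nodes would also stay behind for words that no longer have any stored extension.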

### Delete With Pruning
Over time, a trie can become huge. Since the delete operation above only clears a flag, deleted words still leave
redundant nodes hanging around. These can accumulate, so it is important to prune away unused nodes.

Continuing from the delete operation, trace the path back towards the root and remove any redundant nodes, i.e. nodes
that do not mark the end of a word and have no child nodes. The root itself is never removed. The traversal can stop
at the first node that is still needed, because all of that node's ancestors are then needed too.
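
One way to sketch delete with pruning in Python, assuming the same hypothetical `TrieNode`, is to record the nodes visited on the way down in a list. The list is then walked backwards to remove redundant nodes. This iterative path list is an illustrative choice; a recursive version works as well.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child TrieNode
        self.is_end = False  # True if a complete word ends here


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True


def delete_and_prune(root, word):
    """Unmark `word`, then remove nodes on its path that became redundant.

    Returns True if the word was stored (an illustrative choice).
    """
    # Record the path so it can be traced back towards the root afterwards.
    path = [root]
    for ch in word:
        child = path[-1].children.get(ch)
        if child is None:  # word is not in the trie
            return False
        path.append(child)
    if not path[-1].is_end:
        return False
    path[-1].is_end = False

    # path[i] is the node for word[i - 1]; the root (path[0]) is never removed.
    for i in range(len(word), 0, -1):
        node = path[i]
        if node.is_end or node.children:
            break  # still needed, and so is every node above it
        del path[i - 1].children[word[i - 1]]
    return True


root = TrieNode()
for w in ["car", "cart", "cat"]:
    insert(root, w)
delete_and_prune(root, "cart")  # removes the now-unused "t" below "car"
delete_and_prune(root, "car")   # "r" is redundant too; "ca" is kept for "cat"
print(sorted(root.children["c"].children["a"].children))  # ['t']
```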

### Augmentation
Just as Orthogonal Range Searching can be done by augmenting the usual balanced BSTs, a trie can be augmented
with additional fields stored in each TrieNode to speed up queries of a certain kind. For instance, to quickly find
out how many complete words stored in a trie have a given prefix, each node can track the number of nodes in its
subtree (including itself) whose boolean flag is set to true. These counts must then be updated along the path on
every insert and delete.
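
Below is a Python sketch of the prefix-count augmentation. It assumes a hypothetical `word_count` field on each node that counts the complete words in its subtree, including a word ending at the node itself. Only insert and the prefix query are shown; a delete would have to decrement the same counters along the path.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child TrieNode
        self.is_end = False  # True if a complete word ends here
        # Augmented field: complete words in this subtree, including this node.
        self.word_count = 0


def insert(root, word):
    """Insert `word` and update word_count on every node along its path."""
    path = [root]
    for ch in word:
        path.append(path[-1].children.setdefault(ch, TrieNode()))
    if path[-1].is_end:
        return  # already stored: counts must not be incremented twice
    path[-1].is_end = True
    for node in path:  # root included, so root.word_count is the total
        node.word_count += 1


def count_prefix(root, prefix):
    """Return how many stored words start with `prefix`, in O(len(prefix))."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return 0
    return node.word_count


root = TrieNode()
for w in ["car", "cart", "cat", "dog", "car"]:  # the repeated "car" is ignored
    insert(root, w)
print(count_prefix(root, "ca"), count_prefix(root, "car"), count_prefix(root, "x"))  # 3 2 0
```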

## Notes
### Applications
- [auto-completion](https://medium.com/geekculture/how-to-effortlessly-implement-an-autocomplete-data-structure-in-javascript-using-a-trie-ea87a7d5a804)
- [spell-checker](https://medium.com/@vithusha.ravirajan/enhancing-spell-checking-with-trie-data-structure-eb649ee0b1b5)
- [prefix matching](https://medium.com/@shenchenlei/how-to-implement-a-prefix-matcher-using-trie-tree-1aea9a01013)
- sorting large datasets of textual data
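
To illustrate the auto-completion and sorting uses listed above, here is a Python sketch. It is not taken from the linked articles, and the names `TrieNode`, `insert`, and `words_with_prefix` are hypothetical. It collects every stored word that starts with a prefix by running a depth-first traversal that visits children in alphabetical order. With an empty prefix, it returns all stored words in sorted order.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> child TrieNode
        self.is_end = False  # True if a complete word ends here


def insert(root, word):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.is_end = True


def words_with_prefix(root, prefix):
    """Return all stored words starting with `prefix`, in lexicographic order."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    results = []

    def dfs(current, chars):
        if current.is_end:
            results.append("".join(chars))  # a word comes before its extensions
        for ch in sorted(current.children):  # visit children alphabetically
            chars.append(ch)
            dfs(current.children[ch], chars)
            chars.pop()

    dfs(node, list(prefix))
    return results


root = TrieNode()
for w in ["cat", "dog", "car", "cart", "do"]:
    insert(root, w)
print(words_with_prefix(root, "ca"))  # ['car', 'cart', 'cat']
print(words_with_prefix(root, ""))    # ['car', 'cart', 'cat', 'do', 'dog']
```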