4ndrelim
diff --git a/‎README.md‎
Lines changed: 3 additions & 3 deletions b/‎README.md‎
diff --git a/‎docs/assets/images/Trie.png‎
454 KB b/‎docs/assets/images/Trie.png‎
diff --git a/‎src/main/java/algorithms/patternFinding/README.md‎
Lines changed: 2 additions & 0 deletions b/‎src/main/java/algorithms/patternFinding/README.md‎
diff --git a/‎src/main/java/dataStructures/disjointSet/weightedUnion/README.md‎
Lines changed: 1 addition & 1 deletion b/‎src/main/java/dataStructures/disjointSet/weightedUnion/README.md‎
diff --git a/‎src/main/java/dataStructures/trie/README.md‎
Lines changed: 74 additions & 0 deletions b/‎src/main/java/dataStructures/trie/README.md‎
diff --git a/‎src/main/java/dataStructures/trie/Trie.java‎
Lines changed: 146 additions & 65 deletions b/‎src/main/java/dataStructures/trie/Trie.java‎
@@ -34,7 +34,7 @@ Gradle is used for development.
   - [Monotonic Queue](src/main/java/dataStructures/queue/monotonicQueue)
 - Segment Tree
 - [Stack](src/main/java/dataStructures/stack)
-- Trie
+- [Trie](src/main/java/dataStructures/trie)
 
 ## Algorithms
 - [Bubble Sort](src/main/java/algorithms/sorting/bubbleSort)
@@ -81,9 +81,9 @@ Gradle is used for development.
     * [Binary search tree](src/main/java/dataStructures/binarySearchTree)
     * AVL-tree
     * Orthogonal Range Searching
-    * Trie
+    * [Trie](src/main/java/dataStructures/trie)
     * B-Tree
-    * * Red-Black Tree (Not covered in CS2040s but useful!)
+    * Red-Black Tree (Not covered in CS2040s but useful!)
     * Kd-tree (**WIP**)
     * Interval tree (**WIP**)
 5. [Binary Heap](src/main/java/dataStructures/heap) (Max heap)
 
@@ -6,6 +6,8 @@ in text editors when searching for a pattern, in computational biology sequence
 in NLP problems, and even for looking for file patterns for effective file management.
 It is hence crucial that we develop an efficient algorithm.
 
+Typically, the algorithm returns a list of indices that denote the start of each occurrence of the pattern string.
+
 ![KMP](../../../../../docs/assets/images/kmp.png)
 Image Source: GeeksforGeeks
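
To make the return value described above concrete, here is a minimal sketch of a pattern search that reports the start index of every occurrence. It is only an illustration: the class and method names are hypothetical, it relies on `String.indexOf` rather than KMP, and treating an empty pattern as "no matches" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not the repository's KMP implementation.
class PatternIndicesSketch {
    /** Returns the start index of every (possibly overlapping) occurrence of pattern in text. */
    static List<Integer> findOccurrences(String text, String pattern) {
        List<Integer> starts = new ArrayList<>();
        if (pattern.isEmpty()) {
            return starts; // assumption: an empty pattern yields no matches
        }
        int from = text.indexOf(pattern);
        while (from != -1) {
            starts.add(from);
            from = text.indexOf(pattern, from + 1); // advance by one so overlapping matches are reported
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(findOccurrences("abababa", "aba")); // prints [0, 2, 4]
    }
}
```

Any algorithm with this contract, KMP included, should produce the same list; only the running time differs.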
 
 
@@ -1,5 +1,5 @@
 # Quick Union
-
+To jump straight to weighted union, see [Weighted Union](#Weighted-Union).
 ## Background
 Here, we consider a completely different approach. We consider the use of trees. Every element can be
 thought of as a tree node and starts off in its own component. Under this representation, it is likely
 
@@ -0,0 +1,74 @@
+# Trie
+
+## Background
+A trie (pronounced 'try'), also known as a prefix tree, is often used for handling textual data, especially in 
+scenarios involving prefixes. In fact, the term 'trie' comes from the word 'retrieval'.
+
+Like most trees, a trie is composed of nodes and edges. But, unlike binary trees, its nodes can have more than 
+2 children. A trie stores words by breaking them down into characters and organising these characters within a hierarchical 
+tree. Each node represents a single character, except the root, which does not represent any character 
+but acts as a starting point for all the words stored. A path in the trie, which is a sequence of connected nodes 
+from the root, represents a prefix or a whole word. Shared prefixes of different words are represented by common paths.
+
+To distinguish complete words from prefixes within the trie, nodes are often implemented with a boolean flag. 
+This flag is set to true for nodes that correspond to the final character of a complete word and false otherwise.
+
+<div align="center">
+    <img src="../../../../../docs/assets/images/Trie.png" alt="Trie" style="width:80%"/>
+    <br/>
+    <em>Source: <a href="https://java2blog.com/trie-data-structure-in-java/">Java2Blog</a></em>
+</div>
+
+## Complexity Analysis
+Let the length of the longest word be _L_ and the number of words be _N_.
+
+**Time**: O(_L_)
+This is an upper bound for typical trie operations like insert, delete, and search, 
+since in the worst case every character of the word is iterated over.
+
+**Space**: O(_N*L_)
+In the worst case, we can have minimal overlap between words and every character of every word needs to be captured
+with a node.
+
+A trie can be space-intensive. For a very large corpus of words, under the naive assumption that every character is 
+equally likely to occur in any position, another naive estimate of the size of the tree is O(_26^l_), where _l_ is 
+the average length of a word. Note, 26 is used since there are only 26 letters in the English alphabet.
+
+## Operations 
+Here we briefly discuss the typical operations supported by a trie. 
+
+### Insert
+Starting at the root, iterate over the characters and move down the trie to the respective nodes, creating missing
+ones in the process. Once the end of the word is reached, the node representing the last character has its 
+boolean flag set to true.
+
+### Search
+Starting at the root, iterate over the characters and move down the trie to the respective nodes. 
+If at any point the required character node is missing, return false. Otherwise, continue traversing until the end of
+the word and check if the current node has its boolean flag set to true. If not, the word is not captured in the trie.
+
+### Delete
+Starting at the root, iterate over the characters and move down the trie to the respective nodes.
+If at any point the required character node is missing, then the word does not exist in the trie and the process 
+is terminated. Otherwise, continue traversing until the end of the word and set the boolean flag of the current node 
+to false.
+
+### Delete With Pruning
+Sometimes, a trie can become huge. Deleting old words would still leave redundant nodes hanging around. These can 
+accumulate over time, so it is crucial we prune away unused nodes.
+
+Continuing from the delete operation, trace the path back towards the root and remove every redundant node found 
+along the way (a node that does not mark the end of a word and has no children). Stop at the first node that is still needed.
+
+### Augmentation
+Just like how Orthogonal Range Searching can be done by augmenting the usual balanced BSTs, a trie can be augmented 
+with additional variables captured in the TrieNode to speed up queries of a certain kind. For instance, if one wishes
+to quickly find out how many complete words stored in a trie have a given prefix, one can track the number of 
+descendant nodes whose boolean flag is set to true at each node.
+
+## Notes
+### Applications
+- [auto-completion](https://medium.com/geekculture/how-to-effortlessly-implement-an-autocomplete-data-structure-in-javascript-using-a-trie-ea87a7d5a804)
+- [spell-checker](https://medium.com/@vithusha.ravirajan/enhancing-spell-checking-with-trie-data-structure-eb649ee0b1b5)
+- [prefix matching](https://medium.com/@shenchenlei/how-to-implement-a-prefix-matcher-using-trie-tree-1aea9a01013)
+- sorting large datasets of textual data
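
The prefix-count augmentation described in the README's Augmentation section could look roughly like the sketch below. It is not part of this PR: the class `PrefixCountTrie`, the `wordCount` field, and the choice to reject duplicate inserts are all assumptions made for illustration, and deletion (which would have to decrement the counters along the path) is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: class, field, and method names are illustrative and not part of Trie.java.
class PrefixCountTrie {
    private static class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean isEnd;
        int wordCount; // number of complete words ending at or below this node
    }

    private final Node root = new Node();

    /** Inserts a word; returns false and changes nothing if the word is already stored. */
    boolean insert(String word) {
        if (contains(word)) {
            return false; // assumption: duplicates are ignored so the counters stay exact
        }
        Node trav = root;
        trav.wordCount++;
        for (char c : word.toLowerCase().toCharArray()) {
            trav = trav.children.computeIfAbsent(c, k -> new Node());
            trav.wordCount++; // every node on the path gains one word in its subtree
        }
        trav.isEnd = true;
        return true;
    }

    boolean contains(String word) {
        Node node = walk(word);
        return node != null && node.isEnd;
    }

    /** Answers the query in O(prefix length) by reading the counter instead of traversing the subtree. */
    int countWordsWithPrefix(String prefix) {
        Node node = walk(prefix);
        return node == null ? 0 : node.wordCount;
    }

    // Follows s from the root; returns null if the path does not exist.
    private Node walk(String s) {
        Node trav = root;
        for (char c : s.toLowerCase().toCharArray()) {
            trav = trav.children.get(c);
            if (trav == null) {
                return null;
            }
        }
        return trav;
    }

    public static void main(String[] args) {
        PrefixCountTrie trie = new PrefixCountTrie();
        for (String w : new String[] {"car", "card", "care", "cat", "dog"}) {
            trie.insert(w);
        }
        System.out.println(trie.countWordsWithPrefix("car")); // prints 3
        System.out.println(trie.countWordsWithPrefix("ca"));  // prints 4
    }
}
```

The trade-off is one extra integer per node and a little more bookkeeping on every update, in exchange for not having to walk the whole subtree on each count query.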
@@ -1,12 +1,12 @@
 package dataStructures.trie;
 
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
 /**
- * Implementation of Trie structure.
- * Supports the follwing common operations (see below for doc):
- * insert(String word)
- * search(String word)
- * startsWith(String prefix)
- * prune(String word)
+ * Implementation of a Trie; Here we consider strings (not case-sensitive)
  */
 public class Trie {
     private final TrieNode root;
@@ -16,98 +16,179 @@ public Trie() {
     }
 
     /**
-     * Insert a word into the trie; converts word to
-     * to lower-case characters before insertion.
-     *
-     * @param word the string to be inserted
+     * TrieNode implementation. Note, fields are set to public for decreased verbosity.
+     */
+    private class TrieNode {
+        // CHECKSTYLE:OFF: VisibilityModifier
+        public Map<Character, TrieNode> children; // or array of size 26 (assume not case-sensitive) to denote each char
+        // CHECKSTYLE:OFF: VisibilityModifier
+        public boolean isEnd; // a marker to indicate whether the path from the root to this node forms a known word
+
+        public TrieNode() {
+            children = new HashMap<Character, TrieNode>();
+            isEnd = false;
+        }
+    }
+
+    /**
+     * Inserts a word into the trie.
+     * @param word
      */
     public void insert(String word) {
-        word = word.toLowerCase();
-        System.out.printf("~~~~~~~Inserting '%s'~~~~~~~%n", word);
-        TrieNode node = root;
+        word = word.toLowerCase(); // ignore case-sensitivity
+        TrieNode trav = root;
         for (int i = 0; i < word.length(); i++) {
             char curr = word.charAt(i);
-            if (!node.containsKey(curr)) {
-                node.insertKey(curr);
+            if (!trav.children.containsKey(curr)) {
+                trav.children.put(curr, new TrieNode()); // recall, the edges represent the characters
             }
-            node = node.getNext(curr); // go to the subsequent node!
+            trav = trav.children.get(curr);
         }
-        node.makeEnd();
+        trav.isEnd = true; // set word
     }
 
     /**
-     * Search for a word (converted to lower-case) in the trie.
-     *
-     * @param word the string to look for
-     * @return boolean representing whether the word was found
+     * Searches for a word in the trie.
+     * @param word
+     * @return true if the word is found, false otherwise.
      */
     public boolean search(String word) {
-        word.toLowerCase();
-        System.out.printf("~~~~~~~Searching '%s'~~~~~~~%n", word);
-        TrieNode node = root;
+        word = word.toLowerCase();
+        TrieNode trav = root;
         for (int i = 0; i < word.length(); i++) {
             char curr = word.charAt(i);
-            if (node.containsKey(curr)) {
-                node = node.getNext(curr);
-            } else {
+            if (!trav.children.containsKey(curr)) {
                 return false;
             }
+            trav = trav.children.get(curr);
         }
-        return node.isEnd();
+        return trav.isEnd;
     }
 
     /**
-     * Search for a prefix (converted to lower-case) in the trie.
-     * Note: very similar in implementation to search method
-     * except the search here does not need to look for end flag
-     *
-     * @param prefix the string to look for
-     * @return boolean representing whether the prefix exists
+     * Deletes a word from the trie.
+     * @param word
      */
-    public boolean startsWith(String prefix) {
-        prefix = prefix.toLowerCase();
-        System.out.printf("~~~~~~~Looking for prefix '%s'~~~~~~~%n", prefix);
-        TrieNode node = root;
-        for (int i = 0; i < prefix.length(); i++) {
-            char curr = prefix.charAt(i);
-            if (node.containsKey(curr)) {
-                node = node.getNext(curr);
-            } else {
-                return false;
+    public void delete(String word) {
+        word = word.toLowerCase();
+        TrieNode trav = root;
+        for (int i = 0; i < word.length(); i++) {
+            char curr = word.charAt(i);
+            if (!trav.children.containsKey(curr)) {
+                return; // word does not exist in trie, so just return
             }
+            trav = trav.children.get(curr);
         }
-        return true;
+        trav.isEnd = false; // remove word from being tracked
     }
 
+    // ABOVE ARE STANDARD METHODS OF A TYPICAL TRIE IMPLEMENTATION
+    // BELOW IMPLEMENTS TWO MORE COMMON / USEFUL METHODS FOR TRIE; IN PARTICULAR, NOTE THE PRUNING METHOD
+
     /**
-     * Removes a word from the trie by toggling the end flag;
-     * if any of the end nodes (next nodes relative to current)
-     * do not hold further characters, repetitively prune the trie
-     * by removing these nodes from the hashmap of the current node.
-     * Note: This method is useful in optimizing searching for a set of known words
-     * especially when the data to be traversed has words that are similar in spelling/
-     * repeated words which might have been previously found.
-     *
-     * @param word the word to be removed
+     * Deletes a word from the trie, and also prunes redundant nodes. This is useful in keeping the trie compact.
+     * @param word
      */
-    public void prune(String word) {
-        word = word.toLowerCase();
-        System.out.printf("~~~~~~~Removing '%s'~~~~~~~%n", word);
-        TrieNode node = root;
-        TrieNode[] track = new TrieNode[word.length()];
+    public void deleteAndPrune(String word) {
+        word = word.toLowerCase(); // insert() stores lower-cased words, so normalise here too
+        List<TrieNode> trackNodes = new ArrayList<>();
+        TrieNode trav = root;
         for (int i = 0; i < word.length(); i++) {
             char curr = word.charAt(i);
-            track[i] = node;
-            node = node.getNext(curr);
+            if (!trav.children.containsKey(curr)) {
+                return; // word does not exist in trie
+            }
+            trackNodes.add(trav);
+            trav = trav.children.get(curr);
         }
-        node.removeEnd();
+        trav.isEnd = false;
+
+        // now we start pruning
         for (int i = word.length() - 1; i >= 0; i--) {
             char curr = word.charAt(i);
-            if (track[i].getNext(curr).getCharacters().size() > 0) {
-                break; // done further nodes are required
+            TrieNode nodeBeforeCurr = trackNodes.get(i);
+            TrieNode nextNode = nodeBeforeCurr.children.get(curr);
+            if (!nextNode.isEnd && nextNode.children.size() == 0) { // node essentially doesn't track anything, remove
+                nodeBeforeCurr.children.remove(curr);
+            } else { // node ends another word or still has children; it is still useful, so stop pruning upwards
+                break;
+            }
+        }
+    }
+
+    /**
+     * Find all words with the specified prefix.
+     * @param prefix
+     * @return a list of words.
+     */
+    public List<String> wordsWithPrefix(String prefix) {
+        prefix = prefix.toLowerCase(); // insert() stores lower-cased words, so normalise here too
+        List<String> ret = new ArrayList<>();
+        TrieNode trav = root;
+        for (int i = 0; i < prefix.length(); i++) {
+            char curr = prefix.charAt(i);
+            if (!trav.children.containsKey(curr)) {
+                return ret; // no words with this prefix
+            }
+            trav = trav.children.get(curr);
+        }
+        List<StringBuilder> allSuffix = getAllSuffixFromNode(trav);
+        for (StringBuilder sb : allSuffix) {
+            ret.add(prefix + sb.toString());
+        }
+        return ret;
+    }
+
+    /**
+     * Find all words in the trie.
+     * @return a list of words.
+     */
+    public List<String> getAllWords() {
+        List<StringBuilder> allWords = getAllSuffixFromNode(root);
+        List<String> ret = new ArrayList<>();
+        for (StringBuilder sb : allWords) {
+            ret.add(sb.toString());
+        }
+        return ret;
+    }
+
+    /**
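+     * Helper method to collect every suffix that completes a stored word when read from the given node.
+     * @param node the node to start collecting from
+     * @return a list of suffixes; contains an empty suffix if the node itself marks the end of a word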
+     */
+    private List<StringBuilder> getAllSuffixFromNode(TrieNode node) {
+        List<StringBuilder> ret = new ArrayList<>();
+        if (node.isEnd) {
+            ret.add(new StringBuilder(""));
+        }
+        for (char c : node.children.keySet()) {
+            TrieNode nextNode = node.children.get(c);
+            List<StringBuilder> allSuffix = getAllSuffixFromNode(nextNode);
+            for (StringBuilder sb : allSuffix) {
+                sb.insert(0, c); // insert c at the front
+                ret.add(sb);
+            }
+        }
+        return ret;
+    }
+
+    // BELOW IS A METHOD THAT IS USED FOR TESTING PURPOSES ONLY
+
+    /**
+     * Helper method for testing purposes.
+     * @param str
+     * @param pos
+     * @return
+     */
+    public Boolean checkNodeExistsAtPosition(String str, Integer pos) {
+        TrieNode trav = root;
+        for (int i = 0; i < pos; i++) {
+            char c = str.charAt(i);
+            if (trav.children.containsKey(c)) {
+                trav = trav.children.get(c);
             } else {
-                track[i].getCharacters().remove(curr);
+                return false;
             }
         }
+        return true;
     }
 }
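
A possible alternative to `getAllSuffixFromNode`, offered as a sketch rather than a required change: `sb.insert(0, c)` shifts the whole builder on every call, whereas a depth-first traversal that appends to one shared `StringBuilder` and backtracks avoids both the shifting and the intermediate lists. The class below is self-contained and hypothetical (`PrefixCollectorSketch`, `collect`, and the use of `TreeMap` for a deterministic output order are assumptions, not part of this PR).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a stripped-down trie used only to show the backtracking traversal.
class PrefixCollectorSketch {
    private static class Node {
        // TreeMap is used here only so that results come out in a deterministic (sorted) order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isEnd;
    }

    private final Node root = new Node();

    void insert(String word) {
        Node trav = root;
        for (char c : word.toLowerCase().toCharArray()) {
            trav = trav.children.computeIfAbsent(c, k -> new Node());
        }
        trav.isEnd = true;
    }

    List<String> wordsWithPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        String lowered = prefix.toLowerCase();
        Node trav = root;
        for (char c : lowered.toCharArray()) {
            trav = trav.children.get(c);
            if (trav == null) {
                return out; // no stored word starts with this prefix
            }
        }
        collect(trav, new StringBuilder(lowered), out);
        return out;
    }

    // Depth-first traversal: path always holds the characters from the root to the current node.
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isEnd) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> entry : node.children.entrySet()) {
            char c = entry.getKey();
            path.append(c);                        // descend
            collect(entry.getValue(), path, out);
            path.deleteCharAt(path.length() - 1);  // backtrack
        }
    }

    public static void main(String[] args) {
        PrefixCollectorSketch trie = new PrefixCollectorSketch();
        for (String w : new String[] {"car", "card", "care", "cat", "dog"}) {
            trie.insert(w);
        }
        System.out.println(trie.wordsWithPrefix("car")); // prints [car, card, care]
    }
}
```

With a `HashMap`, as in the PR, the same traversal works but the order of the returned words is unspecified.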