| 1 | +<h1 align="center">Huffman - Encoding</h1> |
| 2 | + |
| 3 | +## Problem Statement |
| 4 | + |
| 5 | +**Problem URL :** [Huffman Encoding](https://www.geeksforgeeks.org/problems/huffman-encoding3345/1?itm_source=geeksforgeeks&itm_medium=article&itm_campaign=practice_card) |
| 6 | + |
| 7 | + |
| 8 | + |
| 9 | +### Problem Explanation |
| 10 | +The task is to implement **Huffman encoding**, a lossless data compression algorithm. Given a set of characters and their frequencies, we build a binary tree (the **Huffman tree**) that yields an optimal prefix-free code: characters that occur more often get shorter codes, and rarer characters get longer ones. The solution below returns the codes in the order in which a preorder traversal reaches the leaves. |
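To make the benefit concrete, here is a small worked calculation. It assumes three characters with frequencies 5, 9 and 12 (the values used in the example later in this write-up) and writes $\ell_i$ for the code length of character $i$, a symbol introduced here only for illustration. Huffman coding on these frequencies gives code lengths 2, 2 and 1, whereas a fixed-length code for three characters needs 2 bits each.

```math
\begin{aligned}
\text{total bits} &= \sum_i f_i \,\ell_i \\
\text{Huffman} &= 5 \cdot 2 + 9 \cdot 2 + 12 \cdot 1 = 40 \\
\text{fixed 2-bit} &= (5 + 9 + 12) \cdot 2 = 52
\end{aligned}
```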
| 11 | + |
| 12 | +### Steps Involved: |
| 13 | + |
| 14 | +1. **Create Nodes for Each Character**: For each character, create a node with its frequency. |
| 15 | +2. **Build a Min-Heap**: Build a **min-heap** (priority queue) where the nodes with the smallest frequencies are at the top. |
| 16 | +3. **Build the Huffman Tree**: Combine the two nodes with the smallest frequencies into a new node, and push this new node back into the heap. Repeat this process until there is only one node left in the heap (the root of the Huffman tree). |
| 17 | +4. **Generate Codes**: Traverse the Huffman tree, assigning binary codes ('0' for left and '1' for right) to each character. |
| 18 | + |
| 19 | +## Problem Solution |
| 20 | +```cpp |
| 21 | +// Node of the Huffman tree. Assumes <queue>, <string>, <vector> and "using namespace std;" are provided by the judge's driver code |
| 22 | +class Node { |
| 23 | + public: |
| 24 | + int data; // The frequency of the character (or combined frequencies) |
| 25 | + Node* right; // Pointer to the right child node |
| 26 | + Node* left; // Pointer to the left child node |
| 27 | + |
| 28 | + // Constructor to initialize a node with a given frequency |
| 29 | + Node(int d) { |
| 30 | + this -> data = d; // Set the frequency (data) of the node |
| 31 | + left = NULL; // Initialize the left child as NULL |
| 32 | + right = NULL; // Initialize the right child as NULL |
| 33 | + } |
| 34 | +}; |
| 35 | + |
| 36 | +// Comparator class for the priority queue to create a min-heap |
| 37 | +class cmp { |
| 38 | + public: |
| 39 | + bool operator()(Node* a, Node* b) { |
| 40 | + // Return true if the frequency of 'a' is greater than 'b', |
| 41 | + // which helps in maintaining a min-heap (lowest frequency at top) |
| 42 | + return a -> data > b -> data; |
| 43 | + } |
| 44 | +}; |
| 45 | + |
| 46 | +// Solution class that contains the logic for building Huffman codes |
| 47 | +class Solution { |
| 48 | + public: |
| 49 | + // Helper function to traverse the Huffman tree and store codes |
| 50 | + void traverse(Node* root, vector<string>& ans, string temp) { |
| 51 | + // Base case: If it's a leaf node, add the generated code to the answer |
| 52 | + if(root -> left == NULL && root -> right == NULL) { |
| 53 | + ans.push_back(temp); |
| 54 | + return; |
| 55 | + } |
| 56 | + |
| 57 | + // Recursively traverse the left subtree and append '0' to the code |
| 58 | + traverse(root -> left, ans, temp + "0"); |
| 59 | + |
| 60 | + // Recursively traverse the right subtree and append '1' to the code |
| 61 | + traverse(root -> right, ans, temp + "1"); |
| 62 | + } |
| 63 | + |
| 64 | + // Main function to build Huffman codes |
| 65 | + vector<string> huffmanCodes(string S, vector<int> f, int N) { |
| 66 | + // Priority queue to store nodes of the Huffman tree; min-heap based on frequency |
| 67 | + priority_queue<Node*, vector<Node*>, cmp> pq; |
| 68 | + |
| 69 | + // Step 1: Insert all the nodes into the priority queue (based on frequency) |
| 70 | + for(int i = 0; i < N; i++) { |
| 71 | + Node* temp = new Node(f[i]); // Create a new node with frequency f[i] |
| 72 | + pq.push(temp); // Push the node into the priority queue |
| 73 | + } |
| 74 | + |
| 75 | + // Step 2: Build the Huffman tree by combining the two nodes with the smallest frequencies |
| 76 | + while(pq.size() > 1) { |
| 77 | + // Extract the two nodes with the smallest frequencies |
| 78 | + Node* left = pq.top(); |
| 79 | + pq.pop(); |
| 80 | + |
| 81 | + Node* right = pq.top(); |
| 82 | + pq.pop(); |
| 83 | + |
| 84 | + // Create a new internal node with a frequency equal to the sum of the two nodes' frequencies |
| 85 | + Node* newNode = new Node(left -> data + right -> data); |
| 86 | + |
| 87 | + // Set the left and right children of the new node |
| 88 | + newNode -> left = left; |
| 89 | + newNode -> right = right; |
| 90 | + |
| 91 | + // Push the new node back into the priority queue |
| 92 | + pq.push(newNode); |
| 93 | + } |
| 94 | + |
| 95 | + // The final node in the priority queue is the root of the Huffman tree |
| 96 | + Node* root = pq.top(); |
| 97 | + |
| 98 | + // Step 3: Traverse the Huffman tree to generate the Huffman codes |
| 99 | + vector<string> ans; // This will store the final Huffman codes |
| 100 | + string temp = ""; // Temporary string to build the code for each character |
| 101 | + |
| 102 | + // Call the helper function to traverse the tree and build the codes |
| 103 | + traverse(root, ans, temp); |
| 104 | + |
| 105 | + // Return the final Huffman codes |
| 106 | + return ans; |
| 107 | + } |
| 108 | +}; |
| 109 | + |
| 110 | +``` |
| 111 | + |
| 112 | +## Problem Solution Explanation |
| 113 | + |
| 114 | +```cpp |
| 115 | +// Define a Node class to represent each node in the Huffman tree |
| 116 | +class Node { |
| 117 | + public: |
| 118 | + int data; // The frequency of the character (or combined frequencies) |
| 119 | + Node* right; // Pointer to the right child node |
| 120 | + Node* left; // Pointer to the left child node |
| 121 | + |
| 122 | + // Constructor to initialize a node with a given frequency |
| 123 | + Node(int d) { |
| 124 | + this -> data = d; // Set the frequency (data) of the node |
| 125 | + left = NULL; // Initialize the left child as NULL |
| 126 | + right = NULL; // Initialize the right child as NULL |
| 127 | + } |
| 128 | +}; |
| 129 | +``` |
| 130 | + |
| 131 | +- **Node Class**: Each node holds a frequency (`data`) and pointers to its left and right children. The constructor stores the frequency and sets both child pointers to `NULL`. The node does not store the character itself; only the frequencies shape the tree. |
| 132 | + |
| 133 | +```cpp |
| 134 | +// Comparator class for the priority queue to create a min-heap |
| 135 | +class cmp { |
| 136 | + public: |
| 137 | + bool operator()(Node* a, Node* b) { |
| 138 | + // Return true if the frequency of 'a' is greater than 'b', |
| 139 | + // which helps in maintaining a min-heap (lowest frequency at top) |
| 140 | + return a -> data > b -> data; |
| 141 | + } |
| 142 | +}; |
| 143 | +``` |
| 144 | + |
| 145 | +- **Comparator Class**: The `cmp` class defines the ordering used by the priority queue. `std::priority_queue` is a max-heap by default: it keeps at the top the element that the comparator ranks last. `cmp` returns `true` when node `a` has a higher frequency (`data`) than node `b`, which reverses the ordering so that the node with the lowest frequency sits at the top. Nodes with equal frequencies are not kept in any particular relative order. |
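The effect of a "greater-than" comparator can be checked in isolation. The snippet below is a minimal sketch on plain integers; `GreaterFirst` and `drainInOrder` are hypothetical names used only for this illustration and are not part of the solution.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical comparator mirroring `cmp`, but on plain ints for brevity.
struct GreaterFirst {
    bool operator()(int a, int b) const {
        // Returning true means "a ranks below b", so larger values sink
        // and the smallest value ends up at top().
        return a > b;
    }
};

// Pops every element and returns them in the order the queue yields them.
std::vector<int> drainInOrder(const std::vector<int>& values) {
    std::priority_queue<int, std::vector<int>, GreaterFirst> pq;
    for (int v : values) pq.push(v);

    std::vector<int> order;
    while (!pq.empty()) {
        order.push_back(pq.top());
        pq.pop();
    }
    return order;
}
```

Calling `drainInOrder({12, 5, 9})` yields the values in ascending order, which is the behaviour the Huffman loop relies on when it pops the two smallest frequencies.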
| 146 | + |
| 147 | +```cpp |
| 148 | +// Solution class that contains the logic for building Huffman codes |
| 149 | +class Solution { |
| 150 | + public: |
| 151 | + // Helper function to traverse the Huffman tree and store codes |
| 152 | + void traverse(Node* root, vector<string>& ans, string temp) { |
| 153 | + // Base case: If it's a leaf node, add the generated code to the answer |
| 154 | + if(root -> left == NULL && root -> right == NULL) { |
| 155 | + ans.push_back(temp); |
| 156 | + return; |
| 157 | + } |
| 158 | + |
| 159 | + // Recursively traverse the left subtree and append '0' to the code |
| 160 | + traverse(root -> left, ans, temp + "0"); |
| 161 | + |
| 162 | + // Recursively traverse the right subtree and append '1' to the code |
| 163 | + traverse(root -> right, ans, temp + "1"); |
| 164 | + } |
| 165 | +``` |
| 166 | + |
| 167 | +- **`traverse` Function**: This function recursively traverses the Huffman tree and generates the binary codes for the characters. |
| 168 | + - **Base case**: When a leaf node is reached (a node without children), the current string (`temp`) representing the path from the root is added to the result (`ans`). |
| 169 | + - **Recursive case**: For each internal node, the function calls itself recursively on the left child (appending '0' to the path) and the right child (appending '1' to the path). |
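The `traverse` function above builds a fresh string on every call (`temp + "0"`). One possible variation, sketched here as an assumption rather than as the original author's approach, shares a single path buffer and undoes each step on the way back up. `TreeNode` and `collectCodes` are hypothetical names standing in for `Node` and `traverse`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal node, standing in for the `Node` class above.
struct TreeNode {
    int data;
    TreeNode* left;
    TreeNode* right;
    explicit TreeNode(int d) : data(d), left(nullptr), right(nullptr) {}
};

// Preorder walk that reuses one path buffer instead of copying it per call.
void collectCodes(const TreeNode* node, std::string& path,
                  std::vector<std::string>& out) {
    if (node == nullptr) return;
    if (node->left == nullptr && node->right == nullptr) {
        out.push_back(path);  // the only copy: one per leaf
        return;
    }
    path.push_back('0');                   // step into the left subtree
    collectCodes(node->left, path, out);
    path.back() = '1';                     // switch the last bit for the right subtree
    collectCodes(node->right, path, out);
    path.pop_back();                       // restore the buffer for the caller
}
```

The output order is the same as with `traverse`; only the amount of temporary string copying changes.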
| 170 | + |
| 171 | +```cpp |
| 172 | + // Main function to build Huffman codes |
| 173 | + vector<string> huffmanCodes(string S, vector<int> f, int N) { |
| 174 | + // Priority queue to store nodes of the Huffman tree; min-heap based on frequency |
| 175 | + priority_queue<Node*, vector<Node*>, cmp> pq; |
| 176 | + |
| 177 | + // Step 1: Insert all the nodes into the priority queue (based on frequency) |
| 178 | + for(int i = 0; i < N; i++) { |
| 179 | + Node* temp = new Node(f[i]); // Create a new node with frequency f[i] |
| 180 | + pq.push(temp); // Push the node into the priority queue |
| 181 | + } |
| 182 | +``` |
| 183 | + |
| 184 | +- **`huffmanCodes` Function**: This function is responsible for constructing the Huffman tree and generating the Huffman codes. |
| 185 | + - **Priority Queue**: A priority queue (min-heap) `pq` is created to store the nodes of the Huffman tree. The comparator class `cmp` ensures that the node with the smallest frequency is at the top. |
| 186 | + - **Step 1**: Nodes are created for each frequency in the array `f[]` and pushed into the priority queue. |
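As a side note on Step 1, `std::priority_queue` can also be built from a whole range at once, which heapifies the elements in linear time instead of pushing them one by one. The sketch below shows this on plain frequencies; `MinHeap` and `buildMinHeap` are hypothetical names, and the overall `O(N log N)` bound of the algorithm stays the same because the merging loop dominates.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Min-heap of frequencies: std::greater puts the smallest value on top.
using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;

// Builds the heap from the whole range at once (heapify) instead of
// calling push() once per frequency.
MinHeap buildMinHeap(const std::vector<int>& freqs) {
    return MinHeap(freqs.begin(), freqs.end());
}
```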
| 187 | + |
| 188 | +```cpp |
| 189 | + // Step 2: Build the Huffman tree by combining the two nodes with the smallest frequencies |
| 190 | + while(pq.size() > 1) { |
| 191 | + // Extract the two nodes with the smallest frequencies |
| 192 | + Node* left = pq.top(); |
| 193 | + pq.pop(); |
| 194 | + |
| 195 | + Node* right = pq.top(); |
| 196 | + pq.pop(); |
| 197 | + |
| 198 | + // Create a new internal node with a frequency equal to the sum of the two nodes' frequencies |
| 199 | + Node* newNode = new Node(left -> data + right -> data); |
| 200 | + |
| 201 | + // Set the left and right children of the new node |
| 202 | + newNode -> left = left; |
| 203 | + newNode -> right = right; |
| 204 | + |
| 205 | + // Push the new node back into the priority queue |
| 206 | + pq.push(newNode); |
| 207 | + } |
| 208 | +``` |
| 209 | + |
| 210 | +- **Step 2**: **Build the Huffman Tree** |
| 211 | + - The two nodes with the smallest frequencies are extracted from the priority queue. |
| 212 | + - A new internal node is created with a frequency equal to the sum of the two smallest nodes' frequencies. |
| 213 | + - The two smallest nodes become the left and right children of the new internal node. |
| 214 | + - This new node is then pushed back into the priority queue. |
| 215 | + - This process continues until there is only one node left in the priority queue, which is the root of the Huffman tree. |
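A useful by-product of this loop: every merge adds one bit to the code of each leaf below the new node, so the sum of all merged frequencies equals the total number of bits in the encoded text. The sketch below computes that total from the frequencies alone, without building any tree; `totalEncodedBits` is a hypothetical helper, not part of the solution.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Returns the total encoded length in bits, i.e. the sum over all
// characters of frequency * code length, using only the merge loop.
long long totalEncodedBits(const std::vector<int>& freqs) {
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>> pq;
    for (int f : freqs) pq.push(f);

    long long total = 0;
    while (pq.size() > 1) {
        long long a = pq.top(); pq.pop();
        long long b = pq.top(); pq.pop();
        total += a + b;   // each leaf under this merge gains one bit
        pq.push(a + b);
    }
    return total;
}
```

For the frequencies `[5, 9, 12]` the merges are `14` and `26`, so the function returns `40`.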
| 216 | + |
| 217 | +```cpp |
| 218 | + // The final node in the priority queue is the root of the Huffman tree |
| 219 | + Node* root = pq.top(); |
| 220 | + |
| 221 | + // Step 3: Traverse the Huffman tree to generate the Huffman codes |
| 222 | + vector<string> ans; // This will store the final Huffman codes |
| 223 | + string temp = ""; // Temporary string to build the code for each character |
| 224 | + |
| 225 | + // Call the helper function to traverse the tree and build the codes |
| 226 | + traverse(root, ans, temp); |
| 227 | + |
| 228 | + // Return the final Huffman codes |
| 229 | + return ans; |
| 230 | + } |
| 231 | +}; |
| 232 | +``` |
| 233 | + |
| 234 | +- **Step 3**: **Generate the Huffman Codes** |
| 235 | + - The root node of the Huffman tree is retrieved. |
| 236 | + - The `traverse` function is called to generate the binary codes by traversing the tree, starting from the root. |
| 237 | + - The codes are collected in the vector `ans` in preorder (leaves visited left to right), not in the order of the characters in `S`, and `ans` is returned. |
| 238 | + |
| 239 | + |
| 240 | +### Example: |
| 241 | + |
| 242 | +Consider the string `S = "abc"` with frequencies `f = [5, 9, 12]`. |
| 243 | + |
| 244 | +1. **Create Nodes**: |
| 245 | + - Node 'a' with frequency 5. |
| 246 | + - Node 'b' with frequency 9. |
| 247 | + - Node 'c' with frequency 12. |
| 248 | + |
| 249 | +2. **Priority Queue**: |
| 250 | + - Initially: `[5, 9, 12]` (ordered by frequency). |
| 251 | + - Extract two smallest nodes: 'a' (5) and 'b' (9). |
| 252 | + - Create a new internal node with frequency `5 + 9 = 14`. Push it back into the priority queue. |
| 253 | + - Now the queue contains: `[12, 14]`. |
| 254 | + |
| 255 | +3. **Repeat the Process**: |
| 256 | + - Extract the two smallest nodes: 'c' (12) and the new internal node (14). |
| 257 | + - Create a new internal node with frequency `12 + 14 = 26` and push it into the priority queue. |
| 258 | + - The queue now contains: `[26]` (root node). |
| 259 | + |
| 260 | +4. **Generate Huffman Codes**: |
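| 261 | + - The first node popped becomes the left child, so 'c' (12) is the left child of the root and the internal node (14) is the right child, with 'a' and 'b' as its left and right children. The resulting codes are: |
| 262 | + - 'a': `10` |
| 263 | + - 'b': `11` |
| 264 | + - 'c': `0` (the function returns the codes in preorder, i.e. `["0", "10", "11"]`) |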
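Since `huffmanCodes` returns the codes in traversal order without saying which character each one belongs to, it can help to see a variant that keeps the mapping. The following is a minimal sketch under stated assumptions: `CharNode`, `ByFreq`, `assignCodes` and `codesByChar` are hypothetical names, nodes live in a vector pool and refer to each other by index (so nothing has to be freed by hand), and ties between equal frequencies are broken arbitrarily.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node that also remembers which character a leaf stands for.
struct CharNode {
    int freq;
    char symbol;  // meaningful only for leaves
    int left;     // index of the left child in the pool, -1 for leaves
    int right;    // index of the right child in the pool, -1 for leaves
};

// Orders pool indices so the node with the smallest frequency is on top.
struct ByFreq {
    const std::vector<CharNode>* pool;
    bool operator()(int a, int b) const {
        return (*pool)[a].freq > (*pool)[b].freq;
    }
};

// Preorder walk: '0' for the left edge, '1' for the right edge.
void assignCodes(const std::vector<CharNode>& pool, int idx, std::string& path,
                 std::map<char, std::string>& codes) {
    const CharNode& n = pool[idx];
    if (n.left == -1) {  // leaf: record the path for this character
        codes[n.symbol] = path;
        return;
    }
    path.push_back('0');
    assignCodes(pool, n.left, path, codes);
    path.back() = '1';
    assignCodes(pool, n.right, path, codes);
    path.pop_back();
}

std::map<char, std::string> codesByChar(const std::string& s,
                                        const std::vector<int>& f) {
    assert(!s.empty() && s.size() == f.size());
    std::vector<CharNode> pool;
    pool.reserve(2 * s.size());  // N leaves plus N - 1 internal nodes
    std::priority_queue<int, std::vector<int>, ByFreq> pq(ByFreq{&pool});

    for (std::size_t i = 0; i < s.size(); ++i) {
        pool.push_back({f[i], s[i], -1, -1});
        pq.push(static_cast<int>(i));
    }
    while (pq.size() > 1) {
        int l = pq.top(); pq.pop();  // smallest becomes the left child
        int r = pq.top(); pq.pop();  // second smallest becomes the right child
        pool.push_back({pool[l].freq + pool[r].freq, '\0', l, r});
        pq.push(static_cast<int>(pool.size()) - 1);
    }

    std::map<char, std::string> codes;
    std::string path;
    assignCodes(pool, pq.top(), path, codes);
    return codes;
}
```

For `S = "abc"` and `f = [5, 9, 12]` this maps 'a' to `10`, 'b' to `11` and 'c' to `0`. With a single character the tree is just one leaf, so its code is the empty string.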
| 265 | + |
| 266 | +### Time and Space Complexity: |
| 267 | + |
| 268 | +**Time Complexity**: |
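| 269 | +- **Building the Priority Queue**: Pushing N nodes into the priority queue takes `O(N log N)`. |
| 270 | +- **Building the Huffman Tree**: Each iteration pops two nodes and pushes one, at `O(log N)` per heap operation. The loop runs N - 1 times, for `O(N log N)` in total. |
| 271 | +- **Traversing the Tree**: The tree has 2N - 1 nodes (N leaves and N - 1 internal nodes), so visiting them takes `O(N)`. Because `traverse` copies the path string on every call, building the codes adds time proportional to the node depths, which can reach `O(N^2)` in total for a completely skewed tree. |
| 272 | + |
| 273 | + **Total Time Complexity**: `O(N log N)` for building the tree, plus the cost of producing the code strings. |
| 274 | + |
| 275 | +**Space Complexity**: |
| 276 | +- **Priority Queue**: The priority queue stores at most N nodes, so it takes `O(N)` space. |
| 277 | +- **Huffman Tree**: The tree contains 2N - 1 nodes, which take `O(N)` space. The nodes are allocated with `new` and are never freed in this solution. |
| 278 | + |
| 279 | + **Total Space Complexity**: `O(N)`, not counting the returned code strings. |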