Commit 1fb3c1e

Create README.md
1 parent 7bc85ec commit 1fb3c1e

1 file changed: 279 additions, 0 deletions
  • 25 - Greedy Algorithm Problems/08 - Huffman Encoding

<h1 align="center">Huffman - Encoding</h1>

## Problem Statement

**Problem URL :** [Huffman Encoding](https://www.geeksforgeeks.org/problems/huffman-encoding3345/1?itm_source=geeksforgeeks&itm_medium=article&itm_campaign=practice_card)

![image](https://github.com/user-attachments/assets/fa5a66db-b5ba-4ca7-8ae5-fefd1bf398c7)

### Problem Explanation
The task is to implement **Huffman encoding**, a lossless data compression algorithm. The goal is to build a binary tree (the **Huffman tree**) whose leaves are the characters and whose root-to-leaf paths spell out each character's binary code. Because every character sits at a leaf, no code is a prefix of another. Characters that appear more frequently end up closer to the root and get shorter codes, while less frequent characters get longer ones.

### Steps Involved:

1. **Create Nodes for Each Character**: For each character, create a leaf node holding its frequency.
2. **Build a Min-Heap**: Insert all nodes into a **min-heap** (priority queue) so that the node with the smallest frequency is always at the top.
3. **Build the Huffman Tree**: Remove the two nodes with the smallest frequencies, make them the left and right children of a new node whose frequency is their sum, and push the new node back into the heap. Repeat until only one node remains; that node is the root of the Huffman tree.
4. **Generate Codes**: Traverse the tree from the root, appending '0' for each left branch and '1' for each right branch. The string accumulated on reaching a leaf is that character's code.

## Problem Solution
```cpp
#include <cstddef>
#include <queue>
#include <string>
#include <vector>
using namespace std;

// Define a Node class to represent each node in the Huffman tree
class Node {
public:
    int data;     // The frequency of the character (or combined frequencies)
    Node* right;  // Pointer to the right child node
    Node* left;   // Pointer to the left child node

    // Constructor to initialize a node with a given frequency
    Node(int d) {
        this->data = d;  // Set the frequency (data) of the node
        left = NULL;     // Initialize the left child as NULL
        right = NULL;    // Initialize the right child as NULL
    }
};

// Comparator class for the priority queue to create a min-heap
class cmp {
public:
    bool operator()(Node* a, Node* b) {
        // Return true if the frequency of 'a' is greater than 'b',
        // which helps in maintaining a min-heap (lowest frequency at top)
        return a->data > b->data;
    }
};

// Solution class that contains the logic for building Huffman codes
class Solution {
public:
    // Helper function to traverse the Huffman tree and store codes
    void traverse(Node* root, vector<string>& ans, string temp) {
        // Base case: If it's a leaf node, add the generated code to the answer
        // (with a single character the root is a leaf and its code is "")
        if (root->left == NULL && root->right == NULL) {
            ans.push_back(temp);
            return;
        }

        // Recursively traverse the left subtree and append '0' to the code
        traverse(root->left, ans, temp + "0");

        // Recursively traverse the right subtree and append '1' to the code
        traverse(root->right, ans, temp + "1");
    }

    // Main function to build Huffman codes
    // (S is part of the required signature; only the frequencies in f are used)
    vector<string> huffmanCodes(string S, vector<int> f, int N) {
        // Priority queue to store nodes of the Huffman tree; min-heap based on frequency
        priority_queue<Node*, vector<Node*>, cmp> pq;

        // Step 1: Insert all the nodes into the priority queue (based on frequency)
        for (int i = 0; i < N; i++) {
            Node* temp = new Node(f[i]);  // Create a new node with frequency f[i]
            pq.push(temp);                // Push the node into the priority queue
        }

        // Step 2: Build the Huffman tree by combining the two nodes with the smallest frequencies
        while (pq.size() > 1) {
            // Extract the two nodes with the smallest frequencies
            Node* left = pq.top();
            pq.pop();

            Node* right = pq.top();
            pq.pop();

            // Create a new internal node with a frequency equal to the sum of the two nodes' frequencies
            Node* newNode = new Node(left->data + right->data);

            // Set the left and right children of the new node
            newNode->left = left;
            newNode->right = right;

            // Push the new node back into the priority queue
            pq.push(newNode);
        }

        // The final node in the priority queue is the root of the Huffman tree
        Node* root = pq.top();

        // Step 3: Traverse the Huffman tree to generate the Huffman codes
        vector<string> ans;  // This will store the final Huffman codes
        string temp = "";    // Temporary string to build the code for each character

        // Call the helper function to traverse the tree and build the codes
        traverse(root, ans, temp);

        // Return the final Huffman codes (in left-to-right leaf order)
        return ans;
    }
};
```

## Problem Solution Explanation

```cpp
// Define a Node class to represent each node in the Huffman tree
class Node {
public:
    int data;     // The frequency of the character (or combined frequencies)
    Node* right;  // Pointer to the right child node
    Node* left;   // Pointer to the left child node

    // Constructor to initialize a node with a given frequency
    Node(int d) {
        this->data = d;  // Set the frequency (data) of the node
        left = NULL;     // Initialize the left child as NULL
        right = NULL;    // Initialize the right child as NULL
    }
};
```

- **Node Class**: Each node in the tree holds a frequency (`data`) and pointers to its left and right children. The constructor sets the frequency and initializes both child pointers to `NULL`, so every new node starts out as a leaf. The node does not store the character itself; the solution only needs the shape of the tree to produce the codes.

```cpp
// Comparator class for the priority queue to create a min-heap
class cmp {
public:
    bool operator()(Node* a, Node* b) {
        // Return true if the frequency of 'a' is greater than 'b',
        // which helps in maintaining a min-heap (lowest frequency at top)
        return a->data > b->data;
    }
};
```

- **Comparator Class**: The `cmp` class defines the comparison used by the priority queue. `std::priority_queue` is a max-heap by default: it keeps at the top the element that the comparator orders last. `cmp` returns `true` when node `a` has a greater frequency than node `b`, which inverts that ordering so the node with the lowest frequency sits at the top. The order in which nodes with equal frequencies are popped is not specified.

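
The effect of the inverted comparison can be checked in isolation. The following is a minimal sketch that uses plain integers in place of `Node*` and `std::greater<int>` in place of `cmp`; the helper name `popOrder` is hypothetical and not part of the solution.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper (not part of the solution): pushes the frequencies into
// a min-heap and returns them in the order they are popped.
std::vector<int> popOrder(const std::vector<int>& freqs) {
    // std::greater<int> plays the role of cmp: "a > b" keeps the smallest on top.
    std::priority_queue<int, std::vector<int>, std::greater<int>> pq;
    for (int f : freqs) {
        pq.push(f);
    }

    std::vector<int> order;
    while (!pq.empty()) {
        order.push_back(pq.top());
        pq.pop();
    }
    assert(order.size() == freqs.size());  // every pushed value is popped once
    return order;
}
```

Calling `popOrder({12, 5, 9})` yields the values in ascending order, which is the order in which the Huffman loop would consume them.
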
```cpp
// Solution class that contains the logic for building Huffman codes
class Solution {
public:
    // Helper function to traverse the Huffman tree and store codes
    void traverse(Node* root, vector<string>& ans, string temp) {
        // Base case: If it's a leaf node, add the generated code to the answer
        if (root->left == NULL && root->right == NULL) {
            ans.push_back(temp);
            return;
        }

        // Recursively traverse the left subtree and append '0' to the code
        traverse(root->left, ans, temp + "0");

        // Recursively traverse the right subtree and append '1' to the code
        traverse(root->right, ans, temp + "1");
    }
```

- **`traverse` Function**: This function recursively traverses the Huffman tree and generates the binary codes for the characters. Codes are appended to `ans` in the order the leaves are visited (left to right), not in the order the characters appear in `S`.
  - **Base case**: When a leaf node is reached (a node without children), the current string (`temp`) representing the path from the root is added to the result (`ans`). If the tree consists of a single node, that string is empty.
  - **Recursive case**: For each internal node, the function calls itself on the left child (appending '0' to the path) and then on the right child (appending '1' to the path). Every internal node of a Huffman tree has exactly two children, so neither call receives `NULL`.

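
Because `temp` is passed by value, each recursive call copies the path built so far. As an illustration only, here is a sketch of an alternative that shares one path buffer and undoes its changes on the way back up. The `Node` struct is a minimal stand-in for the class above, and `traverseShared` is a hypothetical name, not the solution's method.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-in for the solution's Node class (frequency plus two children).
struct Node {
    int data;
    Node* left;
    Node* right;
    explicit Node(int d) : data(d), left(nullptr), right(nullptr) {}
};

// Hypothetical variant of traverse: `path` is shared by reference and restored
// before returning, so only finished codes are copied into `ans`.
// Assumes every internal node has two children, as in a Huffman tree.
void traverseShared(const Node* root, std::vector<std::string>& ans, std::string& path) {
    assert(root != nullptr);
    if (root->left == nullptr && root->right == nullptr) {
        ans.push_back(path);  // leaf: the current path is this leaf's code
        return;
    }
    path.push_back('0');      // descend into the left subtree
    traverseShared(root->left, ans, path);
    path.back() = '1';        // swap the last bit and descend right
    traverseShared(root->right, ans, path);
    path.pop_back();          // restore the path for the caller
}
```

The output order is the same as with the by-value version; the difference is only in how many intermediate strings are created.
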
```cpp
    // Main function to build Huffman codes
    vector<string> huffmanCodes(string S, vector<int> f, int N) {
        // Priority queue to store nodes of the Huffman tree; min-heap based on frequency
        priority_queue<Node*, vector<Node*>, cmp> pq;

        // Step 1: Insert all the nodes into the priority queue (based on frequency)
        for (int i = 0; i < N; i++) {
            Node* temp = new Node(f[i]);  // Create a new node with frequency f[i]
            pq.push(temp);                // Push the node into the priority queue
        }
```

- **`huffmanCodes` Function**: This function constructs the Huffman tree and generates the Huffman codes. The string `S` is part of the required signature but is not used here; only the frequencies determine the shape of the tree.
  - **Priority Queue**: A priority queue (min-heap) `pq` is created to store the nodes of the Huffman tree. The comparator class `cmp` ensures that the node with the smallest frequency is at the top.
  - **Step 1**: A node is created for each of the `N` frequencies in `f` and pushed into the priority queue.

```cpp
        // Step 2: Build the Huffman tree by combining the two nodes with the smallest frequencies
        while (pq.size() > 1) {
            // Extract the two nodes with the smallest frequencies
            Node* left = pq.top();
            pq.pop();

            Node* right = pq.top();
            pq.pop();

            // Create a new internal node with a frequency equal to the sum of the two nodes' frequencies
            Node* newNode = new Node(left->data + right->data);

            // Set the left and right children of the new node
            newNode->left = left;
            newNode->right = right;

            // Push the new node back into the priority queue
            pq.push(newNode);
        }
```

- **Step 2**: **Build the Huffman Tree**
  - The two nodes with the smallest frequencies are extracted from the priority queue.
  - A new internal node is created with a frequency equal to the sum of those two frequencies.
  - The first node extracted (the smaller of the two) becomes the left child of the new node, and the second becomes the right child.
  - The new node is then pushed back into the priority queue.
  - Each iteration shrinks the queue by one node, so the loop runs N - 1 times and stops when a single node is left, which is the root of the Huffman tree.

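
The merging loop can also be used on its own to compute how many bits the encoded text will take, since each merge adds one bit to every character beneath it. The sketch below works on bare frequencies without building a tree; `encodedBits` is a hypothetical helper name introduced for illustration.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper: total number of bits needed to encode the input,
// computed as the sum of the frequencies of all merged (internal) nodes.
long long encodedBits(const std::vector<int>& freqs) {
    assert(!freqs.empty());
    std::priority_queue<long long, std::vector<long long>, std::greater<long long>> pq;
    for (int f : freqs) {
        pq.push(f);
    }

    long long total = 0;
    while (pq.size() > 1) {
        long long a = pq.top();
        pq.pop();
        long long b = pq.top();
        pq.pop();
        total += a + b;  // every character below this merge gains one bit
        pq.push(a + b);
    }
    return total;
}
```

For the frequencies `{5, 9, 12}` the merges produce 14 and then 26, so the function returns 40, which matches 5·2 + 9·2 + 12·1 bits.
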
```cpp
        // The final node in the priority queue is the root of the Huffman tree
        Node* root = pq.top();

        // Step 3: Traverse the Huffman tree to generate the Huffman codes
        vector<string> ans;  // This will store the final Huffman codes
        string temp = "";    // Temporary string to build the code for each character

        // Call the helper function to traverse the tree and build the codes
        traverse(root, ans, temp);

        // Return the final Huffman codes
        return ans;
    }
};
```

- **Step 3**: **Generate the Huffman Codes**
  - The root node of the Huffman tree is retrieved from the priority queue.
  - The `traverse` function is called with an empty path to generate the binary codes, starting from the root.
  - The codes are collected in the vector `ans`, in the order the leaves are visited, and returned.

### Example:

Consider the string `S = "abc"` with frequencies `f = [5, 9, 12]`.

1. **Create Nodes**:
   - Node 'a' with frequency 5.
   - Node 'b' with frequency 9.
   - Node 'c' with frequency 12.

2. **Priority Queue**:
   - Initially: `[5, 9, 12]` (ordered by frequency).
   - Extract the two smallest nodes: 'a' (5) and 'b' (9).
   - Create a new internal node with frequency `5 + 9 = 14`, with 'a' as its left child and 'b' as its right child. Push it back into the priority queue.
   - Now the queue contains: `[12, 14]`.

3. **Repeat the Process**:
   - Extract the two smallest nodes: 'c' (12) and the new internal node (14).
   - Create a new internal node with frequency `12 + 14 = 26`, with 'c' as its left child and the internal node (14) as its right child. Push it into the priority queue.
   - The queue now contains: `[26]` (root node).

4. **Generate Huffman Codes**:
   - With '0' for a left branch and '1' for a right branch, the codes are:
     - 'c': `0`
     - 'a': `10`
     - 'b': `11`
   - The function returns them in the order the leaves are visited: `["0", "10", "11"]`.

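
Since the solution's nodes do not store characters, the returned vector does not say which code belongs to which character. The following sketch shows one way to recover that mapping for the example. It is not the submitted solution: `SymNode`, `ByFreq`, `collect`, and `codeTable` are hypothetical names, and `std::shared_ptr` is used here only so the illustration frees its nodes automatically.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node type that, unlike the solution's Node, remembers its character.
struct SymNode {
    int freq;
    char sym;  // meaningful only for leaves
    std::shared_ptr<SymNode> left;
    std::shared_ptr<SymNode> right;
};
using Ptr = std::shared_ptr<SymNode>;

// Same idea as cmp: the lowest frequency ends up on top.
struct ByFreq {
    bool operator()(const Ptr& a, const Ptr& b) const { return a->freq > b->freq; }
};

// Records the path to every leaf, keyed by the leaf's character.
void collect(const Ptr& node, const std::string& path, std::map<char, std::string>& out) {
    if (!node->left && !node->right) {
        out[node->sym] = path;
        return;
    }
    collect(node->left, path + "0", out);
    collect(node->right, path + "1", out);
}

std::map<char, std::string> codeTable(const std::string& S, const std::vector<int>& f) {
    assert(!S.empty() && S.size() == f.size());
    std::priority_queue<Ptr, std::vector<Ptr>, ByFreq> pq;
    for (std::size_t i = 0; i < S.size(); ++i) {
        pq.push(std::make_shared<SymNode>(SymNode{f[i], S[i], nullptr, nullptr}));
    }
    while (pq.size() > 1) {
        Ptr l = pq.top();  // smaller frequency becomes the left child
        pq.pop();
        Ptr r = pq.top();
        pq.pop();
        pq.push(std::make_shared<SymNode>(SymNode{l->freq + r->freq, '\0', l, r}));
    }
    std::map<char, std::string> out;
    collect(pq.top(), "", out);
    return out;
}
```

For `codeTable("abc", {5, 9, 12})` the table maps 'a' to `10`, 'b' to `11`, and 'c' to `0`, in agreement with the walkthrough above. When several frequencies are equal, the assignment depends on the order in which the heap pops ties.
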
### Time and Space Complexity:

**Time Complexity**:
- **Building the Priority Queue**: Each of the N pushes costs O(log N), so inserting all nodes takes O(N log N).
- **Building the Huffman Tree**: Each iteration removes two nodes and inserts one, at O(log N) per heap operation. The loop runs N - 1 times, resulting in O(N log N) time.
- **Traversing the Tree**: The tree has 2N - 1 nodes (N leaves and N - 1 internal nodes), so the traversal visits O(N) nodes. Building the code strings adds work proportional to their length, which can grow to O(N^2) characters in total when the tree is maximally skewed.

**Total Time Complexity**: O(N log N) for building the tree, plus the time needed to write out the codes.

**Space Complexity**:
- **Priority Queue**: The priority queue stores at most N nodes, so it takes O(N) space.
- **Huffman Tree**: The tree contains 2N - 1 nodes, which take O(N) space.

**Total Space Complexity**: O(N), not counting the returned code strings.
