# File Compression

-[](https://github.com/ayonious/File-Compression/actions)
+[](https://github.com/ayonious/File-Compression/actions/workflows/build.yml)
+[](https://dl.circleci.com/status-badge/redirect/gh/ayonious/File-Compression/tree/master)
[](https://codecov.io/gh/ayonious/File-Compression)
[](https://github.com/ayonious/File-Compression/stargazers)

-A Java-based file compression application implementing two classic compression algorithms:
+File compression software that compresses and decompresses files using these two algorithms:

-1. **Huffman Coding** - Frequency-based compression
-2. **LZW (Lempel-Ziv-Welch)** - Dictionary-based compression
+1. Huffman Coding
+2. Lempel-Ziv-Welch (LZW)

-## Compression Algorithms
+# About Huffman Coding

-### Huffman Coding
+Huffman coding creates a one-to-one mapping from each byte of the input file to a
+bit sequence and replaces every byte with its mapped sequence. To decompress later,
+a dictionary describing each byte-to-bit-sequence mapping has to be stored in the
+compressed file, which costs some extra space.

-**How it works:**
-Huffman coding is a **lossless compression algorithm** that assigns variable-length binary codes to characters based on their frequency. Characters that appear more frequently get shorter codes, while rare characters get longer codes.
+# About Lempel-Ziv-Welch (LZW)

-**Example:**
-```
-Input text: "aabbc"
-Frequency: a=2, b=2, c=1
-
-Huffman codes assigned:
-a → 0
-b → 10
-c → 11
-
-Compressed: "0 0 10 10 11" = "00101011" (8 bits)
-Original: 5 characters × 8 bits = 40 bits
-Compression ratio: 80% reduction
-```
-
-**Key characteristics:**
-- Creates a frequency table and binary tree during compression
-- Requires storing the frequency table in the compressed file (overhead)
-- Works best with files that have **uneven character distribution**
-- Typical use cases: Text files, source code, log files
-
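To make the Huffman mapping concrete, here is a minimal Java sketch (an illustration only, not code from this repository; the class name `HuffmanDemo` is made up). It builds the code table for the "aabbc" example above with a priority queue and prints the encoded bit string. Tie-breaking between equal frequencies varies by implementation, so the exact codes may differ from the table above, but the encoded length still comes out to 8 bits.

```java
import java.util.*;

// Minimal Huffman code construction for a small string.
// Illustration only: a real compressor works on bytes and also
// writes the code table (or frequency table) into the output file.
public class HuffmanDemo {

    // A tree node: leaves hold a character, internal nodes hold two children.
    static final class Node {
        final int freq;
        final Character ch;   // null for internal nodes
        final Node left, right;
        Node(int freq, Character ch, Node left, Node right) {
            this.freq = freq; this.ch = ch; this.left = left; this.right = right;
        }
    }

    public static void main(String[] args) {
        String input = "aabbc";

        // 1. Count how often each character occurs.
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : input.toCharArray()) freq.merge(c, 1, Integer::sum);

        // 2. Repeatedly merge the two least frequent nodes into one tree.
        PriorityQueue<Node> pq = new PriorityQueue<>(Comparator.comparingInt((Node n) -> n.freq));
        freq.forEach((c, f) -> pq.add(new Node(f, c, null, null)));
        while (pq.size() > 1) {
            Node a = pq.poll(), b = pq.poll();
            pq.add(new Node(a.freq + b.freq, null, a, b));
        }

        // 3. Walk the tree: left edge = '0', right edge = '1'.
        Map<Character, String> codes = new TreeMap<>();
        assignCodes(pq.poll(), "", codes);
        codes.forEach((c, code) -> System.out.println(c + " -> " + code));

        // 4. Encode the input with the table. "aabbc" ends up as 8 bits
        //    instead of the 40 bits of its raw 8-bit characters.
        StringBuilder encoded = new StringBuilder();
        for (char c : input.toCharArray()) encoded.append(codes.get(c));
        System.out.println("Encoded: " + encoded + " (" + encoded.length() + " bits)");
    }

    static void assignCodes(Node node, String prefix, Map<Character, String> codes) {
        if (node.ch != null) {                // leaf: record the accumulated bit string
            codes.put(node.ch, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assignCodes(node.left, prefix + "0", codes);
        assignCodes(node.right, prefix + "1", codes);
    }
}
```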
-### LZW (Lempel-Ziv-Welch)
-
-**How it works:**
-LZW is a **dictionary-based compression algorithm** that builds a dictionary of sequences on-the-fly. Instead of replacing individual bytes, it replaces repeated sequences of bytes with dictionary codes.
-
-**Example:**
-```
-Input text: "ABABABA"
-
-Initial dictionary (ASCII):
-65: A
-66: B
-... (all 256 single bytes are preloaded; new entries start at code 256)
-
-Compression process:
-- Read 'A' → output 65, add "AB" to dictionary (256)
-- Read 'B' → output 66, add "BA" to dictionary (257)
-- Read "AB" (found in dict!) → output 256, add "ABA" to dictionary (258)
-- Read "ABA" (found in dict!) → output 258
-
-Compressed: [65, 66, 256, 258]
-Original: 7 characters × 8 bits = 56 bits
-Compressed: 4 codes × ~9 bits = 36 bits
-Compression ratio: 35.7% reduction
-```
-
-**Key characteristics:**
-- **No dictionary is stored** - both compressor and decompressor build the same dictionary
-- Replaces **sequences of bytes**, not individual bytes
-- Works best with files that have **repetitive patterns**
-- Typical use cases: Log files, structured data (JSON, XML), source code with repeated patterns
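To make the dictionary growth concrete, here is a minimal LZW compressor sketch in Java (an illustration only, not code from this repository; the class name `LzwCompressDemo` is made up). It preloads codes 0-255 with the single-byte strings and prints [65, 66, 256, 258] for "ABABABA", matching the worked example above.

```java
import java.util.*;

// Minimal LZW compression over a string's characters.
// Illustration only: a real implementation would pack the codes
// into bits (about 9 bits each here) and stream file contents.
public class LzwCompressDemo {

    static List<Integer> compress(String input) {
        // 1. Preload the dictionary with every single-byte string (codes 0-255).
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(String.valueOf((char) i), i);
        int nextCode = 256;

        List<Integer> output = new ArrayList<>();
        String current = "";
        for (char c : input.toCharArray()) {
            String candidate = current + c;
            if (dict.containsKey(candidate)) {
                // Keep growing the match while it is still in the dictionary.
                current = candidate;
            } else {
                // Emit the code for the longest known prefix and learn the new sequence.
                output.add(dict.get(current));
                dict.put(candidate, nextCode++);
                current = String.valueOf(c);
            }
        }
        if (!current.isEmpty()) output.add(dict.get(current));
        return output;
    }

    public static void main(String[] args) {
        // Matches the worked example: [65, 66, 256, 258]
        System.out.println(compress("ABABABA"));
    }
}
```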
+Unlike Huffman coding, LZW does not need an extra dictionary to be saved with the
+output, because the compressor and decompressor both build the same dictionary as
+they go. Also, LZW does not map single bytes to bit sequences; it maps repeated
+multi-byte sequences to dictionary codes.
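To show why no dictionary has to be shipped with the output, here is a matching decompressor sketch (again only an illustration, paired with the hypothetical `LzwCompressDemo` above). It rebuilds the same dictionary from the codes alone, including the one code that can arrive before the decoder has defined it.

```java
import java.util.*;

// Minimal LZW decompression: rebuilds the dictionary on the fly,
// so the compressed stream needs no stored code table.
// Illustration only, paired with the compressor sketch above.
public class LzwDecompressDemo {

    static String decompress(List<Integer> codes) {
        // Start from the same preloaded dictionary as the compressor (codes 0-255).
        Map<Integer, String> dict = new HashMap<>();
        for (int i = 0; i < 256; i++) dict.put(i, String.valueOf((char) i));
        int nextCode = 256;

        StringBuilder out = new StringBuilder();
        String previous = dict.get(codes.get(0));
        out.append(previous);

        for (int i = 1; i < codes.size(); i++) {
            int code = codes.get(i);
            // The only code the decoder can see "too early" is the one the
            // compressor created from the previous match plus its first character.
            String entry = dict.containsKey(code)
                    ? dict.get(code)
                    : previous + previous.charAt(0);
            out.append(entry);
            // Learn the same new sequence the compressor learned at this step.
            dict.put(nextCode++, previous + entry.charAt(0));
            previous = entry;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Decoding the example's codes restores "ABABABA".
        System.out.println(decompress(Arrays.asList(65, 66, 256, 258)));
    }
}
```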

## Installation

|
|