@@ -114,12 +114,18 @@ $ cat alpine-report.json
114114]
115115```
116116
117- ### Entropy calculation
117+ ### Randomness calculation
118118
119119If you are analyzing an unknown file format, it might be useful to know the
120- entropy of the contained files, so you can quickly see for example whether the
120+ randomness of the contained files, so you can quickly see for example whether the
121121file is **encrypted** or contains some random content.
122122
123+ Two values are calculated as part of randomness measurements:
124+ - Shannon's entropy
125+ - χ² probability
126+
127+ You can find detailed information about both measures [here](https://www.fourmilab.ch/random/).
128+
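As a rough illustration of what these two measures capture, the following Python sketch computes both for a byte string. It is not unblob's implementation: the function names are hypothetical, the entropy is scaled so that 8 bits per byte equals 100%, and the χ² probability uses the Wilson-Hilferty normal approximation (an assumption made here to stay within the standard library) instead of an exact χ² distribution.

```python
import math
from collections import Counter


def shannon_entropy_percent(data: bytes) -> float:
    """Shannon entropy of the byte histogram, scaled so 8 bits/byte == 100%."""
    if not data:
        raise ValueError("data must not be empty")
    total = len(data)
    # Sum of -p * log2(p) over every byte value that actually occurs.
    bits = sum(-(n / total) * math.log2(n / total) for n in Counter(data).values())
    return bits / 8 * 100


def chi_square_statistic(data: bytes) -> float:
    """Pearson's chi-square statistic against a uniform byte distribution."""
    if not data:
        raise ValueError("data must not be empty")
    expected = len(data) / 256  # expected count per byte value if uniform
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def chi_square_probability_percent(data: bytes) -> float:
    """Approximate upper-tail probability (in %) with 255 degrees of freedom.

    Assumption: Wilson-Hilferty approximation, which maps the chi-square
    statistic to a standard normal deviate. An exact value would need the
    regularized incomplete gamma function (for example from SciPy).
    """
    k = 255  # 256 possible byte values minus one
    x = chi_square_statistic(data)
    z = ((x / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    # Upper tail of the standard normal distribution, expressed in percent.
    return 0.5 * math.erfc(z / math.sqrt(2)) * 100
```

Under these assumptions, truly random bytes should keep the entropy close to 100%, while the χ² probability is expected to scatter across the whole 0 to 100% range, because p-values of a true null hypothesis are uniformly distributed. Values pinned near 0% or 100% hint at data that is not uniformly random.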
123129Let's make a file with fully random content at the start and end:
124130
125131``` console
@@ -128,59 +134,61 @@ $ dd if=/dev/random of=random2.bin bs=10M count=1
128134$ cat random1.bin alpine-minirootfs-3.16.1-x86_64.tar.gz random2.bin > unknown-file
129135```
130136
131- A nice ASCII entropy plot is drawn on verbose level 3:
137+ A nice ASCII randomness plot is drawn on verbose level 3:
132138
133139``` console
134140$ unblob -vvv unknown-file | grep -C 15 "Randomness distribution"
135141
136- 2022-07-30 07:58.16 [debug ] Ended searching for chunks all_chunks=[0xa00000-0xc96196] pid=19803
137- 2022-07-30 07:58.16 [debug ] Removed inner chunks outer_chunk_count=1 pid=19803 removed_inner_chunk_count=0
138- 2022-07-30 07:58.16 [warning ] Found unknown Chunks chunks=[0x0-0xa00000, 0xc96196-0x1696196] pid=19803
139- 2022-07-30 07:58.16 [info ] Extracting unknown chunk chunk=0x0-0xa00000 path=unknown-file_extract/0-10485760.unknown pid=19803
140- 2022-07-30 07:58.16 [debug ] Carving chunk path=unknown-file_extract/0-10485760.unknown pid=19803
141- 2022-07-30 07:58.16 [debug ] Calculating entropy for file path=unknown-file_extract/0-10485760.unknown pid=19803 size=0xa00000
142- 2022-07-30 07:58.16 [debug ] Entropy calculated highest=99.99 lowest=99.98 mean=99.98 pid=19803
143- 2022-07-30 07:58.16 [warning ] Drawing plot pid=19803
144- 2022-07-30 07:58.16 [debug ] Entropy chart chart=
145- Entropy distribution
146- ┌---------------------------------------------------------------------------┐
147- 100┤•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••│
148- 90┤ │
149- 80┤ │
150- 70┤ │
151- 60┤ │
152- 50┤ │
153- 40┤ │
154- 30┤ │
155- 20┤ │
156- 10┤ │
157- 0┤ │
158- └┬---┬---┬---─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬┘
159- 1 4 7 12 16 20 24 29 33 37 41 46 50 54 59 63 67 71 76 80
160- [y] entropy % [x] mB
161- pid=19803
162- 2022-07-30 07:58.16 [info ] Extracting unknown chunk chunk=0xc96196-0x1696196 path=unknown-file_extract/13197718-23683478.unknown pid=19803
163- 2022-07-30 07:58.16 [debug ] Carving chunk path=unknown-file_extract/13197718-23683478.unknown pid=19803
164- 2022-07-30 07:58.16 [debug ] Calculating entropy for file path=unknown-file_extract/13197718-23683478.unknown pid=19803 size=0xa00000
165- 2022-07-30 07:58.16 [debug ] Entropy calculated highest=99.99 lowest=99.98 mean=99.98 pid=19803
166- 2022-07-30 07:58.16 [warning ] Drawing plot pid=19803
167- 2022-07-30 07:58.16 [debug ] Entropy chart chart=
168- Entropy distribution
169- ┌---------------------------------------------------------------------------┐
170- 100┤•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••│
171- 90┤ │
172- 80┤ │
173- 70┤ │
174- 60┤ │
175- 50┤ │
176- 40┤ │
177- 30┤ │
178- 20┤ │
179- 10┤ │
180- 0┤ │
181- └┬---┬---┬---─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬--─┬┘
182- 1 4 7 12 16 20 24 29 33 37 41 46 50 54 59 63 67 71 76 80
183- [y] entropy % [x] mB
142+ 2024-10-30 10:52.03 [debug ] Calculating chunk for pattern match handler=arc pid=1963719 real_offset=0x1685f5b start_offset=0x1685f5b
143+ 2024-10-30 10:52.03 [debug ] Header parsed header=<arc_head archive_marker=0x1a, header_type=0x1, name=b'8\xa7i&po\xc77\xd5h\x9a\x9d\xf1', size=0x26d171fa, date=0x1bfd, time=0xe03f, crc=-0x3b95, length=0x349997d5> pid=1963719
144+ 2024-10-30 10:52.03 [debug ] Ended searching for chunks all_chunks=[0xa00000-0xc96196] pid=1963719
145+ 2024-10-30 10:52.03 [debug ] Removed inner chunks outer_chunk_count=1 pid=1963719 removed_inner_chunk_count=0
146+ 2024-10-30 10:52.03 [warning ] Found unknown Chunks chunks=[0x0-0xa00000, 0xc96196-0x1696196] pid=1963719
147+ 2024-10-30 10:52.03 [info ] Extracting unknown chunk chunk=0x0-0xa00000 path=unknown-file_extract/0-10485760.unknown pid=1963719
148+ 2024-10-30 10:52.03 [debug ] Carving chunk path=unknown-file_extract/0-10485760.unknown pid=1963719
149+ 2024-10-30 10:52.03 [debug ] Calculating randomness for file path=unknown-file_extract/0-10485760.unknown pid=1963719 size=0xa00000
150+ 2024-10-30 10:52.03 [debug ] Shannon entropy calculated block_size=0x20000 highest=99.99 lowest=99.98 mean=99.98 path=unknown-file_extract/0-10485760.unknown pid=1963719 size=0xa00000
151+ 2024-10-30 10:52.03 [debug ] Chi square probability calculated block_size=0x20000 highest=97.88 lowest=3.17 mean=52.76 path=unknown-file_extract/0-10485760.unknown pid=1963719 size=0xa00000
152+ 2024-10-30 10:52.03 [debug ] Entropy chart chart=
153+ Randomness distribution
154+ ┌───────────────────────────────────────────────────────────────────────────┐
155+ 100┤ •• Shannon entropy (%) •••••••••♰••••••••••••••••••••••••••••••••••│
156+ 90┤ ♰♰ Chi square probability (%) ♰ ♰ ♰♰♰♰ ♰ ♰ ♰ │
157+ 80┤♰ ♰ ♰♰ ♰♰ ♰♰ ♰ ♰ ♰♰♰♰♰♰♰♰♰ ♰ ♰♰♰♰♰♰ ♰♰ ♰♰ │
158+ 70┤♰♰♰♰ ♰ ♰ ♰ ♰ ♰♰♰ ♰ ♰ ♰ ♰ ♰♰♰♰♰♰♰♰♰ ♰♰ ♰ ♰ ♰ ♰♰♰ ♰♰♰♰♰♰ │
159+ 60┤♰♰♰♰ ♰♰ ♰♰ ♰ ♰♰♰♰ ♰ ♰♰ ♰ ♰ ♰ ♰♰♰♰♰♰ ♰♰ ♰ ♰ ♰♰♰♰ ♰ ♰♰♰ ♰♰♰♰♰♰♰ │
160+ 50┤ ♰♰♰ ♰♰ ♰♰ ♰♰ ♰♰♰♰ ♰♰ ♰ ♰♰♰ ♰♰♰♰♰♰ ♰ ♰ ♰ ♰♰♰♰♰ ♰ ♰♰♰ ♰ ♰♰♰♰♰ ♰ │
161+ 40┤ ♰♰ ♰♰ ♰ ♰♰ ♰♰♰♰ ♰♰ ♰ ♰♰♰ ♰♰♰♰♰♰ ♰♰ ♰♰ ♰♰♰♰♰♰ ♰ ♰♰♰ ♰ ♰♰♰♰ ♰♰ ♰│
162+ 30┤ ♰ ♰♰ ♰♰ ♰♰♰♰ ♰ ♰♰ ♰♰ ♰♰ ♰ ♰♰ ♰ ♰ ♰♰♰ ♰ ♰ ♰♰ ♰ ♰♰♰ ♰♰ ♰ │
163+ 20┤ ♰♰ ♰♰ ♰♰♰ ♰ ♰♰ ♰ ♰♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰♰ │
164+ 10┤ ♰ ♰ ♰ ♰ ♰ ♰♰ ♰ ♰ ♰♰ │
165+ 0┤ ♰ ♰ │
166+ └─┬──┬─┬──┬────┬───┬──┬──┬──┬───┬───┬──┬────┬───┬────┬──┬──┬────┬──┬───┬──┬─┘
167+ 0 2 5 7 11 16 20 23 27 30 34 38 42 47 51 56 60 63 68 71 76 79
168+ 131072 bytes
169+ path=unknown-file_extract/0-10485760.unknown pid=1963719
170+ 2024-10-30 10:52.03 [info ] Extracting unknown chunk chunk=0xc96196-0x1696196 path=unknown-file_extract/13197718-23683478.unknown pid=1963719
171+ 2024-10-30 10:52.03 [debug ] Carving chunk path=unknown-file_extract/13197718-23683478.unknown pid=1963719
172+ 2024-10-30 10:52.03 [debug ] Calculating randomness for file path=unknown-file_extract/13197718-23683478.unknown pid=1963719 size=0xa00000
173+ 2024-10-30 10:52.03 [debug ] Shannon entropy calculated block_size=0x20000 highest=99.99 lowest=99.98 mean=99.98 path=unknown-file_extract/13197718-23683478.unknown pid=1963719 size=0xa00000
174+ 2024-10-30 10:52.03 [debug ] Chi square probability calculated block_size=0x20000 highest=99.03 lowest=0.23 mean=42.62 path=unknown-file_extract/13197718-23683478.unknown pid=1963719 size=0xa00000
175+ 2024-10-30 10:52.03 [debug ] Entropy chart chart=
176+ Randomness distribution
177+ ┌───────────────────────────────────────────────────────────────────────────┐
178+ 100┤ •• Shannon entropy (%) •••••••••••••••••••••♰••••••••••••••••••••••│
179+ 90┤ ♰♰ Chi square probability (%) ♰ ♰♰ ♰ │
180+ 80┤♰♰ ♰♰ ♰♰ ♰ ♰♰ ♰ ♰♰ ♰ ♰♰ │
181+ 70┤♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰♰ ♰♰ ♰♰♰ ♰ ♰♰ ♰♰ │
182+ 60┤ ♰ ♰♰ ♰ ♰ ♰ ♰ ♰♰♰♰♰ ♰♰ ♰♰ ♰♰ ♰ ♰ ♰♰♰ ♰♰ ♰ ♰ ♰♰ ♰ │
183+ 50┤ ♰ ♰♰♰ ♰ ♰ ♰ ♰ ♰ ♰♰♰♰ ♰ ♰♰ ♰ ♰♰♰ ♰ ♰ ♰ ♰♰♰ ♰♰ ♰ ♰ ♰♰ ♰♰ ♰ │
184+ 40┤ ♰♰♰♰ ♰♰ ♰♰ ♰ ♰ ♰♰ ♰♰♰ ♰♰♰ ♰♰♰ ♰♰ ♰ ♰ ♰ ♰♰ ♰ ♰♰ ♰ ♰ ♰ ♰ ♰♰♰ ♰♰ │
185+ 30┤ ♰♰♰♰ ♰♰ ♰♰ ♰♰ ♰♰ ♰♰ ♰♰♰♰♰ ♰♰ ♰ ♰ ♰ ♰♰ ♰♰♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰│
186+ 20┤ ♰♰♰ ♰ ♰ ♰♰ ♰♰ ♰♰♰♰ ♰♰ ♰ ♰ ♰ ♰♰ ♰♰ ♰ ♰♰ ♰♰ ♰ ♰ │
187+ 10┤ ♰ ♰ ♰ ♰ ♰ ♰ ♰ ♰♰ ♰ ♰♰ ♰♰ ♰♰ ♰ ♰ ♰ │
188+ 0┤ ♰ ♰ ♰♰ ♰ ♰♰ │
189+ └─┬──┬─┬──┬────┬───┬──┬──┬──┬───┬───┬──┬────┬───┬────┬──┬──┬────┬──┬───┬──┬─┘
190+ 0 2 5 7 11 16 20 23 27 30 34 38 42 47 51 56 60 63 68 71 76 79
191+ 131072 bytes
184192```
185193
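The `block_size=0x20000` field in the log suggests that the values are computed per 131072-byte block, so a 0xa00000-byte chunk yields 80 blocks, matching the 0 to 79 range on the x axis. A minimal sketch of such a block-wise summary follows. It assumes hypothetical helper names and reproduces only the Shannon entropy part, so it may differ from what unblob does internally.

```python
import math
from collections import Counter

BLOCK_SIZE = 0x20000  # 131072 bytes, the block_size value shown in the log


def entropy_percent(block: bytes) -> float:
    """Shannon entropy of one block, scaled so 8 bits/byte == 100%."""
    total = len(block)
    bits = sum(-(n / total) * math.log2(n / total) for n in Counter(block).values())
    return bits / 8 * 100


def summarize(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    """Split data into fixed-size blocks and report highest/lowest/mean entropy."""
    if not data:
        raise ValueError("data must not be empty")
    # The last block may be shorter than block_size.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    values = [entropy_percent(block) for block in blocks]
    return {
        "blocks": len(values),
        "highest": max(values),
        "lowest": min(values),
        "mean": sum(values) / len(values),
    }
```

Calling `summarize(open("unknown-file_extract/0-10485760.unknown", "rb").read())` on one of the carved chunks should give figures comparable to the `highest`, `lowest` and `mean` fields of the "Shannon entropy calculated" log line, provided the block handling assumed here matches.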
186194### Skip extraction with file magic