Commit d86f9e4
Calculate and report weighted mean of entropy blocks
In most cases where entropy is calculated on more than 1K of input, the
file is split into multiple "blocks" of at least 1K each, and the mean
entropy was calculated as the plain mean of the per-block entropies.
This introduced a bias: the last block is usually smaller than all the
others, yet it counted as much as any of them, so per byte of input its
entropy had more say on the mean than the other blocks did.
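A minimal sketch of the two ways of averaging, assuming Shannon entropy per block scaled from 0-8 bits/byte to 0-100. The function names (block_entropy_percent, split_blocks, naive_mean, weighted_mean) are hypothetical and not taken from the changed code. The naive mean counts every block once. The weighted mean multiplies each block's entropy by its length in bytes, so a short trailing block only contributes in proportion to its size. As a stand-in for encrypted data, the usage uses a deterministic 1024-byte block that contains every byte value four times, so its entropy is exactly 100.

```python
import math
from collections import Counter


def block_entropy_percent(block: bytes) -> float:
    """Shannon entropy of one block, scaled from 0-8 bits/byte to 0-100."""
    if not block:
        return 0.0
    total = len(block)
    entropy = 0.0
    for count in Counter(block).values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy / 8 * 100


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split data into consecutive blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def naive_mean(blocks: list[bytes]) -> float:
    """Unweighted mean: every block counts equally, whatever its size."""
    entropies = [block_entropy_percent(b) for b in blocks]
    return sum(entropies) / len(entropies)


def weighted_mean(blocks: list[bytes]) -> float:
    """Mean of block entropies weighted by block length in bytes."""
    total_size = sum(len(b) for b in blocks)
    return sum(block_entropy_percent(b) * len(b) for b in blocks) / total_size


# 1024 maximally mixed bytes followed by a single trailing byte (1025 bytes).
data = bytes(range(256)) * 4 + b"\x00"
blocks = split_blocks(data, 1024)
print(naive_mean(blocks))                # 50.0
print(round(weighted_mean(blocks), 1))   # 99.9
```

Weighting by byte count makes the result equal to averaging over every byte of input, which is why the trailing 1-byte block barely moves it.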
This bias has a larger effect on files smaller than 80K, as that is the
threshold above which a different block size can level out the
differences in block sizes.
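The commit message does not say exactly how the block size is chosen. A minimal sketch, assuming a hypothetical rule that targets 80 blocks per file with a 1K floor, shows why files under 80K are the ones most affected. The names choose_block_size and block_sizes, and the exact rule, are illustrative assumptions. Below 80K the block size is pinned at 1024 bytes, so the trailing block can be as small as 1 byte. Above 80K the block size grows with the file (ceiling of size / 80), and the trailing block then stays within 80 bytes of a full block.

```python
def choose_block_size(file_size: int, chunk_count: int = 80, min_size: int = 1024) -> int:
    """Hypothetical rule: aim for chunk_count blocks, never below min_size bytes."""
    return max(min_size, -(-file_size // chunk_count))  # -(-a // b) is ceiling division


def block_sizes(file_size: int, block_size: int) -> list[int]:
    """Sizes of the consecutive blocks; only the last one may be shorter."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


for size in (1025, 81921):
    bs = choose_block_size(size)
    sizes = block_sizes(size, bs)
    # file size, block size, number of blocks, size of the last block
    print(size, bs, len(sizes), sizes[-1])
# 1025 1024 2 1
# 81921 1025 80 946
```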
Let's look at an extreme example: an encrypted file of 1025 bytes.
This would give us 2 "blocks" of 1024 and 1 bytes, with entropies
(scaled to "percentages", 0-100) of ~100 and 0 respectively.
In this case the naive mean is
~50 = (~100 + 0) / 2
whereas the weighted mean is a much better approximation:
~99.9 = (~100 * 1024 + 0 * 1) / (1024 + 1)

1 parent 2971550
3 files changed (+12, -12 lines)