|
1 |
| -README.txt |
2 |
| -Last updated: 1/04/2006 |
3 | 1 |
|
4 |
| -This file is intended to supplement the code found in block1.c. |
| 2 | +MD5 Toolkit |
| 3 | +=========== |
5 | 4 |
|
6 |
| -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
7 |
| -Building the Executable |
8 |
| -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
| 5 | +This package contains code to produce MD5 collisions. The core of the |
| 6 | +toolkit is the two programs, 'block1' and 'block2'.
9 | 7 |
|
10 |
| -To build the binary, just type 'make' at the command line, or use the command |
| 8 | +block1 produces a pair M_1 and M'_1 of 512-bit blocks that satisfy the |
| 9 | +Wang-differential. It also produces the MD5 chaining value for M_1. |
11 | 10 |
|
12 |
| - gcc -O3 -march=pentium4 block1.c -o block1 |
| 11 | +block2 needs only the M_1 chaining value (the M'_1 chaining value can |
| 12 | +be determined from it), and it then outputs a pair M_2 and M'_2. |
13 | 13 |
|
14 |
| -where the '-march=' flag is changed to the appropriate value. On the |
15 |
| -Pentium 4, use of this flag makes the code roughly twice as fast compared |
16 |
| -to a binary compiled without specifying the architecture. You can also |
17 |
| -use the build.sh script supplied to build the entire toolkit. |
| 14 | +At the end, we have that M_1 || M_2 collides with M'_1 || M'_2 |
| 15 | +under MD5. |
18 | 16 |
|
19 | 17 |
|
20 |
| -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
21 |
| -Usage |
22 |
| -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
| 18 | +Auxiliary Programs |
| 19 | +================== |
23 | 20 |
|
24 |
| -The code can either use the default IV for MD5, or it can take the |
25 |
| -IV as a parameter. In the former case, the code is invoked by
| 21 | +Both block1 and block2 output their results in ASCII. The program |
| 22 | +'makeblocks' converts the output from these programs into two binary |
| 23 | +files so that MD5 can be run on them ('md5sum' is the typical program |
| 24 | +used for Linux installations). |
26 | 25 |
|
27 |
| - ./block1 |
| 26 | +A shell script 'makebins.sh' is also provided in the makeblocks/ |
| 27 | +directory, which does all the steps needed to produce a sample collision. |
28 | 28 |
|
29 |
| -and in the latter case the code is invoked by |
| 29 | +makebins.sh first runs block1, saving its output in a file 'block1.out'. |
| 30 | +Then it runs 'block2' and saves its output in 'block2.out'. Appending |
| 31 | +these outputs into a single file, it hands them to 'makeblocks' along |
| 32 | +with output filenames b1.bin and b2.bin. |
30 | 33 |
|
31 |
| - ./block1 <IV>
| 34 | +b1.bin and b2.bin are distinct files, each 1024 bits long, which collide
| 35 | +under MD5. md5sum is run on each of these files and the user may verify |
| 36 | +the digests are the same. |
32 | 37 |
|
33 |
| -where IV is any hex value of length 32. As an example, one might invoke the |
34 |
| -code as follows: |
35 | 38 |
|
36 |
| - ./block1 d41d8cd98f00b204e9800998ecf8427e |
| 39 | +Building the Code |
| 40 | +================= |
37 | 41 |
|
| 42 | +sh build.sh |
38 | 43 |
|
39 |
| -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
40 |
| -Output |
41 |
| -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
| 44 | +builds all the code. |
42 | 45 |
|
43 |
| -The output to the program consists of three parts: the output of the |
44 |
| -MD5 compression function on the first of two messages, M and M', and |
45 |
| -then the two 512-bit messages M and M' themselves, |
46 |
| -represented as a 16-tuple of 32-bit hex values. |
47 | 46 |
|
48 |
| -As an example run, we obtained the following output: |
49 | 47 |
|
50 |
| - Chaining value for M: |
51 |
| - ef5f6cf37991e593628c40e6794a54b9 |
52 |
| - M = { 87e85516, 9f820a2d, 1d2c1ea0, 891cc06e, |
53 |
| - b50347ae, 364a887c, 3ada98ae, 62468e31, |
54 |
| - 05352d45, 21333bfe, 8c9ef6b7, 269a3354, |
55 |
| - 6043a0c7, 3f98a2f4, ab400728, 3a995dcd } |
| 48 | +Efficiency |
| 49 | +========== |
56 | 50 |
|
57 |
| - M' = { 87e85516, 9f820a2d, 1d2c1ea0, 891cc06e, |
58 |
| - 350347ae, 364a887c, 3ada98ae, 62468e31, |
59 |
| - 05352d45, 21333bfe, 8c9ef6b7, 269ab354, |
60 |
| - 6043a0c7, 3f98a2f4, 2b400728, 3a995dcd } |
| 51 | +The speed of the code, as usual, depends on several factors: the hardware |
| 52 | +used, the quality of the compiler, and how well the code has been tuned. |
| 53 | +(This assumes that the basic algorithm remains the same.)
61 | 54 |
|
62 |
| -This output is a pair of first-blocks of what will be a pair of two-block |
63 |
| -messages that collide under MD5. The differential is the Wang-differential, |
64 |
| -but several more conditions were specified to speed up the algorithm as |
65 |
| -described in the accompanying paper (and see below). |
| 55 | +We have tested the code on a Pentium4 (Xeon) with g++ 3.4.4. (Other |
| 56 | +architectures can be specified in the build.sh file.) The code |
| 57 | +was also tested with the Intel icc compiler, and it runs significantly |
| 58 | +faster, but this compiler is not as widely available, so we've stuck |
| 59 | +to g++ for our Makefiles. |
66 | 60 |
|
67 |
| -The chaining value for M is all that is needed in order to produce a pair |
68 |
| -of second-blocks to complete the collision-pair. The chaining value above |
69 |
| -is given to a separate program (aptly named 'block2') to accomplish this. |
70 |
| -When M from this program is prepended to M from the block2 program we get |
71 |
| -a two-block message X. When M' from this program is prepended to M' from |
72 |
| -the block2 program, we get a distinct two-block message Y. X and Y will |
73 |
| -collide under MD5. |
| 61 | +The code has not yet been tuned anywhere near its potential. Currently, |
| 62 | +in the environment cited above, the block1 code runs typically from 8 to |
| 63 | +50 minutes before producing a collision. The block2 code runs about 3 seconds |
| 64 | +to 5 minutes, typically. Running 100 trials of each program, we
| 65 | +measured an average time of 16 mins for block1 and 2 mins for block2
| 66 | +(with large variations, however).
74 | 67 |
|
75 |
| -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
76 |
| -md5cond_1.txt |
77 |
| -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% |
| 68 | +Therefore we expect to see a collision in under 20 mins on average,
| 69 | +though individual runs may be much slower or much faster.
78 | 70 |
|
79 |
| -This file contains the list of conditions on the step values computed during |
80 |
| -the computation of the MD5 compression function. For 'normal' use of the |
81 |
| -code, no modifications of md5cond_1.txt are necessary. That is, the file |
82 |
| -contains an encoding of all the conditions present in the associated |
83 |
| -paper. One may wish to change md5cond_1.txt only to experiment with |
84 |
| -new conditions or to examine the effect on code running-time when various |
85 |
| -conditions are manipulated, created, or removed. |
| 71 | +Tuning the code makes things run much faster. Using the faster code from |
| 72 | +http://www.stachliu.com/collisions.html |
| 73 | +for the block1 algorithm allows us to produce collisions in about 11 minutes. |
| 74 | +Efforts are underway to tune the code for both blocks; preliminary results |
| 75 | +show we can expect collisions in about 5 mins. |
86 | 76 |
|
87 |
| -Format: |
88 |
| -Each line of the file encodes one condition and is composed of five |
89 |
| -space-delimited numbers. The first number denotes the step number of the |
90 |
| -condition, with acceptable values of 0-63 (68-71 are used for the conditions |
91 |
| -on the chaining values). The second number denotes the index of the bit of |
92 |
| -the condition (values 0-31). The third value denotes either the value of |
93 |
| -that bit (-2 means that bit should be zero, and -1 means that bit should be |
94 |
| -one) or the index of the step value that it should be compared to. The |
95 |
| -fourth and fifth values are used only when the condition refers to another |
96 |
| -step value and they represent the bit index and additive constant to that |
97 |
| -value, respectively. |
| 77 | +Wang's original code ran in about an hour on an IBM P690 supercomputer. |
| 78 | +The Stach-Liu code cited above runs in about 45 minutes on a Pentium4;
| 79 | +however, their code sometimes produces block1 pairs that have no block2
| 80 | +solutions (our code never does this).
98 | 81 |
|
99 |
| -Examples: |
100 |
| -3 5 -2 0 0 |
101 |
| -Bit 5 on step value 3 should be 0. |
102 | 82 |
|
103 |
| -27 31 -1 0 0 |
104 |
| -Bit 31 on step value 27 should be 1. |
| 83 | +Applications |
| 84 | +============ |
105 | 85 |
|
106 |
| -15 31 14 31 0 |
107 |
| -Bit 31 on step value 15 should be the same as bit 31 on step value 14. |
| 86 | +We designed the toolkit so that it would be easy to generate various
| 87 | +flavors of collisions. For example, we can generate many collisions
| 88 | +that share the same pair of first blocks but have different second blocks.
| 89 | +(We can get lots of these fast because our block2 code is quite fast
| 90 | +even before we have tuned it much.)
| 91 | + |
| 92 | +Also, we can generate collisions for arbitrary IVs. This is useful for |
| 93 | +attacks like the Daum-Lucks attack (http://www.cits.rub.de/MD5Collisions). |
| 94 | +As an application of their idea we produced two binaries that have the |
| 95 | +same MD5 digest but where binary1 prints "hello world" and binary2 |
| 96 | +prints "I am erasing your hard disk" (even though our program really doesn't
| 97 | +erase your hard disk). This is done by having the program check to see |
| 98 | +whether msg1 is in a given array, or msg2 is (where msg1 and msg2 are |
| 99 | +1024-bit values that collide under MD5 with the chaining value resulting |
| 100 | +from hashing the binary to the point where the msg1 or msg2 value occurs). |
108 | 101 |
|
109 |
| -63 31 61 31 1 |
110 |
| -Bit 31 on step value 63 should be the opposite of bit 31 on step value 61. |
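The md5cond_1.txt format described in the removed README text can be illustrated with a small decoder. This is only a sketch in Python, not code from the toolkit: the function name describe_condition is hypothetical, and the reading of the fifth field (0 means "the same as", 1 means "the opposite of") is an assumption inferred from the two comparison examples above.

```python
def describe_condition(line):
    """Decode one five-field condition line into an English sentence.

    Field meanings follow the README text; treating an additive constant
    of 1 as "opposite" is an assumption drawn from the README examples.
    """
    step, bit, third, ref_bit, const = (int(tok) for tok in line.split())
    if not (0 <= step <= 63 or 68 <= step <= 71):
        raise ValueError(f"step {step} outside 0-63 and 68-71")
    if not 0 <= bit <= 31:
        raise ValueError(f"bit index {bit} outside 0-31")
    if third == -2:
        return f"Bit {bit} on step value {step} should be 0."
    if third == -1:
        return f"Bit {bit} on step value {step} should be 1."
    # Otherwise the third field names another step value to compare against.
    relation = "the opposite of" if const == 1 else "the same as"
    return (f"Bit {bit} on step value {step} should be {relation} "
            f"bit {ref_bit} on step value {third}.")


if __name__ == "__main__":
    for example in ("3 5 -2 0 0", "27 31 -1 0 0",
                    "15 31 14 31 0", "63 31 61 31 1"):
        print(example, "->", describe_condition(example))
```

Run on the four example lines from the README, the decoder reproduces the four explanatory sentences given there.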
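Both versions of the README say that the first-block pair satisfies the Wang differential. As a rough check, the sketch below takes the sample M and M' printed in the old README's example run and computes the word-wise difference modulo 2^32. The helper name word_differences is hypothetical, and the code assumes only that each tuple entry is a 32-bit word written in hex, as the README states.

```python
MASK32 = 0xFFFFFFFF

# Sample first blocks copied from the example run in the old README.
M = [0x87e85516, 0x9f820a2d, 0x1d2c1ea0, 0x891cc06e,
     0xb50347ae, 0x364a887c, 0x3ada98ae, 0x62468e31,
     0x05352d45, 0x21333bfe, 0x8c9ef6b7, 0x269a3354,
     0x6043a0c7, 0x3f98a2f4, 0xab400728, 0x3a995dcd]

M_PRIME = [0x87e85516, 0x9f820a2d, 0x1d2c1ea0, 0x891cc06e,
           0x350347ae, 0x364a887c, 0x3ada98ae, 0x62468e31,
           0x05352d45, 0x21333bfe, 0x8c9ef6b7, 0x269ab354,
           0x6043a0c7, 0x3f98a2f4, 0x2b400728, 0x3a995dcd]


def word_differences(a, b):
    """Return {index: (b[i] - a[i]) mod 2**32} for every word that differs."""
    return {i: (y - x) & MASK32
            for i, (x, y) in enumerate(zip(a, b)) if x != y}


if __name__ == "__main__":
    for index, delta in word_differences(M, M_PRIME).items():
        print(f"word {index}: delta = {delta:#010x}")
```

Only words 4, 11, and 14 differ, by 2^31, 2^15, and 2^31 respectively, which is the first-block message difference usually associated with Wang's attack.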