Commit 82ceb8b

Update README.txt
1 parent 0fd752e commit 82ceb8b

1 file changed

+73
-82
lines changed

README.txt

Lines changed: 73 additions & 82 deletions
@@ -1,110 +1,101 @@
1-
README.txt
2-
Last updated: 1/04/2006
31

4-
This file is intended to supplement the code found in block1.c.
2+
MD5 Toolkit
3+
===========
54

6-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
7-
Building the Executable
8-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
5+
This package contains code to produce MD5 collisions. The core of the
6+
toolkit consists of two programs, 'block1' and 'block2'.
97

10-
To build the binary, just type 'make' at the command line, or use the command
8+
block1 produces a pair M_1 and M'_1 of 512-bit blocks that satisfy the
9+
Wang-differential. It also produces the MD5 chaining value for M_1.
1110

12-
gcc -O3 -march=pentium4 block1.c -o block1
11+
block2 needs only the M_1 chaining value (the M'_1 chaining value can
12+
be determined from it), and it then outputs a pair M_2 and M'_2.
1313

14-
where the '-march=' flag is changed to the appropriate value. On the
15-
Pentium 4, use of this flag makes the code roughly twice as fast compared
16-
to a binary compiled without specifying the architecture. You can also
17-
use the build.sh script supplied to build the entire toolkit.
14+
At the end, we have that M_1 || M_2 collides with M'_1 || M'_2
15+
under MD5.
1816

1917

20-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
21-
Usage
22-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
18+
Auxiliary Programs
19+
==================
2320

24-
The code can either use the default IV for MD5, or it can take the
25-
IV as a paramter. In the former case, the code is invoked by
21+
Both block1 and block2 output their results in ASCII. The program
22+
'makeblocks' converts the output from these programs into two binary
23+
files so that MD5 can be run on them ('md5sum' is the typical program
24+
used for Linux installations).
2625

27-
./block1
26+
A shell script 'makebins.sh' is also provided in the makeblocks/
27+
directory, which does all the steps needed to produce a sample collision.
2828

29-
and in the latter case the code is invoked by
29+
makebins.sh first runs block1, saving its output in a file 'block1.out'.
30+
Then it runs 'block2' and saves its output in 'block2.out'. Appending
31+
these outputs into a single file, it hands them to 'makeblocks' along
32+
with output filenames b1.bin and b2.bin.
3033

31-
./block 1 <IV>
34+
b1.bin and b2.bin are distinct files, each 1024 bits long, which collide
35+
under MD5. md5sum is run on each of these files and the user may verify
36+
the digests are the same.
3237

33-
where IV is any hex value of length 32. As an example, one might invoke the
34-
code as follows:
3538

36-
./block1 d41d8cd98f00b204e9800998ecf8427e
39+
Building the Code
40+
=================
3741

42+
sh build.sh
3843

39-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
40-
Output
41-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
44+
builds all the code.
4245

43-
The output to the program consists of three parts: the output of the
44-
MD5 compression function on the first of two messages, M and M', and
45-
then the two 512-bit messages M and M' themselves,
46-
represented as a 16-tuple of 32-bit hex values.
4746

48-
As an example run, we obtained the following output:
4947

50-
Chaining value for M:
51-
ef5f6cf37991e593628c40e6794a54b9
52-
M = { 87e85516, 9f820a2d, 1d2c1ea0, 891cc06e,
53-
b50347ae, 364a887c, 3ada98ae, 62468e31,
54-
05352d45, 21333bfe, 8c9ef6b7, 269a3354,
55-
6043a0c7, 3f98a2f4, ab400728, 3a995dcd }
48+
Efficiency
49+
==========
5650

57-
M' = { 87e85516, 9f820a2d, 1d2c1ea0, 891cc06e,
58-
350347ae, 364a887c, 3ada98ae, 62468e31,
59-
05352d45, 21333bfe, 8c9ef6b7, 269ab354,
60-
6043a0c7, 3f98a2f4, 2b400728, 3a995dcd }
51+
The speed of the code, as usual, depends on several factors: the hardware
52+
used, the quality of the compiler, and how well the code has been tuned.
53+
(This assumes that the basic algorithm remains the same.)
6154

62-
This output is a pair of first-blocks of what will be a pair of two-block
63-
messages that collide under MD5. The differential is the Wang-differential,
64-
but several more conditions were specified to speed up the algorithm as
65-
described in the accompanying paper (and see below).
55+
We have tested the code on a Pentium4 (Xeon) with g++ 3.4.4. (Other
56+
architectures can be specified in the build.sh file.) The code
57+
was also tested with the Intel icc compiler, and it runs significantly
58+
faster, but this compiler is not as widely available, so we've stuck
59+
to g++ for our Makefiles.
6660

67-
The chaining value for M is all that is needed in order to produce a pair
68-
of second-blocks to complete the collision-pair. The chaining value above
69-
is given to a separate program (aptly named 'block2') to accomplish this.
70-
When M from this program is prepended to M from the block2 program we get
71-
a two-block message X. When M' from this program is prepended to M' from
72-
the block2 program, we get a distinct two-block message Y. X and Y will
73-
collide under MD5.
61+
The code has not yet been tuned anywhere near its potential. Currently,
62+
in the environment cited above, the block1 code runs typically from 8 to
63+
50 minutes before producing a collision. The block2 code runs about 3 seconds
64+
to 5 minutes, typically. Running 100 trials of each program, we've
65+
measured 16 mins as the average time for block1 and 2 mins as the average
66+
time for block2 (with large variations, however).
7467

75-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
76-
md5cond_1.txt
77-
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
68+
Therefore we expect a collision in under 20 mins on average, though sometimes
69+
much slower or much faster.
7870

79-
This file contains the list of conditions on the step values computed during
80-
the computation of the MD5 compression function. For 'normal' use of the
81-
code, no modifications of md5cond_1.txt are necessary. That is, the file
82-
contains an encoding of all the conditions present in the associated
83-
paper. One may wish to change md5cond_1.txt only to experiment with
84-
new conditions or to examine the effect on code running-time when various
85-
conditions are manipulated, created, or removed.
71+
Tuning the code makes things run much faster. Using the faster code from
72+
http://www.stachliu.com/collisions.html
73+
for the block1 algorithm allows us to produce collisions in about 11 minutes.
74+
Efforts are underway to tune the code for both blocks; preliminary results
75+
show we can expect collisions in about 5 mins.
8676

87-
Format:
88-
Each line of the file encodes one condition and is composed of five
89-
space-delimited numbers. The first number denotes the step number of the
90-
condition, with acceptable values of 0-63 (68-71 are used for the conditions
91-
on the chaining values). The second number denotes the index of the bit of
92-
the condition (values 0-31). The third value denotes either the value of
93-
that bit (-2 means that bit should be zero, and -1 means that bit should be
94-
one) or the index of the step value that it should be compared to. The
95-
fourth and fifth values are used only when the condition refers to another
96-
step value and they represent the bit index and additive constant to that
97-
value, respectively.
77+
Wang's original code ran in about an hour on an IBM P690 supercomputer.
78+
The Stach-Liu code cited above runs in about 45 minutes on a Pentium4;
79+
however, their code sometimes produces block1 pairs that have no block2
80+
solutions (our code never does this).
9881

99-
Examples:
100-
3 5 -2 0 0
101-
Bit 5 on step value 3 should be 0.
10282

103-
27 31 -1 0 0
104-
Bit 31 on step value 27 should be 1.
83+
Applications
84+
============
10585

106-
15 31 14 31 0
107-
Bit 31 on step value 15 should be the same as bit 31 on step value 14.
86+
We designed the toolkit so that it would be easy to generate various
87+
flavors of collisions. For example, we can produce many collisions where
88+
the first blocks are the same between a given pair, but the second blocks vary.
89+
(We can get lots of these fast because our block2 code is quite fast
90+
even before we have tuned it much.)
91+
92+
Also, we can generate collisions for arbitrary IVs. This is useful for
93+
attacks like the Daum-Lucks attack (http://www.cits.rub.de/MD5Collisions).
94+
As an application of their idea we produced two binaries that have the
95+
same MD5 digest but where binary1 prints "hello world" and binary2
96+
prints "I am erasing your hard disk" (even though our program really doesn't
97+
erase your hard disk). This is done by having the program check to see
98+
whether msg1 is in a given array, or msg2 is (where msg1 and msg2 are
99+
1024-bit values that collide under MD5 with the chaining value resulting
100+
from hashing the binary to the point where the msg1 or msg2 value occurs).
108101

109-
63 31 61 31 1
110-
Bit 31 on step value 63 should be the opposite of bit 31 on step value 61.
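
The condition-line format described in the removed md5cond_1.txt section can be illustrated with a small parser. This is only a sketch in Python, not code from the toolkit: the names Condition, parse_condition and describe are hypothetical, and reading the fifth field as "0 = same bit, 1 = opposite bit" is an assumption inferred from the four examples in that section.

```python
from typing import NamedTuple


class Condition(NamedTuple):
    """One line of a condition file (hypothetical representation)."""
    step: int       # step number: 0-63, or 68-71 for chaining-value conditions
    bit: int        # bit index within the step value, 0-31
    target: int     # -2: bit must be 0; -1: bit must be 1; else another step index
    other_bit: int  # bit index in the other step value (relative conditions only)
    constant: int   # additive constant; assumed 0 = same bit, 1 = opposite bit


def parse_condition(line: str) -> Condition:
    """Split a line into its five space-delimited integer fields."""
    fields = [int(token) for token in line.split()]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    return Condition(*fields)


def describe(cond: Condition) -> str:
    """Render a condition in the wording used by the README examples."""
    prefix = f"Bit {cond.bit} on step value {cond.step} should be"
    if cond.target == -2:
        return f"{prefix} 0."
    if cond.target == -1:
        return f"{prefix} 1."
    # Assumption: a zero constant means "equal", anything else means "opposite".
    relation = "the same as" if cond.constant == 0 else "the opposite of"
    return f"{prefix} {relation} bit {cond.other_bit} on step value {cond.target}."


if __name__ == "__main__":
    for text in ("3 5 -2 0 0", "27 31 -1 0 0", "15 31 14 31 0", "63 31 61 31 1"):
        print(describe(parse_condition(text)))
```

Run as a script, it prints one description per example line, which can be compared by eye against the examples in the old README text.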

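The new README says that b1.bin and b2.bin are distinct files with the same MD5 digest, and that md5sum can be used to check this. As an alternative to md5sum, the same check can be sketched in Python with the standard hashlib module. The function name is_md5_collision is hypothetical; the file names are the ones the README mentions.

```python
import hashlib
from pathlib import Path


def is_md5_collision(path_a: str, path_b: str) -> bool:
    """Return True only if the two files differ but have the same MD5 digest."""
    data_a = Path(path_a).read_bytes()
    data_b = Path(path_b).read_bytes()
    if data_a == data_b:
        # Identical files trivially share a digest; that is not a collision.
        return False
    return hashlib.md5(data_a).digest() == hashlib.md5(data_b).digest()


if __name__ == "__main__":
    import sys

    # Example: python check.py b1.bin b2.bin
    if len(sys.argv) == 3:
        print(is_md5_collision(sys.argv[1], sys.argv[2]))
```

Checking that the contents differ first matters: two copies of the same file would otherwise be reported as a collision.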