
Commit f0ed822

Marius Storm-Olsen authored and gitster committed
Add custom memory allocator to MinGW and MacOS builds

The standard allocator on Windows is pretty bad prior to Windows Vista,
and nedmalloc is better than the modified dlmalloc provided with newer
versions of the MinGW libc.

NedMalloc stats in Git
----------------------

All results are the best result out of 3 runs.
The benchmarks have been done on different hardware, so the repack times
are not comparable.

These benchmarks are all based on 'git repack -adf' on the Linux kernel.

XP
--------------------------------------------------------
MinGW                       Threads  Total Time    Speed
--------------------------------------------------------
3.4.2                       (1T)     00:12:28.422
3.4.2 + nedmalloc           (1T)     00:07:25.437  1.68x
3.4.5                       (1T)     00:12:20.718
3.4.5 + nedmalloc           (1T)     00:07:24.809  1.67x
4.3.3-tdm                   (1T)     00:12:01.843
4.3.3-tdm + nedmalloc       (1T)     00:07:16.468  1.65x
4.3.3-tdm                   (2T)     00:07:35.062
4.3.3-tdm + nedmalloc       (2T)     00:04:57.874  1.54x

Vista
--------------------------------------------------------
MinGW                       Threads  Total Time    Speed
--------------------------------------------------------
4.3.3-tdm                   (1T)     00:07:40.844
4.3.3-tdm + nedmalloc       (1T)     00:07:17.548  1.05x
4.3.3-tdm                   (2T)     00:05:33.746
4.3.3-tdm + nedmalloc       (2T)     00:05:27.334  1.02x

Mac Mini
--------------------------------------------------------
GCC                         Threads  Total Time    Speed
--------------------------------------------------------
i686-darwin9-4.0.1          (2T)     00:09:57.346
i686-darwin9-4.0.1+ned      (2T)     00:08:51.072  1.12x

Signed-off-by: Marius Storm-Olsen <[email protected]>
Signed-off-by: Steffen Prohaska <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent: e16c60d
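The message does not define the Speed column, but reading it as the baseline time divided by the nedmalloc time reproduces all but one of the listed figures. The small C sketch below does that arithmetic on the HH:MM:SS.mmm strings; parse_duration() and speedup() are hypothetical helpers written for illustration, not part of Git. By this reading the XP 2T row works out to about 1.53x rather than the 1.54x shown, so that entry may have been rounded or computed slightly differently.

```c
#include <assert.h>
#include <stdio.h>

/* Convert an "HH:MM:SS.mmm" duration, as printed in the benchmark
 * tables, to seconds. Returns -1.0 if the string has another shape. */
static double parse_duration(const char *s)
{
    int h, m;
    double sec;

    assert(s != NULL);
    if (sscanf(s, "%d:%d:%lf", &h, &m, &sec) != 3)
        return -1.0;
    return h * 3600.0 + m * 60.0 + sec;
}

/* Speed-up of a nedmalloc build over its baseline, assuming "Speed"
 * means baseline time / nedmalloc time (higher is better).
 * Returns 0.0 if either duration cannot be parsed. */
static double speedup(const char *baseline, const char *with_ned)
{
    double base = parse_duration(baseline);
    double ned = parse_duration(with_ned);

    if (base <= 0.0 || ned <= 0.0)
        return 0.0;
    return base / ned;
}
```

For example, speedup("00:12:28.422", "00:07:25.437") is about 1.680, matching the 1.68x reported for MinGW 3.4.2 on XP.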


6 files changed (+7064 -0 lines)


Makefile

Lines changed: 9 additions & 0 deletions
@@ -178,6 +178,9 @@ all::
 #
 # Define NO_CROSS_DIRECTORY_HARDLINKS if you plan to distribute the installed
 # programs as a tar, where bin/ and libexec/ might be on different file systems.
+#
+# Define USE_NED_ALLOCATOR if you want to replace the platforms default
+# memory allocators with the nedmalloc allocator written by Niall Douglas.
 
 GIT-VERSION-FILE: .FORCE-GIT-VERSION-FILE
 @$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -844,6 +847,7 @@ ifneq (,$(findstring MINGW,$(uname_S)))
 NO_ST_BLOCKS_IN_STRUCT_STAT = YesPlease
 NO_NSEC = YesPlease
 USE_WIN32_MMAP = YesPlease
+USE_NED_ALLOCATOR = YesPlease
 UNRELIABLE_FSTAT = UnfortunatelyYes
 OBJECT_CREATION_USES_RENAMES = UnfortunatelyNeedsTo
 COMPAT_CFLAGS += -D__USE_MINGW_ACCESS -DNOGDI -Icompat -Icompat/regex -Icompat/fnmatch
@@ -1130,6 +1134,11 @@ ifdef UNRELIABLE_FSTAT
 BASIC_CFLAGS += -DUNRELIABLE_FSTAT
 endif
 
+ifdef USE_NED_ALLOCATOR
+COMPAT_CFLAGS += -DUSE_NED_ALLOCATOR -DOVERRIDE_STRDUP -DNDEBUG -DREPLACE_SYSTEM_ALLOCATOR -Icompat/nedmalloc
+COMPAT_OBJS += compat/nedmalloc/nedmalloc.o
+endif
+
 ifeq ($(TCLTK_PATH),)
 NO_TCLTK=NoThanks
 endif
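The Makefile itself only adds preprocessor flags and links compat/nedmalloc/nedmalloc.o into COMPAT_OBJS; the redirection of allocation calls happens in the C code those flags enable. As a rough illustration of the general idea of choosing an allocator at compile time (not of how nedmalloc.h implements REPLACE_SYSTEM_ALLOCATOR), here is a minimal C sketch. USE_CUSTOM_ALLOCATOR, app_malloc, app_free, counting_malloc and counting_free are all hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a build flag such as -DUSE_NED_ALLOCATOR (hypothetical);
 * a real build would pass it on the compiler command line instead. */
#ifndef USE_CUSTOM_ALLOCATOR
#define USE_CUSTOM_ALLOCATOR 1
#endif

/* Hypothetical replacement allocator: it only forwards to the C library
 * and counts live blocks, standing in for something like nedmalloc. */
static long live_blocks;

static void *counting_malloc(size_t n)
{
    void *p = malloc(n);
    if (p)
        live_blocks++;
    return p;
}

static void counting_free(void *p)
{
    if (!p)
        return;
    assert(live_blocks > 0);   /* catches more frees than allocations */
    live_blocks--;
    free(p);
}

/* Call sites use app_malloc()/app_free(); the flag decides at compile
 * time which implementation those names expand to. */
#if USE_CUSTOM_ALLOCATOR
#define app_malloc(n) counting_malloc(n)
#define app_free(p)   counting_free(p)
#else
#define app_malloc(n) malloc(n)
#define app_free(p)   free(p)
#endif
```

Routing calls through wrapper names keeps the switch to a single build flag. Redefining malloc() itself as a macro would be undefined behaviour in standard C once <stdlib.h> is included, because the name is reserved, which is one reason to prefer wrapper names in a sketch like this.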

compat/nedmalloc/License.txt

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+Boost Software License - Version 1.0 - August 17th, 2003
+
+Permission is hereby granted, free of charge, to any person or organization
+obtaining a copy of the software and accompanying documentation covered by
+this license (the "Software") to use, reproduce, display, distribute,
+execute, and transmit the Software, and to prepare derivative works of the
+Software, and to permit third-parties to whom the Software is furnished to
+do so, all subject to the following:
+
+The copyright notices in the Software and this entire statement, including
+the above license grant, this restriction and the following disclaimer,
+must be included in all copies of the Software, in whole or in part, and
+all derivative works of the Software, unless such copies or derivative
+works are solely in the form of machine-executable object code generated by
+a source language processor.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
+SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
+FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
+ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+DEALINGS IN THE SOFTWARE.

compat/nedmalloc/Readme.txt

Lines changed: 136 additions & 0 deletions
@@ -0,0 +1,136 @@
+nedalloc v1.05 15th June 2008:
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
+
+by Niall Douglas (http://www.nedprod.com/programs/portable/nedmalloc/)
+
+Enclosed is nedalloc, an alternative malloc implementation for multiple
+threads without lock contention based on dlmalloc v2.8.4. It is more
+or less a newer implementation of ptmalloc2, the standard allocator in
+Linux (which is based on dlmalloc v2.7.0) but also contains a per-thread
+cache for maximum CPU scalability.
+
+It is licensed under the Boost Software License which basically means
+you can do anything you like with it. This does not apply to the malloc.c.h
+file which remains copyright to others.
+
+It has been tested on win32 (x86), win64 (x64), Linux (x64), FreeBSD (x64)
+and Apple MacOS X (x86). It works very well on all of these and is very
+significantly faster than the system allocator on all of these platforms.
+
+By literally dropping in this allocator as a replacement for your system
+allocator, you can see real world improvements of up to three times in normal
+code!
+
+To use:
+-=-=-=-
+Drop in nedmalloc.h, nedmalloc.c and malloc.c.h into your project.
+Configure using the instructions in nedmalloc.h. Run and enjoy.
+
+To test, compile test.c. It will run a comparison between your system
+allocator and nedalloc and tell you how much faster nedalloc is. It also
+serves as an example of usage.
+
+Notes:
+-=-=-=
+If you want the very latest version of this allocator, get it from the
+TnFOX SVN repository at svn://svn.berlios.de/viewcvs/tnfox/trunk/src/nedmalloc
+
+Because of how nedalloc allocates an mspace per thread, it can cause
+severe bloating of memory usage under certain allocation patterns.
+You can substantially reduce this wastage by setting MAXTHREADSINPOOL
+or the threads parameter to nedcreatepool() to a fraction of the number of
+threads which would normally be in a pool at once. This will reduce
+bloating at the cost of an increase in lock contention. If allocated size
+is less than THREADCACHEMAX, locking is avoided 90-99% of the time and
+if most of your allocations are below this value, you can safely set
+MAXTHREADSINPOOL to one.
+
+You will suffer memory leakage unless you call neddisablethreadcache()
+per pool for every thread which exits. This is because nedalloc cannot
+portably know when a thread exits and thus when its thread cache can
+be returned for use by other code. Don't forget pool zero, the system pool.
+
+For C++ type allocation patterns (where the same sizes of memory are
+regularly allocated and deallocated as objects are created and destroyed),
+the threadcache always benefits performance. If however your allocation
+patterns are different, searching the threadcache may significantly slow
+down your code - as a rule of thumb, if cache utilisation is below 80%
+(see the source for neddisablethreadcache() for how to enable debug
+printing in release mode) then you should disable the thread cache for
+that thread. You can compile out the threadcache code by setting
+THREADCACHEMAX to zero.
+
+Speed comparisons:
+-=-=-=-=-=-=-=-=-=
+See Benchmarks.xls for details.
+
+The enclosed test.c can do two things: it can be a torture test or a speed
+test. The speed test is designed to be a representative synthetic
+memory allocator test. It works by randomly mixing allocations with frees
+with half of the allocation sizes being a two power multiple less than
+512 bytes (to mimic C++ stack instantiated objects) and the other half
+being a simple random value less than 16Kb.
+
+The real world code results are from Tn's TestIO benchmark. This is a
+heavily multithreaded and memory intensive benchmark with a lot of branching
+and other stuff modern processors don't like so much. As you'll note, the
+test doesn't show the benefits of the threadcache mostly due to the saturation
+of the memory bus being the limiting factor.
+
+ChangeLog:
+-=-=-=-=-=
+v1.05 15th June 2008:
+* { 1042 } Added error check for TLSSET() and TLSFREE() macros. Thanks to
+Markus Elfring for reporting this.
+* { 1043 } Fixed a segfault when freeing memory allocated using
+nedindependent_comalloc(). Thanks to Pavel Vozenilek for reporting this.
+
+v1.04 14th July 2007:
+* Fixed a bug with the new optimised implementation that failed to lock
+on a realloc under certain conditions.
+* Fixed lack of thread synchronisation in InitPool() causing pool corruption
+* Fixed a memory leak of thread cache contents on disabling. Thanks to Earl
+Chew for reporting this.
+* Added a sanity check for freed blocks being valid.
+* Reworked test.c into being a torture test.
+* Fixed GCC assembler optimisation misspecification
+
+v1.04alpha_svn915 7th October 2006:
+* Fixed failure to unlock thread cache list if allocating a new list failed.
+Thanks to Dmitry Chichkov for reporting this. Futher thanks to Aleksey Sanin.
+* Fixed realloc(0, <size>) segfaulting. Thanks to Dmitry Chichkov for
+reporting this.
+* Made config defines #ifndef so they can be overriden by the build system.
+Thanks to Aleksey Sanin for suggesting this.
+* Fixed deadlock in nedprealloc() due to unnecessary locking of preferred
+thread mspace when mspace_realloc() always uses the original block's mspace
+anyway. Thanks to Aleksey Sanin for reporting this.
+* Made some speed improvements by hacking mspace_malloc() to no longer lock
+its mspace, thus allowing the recursive mutex implementation to be removed
+with an associated speed increase. Thanks to Aleksey Sanin for suggesting this.
+* Fixed a bug where allocating mspaces overran its max limit. Thanks to
+Aleksey Sanin for reporting this.
+
+v1.03 10th July 2006:
+* Fixed memory corruption bug in threadcache code which only appeared with >4
+threads and in heavy use of the threadcache.
+
+v1.02 15th May 2006:
+* Integrated dlmalloc v2.8.4, fixing the win32 memory release problem and
+improving performance still further. Speed is now up to twice the speed of v1.01
+(average is 67% faster).
+* Fixed win32 critical section implementation. Thanks to Pavel Kuznetsov
+for reporting this.
+* Wasn't locking mspace if all mspaces were locked. Thanks to Pavel Kuznetsov
+for reporting this.
+* Added Apple Mac OS X support.
+
+v1.01 24th February 2006:
+* Fixed multiprocessor scaling problems by removing sources of cache sloshing
+* Earl Chew <earl_chew <at> agilent <dot> com> sent patches for the following:
+1. size2binidx() wasn't working for default code path (non x86)
+2. Fixed failure to release mspace lock under certain circumstances which
+caused a deadlock
+
+v1.00 1st January 2006:
+* First release
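The Readme describes nedmalloc as dlmalloc mspaces plus a per-thread cache. Small blocks that a thread frees are kept for that thread's next request of a similar size, so most small allocations avoid locking. Cached blocks leak unless the cache is released when the thread exits. The sketch below illustrates that idea with C11 _Thread_local storage. It is not nedmalloc's code, and CACHE_BINS, CACHE_MAX, BIN_SLOTS, tc_malloc, tc_free and tc_flush are hypothetical names. CACHE_MAX plays a role loosely analogous to THREADCACHEMAX, and tc_flush() to neddisablethreadcache().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_BINS 8     /* size classes 16, 32, ..., 2048 bytes */
#define CACHE_MAX  2048  /* larger requests bypass the cache */
#define BIN_SLOTS  4     /* freed blocks kept per size class */

/* Header stored in front of every block; the union keeps the user
 * pointer aligned for any type. */
typedef union block_header {
    size_t bin;          /* size class, or CACHE_BINS if uncached */
    max_align_t align;
} block_header;

struct thread_cache {
    void *slots[CACHE_BINS][BIN_SLOTS];
    int count[CACHE_BINS];
};

/* One cache per thread, so touching it needs no lock. */
static _Thread_local struct thread_cache tcache;

/* Map a request size to its power-of-two size class. */
static size_t size_to_bin(size_t n)
{
    size_t bin = 0, cap = 16;

    if (n > CACHE_MAX)
        return CACHE_BINS;
    while (cap < n) {
        cap <<= 1;
        bin++;
    }
    return bin;
}

static void *tc_malloc(size_t n)
{
    size_t bin = size_to_bin(n);
    block_header *h;

    /* Fast path: reuse a block this thread freed earlier. */
    if (bin < CACHE_BINS && tcache.count[bin] > 0)
        return tcache.slots[bin][--tcache.count[bin]];

    /* Slow path: round cached sizes up to the class size so any block
     * in a bin can satisfy any request that maps to that bin. */
    if (bin == CACHE_BINS && n > SIZE_MAX - sizeof(*h))
        return NULL;
    h = malloc(sizeof(*h) + (bin < CACHE_BINS ? (size_t)16 << bin : n));
    if (!h)
        return NULL;
    h->bin = bin;
    return h + 1;
}

static void tc_free(void *p)
{
    block_header *h;

    if (!p)
        return;
    h = (block_header *)p - 1;
    assert(h->bin <= CACHE_BINS);
    if (h->bin < CACHE_BINS && tcache.count[h->bin] < BIN_SLOTS) {
        tcache.slots[h->bin][tcache.count[h->bin]++] = p;
        return;
    }
    free(h);
}

/* Release everything this thread has cached; a thread should call this
 * before exiting, otherwise the cached blocks are leaked. */
static void tc_flush(void)
{
    for (int b = 0; b < CACHE_BINS; b++)
        while (tcache.count[b] > 0)
            free((block_header *)tcache.slots[b][--tcache.count[b]] - 1);
}
```

Per-thread storage is what lets the fast path skip locking. The cost, as the Readme warns, is that blocks parked in one thread's cache are unavailable to other threads and are lost if the thread exits without flushing.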

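The Readme's description of the test.c speed test can be sketched in C. That test randomly interleaves allocations and frees, with half of the sizes a power of two below 512 bytes and half a random value below 16 KB. This is an interpretation of that prose, not the actual test.c. SLOTS, pick_size and run_mix are hypothetical names, "two power multiple" is read as "power of two", and the sketch alternates strictly between the two size families rather than choosing between them at random.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SLOTS 1024   /* size of the live-block table (arbitrary) */

/* Even-numbered requests: a power of two below 512 bytes (1..256),
 * mimicking small C++ objects. Odd-numbered requests: a random size
 * in 1..16383 bytes, i.e. below 16 KB. */
static size_t pick_size(unsigned long i)
{
    size_t n;

    if (i % 2 == 0)
        n = (size_t)1 << (rand() % 9);
    else
        n = 1 + (size_t)(rand() % 16383);
    assert(n > 0 && n < 16384);
    return n;
}

/* Randomly interleave allocations and frees: each step picks a slot and
 * frees the block held there, or allocates one if the slot is empty.
 * Returns the number of successful allocations. */
static unsigned long run_mix(unsigned long steps, unsigned seed)
{
    void *slot[SLOTS] = { NULL };
    unsigned long allocs = 0;

    srand(seed);
    for (unsigned long i = 0; i < steps; i++) {
        int k = rand() % SLOTS;

        if (slot[k]) {
            free(slot[k]);
            slot[k] = NULL;
        } else if ((slot[k] = malloc(pick_size(allocs))) != NULL) {
            allocs++;
        }
    }
    for (int k = 0; k < SLOTS; k++)
        free(slot[k]);   /* free(NULL) is a no-op */
    return allocs;
}
```

Timing run_mix() once with the system allocator and once with a replacement allocator linked in would give a rough comparison in the spirit of the Readme's speed test, though not the same numbers.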