brevity. Much more detail can be found in the git revision history:

    https://github.com/jemalloc/jemalloc

* 5.0.0 (June 13, 2017)

  Unlike all previous jemalloc releases, this release does not use naturally
  aligned "chunks" for virtual memory management, and instead uses page-aligned
  "extents". This change has few externally visible effects, but the internal
  impacts are... extensive. Many other internal changes combine to make this
  the most cohesively designed version of jemalloc so far, with ample
  opportunity for further enhancements.

  Continuous integration is now an integral aspect of development thanks to the
  efforts of @davidtgoldblatt, and the dev branch tends to remain reasonably
  stable on the tested platforms (Linux, FreeBSD, macOS, and Windows). As a
  side effect the official release frequency may decrease over time.

  New features:
  - Implement optional per-CPU arena support; threads choose which arena to use
    based on current CPU rather than on fixed thread-->arena associations.
    (@interwq)
  - Implement two-phase decay of unused dirty pages. Pages transition from
    dirty-->muzzy-->clean, where the first phase transition relies on
    madvise(... MADV_FREE) semantics, and the second phase transition discards
    pages such that they are replaced with demand-zeroed pages on next access.
    (@jasone)
  - Increase decay time resolution from seconds to milliseconds. (@jasone)
  - Implement opt-in per CPU background threads, and use them for asynchronous
    decay-driven unused dirty page purging. (@interwq)
  - Add mutex profiling, which collects a variety of statistics useful for
    diagnosing overhead/contention issues. (@interwq)
  - Add C++ new/delete operator bindings. (@djwatson)
  - Support manually created arena destruction, such that all data and metadata
    are discarded. Add MALLCTL_ARENAS_DESTROYED for accessing merged stats
    associated with destroyed arenas. (@jasone) (An arena lifecycle sketch
    follows this list.)
  - Add MALLCTL_ARENAS_ALL as a fixed index for use in accessing
    merged/destroyed arena statistics via mallctl. (@jasone)
  - Add opt.abort_conf to optionally abort if invalid configuration options are
    detected during initialization. (@interwq)
  - Add opt.stats_print_opts, so that e.g. JSON output can be selected for the
    stats dumped during exit if opt.stats_print is true. (@jasone)
  - Add --with-version=VERSION for use when embedding jemalloc into another
    project's git repository. (@jasone)
  - Add --disable-thp to support cross compiling. (@jasone)
  - Add --with-lg-hugepage to support cross compiling. (@jasone)
  - Add mallctl interfaces (various authors; usage sketches follow this list):
    + background_thread
    + opt.abort_conf
    + opt.retain
    + opt.percpu_arena
    + opt.background_thread
    + opt.{dirty,muzzy}_decay_ms
    + opt.stats_print_opts
    + arena.<i>.initialized
    + arena.<i>.destroy
    + arena.<i>.{dirty,muzzy}_decay_ms
    + arena.<i>.extent_hooks
    + arenas.{dirty,muzzy}_decay_ms
    + arenas.bin.<i>.slab_size
    + arenas.nlextents
    + arenas.lextent.<i>.size
    + arenas.create
    + stats.background_thread.{num_threads,num_runs,run_interval}
    + stats.mutexes.{ctl,background_thread,prof,reset}.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}
    + stats.arenas.<i>.{dirty,muzzy}_decay_ms
    + stats.arenas.<i>.uptime
    + stats.arenas.<i>.{pmuzzy,base,internal,resident}
    + stats.arenas.<i>.{dirty,muzzy}_{npurge,nmadvise,purged}
    + stats.arenas.<i>.bins.<j>.{nslabs,reslabs,curslabs}
    + stats.arenas.<i>.bins.<j>.mutex.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}
    + stats.arenas.<i>.lextents.<j>.{nmalloc,ndalloc,nrequests,curlextents}
    + stats.arenas.<i>.mutexes.{large,extent_avail,extents_dirty,extents_muzzy,
      extents_retained,decay_dirty,decay_muzzy,base,tcache_list}.
      {num_ops,num_spin_acq,num_wait,max_wait_time,total_wait_time,max_num_thds,
      num_owner_switch}

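  As an illustration only (not part of the release documentation), the sketch
  below uses mallctl to exercise several of the interfaces listed above:
  enabling the opt-in background threads, setting the new millisecond decay
  times, and reading one of the new mutex profiling counters. It assumes an
  unprefixed build (plain mallctl from <jemalloc/jemalloc.h>); names would
  differ under --with-jemalloc-prefix, and the values shown are arbitrary.

    /* Hypothetical tuning sketch; error handling abbreviated. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    static void tune_purging(void) {
        /* Enable background threads for asynchronous decay-driven purging. */
        bool bg = true;
        if (mallctl("background_thread", NULL, NULL, &bg, sizeof(bg)) != 0) {
            fprintf(stderr, "background_thread not available\n");
        }

        /* Default dirty/muzzy decay times, in milliseconds, applied to
         * subsequently created arenas; -1 would disable purging entirely. */
        ssize_t decay_ms = 10 * 1000;
        mallctl("arenas.dirty_decay_ms", NULL, NULL, &decay_ms,
            sizeof(decay_ms));
        mallctl("arenas.muzzy_decay_ms", NULL, NULL, &decay_ms,
            sizeof(decay_ms));

        /* Refresh cached statistics, then read a mutex profiling counter. */
        uint64_t epoch = 1;
        size_t sz = sizeof(epoch);
        mallctl("epoch", &epoch, &sz, &epoch, sz);

        uint64_t num_wait;
        sz = sizeof(num_wait);
        if (mallctl("stats.mutexes.ctl.num_wait", &num_wait, &sz, NULL,
            0) == 0) {
            printf("ctl mutex num_wait: %llu\n",
                (unsigned long long)num_wait);
        }
    }
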
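  Similarly, the arena lifecycle sketch below (hypothetical, same unprefixed
  build assumption) creates a manually managed arena via arenas.create, routes
  an allocation to it with mallocx, and then destroys it via
  arena.<i>.destroy; merged statistics for destroyed arenas remain readable
  under the MALLCTL_ARENAS_DESTROYED index.

    /* Hypothetical arena lifecycle sketch; error handling abbreviated.
     * Allocations made from the arena must be freed, or no longer needed,
     * before the arena is destroyed. */
    #include <stdio.h>
    #include <jemalloc/jemalloc.h>

    static void arena_lifecycle(void) {
        unsigned arena_ind;
        size_t sz = sizeof(arena_ind);
        if (mallctl("arenas.create", &arena_ind, &sz, NULL, 0) != 0) {
            return;
        }

        /* Route an allocation to the new arena, bypassing the tcache so that
         * the memory is attributed to (and reclaimed with) that arena. */
        void *p = mallocx(4096, MALLOCX_ARENA(arena_ind) | MALLOCX_TCACHE_NONE);
        if (p != NULL) {
            dallocx(p, MALLOCX_TCACHE_NONE);
        }

        /* Discard all of the arena's data and metadata. */
        char cmd[64];
        snprintf(cmd, sizeof(cmd), "arena.%u.destroy", arena_ind);
        mallctl(cmd, NULL, NULL, NULL, 0);
    }
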
  Portability improvements:
  - Improve reentrant allocation support, such that deadlock is less likely if
    e.g. a system library call in turn allocates memory. (@davidtgoldblatt,
    @interwq)
  - Support static linking of jemalloc with glibc. (@djwatson)

  Optimizations and refactors:
  - Organize virtual memory as "extents" of virtual memory pages, rather than
    as naturally aligned "chunks", and store all metadata in arbitrarily
    distant locations. This reduces virtual memory external fragmentation, and
    will interact better with huge pages (not yet explicitly supported).
    (@jasone)
  - Fold large and huge size classes together; only small and large size
    classes remain. (@jasone)
  - Unify the allocation paths, and merge most fast-path branching decisions.
    (@davidtgoldblatt, @interwq)
  - Embed per thread automatic tcache into thread-specific data, which reduces
    conditional branches and dereferences. Also reorganize tcache to increase
    fast-path data locality. (@interwq)
  - Rewrite atomics to closely model the C11 API, convert various
    synchronization from mutex-based to atomic, and use the explicit memory
    ordering control to resolve various hypothetical races without increasing
    synchronization overhead. (@davidtgoldblatt) (A generic C11 illustration
    follows this list.)
  - Extensively optimize rtree via various methods:
    + Add multiple layers of rtree lookup caching, since rtree lookups are now
      part of fast-path deallocation. (@interwq)
    + Determine rtree layout at compile time. (@jasone)
    + Make the tree shallower for common configurations. (@jasone)
    + Embed the root node in the top-level rtree data structure, thus avoiding
      one level of indirection. (@jasone)
    + Further specialize leaf elements as compared to internal node elements,
      and directly embed extent metadata needed for fast-path deallocation.
      (@jasone)
    + Ignore leading always-zero address bits (architecture-specific).
      (@jasone)
  - Reorganize headers (ongoing work) to make them hermetic, and disentangle
    various module dependencies. (@davidtgoldblatt)
  - Convert various internal data structures such as size class metadata from
    boot-time-initialized to compile-time-initialized. Propagate resulting data
    structure simplifications, such as making arena metadata fixed-size.
    (@jasone)
  - Simplify size class lookups when constrained to size classes that are
    multiples of the page size. This speeds lookups, but the primary benefit is
    complexity reduction in code that was the source of numerous regressions.
    (@jasone)
  - Lock individual extents when possible for localized extent operations,
    rather than relying on a top-level arena lock. (@davidtgoldblatt, @jasone)
  - Use first fit layout policy instead of best fit, in order to improve
    packing. (@jasone)
  - If munmap(2) is not in use, use an exponential series to grow each arena's
    virtual memory, so that the number of disjoint virtual memory mappings
    remains low (see the toy sketch after this list). (@jasone)
  - Implement per arena base allocators, so that arenas never share any virtual
    memory pages. (@jasone)
  - Automatically generate private symbol name mangling macros. (@jasone)

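  The atomics rewrite models the C11 <stdatomic.h> interface. The generic
  example below is not jemalloc code; it only illustrates the kind of explicit
  memory-ordering control that interface provides, pairing a release store
  with an acquire load.

    /* Generic C11 illustration (not jemalloc internals): publish a payload
     * with a release store; a consumer that observes the flag via an acquire
     * load is guaranteed to also observe the payload written before it. */
    #include <stdatomic.h>
    #include <stdbool.h>

    static int payload;
    static atomic_bool published = ATOMIC_VAR_INIT(false);

    static void publish(int v) {
        payload = v;
        atomic_store_explicit(&published, true, memory_order_release);
    }

    static bool try_consume(int *out) {
        if (!atomic_load_explicit(&published, memory_order_acquire)) {
            return false;
        }
        *out = payload;
        return true;
    }
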
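  The exponential virtual memory growth can be pictured with the toy
  calculation below. The function and doubling factor are hypothetical, not
  the actual retained-extent growth logic, but they show why geometric steps
  keep the number of disjoint mappings low.

    #include <stddef.h>

    /* Toy sketch (hypothetical): each time an arena must map more virtual
     * memory, request a geometrically larger extent, so covering N bytes
     * requires only O(log N) distinct mappings. */
    static size_t next_grow_size(size_t last_grow_size, size_t request_size) {
        size_t next = last_grow_size * 2;  /* geometric series */
        if (next < request_size) {
            next = request_size;           /* always satisfy the request */
        }
        return next;
    }
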
  Incompatible changes:
  - Replace chunk hooks with an expanded/normalized set of extent hooks.
    (@jasone)
  - Remove ratio-based purging. (@jasone)
  - Remove --disable-tcache. (@jasone)
  - Remove --disable-tls. (@jasone)
  - Remove --enable-ivsalloc. (@jasone)
  - Remove --with-lg-size-class-group. (@jasone)
  - Remove --with-lg-tiny-min. (@jasone)
  - Remove --disable-cc-silence. (@jasone)
  - Remove --enable-code-coverage. (@jasone)
  - Remove --disable-munmap (replaced by opt.retain). (@jasone)
  - Remove Valgrind support. (@jasone)
  - Remove quarantine support. (@jasone)
  - Remove redzone support. (@jasone)
  - Remove mallctl interfaces (various authors; a migration sketch follows this
    list):
    + config.munmap
    + config.tcache
    + config.tls
    + config.valgrind
    + opt.lg_chunk
    + opt.purge
    + opt.lg_dirty_mult
    + opt.decay_time
    + opt.quarantine
    + opt.redzone
    + opt.thp
    + arena.<i>.lg_dirty_mult
    + arena.<i>.decay_time
    + arena.<i>.chunk_hooks
    + arenas.initialized
    + arenas.lg_dirty_mult
    + arenas.decay_time
    + arenas.bin.<i>.run_size
    + arenas.nlruns
    + arenas.lrun.<i>.size
    + arenas.nhchunks
    + arenas.hchunk.<i>.size
    + arenas.extend
    + stats.cactive
    + stats.arenas.<i>.lg_dirty_mult
    + stats.arenas.<i>.decay_time
    + stats.arenas.<i>.metadata.{mapped,allocated}
    + stats.arenas.<i>.{npurge,nmadvise,purged}
    + stats.arenas.<i>.huge.{allocated,nmalloc,ndalloc,nrequests}
    + stats.arenas.<i>.bins.<j>.{nruns,reruns,curruns}
    + stats.arenas.<i>.lruns.<j>.{nmalloc,ndalloc,nrequests,curruns}
    + stats.arenas.<i>.hchunks.<j>.{nmalloc,ndalloc,nrequests,curhchunks}

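  For applications that previously tuned purging through the removed
  opt.lg_dirty_mult/opt.decay_time knobs, or that built with --disable-munmap,
  roughly equivalent behavior is now expressed via run-time options. The
  sketch below (option values chosen arbitrarily) supplies them through the
  application-defined malloc_conf string, assuming an unprefixed build; the
  MALLOC_CONF environment variable accepts the same syntax.

    /* Example migration (arbitrary values): decay-based purging replaces
     * ratio-based purging, opt.retain replaces --disable-munmap, and
     * stats_print_opts:J selects JSON output for the exit-time stats dump. */
    const char *malloc_conf =
        "dirty_decay_ms:10000,muzzy_decay_ms:10000,"
        "retain:true,background_thread:true,"
        "stats_print:true,stats_print_opts:J";
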
  Bug fixes:
  - Improve interval-based profile dump triggering to dump only one profile
    when a single allocation's size exceeds the interval. (@jasone)
  - Use prefixed function names (as controlled by --with-jemalloc-prefix) when
    pruning backtrace frames in jeprof. (@jasone)

* 4.5.0 (February 28, 2017)

  This is the first release to benefit from much broader continuous integration