Commit 155f810, authored and committed by Julien Pauli (1 parent: 3d52ed6)

Zend Memory Manager chapter

5 files changed, +255 -9 lines changed

Book/php7/internal_types.rst

Lines changed: 1 addition & 2 deletions
@@ -13,7 +13,6 @@ Contents:
 internal_types/zvals.rst
 internal_types/strings.rst
 internal_types/zend_resources.rst
-
-..
 internal_types/hashtables.rst
+..
 internal_types/objects/classes_and_objects.rst
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
HashTables
==========

Book/php7/internal_types/zvals/basic_structure.rst

Lines changed: 9 additions & 7 deletions
@@ -13,8 +13,8 @@ the type can change during the life of a zval, so if the zval previously stored
 later point in time.
 
 The type is stored as an integer tag (an unsigned int). It can be one of several values. Some values correspond to the eight
-types available in PHP, others are used for internal engine purpose only. These values are referred to using constants of the form ``IS_TYPE``. E.g. ``IS_NULL``
-corresponds to the null type and ``IS_STRING`` corresponds to the string type.
+types available in PHP, others are used for internal engine purposes only. These values are referred to using constants
+of the form ``IS_TYPE``. E.g. ``IS_NULL`` corresponds to the null type and ``IS_STRING`` corresponds to the string type.
 
 The actual value is stored in a union, which is defined as follows::
 
@@ -60,7 +60,9 @@ Secondly, ``zend_long`` represents an abstraction of the platform long, so whate
 ``zend_long`` weighs 4 bytes on 32bit platforms and 8 bytes on 64bit ones.
 
 In addition, you may use macros related to longs, e.g. ``SIZEOF_ZEND_LONG`` or ``ZEND_LONG_MAX``.
-See Zend/zend_long.h in source code for more informations.
+See
+`Zend/zend_long.h <https://github.com/php/php-src/blob/c3b910370c5c92007c3e3579024490345cb7f9a7/Zend/zend_long.h>`_
+in the source code for more information.
 
 The ``double`` type used to store floating point numbers is (typically) an 8-byte value following the IEEE-754
 specification. The details of this format won't be discussed here, but you should at least be aware of the fact that
@@ -74,7 +76,7 @@ The remaining four types will only be mentioned here quickly and discussed in gr
 
 Strings (``IS_STRING``) are stored in a ``zend_string`` structure, i.e. they essentially consist of a ``char`` buffer
 and a ``size_t`` length. You will find more information about the ``zend_string`` structure and its dedicated API
-into the :doc:`/php7/zend_strings` chapter.
+in the :doc:`string <../strings>` chapter.
 
 Arrays use the ``IS_ARRAY`` type tag and are stored in the ``zend_array *arr`` member. How the ``HashTable`` structure
 works will be discussed in the :doc:`/hashtables` chapter.
@@ -120,14 +122,14 @@ internal use-case only. The zval structure has been thought to be very flexible,
 virtually any type of data of interest, and not only the PHP specific types we just reviewed above.
 
 The ``IS_UNDEF`` type has a special meaning: "This zval contains no data of interest, do not access
-any data field from it". This is used for :doc:`/php7/zvals/memory_management` purposes. If you see an ``IS_UNDEF`` zval,
+any data field from it". This is used for :doc:`zvals/memory_management` purposes. If you see an ``IS_UNDEF`` zval,
 that means that it is of no special type and contains no valid information.
 
 The ``zend_refcounted *counted`` field is very tricky to understand. Basically, that field serves as a header for any
-other reference-countable type. This part is detailed into the :doc:`/zvals/memory_management` chapter.
+other reference-countable type. This part is detailed in the :doc:`zvals/memory_management` chapter.
 
 The ``zend_reference *ref`` is used to represent a PHP reference. The ``IS_REFERENCE`` type flag is then used.
-Here as well, we dedicated a chapter to such a concept, have a look at the :doc:`/php7/zvals/memory_management` chapter.
+Here as well, we dedicated a chapter to such a concept, have a look at the :doc:`zvals/memory_management` chapter.
 
 The ``zend_ast_ref *ast`` is used when you manipulate the AST from the compiler. PHP compilation is detailed in
 the :doc:`/php7/compiler` chapter.

Book/php7/memory_management.rst

Lines changed: 23 additions & 0 deletions
@@ -1,6 +1,29 @@
Memory management
=================

C programmers usually have to manage memory by hand. With dynamic memory, the programmer allocates memory when
needed and frees it when finished. Failing to free dynamic memory leads to a "memory leak", which may or may not be
harmful: a short-lived program gives all its memory back to the OS when it exits anyway. In the case of PHP, as the
process may live for a virtually unlimited amount of time, a memory leak can be dramatic, because the leaked memory
accumulates request after request. In any situation, leaking memory reveals a poorly designed program that cannot
be trusted.

Memory leaking is easy to understand. You ask the OS to reserve some part of the machine's main memory for you, but
you never tell it to release that memory back for other processes to use. You are not alone on the machine: other
processes need some memory, and so does the OS itself.
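
To make the ownership rule concrete, here is a minimal sketch in plain C, with no PHP involved. A hypothetical
``make_greeting()`` helper returns a heap buffer, and its caller is responsible for handing it back with ``free()``.
Forgetting that single ``free()`` call would leak the buffer every time the function is used.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of "Hello, <name>".
 * The caller owns the returned buffer and must free() it. */
char *make_greeting(const char *name)
{
    const char *prefix = "Hello, ";
    size_t len = strlen(prefix) + strlen(name) + 1; /* +1 for the NUL byte */
    char *buf = malloc(len);

    if (buf == NULL) {
        return NULL; /* plain libc malloc() may fail and return NULL */
    }
    strcpy(buf, prefix);
    strcat(buf, name);
    return buf;
}

/* Correct usage: every successful malloc() is paired with exactly one free().
 * Removing the free() call below would leak len bytes per call. */
size_t greeting_length(const char *name)
{
    char *greeting = make_greeting(name);
    size_t n;

    if (greeting == NULL) {
        return 0;
    }
    n = strlen(greeting);
    free(greeting); /* give the memory back */
    return n;
}
```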

Also, in C, memory areas have strict bounds. Reading or writing before or after those bounds is a very nasty
operation: it is undefined behavior, which may crash the program, silently corrupt data, or worse, open an
exploitable security issue. There is nothing magical like auto-resizable areas in the C language. You must clearly
tell the machine (and the CPU) what you want it to do. No guessing, no magic, no automation of any kind (like garbage
collection).
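
To stay within bounds, C code has to carry the buffer size around explicitly and check it before every write. Here is a
minimal sketch, assuming a hypothetical ``copy_bounded()`` helper that truncates instead of overflowing:

```c
#include <string.h>

/* Hypothetical helper: copies src into dst, which is dst_size bytes large.
 * It never writes past dst[dst_size - 1] and always NUL-terminates when
 * dst_size > 0. Returns the number of characters actually copied. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t src_len = strlen(src);
    size_t n;

    if (dst_size == 0) {
        return 0; /* no room at all, not even for the NUL byte */
    }
    n = src_len < dst_size - 1 ? src_len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

A plain ``strcpy(small, "overflow")`` into a 4-byte array would write 9 bytes instead. The language would not report
that as an error: it is simply undefined behavior.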

PHP has a very specific memory model, and provides its own layer over the traditional libc dynamic memory
allocator. This layer is called **Zend Memory Manager**.

This chapter explains what Zend Memory Manager is, how you must use it, and what you must and must not do with it.
After that, we'll quickly introduce specific tools used in the C world to debug dynamic memory.

.. note:: If needed, please get some (possibly strong) knowledge about C memory allocation classes (static vs
          dynamic memory), and about libc's allocator.

Contents:

.. toctree::
Lines changed: 218 additions & 0 deletions
@@ -1,3 +1,221 @@
Zend Memory Manager
===================

Zend Memory Manager, often abbreviated as ZendMM or ZMM, is a C layer that provides the ability to allocate and
release dynamic **request-bound** memory.

Note the term "request-bound" in the sentence above.

ZendMM is not just a classical layer over libc's dynamic memory allocator, which is mainly represented by the pair of
API calls ``malloc()``/``free()``. ZendMM is about the request-bound memory that PHP must allocate while processing a
request.

The two main kinds of dynamic memory pools in PHP
*************************************************

PHP has a share-nothing architecture. Well, not 100% so. Let us explain.

.. note:: You may need to read :doc:`the PHP lifecycle chapter <../extensions_design/php_lifecycle>` before
          continuing here. You'll get additional information about the different steps and cycles that can be
          drawn from PHP's lifetime.

PHP can handle several (dozens, thousands?) requests within the same process. By default, PHP forgets anything it
knows about the current request when that request finishes.

"Forgetting" things translates to freeing any dynamic buffer that got allocated while processing a request. That means
that while processing a request, you should not allocate dynamic memory using traditional libc calls. Doing so would
technically work, but it gives you a chance to forget to free such a buffer.

ZendMM comes with an API that substitutes for libc's dynamic allocator by mirroring its API. While processing a
request, the programmer must use that API instead of libc's allocator.
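
To get a feel for what "forgetting everything at the end of a request" means, here is a toy model in plain C. It is
**not** how ZendMM is implemented. ``req_pool``, ``req_startup()``, ``req_alloc()`` and ``req_shutdown()`` are
hypothetical names for a simple bump allocator whose whole content is released in one go when the simulated request
ends.

```c
#include <stdlib.h>

/* Toy request-bound pool (hypothetical names, not ZendMM's real design). */
#define REQ_ALIGN 16 /* assumed alignment for every returned slice */

typedef struct {
    unsigned char *base; /* one big block obtained from libc */
    size_t size;         /* capacity of that block */
    size_t used;         /* bump offset: bytes handed out so far */
} req_pool;

/* "Request startup": grab the backing block. Returns 1 on success. */
int req_startup(req_pool *pool, size_t size)
{
    pool->base = malloc(size);
    pool->size = pool->base != NULL ? size : 0;
    pool->used = 0;
    return pool->base != NULL;
}

/* Hands out the next aligned slice of the block, or NULL when exhausted.
 * There is no per-slice free: slices live until the request ends. */
void *req_alloc(req_pool *pool, size_t n)
{
    size_t start = (pool->used + REQ_ALIGN - 1) / REQ_ALIGN * REQ_ALIGN;

    if (start > pool->size || n > pool->size - start) {
        return NULL;
    }
    pool->used = start + n;
    return pool->base + start;
}

/* "Request shutdown": everything allocated from the pool is forgotten at once. */
void req_shutdown(req_pool *pool)
{
    free(pool->base);
    pool->base = NULL;
    pool->size = 0;
    pool->used = 0;
}
```

The real ZendMM does let you free individual blocks with ``efree()``, and you are expected to do so. The model only
shows why memory obtained from a request-bound pool cannot outlive the request.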

For example, when PHP processes a request, it parses PHP files, which contain function and class declarations. When
the compiler compiles those files, it allocates some dynamic memory to store the classes and functions it discovers.
But at the end of the request, PHP forgets about them. By default, PHP forgets *a very large amount* of information
from one request to another.

There is, however, some pretty rare information you need to persist across several requests. But that's uncommon.

What could be kept unchanged across requests? What we call **persistent** objects. Once more, let us insist: those
are rare cases. For example, the current PHP executable path won't change from request to request. That information
is allocated permanently, which means it is allocated with a traditional libc ``malloc()`` call.

What else? Some strings. For example, the "_SERVER" string will be reused from request to request, as every request
creates the ``$_SERVER`` PHP array. So the "_SERVER" string itself can be permanently allocated, because it only needs
to be allocated once.

What you must remember:

* There are two kinds of dynamic memory allocations when programming PHP (or extensions):

  * Request-bound dynamic allocations
  * Permanent dynamic allocations

* Request-bound dynamic memory allocations:

  * Must only be performed while PHP is processing a request (not before, nor after)
  * Can only be performed using the ZendMM dynamic memory allocation API
  * Are very common: roughly 95% of your dynamic allocations will be request-bound
  * Are tracked by ZendMM, and you'll be informed about bad usage of the memory area, or if you leak

* Permanent dynamic memory allocations:

  * Should not be performed while PHP is processing a request (not forbidden, but a bad idea)
  * Are not tracked by ZendMM, and you won't be informed about bad usage of the memory area, or if you leak
  * Should be pretty rare in an extension

Also, keep in mind that the whole PHP source code is built on top of this memory layer. Thus, many internal
structures get allocated using the Zend Memory Manager. Most of them have a "persistent" API variant which, when
used, leads to a traditional libc allocation.

Here is a request-bound :doc:`zend_string <../internal_types/strings/zend_strings>`::

    /* last argument 0: request-bound, allocated through ZendMM */
    zend_string *foo = zend_string_init("foo", strlen("foo"), 0);

And here is the persistent one::

    /* last argument 1: persistent, allocated through libc's malloc() */
    zend_string *foo = zend_string_init("foo", strlen("foo"), 1);

Same for :doc:`HashTable <../internal_types/hashtables>`. A request-bound one::

    zend_array ar;
    zend_hash_init(&ar, 8, NULL, NULL, 0); /* 0: request-bound */

A persistent one::

    zend_array ar;
    zend_hash_init(&ar, 8, NULL, NULL, 1); /* 1: persistent */

It is the same in all the different Zend APIs. Usually, the last parameter is either "0", meaning "I want this
structure to be allocated using ZendMM, so request-bound", or "1", meaning "I want this structure to be allocated
bypassing ZendMM, using a traditional libc ``malloc()`` call".
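
The 0/1 convention boils down to choosing an allocator from a flag. *zend_alloc.h* exposes that same choice directly
through the ``pemalloc(size, persistent)`` and ``pefree(ptr, persistent)`` macros. The sketch below runs without PHP.
``toy_pemalloc()``, ``toy_pefree()`` and the fake request-bound allocator behind them are hypothetical, and they only
count live blocks so that the dispatch can be observed.

```c
#include <stdlib.h>

/* Counters standing in for the two pools (hypothetical, for illustration). */
static long request_live = 0;    /* blocks owned by the fake request allocator */
static long persistent_live = 0; /* blocks owned by plain libc */

/* Fake request-bound allocator; in PHP this role is played by emalloc()/efree(). */
static void *toy_emalloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL) {
        request_live++;
    }
    return p;
}

static void toy_efree(void *ptr)
{
    request_live--;
    free(ptr);
}

/* 0 => request-bound allocation, 1 => persistent (libc) allocation. */
void *toy_pemalloc(size_t size, int persistent)
{
    void *p;

    if (!persistent) {
        return toy_emalloc(size);
    }
    p = malloc(size);
    if (p != NULL) {
        persistent_live++;
    }
    return p;
}

/* The same flag must be passed back when freeing. */
void toy_pefree(void *ptr, int persistent)
{
    if (persistent) {
        persistent_live--;
        free(ptr);
    } else {
        toy_efree(ptr);
    }
}
```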

Obviously, those structures provide an API that remembers how each structure was allocated, so that the right
deallocation function is used when it is destroyed. Hence, in code like this::

    zend_string_release(foo);
    zend_hash_destroy(&ar);

The API knows whether those structures were allocated using request-bound or permanent allocation. It uses ``efree()``
to release them in the first case, and libc's ``free()`` in the second.
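
How does ``zend_string_release()`` know which function to call? The flag is recorded in the structure when it is
created. For ``zend_string``, PHP 7 keeps it as the ``IS_STR_PERSISTENT`` flag in the string's header. Here is a
standalone sketch of that idea using a hypothetical ``toy_string`` type. It ignores reference counting, and it uses
counters in place of the real ``efree()``/``free()`` choice.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical string type that remembers how it was allocated. */
typedef struct {
    int persistent; /* 0: request-bound, 1: persistent */
    size_t len;
    char val[];     /* flexible array member: the bytes plus a NUL */
} toy_string;

/* Counts releases per kind so the behaviour can be observed. */
static int released_request = 0;
static int released_persistent = 0;

toy_string *toy_string_init(const char *str, size_t len, int persistent)
{
    /* Both kinds use malloc() here; in PHP the request-bound one would use emalloc(). */
    toy_string *s = malloc(sizeof(toy_string) + len + 1);

    if (s == NULL) {
        return NULL;
    }
    s->persistent = persistent;
    s->len = len;
    memcpy(s->val, str, len);
    s->val[len] = '\0';
    return s;
}

/* The caller does not pass the flag: the structure says which path to take. */
void toy_string_release(toy_string *s)
{
    if (s->persistent) {
        released_persistent++; /* PHP: libc free() */
    } else {
        released_request++;    /* PHP: efree() */
    }
    free(s);
}
```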

Zend Memory Manager API
***********************

The API is located in
`Zend/zend_alloc.h <https://github.com/php/php-src/blob/c3b910370c5c92007c3e3579024490345cb7f9a7/Zend/zend_alloc.h>`_.

The API calls are mainly C macros rather than functions, so be prepared for that if you debug them and want to look
at how they work. Those calls mirror libc's calls, usually by adding an "e" in front of the function name (``malloc()``
becomes ``emalloc()``). You should therefore not get lost, and there are not many things to detail about the API.

Basically, what you'll use most are ``emalloc(size_t)`` and ``efree(void *)``.

You are also provided with ``ecalloc(size_t nmemb, size_t size)``, which allocates ``nmemb`` elements of individual
size ``size`` and zeroes the area. Experienced C programmers know that, whenever possible, it is better to use
``ecalloc()`` than ``emalloc()``. ``ecalloc()`` zeroes out the memory area, which can help a lot in detecting pointer
bugs. Remember that ``emalloc()`` works basically like libc's ``malloc()``: it looks for a big enough area in
different pools and returns the best fit. So you may be given a recycled area that still contains garbage.
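
Here is a small plain-C illustration of why zeroed memory helps. With ``calloc()``, the libc counterpart of
``ecalloc()``, a freshly allocated array of pointers reads as ``NULL`` entries that code can test. This holds on
mainstream platforms, where a null pointer is all-bits-zero. ``malloc()`` gives no such guarantee. The ``slot_table``
type and its functions are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical fixed-size table of optional entries. */
typedef struct {
    const char **slots; /* each slot is either NULL or points to a name */
    size_t count;
} slot_table;

int slot_table_init(slot_table *t, size_t count)
{
    /* calloc() zeroes the area: every slot starts out as a NULL pointer
     * (all-bits-zero). With malloc() the slots would hold garbage. */
    t->slots = calloc(count, sizeof(*t->slots));
    t->count = t->slots != NULL ? count : 0;
    return t->slots != NULL;
}

/* Safe to call on any slot: unset slots are reliably NULL. */
const char *slot_get(const slot_table *t, size_t i)
{
    return i < t->count ? t->slots[i] : NULL;
}

void slot_table_destroy(slot_table *t)
{
    free(t->slots);
    t->slots = NULL;
    t->count = 0;
}
```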

Then comes ``safe_emalloc(size_t nmemb, size_t size, size_t offset)``, which behaves like
``emalloc(size * nmemb + offset)`` but checks that computation against integer overflow for you. You should use this
API call if the numbers you must provide come from an untrusted source, like userland.
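
The overflow check that ``safe_emalloc()`` performs can be written in portable C by testing each step before doing it.
Here is a minimal sketch with a hypothetical ``safe_size()`` helper. It is not PHP's actual implementation, which may
rely on compiler builtins or assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes nmemb * size + offset into *out.
 * Returns 1 on success, 0 if any step would overflow size_t. */
int safe_size(size_t nmemb, size_t size, size_t offset, size_t *out)
{
    size_t product;

    /* The multiplication overflows iff nmemb > SIZE_MAX / size (size != 0). */
    if (size != 0 && nmemb > SIZE_MAX / size) {
        return 0;
    }
    product = nmemb * size;

    /* The addition overflows iff offset > SIZE_MAX - product. */
    if (offset > SIZE_MAX - product) {
        return 0;
    }
    *out = product + offset;
    return 1;
}
```

When such a check fails, the size must not be allocated at all. ``safe_emalloc()`` handles that case for you by
raising an error instead of returning.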

As for string facilities, ``estrdup(char *)`` and ``estrndup(char *, size_t len)`` duplicate NUL-terminated strings
and binary strings respectively.
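
The difference matters for binary data. ``estrdup()`` stops at the first NUL byte, while ``estrndup()`` copies exactly
``len`` bytes and appends a terminating NUL. Here is a plain-C sketch of the latter behavior, using a hypothetical
``toy_strndup()`` built on ``malloc()`` instead of ZendMM:

```c
#include <stdlib.h>
#include <string.h>

/* Copies exactly len bytes from s, even if they contain NUL bytes,
 * then adds a terminating NUL. The caller must free() the result. */
char *toy_strndup(const char *s, size_t len)
{
    char *p = malloc(len + 1);

    if (p == NULL) {
        return NULL;
    }
    memcpy(p, s, len);
    p[len] = '\0';
    return p;
}
```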

Whatever happens, pointers returned by ZendMM must be freed using ZendMM, i.e. with an ``efree()`` call, and
**not with libc's free()**.

Zend Memory Manager debugging shields
*************************************

ZendMM provides the following abilities:

* Memory consumption management.
* Memory leak tracking.
* Buffer overflow or underflow detection.

Memory consumption management
-----------------------------

ZendMM is the layer behind the PHP userland "memory_limit" feature. Every single byte allocated using the ZendMM layer
is counted and added up. When the INI *memory_limit* is reached, the engine raises a fatal error ("Allowed memory size
of N bytes exhausted"). That also means that any allocation you perform via ZendMM is reflected in the
``memory_get_usage()`` call from PHP userland.

As an extension developer, this is a good thing, because it helps keep the PHP process' heap size under control.

If a memory limit error is raised, the engine bails out from the current code position to a catch block, and
terminates smoothly. But there is no chance it goes back to the location in your code where the limit was hit. You
must be prepared for that.

That means that, in theory, ZendMM never returns a NULL pointer to you. If the allocation fails at the OS level, or if
it triggers a memory limit error, the code jumps to a catch block and never returns to your allocation call.
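
The "bail out to a catch block" mechanism is built on ``setjmp()``/``longjmp()``, which PHP wraps in its
``zend_try``/``zend_catch`` macros. The toy model below uses hypothetical names and a made-up limit to show why an
allocation that hits the limit never returns to its caller, and so the caller never sees ``NULL``.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Toy allocator with a memory limit (hypothetical names, not ZendMM's code). */
#define MAX_BLOCKS 64

static jmp_buf bailout;          /* where the "catch block" lives */
static size_t usage = 0;         /* bytes handed out, like memory_get_usage() */
static size_t limit = 0;         /* like the memory_limit INI setting */
static void *blocks[MAX_BLOCKS]; /* tracked so "request shutdown" can free them */
static size_t nblocks = 0;

void *limited_alloc(size_t size)
{
    void *p;

    if (size > limit - usage || nblocks == MAX_BLOCKS) {
        longjmp(bailout, 1);     /* limit hit: never returns to the caller */
    }
    p = malloc(size);
    if (p == NULL) {
        longjmp(bailout, 2);     /* OS refused: bail out too, never return NULL */
    }
    usage += size;
    blocks[nblocks++] = p;
    return p;
}

/* Request shutdown: everything still tracked is released. */
static void release_all(void)
{
    while (nblocks > 0) {
        free(blocks[--nblocks]);
    }
    usage = 0;
}

/* Simulates a request allocating `count` blocks of `each` bytes.
 * Returns 0 if it completed, 1 if the limit was hit, 2 if malloc() failed. */
int run_request(size_t request_limit, size_t count, size_t each)
{
    size_t i;

    release_all();
    limit = request_limit;
    switch (setjmp(bailout)) {
    case 0:
        break;                   /* normal path: the context was just saved */
    case 1:
        release_all();           /* "catch block" after a limit error */
        return 1;
    default:
        release_all();
        return 2;
    }
    for (i = 0; i < count; i++) {
        (void) limited_alloc(each);
    }
    release_all();
    return 0;
}
```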

If for any reason you need to bypass that protection, you must then use a traditional libc call, like ``malloc()``.
Unlike ZendMM, ``malloc()`` can return ``NULL``, and you must check for it. Take care, however, and know what you are
doing. You may need to allocate lots of memory that would blow the PHP *memory_limit* if you used ZendMM. You can
then use another allocator (like libc's), but take care: your extension will grow the current process heap size
without this being visible through ``memory_get_usage()`` in PHP. It can only be observed by analyzing the process
memory with OS facilities (like */proc/{pid}/maps*).

.. note:: If you need to fully disable ZendMM, you can launch PHP with the ``USE_ZEND_ALLOC=0`` environment variable.
          This way, every call to the ZendMM API (like ``emalloc()``) will be directed to a libc call, and ZendMM
          will be disabled. This is especially useful when :doc:`debugging memory <./memory_debugging>`.
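
The switch itself is a simple environment lookup made when the memory manager starts. Here is a standalone sketch of
the decision, using a hypothetical ``zendmm_enabled()`` function. Treating any value that parses as the integer 0 as
"disabled" is our assumption about the exact rule.

```c
#include <stdlib.h>

/* Hypothetical: decides whether the custom allocator should be used, given the
 * value of USE_ZEND_ALLOC (NULL when the variable is not set).
 * Assumption: a value that parses as 0 disables it; anything else keeps it on. */
int zendmm_enabled(const char *use_zend_alloc)
{
    if (use_zend_alloc == NULL) {
        return 1;                     /* not set: ZendMM stays on */
    }
    return atoi(use_zend_alloc) != 0; /* "0" => fall back to libc malloc() */
}

/* At startup, the real value would come from the environment. */
int zendmm_enabled_from_env(void)
{
    return zendmm_enabled(getenv("USE_ZEND_ALLOC"));
}
```

From a shell, disabling ZendMM for a single run looks like ``USE_ZEND_ALLOC=0 php script.php``, where *script.php* is
a placeholder.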

Memory leak tracking
--------------------

Remember the main ZendMM rules: it starts when a request starts, and it then expects you to call its API whenever you
need dynamic memory while processing a request. When the current request ends, ZendMM shuts down.

When shutting down, it browses all of its active pointers and, if you are using
:doc:`a debug build <../build_system/building_php>` of PHP, it warns you about memory leaks.

Let's be clear here: if, at the end of the current request, ZendMM finds some active memory blocks, those blocks are
leaking. There should not be any active memory block left on the ZendMM heap at the end of the request, as whoever
allocated them should have freed them.

If you forget to free blocks, they will all get displayed on *stderr*. This memory leak reporting only works under
the following conditions:

* You are using :doc:`a debug build <../build_system/building_php>` of PHP
* You have ``report_memleaks=On`` in php.ini (the default)

Here is an example of a simple leak in an extension::

    PHP_RINIT_FUNCTION(example)
    {
        void *foo = emalloc(128); /* never freed: leaks at the end of the request */

        return SUCCESS; /* RINIT handlers must report their status */
    }

When launching PHP with that extension activated, on a debug build, this generates the following on stderr::

    [Fri Jun 9 16:04:59 2017] Script: '/tmp/foobar.php'
    /path/to/extension/file.c(123) : Freeing 0x00007fffeee65000 (128 bytes), script=/tmp/foobar.php
    === Total 1 memory leaks detected ===

Those lines are generated when the Zend Memory Manager shuts down, that is, at the end of each processed request.

Beware, however:

* Obviously, ZendMM doesn't know anything about persistent allocations, or about allocations that were performed in
  any other way than through it. Hence, ZendMM can only warn you about allocations it is aware of: traditional libc
  allocations won't be reported here.
* If PHP shuts down incorrectly (what we call an unclean shutdown), ZendMM will report tons of leaks. This is because,
  on an unclean shutdown, the engine uses a ``longjmp()`` call to a catch block, which prevents all the code that
  cleans up memory from running. Thus, many leaks get reported. This happens especially after a call to PHP's
  ``exit()``/``die()``, or if a fatal error gets triggered in some critical parts of PHP.
* If you use a non-debug build of PHP, nothing is shown: ZendMM stays silent.

What you must remember is that ZendMM leak tracking is a nice bonus tool to have, but it does not replace a
:doc:`true C memory debugger <./memory_debugging>`.
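
Debug builds can point at the exact file and line because every ZendMM call site passes ``__FILE__`` and
``__LINE__`` along, which is one reason the API is made of macros. Below is a toy tracker in plain C that reproduces
the idea of reporting still-live blocks at shutdown. All names are hypothetical, and its output is only modeled on the
stderr lines shown earlier.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy leak tracker (hypothetical, not ZendMM's implementation). */
#define TRACK_MAX 128

typedef struct {
    void *ptr;
    size_t size;
    const char *file;
    int line;
} track_entry;

static track_entry live[TRACK_MAX];
static size_t live_count = 0;

void *track_alloc(size_t size, const char *file, int line)
{
    void *p;

    if (live_count == TRACK_MAX) {
        return NULL; /* toy limit on tracked blocks */
    }
    p = malloc(size);
    if (p != NULL) {
        live[live_count].ptr = p;
        live[live_count].size = size;
        live[live_count].file = file;
        live[live_count].line = line;
        live_count++;
    }
    return p;
}

/* Frees a tracked block; unknown pointers are ignored in this toy. */
void track_free(void *p)
{
    size_t i;

    for (i = 0; i < live_count; i++) {
        if (live[i].ptr == p) {
            live[i] = live[--live_count]; /* swap-remove the entry */
            free(p);
            return;
        }
    }
}

/* Macros capture the call site, like a debug build's allocation API. */
#define TRACK_ALLOC(size) track_alloc((size), __FILE__, __LINE__)
#define TRACK_FREE(ptr) track_free(ptr)

/* "Request shutdown": reports and releases every block still alive. */
size_t track_shutdown(FILE *out)
{
    size_t leaks = live_count;

    while (live_count > 0) {
        track_entry *e = &live[--live_count];
        fprintf(out, "%s(%d) : Freeing %p (%zu bytes)\n", e->file, e->line, e->ptr, e->size);
        free(e->ptr);
    }
    if (leaks > 0) {
        fprintf(out, "=== Total %zu memory leaks detected ===\n", leaks);
    }
    return leaks;
}
```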


Buffer overflows or underflows
------------------------------

Zend Memory Manager engine
**************************

ZendMM substitutes for libc's API by providing a very similar one. That API should only be used when processing
requests.

Like libc's allocator, ZendMM asks the OS for memory, arranges the memory areas, sticks header and canary blocks
around them, and gives you back the buffer you asked for.
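
To illustrate what guard ("canary") values buy you, here is a toy allocator in plain C. It surrounds each user buffer
with a header and a trailing canary value, and checks both canaries on release. The layout, the ``CANARY`` value and
all names are hypothetical: they do not describe ZendMM's actual block format.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy guarded allocator (hypothetical layout, not ZendMM's real block format):
 *   [ size (size_t) | padding | head canary ][ user data ][ tail canary ]
 * HEADER_SIZE is 16, which fits a size_t plus a 32-bit canary on common platforms. */
#define CANARY 0xDEADBEEFu
#define HEADER_SIZE 16
#define CANARY_SIZE sizeof(uint32_t)

static void put_u32(unsigned char *at, uint32_t v) { memcpy(at, &v, sizeof(v)); }
static uint32_t get_u32(const unsigned char *at) { uint32_t v; memcpy(&v, at, sizeof(v)); return v; }

void *guarded_alloc(size_t size)
{
    unsigned char *raw = malloc(HEADER_SIZE + size + CANARY_SIZE);

    if (raw == NULL) {
        return NULL;
    }
    memcpy(raw, &size, sizeof(size));                 /* remember the block size */
    put_u32(raw + HEADER_SIZE - CANARY_SIZE, CANARY); /* right before the data */
    put_u32(raw + HEADER_SIZE + size, CANARY);        /* right after the data */
    return raw + HEADER_SIZE;
}

/* Frees the block; returns 1 if both canaries were intact, 0 otherwise. */
int guarded_free(void *ptr)
{
    unsigned char *raw = (unsigned char *) ptr - HEADER_SIZE;
    size_t size;
    int ok;

    if (get_u32(raw + HEADER_SIZE - CANARY_SIZE) != CANARY) {
        free(raw);      /* underflow: the header may be corrupt, don't trust size */
        return 0;
    }
    memcpy(&size, raw, sizeof(size));
    ok = get_u32(raw + HEADER_SIZE + size) == CANARY; /* overflow check */
    free(raw);
    return ok;
}
```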
