Description
Hello, I'm trying to understand whether the following code behaves as expected with json_decode and large JSON strings.
When I yield the decoded result of a randomly generated JSON string (which is not kept outside the function scope), the memory does not seem to be reclaimed by the garbage collector or returned to the OS; it just stays allocated to the PHP process.
- You can see this in the first log output: once the first large payload has been yielded, memory stays above 350 MB for the rest of the run.
However, if I turn on gc_mem_caches() (via the env variable USE_GC_MEM_CACHES=true), you can see in the second log output that the memory does get reclaimed.
- The memory goes from 8 MB up to 398 MB, down to roughly 222 MB and then 182 MB, and back to 8 MB, and so on.
If I replace everything with plain strings, then on each yield/iteration the string gets cleaned up correctly by PHP (from what I can tell) without any need for gc_mem_caches() or gc_collect_cycles().
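A minimal sketch of that comparison without the generator, in case it helps to reproduce it. It assumes the default Zend memory manager and a memory_limit of at least 128 MB; the helper name realMb() and the sizes are made up for illustration. As far as I understand, a single multi-megabyte string is one large allocation that the allocator can hand straight back on free, while many small values live in shared chunks that the allocator may keep cached.

```php
<?php
// Hypothetical helper: allocator-level ("real") usage in whole megabytes.
function realMb(): int
{
    return intdiv(memory_get_usage(true), 1024 * 1024);
}

$base = realMb();

// Case 1: one large plain string (a single big allocation).
$big = str_repeat('x', 50 * 1024 * 1024);
$realWithBig = realMb();
unset($big);
$realAfterBig = realMb();

// Case 2: many small strings held in an array, as json_decode() produces.
$json = json_encode(array_fill(0, 200000, str_repeat('x', 100)));
$decoded = json_decode($json, true);
unset($json);
$realWithDecoded = realMb();
unset($decoded);
$realAfterDecoded = realMb();

printf("base=%d MB\n", $base);
printf("big string: held=%d MB, after unset=%d MB\n", $realWithBig, $realAfterBig);
printf("decoded array: held=%d MB, after unset=%d MB\n", $realWithDecoded, $realAfterDecoded);
```

Comparing the two "after unset" figures shows whether the difference comes from the shape of the data (one big block versus many small ones) rather than from json_decode itself.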
My guess (I'm unsure whether it's correct) is that a zend_string, which seems to carry a reference counter, actually gets freed by PHP, whereas json_decode uses _zval_struct underneath, which I think is not handled by the normal PHP garbage collection.
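One way to test this guess is to separate the three mechanisms involved: refcount-based freeing on unset(), the cycle collector (gc_collect_cycles()), and the allocator cache (gc_mem_caches()). The sketch below is only an illustration with a made-up report() helper. It assumes memory_get_usage(false) reflects memory held by live values and memory_get_usage(true) reflects what the allocator has taken from the system, as the PHP manual describes.

```php
<?php
// Hypothetical helper: print and return both views of memory usage.
function report(string $label): array
{
    $used = memory_get_usage(false); // memory in use by PHP values
    $real = memory_get_usage(true);  // memory the allocator obtained from the system
    printf("%-24s used=%s real=%s\n", $label, number_format($used), number_format($real));
    return ['used' => $used, 'real' => $real];
}

// Build and decode a payload of many small strings.
$json = json_encode(array_fill(0, 200000, str_repeat('x', 100)));
$decoded = json_decode($json, true);
unset($json);
$held = report('holding decoded array');

// Drop the last reference; no explicit GC call yet.
unset($decoded);
$afterUnset = report('after unset');

// Cycle collector: only relevant if the data contained reference cycles.
$cycles = gc_collect_cycles();
printf("gc_collect_cycles() collected %d cycles\n", $cycles);

// Ask the Zend memory manager to release its cached memory.
$reclaimed = gc_mem_caches();
$afterCaches = report('after gc_mem_caches()');
printf("gc_mem_caches() freed %s bytes\n", number_format($reclaimed));
```

If `used` drops right after unset() while `real` only drops after gc_mem_caches(), that would suggest the decoded array is freed by ordinary reference counting and the remaining memory is allocator cache rather than uncollected values.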
The following code:
<?php
ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);

echo "Starting " . format(memory_get_usage(true)) . ' bytes' . PHP_EOL;

function format($number, $decimal_places = 2)
{
    $formatted = number_format($number, $decimal_places, '.', ',');
    if (substr($formatted, ($decimal_places * -1) - 1) == '.' . str_repeat('0', $decimal_places))
    {
        return substr($formatted, 0, strpos($formatted, '.'));
    }
    return $formatted;
}

function generateLargeJsonBySizeWithRandomData($megabytes, $stringLength = 100) {
    $targetSize = $megabytes * 1024 * 1024;
    $data = [];
    $currentSize = 0;
    $index = 0;
    while ($currentSize < $targetSize) {
        $randomString = substr(str_shuffle(str_repeat("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", ceil($stringLength / 62))), 1, $stringLength);
        $itemKey = "item_$index";
        $data[$itemKey] = $randomString;
        $currentSize += strlen($randomString);
        $index++;
    }
    $jsonString = json_encode($data);
    if ($jsonString === false) {
        return "JSON encoding error: " . json_last_error_msg();
    }
    $decodedArray = json_decode($jsonString, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return "JSON decoding error: " . json_last_error_msg();
    }
    return $decodedArray;
}

function yieldLargeJsonFn()
{
    return generateLargeJsonBySizeWithRandomData(100);
}

function yieldSmallJsonFn()
{
    return generateLargeJsonBySizeWithRandomData(1);
}

function generatorLoop(bool $use_gc_mem_caches = false)
{
    $data = [
        ["small", 'yieldSmallJsonFn'],
        ["large", 'yieldLargeJsonFn'],
        ["small", 'yieldSmallJsonFn'],
        ["large", 'yieldLargeJsonFn'],
        ["small", 'yieldSmallJsonFn'],
        ["large", 'yieldLargeJsonFn']
    ];
    foreach ($data as $item)
    {
        echo "Start " . $item[0] . " yield " . format(memory_get_usage(true)) . ' bytes' . PHP_EOL;
        echo "";
        yield $item[1]();
        echo "End " . $item[0] . " yield " . format(memory_get_usage(true)) . ' bytes' . PHP_EOL;
        echo "";
        if ($use_gc_mem_caches) {
            gc_mem_caches();
            echo "End " . $item[0] . " yield after caches clear " . format(memory_get_usage(true)) . ' bytes' . PHP_EOL;
        }
        echo "";
        sleep(5);
    }
}

$use_gc_mem_caches = $_ENV['USE_GC_MEM_CACHES'] === "true";
echo "Using gc_mem_caches() = $use_gc_mem_caches" . PHP_EOL;

foreach (generatorLoop($use_gc_mem_caches) as $item) {
    echo "iterated" . PHP_EOL;
}
echo "Ending " . format(memory_get_usage(true)) . ' bytes' . PHP_EOL;

Resulted in this output without using gc_mem_caches():
# USE_GC_MEM_CACHES=false php -d "memory_limit=-1" test.php
Starting 2,097,152 bytes
Using gc_mem_caches() =
Start small yield 2,097,152 bytes
iterated
End small yield 8,388,608 bytes
Start large yield 8,388,608 bytes
iterated
End large yield 398,458,880 bytes
Start small yield 398,458,880 bytes
iterated
End small yield 358,612,992 bytes
Start large yield 358,612,992 bytes
iterated
End large yield 398,458,880 bytes
Start small yield 398,458,880 bytes
iterated
End small yield 358,612,992 bytes
Start large yield 358,612,992 bytes
iterated
End large yield 398,458,880 bytes
Ending 398,458,880 bytes
Resulted in this output when using gc_mem_caches():
# USE_GC_MEM_CACHES=true php -d "memory_limit=-1" test.php
Starting 2,097,152 bytes
Using gc_mem_caches() = 1
Start small yield 2,097,152 bytes
iterated
End small yield 8,388,608 bytes
End small yield after caches clear 8,388,608 bytes
Start large yield 8,388,608 bytes
iterated
End large yield 398,458,880 bytes
End large yield after caches clear 224,395,264 bytes
Start small yield 224,395,264 bytes
iterated
End small yield 182,452,224 bytes
End small yield after caches clear 8,388,608 bytes
Start large yield 8,388,608 bytes
iterated
End large yield 398,458,880 bytes
End large yield after caches clear 222,298,112 bytes
Start small yield 222,298,112 bytes
iterated
End small yield 184,549,376 bytes
End small yield after caches clear 8,388,608 bytes
Start large yield 8,388,608 bytes
iterated
End large yield 398,458,880 bytes
End large yield after caches clear 222,298,112 bytes
Ending 222,298,112 bytes
PHP Version
PHP 8.1.19
Operating System
No response