json_decode() does not release memory as expected #18019

@shayaantx

Description

Hello, I'm trying to understand whether the following code behaves as expected when json_decode() is used on large JSON strings.

When I yield data decoded from a randomly generated JSON string (nothing is persisted outside of the function scope), the memory does not seem to be reclaimed by the garbage collector or returned to the OS; it just stays allocated to the PHP process.

  • You can see this in the first output log: once the first large payload has been yielded, memory stays above 350 MB for the rest of the run.

However, if I call gc_mem_caches() after each yield (enabled via the environment variable USE_GC_MEM_CACHES=true), you can see in the second output log that the memory does get reclaimed.

  • The memory goes from 8 MB to 398 MB, drops to roughly 222-224 MB after the cache clear, then to about 182 MB, and back to 8 MB, and so on.
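The two readings involved here can be separated in a smaller script. The sketch below is not part of the original reproduction: buildDecodedArray() is a hypothetical helper, and the payload is deliberately much smaller than in the report so that it fits under a default memory_limit. It prints memory_get_usage(false) (memory currently in use by PHP values) next to memory_get_usage(true) (memory obtained from the system), and the number of bytes that gc_mem_caches() reports as freed.

```php
<?php
// Hypothetical helper (not from the report): builds a decoded array of many
// small strings, similar in shape to the report's payload but much smaller.
function buildDecodedArray(int $items = 50000): array
{
    $data = [];
    for ($i = 0; $i < $items; $i++) {
        $data["item_$i"] = str_repeat('x', 100);
    }
    return json_decode(json_encode($data), true);
}

$decoded = buildDecodedArray();
$usedHeld = memory_get_usage(false); // bytes in use by live PHP values
$realHeld = memory_get_usage(true);  // bytes obtained from the system

unset($decoded);                     // refcount drops to zero, array is destroyed
$usedFreed = memory_get_usage(false);
$realFreed = memory_get_usage(true);

$reclaimed = gc_mem_caches();        // bytes released by the memory manager
$realAfterGc = memory_get_usage(true);

printf("held:      used=%s real=%s\n", number_format($usedHeld), number_format($realHeld));
printf("unset:     used=%s real=%s\n", number_format($usedFreed), number_format($realFreed));
printf("after gc:  real=%s (gc_mem_caches() returned %s)\n",
    number_format($realAfterGc), number_format($reclaimed));
```

If the "used" figure drops right after unset() while the "real" figure only drops after gc_mem_caches(), that would suggest the values themselves are being freed and the remaining memory is held by the memory manager's caches. The script in this report only prints memory_get_usage(true), so it cannot tell these two cases apart.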

If I replace everything with plain strings, the string from each yield/iteration gets cleaned up correctly by PHP (from what I can tell) without any need for gc_mem_caches() or gc_collect_cycles().

My guess (I'm unsure whether it's correct) is that a zend_string, which seems to carry a reference counter, actually gets freed by PHP, whereas json_decode() uses _zval_struct underneath, which I think is not handled by the normal PHP garbage collection.
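One way to probe this guess is to compare the two payload shapes directly: a single large string versus a decoded array of many small strings. The sketch below is an illustration only. measure() and both builder closures are hypothetical names that do not appear in the original script, and the sizes are kept small so it runs under a default memory_limit. It makes no claim about which shape retains more memory; it only reports what the allocator says after each value is dropped.

```php
<?php
// Hypothetical helper: build a value, drop it, and report memory readings.
function measure(callable $build): array
{
    gc_mem_caches();                      // start each run from a trimmed heap
    $value = $build();
    $held = memory_get_usage(false);      // in-use bytes while the value is alive
    unset($value);                        // last reference goes away here
    return [
        'used_held'  => $held,
        'used_freed' => memory_get_usage(false),
        'real_freed' => memory_get_usage(true),
    ];
}

// One contiguous 10 MB string.
$oneBigString = fn() => str_repeat('x', 10 * 1024 * 1024);

// Roughly 10 MB spread over 100,000 small strings, round-tripped through JSON.
$decodedArray = function () {
    $data = [];
    for ($i = 0; $i < 100000; $i++) {
        $data["item_$i"] = str_repeat('x', 100);
    }
    return json_decode(json_encode($data), true);
};

foreach (['one big string' => $oneBigString, 'decoded array' => $decodedArray] as $label => $build) {
    $m = measure($build);
    printf("%-15s used while held=%s, used after unset=%s, real after unset=%s\n",
        $label,
        number_format($m['used_held']),
        number_format($m['used_freed']),
        number_format($m['real_freed']));
}
```

If "used after unset" falls back to a similar baseline in both cases, both kinds of value are being released by reference counting, and any difference in "real after unset" would point at how the memory manager holds on to memory for many small allocations rather than at json_decode() itself.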

The following code:

<?php

ini_set('display_errors', '1');
ini_set('display_startup_errors', '1');
error_reporting(E_ALL);

echo "Starting " . format(memory_get_usage(true)) . ' bytes' . PHP_EOL;

function format($number, $decimal_places = 2)
{
    $formatted = number_format($number, $decimal_places, '.', ',');
    if (substr($formatted, ($decimal_places * -1) - 1) == '.' . str_repeat('0', $decimal_places))
    {
        return substr($formatted, 0, strpos($formatted, '.'));
    }
    return $formatted;
}

function generateLargeJsonBySizeWithRandomData($megabytes, $stringLength = 100) {
    $targetSize = $megabytes * 1024 * 1024;
    $data = [];
    $currentSize = 0;
    $index = 0;

    while ($currentSize < $targetSize) {
        $randomString = substr(str_shuffle(str_repeat("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", ceil($stringLength / 62))), 1, $stringLength);
        $itemKey = "item_$index";
        $data[$itemKey] = $randomString;
        $currentSize += strlen($randomString);
        $index++;
    }
    $jsonString = json_encode($data);
    if ($jsonString === false) {
        return "JSON encoding error: " . json_last_error_msg();
    }
    $decodedArray = json_decode($jsonString, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return "JSON decoding error: " . json_last_error_msg();
    }
    return $decodedArray;
}

function yieldLargeJsonFn()
{
    return generateLargeJsonBySizeWithRandomData(100);
}

function yieldSmallJsonFn()
{
    return generateLargeJsonBySizeWithRandomData(1);
}

function generatorLoop(bool $use_gc_mem_caches = false)
{
    $data = [
        ["small", 'yieldSmallJsonFn'],
        ["large", 'yieldLargeJsonFn'],
        ["small", 'yieldSmallJsonFn'],
        ["large", 'yieldLargeJsonFn'],
        ["small", 'yieldSmallJsonFn'],
        ["large", 'yieldLargeJsonFn']
    ];
    foreach ($data as $item)
    {
        echo "Start " . $item[0] . " yield " . format(memory_get_usage(true)) . ' bytes'  . PHP_EOL;
        echo "";
        yield $item[1]();
        echo "End " . $item[0] . " yield " . format(memory_get_usage(true)) . ' bytes'  . PHP_EOL;
        echo "";
        if ($use_gc_mem_caches) {
            gc_mem_caches();
            echo "End " . $item[0] . " yield after caches clear " . format(memory_get_usage(true)) . ' bytes' . PHP_EOL;
        }
        echo "";
        sleep(5);
    }
}

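// getenv() works regardless of variables_order and returns false (no warning) when the variable is unset.
$use_gc_mem_caches = getenv('USE_GC_MEM_CACHES') === "true";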
echo "Using gc_mem_caches() = $use_gc_mem_caches" . PHP_EOL;

foreach (generatorLoop($use_gc_mem_caches) as $item) {
    echo "iterated" . PHP_EOL;
}


echo "Ending " . format(memory_get_usage(true)) . ' bytes' . PHP_EOL;

Resulted in this output without using gc_mem_caches():

# USE_GC_MEM_CACHES=false php -d "memory_limit=-1" test.php 
Starting 2,097,152 bytes
Using gc_mem_caches() = 
Start small yield 2,097,152 bytes
iterated
End small yield 8,388,608 bytes
Start large yield 8,388,608 bytes
iterated
End large yield 398,458,880 bytes
Start small yield 398,458,880 bytes
iterated
End small yield 358,612,992 bytes
Start large yield 358,612,992 bytes
iterated
End large yield 398,458,880 bytes
Start small yield 398,458,880 bytes
iterated
End small yield 358,612,992 bytes
Start large yield 358,612,992 bytes
iterated
End large yield 398,458,880 bytes
Ending 398,458,880 bytes

Resulted in this output when using gc_mem_caches():

# USE_GC_MEM_CACHES=true php -d "memory_limit=-1" test.php 
Starting 2,097,152 bytes
Using gc_mem_caches() = 1
Start small yield 2,097,152 bytes
iterated
End small yield 8,388,608 bytes
End small yield after caches clear 8,388,608 bytes
Start large yield 8,388,608 bytes
iterated
End large yield 398,458,880 bytes
End large yield after caches clear 224,395,264 bytes
Start small yield 224,395,264 bytes
iterated
End small yield 182,452,224 bytes
End small yield after caches clear 8,388,608 bytes
Start large yield 8,388,608 bytes
iterated
End large yield 398,458,880 bytes
End large yield after caches clear 222,298,112 bytes
Start small yield 222,298,112 bytes
iterated
End small yield 184,549,376 bytes
End small yield after caches clear 8,388,608 bytes
Start large yield 8,388,608 bytes
iterated
End large yield 398,458,880 bytes
End large yield after caches clear 222,298,112 bytes
Ending 222,298,112 bytes

PHP Version

PHP 8.1.19

Operating System

No response
