Memory Issue PdfDataExtractor #30

@solverat

Q A
Bug report? yes
Feature request? no
BC Break report? no
RFC? no

If you're dealing with hundreds or thousands of asset documents (PDF), the asset method getLocalFile() will fail (PHP Warning: exec() unable to fork) because a lot of resource streams will be processed.

$cmd = sprintf('%s "%s" "%s"', $verboseCommand, $data->getLocalFile(), $tmpFile);
exec($pdfToTextBin . ' ' . $cmd);

Building the full local path directly, as in the example below, could be a solution, but it won't work if the assets are stored on a remote asset storage server, for example.

$tmpFile = sprintf('%s%s%s.text', $assetTmpDir, DIRECTORY_SEPARATOR, uniqid('t2p-', false));
$verboseCommand = !\Pimcore::inDebugMode() ? '-q' : '';
$cmd = sprintf('%s "%s/public/var/assets%s" "%s"', $verboseCommand, $this->projectDir, $data->getRealFullPath(), $tmpFile);

// exec() does not throw on failure; it emits a warning and returns false.
if (exec($pdfToTextBin . ' ' . $cmd) === false) {
    return null;
}
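
A storage-independent variant might copy the asset stream to a temporary file, close it right away, and clean up after each conversion, so that no handles pile up across thousands of documents. The following is only a rough sketch under assumptions: `extractTextFromStream()` and its parameters are hypothetical names and not part of Pimcore or this bundle, and I have not confirmed that this actually avoids the fork warning.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Pimcore or this bundle).
 * Copies an already opened, readable stream to a temp file, runs a
 * converter as "<bin> [flags] <input> <output>" and returns the output
 * file's content, or null on any failure.
 *
 * @param resource $stream
 * @param string[] $flags
 */
function extractTextFromStream($stream, string $converterBin, string $tmpDir, array $flags = []): ?string
{
    $inFile = tempnam($tmpDir, 't2p-in-');
    $outFile = tempnam($tmpDir, 't2p-out-');
    if ($inFile === false || $outFile === false) {
        return null;
    }

    try {
        $target = fopen($inFile, 'wb');
        if ($target === false) {
            return null;
        }
        $copied = stream_copy_to_stream($stream, $target);
        // Close the local handle immediately instead of keeping it around.
        fclose($target);
        if ($copied === false) {
            return null;
        }

        $cmd = sprintf(
            '%s %s %s %s',
            escapeshellarg($converterBin),
            implode(' ', array_map('escapeshellarg', $flags)),
            escapeshellarg($inFile),
            escapeshellarg($outFile)
        );

        // exec() signals failure via its return value and exit code, not exceptions.
        $result = exec($cmd, $output, $exitCode);
        if ($result === false || $exitCode !== 0) {
            return null;
        }

        $text = file_get_contents($outFile);

        return $text === false ? null : $text;
    } finally {
        // Always remove both temp files, even on early return.
        @unlink($inFile);
        @unlink($outFile);
    }
}
```

In the extractor this could be called with the asset's stream (assuming the asset exposes one for the configured storage), `$pdfToTextBin` as the converter, and `['-q']` as flags outside debug mode. The caller would still need to `fclose()` the source stream after each document.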
