xls-dump.py consumes a lot of memory on some files #8

@xeyownt

Hello,

I'm using xls-dump.py through the indexer "recoll".
It turns out that the indexer was running out of memory and eventually freezing the machine because it was choking on a specific file named fat-loop.xls. This file is found in the MediaWiki source tree (at least versions 1.33.4, 1.34.4, 1.35.0 and 1.35.1).

To reproduce (adapt path as necessary):

python3 xls-dump.py --dump-mode=canonical-xml --utf-8 --catch /home/data/www/html/mw1.35.1/tests/phpunit/data/MSCompoundFileReader/fat-loop.xls
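
Since the command above can freeze the machine, it may be safer to reproduce it under a memory cap. The following is a minimal sketch, assuming Linux and a single-threaded caller; `run_capped` and the 1 GiB limit are hypothetical choices for illustration and are not part of xls-dump.py or recoll.

```python
import resource
import subprocess
import sys


def run_capped(cmd, max_bytes, timeout=None):
    """Run cmd with its virtual address space capped at max_bytes (Unix only)."""

    def limit_memory():
        # Runs in the child process just before exec; once the cap is in
        # place, allocations beyond it fail instead of exhausting the machine.
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))

    return subprocess.run(
        cmd,
        preexec_fn=limit_memory,
        timeout=timeout,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 run_capped.py python3 xls-dump.py ... fat-loop.xls
    result = run_capped(sys.argv[1:], max_bytes=1024 ** 3)
    print("exit status:", result.returncode)
```

With the cap in place, the dump should stop with an error once it hits the limit rather than taking the whole machine down. The exact exit status depends on how xls-dump.py handles the failed allocation.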

I tried with xls-dump.py from commit db25622 and could confirm the issue is still present.
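
Judging only by the file name and its location under MSCompoundFileReader test data (this has not been checked against the xls-dump.py code), the file probably contains a sector chain in the compound file's FAT that loops back on itself. A reader that follows "next sector" entries until it sees the end-of-chain marker would then never stop and would keep accumulating data. A minimal sketch of a guarded chain walk, where `walk_chain` and `CorruptChainError` are hypothetical names and not functions from xls-dump.py:

```python
ENDOFCHAIN = 0xFFFFFFFE  # end-of-chain marker defined by the MS-CFB format


class CorruptChainError(Exception):
    """Raised when a sector chain loops or points outside the FAT."""


def walk_chain(fat, start):
    """Return the sector indices of the chain beginning at start.

    fat is a list in which fat[i] is the sector that follows sector i.
    """
    chain = []
    seen = set()
    sector = start
    while sector != ENDOFCHAIN:
        if not 0 <= sector < len(fat):
            raise CorruptChainError(f"sector {sector} is outside the FAT")
        if sector in seen:
            # Revisiting a sector means the chain can never reach ENDOFCHAIN.
            raise CorruptChainError(f"sector {sector} appears twice in the chain")
        seen.add(sector)
        chain.append(sector)
        sector = fat[sector]
    return chain
```

Tracking visited sectors bounds the walk by the number of FAT entries, so a corrupt file raises an error instead of growing without limit.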
