memory is multiples of the size of each additional file #449

@jtmoon79

According to my tests, the Maximum Resident Set Size (Max RSS) of the s4 process grows considerably with each additional file. For an ad-hoc text file of size 2.1 MB, each additional file adds about 4.5 MB of Max RSS, an average multiple of about ×2.2. In other words, for each additional 2.1 MB file processed, s4 uses an additional 4.5 MB of memory.
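One way to measure the Max RSS of a single run is sketched below in Python, assuming a Unix system. This is not necessarily how the numbers above were collected. It spawns the command and reads its resource usage with `os.wait4()`, which reports usage for that one child only. The function name `max_rss_kib` is hypothetical.

```python
import os
import sys


def max_rss_kib(argv):
    """Run argv as a child process and return its peak resident set size in KiB.

    os.wait4() returns the resource usage of this specific child, so earlier
    runs do not affect the result. Unix-only: os.posix_spawnp and os.wait4
    are not available on Windows.
    """
    # Send the child's stdout to /dev/null so large outputs are discarded.
    devnull_stdout = [(os.POSIX_SPAWN_OPEN, 1, os.devnull, os.O_WRONLY, 0)]
    pid = os.posix_spawnp(argv[0], argv, os.environ, file_actions=devnull_stdout)
    _, status, usage = os.wait4(pid, 0)
    exit_code = os.waitstatus_to_exitcode(status)
    if exit_code != 0:
        raise RuntimeError(f"{argv[0]} exited with status {exit_code}")
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return usage.ru_maxrss // 1024
    return usage.ru_maxrss
```

Calling it as `max_rss_kib(["s4", path_1, path_2])` for 1, 2, 3, … copies of the same log file would give one Max RSS value per file count. This assumes `s4` is on `PATH` and accepts file paths as positional arguments; `path_1` and `path_2` are placeholders.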
[Graph: Max RSS per file]
The multiple is highest right after the first file: the second 2.1 MB file adds 13 MB of Max RSS (a ×6.2 multiple). As the number of 2.1 MB files nears 50, the multiple levels off to about ×2.2.
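The multiples above are the increase in Max RSS divided by the size of the added file. A minimal Python sketch of that arithmetic follows. It assumes the measurements are a list of Max RSS values in MB, where index `i` holds the value for `i + 1` files. The function names are hypothetical, and `average_multiple` is only one plausible definition of the average.

```python
def incremental_multiples(max_rss_mb, file_size_mb):
    """Return, for each additional file, the Max RSS increase it caused as a
    multiple of that file's size.

    max_rss_mb[i] is the Max RSS (MB) measured when processing i + 1 files.
    """
    return [
        (after - before) / file_size_mb
        for before, after in zip(max_rss_mb, max_rss_mb[1:])
    ]


def average_multiple(max_rss_mb, file_size_mb):
    """Average multiple over every file after the first: total Max RSS growth
    divided by the total size of the added files."""
    added_files = len(max_rss_mb) - 1
    return (max_rss_mb[-1] - max_rss_mb[0]) / (added_files * file_size_mb)
```

For example, an increase of 13 MB for the second 2.1 MB file gives 13 / 2.1 ≈ 6.2, the ×6.2 reported above.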

This is a high multiple and a disappointing behavior for s4.

Some possible causes:

  • Blocks are not being dropped correctly by the BlockReader
  • ???
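To make the first suspected cause concrete, here is a toy Python model. It is not s4's actual BlockReader; the block size and function are hypothetical. It shows why retained blocks make peak memory scale with file size, while dropping each block after processing keeps the peak bounded at one block.

```python
BLOCK_SIZE = 64 * 1024  # hypothetical block size, for illustration only


def peak_retained_bytes(file_size, drop_processed_blocks):
    """Simulate a reader that loads a file block by block and return the peak
    number of bytes it holds at once.

    If drop_processed_blocks is False, every block stays alive until the end,
    so the peak equals the file size. If True, each block is released once
    processed, so the peak is at most one block.
    """
    live_blocks = {}
    live_bytes = 0
    peak = 0
    offset = 0
    block_id = 0
    while offset < file_size:
        length = min(BLOCK_SIZE, file_size - offset)
        live_blocks[block_id] = length
        live_bytes += length
        peak = max(peak, live_bytes)
        if drop_processed_blocks:
            live_bytes -= live_blocks.pop(block_id)
        offset += length
        block_id += 1
    return peak
```

In this model, retained blocks alone add about ×1 the file size to peak memory.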

This needs further investigation. So far, it has only been investigated for a large ad-hoc text log file. I don't know how other file types behave.

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), code improvement (enhancement not seen by the user), difficult (A difficult problem; a major coding effort or difficult algorithm to perfect)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests