Optimise to make fewer S3 API calls #543

@yatharthranjan

To make fewer S3 API calls, the scanning could be optimised by:

  1. Not rescanning the full directory hierarchy on every run to find new topics, but limiting that scan to once per 15 minutes (configurable).
  2. Storing the actual file directories (partition=0, etc.) to check for updates, instead of the directory one level above them. Right now only the topic directory is stored.
  3. Keeping the last scanned object key in memory, then listing only newly added files using the ListObjectsV2 start-after parameter instead of listing all files. Note that a full scan is still needed from time to time, so that files are not skipped if they were added later (for example, with a key that sorts before the stored one) or were not yet deemed complete by the cleaner.
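
A minimal sketch of point 3, combined with the periodic full scan, might look like the following. It is written in Python for brevity and is not this project's code: `IncrementalLister`, `poll`, `full_scan_interval` and `FakeS3` are hypothetical names. The client is assumed to expose a boto3-style `list_objects_v2(Bucket=..., Prefix=..., StartAfter=..., ContinuationToken=...)` call, and the sketch relies on keys being returned in lexicographic order, so the last key of a listing can serve as the next start-after value.

```python
import time
from typing import Callable, Dict, Iterator, List, Optional


class IncrementalLister:
    """Hypothetical helper: lists only keys after the last key seen per prefix,
    falling back to a full listing once per `full_scan_interval` seconds."""

    def __init__(self, client, bucket: str, full_scan_interval: float = 900.0,
                 clock: Callable[[], float] = time.monotonic):
        self.client = client          # assumed boto3-style S3 client
        self.bucket = bucket
        self.full_scan_interval = full_scan_interval
        self.clock = clock
        self.last_key: Dict[str, str] = {}         # prefix -> last key seen
        self.last_full_scan: Dict[str, float] = {}  # prefix -> time of last full scan

    def _list(self, prefix: str, start_after: Optional[str]) -> Iterator[str]:
        kwargs = {"Bucket": self.bucket, "Prefix": prefix}
        if start_after:
            kwargs["StartAfter"] = start_after
        while True:  # follow pagination until the listing is complete
            page = self.client.list_objects_v2(**kwargs)
            for obj in page.get("Contents", []):
                yield obj["Key"]
            if not page.get("IsTruncated"):
                return
            kwargs["ContinuationToken"] = page["NextContinuationToken"]

    def poll(self, prefix: str) -> List[str]:
        """Return new keys, or all keys when a full scan is due.
        After a full scan the caller must skip keys it already processed."""
        now = self.clock()
        last_full = self.last_full_scan.get(prefix)
        full = last_full is None or now - last_full >= self.full_scan_interval
        start_after = None if full else self.last_key.get(prefix)
        keys = list(self._list(prefix, start_after))
        if full:
            self.last_full_scan[prefix] = now
        if keys:
            self.last_key[prefix] = keys[-1]  # listings are sorted by key
        return keys


class FakeS3:
    """In-memory stand-in for an S3 client, only for trying out the sketch."""

    def __init__(self, keys, page_size: int = 2):
        self.keys = sorted(keys)
        self.page_size = page_size

    def list_objects_v2(self, Bucket, Prefix, StartAfter="", ContinuationToken=None):
        marker = ContinuationToken if ContinuationToken is not None else StartAfter
        matching = [k for k in sorted(self.keys) if k.startswith(Prefix) and k > marker]
        page = matching[:self.page_size]
        result = {"Contents": [{"Key": k} for k in page],
                  "IsTruncated": len(matching) > len(page)}
        if result["IsTruncated"]:
            result["NextContinuationToken"] = page[-1]
        return result
```

Calling `poll` once per stored partition directory (point 2) keeps each incremental listing small. The same interval check could also gate the topic discovery scan from point 1.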
