
parallelizing file includes #9

@lun-4

Description


we can quickly get the list of files in a folder, but at the moment we process the entire list sequentially. that's inefficient: we're always blocked on I/O, so we can't even pin a single core to 100% usage.

  • is it possible to bump up the chunk size (currently 1 KiB) and get speedier includes while keeping the sequential flow?
    • NOTE: make sure to wipe filesystem caches between test runs: echo 3 | sudo tee /proc/sys/vm/drop_caches
  • move away from the sequential flow to a job-queue flow
    • spawn worker threads that take files and hash them
      • can the hashing work be split even further, given that BLAKE3 itself can be parallelized?
    • as files get hashed, submit them for tag processing
      • tag inferrers can declare whether they're thread-safe; if one isn't, a single thread works through the tag processing queue sequentially
    • hopefully this gives us high-octane, speedy includes
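a minimal sketch of the first idea (bigger read chunks, still sequential). `hashlib.blake2b` stands in for BLAKE3 here since the blake3 module is third-party; the chunking logic would be identical, and the 256 KiB default is just an example value to benchmark against the current 1 KiB:

```python
import hashlib

def hash_file(path: str, chunk_size: int = 256 * 1024) -> str:
    # Larger chunks mean fewer read() syscalls per file, so less time
    # spent blocked on I/O round-trips while staying sequential.
    # blake2b is a stdlib stand-in for BLAKE3 (illustrative only).
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

benchmarking this at a few chunk sizes (with caches dropped between runs, as noted above) would show where the returns flatten out.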
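the job-queue flow above could be sketched with stdlib pieces like this. assumptions: `hashlib.blake2b` again stands in for BLAKE3, and names like `hash_worker`/`include_files` are illustrative, not from the codebase. tag processing stays on one consumer thread, so non-thread-safe tag inferrers never run concurrently:

```python
import hashlib
import queue
import threading

def hash_worker(jobs, done):
    # Pull file paths from `jobs`, hash them, push (path, digest) to `done`.
    while True:
        path = jobs.get()
        if path is None:  # shutdown sentinel
            break
        h = hashlib.blake2b()  # stdlib stand-in for BLAKE3
        with open(path, "rb") as f:
            while chunk := f.read(256 * 1024):
                h.update(chunk)
        done.put((path, h.hexdigest()))

def include_files(paths, n_workers=4):
    jobs, done = queue.Queue(), queue.Queue()
    hashed = {}

    def tag_worker():
        # Single sequential consumer: a real implementation would run
        # the tag inferrers here as hashed files arrive.
        while (item := done.get()) is not None:
            path, digest = item
            hashed[path] = digest

    workers = [threading.Thread(target=hash_worker, args=(jobs, done))
               for _ in range(n_workers)]
    tagger = threading.Thread(target=tag_worker)
    for t in workers + [tagger]:
        t.start()
    for p in paths:
        jobs.put(p)
    for _ in workers:
        jobs.put(None)   # one shutdown sentinel per hash worker
    for w in workers:
        w.join()
    done.put(None)       # all hashes submitted; stop the tag thread
    tagger.join()
    return hashed
```

hashlib releases the GIL during large updates, so multiple hash workers can genuinely overlap I/O and hashing even in threaded Python; BLAKE3's internal tree parallelism would be a further layer on top of this.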
