
improvements to syscalls and large directory handling #7

@lgarrison


Right now we're stat-ing every directory entry unconditionally (here) and calling getxattr on every dir (here).

There are a few improvements we could make. First, we're actually calling lgetxattr twice to get the rentries: the usual protocol is to call it once with a zero-size buffer to get the attr len, allocate a buffer of that size, and then call it again to get the value. However, we could probably just pre-allocate a large buffer (e.g. 4 KB) and skip the first call. If the buffer is too small, we'll get ERANGE and can return None.
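A minimal sketch of the single-call idea, assuming Linux and that the rentries value is the `ceph.dir.rentries` xattr stored as a decimal ASCII string. The helper names `read_xattr_once` and `read_rentries` are hypothetical and are not taken from the existing code. It declares `lgetxattr` directly to avoid an extra crate; the `unsafe extern` syntax needs a reasonably recent Rust toolchain (1.82 or later).

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_void};
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

unsafe extern "C" {
    // Linux prototype:
    // ssize_t lgetxattr(const char *path, const char *name, void *value, size_t size);
    fn lgetxattr(
        path: *const c_char,
        name: *const c_char,
        value: *mut c_void,
        size: usize,
    ) -> isize;
}

/// Hypothetical helper: read an xattr with one syscall into a fixed 4 KB buffer.
/// Returns None on any error, including ERANGE when the value does not fit.
fn read_xattr_once(path: &Path, name: &str) -> Option<Vec<u8>> {
    const BUF_LEN: usize = 4096; // the "large buffer" from the text; the size is a guess

    // Paths or names containing an interior NUL cannot be passed to C.
    let c_path = CString::new(path.as_os_str().as_bytes()).ok()?;
    let c_name = CString::new(name).ok()?;
    let mut buf = vec![0u8; BUF_LEN];

    let n = unsafe {
        lgetxattr(
            c_path.as_ptr(),
            c_name.as_ptr(),
            buf.as_mut_ptr() as *mut c_void,
            buf.len(),
        )
    };
    if n < 0 {
        // ENODATA, ENOTSUP, ENOENT, ERANGE, ... are all mapped to None here.
        return None;
    }
    buf.truncate(n as usize);
    Some(buf)
}

/// Hypothetical helper: parse the value as an unsigned integer.
/// Assumes the attribute is a decimal string, possibly with trailing padding.
fn read_rentries(dir: &Path) -> Option<u64> {
    let raw = read_xattr_once(dir, "ceph.dir.rentries")?;
    std::str::from_utf8(&raw)
        .ok()?
        .trim_matches(|c: char| c == '\0' || c.is_whitespace())
        .parse()
        .ok()
}
```

Mapping every failure to None keeps the caller simple, at the cost of not distinguishing "not a Ceph directory" from "buffer too small"; if that distinction matters, the errno could be inspected via `std::io::Error::last_os_error()`.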

Second, we may not need to always stat every entry. I'm pretty sure we can tell the file type (dir, symlink, file) from readdir alone, although I haven't looked into if/how Rust exposes that information. For dir rbytes, we're currently using the size from stat, but it's possible that ceph.dir.rbytes is faster; I haven't timed it. For file sizes, we may need to stat, unless there's a ceph xattr for that.
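On the question of how Rust exposes this: `std::fs::DirEntry::file_type()` returns the entry type, and the standard library documentation says it needs no extra system call on most Unix platforms (it may fall back to an lstat-style call where the filesystem does not report a type). A rough sketch, where the `Kind` enum and `list_kinds` function are hypothetical names for illustration:

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical classification of a directory entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Dir,
    Symlink,
    File,
    Other,
}

/// List entry names with their type, without calling stat explicitly.
fn list_kinds(dir: &Path) -> io::Result<Vec<(OsString, Kind)>> {
    let mut out = Vec::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // file_type() does not follow symlinks, so a symlink to a directory
        // is reported as a symlink.
        let ft = entry.file_type()?;
        let kind = if ft.is_dir() {
            Kind::Dir
        } else if ft.is_symlink() {
            Kind::Symlink
        } else if ft.is_file() {
            Kind::File
        } else {
            Kind::Other
        };
        out.push((entry.file_name(), kind));
    }
    Ok(out)
}
```

Whether this actually avoids the per-entry stat on CephFS would still need to be checked (for example with strace), since it depends on the filesystem filling in the entry type.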

If the user wants to view the owner/group (u), then we do need to stat.

This all has implications for handling large directories. We might want to warn the user if they're about to open a dir with more than 10K files, or perhaps just open the directory without stat-ing anything and let them press a key to run stat/getxattr if they want it anyway. If we can get the file type info from readdir, then the user can still see what's a file and what's a directory and navigate accordingly.

Similarly, to aid navigation, if a directory is big but has few subdirectories, we can just stat/getxattr those subdirectories and skip the files.
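The large-directory policy described above could be sketched as a small decision function. Everything here is hypothetical: the names `Kind`, `StatPlan` and `choose_plan`, and especially the `max_subdirs` cutoff, since the text only says "few subdirectories" without giving a number.

```rust
/// Hypothetical entry type, as obtained from readdir without stat.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Dir,
    File,
    Symlink,
}

/// Hypothetical outcome: what to stat/getxattr when opening a directory.
#[derive(Debug, PartialEq, Eq)]
enum StatPlan {
    /// Small directory: stat everything, as today.
    All,
    /// Large directory with few subdirectories: stat/getxattr only those.
    DirsOnly,
    /// Large directory with many subdirectories: stat nothing until the
    /// user presses a key to request it.
    Deferred,
}

/// Decide how much metadata to fetch.
/// `large_threshold` would be 10_000 per the text; `max_subdirs` is an assumption.
fn choose_plan(kinds: &[Kind], large_threshold: usize, max_subdirs: usize) -> StatPlan {
    if kinds.len() <= large_threshold {
        return StatPlan::All;
    }
    let n_dirs = kinds.iter().filter(|k| **k == Kind::Dir).count();
    if n_dirs <= max_subdirs {
        StatPlan::DirsOnly
    } else {
        StatPlan::Deferred
    }
}
```

For example, `choose_plan(&kinds, 10_000, 100)` would stat everything in a 500-entry directory, but only the subdirectories in a 50,000-entry directory that contains a dozen of them.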


Labels: enhancement
