fix(dir): use Prefix instead of StartAfter for directory existence check #168

Open
alexsavio wants to merge 1 commit into yandex-cloud:master from alexsavio:fix-intelligent-list-cut-utf8

@alexsavio
Pull Request: Use Prefix instead of StartAfter for directory existence check

Summary

This PR changes the directory existence check in intelligentListCut: instead of passing StartAfter with a crafted suffix ending in the bytes \xEF\xBF\xBD, it issues a simpler Prefix-based query.

Problem

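The current implementation in core/dir.go uses StartAfter with a crafted suffix (.\xEF\xBF\xBD) to simulate a >= operator for directory existence checks. Because '.' (0x2E) is the byte immediately before '/' (0x2F), listing keys after path.\xEF\xBF\xBD is meant to approximate listing keys >= path/: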

StartAfter: PString(lastName[0:lastLtPos] + ".\xEF\xBF\xBD"),

This approach:

  1. Relies on specific UTF-8 byte ordering behavior
  2. Uses \xEF\xBF\xBD, the UTF-8 encoding of the replacement character U+FFFD, which may be handled differently by various S3-compatible backends
  3. Is complex and hard to understand (requires understanding UTF-8 code point ranges)

We encountered issues with this approach when using AWS S3 in our setup.
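
The byte-ordering assumption behind the suffix can be illustrated with a small sketch. It assumes the backend compares keys byte by byte; afterMarker and the sample keys are hypothetical and are not taken from core/dir.go.

```go
package main

import "fmt"

// afterMarker reports whether key would be returned by a listing that uses
// marker as StartAfter, assuming the backend orders keys byte by byte.
// (Hypothetical helper for illustration only.)
func afterMarker(key, marker string) bool {
	return key > marker // Go compares strings bytewise
}

func main() {
	// Same construction as the original code: name + "." + U+FFFD in UTF-8.
	marker := "dir" + ".\xEF\xBF\xBD"

	keys := []string{
		"dir.txt",        // 't' (0x74) < 0xEF: sorts before the marker
		"dir.\U0001F600", // first byte 0xF0 > 0xEF: sorts after the marker
		"dir/file",       // '/' (0x2F) > '.' (0x2E): sorts after the marker
		"dir0",           // '0' (0x30) > '.' (0x2E): sorts after the marker
	}
	for _, key := range keys {
		fmt.Printf("%q after marker: %v\n", key, afterMarker(key, marker))
	}
}
```

Under this assumption the marker sits just below "dir/", but a key such as "dir." followed by a character outside the Basic Multilingual Plane still sorts after it, so the >= simulation is only approximate.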

Solution

Replace the StartAfter approach with a direct Prefix query:

Prefix: PString(lastName[0:lastLtPos] + "/"),

This is:

  • More straightforward: Directly queries for items starting with path/
  • More portable: Works consistently across different S3-compatible backends
  • Easier to understand: No need to understand UTF-8 byte ordering tricks

Changes

  1. core/dir.go: Replace StartAfter with Prefix in intelligentListCut
  2. core/dir_test.go: Update test assertion to match new behavior
  3. core/goofys_common_test.go: Add default CLOUD=s3 for easier local test execution

Testing

  • Updated existing unit tests to verify the new behavior
  • Tested with AWS S3 backend

Related

This is similar in spirit to commit de62549, which removed StartAfter-based ext-v1 autodetection because it did not work with Yandex S3.


How to apply these changes

If you want to create this PR from your own fork:

# Add upstream remote if not already added
git remote add upstream git@github.com:yandex-cloud/geesefs.git

# Fetch latest upstream
git fetch upstream

# Create a new branch from upstream master
git checkout -b fix-intelligent-list-cut-utf8 upstream/master

# Apply the patch (see PATCH.diff file) or manually make the changes
git apply PATCH.diff

# Stage the modified files and commit
git commit -a -m "fix(dir): use Prefix instead of StartAfter for directory existence check"

# Push to your fork
git push origin fix-intelligent-list-cut-utf8

# Then create PR via GitHub UI

@alexsavio force-pushed the fix-intelligent-list-cut-utf8 branch from d161877 to 6fd8fe8 on January 22, 2026 09:11
Replace StartAfter with invalid UTF-8 sequences (\xEF\xBF\xBD) with a
direct Prefix-based query when checking for directory existence in
intelligentListCut.

The previous approach used StartAfter with '.\xEF\xBF\xBD' suffix to
simulate a '>=' operator, but this relies on specific UTF-8 byte
ordering behavior that may not work correctly with all S3-compatible
storage backends.

The new approach uses Prefix directly to query for items starting with
'path/' which is:
- More straightforward and easier to understand
- More portable across different S3-compatible backends
- Avoids potential issues with UTF-8 handling in various implementations

Also adds a default CLOUD=s3 in tests for easier local test execution.
@alexsavio force-pushed the fix-intelligent-list-cut-utf8 branch from 6fd8fe8 to e66fd01 on January 22, 2026 09:12