-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Fixed the handling of empty directories in TrinoHudiStorage. #26897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Reviewer's GuideThis PR refines TrinoHudiStorage’s directory listing logic to conform with Hudi’s specs: it distinguishes between empty directories and non-existent paths by leveraging fileSystem.directoryExists, returning empty lists for valid but empty directories and throwing FileNotFoundException only for missing paths. Sequence diagram for directory listing in TrinoHudiStorage after PR changessequenceDiagram
participant Caller
participant TrinoHudiStorage
participant FileSystem
participant FileIterator
Caller->>TrinoHudiStorage: listDirectEntries(path)
TrinoHudiStorage->>FileSystem: directoryExists(location)
FileSystem-->>TrinoHudiStorage: Optional<Boolean> dirExists
alt dirExists.isPresent() && !dirExists.get()
TrinoHudiStorage-->>Caller: throw FileNotFoundException
else
TrinoHudiStorage->>FileSystem: listFiles(location)
FileSystem-->>TrinoHudiStorage: FileIterator
loop iterate files
TrinoHudiStorage->>FileIterator: hasNext()
FileIterator-->>TrinoHudiStorage: boolean
alt hasNext
TrinoHudiStorage->>FileIterator: next()
FileIterator-->>TrinoHudiStorage: file
TrinoHudiStorage->>TrinoHudiStorage: add to entryList
end
end
alt entryList.isEmpty() && dirExists.isEmpty()
TrinoHudiStorage-->>Caller: throw FileNotFoundException
else
TrinoHudiStorage-->>Caller: return entryList
end
end
Class diagram for updated TrinoHudiStorage directory listing methodsclassDiagram
class TrinoHudiStorage {
+listDirectEntries(StoragePath path): List<StoragePathInfo>
+listDirectEntries(StoragePath path, StoragePathFilter filter): List<StoragePathInfo>
}
class FileSystem {
+directoryExists(Location location): Optional<Boolean>
+listFiles(Location location): FileIterator
}
class FileIterator {
+hasNext(): boolean
+next(): File
}
TrinoHudiStorage --> FileSystem : uses
TrinoHudiStorage --> FileIterator : uses
FileSystem --> FileIterator : returns
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We failed to fetch pull request #26897.
You can try again by commenting this pull request with @sourcery-ai review
, or contact us for help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test?
Please follow our commit message guideline: https://trino.io/development/process.html#pull-request-and-commit-guidelines-
|
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla |
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- Consider extracting the common existence‐check and listing logic between the two listDirectEntries methods into a shared helper to reduce duplication.
- The FileNotFoundException message is the same for a non-directory vs non-existent path—consider making the error text more specific (e.g. "not a directory" vs "does not exist").
- Add a test for listDirectEntries(path, filter) on an empty directory to verify that the filtered listing also returns an empty list instead of throwing.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider extracting the common existence‐check and listing logic between the two listDirectEntries methods into a shared helper to reduce duplication.
- The FileNotFoundException message is the same for a non-directory vs non-existent path—consider making the error text more specific (e.g. "not a directory" vs "does not exist").
- Add a test for listDirectEntries(path, filter) on an empty directory to verify that the filtered listing also returns an empty list instead of throwing.
## Individual Comments
### Comment 1
<location> `plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/storage/TestTrinoHudiStorage.java:296-297` </location>
<code_context>
}
}
+ @Test
+ public void testListEmptyDirectory()
+ throws IOException
+ {
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding coverage for non-hierarchical filesystems (e.g., S3) when listing empty directories.
Please add a test using a non-hierarchical filesystem (such as S3) to ensure listing an empty directory behaves as expected and does not raise exceptions.
Suggested implementation:
```java
@Test
public void testListEmptyDirectory()
throws IOException
{
HoodieStorage storage = getStorage();
// Create an empty directory
StoragePath emptyDir = new StoragePath(getTempDir(), "empty_directory");
assertThat(storage.createDirectory(emptyDir)).isTrue();
assertThat(storage.exists(emptyDir)).isTrue();
// List the empty directory - should return empty list, not throw exception
List<StoragePathInfo> entries = storage.listDirectEntries(emptyDir);
assertThat(entries).isEmpty();
}
@Test
public void testListEmptyDirectoryOnS3()
throws IOException
{
HoodieStorage s3Storage = getS3Storage();
// Create an empty "directory" on S3 (non-hierarchical)
StoragePath emptyS3Dir = new StoragePath(getS3TempDir(), "empty_s3_directory/");
assertThat(s3Storage.createDirectory(emptyS3Dir)).isTrue();
assertThat(s3Storage.exists(emptyS3Dir)).isTrue();
// List the empty S3 "directory" - should return empty list, not throw exception
List<StoragePathInfo> s3Entries = s3Storage.listDirectEntries(emptyS3Dir);
assertThat(s3Entries).isEmpty();
```
You will need to implement or provide the following helper methods for the new test to work:
- `getS3Storage()`: Returns a HoodieStorage instance backed by S3 (or a mock S3).
- `getS3TempDir()`: Returns a StoragePath representing a temporary S3 directory for testing.
If you already have S3 test infrastructure, use those helpers. Otherwise, you may need to add them, possibly using a mock S3 service (e.g., LocalStack or Minio) for testing.
</issue_to_address>
### Comment 2
<location> `plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/storage/TestTrinoHudiStorage.java:311-313` </location>
<code_context>
+ assertThat(storage.exists(emptyDir)).isTrue();
+
+ // List the empty directory - should return empty list, not throw exception
+ List<StoragePathInfo> entries = storage.listDirectEntries(emptyDir);
+ assertThat(entries).isEmpty();
+
+ List<StoragePathInfo> files = storage.listFiles(emptyDir);
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding a test for listDirectEntries with a filter on an empty directory.
Please add a test for listDirectEntries with a filter (e.g., path -> true) on an empty directory to ensure it returns an empty list and does not throw an exception.
```suggestion
List<StoragePathInfo> files = storage.listFiles(emptyDir);
assertThat(files).isEmpty();
// List the empty directory with a filter - should return empty list, not throw exception
List<StoragePathInfo> filteredEntries = storage.listDirectEntries(emptyDir, path -> true);
assertThat(filteredEntries).isEmpty();
}
```
</issue_to_address>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
@Test | ||
public void testListEmptyDirectory() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Consider adding coverage for non-hierarchical filesystems (e.g., S3) when listing empty directories.
Please add a test using a non-hierarchical filesystem (such as S3) to ensure listing an empty directory behaves as expected and does not raise exceptions.
Suggested implementation:
@Test
public void testListEmptyDirectory()
throws IOException
{
HoodieStorage storage = getStorage();
// Create an empty directory
StoragePath emptyDir = new StoragePath(getTempDir(), "empty_directory");
assertThat(storage.createDirectory(emptyDir)).isTrue();
assertThat(storage.exists(emptyDir)).isTrue();
// List the empty directory - should return empty list, not throw exception
List<StoragePathInfo> entries = storage.listDirectEntries(emptyDir);
assertThat(entries).isEmpty();
}
@Test
public void testListEmptyDirectoryOnS3()
throws IOException
{
HoodieStorage s3Storage = getS3Storage();
// Create an empty "directory" on S3 (non-hierarchical)
StoragePath emptyS3Dir = new StoragePath(getS3TempDir(), "empty_s3_directory/");
assertThat(s3Storage.createDirectory(emptyS3Dir)).isTrue();
assertThat(s3Storage.exists(emptyS3Dir)).isTrue();
// List the empty S3 "directory" - should return empty list, not throw exception
List<StoragePathInfo> s3Entries = s3Storage.listDirectEntries(emptyS3Dir);
assertThat(s3Entries).isEmpty();
You will need to implement or provide the following helper methods for the new test to work:
getS3Storage()
: Returns a HoodieStorage instance backed by S3 (or a mock S3).getS3TempDir()
: Returns a StoragePath representing a temporary S3 directory for testing.
If you already have S3 test infrastructure, use those helpers. Otherwise, you may need to add them, possibly using a mock S3 service (e.g., LocalStack or Minio) for testing.
Description
The current implementation throws an error for empty directories, which does not conform to the specifications defined here and here.
With this change, it returns an empty list if the directory is empty and throws an error only if the directory does not exist.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
Summary by Sourcery
Align TrinoHudiStorage with Hudi specifications by returning an empty list for empty directories and throwing FileNotFoundException only when a directory truly does not exist across both hierarchical and non-hierarchical file systems.
Bug Fixes:
Tests: