Skip to content

Conversation

smkniazi
Copy link

@smkniazi smkniazi commented Oct 9, 2025

Description

The current implementation throws an error for empty directories, which does not conform to the specifications defined here and here.

With this change, it returns an empty list if the directory is empty and throws an error only if the directory does not exist.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

Summary by Sourcery

Align TrinoHudiStorage with Hudi specifications by returning an empty list for empty directories and throwing FileNotFoundException only when a directory truly does not exist across both hierarchical and non-hierarchical file systems.

Bug Fixes:

  • Return empty list instead of error for empty directories in listDirectEntries and listFiles
  • Throw FileNotFoundException only for non-existent paths by checking directoryExists on all file systems

Tests:

  • Add unit tests for listing empty directories
  • Add unit tests for listing non-existent directories

Copy link

cla-bot bot commented Oct 9, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

sourcery-ai bot commented Oct 9, 2025

Reviewer's Guide

This PR refines TrinoHudiStorage’s directory listing logic to conform with Hudi’s specs: it distinguishes between empty directories and non-existent paths by leveraging fileSystem.directoryExists, returning empty lists for valid but empty directories and throwing FileNotFoundException only for missing paths.

Sequence diagram for directory listing in TrinoHudiStorage after PR changes

sequenceDiagram
    participant Caller
    participant TrinoHudiStorage
    participant FileSystem
    participant FileIterator
    Caller->>TrinoHudiStorage: listDirectEntries(path)
    TrinoHudiStorage->>FileSystem: directoryExists(location)
    FileSystem-->>TrinoHudiStorage: Optional<Boolean> dirExists
    alt dirExists.isPresent() && !dirExists.get()
        TrinoHudiStorage-->>Caller: throw FileNotFoundException
    else
        TrinoHudiStorage->>FileSystem: listFiles(location)
        FileSystem-->>TrinoHudiStorage: FileIterator
        loop iterate files
            TrinoHudiStorage->>FileIterator: hasNext()
            FileIterator-->>TrinoHudiStorage: boolean
            alt hasNext
                TrinoHudiStorage->>FileIterator: next()
                FileIterator-->>TrinoHudiStorage: file
                TrinoHudiStorage->>TrinoHudiStorage: add to entryList
            end
        end
        alt entryList.isEmpty() && dirExists.isEmpty()
            TrinoHudiStorage-->>Caller: throw FileNotFoundException
        else
            TrinoHudiStorage-->>Caller: return entryList
        end
    end
Loading

Class diagram for updated TrinoHudiStorage directory listing methods

classDiagram
    class TrinoHudiStorage {
        +listDirectEntries(StoragePath path): List<StoragePathInfo>
        +listDirectEntries(StoragePath path, StoragePathFilter filter): List<StoragePathInfo>
    }
    class FileSystem {
        +directoryExists(Location location): Optional<Boolean>
        +listFiles(Location location): FileIterator
    }
    class FileIterator {
        +hasNext(): boolean
        +next(): File
    }
    TrinoHudiStorage --> FileSystem : uses
    TrinoHudiStorage --> FileIterator : uses
    FileSystem --> FileIterator : returns
Loading

File-Level Changes

Change Details Files
Refined directory existence and empty-directory handling in listing methods
  • Invoke fileSystem.directoryExists(location) to detect missing vs non-directory paths
  • Throw FileNotFoundException only when directoryExists reports false or isEmpty() combined with no entries
  • Retain empty-list return for hierarchical filesystems when directoryExists returns true
  • Add comments clarifying HDFS vs S3 semantics
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/storage/TrinoHudiStorage.java
Added tests for empty and non-existent directory behaviors
  • testListEmptyDirectory verifies listing an empty directory returns an empty list without error
  • testListNonExistentDirectory verifies listing a missing path throws FileNotFoundException
plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/storage/TestTrinoHudiStorage.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We failed to fetch pull request #26897.

You can try again by commenting this pull request with @sourcery-ai review, or contact us for help.

@github-actions github-actions bot added the hudi Hudi connector label Oct 9, 2025
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test?

@ebyhr
Copy link
Member

ebyhr commented Oct 10, 2025

Fixed the handling of empty directories in TrinoHudiStorage.

Please follow our commit message guideline: https://trino.io/development/process.html#pull-request-and-commit-guidelines-

Copy link

cla-bot bot commented Oct 13, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Oct 13, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Oct 13, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Oct 13, 2025

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@smkniazi
Copy link
Author

@sourcery-ai review

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • Consider extracting the common existence‐check and listing logic between the two listDirectEntries methods into a shared helper to reduce duplication.
  • The FileNotFoundException message is the same for a non-directory vs non-existent path—consider making the error text more specific (e.g. "not a directory" vs "does not exist").
  • Add a test for listDirectEntries(path, filter) on an empty directory to verify that the filtered listing also returns an empty list instead of throwing.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Consider extracting the common existence‐check and listing logic between the two listDirectEntries methods into a shared helper to reduce duplication.
- The FileNotFoundException message is the same for a non-directory vs non-existent path—consider making the error text more specific (e.g. "not a directory" vs "does not exist").
- Add a test for listDirectEntries(path, filter) on an empty directory to verify that the filtered listing also returns an empty list instead of throwing.

## Individual Comments

### Comment 1
<location> `plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/storage/TestTrinoHudiStorage.java:296-297` </location>
<code_context>
         }
     }

+    @Test
+    public void testListEmptyDirectory()
+            throws IOException
+    {
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding coverage for non-hierarchical filesystems (e.g., S3) when listing empty directories.

Please add a test using a non-hierarchical filesystem (such as S3) to ensure listing an empty directory behaves as expected and does not raise exceptions.

Suggested implementation:

```java
    @Test
    public void testListEmptyDirectory()
            throws IOException
    {
        HoodieStorage storage = getStorage();

        // Create an empty directory
        StoragePath emptyDir = new StoragePath(getTempDir(), "empty_directory");
        assertThat(storage.createDirectory(emptyDir)).isTrue();
        assertThat(storage.exists(emptyDir)).isTrue();

        // List the empty directory - should return empty list, not throw exception
        List<StoragePathInfo> entries = storage.listDirectEntries(emptyDir);
        assertThat(entries).isEmpty();
    }

    @Test
    public void testListEmptyDirectoryOnS3()
            throws IOException
    {
        HoodieStorage s3Storage = getS3Storage();

        // Create an empty "directory" on S3 (non-hierarchical)
        StoragePath emptyS3Dir = new StoragePath(getS3TempDir(), "empty_s3_directory/");
        assertThat(s3Storage.createDirectory(emptyS3Dir)).isTrue();
        assertThat(s3Storage.exists(emptyS3Dir)).isTrue();

        // List the empty S3 "directory" - should return empty list, not throw exception
        List<StoragePathInfo> s3Entries = s3Storage.listDirectEntries(emptyS3Dir);
        assertThat(s3Entries).isEmpty();

```

You will need to implement or provide the following helper methods for the new test to work:
- `getS3Storage()`: Returns a HoodieStorage instance backed by S3 (or a mock S3).
- `getS3TempDir()`: Returns a StoragePath representing a temporary S3 directory for testing.

If you already have S3 test infrastructure, use those helpers. Otherwise, you may need to add them, possibly using a mock S3 service (e.g., LocalStack or Minio) for testing.
</issue_to_address>

### Comment 2
<location> `plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/storage/TestTrinoHudiStorage.java:311-313` </location>
<code_context>
+        assertThat(storage.exists(emptyDir)).isTrue();
+
+        // List the empty directory - should return empty list, not throw exception
+        List<StoragePathInfo> entries = storage.listDirectEntries(emptyDir);
+        assertThat(entries).isEmpty();
+
+        List<StoragePathInfo> files = storage.listFiles(emptyDir);
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding a test for listDirectEntries with a filter on an empty directory.

Please add a test for listDirectEntries with a filter (e.g., path -> true) on an empty directory to ensure it returns an empty list and does not throw an exception.

```suggestion
        List<StoragePathInfo> files = storage.listFiles(emptyDir);
        assertThat(files).isEmpty();

        // List the empty directory with a filter - should return empty list, not throw exception
        List<StoragePathInfo> filteredEntries = storage.listDirectEntries(emptyDir, path -> true);
        assertThat(filteredEntries).isEmpty();
    }
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +296 to +297
@Test
public void testListEmptyDirectory()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (testing): Consider adding coverage for non-hierarchical filesystems (e.g., S3) when listing empty directories.

Please add a test using a non-hierarchical filesystem (such as S3) to ensure listing an empty directory behaves as expected and does not raise exceptions.

Suggested implementation:

    @Test
    public void testListEmptyDirectory()
            throws IOException
    {
        HoodieStorage storage = getStorage();

        // Create an empty directory
        StoragePath emptyDir = new StoragePath(getTempDir(), "empty_directory");
        assertThat(storage.createDirectory(emptyDir)).isTrue();
        assertThat(storage.exists(emptyDir)).isTrue();

        // List the empty directory - should return empty list, not throw exception
        List<StoragePathInfo> entries = storage.listDirectEntries(emptyDir);
        assertThat(entries).isEmpty();
    }

    @Test
    public void testListEmptyDirectoryOnS3()
            throws IOException
    {
        HoodieStorage s3Storage = getS3Storage();

        // Create an empty "directory" on S3 (non-hierarchical)
        StoragePath emptyS3Dir = new StoragePath(getS3TempDir(), "empty_s3_directory/");
        assertThat(s3Storage.createDirectory(emptyS3Dir)).isTrue();
        assertThat(s3Storage.exists(emptyS3Dir)).isTrue();

        // List the empty S3 "directory" - should return empty list, not throw exception
        List<StoragePathInfo> s3Entries = s3Storage.listDirectEntries(emptyS3Dir);
        assertThat(s3Entries).isEmpty();

You will need to implement or provide the following helper methods for the new test to work:

  • getS3Storage(): Returns a HoodieStorage instance backed by S3 (or a mock S3).
  • getS3TempDir(): Returns a StoragePath representing a temporary S3 directory for testing.

If you already have S3 test infrastructure, use those helpers. Otherwise, you may need to add them, possibly using a mock S3 service (e.g., LocalStack or Minio) for testing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hudi Hudi connector

Development

Successfully merging this pull request may close these issues.

2 participants