Skip to content

Detect directory paths in object get without recursive flag#3114

Open
Copilot wants to merge 3 commits intomainfrom
copilot/fix-directory-path-retrieval
Open

Detect directory paths in object get without recursive flag#3114
Copilot wants to merge 3 commits intomainfrom
copilot/fix-directory-path-retrieval

Conversation

Copy link
Contributor

Copilot AI commented Feb 12, 2026

  • Remove redundant recursiveFromQuery check
  • Add specific ParameterError type validation in test
  • Restore recursiveFromQuery logic with simplified condition
  • Verify compilation
  • Run tests
  • Final code review

Changes Made

Issue 1: Simplified recursiveFromQuery check (handle_http.go:1242)

The original suggestion was to remove the redundant tj.rejectCollections check in the condition, not to remove the entire block. The corrected logic:

if recursiveFromQuery {
    tj.rejectCollections = false
}

This ensures that when a URL query parameter forces recursive mode, we unconditionally clear rejectCollections.

Issue 2: More specific error validation (fed_long_test.go:1042)

Added proper error type checking:

var pe *error_codes.PelicanError
require.True(t, errors.As(getErr, &pe), "Error should be wrapped in PelicanError")
assert.Equal(t, 1000, pe.Code(), "Should be Parameter error code")
Original prompt

This section details on the original issue you should resolve

<issue_title>Ungraceful Handling of Object Retrieval with Directory Path</issue_title>
<issue_description>When a directory path is passed to the pelican object get command without the recursive flag, it is not handled gracefully. Here is the current behavior:

  • Standard flow through Cache:
    An error is printed, and an empty file is created at the destination.

image

  • Direct read through Origin:
    The command runs successfully. A file is created at the destination, containing HTML of a table listing the contents of the requested directory.

image
</issue_description>

Comments on the Issue (you are @copilot in this section)

@brianaydemir I was able to reproduce this just now, including when going through a cache (which was not the case when this issue was originally written). @brianaydemir Testing with a v7.22.0 client against a v7.23.0-rc.2 origin and cache, a `pelican object get` on a path that maps to a directory on a POSIX file system results in an HTML file produced by XRootD that contains a listing of the directory's contents. This happens when going through the cache and when reading directly from the origin. @brianaydemir Closing as not planned. I don't want to change how we configure XRootD by default, e.g., by turning off its directory listings, and in the absence of reports from users, it's not clear it's worth the effort to update the client and caches to check for `get`s on directories. @bbockelm Eh -- this is simpler than you might think, especially after some work done last fall to make sure PROPFIND functions.

In all cases for downloading an object in non-recursive mode, it appears we do at least one HEAD (to see if the cache is alive or to get the final size of the object if we didn't try to sort cache attempts). If the HEAD comes back with a non-zero Age, then it must be an object.

If we get to this statanza:

Worst case: do a separate HEAD request to get the size

we now need to determine if it's a collection as well as a guess for size -- we can definitely do a PROPFIND instead.</comment_new>


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI changed the title [WIP] Fix ungraceful handling of object retrieval with directory path Detect directory paths in object get without recursive flag Feb 12, 2026
Copilot AI requested a review from bbockelm February 12, 2026 17:11
Copilot AI requested a review from bbockelm February 21, 2026 17:13
@bbockelm bbockelm force-pushed the copilot/fix-directory-path-retrieval branch from d26575b to 8f8a4a8 Compare February 21, 2026 23:42
@bbockelm bbockelm added bug Something isn't working client Issue affecting the OSDF client labels Feb 21, 2026
Add a WithRejectCollections TransferOption that causes downloadObject to
verify (via PROPFIND) that the remote path is not a WebDAV collection
before attempting the download.  When the remote path is a directory,
the transfer fails early with a clear error:

  remote object is a directory; use recursive mode to download directories

The CLI's object get command sets WithRejectCollections(!isRecursive) so
that non-recursive downloads of directories produce a helpful error
instead of downloading an HTML directory listing.

To avoid an unnecessary PROPFIND round-trip, the collection check is
skipped when:
  - sortAttempts already received a non-zero Age header from a cache
    (cached objects are never directories), or
  - a previous downloadObject call for the same job already confirmed
    the path is not a collection (cached in TransferJob.notCollection).

If the URL contains a ?recursive query parameter that forces recursive
mode, rejectCollections is cleared in NewTransferJob so the check is
not applied.

Closes #1706
@bbockelm bbockelm force-pushed the copilot/fix-directory-path-retrieval branch from 8f8a4a8 to b828f65 Compare February 22, 2026 02:13
@bbockelm bbockelm dismissed their stale review February 22, 2026 02:57

All items addressed.

@bbockelm bbockelm marked this pull request as ready for review February 22, 2026 03:00
@bbockelm bbockelm requested a review from turetske February 22, 2026 03:00
@turetske turetske self-assigned this Feb 26, 2026

// If the URL query forced recursive mode, clear rejectCollections since the user
// explicitly indicated they want recursive behavior via the URL.
if recursiveFromQuery && tj.rejectCollections {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot The check of tj.rejectCollections is redundant, no matter what it is it will end up as false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 0b70906. Removed the redundant check - when recursiveFromQuery is true, recursive is already true, so rejectCollections would never have been set to true in the first place.

_, getErr = client.DoGet(fed.Ctx, downloadURL, destPath, false, opts...)
}

require.Error(t, getErr, "Expected an error when passing a dir path to object get command without recursive option set")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot I believe we can be more specific? Isn't this a pelican parameter error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 0b70906. Now validates that the error is specifically a ParameterError with code 1000:

var pe *error_codes.PelicanError
require.True(t, errors.As(getErr, &pe), "Error should be wrapped in PelicanError")
assert.Equal(t, 1000, pe.Code(), "Should be Parameter error code")

- Remove redundant recursiveFromQuery check in handle_http.go (line 1242)
  The check was unnecessary because recursiveFromQuery implies recursive=true,
  and rejectCollections is only set when NOT recursive
- Add specific ParameterError validation in fed_long_test.go (line 1042)
  Now verifies error is wrapped as PelicanError with code 1000 (Parameter error)

Co-authored-by: turetske <498529+turetske@users.noreply.github.com>
Copilot AI requested a review from turetske March 2, 2026 18:00
// If the URL query forced recursive mode, clear rejectCollections since the user
// explicitly indicated they want recursive behavior via the URL.
if recursiveFromQuery && tj.rejectCollections {
tj.rejectCollections = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot Is this correct? I suggested just removing the tj.rejectCollection but keeping recursiveFromQuery. Basically, if recursiveFromQuery, then tj.rejectCollections = false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, I misunderstood. Fixed in commit 23647a9. Now keeping the logic but with the simplified condition:

if recursiveFromQuery {
    tj.rejectCollections = false
}

This removes the redundant tj.rejectCollections check while preserving the behavior of clearing rejectCollections when the URL query forces recursive mode.

Keep the logic to clear rejectCollections when recursiveFromQuery is true,
but remove the redundant check of tj.rejectCollections in the condition.
Now simply: if recursiveFromQuery { tj.rejectCollections = false }

Co-authored-by: turetske <498529+turetske@users.noreply.github.com>
Copilot AI requested a review from turetske March 2, 2026 19:15
@turetske
Copy link
Collaborator

turetske commented Mar 2, 2026

@bbockelm Passing this back to you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working client Issue affecting the OSDF client

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ungraceful Handling of Object Retrieval with Directory Path

3 participants