
Detect directory paths in object get without recursive flag#3114

Open
Copilot wants to merge 1 commit into main from copilot/fix-directory-path-retrieval


Copilot AI commented Feb 12, 2026

Running pelican object get on a directory path without the recursive flag either creates an empty file (when going through a cache) or downloads an HTML directory listing (when reading directly from the origin). This change detects directories during the download and returns a clear error instead.

Changes

  • client/handle_http.go: Added directory detection in the "worst case" HEAD request section of downloadHTTP

    • Added Recursive field to transferAttemptDetails and transferDetailsOptions structs to track context
    • Defers collection check to the last possible moment (when doing HEAD request for size)
    • Skips check when cacheAge > 0 (already confirmed as object from cache)
    • Checks Content-Type header for text/html to detect XRootD directory listings
    • Returns: "remote object is a directory but recursive is not enabled"
    • No extra PROPFIND requests needed - uses existing HEAD request infrastructure
  • client/fed_long_test.go: Added testObjectGetNonRecursiveDirPath test case

    • Creates directory on origin, attempts non-recursive download
    • Verifies error message and no file creation
// In the "worst case" HEAD request section (lines 2896-2940)
if totalSize <= 0 && (resp.StatusCode == http.StatusOK || resp.StatusCode == http.StatusPartialContent) {
    // Check if we need to verify this is not a directory
    // If cacheAge > 0, we know it's an object from cache, not a directory
    needsCollectionCheck := !transfer.Recursive && cacheAge <= 0
    
    // Do HEAD request to get size
    headResponse, err = client.Do(headRequest)
    
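    // Check if this might be a directory listing by examining Content-Type;
    // skip the check if the HEAD request failed and there are no headers to read
    if needsCollectionCheck && err == nil && headResponse != nil {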
        contentType := headResponse.Header.Get("Content-Type")
        if strings.Contains(strings.ToLower(contentType), "text/html") {
            return error_codes.NewParameterError(errors.New("remote object is a directory but recursive is not enabled"))
        }
    }
}

This approach follows the guidance to track knowledge as the download proceeds and only check for directories as a last resort when needed, avoiding unnecessary extra requests in most cases.
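
As a rough illustration of the Content-Type heuristic described above, here is a minimal, self-contained sketch. The helper names looksLikeDirectoryListing and needsCollectionCheck are hypothetical and are not taken from the Pelican code base; the sketch only assumes, as the description states, that XRootD serves directory listings as text/html.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// looksLikeDirectoryListing reports whether a Content-Type header value
// names the text/html media type. Hypothetical helper for illustration only.
func looksLikeDirectoryListing(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		// Malformed or empty header: fall back to the substring match
		// used in the snippet from the PR description.
		return strings.Contains(strings.ToLower(contentType), "text/html")
	}
	// ParseMediaType returns the media type in lower case.
	return mediaType == "text/html"
}

// needsCollectionCheck mirrors the condition in the PR description:
// only check in non-recursive mode, and only when no cache reported
// a positive Age for the object.
func needsCollectionCheck(recursive bool, cacheAgeSeconds int) bool {
	return !recursive && cacheAgeSeconds <= 0
}

func main() {
	for _, ct := range []string{"text/html; charset=UTF-8", "application/octet-stream", ""} {
		fmt.Printf("%q -> %v\n", ct, looksLikeDirectoryListing(ct))
	}
	fmt.Println(needsCollectionCheck(false, 0))
}
```

One limitation of this heuristic is that an ordinary object served as text/html would also match, so a Content-Type check alone cannot distinguish an HTML file from a directory listing.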

Original prompt

This section details the original issue you should resolve

<issue_title>Ungraceful Handling of Object Retrieval with Directory Path</issue_title>
<issue_description>When a directory path is passed to the pelican object get command without the recursive flag, it is not handled gracefully. Here is the current behavior:

  • Standard flow through Cache:
    An error is printed, and an empty file is created at the destination.


  • Direct read through Origin:
    The command runs successfully. A file is created at the destination, containing HTML of a table listing the contents of the requested directory.

</issue_description>

Comments on the Issue (you are @copilot in this section)

@brianaydemir I was able to reproduce this just now, including when going through a cache (which was not the case when this issue was originally written).

@brianaydemir Testing with a v7.22.0 client against a v7.23.0-rc.2 origin and cache, a `pelican object get` on a path that maps to a directory on a POSIX file system results in an HTML file produced by XRootD that contains a listing of the directory's contents. This happens when going through the cache and when reading directly from the origin.

@brianaydemir Closing as not planned. I don't want to change how we configure XRootD by default, e.g., by turning off its directory listings, and in the absence of reports from users, it's not clear it's worth the effort to update the client and caches to check for `get`s on directories.

@bbockelm Eh -- this is simpler than you might think, especially after some work done last fall to make sure PROPFIND functions.

In all cases for downloading an object in non-recursive mode, it appears we do at least one HEAD (to see if the cache is alive or to get the final size of the object if we didn't try to sort cache attempts). If the HEAD comes back with a non-zero Age, then it must be an object.
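
The Age-based reasoning above can be sketched as follows. This is only an illustration: parseAge and knownObject are hypothetical names, and the sketch assumes the standard HTTP meaning of the Age header (a non-negative integer number of seconds).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseAge interprets an HTTP Age header value, which is a non-negative
// integer number of seconds. The boolean is false when the header is
// absent or malformed.
func parseAge(value string) (time.Duration, bool) {
	value = strings.TrimSpace(value)
	if value == "" {
		return 0, false
	}
	secs, err := strconv.ParseInt(value, 10, 64)
	if err != nil || secs < 0 {
		return 0, false
	}
	return time.Duration(secs) * time.Second, true
}

// knownObject reports whether the Age header alone is enough to conclude
// that the path is an object: per the comment above, a non-zero Age means
// a cache holds it, so it cannot be a directory.
func knownObject(ageHeader string) bool {
	age, ok := parseAge(ageHeader)
	return ok && age > 0
}

func main() {
	for _, v := range []string{"120", "0", "", "abc"} {
		fmt.Printf("Age=%q knownObject=%v\n", v, knownObject(v))
	}
}
```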

If we get to this stanza:

Worst case: do a separate HEAD request to get the size

we now need to determine if it's a collection as well as a guess for size -- we can definitely do a PROPFIND instead.
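
To make the PROPFIND idea concrete, the following sketch parses a WebDAV multistatus body and checks whether the resource type contains a collection element, as defined by RFC 4918. It does not issue any network request, and the type and function names (multistatus, isCollection, sampleDir, sampleFile) are hypothetical; the sample bodies are hand-written, not captured from XRootD or Pelican.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
)

// Minimal subset of the RFC 4918 multistatus schema needed for the check.
type multistatus struct {
	XMLName   xml.Name   `xml:"DAV: multistatus"`
	Responses []response `xml:"DAV: response"`
}

type response struct {
	Href      string     `xml:"DAV: href"`
	Propstats []propstat `xml:"DAV: propstat"`
}

type propstat struct {
	Prop prop `xml:"DAV: prop"`
}

type prop struct {
	ResourceType resourceType `xml:"DAV: resourcetype"`
}

type resourceType struct {
	// Populated with the element name only when <collection/> is present.
	Collection xml.Name `xml:"DAV: collection"`
}

// Hand-written sample bodies for a Depth: 0 PROPFIND on a directory and a file.
const sampleDir = `<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:"><D:response><D:href>/data/dir/</D:href>
<D:propstat><D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
<D:status>HTTP/1.1 200 OK</D:status></D:propstat></D:response></D:multistatus>`

const sampleFile = `<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:"><D:response><D:href>/data/file.txt</D:href>
<D:propstat><D:prop><D:resourcetype/></D:prop>
<D:status>HTTP/1.1 200 OK</D:status></D:propstat></D:response></D:multistatus>`

// isCollection reports whether the first response in a multistatus body
// describes a WebDAV collection.
func isCollection(body string) (bool, error) {
	var ms multistatus
	if err := xml.Unmarshal([]byte(body), &ms); err != nil {
		return false, err
	}
	if len(ms.Responses) == 0 {
		return false, errors.New("multistatus contains no response elements")
	}
	for _, ps := range ms.Responses[0].Propstats {
		if ps.Prop.ResourceType.Collection.Local == "collection" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	for name, body := range map[string]string{"dir": sampleDir, "file": sampleFile} {
		isDir, err := isCollection(body)
		fmt.Println(name, isDir, err)
	}
}
```

In a real client the body would come from a PROPFIND request sent with a Depth: 0 header, so that only the target path itself is described.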



Copilot AI changed the title from "[WIP] Fix ungraceful handling of object retrieval with directory path" to "Detect directory paths in object get without recursive flag" Feb 12, 2026
Copilot AI requested a review from bbockelm February 12, 2026 17:11
Copilot AI requested a review from bbockelm February 21, 2026 17:13
bbockelm force-pushed the copilot/fix-directory-path-retrieval branch from d26575b to 8f8a4a8 Compare February 21, 2026 23:42
bbockelm added the "bug" (Something isn't working) and "client" (Issue affecting the OSDF client) labels Feb 21, 2026
Add a WithRejectCollections TransferOption that causes downloadObject to
verify (via PROPFIND) that the remote path is not a WebDAV collection
before attempting the download.  When the remote path is a directory,
the transfer fails early with a clear error:

  remote object is a directory; use recursive mode to download directories

The CLI's object get command sets WithRejectCollections(!isRecursive) so
that non-recursive downloads of directories produce a helpful error
instead of downloading an HTML directory listing.

To avoid an unnecessary PROPFIND round-trip, the collection check is
skipped when:
  - sortAttempts already received a non-zero Age header from a cache
    (cached objects are never directories), or
  - a previous downloadObject call for the same job already confirmed
    the path is not a collection (cached in TransferJob.notCollection).

If the URL contains a ?recursive query parameter that forces recursive
mode, rejectCollections is cleared in NewTransferJob so the check is
not applied.

Closes #1706
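
The option and the skip conditions described in the commit message can be sketched roughly as below. This is a simplified stand-in, not Pelican's implementation: the lower-case names transferJob, transferOption, withRejectCollections, and newTransferJob are hypothetical, and treating the mere presence of a recursive query parameter as forcing recursive mode is an assumption made for the example.

```go
package main

import (
	"fmt"
	"net/url"
)

// transferJob is a hypothetical stand-in for the job type named in the
// commit message; only the fields relevant to the collection check appear.
type transferJob struct {
	remote            *url.URL
	rejectCollections bool // fail if the remote path is a directory
	notCollection     bool // an earlier attempt confirmed it is not a collection
}

// transferOption is a functional option applied when a job is created.
type transferOption func(*transferJob)

// withRejectCollections sketches the option described above.
func withRejectCollections(reject bool) transferOption {
	return func(j *transferJob) { j.rejectCollections = reject }
}

// newTransferJob applies the options, then clears rejectCollections when
// the URL carries a recursive query parameter (assumed here to force
// recursive mode whenever it is present).
func newTransferJob(rawURL string, opts ...transferOption) (*transferJob, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return nil, err
	}
	job := &transferJob{remote: u}
	for _, opt := range opts {
		opt(job)
	}
	if u.Query().Has("recursive") {
		job.rejectCollections = false
	}
	return job, nil
}

// needsCollectionCheck reports whether a PROPFIND round-trip is still
// required: the option must be set, no cache may have returned a non-zero
// Age, and no earlier attempt may have ruled out a collection.
func (j *transferJob) needsCollectionCheck(sawNonZeroAge bool) bool {
	return j.rejectCollections && !sawNonZeroAge && !j.notCollection
}

func main() {
	isRecursive := false
	job, err := newTransferJob("pelican://example.org/ns/dir", withRejectCollections(!isRecursive))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("check needed:", job.needsCollectionCheck(false))
}
```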
bbockelm force-pushed the copilot/fix-directory-path-retrieval branch from 8f8a4a8 to b828f65 Compare February 22, 2026 02:13
bbockelm dismissed their stale review February 22, 2026 02:57

All items addressed.

bbockelm marked this pull request as ready for review February 22, 2026 03:00
bbockelm requested a review from turetske February 22, 2026 03:00
turetske self-assigned this Feb 26, 2026


Development

Successfully merging this pull request may close these issues.

Ungraceful Handling of Object Retrieval with Directory Path

3 participants