S3Proxy ignores files without user.content-md5 NTFS stream #998

@lukaszherman

Description

Current behavior
Files copied into the S3Proxy directory on Windows by any means other than S3Proxy are not visible through S3Proxy. Moreover, a single file without an MD5 makes all files invisible, even those whose MD5 values have already been calculated.
S3Proxy only detects files that have the NTFS alternate data stream user.content-md5 set; files without this stream are ignored.
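
To see which files are affected, the files lacking the stream can be listed. This is only a minimal sketch, assuming Windows and an NTFS volume; `Get-FilesMissingMd5Stream` is a hypothetical helper name used for illustration, not something S3Proxy provides.

```powershell
# Hypothetical helper: lists files under $Root that have no user.content-md5
# alternate data stream. Assumes Windows and a volume that supports streams (NTFS).
function Get-FilesMissingMd5Stream {
    param(
        [Parameter(Mandatory = $true)][string]$Root,
        [string]$StreamName = 'user.content-md5'
    )
    Get-ChildItem -LiteralPath $Root -File -Recurse | Where-Object {
        # Get-Item -Stream emits nothing (the error is suppressed) when the stream is absent.
        -not (Get-Item -LiteralPath $_.FullName -Stream $StreamName -ErrorAction SilentlyContinue)
    }
}

# Example: print the full path of every file that is missing the stream.
# Get-FilesMissingMd5Stream -Root '\\fileServer01\SharedStorage\folder1' | ForEach-Object FullName
```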
Question
Is there any option to bypass this limitation?
For example:

show files even if the MD5 is missing,
calculate the MD5 checksum on the fly,
or generate/update the MD5 value on read if the NTFS stream does not exist?

Current workaround
As a workaround, I periodically calculate the MD5 checksum for all files in the S3Proxy directory and store it in the user.content-md5 NTFS alternate data stream using PowerShell:

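$s3FolderPath = "\\fileServer01\SharedStorage\folder1"

# Raw bytes are written with -Encoding Byte on Windows PowerShell 5.1
# and with -AsByteStream on PowerShell 6+.
$byteParam = if ($PSVersionTable.PSVersion.Major -ge 6) { @{ AsByteStream = $true } } else { @{ Encoding = 'Byte' } }

# Returns the MD5 digest of the file content as raw bytes.
function Get-MD5Bytes($filePath) {
    $md5 = [System.Security.Cryptography.MD5]::Create()
    $stream = [System.IO.File]::OpenRead($filePath)
    try {
        return $md5.ComputeHash($stream)
    } finally {
        $stream.Dispose()
        $md5.Dispose()
    }
}

Get-ChildItem -LiteralPath $s3FolderPath -File -Recurse | ForEach-Object {

    $fileToUpdateMd5 = $_.FullName

    # -LiteralPath keeps file names containing [ or ] from being treated as wildcards.
    $streams = Get-Item -LiteralPath $fileToUpdateMd5 -Stream * |
               Select-Object -ExpandProperty Stream

    # Only files that do not have the stream yet are updated.
    if ($streams -notcontains "user.content-md5") {
        $md5bytes = Get-MD5Bytes $fileToUpdateMd5
        Set-Content -LiteralPath $fileToUpdateMd5 `
                    -Stream "user.content-md5" `
                    -Value $md5bytes `
                    @byteParam
        Write-Host "`tAdded user.content-md5 into $fileToUpdateMd5"
    }
}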
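
Because the workaround only adds the stream when it is missing, a file whose content changes afterwards may keep a stale value. A rough sketch of a check for that case follows; `Test-Md5Stream` is a hypothetical helper, and it assumes the stream holds the raw MD5 bytes exactly as the script above writes them.

```powershell
# Hypothetical helper: returns $true when the bytes stored in the
# user.content-md5 stream equal the MD5 of the file's current content.
function Test-Md5Stream {
    param([Parameter(Mandatory = $true)][string]$Path)

    # Raw-byte reads: -Encoding Byte on Windows PowerShell 5.1, -AsByteStream on PowerShell 6+.
    $byteParam = if ($PSVersionTable.PSVersion.Major -ge 6) { @{ AsByteStream = $true } } else { @{ Encoding = 'Byte' } }

    $stored = Get-Content -LiteralPath $Path -Stream 'user.content-md5' -ReadCount 0 -ErrorAction SilentlyContinue @byteParam
    if ($null -eq $stored) { return $false }   # stream missing or empty

    # Get-FileHash reads the main data stream and returns the digest as a hex string.
    $actual = (Get-FileHash -LiteralPath $Path -Algorithm MD5).Hash
    $storedHex = -join ($stored | ForEach-Object { $_.ToString('X2') })
    return $storedHex -eq $actual
}

# Example: list files whose stored MD5 is missing or no longer matches.
# Get-ChildItem -LiteralPath $s3FolderPath -File -Recurse |
#     Where-Object { -not (Test-Md5Stream -Path $_.FullName) } |
#     ForEach-Object FullName
```

Files reported by this check could be re-hashed by removing the condition on the existing stream in the script above, at the cost of reading every file again.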
