Describe the bug
When uploading a large file (170 GiB) through Nextcloud with an S3 backend and multipart upload enabled, the AWS SDK code buffers the upload in temporary files. The problem is that it uses the file's size on disk twice: once to receive the whole file, and once again split into multipart chunks. This results in a requirement of 170 GiB * 2 of temporary disk in my Kubernetes cluster.
Wouldn't it be possible to read and send as-you-go, in streaming mode, and clean up each split part right after it has been uploaded successfully, instead of waiting for an fclose of the input stream? (A rough sketch of what I mean follows the quoted docblock below.)
https://github.com/aws/aws-sdk-php/blob/3.360.1/src/S3/StreamWrapper.php#L32 says
* Because Amazon S3 requires a Content-Length header, write only streams will
* maintain a 'php://temp' stream to buffer data written to the stream until
* the stream is flushed (usually by closing the stream with fclose).
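For reference, the kind of streaming usage I have in mind would look roughly like this with the SDK's MultipartUploader. This is an illustration only: the bucket, key, part size and source path are made up, and I have not checked exactly how the uploader buffers each part internally.

```php
<?php
// Rough sketch only: names and sizes below are invented for illustration.
require 'vendor/autoload.php';

use Aws\S3\S3Client;
use Aws\S3\MultipartUploader;
use Aws\Exception\MultipartUploadException;

$s3 = new S3Client(['region' => 'eu-west-1', 'version' => 'latest']);

// Any readable stream would do here, e.g. the incoming WebDAV body.
$source = fopen('/path/to/incoming-stream', 'rb');

$uploader = new MultipartUploader($s3, $source, [
    'bucket'    => 'example-bucket',
    'key'       => 'large-file.bin',
    'part_size' => 100 * 1024 * 1024, // 100 MiB per part
]);

try {
    $uploader->upload();
    echo "Upload complete\n";
} catch (MultipartUploadException $e) {
    echo $e->getMessage() . "\n";
}
```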
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
1/ The input file is read into a bounded buffer in the temp dir, and after each read the consumed data is discarded.
2/ In a multipart upload, as soon as a split part has been sent, its temporary file is deleted to free the space. (A sketch of this read-send-discard flow follows this list.)
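As a rough sketch of that expected read-send-discard flow using the low-level multipart calls (bucket, key and part size invented for illustration; chunks are held in memory here rather than in temp files):

```php
<?php
// Sketch of streaming part-by-part: only one part-sized chunk exists at a time.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

$s3 = new S3Client(['region' => 'eu-west-1', 'version' => 'latest']);

$bucket   = 'example-bucket';
$key      = 'large-file.bin';
$partSize = 100 * 1024 * 1024; // 100 MiB per part (>= 5 MiB except the last)

$source = fopen('php://stdin', 'rb'); // any readable stream, e.g. the WebDAV input
$init   = $s3->createMultipartUpload(['Bucket' => $bucket, 'Key' => $key]);
$parts  = [];
$number = 1;

while (!feof($source)) {
    $chunk = stream_get_contents($source, $partSize);
    if ($chunk === '' || $chunk === false) {
        break;
    }
    $result = $s3->uploadPart([
        'Bucket'     => $bucket,
        'Key'        => $key,
        'UploadId'   => $init['UploadId'],
        'PartNumber' => $number,
        'Body'       => $chunk,            // only this one part is held at a time
    ]);
    $parts[] = ['PartNumber' => $number, 'ETag' => $result['ETag']];
    unset($chunk);                         // discard the buffered part immediately
    $number++;
}

$s3->completeMultipartUpload([
    'Bucket'          => $bucket,
    'Key'             => $key,
    'UploadId'        => $init['UploadId'],
    'MultipartUpload' => ['Parts' => $parts],
]);
fclose($source);
```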
Current Behavior
1/ The entire 170 GiB input file is stored in /tmp/phpXXXX.
2/ In a multipart upload, the temporary files for all split parts stay on disk until the whole upload has finished.
Reproduction Steps
Not an easy test case, to be improved:
- deploy Nextcloud
- configure an external storage with an S3 backend at a cloud provider (Scaleway, GCS, AWS, etc.)
- use rclone to upload a large file to Nextcloud over WebDAV
Possible Solution
Call unlink() carefully somewhere in the multipart upload code, right after each part has been sent? A hypothetical sketch of what I mean is below.
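This is not the SDK's real internals, just a hypothetical shape of the idea: back each part with its own temp file and delete that file as soon as the part has been uploaded, instead of keeping every part file until the whole multipart upload finishes. The function and parameter names are made up.

```php
<?php
// Hypothetical helper, not actual SDK code.
require 'vendor/autoload.php';

use Aws\S3\S3Client;

function uploadPartFromTempFile(
    S3Client $s3,
    string $bucket,
    string $key,
    string $uploadId,
    int $partNumber,
    string $chunk
): array {
    $tmp = tempnam(sys_get_temp_dir(), 'mpu-part-');
    file_put_contents($tmp, $chunk);
    $body = fopen($tmp, 'rb');

    try {
        $result = $s3->uploadPart([
            'Bucket'     => $bucket,
            'Key'        => $key,
            'UploadId'   => $uploadId,
            'PartNumber' => $partNumber,
            'Body'       => $body,
        ]);
    } finally {
        if (is_resource($body)) {
            fclose($body);
        }
        unlink($tmp); // free the disk space right after this part is done
    }

    return ['PartNumber' => $partNumber, 'ETag' => $result['ETag']];
}
```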
Additional Information/Context
No response
SDK version used
3.349
Environment details (Version of PHP (php -v)? OS name and version, etc.)
PHP 8.3.27