Open
Description
Hey.
We are experimenting with writing data to AOF logs in Azure ADLS Gen2.
However, while testing whether it is possible to write to the logs and read from them concurrently, we stumbled upon what looks like a race condition. Our assumption is that the Azure HTTP client used by this extension rejects reads of files whose ETag has changed between listing/globbing the files and actually reading them.
D SELECT count(*) FROM 'abfss://<path>/**.jsonl';
94% ▕███████████████████████████████████▋ ▏ (~10 seconds remaining) IO Error:
AzureBlobStorageFileSystem Read to 'abfss://<path>/<hive-partition>/*.jsonl' failed with ConditionNotMet Reason Phrase: The condition specified using HTTP conditional header(s) is not met.
duckdb-azure/src/azure_dfs_filesystem.cpp, lines 191 to 210 in 60fec85:
    void AzureDfsStorageFileSystem::ReadRange(AzureFileHandle &handle, idx_t file_offset, char *buffer_out,
                                              idx_t buffer_out_len) {
        auto &afh = handle.Cast<AzureDfsStorageFileHandle>();
        try {
            // Specify the range
            Azure::Core::Http::HttpRange range;
            range.Offset = (int64_t)file_offset;
            range.Length = buffer_out_len;
            Azure::Storage::Files::DataLake::DownloadFileToOptions options;
            options.Range = range;
            options.TransferOptions.Concurrency = afh.read_options.transfer_concurrency;
            options.TransferOptions.InitialChunkSize = afh.read_options.transfer_chunk_size;
            options.TransferOptions.ChunkSize = afh.read_options.transfer_chunk_size;
            auto res = afh.file_client.DownloadTo((uint8_t *)buffer_out, buffer_out_len, options);
        } catch (const Azure::Storage::StorageException &e) {
            throw IOException("AzureBlobStorageFileSystem Read to '%s' failed with %s Reason Phrase: %s", afh.path,
                              e.ErrorCode, e.ReasonPhrase);
        }
    }
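To make the suspected race concrete, here is a minimal, self-contained sketch of it. It does not use the Azure SDK: `FakeRemoteFile`, its `Append`/`ReadRange` methods, and `ReadFailsAfterConcurrentAppend` are hypothetical names invented for this illustration, and the ETag is modeled as a simple counter. It only shows the pattern we assume is happening (a read pinned to the ETag seen at listing time fails once a writer appends), not how the extension or the SDK is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical in-memory stand-in for a remote file; NOT the Azure SDK.
struct FakeRemoteFile {
    std::string data;
    unsigned long etag = 1; // bumped on every write, like a storage ETag

    void Append(const std::string &chunk) {
        data += chunk;
        ++etag;
    }

    // Ranged read that fails when the caller's pinned ETag is stale,
    // mimicking an If-Match style precondition.
    std::string ReadRange(std::size_t offset, std::size_t length, unsigned long if_match) const {
        assert(offset <= data.size());
        if (if_match != etag) {
            throw std::runtime_error("ConditionNotMet");
        }
        return data.substr(offset, length);
    }
};

// Returns true when a read pinned to the ETag seen at listing time fails
// because a writer appended in between.
bool ReadFailsAfterConcurrentAppend() {
    FakeRemoteFile file;
    file.Append("{\"a\":1}\n");
    unsigned long etag_at_listing = file.etag; // reader globs/opens the file
    file.Append("{\"a\":2}\n");                // writer appends concurrently
    try {
        file.ReadRange(0, 8, etag_at_listing); // read still pinned to the old ETag
    } catch (const std::runtime_error &) {
        return true;
    }
    return false;
}
```

Calling `ReadFailsAfterConcurrentAppend()` exercises the listing, append, read sequence; a read that passes the file's current `etag` instead succeeds.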
We propose adding the following options:
- Read files gracefully and ignore those that "fail" [allows the reader to read most of the data]
- Lease the file during the read [blocks the writer while the reader is reading]
- Allow ignoring ETag validation during partial read(s) [allows the reader to read all available data, but can introduce unwanted state in cases where the reader needs to read the data for a specific ETag]
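As a rough sketch of how the first and third options would differ from the current behavior, the snippet below counts newline-terminated records across files under three policies. Everything here is hypothetical and invented for illustration (`EtagPolicy`, `FakeFile`, `PinnedHandle`, `CountLines`); none of it exists in duckdb-azure or the Azure SDK, and the lease option is not modeled because it changes the writer's behavior rather than the reader's.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical policy names; these are not existing extension settings.
enum class EtagPolicy { Strict, SkipChanged, IgnoreEtag };

// Hypothetical in-memory stand-in for a remote file.
struct FakeFile {
    std::string data;
    unsigned long etag;
};

// A reader-side handle that remembers the ETag seen when the file was listed.
struct PinnedHandle {
    const FakeFile *file;
    unsigned long etag_at_open;
};

// Counts '\n'-terminated records across all handles under the given policy.
std::size_t CountLines(const std::vector<PinnedHandle> &handles, EtagPolicy policy) {
    std::size_t lines = 0;
    for (const auto &h : handles) {
        assert(h.file != nullptr);
        const bool changed = h.file->etag != h.etag_at_open;
        if (changed && policy == EtagPolicy::Strict) {
            // Current behavior as we understand it: the whole query fails.
            throw std::runtime_error("ConditionNotMet");
        }
        if (changed && policy == EtagPolicy::SkipChanged) {
            continue; // option 1: drop the file that changed, keep the rest
        }
        // Unchanged file, or option 3: read whatever is there right now.
        for (char c : h.file->data) {
            if (c == '\n') {
                ++lines;
            }
        }
    }
    return lines;
}
```

With two files where one is appended to after listing, `SkipChanged` counts only the untouched file, `IgnoreEtag` counts every record currently present, and `Strict` throws.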