Labels
- A-io-cloud: Area: reading/writing to cloud storage
- A-io-csv: Area: reading/writing CSV files
- A-io-json: Area: reading/writing JSON files
- A-streaming: Related to the streaming engine
- enhancement: New feature or an improvement of an existing feature
- performance: Performance issues or improvements
Description
Currently, scanning a CSV or NDJSON file from cloud storage downloads the entire file to local disk before the scan starts. Ideally, the file would instead be downloaded and processed in chunks.
- `scan_ndjson`/`scan_lines`: perf: Streaming cloud download for `scan_ndjson`/`scan_lines` #26563
- `scan_csv`: perf: Streaming cloud download for `scan_csv` #26637
- `pl.len()` for CSV (currently using `count_rows`)
- restrict number of files (issue: Restrict CSV schema inference to 10 files by default and add configuration API #26064)
- update public APIs for file cache deprecation
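
To illustrate the "download and process in chunks" idea for NDJSON, here is a minimal Python sketch. It is not the Polars implementation: `iter_ndjson_records` and `fake_download` are hypothetical names, and the chunk source is simulated in memory rather than read from cloud storage. It only shows the core difficulty, which is that a chunk boundary can fall in the middle of a line, so the trailing partial line has to be carried over to the next chunk.

```python
import json
from typing import Iterable, Iterator


def iter_ndjson_records(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield parsed NDJSON records from byte chunks of arbitrary size."""
    leftover = b""  # partial line carried over from the previous chunk
    for chunk in chunks:
        lines = (leftover + chunk).split(b"\n")
        # The last piece has no terminating newline yet, so it may be incomplete.
        leftover = lines.pop()
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # The final line of the file may not end with a newline.
    if leftover.strip():
        yield json.loads(leftover)


def fake_download(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a ranged cloud download: yields fixed-size chunks."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


if __name__ == "__main__":
    payload = b'{"a": 1}\n{"a": 2}\n{"a": 3}'
    for record in iter_ndjson_records(fake_download(payload, chunk_size=5)):
        print(record)
```

Because records are yielded as soon as their line is complete, memory use is bounded by the chunk size plus the longest line, rather than by the file size. CSV is harder than this sketch suggests, since a quoted field can itself contain newlines.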