Is it possible to query many Parquet files stored on S3 with a single query? #17967
collimarco started this conversation in General
Let's say that you have thousands of Parquet files already stored on S3.
The schema is similar, but it is not identical for all the files. For example:
Is it possible to use Trino to query all the files in a directory on S3?
Like
SELECT count(*) FROM /s3path/custom/2023/06/*.parquet WHERE status = 300
Or is it possible to give Trino a long list of files on S3 to query dynamically?
Like
SELECT count(*) FROM s3_file_1, s3_file_2, ... s3_file1000 WHERE status = 300
Ideally each query uses a different set of files (they are grouped in partitions), so it would be better to be able to execute the queries directly on a list of files, without having to perform too many intermediate steps.
Is this possible with Trino?
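To make the second idea concrete, here is a rough sketch of what I have in mind. It assumes (none of this is confirmed for my setup) that the files are exposed through a Hive connector external table covering the S3 prefix, that the table declares a `status` column, and that the connector's hidden `"$path"` column can be used to restrict the query to a given list of files. The table name `hive.default.custom_events`, the bucket name, and the helper `build_count_query` are all hypothetical.

```python
def build_count_query(table, files, status):
    """Build a Trino query that counts rows with `status` in the listed files.

    Assumes `table` is a Hive connector table over the S3 prefix, so that the
    hidden "$path" column holds the source file of each row.
    """
    if not files:
        raise ValueError("at least one file path is required")
    # Double any single quotes so each path is a valid SQL string literal.
    quoted = ", ".join("'" + f.replace("'", "''") + "'" for f in files)
    return (
        f"SELECT count(*) FROM {table} "
        f'WHERE "$path" IN ({quoted}) AND status = {int(status)}'
    )


# Hypothetical table and file names, for illustration only.
query = build_count_query(
    "hive.default.custom_events",
    [
        "s3://example-bucket/custom/2023/06/part-1.parquet",
        "s3://example-bucket/custom/2023/06/part-2.parquet",
    ],
    300,
)
print(query)
```

The generated string could then be sent through any Trino client. Whether Trino actually skips the files that are not in the list, rather than scanning the whole prefix, is something I have not verified, and this does not address the files whose schema differs.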