Why is a Hive TEXTFILE table scan so slow? #17887

luhea · 2023-06-14T06:59:39Z

Trino version: 418
Config:
query.max-memory-per-node=32GB
Number of workers: 16
query.max-memory=576GB
JVM:
-Xmx50G
Hive tables:
table1: format = 'TEXTFILE', textfile_field_separator = '|', partitioned_by = ARRAY['pt_dt','cluster']
table2: format = 'PARQUET', partitioned_by = ARRAY['pt_dt','cluster']
table1 and table2 both have 133342642 rows in pt_dt='2023-06-13'.

When I run the same query against table1 and table2:
SELECT a.path ,
sum(a.filesize)/1024/1024/1024/1024*3
FROM
cluster_stats.table1 AS a
WHERE pt_dt='2023-06-01'
GROUP BY a.path
ORDER BY a.path
limit 20

table1 returns in 8 min; table2 returns in 14 s.

Stage performance:
table1: [screenshot not available]
table2: [screenshot not available]
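
The per-stage numbers can also be captured as text with `EXPLAIN ANALYZE`, which runs the query and reports CPU time, input rows, and input data size for each stage and operator. A minimal sketch, assuming table2 lives in the same `cluster_stats` schema as table1 (the post only names the schema for table1):

```sql
-- Run once per table, then compare the table scan operators:
-- look at input data size, input rows, and CPU time.
EXPLAIN ANALYZE
SELECT a.path,
       sum(a.filesize)/1024/1024/1024/1024*3
FROM cluster_stats.table1 AS a
WHERE pt_dt = '2023-06-01'
GROUP BY a.path
ORDER BY a.path
LIMIT 20;

-- Assumption: table2 is in the cluster_stats schema as well.
EXPLAIN ANALYZE
SELECT a.path,
       sum(a.filesize)/1024/1024/1024/1024*3
FROM cluster_stats.table2 AS a
WHERE pt_dt = '2023-06-01'
GROUP BY a.path
ORDER BY a.path
LIMIT 20;
```

If the TEXTFILE scan shows far more input data and CPU time than the Parquet scan, that would be consistent with the difference between the formats: text is row-oriented, so every line has to be read and split on the separator, while Parquet is columnar and lets the reader fetch only the `path` and `filesize` columns.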

Why is the Hive TEXTFILE table scan so slow?

Replies: 0 comments
