I don't think a Presto/Trino read can ever cause data loss. As for your issue: do you have caching enabled that might hold incomplete data at some point in time?
Can Presto reading Hive data cause data loss?
A Hive task completed at 2022-11-16 05:43:36, and Presto was then used to query the Hive data. Queries were run in parallel right after the end time.
Running `select count(1)` immediately after the end time returned a total of 342434654 rows, which is 1238 rows short.
When `select count(1)` was run again 5 minutes later, it returned 342435892 rows.
After checking, I found a block in HDFS with the timestamp 2022-11-16 05:43:36, and it contains exactly 1238 rows. My guess is that Presto missed this block when it loaded the HDFS blocks.
What configuration do I need for this situation? Or is there a setting that controls how HDFS blocks are loaded?
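To show how the two counts were compared, here is a minimal Python sketch that repeats a count query until two consecutive results agree and then reports the difference. It is only an illustration: `poll_until_stable` and `run_count` are hypothetical names, not part of Presto or Hive, and `run_count` stands in for whatever client call would execute `select count(1)`. The example replays the numbers from this report rather than querying a real cluster.

```python
import time
from typing import Callable, List


def poll_until_stable(run_count: Callable[[], int],
                      interval_s: float = 60.0,
                      max_attempts: int = 10,
                      sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Call run_count repeatedly until two consecutive results match.

    Returns every count observed, in order. run_count is a hypothetical
    callable that would execute `select count(1)` against the table.
    """
    counts = [run_count()]
    for _ in range(max_attempts - 1):
        sleep(interval_s)
        counts.append(run_count())
        if counts[-1] == counts[-2]:
            break  # two identical results in a row: treat the count as settled
    return counts


# Replay the counts from this report (assumption: no real cluster is queried).
observed = iter([342434654, 342435892, 342435892])
history = poll_until_stable(lambda: next(observed), sleep=lambda s: None)
missing = history[-1] - history[0]
print(history, missing)
```

With the replayed values, the difference between the first and the settled count is 1238, which matches the number of rows in the HDFS block mentioned above.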