Remove TimedConsumer from parquet-concat since it needs stricter invocation
After some experimentation with real-world scenarios, the parquet-concat
Lambda cannot operate as effectively with "BUFFER MORE" semantics as
the file-loader Lambda can. Instead parquet-concat should be invoked
with its 10 messages (batch size max) from the FIFO and use those in
order to concat as quickly as possible and then exit.
Because parquet-concat is not appending to a Delta table, it doesn't
have the same parallelism restrictions that file-loader does, so it's
better to crank up concurrent executions to the extent possible.
This change also utilizes the S3EventRecord file size when it is
available to reduce the number of S3 round trips when reading the input
parquet files.
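
As a rough illustration of the file size optimization, here is a minimal
sketch and not the code in this commit. It assumes the event record carries
an optional object size; `ObjectInfo`, `resolve_size`, and the `head` callback
are hypothetical names standing in for the real event type and the S3 HEAD
request.

```rust
/// Hypothetical stand-in for the object portion of an S3 event record.
/// The size is optional because not every event includes it.
pub struct ObjectInfo {
    pub key: String,
    pub size: Option<u64>,
}

/// Return the object's size, preferring the value carried by the event.
/// The `head` callback (an assumed placeholder for an S3 HEAD request) is
/// only invoked when the event did not include a size.
pub fn resolve_size<F>(obj: &ObjectInfo, head: F) -> u64
where
    F: Fn(&str) -> u64,
{
    match obj.size {
        // Size known from the event: no extra round trip needed.
        Some(size) => size,
        // Size missing: fall back to asking the object store.
        None => head(&obj.key),
    }
}
```

With the size known up front, a parquet reader can seek straight to the
footer instead of first asking S3 how large the object is.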
Signed-off-by: R. Tyler Croy <rtyler@buoyantdata.com>