Replies: 1 comment
Might be worth trying. A PR with a concrete proposal is a better place to discuss such things, and by preparing a PR proposal you might find out about more limitations and issues, or discover that it is actually easy.
Hello!

I've been trying to help my colleagues set up `SqlToS3Operator` recently, and together we've come across the fact that this operator can't handle big tables correctly. It uses the `get_pandas_df` method to read the full table into RAM first, then loads it to S3, optionally split into multiple files if the `max_rows_per_file` argument is provided. This logic is not suitable for big tables, but it could be fixed with relatively little effort.

Given that the `SqlToS3Operator._get_hook()` method is designed to return a `DbApiHook` instance, and that the latter has a `get_pandas_df_by_chunks` method, isn't it only natural to use that method instead of `get_pandas_df` when `max_rows_per_file` is specified for the `SqlToS3Operator`?
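To illustrate the idea, here is a minimal sketch of the chunked pattern in plain Python. It is not Airflow code: `iter_chunks` and `upload_in_chunks` are hypothetical helper names, and the `upload` callback stands in for whatever S3 upload the operator would actually perform (e.g. via its S3 hook). The point is that only one chunk of at most `max_rows_per_file` rows is held in memory at a time, which is the behavior `get_pandas_df_by_chunks` would enable.

```python
from typing import Callable, Iterable, Iterator, List, Any


def iter_chunks(rows: Iterable[Any], max_rows_per_file: int) -> Iterator[List[Any]]:
    """Yield lists of at most max_rows_per_file rows.

    Only the current chunk is kept in memory, unlike reading the
    whole result set into one DataFrame up front.
    """
    chunk: List[Any] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == max_rows_per_file:
            yield chunk
            chunk = []
    if chunk:  # flush the final, possibly smaller, chunk
        yield chunk


def upload_in_chunks(
    rows: Iterable[Any],
    max_rows_per_file: int,
    upload: Callable[[str, List[Any]], None],
) -> None:
    """Hypothetical driver: stream each chunk to an upload callback,
    producing one object key per chunk instead of buffering the table."""
    for i, chunk in enumerate(iter_chunks(rows, max_rows_per_file)):
        upload(f"part_{i:04d}.csv", chunk)
```

In the real operator, `rows` would be replaced by the DataFrame iterator returned from the hook's chunked read, and `upload` by the existing per-file S3 write path that `max_rows_per_file` already uses.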