Description
I am using the driver to run a data migration on a table with 24 million rows and 9 columns. Performance is excellent when I fetch 10 thousand rows at a time, and transfer speed is still good at 100 thousand rows. However, when fetching 1 million rows or more, data transfer becomes very slow. A quick test suggests the driver tries to fetch all the data in a single batch. Is there a way to improve this?
This is the connection string:
"token:xxx@host:443$xxx-path?catalog=sample&database=big_table&useCloudFetch=true&maxRows=10000"
I've tried these two settings, but data transfer is still slow for big tables:
- useCloudFetch=true
- maxRows=10000
In forums, users suggest changing spark.driver.maxResultSize. Does that apply here?
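For reference, the connection string above can be assembled like this. This is only a minimal sketch: `buildDSN` is a hypothetical helper, and the token, host, and HTTP path values are placeholders, not real credentials.

```go
package main

import (
	"fmt"
	"net/url"
)

// buildDSN assembles a connection string in the same shape as the one in
// this issue. token, host, and httpPath are placeholders; maxRows controls
// the per-fetch batch size.
func buildDSN(token, host, httpPath string, maxRows int) string {
	q := url.Values{}
	q.Set("catalog", "sample")
	q.Set("database", "big_table")
	q.Set("useCloudFetch", "true")
	q.Set("maxRows", fmt.Sprint(maxRows))
	return fmt.Sprintf("token:%s@%s:443%s?%s", token, host, httpPath, q.Encode())
}

func main() {
	dsn := buildDSN("xxx", "host", "/xxx-path", 10000)
	fmt.Println(dsn)
	// In the real program this DSN is passed to sql.Open with the Databricks
	// driver, and results are consumed by iterating rows.Next() so batches
	// stream one at a time rather than being materialized all at once.
}
```

The intent of the loop over `rows.Next()` is that only one batch of `maxRows` rows is held in memory at a time, which is why the slowdown at 1 million rows was surprising.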