
Commit 42460f1

sfc-gh-stakedaankit-bhatnagar167 authored and committed
SNOW-84977: new fetch pandas API: enable exposing Pandas data frame efficiently when query result format is Arrow
1 parent 10b58c3 commit 42460f1

6 files changed: +1320 / -241 lines


arrow_result.pyx

Lines changed: 7 additions & 14 deletions
@@ -218,28 +218,21 @@ cdef class ArrowResult:
         else:
             return None
 
-    def _fetch_pandas_batches(self):
-        """
+    def _fetch_pandas_batches(self, **kwargs):
+        u"""
         Fetch Pandas dataframes in batch, where 'batch' refers to Snowflake Chunk
-        Thus, the batch size (the number of rows in dataframe) may be different
-        TODO: take a look at pyarrow to_pandas() API, which provides some useful arguments
-        e.g. 1. use `use_threads=true` for acceleration
-             2. use `strings_to_categorical` and `categories` to encoding categorical data,
-                which is really different from `string` in data science.
-                For example, some data may be marked as 0 and 1 as binary class in dataset,
-                the user wishes to interpret as categorical data instead of integer.
-             3. use `zero_copy_only` to capture the potential unnecessary memory copying
-        we'd better also provide these handy arguments to make data scientists happy :)
+        Thus, the batch size (the number of rows in dataframe) is optimized by
+        Snowflake Python Connector
         """
         for table in self._fetch_arrow_batches():
-            yield table.to_pandas()
+            yield table.to_pandas(**kwargs)
 
-    def _fetch_pandas_all(self):
+    def _fetch_pandas_all(self, **kwargs):
         """
         Fetch a single Pandas dataframe
         """
         table = self._fetch_arrow_all()
         if table:
-            return table.to_pandas()
+            return table.to_pandas(**kwargs)
         else:
             return None
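With this change, both methods forward their keyword arguments straight into pyarrow.Table.to_pandas(), so the options the removed TODO listed (use_threads, strings_to_categorical, categories, zero_copy_only) can now be supplied by the caller. Below is a minimal usage sketch, assuming `result` is the ArrowResult backing an executed Arrow-format query; how the connector exposes these methods publicly is not shown in this diff.

    # Fetch the whole result as one DataFrame; the keyword arguments are
    # standard pyarrow.Table.to_pandas() options, simply passed through.
    df = result._fetch_pandas_all(
        use_threads=True,             # parallelize the Arrow -> pandas conversion
        strings_to_categorical=True,  # encode string columns as pandas categoricals
    )

    # Or stream the result chunk by chunk; each yielded DataFrame corresponds
    # to one Snowflake result chunk, converted with the same forwarded options.
    for batch_df in result._fetch_pandas_batches(use_threads=True):
        handle(batch_df)              # `handle` is a placeholder for user code

Forwarding the conversion options through **kwargs keeps the connector's API surface small while still letting pandas users control the Arrow-to-pandas conversion per call.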
