Many third party libraries are interoperable with pandas, for example by accepting pandas dataframes objects as function inputs. Here we have a non-exhaustive list of third party library use cases with pandas and note whether each method works in Snowpark pandas as well.
Snowpark pandas supports the dataframe interchange protocol, which some libraries use to interoperate with Snowpark pandas to the same level of support as pandas.
The following table is structured as follows: The first column contains the name of a method in the plotly.express module.
The second column is a flag for whether or not interoperability is guaranteed with Snowpark pandas. For each of these
operations, we validate that passing in Snowpark pandas dataframes or series as the data inputs behaves equivalently
to passing in pandas dataframes or series.
Note
Y stands for yes, i.e., interoperability is guaranteed with this method, and N stands for no.
Note
Currently only plotly versions <6.0.0 are supported through the dataframe interchange protocol.
| Method name | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
scatter |
Y | |
line |
Y | |
area |
Y | |
timeline |
Y | |
violin |
Y | |
bar |
Y | |
histogram |
Y | |
pie |
Y | |
treemap |
Y | |
sunburst |
Y | |
icicle |
Y | |
scatter_matrix |
Y | |
funnel |
Y | |
density_heatmap |
Y | |
boxplot |
Y | |
imshow |
Y |
We break down scikit-learn interoperability by categories of scikit-learn operations.
For each category, we provide a table of interoperability with the following structure: The first column describes a scikit-learn operation that may include multiple method calls. The second column is a flag for whether or not interoperability is guaranteed with Snowpark pandas. For each of these methods, we validate that passing in Snowpark pandas objects behaves equivalently to passing in pandas objects.
Note
Y stands for yes, i.e., interoperability is guaranteed with this method, and N stands for no.
Note
While some scikit-learn methods accept Snowpark pandas inputs, their
performance with Snowpark pandas inputs is often much worse than their
performance with native pandas inputs. Generally we recommend converting
Snowpark pandas inputs to pandas with to_pandas() before passing them
to scikit-learn.
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
Fitting a LinearDiscriminantAnalysis
classifier with the fit() method and
classifying data with the predict()
method. |
Y |
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
Fitting a LogisticRegression model
with the fit() method and predicting
results with the predict() method. |
Y |
| Clustering method | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
KMeans.fit() |
Y |
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
Getting the principal components of a
numerical dataset with PCA.fit(). |
Y |
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
Choosing parameters for a
LogisticRegression model with
RandomizedSearchCV.fit(). |
Y | RandomizedSearchCV causes Snowpark pandas
to issue many queries. We strongly recommend
converting Snowpark pandas inputs to pandas
before using RandomizedSearchCV |
| Operation | Interoperable with Snowpark pandas? (Y/N) | Notes for current implementation |
Scaling training data with
MaxAbsScaler.fit_transform(). |
Y |