SNOW-981562: add way to respect user-defined schema before inferring schema #1257
src/snowflake/snowpark/session.py
Outdated
        return t
    except ProgrammingError as e:
        self._run_query(f"drop table if exists {temp_table_name}")
        logging.debug(
This might be better as a warning so that the user is more likely to see it.
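To illustrate the visibility concern, here is a minimal sketch using only the standard library. The logger name and the message text are made up for this example and are not taken from the PR. It assumes the logger's threshold is WARNING, which is what Python's root logger uses by default.

```python
import logging


class ListHandler(logging.Handler):
    """Collects emitted records so we can inspect which ones got through."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append((record.levelname, record.getMessage()))


# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("schema_fallback_demo")
logger.setLevel(logging.WARNING)  # same threshold as the default root logger
logger.propagate = False          # keep the demo output out of other handlers
handler = ListHandler()
logger.addHandler(handler)

message = "could not create temp table with user schema; inferring schema instead"
logger.debug(message)    # filtered out at the WARNING threshold
logger.warning(message)  # passes the threshold and reaches the handler

seen = handler.messages
```

With this setup only the warning record reaches the handler, so a user who has not configured logging would not see the debug-level message.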
try:
    schema_string = attribute_to_schema_string(schema._to_attributes())
    self._run_query(
        f"CREATE SCOPED TEMP TABLE {temp_table_name} ({schema_string})"
I'm not particularly familiar with scoped temp tables. When do they get cleaned up if not explicitly deleted?
Outside a stored procedure, the SCOPED keyword is a no-op. Inside a stored procedure, it is mainly used to limit the scope of the temp objects that Snowpark creates internally. This is not properly documented; I learned about it by asking in our Slack channels.
src/snowflake/snowpark/session.py
Outdated
if isinstance(
    schema, StructType
) and self._create_temp_table_for_given_schema(temp_table_name, schema):
    try:
        t = self.write_pandas(
            data,
            temp_table_name,
            database=sf_database,
            schema=sf_schema,
I'm wondering whether this still counts as a BCR. Say users previously specified a schema, but it was not honored when the input was a pandas DF, so the schema of the resulting DF differed from what they specified.
With this change, however, the result can have the schema they specified.
This is a good point. We don't respect the user's schema every time; it only happens when the pandas DataFrame can "fit" into the user-specified schema. But I can imagine a scenario where a user declares a string column as StringType(10). Earlier we would create a DataFrame with no length limit on this column and allow strings of 11+ characters, but that will break now. Although this is silly, it could happen.
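The length-limit scenario can be modeled with a toy check. This is a hedged sketch, not Snowpark code: StringColumn and values_fit are hypothetical names invented here, and the sketch only shows whether the data fits a declared limit, not what the library does when it does not.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StringColumn:
    """Hypothetical stand-in for a string column with an optional max length."""

    name: str
    max_length: Optional[int] = None  # None means no declared limit


def values_fit(column: StringColumn, values: Iterable[Optional[str]]) -> bool:
    """Return True if every non-null value respects the column's length limit."""
    if column.max_length is None:
        return True
    return all(v is None or len(v) <= column.max_length for v in values)


names = ["short", "elevenchars"]  # the second value has 11 characters

inferred = StringColumn("NAME")                 # models the old behavior: no limit
declared = StringColumn("NAME", max_length=10)  # models a user-declared limit of 10

fits_inferred = values_fit(inferred, names)
fits_declared = values_fit(declared, names)
```

In this model the same data fits the unlimited column but not the one limited to 10 characters, which is the kind of input whose outcome could differ once the declared schema is taken into account.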
src/snowflake/snowpark/session.py
Outdated
# the temp table. If we fail, go back to old method using infer schema.
if isinstance(
    schema, StructType
) and self._create_temp_table_for_given_schema(temp_table_name, schema):
Should we consider dropping this temp table after evaluation?
If we are successful at writing pandas data into the temp table, then we will use this table in this session. If we drop it, we won't be able to use the dataframe.
Now that we have temp table cleanup, can we update Session.write_pandas to clean up the temp table once it is no longer in use?
I separated this change out into this PR: #2784
Seems like your changes contain some Local Testing changes, please request review from @snowflakedb/local-testing
Please answer these questions before submitting your pull requests. Thanks!
What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes #SNOW-981562, SNOW-1544694
Please describe how your code solves the related issue.
This PR allows users to specify a schema for session.create_dataframe when creating a DataFrame from a pandas DataFrame. We make a best-effort attempt to create the DataFrame with the specified schema, and fall back to schema inference if that fails. This is done so that we do not introduce a breaking change.
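The best-effort-then-fallback flow described above can be sketched as follows. This is a simplified model under stated assumptions, not the PR's implementation: SchemaMismatchError, write_with_schema, write_with_inference, and create_dataframe_sketch are hypothetical names, and plain tuples and dicts stand in for pandas data and Snowpark tables.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class SchemaMismatchError(Exception):
    """Hypothetical error raised when rows do not fit the requested schema."""


def write_with_schema(rows: Sequence[Tuple], schema: Sequence[str]) -> Dict:
    """Stand-in for writing rows into a table created from the user's schema."""
    for row in rows:
        if len(row) != len(schema):
            raise SchemaMismatchError(f"row {row!r} does not match {list(schema)!r}")
    return {"columns": list(schema), "source": "user"}


def write_with_inference(rows: Sequence[Tuple]) -> Dict:
    """Stand-in for the older path that infers the schema from the data."""
    width = len(rows[0]) if rows else 0
    columns: List[str] = [f"COL{i}" for i in range(width)]
    return {"columns": columns, "source": "inferred"}


def create_dataframe_sketch(
    rows: Sequence[Tuple], schema: Optional[Sequence[str]] = None
) -> Dict:
    """Try the user's schema first; on failure, fall back to inference."""
    if schema is not None:
        try:
            return write_with_schema(rows, schema)
        except SchemaMismatchError:
            pass  # best effort failed, so continue with the fallback path
    return write_with_inference(rows)


ok = create_dataframe_sketch([(1, "a")], schema=["ID", "NAME"])
fallback = create_dataframe_sketch([(1, "a")], schema=["ID"])
```

Here ok keeps the user's column names, while fallback ends up with inferred names because the single-column schema does not fit two-column rows. Callers who pass no schema always take the inference path, which is why the older behavior is preserved for them.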