
SNOW-981562: add way to respect user defined schema before inferring schema#1257

Open
sfc-gh-aalam wants to merge 18 commits into SNOW-1830033-dataframe-api-bcrs from aalam-SNOW-981562-allow-createDataframe-from-pandas-to-accept-schema

Conversation


@sfc-gh-aalam sfc-gh-aalam commented Feb 17, 2024

Please answer these questions before submitting your pull requests. Thanks!

  1. What GitHub issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes #SNOW-981562, SNOW-1544694

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
  3. Please describe how your code solves the related issue.

    This PR allows users to specify a schema for session.create_dataframe when creating a dataframe from a pandas dataframe. We make a best-effort attempt to create the dataframe using the specified schema and fall back to schema inference if that fails. This is done so we do not introduce a breaking change.
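The best-effort flow described above can be sketched roughly as follows. This is a minimal simulation with hypothetical helper names (`create_dataframe_best_effort`, `toy_write`), not the actual Snowpark internals:

```python
# Sketch of the fallback strategy: try the user-provided schema first,
# and fall back to schema inference only if the write fails, so existing
# behavior is preserved (no breaking change).
def create_dataframe_best_effort(data, user_schema, write):
    """`write` raises ValueError when `data` does not fit the given schema."""
    if user_schema is not None:
        try:
            return write(data, user_schema)  # respect the user schema first
        except ValueError:
            pass  # best effort failed; fall back to the old path
    # Old path: infer a schema from the data itself.
    inferred = {col: type(vals[0]).__name__ for col, vals in data.items()}
    return write(data, inferred)

# Toy writer that rejects strings longer than a declared VARCHAR-style limit.
def toy_write(data, schema):
    for col, vals in data.items():
        limit = schema.get(col)
        if isinstance(limit, int) and any(len(v) > limit for v in vals):
            raise ValueError(f"value too long for column {col}")
    return ("table", schema)
```

With data that fits the user schema, that schema is used; with an over-tight schema, the call silently falls back to the inferred one.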

@sfc-gh-aalam sfc-gh-aalam marked this pull request as ready for review February 17, 2024 23:07
@sfc-gh-aalam sfc-gh-aalam requested a review from a team as a code owner February 17, 2024 23:07
@sfc-gh-aalam sfc-gh-aalam requested review from a team, sfc-gh-bwarsaw and sfc-gh-smirzaei and removed request for a team and sfc-gh-bwarsaw February 17, 2024 23:07
            return t
        except ProgrammingError as e:
            self._run_query(f"drop table if exists {temp_table_name}")
            logging.debug(
Contributor:

This might be better as a warn so that the user is more likely to see it.
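For context on why a warning is more visible: at the default root level (WARNING), `logging.debug` messages are dropped, so a user would never see the fallback notice. A minimal illustration:

```python
import logging

# Capture emitted records so we can see which level gets through.
records = []

class Capture(logging.Handler):
    def emit(self, record):
        records.append(record)

logger = logging.getLogger("snowpark.fallback.demo")
logger.addHandler(Capture())
logger.setLevel(logging.WARNING)  # typical default effective level

logger.debug("failed to create temp table, falling back to infer schema")
logger.warning("failed to create temp table, falling back to infer schema")

print([r.levelname for r in records])  # prints ['WARNING']
```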

        try:
            schema_string = attribute_to_schema_string(schema._to_attributes())
            self._run_query(
                f"CREATE SCOPED TEMP TABLE {temp_table_name} ({schema_string})"
Contributor:

I'm not particularly familiar with scoped temp tables. When do they get cleaned up if not explicitly deleted?

Contributor Author:

When run outside of a stored proc, the SCOPED keyword is a no-op. Within a stored proc, it is mainly used to limit the scope of Snowpark's internally created temp objects. This is not properly documented; I learned about it by asking in our Slack channels.

@sfc-gh-aalam sfc-gh-aalam requested review from a team, sfc-gh-aling and sfc-gh-jdu and removed request for a team March 5, 2024 21:51
Comment on lines +2309 to +2317
        if isinstance(
            schema, StructType
        ) and self._create_temp_table_for_given_schema(temp_table_name, schema):
            try:
                t = self.write_pandas(
                    data,
                    temp_table_name,
                    database=sf_database,
                    schema=sf_schema,
@sfc-gh-aling sfc-gh-aling Mar 7, 2024

I'm wondering if this is still considered a BCR. Say a user previously specified a schema, but it was not respected when the input was a pandas DF, so the schema of the resulting DF differed from what the user specified. With this change, however, it can be the schema the user specified.

Contributor Author @sfc-gh-aalam sfc-gh-aalam Mar 7, 2024

This is a good point. We don't respect the user schema every time; it only happens when the pandas dataframe can "fit" into the user-specified schema. But I can imagine a scenario where a user specifies StringType(10) for a string column. Earlier we would create a dataframe with no limit on that column and allow strings of 11+ characters, but this will now break. Although this is an unlikely case, it could happen.
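As a hypothetical illustration of that edge case (a VARCHAR-style length check is assumed here; this is not the actual Snowflake validation logic):

```python
from typing import Optional

# With an inferred schema the string column is unbounded; with a
# user-declared StringType(10), the same 11-character value no longer fits.
def accepts(value: str, max_len: Optional[int]) -> bool:
    return max_len is None or len(value) <= max_len

value = "12345678901"  # 11 characters

print(accepts(value, None))  # inferred schema, no limit: True
print(accepts(value, 10))    # user-specified StringType(10): False
```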

        # the temp table. If we fail, go back to old method using infer schema.
        if isinstance(
            schema, StructType
        ) and self._create_temp_table_for_given_schema(temp_table_name, schema):
Collaborator:

should we consider dropping this temp table after evaluation?

Contributor Author:

If we successfully write the pandas data into the temp table, we will use that table for the rest of this session. If we drop it, we won't be able to use the dataframe.

Contributor Author:

Now that we have temp table clean-up, can we update Session.write_pandas to clean up the temp table once it is no longer used?

Contributor Author:

I separated this change out into this PR: #2784

@sfc-gh-aalam sfc-gh-aalam requested a review from a team as a code owner December 18, 2024 19:01
@github-actions

Seems like your changes contain some Local Testing changes, please request review from @snowflakedb/local-testing

@sfc-gh-aalam sfc-gh-aalam changed the base branch from main to SNOW-1830033-dataframe-api-bcrs January 3, 2025 20:52
@sfc-gh-aalam sfc-gh-aalam changed the base branch from main to SNOW-1830033-dataframe-api-bcrs January 3, 2025 20:52
@sfc-gh-yuwang sfc-gh-yuwang deleted the branch SNOW-1830033-dataframe-api-bcrs February 13, 2025 22:51
@sfc-gh-yuwang sfc-gh-yuwang deleted the aalam-SNOW-981562-allow-createDataframe-from-pandas-to-accept-schema branch February 13, 2025 22:51
@sfc-gh-aalam sfc-gh-aalam restored the aalam-SNOW-981562-allow-createDataframe-from-pandas-to-accept-schema branch February 13, 2025 22:56
@sfc-gh-aalam sfc-gh-aalam reopened this Feb 13, 2025
@sfc-gh-smirzaei sfc-gh-smirzaei removed their request for review July 22, 2025 21:14
@sfc-gh-aalam sfc-gh-aalam force-pushed the aalam-SNOW-981562-allow-createDataframe-from-pandas-to-accept-schema branch from a77abd1 to 7e717af Compare November 12, 2025 23:20
@sfc-gh-aalam sfc-gh-aalam force-pushed the SNOW-1830033-dataframe-api-bcrs branch from cacb1bb to 611a768 Compare November 12, 2025 23:20

5 participants