@tswast tswast commented Sep 20, 2024

fix!: to_gbq loads uint8 columns to BigQuery INT64 instead of STRING (#814)

fix!: to_gbq loads naive (no timezone) columns to BigQuery DATETIME instead of TIMESTAMP (#814)
fix!: to_gbq loads object column containing bool values to BOOLEAN instead of STRING (#814)
fix!: to_gbq loads object column containing dictionary values to STRUCT instead of STRING (#814)
deps: min pyarrow is now 4.0.0 to support compliant nested types (#814)
Release-As: 0.24.0
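
As a rough illustration of the object-column changes listed above (this is not the code in this PR), the sketch below guesses a BigQuery type from the first non-null value in a column. `guess_bigquery_type` is a hypothetical helper invented for this example; the detection logic in pandas-gbq itself likely handles many more cases.

```python
import math


def guess_bigquery_type(values, default_type="STRING"):
    """Guess a BigQuery type from the first non-null value (hypothetical helper)."""
    for value in values:
        # Skip missing values: None, or the float NaN that pandas uses for nulls.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        # bool values now map to BOOLEAN instead of falling back to STRING.
        if isinstance(value, bool):
            return "BOOLEAN"
        # dict values now map to STRUCT instead of falling back to STRING.
        if isinstance(value, dict):
            return "STRUCT"
        return default_type
    # Empty or all-null column: nothing to inspect, so use the default.
    return default_type
```

For example, `guess_bigquery_type([None, True])` returns `"BOOLEAN"`, while an all-null column falls back to `default_type`.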

Note to Googlers, this copies some pandas -> BigQuery logic from both https://github.com/googleapis/python-bigquery and https://github.com/googleapis/python-bigquery-dataframes as part of an effort to reduce redundancy across code bases. My intention is to make those packages depend on pandas-gbq for pandas -> BigQuery logic.

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #452, #105, #616, #450
🦕

@tswast tswast requested review from a team as code owners September 20, 2024 20:53
@tswast tswast requested a review from farhan0102 September 20, 2024 20:53
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: bigquery Issues related to the googleapis/python-bigquery-pandas API. labels Sep 20, 2024
@tswast tswast requested review from chelsea-lin and Linchin and removed request for farhan0102 September 20, 2024 20:56

tswast commented Sep 20, 2024

_________________ ERROR collecting tests/system/test_to_gbq.py _________________
tests/system/test_to_gbq.py:392: in <module>
    dtype=pandas.ArrowDtype(
.nox/system-3-8/lib/python3.8/site-packages/pandas/__init__.py:258: in __getattr__
    raise AttributeError(f"module 'pandas' has no attribute '{name}'")
E   AttributeError: module 'pandas' has no attribute 'ArrowDtype'

Looks like I need to update the tests to be compatible with older pandas.
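
One common way to keep a test module importable on older pandas is to look up `ArrowDtype` defensively instead of referencing it at module level. The sketch below is only an assumption about how such a guard could look, not the fix used in this PR; `make_arrow_dtype`, `old_pandas`, and `new_pandas` are hypothetical names, and the two namespaces are stand-ins so the example runs without pandas installed.

```python
import types


def make_arrow_dtype(pandas_module, pyarrow_type):
    """Return pandas_module.ArrowDtype(pyarrow_type), or None if it is missing."""
    # getattr with a default swallows the AttributeError that older pandas raises.
    arrow_dtype_cls = getattr(pandas_module, "ArrowDtype", None)
    if arrow_dtype_cls is None:
        return None
    return arrow_dtype_cls(pyarrow_type)


# Stand-ins for an older pandas (no ArrowDtype) and a newer one (has ArrowDtype).
old_pandas = types.SimpleNamespace()
new_pandas = types.SimpleNamespace(ArrowDtype=lambda t: ("ArrowDtype", t))
```

A test case that needs the dtype can then be skipped or omitted whenever the helper returns `None`.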

@tswast tswast added the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 23, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 23, 2024

tswast commented Sep 23, 2024

@Linchin @chelsea-lin please take a look. I got it working on Python 3.8 by upgrading the minimum pyarrow version from 3.0.0 (January 2021) to 4.0.0 (April 2021).

@chelsea-lin chelsea-lin left a comment

LGTM

@@ -0,0 +1,156 @@
# Copyright (c) 2019 pandas-gbq Authors All rights reserved.

nit: 2024

This contains some code I copied from a file dated to 2019, so I don't think I should update this.

@tswast tswast added the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 23, 2024
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 23, 2024
@tswast tswast merged commit 107bb40 into googleapis:main Sep 23, 2024
25 checks passed
@tswast tswast deleted the b323176126-issue300-streaming-to_gbq branch September 23, 2024 17:18
@@ -1219,9 +1220,16 @@ def _generate_bq_schema(df, default_type="STRING"):
be overridden: https://github.com/pydata/pandas-gbq/issues/218, this
method can be removed after there is time to migrate away from this
method."""
from pandas_gbq import schema
fields = pandas_gbq.schema.pandas_to_bigquery.dataframe_to_bigquery_fields(

Maybe I should have un-deprecated `generate_bq_schema` for use in bigframes.

Successfully merging this pull request may close these issues.

pyarrow.lib.ArrowTypeError: Expected bytes, got a 'dict' object