Commit c9883a5: Merge branch 'main' into start-index-2
2 parents: 810e8dd + b140fca

39 files changed, +762 -301 lines

.github/.OwlBot.lock.yaml

Lines changed: 2 additions & 2 deletions
@@ -13,5 +13,5 @@
 # limitations under the License.
 docker:
   image: gcr.io/cloud-devrel-public-resources/owlbot-python:latest
-  digest: sha256:a7aef70df5f13313ddc027409fc8f3151422ec2a57ac8730fce8fa75c060d5bb
-# created: 2025-04-10T17:00:10.042601326Z
+  digest: sha256:3b3a31be60853477bc39ed8d9bac162cac3ba083724cecaad54eb81d4e4dae9c
+# created: 2025-04-16T22:40:03.123475241Z

.github/workflows/unittest.yml

Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
+on:
+  pull_request:
+    branches:
+      - main
+name: unittest
+jobs:
+  unit:
+    # Use `ubuntu-latest` runner.
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python: ['3.9', '3.11', '3.12', '3.13']
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python }}
+      - name: Install nox
+        run: |
+          python -m pip install --upgrade setuptools pip wheel
+          python -m pip install nox
+      - name: Run unit tests
+        env:
+          COVERAGE_FILE: .coverage-${{ matrix.python }}
+        run: |
+          nox -s unit-${{ matrix.python }}
+      - name: Upload coverage results
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-artifact-${{ matrix.python }}
+          path: .coverage-${{ matrix.python }}
+          include-hidden-files: true
+
+  unit_noextras:
+    # Use `ubuntu-latest` runner.
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python: ['3.9', '3.13']
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python }}
+      - name: Install nox
+        run: |
+          python -m pip install --upgrade setuptools pip wheel
+          python -m pip install nox
+      - name: Run unit_noextras tests
+        env:
+          COVERAGE_FILE: .coverage-unit-noextras-${{ matrix.python }}
+        run: |
+          nox -s unit_noextras-${{ matrix.python }}
+      - name: Upload coverage results
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-artifact-unit-noextras-${{ matrix.python }}
+          path: .coverage-unit-noextras-${{ matrix.python }}
+          include-hidden-files: true
+
+  cover:
+    runs-on: ubuntu-latest
+    needs:
+      - unit
+      - unit_noextras
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Setup Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.9"
+      - name: Install coverage
+        run: |
+          python -m pip install --upgrade setuptools pip wheel
+          python -m pip install coverage
+      - name: Download coverage results
+        uses: actions/download-artifact@v4
+        with:
+          path: .coverage-results/
+      - name: Report coverage results
+        run: |
+          find .coverage-results -type f -name '*.zip' -exec unzip {} \;
+          coverage combine .coverage-results/**/.coverage*
+          coverage report --show-missing --fail-under=100
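
Note: the `nox -s unit-${{ matrix.python }}` and `nox -s unit_noextras-${{ matrix.python }}` steps select parametrized sessions defined in the repository's noxfile.py, which this commit does not touch. A minimal sketch of what such sessions typically look like, assuming the usual googleapis noxfile pattern (session bodies, extras, and dependency names here are illustrative, not the repository's actual noxfile):

    # Hypothetical noxfile.py sketch; the real session definitions are not
    # part of this diff.
    import nox

    @nox.session(python=["3.9", "3.11", "3.12", "3.13"])
    def unit(session):
        # Install the package plus test dependencies, then run the unit
        # tests under coverage. The workflow sets COVERAGE_FILE so each
        # matrix entry writes a distinct .coverage-* data file for the
        # cover job to combine.
        session.install("-e", ".[all]", "pytest", "pytest-cov")
        session.run("pytest", "--cov=google.cloud.bigquery", "tests/unit")

    @nox.session(python=["3.9", "3.13"])
    def unit_noextras(session):
        # Run the same tests without optional extras installed, to catch
        # accidental hard dependencies on optional packages.
        session.install("-e", ".", "pytest", "pytest-cov")
        session.run("pytest", "--cov=google.cloud.bigquery", "tests/unit")

nox derives the parametrized session names (`unit-3.12`, `unit_noextras-3.13`, and so on) from the `python=` list, which is why the workflow can address one interpreter at a time with `-s unit-${{ matrix.python }}`.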

CHANGELOG.md

Lines changed: 34 additions & 0 deletions
@@ -5,6 +5,40 @@
 [1]: https://pypi.org/project/google-cloud-bigquery/#history
 
 
+## [3.33.0](https://github.com/googleapis/python-bigquery/compare/v3.32.0...v3.33.0) (2025-05-19)
+
+
+### Features
+
+* Add ability to set autodetect_schema query param in update_table ([#2171](https://github.com/googleapis/python-bigquery/issues/2171)) ([57f940d](https://github.com/googleapis/python-bigquery/commit/57f940d957613b4d80fb81ea40a1177b73856189))
+* Add dtype parameters to to_geodataframe functions ([#2176](https://github.com/googleapis/python-bigquery/issues/2176)) ([ebfd0a8](https://github.com/googleapis/python-bigquery/commit/ebfd0a83d43bcb96f65f5669437220aa6138b766))
+* Support job reservation ([#2186](https://github.com/googleapis/python-bigquery/issues/2186)) ([cb646ce](https://github.com/googleapis/python-bigquery/commit/cb646ceea172bf199f366ae0592546dff2d3bcb2))
+
+
+### Bug Fixes
+
+* Ensure AccessEntry equality and repr uses the correct `entity_type` ([#2182](https://github.com/googleapis/python-bigquery/issues/2182)) ([0217637](https://github.com/googleapis/python-bigquery/commit/02176377d5e2fc25b5cd4f46aa6ebfb1b6a960a6))
+* Ensure SchemaField.field_dtype returns a string ([#2188](https://github.com/googleapis/python-bigquery/issues/2188)) ([7ec2848](https://github.com/googleapis/python-bigquery/commit/7ec2848379d5743bbcb36700a1153540c451e0e0))
+
+## [3.32.0](https://github.com/googleapis/python-bigquery/compare/v3.31.0...v3.32.0) (2025-05-12)
+
+
+### Features
+
+* Add dataset access policy version attribute ([#2169](https://github.com/googleapis/python-bigquery/issues/2169)) ([b7656b9](https://github.com/googleapis/python-bigquery/commit/b7656b97c1bd6c204d0508b1851d114719686655))
+* Add preview support for incremental results ([#2145](https://github.com/googleapis/python-bigquery/issues/2145)) ([22b80bb](https://github.com/googleapis/python-bigquery/commit/22b80bba9d0bed319fd3102e567906c9b458dd02))
+* Add WRITE_TRUNCATE_DATA enum ([#2166](https://github.com/googleapis/python-bigquery/issues/2166)) ([4692747](https://github.com/googleapis/python-bigquery/commit/46927479085f13fd326e3f2388f60dfdd37f7f69))
+* Adds condition class and assoc. unit tests ([#2159](https://github.com/googleapis/python-bigquery/issues/2159)) ([a69d6b7](https://github.com/googleapis/python-bigquery/commit/a69d6b796d2edb6ba453980c9553bc9b206c5a6e))
+* Support BigLakeConfiguration (managed Iceberg tables) ([#2162](https://github.com/googleapis/python-bigquery/issues/2162)) ([a1c8e9a](https://github.com/googleapis/python-bigquery/commit/a1c8e9aaf60986924868d54a0ab0334e77002a39))
+* Update the AccessEntry class with a new condition attribute and unit tests ([#2163](https://github.com/googleapis/python-bigquery/issues/2163)) ([7301667](https://github.com/googleapis/python-bigquery/commit/7301667272dfbdd04b1a831418a9ad2d037171fb))
+
+
+### Bug Fixes
+
+* `query()` now warns when `job_id` is set and the default `job_retry` is ignored ([#2167](https://github.com/googleapis/python-bigquery/issues/2167)) ([ca1798a](https://github.com/googleapis/python-bigquery/commit/ca1798aaee2d5905fe688d3097f8ee5c989da333))
+* Empty record dtypes ([#2147](https://github.com/googleapis/python-bigquery/issues/2147)) ([77d7173](https://github.com/googleapis/python-bigquery/commit/77d71736fcc006d3ab8f8ba17955ad5f06e21876))
+* Table iterator should not use bqstorage when page_size is not None ([#2154](https://github.com/googleapis/python-bigquery/issues/2154)) ([e89a707](https://github.com/googleapis/python-bigquery/commit/e89a707b162182ededbf94cc9a0f7594bc2be475))
+
 ## [3.31.0](https://github.com/googleapis/python-bigquery/compare/v3.30.0...v3.31.0) (2025-03-20)
 
 
docs/conf.py

Lines changed: 1 addition & 2 deletions
@@ -61,7 +61,7 @@
 
 # autodoc/autosummary flags
 autoclass_content = "both"
-autodoc_default_options = {"members": True, "inherited-members": True}
+autodoc_default_options = {"members": True}
 autosummary_generate = True
 
 
@@ -109,7 +109,6 @@
 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
 exclude_patterns = [
-    "google/cloud/bigquery_v2/**",  # Legacy proto-based types.
     "_build",
     "**/.nox/**/*",
     "samples/AUTHORING_GUIDE.md",

google/cloud/bigquery/_pandas_helpers.py

Lines changed: 66 additions & 88 deletions
@@ -508,31 +508,37 @@ def dataframe_to_bq_schema(dataframe, bq_schema):
     bq_schema_unused = set()
 
     bq_schema_out = []
-    unknown_type_fields = []
-
+    unknown_type_columns = []
+    dataframe_reset_index = dataframe.reset_index()
     for column, dtype in list_columns_and_indexes(dataframe):
-        # Use provided type from schema, if present.
+        # Step 1: use provided type from schema, if present.
         bq_field = bq_schema_index.get(column)
         if bq_field:
            bq_schema_out.append(bq_field)
            bq_schema_unused.discard(bq_field.name)
            continue
 
-        # Otherwise, try to automatically determine the type based on the
+        # Step 2: try to automatically determine the type based on the
         # pandas dtype.
         bq_type = _PANDAS_DTYPE_TO_BQ.get(dtype.name)
         if bq_type is None:
-            sample_data = _first_valid(dataframe.reset_index()[column])
+            sample_data = _first_valid(dataframe_reset_index[column])
             if (
                 isinstance(sample_data, _BaseGeometry)
                 and sample_data is not None  # Paranoia
             ):
                 bq_type = "GEOGRAPHY"
-        bq_field = schema.SchemaField(column, bq_type)
-        bq_schema_out.append(bq_field)
+        if bq_type is not None:
+            bq_schema_out.append(schema.SchemaField(column, bq_type))
+            continue
+
+        # Step 3: try with pyarrow if available
+        bq_field = _get_schema_by_pyarrow(column, dataframe_reset_index[column])
+        if bq_field is not None:
+            bq_schema_out.append(bq_field)
+            continue
 
-        if bq_field.field_type is None:
-            unknown_type_fields.append(bq_field)
+        unknown_type_columns.append(column)
 
     # Catch any schema mismatch. The developer explicitly asked to serialize a
     # column, but it was not found.
@@ -543,98 +549,70 @@ def dataframe_to_bq_schema(dataframe, bq_schema):
             )
         )
 
-    # If schema detection was not successful for all columns, also try with
-    # pyarrow, if available.
-    if unknown_type_fields:
-        if not pyarrow:
-            msg = "Could not determine the type of columns: {}".format(
-                ", ".join(field.name for field in unknown_type_fields)
-            )
-            warnings.warn(msg)
-            return None  # We cannot detect the schema in full.
-
-        # The augment_schema() helper itself will also issue unknown type
-        # warnings if detection still fails for any of the fields.
-        bq_schema_out = augment_schema(dataframe, bq_schema_out)
+    if unknown_type_columns != []:
+        msg = "Could not determine the type of columns: {}".format(
+            ", ".join(unknown_type_columns)
+        )
+        warnings.warn(msg)
+        return None  # We cannot detect the schema in full.
 
-    return tuple(bq_schema_out) if bq_schema_out else None
+    return tuple(bq_schema_out)
 
 
-def augment_schema(dataframe, current_bq_schema):
-    """Try to deduce the unknown field types and return an improved schema.
+def _get_schema_by_pyarrow(name, series):
+    """Attempt to detect the type of the given series by leveraging PyArrow's
+    type detection capabilities.
 
-    This function requires ``pyarrow`` to run. If all the missing types still
-    cannot be detected, ``None`` is returned. If all types are already known,
-    a shallow copy of the given schema is returned.
+    This function requires the ``pyarrow`` library to be installed and
+    available. If the series type cannot be determined or ``pyarrow`` is not
+    available, ``None`` is returned.
 
     Args:
-        dataframe (pandas.DataFrame):
-            DataFrame for which some of the field types are still unknown.
-        current_bq_schema (Sequence[google.cloud.bigquery.schema.SchemaField]):
-            A BigQuery schema for ``dataframe``. The types of some or all of
-            the fields may be ``None``.
+        name (str):
+            the column name of the SchemaField.
+        series (pandas.Series):
+            The Series data for which to detect the data type.
     Returns:
-        Optional[Sequence[google.cloud.bigquery.schema.SchemaField]]
+        Optional[google.cloud.bigquery.schema.SchemaField]:
+            A tuple containing the BigQuery-compatible type string (e.g.,
+            "STRING", "INTEGER", "TIMESTAMP", "DATETIME", "NUMERIC", "BIGNUMERIC")
+            and the mode string ("NULLABLE", "REPEATED").
+            Returns ``None`` if the type cannot be determined or ``pyarrow``
+            is not imported.
    """
-    # pytype: disable=attribute-error
-    augmented_schema = []
-    unknown_type_fields = []
-    for field in current_bq_schema:
-        if field.field_type is not None:
-            augmented_schema.append(field)
-            continue
-
-        arrow_table = pyarrow.array(dataframe.reset_index()[field.name])
-
-        if pyarrow.types.is_list(arrow_table.type):
-            # `pyarrow.ListType`
-            detected_mode = "REPEATED"
-            detected_type = _pyarrow_helpers.arrow_scalar_ids_to_bq(
-                arrow_table.values.type.id
-            )
-
-            # For timezone-naive datetimes, pyarrow assumes the UTC timezone and adds
-            # it to such datetimes, causing them to be recognized as TIMESTAMP type.
-            # We thus additionally check the actual data to see if we need to overrule
-            # that and choose DATETIME instead.
-            # Note that this should only be needed for datetime values inside a list,
-            # since scalar datetime values have a proper Pandas dtype that allows
-            # distinguishing between timezone-naive and timezone-aware values before
-            # even requiring the additional schema augment logic in this method.
-            if detected_type == "TIMESTAMP":
-                valid_item = _first_array_valid(dataframe[field.name])
-                if isinstance(valid_item, datetime) and valid_item.tzinfo is None:
-                    detected_type = "DATETIME"
-        else:
-            detected_mode = field.mode
-            detected_type = _pyarrow_helpers.arrow_scalar_ids_to_bq(arrow_table.type.id)
-            if detected_type == "NUMERIC" and arrow_table.type.scale > 9:
-                detected_type = "BIGNUMERIC"
 
-        if detected_type is None:
-            unknown_type_fields.append(field)
-            continue
+    if not pyarrow:
+        return None
 
-        new_field = schema.SchemaField(
-            name=field.name,
-            field_type=detected_type,
-            mode=detected_mode,
-            description=field.description,
-            fields=field.fields,
-        )
-        augmented_schema.append(new_field)
+    arrow_table = pyarrow.array(series)
+    if pyarrow.types.is_list(arrow_table.type):
+        # `pyarrow.ListType`
+        mode = "REPEATED"
+        type = _pyarrow_helpers.arrow_scalar_ids_to_bq(arrow_table.values.type.id)
+
+        # For timezone-naive datetimes, pyarrow assumes the UTC timezone and adds
+        # it to such datetimes, causing them to be recognized as TIMESTAMP type.
+        # We thus additionally check the actual data to see if we need to overrule
+        # that and choose DATETIME instead.
+        # Note that this should only be needed for datetime values inside a list,
+        # since scalar datetime values have a proper Pandas dtype that allows
+        # distinguishing between timezone-naive and timezone-aware values before
+        # even requiring the additional schema augment logic in this method.
+        if type == "TIMESTAMP":
+            valid_item = _first_array_valid(series)
+            if isinstance(valid_item, datetime) and valid_item.tzinfo is None:
+                type = "DATETIME"
+    else:
+        mode = "NULLABLE"  # default mode
+        type = _pyarrow_helpers.arrow_scalar_ids_to_bq(arrow_table.type.id)
+        if type == "NUMERIC" and arrow_table.type.scale > 9:
+            type = "BIGNUMERIC"
 
-    if unknown_type_fields:
-        warnings.warn(
-            "Pyarrow could not determine the type of columns: {}.".format(
-                ", ".join(field.name for field in unknown_type_fields)
-            )
-        )
+    if type is not None:
+        return schema.SchemaField(name, type, mode)
+    else:
         return None
 
-    return augmented_schema
-    # pytype: enable=attribute-error
-
 
 def dataframe_to_arrow(dataframe, bq_schema):
     """Convert pandas dataframe to Arrow table, using BigQuery schema.

google/cloud/bigquery/client.py

Lines changed: 11 additions & 0 deletions
@@ -1389,6 +1389,7 @@ def update_table(
         self,
         table: Table,
         fields: Sequence[str],
+        autodetect_schema: bool = False,
         retry: retries.Retry = DEFAULT_RETRY,
         timeout: TimeoutType = DEFAULT_TIMEOUT,
     ) -> Table:
@@ -1419,6 +1420,10 @@ def update_table(
             fields (Sequence[str]):
                 The fields of ``table`` to change, spelled as the
                 :class:`~google.cloud.bigquery.table.Table` properties.
+            autodetect_schema (bool):
+                Specifies if the schema of the table should be autodetected when
+                updating the table from the underlying source. Only applicable
+                for external tables.
             retry (Optional[google.api_core.retry.Retry]):
                 A description of how to retry the API call.
             timeout (Optional[float]):
@@ -1438,12 +1443,18 @@ def update_table(
         path = table.path
         span_attributes = {"path": path, "fields": fields}
 
+        if autodetect_schema:
+            query_params = {"autodetect_schema": True}
+        else:
+            query_params = {}
+
         api_response = self._call_api(
             retry,
             span_name="BigQuery.updateTable",
             span_attributes=span_attributes,
             method="PATCH",
             path=path,
+            query_params=query_params,
             data=partial,
             headers=headers,
             timeout=timeout,