perf: Replace expensive len() call with PandasBatches.total_rows in anywidget TableWidget #1937

Open: shuoweil wants to merge 12 commits into main from shuowei-anywidget-remove-len-call

Conversation

shuoweil
Contributor

perf: Replace expensive len() call with PandasBatches.total_rows in anywidget TableWidget

@shuoweil shuoweil self-assigned this Jul 24, 2025
@shuoweil shuoweil requested review from a team as code owners July 24, 2025 23:22
@shuoweil shuoweil requested a review from GarrettWu July 24, 2025 23:22
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jul 24, 2025
@shuoweil shuoweil requested review from tswast and removed request for GarrettWu July 24, 2025 23:23
@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch from aee37a7 to 303c4af Compare July 24, 2025 23:23
@tswast
Collaborator

tswast commented Jul 29, 2025

Please also update the benchmarks to use the total_rows parameter.

@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch from fc38cf3 to f643cfb Compare July 30, 2025 03:28
@shuoweil
Contributor Author

shuoweil commented Jul 30, 2025

> Please also update the benchmarks to use the total_rows parameter.

Let's handle this request in a separate PR: #1949

@shuoweil shuoweil requested a review from tswast July 30, 2025 03:29
@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch 2 times, most recently from e12c8ff to f8ab27b Compare July 30, 2025 22:07
@shuoweil shuoweil added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 31, 2025
@bigframes-bot bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jul 31, 2025
@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch from f8ab27b to df85824 Compare July 31, 2025 04:32
@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch from df85824 to 2756968 Compare August 1, 2025 08:06
@shuoweil shuoweil requested a review from tswast August 1, 2025 08:07
@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch 7 times, most recently from 5bc65ce to 26eb25e Compare August 9, 2025 00:18
@shuoweil
Contributor Author

I checked the failing tests: multimodal_test.py::test_multimodal_dataframe and /tmpfs/src/github/python-bigquery-dataframes/notebooks/location/regionalized.ipynb. Neither failure was introduced by my change. @tswast

Comment on lines 31 to 33
execute_result = df._block.session._executor.execute(df._block.expr, ordered=True)
execute_result.total_rows or 0
next(iter(df.to_pandas_batches(page_size=PAGE_SIZE)))
Collaborator

I think this would cause the query to execute twice. Please revert. Instead, the benchmark should match what happens in TableWidget as closely as possible.

Suggested change
- execute_result = df._block.session._executor.execute(df._block.expr, ordered=True)
- execute_result.total_rows or 0
- next(iter(df.to_pandas_batches(page_size=PAGE_SIZE)))
+ # Get number of rows (to calculate number of pages) and the first page.
+ batches = df.to_pandas_batches(page_size=PAGE_SIZE)
+ first_page = next(iter(batches))
+ assert first_page is not None
+ total_rows = batches.total_rows
+ assert total_rows is not None
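
The double-execution concern can be illustrated with a small stand-in that needs no BigQuery session. This is only a sketch: `FakeFrame` and `FakeBatches` are hypothetical classes invented here, and they assume nothing beyond what the thread states, namely that the object returned by `to_pandas_batches()` is iterable and exposes `total_rows`. The `executions` counter is a simulation, not a measurement of real query behavior.

```python
from typing import Iterator, List, Optional


class FakeBatches:
    """Hypothetical stand-in for the PandasBatches object discussed above."""

    def __init__(self, pages: List[List[int]]) -> None:
        self._pages = pages
        self.total_rows: Optional[int] = sum(len(p) for p in pages)

    def __iter__(self) -> Iterator[List[int]]:
        return iter(self._pages)


class FakeFrame:
    """Hypothetical stand-in for a DataFrame; counts simulated executions."""

    def __init__(self, rows: List[int]) -> None:
        self._rows = rows
        self.executions = 0

    def to_pandas_batches(self, page_size: int) -> FakeBatches:
        self.executions += 1  # each call simulates one query execution
        pages = [
            self._rows[i:i + page_size]
            for i in range(0, len(self._rows), page_size)
        ]
        return FakeBatches(pages)

    def __len__(self) -> int:
        self.executions += 1  # a separate row count simulates another query
        return len(self._rows)


PAGE_SIZE = 10

# Pattern under review: count rows separately, then fetch the first page.
df_two = FakeFrame(list(range(25)))
rows_two = len(df_two)
page_two = next(iter(df_two.to_pandas_batches(page_size=PAGE_SIZE)))

# Suggested pattern: one call, then read total_rows from the same object.
df_one = FakeFrame(list(range(25)))
batches = df_one.to_pandas_batches(page_size=PAGE_SIZE)
first_page = next(iter(batches))
total_rows = batches.total_rows
```

In this simulation the first pattern increments the counter twice and the second only once, while both yield the same row count and first page.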

@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch from 26eb25e to 577ccb9 Compare August 12, 2025 00:22
@shuoweil shuoweil added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 12, 2025
@bigframes-bot bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 12, 2025
@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch from 577ccb9 to 0d6a300 Compare August 13, 2025 17:51
@shuoweil shuoweil added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 14, 2025
@bigframes-bot bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Aug 14, 2025
Comment on lines 173 to 175
google.api_core.exceptions.GoogleAPICallError,
TypeError,
ValueError,
Collaborator

I think we should bubble up these exceptions and show an error message, not hide them.

Please revert.

Suggested change
- google.api_core.exceptions.GoogleAPICallError,
- TypeError,
- ValueError,

Also, can you share the situations in which these exceptions occurred? They may indicate a bug.
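
The difference between hiding and surfacing these errors can be sketched without any BigQuery dependency. Everything below is hypothetical: `fetch_total_rows`, `row_count_hidden`, and `render_status` are names invented for illustration and are not part of TableWidget.

```python
def fetch_total_rows(fail: bool) -> int:
    """Hypothetical data fetch that can fail."""
    if fail:
        raise ValueError("simulated failure while counting rows")
    return 42


def row_count_hidden(fail: bool) -> int:
    """Swallows the error: the caller cannot tell 0 rows from a failure."""
    try:
        return fetch_total_rows(fail)
    except (TypeError, ValueError):
        return 0


def render_status(fail: bool) -> str:
    """Lets the error reach one place that turns it into a visible message."""
    try:
        return f"{fetch_total_rows(fail)} rows"
    except ValueError as exc:
        return f"Error: {exc}"
```

With the first helper, a failure is indistinguishable from an empty table; with the second, the same failure produces a message the user can act on.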

- df.shape
- next(iter(df.to_pandas_batches(page_size=PAGE_SIZE)))
+ batches = df.to_pandas_batches(page_size=PAGE_SIZE)
+ next(iter(batches))
Collaborator

Please add an assertion that references batches.total_rows. We want to mimic TableWidget as closely as we can.

For example:

Suggested change
- next(iter(batches))
+ assert batches.total_rows >= 0
+ next(iter(batches))

Same for the other benchmarks.

Contributor Author

This change will trigger a mypy error: to_pandas_batches() actually returns a PandasBatches object, which has the total_rows attribute, but its return type is annotated as Iterable[pandas.DataFrame]. I can add the assertion after casting.
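
A minimal sketch of the cast-then-assert approach mentioned here, using only the standard library. `PandasBatchesLike` and `to_batches` are hypothetical stand-ins invented for this example; they only mirror the mismatch described above, where the runtime object has `total_rows` but the declared return type is a plain `Iterable`.

```python
from typing import Iterable, Iterator, List, Optional, cast


class PandasBatchesLike:
    """Hypothetical stand-in: iterable of pages with a total_rows attribute."""

    def __init__(self, pages: List[List[int]], total_rows: Optional[int]) -> None:
        self._pages = pages
        self.total_rows = total_rows

    def __iter__(self) -> Iterator[List[int]]:
        return iter(self._pages)


def to_batches(pages: List[List[int]]) -> Iterable[List[int]]:
    # Annotated with the wider type on purpose, mirroring the mismatch above.
    return PandasBatchesLike(pages, sum(len(p) for p in pages))


# cast() does nothing at runtime; it only tells the type checker which
# concrete type to assume, so accessing total_rows no longer fails mypy.
batches = cast(PandasBatchesLike, to_batches([[1, 2], [3]]))
first_page = next(iter(batches))
assert batches.total_rows is not None and batches.total_rows >= 0
```

Because `cast` performs no runtime check, it silently trusts the caller; narrowing the function's return annotation would remove the need for it.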

@shuoweil shuoweil force-pushed the shuowei-anywidget-remove-len-call branch from 0d6a300 to 00f203e Compare August 14, 2025 21:56
@shuoweil shuoweil requested a review from tswast August 14, 2025 22:13
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: m Pull request size is medium.
3 participants