-
Notifications
You must be signed in to change notification settings - Fork 53
perf: Replace expensive len() call with PandasBatches.total_rows in anywidget TableWidget #1937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
aee37a7
to
303c4af
Compare
Please also update the benchmarks to use the |
fc38cf3
to
f643cfb
Compare
Let's use a separate PR for this request. #1949 |
e12c8ff
to
f8ab27b
Compare
f8ab27b
to
df85824
Compare
df85824
to
2756968
Compare
5bc65ce
to
26eb25e
Compare
Per checking, the failed testcase multimodal_test.py::test_multimodal_dataframe and /tmpfs/src/github/python-bigquery-dataframes/notebooks/location/regionalized.ipynb are not introduced by my change. @tswast |
execute_result = df._block.session._executor.execute(df._block.expr, ordered=True) | ||
execute_result.total_rows or 0 | ||
next(iter(df.to_pandas_batches(page_size=PAGE_SIZE))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would cause the query to execute twice. Please revert. Instead, it should match as closely as possible to what happens in TableWidget.
execute_result = df._block.session._executor.execute(df._block.expr, ordered=True) | |
execute_result.total_rows or 0 | |
next(iter(df.to_pandas_batches(page_size=PAGE_SIZE))) | |
# Get number of rows (to calculate number of pages) and the first page. | |
batches = df.to_pandas_batches(page_size=PAGE_SIZE) | |
first_page = next(iter(batches)) | |
assert first_page is not None | |
total_rows = batches.total_rows | |
assert total_rows is not None |
26eb25e
to
577ccb9
Compare
577ccb9
to
0d6a300
Compare
bigframes/display/anywidget.py
Outdated
google.api_core.exceptions.GoogleAPICallError, | ||
TypeError, | ||
ValueError, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should bubble up these exceptions and show an error message, not hide them.
Please revert.
google.api_core.exceptions.GoogleAPICallError, | |
TypeError, | |
ValueError, |
Also, can you share what situations happened where these occurred? It may indicate a bug.
df.shape | ||
next(iter(df.to_pandas_batches(page_size=PAGE_SIZE))) | ||
batches = df.to_pandas_batches(page_size=PAGE_SIZE) | ||
next(iter(batches)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an assertion that references batches.total_rows
. We want to mimic TableWidget
as closely as we can.
For example:
next(iter(batches)) | |
assert batches.total_rows >= 0 | |
next(iter(batches)) |
Same for the other benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will trigger a mypy error, to_pandas_batches() actually returns a PandasBatches object that has the total_rows attribute, but the type annotations show it as Iterable[pandas.DataFrame]. I can assert here after casting
0d6a300
to
00f203e
Compare
perf: Replace expensive len() call with PandasBatches.total_rows in anywidget TableWidget