perf: Replace expensive len() call with PandasBatches.total_rows in anywidget TableWidget #1937
base: main
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
|
@@ -12,6 +12,7 @@ | |||||||
# See the License for the specific language governing permissions and | ||||||||
# limitations under the License. | ||||||||
import pathlib | ||||||||
import typing | ||||||||
|
||||||||
import benchmark.utils as utils | ||||||||
|
||||||||
|
@@ -26,8 +27,9 @@ def aggregate_output(*, project_id, dataset_id, table_id): | |||||||
df = bpd._read_gbq_colab(f"SELECT * FROM `{project_id}`.{dataset_id}.{table_id}") | ||||||||
|
||||||||
# Simulate getting the first page, since we'll always do that first in the UI. | ||||||||
- df.shape | ||||||||
- next(iter(df.to_pandas_batches(page_size=PAGE_SIZE))) | ||||||||
+ batches = df.to_pandas_batches(page_size=PAGE_SIZE) | ||||||||
+ assert typing.cast(typing.Any, batches).total_rows >= 0 | ||||||||
+ next(iter(batches)) | ||||||||
Please add an assertion that references For example:
Suggested change
Same for the other benchmarks.

This change will trigger a mypy error: to_pandas_batches() actually returns a PandasBatches object that has the total_rows attribute, but the type annotations show it as Iterable[pandas.DataFrame]. I can add the assertion here after casting.
||||||||
|
||||||||
# To simulate very small rows that can only fit a boolean, | ||||||||
# some tables don't have an integer column. If an integer column is available, | ||||||||
|
@@ -43,8 +45,8 @@ def aggregate_output(*, project_id, dataset_id, table_id): | |||||||
.sum(numeric_only=True) | ||||||||
) | ||||||||
|
||||||||
- df_aggregated.shape | ||||||||
- next(iter(df_aggregated.to_pandas_batches(page_size=PAGE_SIZE))) | ||||||||
+ batches_aggregated = df_aggregated.to_pandas_batches(page_size=PAGE_SIZE) | ||||||||
+ next(iter(batches_aggregated)) | ||||||||
|
||||||||
|
||||||||
if __name__ == "__main__": | ||||||||
|
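The typing issue raised in the review thread can be illustrated with a minimal, self-contained sketch. It does not use the real library: `FakeBatches` and `to_batches` are hypothetical stand-ins, with plain lists in place of DataFrames, chosen only to reproduce the situation where a function is annotated as returning an `Iterable` but the object it returns also carries a `total_rows` attribute. Under those assumptions, `typing.cast` to `typing.Any` lets the attribute be read without a static type error, and it has no effect at runtime.

```python
import typing
from typing import Any, Iterable, Iterator, List


class FakeBatches:
    """Hypothetical stand-in for an iterable of pages that also knows its row count."""

    def __init__(self, rows: List[int], page_size: int) -> None:
        self._rows = rows
        self._page_size = page_size
        # The count is available up front, so callers do not need a separate
        # length computation before fetching the first page.
        self.total_rows = len(rows)

    def __iter__(self) -> Iterator[List[int]]:
        for start in range(0, len(self._rows), self._page_size):
            yield self._rows[start : start + self._page_size]


def to_batches(rows: List[int], page_size: int) -> Iterable[List[int]]:
    # Annotated with the wider Iterable type, mirroring the annotation
    # mismatch described in the review comment.
    return FakeBatches(rows, page_size)


batches = to_batches(list(range(25)), page_size=10)

# Accessing batches.total_rows directly would be rejected by a static checker,
# because Iterable declares no such attribute. The cast only changes the
# static type; the object itself is unchanged.
total_rows = typing.cast(Any, batches).total_rows
assert total_rows >= 0

# Fetch only the first page, as the benchmark does.
first_page = next(iter(batches))
```

An `isinstance` check against the concrete class would be an alternative that keeps static checking intact, at the cost of importing that class into the benchmark.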