12 changes: 10 additions & 2 deletions bigframes/display/anywidget.py
@@ -245,7 +245,7 @@ def _cached_data(self) -> pd.DataFrame:
"""Combine all cached batches into a single DataFrame."""
if not self._cached_batches:
return pd.DataFrame(columns=self._dataframe.columns)
- return pd.concat(self._cached_batches, ignore_index=True)
+ return pd.concat(self._cached_batches)

def _reset_batch_cache(self) -> None:
"""Resets batch caching attributes."""
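
A small pandas sketch of what dropping ignore_index=True appears to be for: without it, index labels and the index name survive concatenation, which the index display logic later in this diff relies on. The batch contents and the "key" index name are made up for illustration and are not taken from the PR.

```python
import pandas as pd

# Two hypothetical cached batches sharing a named index.
batch1 = pd.DataFrame({"v": [1, 2]}, index=pd.Index(["a", "b"], name="key"))
batch2 = pd.DataFrame({"v": [3]}, index=pd.Index(["c"], name="key"))

# Old behavior: labels are discarded and replaced by a fresh 0..n-1 index.
renumbered = pd.concat([batch1, batch2], ignore_index=True)

# New behavior: the original labels and the index name are kept.
preserved = pd.concat([batch1, batch2])

print(renumbered.index.name, list(renumbered.index))
print(preserved.index.name, list(preserved.index))
```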
@@ -294,7 +294,15 @@ def _set_table_html(self) -> None:
break

# Get the data for the current page
- page_data = cached_data.iloc[start:end]
+ page_data = cached_data.iloc[start:end].copy()
+
+ # Handle index display
+ if page_data.index.name is not None:
Collaborator

Would it be possible to check len(df.index.names) != 0 on the original BigFrames DataFrame instead? Most of the time the no-name index is going to be the default range index, but not always.

Collaborator

Actually, it looks like df.index.names won't work in partial ordering mode.

@validations.requires_index

I'll try to figure out the correct alternative and report back.

Collaborator

df._block.has_index

Contributor Author

@shuoweil, Dec 12, 2025

Thanks again for the excellent feedback, @tswast. Your suggestion to use df._block.has_index was precisely what was needed to avoid the expensive index materialization on the full BigFrames DataFrame.

I've implemented the change to use self._dataframe._block.has_index. However, I found that an additional check was needed to fully resolve the issue and pass test_widget_with_default_index_should_display_row_column.

The full condition I've used is:
if self._dataframe._block.has_index and page_data.index.name is not None:

To clarify the inclusion of page_data.index.name is not None: this part of the check is performed on the small, local pandas page_data slice. It's a cheap operation because page_data is already in memory. Its purpose is to correctly handle the display formatting in all scenarios.

Specifically, self._dataframe._block.has_index tells us whether the BigFrames DataFrame has an index at all, but that index might still be an unnamed default (like a RangeIndex). In such cases, page_data.index.name would be None and there is no name to use as a header.

The combined check ensures that we only display a named custom index when both conditions are met:

  1. A custom index has been defined on the BigFrames DataFrame (self._dataframe._block.has_index).
  2. That index actually has a name (page_data.index.name is not None) to use as a header.

Otherwise, it correctly falls back to displaying the generic "Row" column. This makes the overall index display logic robust and accurate, reflecting the user's intent without triggering expensive BigQuery operations.
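
For reference, a minimal sketch of the combined check using plain pandas only. The helper name add_index_column and its has_index parameter are hypothetical; has_index stands in for self._dataframe._block.has_index, so nothing here touches BigQuery.

```python
import pandas as pd


def add_index_column(page_data: pd.DataFrame, has_index: bool, start: int) -> pd.DataFrame:
    """Return a copy of page_data with the index shown as the first column.

    has_index is a hypothetical stand-in for self._dataframe._block.has_index.
    start is the zero-based offset of the page within the cached data.
    """
    page_data = page_data.copy()
    if has_index and page_data.index.name is not None:
        # Named index: show it under its own name.
        page_data.insert(0, page_data.index.name, page_data.index)
    else:
        # No index, or an unnamed one: show 1-based row numbers instead.
        page_data.insert(0, "Row", range(start + 1, start + len(page_data) + 1))
    return page_data


named = pd.DataFrame({"v": [10, 20]}, index=pd.Index(["a", "b"], name="key"))
unnamed = pd.DataFrame({"v": [10, 20]})

with_key = add_index_column(named, has_index=True, start=0)
with_row = add_index_column(unnamed, has_index=True, start=20)
```

In this sketch the named frame gets a leading "key" column, while the unnamed one falls back to a "Row" column numbered from start + 1.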

Let me know if this approach looks good to you.

+ # Custom named index - include it with its actual name
+ page_data.insert(0, page_data.index.name, page_data.index)
+ else:
+ # Default index - include as "Row" column
+ page_data.insert(0, "Row", range(start + 1, start + len(page_data) + 1))

# Handle case where user navigated beyond available data with unknown row count
is_unknown_count = self.row_count is None