
feat: implement _repr_mimebundle_ for DataFrame #6385

Open
Abyss-lord wants to merge 3 commits into Eventual-Inc:main from Abyss-lord:codex/issue-2598-repr-mimebundle

Conversation

@Abyss-lord
Contributor

Problem

DataFrame does not implement _repr_mimebundle_, so some notebook frontends cannot reliably render rich output.

Root Cause

Only __repr__ and _repr_html_ are implemented today; the IPython mimebundle display protocol is missing.

Solution

  • Added _repr_mimebundle_(include, exclude) to DataFrame.
  • Returns both text/plain and text/html by default.
  • Supports include/exclude filtering.
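
The shape of the method described above can be sketched as follows. This is a minimal illustration on a hypothetical `Table` class, not Daft's actual `DataFrame` code; the class name, its constructor, and the rendered strings are assumptions made for the example.

```python
from __future__ import annotations


class Table:
    """Hypothetical stand-in for a DataFrame-like object (not Daft's class)."""

    def __init__(self, rows: list[tuple]) -> None:
        self.rows = rows

    def __repr__(self) -> str:
        return f"Table({len(self.rows)} rows)"

    def _repr_html_(self) -> str:
        return f"<table><caption>{len(self.rows)} rows</caption></table>"

    def _repr_mimebundle_(self, include=None, exclude=None) -> dict[str, str]:
        # Build both representations, then drop whatever the caller filtered out.
        bundle = {"text/plain": repr(self), "text/html": self._repr_html_()}
        if include is not None:
            allowed = set(include)
            bundle = {k: v for k, v in bundle.items() if k in allowed}
        if exclude is not None:
            blocked = set(exclude)
            bundle = {k: v for k, v in bundle.items() if k not in blocked}
        return bundle
```

Calling `Table([(1, "x")])._repr_mimebundle_()` returns a dict with both MIME types, and passing `include` or `exclude` narrows it.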

Tests

  • Added two test groups in tests/dataframe/test_repr.py.
  • Ran -k mimebundle: 8 passed.

Impact

Improves DataFrame display compatibility across notebook frontends without affecting query behavior.

@github-actions github-actions bot added the feat label Mar 12, 2026
@Abyss-lord Abyss-lord mentioned this pull request Mar 12, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR adds _repr_mimebundle_ to DataFrame so that notebook frontends that rely on the IPython MIME-bundle display protocol (rather than _repr_html_ alone) can render DataFrames correctly. The core logic is a straightforward addition that delegates to the existing __repr__ and _repr_html_ methods and applies include/exclude filtering — a well-scoped change that does not affect query execution.

Key points from the review:

  • Unrelated file committed: doc/issue-2598-feasibility.md is a personal planning note written in Chinese and should be removed before merging; the project's documentation lives in docs/, not a new doc/ directory.
  • Eager evaluation: Both __repr__() and _repr_html_() are computed unconditionally before include/exclude filtering. Since _repr_html_() may trigger interactive HTML generation, it should only be called when it will actually appear in the returned bundle.
  • Test organisation: The two new test functions cover the right scenarios but could be merged into a single parametrized test to match the project's convention.

Confidence Score: 3/5

  • Safe to merge after removing the internal planning document and optionally addressing the eager-evaluation inefficiency.
  • The implementation is functionally correct and well-tested. The confidence is lowered slightly because an unintended file (doc/issue-2598-feasibility.md) is included in the PR and should not be committed to the repository, and there is a minor but real inefficiency in eagerly computing both representations before filtering.
  • doc/issue-2598-feasibility.md must be removed; daft/dataframe/dataframe.py lines 692–695 should be reviewed for the eager-evaluation concern.

Important Files Changed

| Filename | Overview |
| --- | --- |
| daft/dataframe/dataframe.py | Adds _repr_mimebundle_ method to DataFrame; implementation is correct but eagerly evaluates both representations before filtering, which wastes work when include/exclude would discard one. |
| doc/issue-2598-feasibility.md | Internal Chinese-language feasibility scratch note; should not be committed to the repository, and it creates a new doc/ directory separate from the established docs/ directory. |
| tests/dataframe/test_repr.py | Adds two test functions for _repr_mimebundle_; coverage is adequate, but the two functions could be consolidated into a single parametrized test per project convention. |

Sequence Diagram

sequenceDiagram
    participant FE as Notebook Frontend
    participant IPy as IPython Display
    participant DF as DataFrame

    FE->>IPy: display(df)
    IPy->>DF: _repr_mimebundle_(include, exclude)
    DF->>DF: __repr__() → text/plain
    DF->>DF: _repr_html_() → text/html
    DF->>DF: filter by include/exclude
    DF-->>IPy: {text/plain: ..., text/html: ...}
    IPy-->>FE: render best matching MIME type
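
The last step of the diagram, where the frontend renders the best matching MIME type, can be pictured with a small sketch. The function name `pick_representation` and the preference order are hypothetical; real frontends apply their own selection rules.

```python
def pick_representation(bundle, preferred=("text/html", "text/plain")):
    """Return the first (mime, payload) pair whose MIME type the frontend prefers.

    `preferred` is an assumed ordering for illustration only.
    """
    for mime in preferred:
        if mime in bundle:
            return mime, bundle[mime]
    raise LookupError("no supported MIME type in bundle")
```

A frontend that cannot render HTML would simply pass `preferred=("text/plain",)` and still get usable output from the same bundle.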

Last reviewed commit: c191e8b

Comment on lines +1 to +25
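# Issue #2598 Feasibility Analysis

## 1. issue summary
Add `_repr_mimebundle_` to `daft.DataFrame` so that Jupyter frontends that do not support `_repr_html_` (for example, Zed) can also display a DataFrame correctly.

## 2. root cause
`DataFrame` currently provides only `__repr__` and `_repr_html_`. Some frontends prefer the `_repr_mimebundle_` protocol, and when it is missing the display degrades or becomes unavailable.

## 3. expected modification modules
- `daft/dataframe/dataframe.py`
- `tests/dataframe/test_repr.py`

## 4. implementation plan
1. Add `_repr_mimebundle_(include=None, exclude=None)` to `DataFrame`.
2. Return both the `text/plain` and `text/html` MIME types by default.
3. Support `include`/`exclude` filtering, compatible with the IPython display protocol.
4. Add tests covering the default return value and the include/exclude filtering behavior.

## Complexity assessment
- Estimated number of files modified: 2 (<20)
- API design changes: none (only adds a method for display-protocol compatibility)
- Architecture changes: none
- Multi-module refactoring: none

Conclusion: can be implemented directly; low risk.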
Contributor

Internal planning document should not be committed

This file (doc/issue-2598-feasibility.md) is a personal feasibility-analysis scratch note written in Chinese. It is not project documentation and should not be committed to the repository. The project's documentation lives in docs/, not a new doc/ directory.

Please remove this file from the PR before merging.

Contributor Author

Fix

Comment on lines +692 to +695
mimebundle = {
    "text/plain": self.__repr__(),
    "text/html": self._repr_html_(),
}
Contributor

Eager evaluation of both representations regardless of include filter

Both __repr__() and _repr_html_() are computed up-front before any include/exclude filtering is applied. This means that even when a caller specifies include={"text/plain"}, the potentially expensive _repr_html_() call (which may generate an interactive HTML widget) is still executed unconditionally.

Consider building the bundle lazily so that only the representations that will survive the filter are generated:

Suggested change
- mimebundle = {
-     "text/plain": self.__repr__(),
-     "text/html": self._repr_html_(),
- }
+ include_set = set(include) if include is not None else None
+ exclude_set = set(exclude) if exclude is not None else set()
+ mimebundle: dict[str, str] = {}
+ if include_set is None or "text/plain" in include_set:
+     if "text/plain" not in exclude_set:
+         mimebundle["text/plain"] = self.__repr__()
+ if include_set is None or "text/html" in include_set:
+     if "text/html" not in exclude_set:
+         mimebundle["text/html"] = self._repr_html_()
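
One way to check that a lazy build really skips the filtered-out renderer is to route each MIME type through a callable and record which ones run. The helper `build_bundle` and the two stub renderers below are hypothetical, written only to illustrate the idea; they are not part of Daft.

```python
def build_bundle(producers, include=None, exclude=None):
    """Call only the producers whose MIME type survives include/exclude."""
    include_set = set(include) if include is not None else None
    exclude_set = set(exclude) if exclude is not None else set()
    bundle = {}
    for mime, produce in producers.items():
        if include_set is not None and mime not in include_set:
            continue
        if mime in exclude_set:
            continue
        bundle[mime] = produce()  # rendered only after passing both filters
    return bundle


calls = []  # records which stub renderers were actually invoked


def render_plain():
    calls.append("text/plain")
    return "plain text"


def render_html():
    calls.append("text/html")
    return "<b>html</b>"


bundle = build_bundle(
    {"text/plain": render_plain, "text/html": render_html},
    include={"text/plain"},
)
```

After this runs, `calls` contains only `"text/plain"`, so the HTML renderer was never invoked for a plain-text-only request.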

Contributor Author

Fix

Comment on lines +278 to +293
def test_repr_mimebundle_contains_plain_and_html(make_df):
    df = make_df({"A": [1, 2, 3], "B": ["x", "y", "z"]})

    bundle = df._repr_mimebundle_()

    assert set(bundle.keys()) == {"text/plain", "text/html"}
    assert bundle["text/plain"] == df.__repr__()
    assert bundle["text/html"] == df._repr_html_()


def test_repr_mimebundle_include_exclude(make_df):
    df = make_df({"A": [1]})

    assert set(df._repr_mimebundle_(include={"text/plain"}).keys()) == {"text/plain"}
    assert set(df._repr_mimebundle_(exclude={"text/html"}).keys()) == {"text/plain"}
    assert df._repr_mimebundle_(include={"text/plain"}, exclude={"text/plain"}) == {}
Contributor

Prefer a single parametrized test over two separate functions

test_repr_mimebundle_contains_plain_and_html and test_repr_mimebundle_include_exclude both exercise _repr_mimebundle_. Following the project convention of consolidating related assertions, consider merging them, or at a minimum unifying the include/exclude cases into a single parametrized test:

@pytest.mark.parametrize(
    "kwargs,expected_keys",
    [
        ({}, {"text/plain", "text/html"}),
        ({"include": {"text/plain"}}, {"text/plain"}),
        ({"exclude": {"text/html"}}, {"text/plain"}),
        ({"include": {"text/plain"}, "exclude": {"text/plain"}}, set()),
    ],
)
def test_repr_mimebundle(make_df, kwargs, expected_keys):
    df = make_df({"A": [1, 2, 3], "B": ["x", "y", "z"]})
    bundle = df._repr_mimebundle_(**kwargs)
    assert set(bundle.keys()) == expected_keys

Rule Used: Prefer single parametrized functions over multiple... (source)

Learnt From
Eventual-Inc/Daft#5207

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Contributor Author

Fix

@codspeed-hq

codspeed-hq bot commented Mar 12, 2026

Unable to generate the performance report

There was an internal error while processing the run's data. We're working on fixing the issue. Feel free to contact us on Discord or at support@codspeed.io if the issue persists.
