feat: Use `_repr_html_` when native supports it #2776
- Related #1702
- https://ipython.readthedocs.io/en/stable/config/integrating.html#rich-display
- https://github.com/pandas-dev/pandas/blob/22f12fc5d3f7fda3f198760204e7c13150c78581/pandas/core/frame.py#L1189-L1232
- https://github.com/pola-rs/polars/blob/8011fa34e0c5f1270ef52e2d3b0b2946bb2faa72/py-polars/polars/dataframe/frame.py#L1580-L1605
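For readers unfamiliar with the rich display protocol linked above, here is a minimal sketch of the general idea. It is not the code from this PR: `Wrapper` and `FakeNative` are hypothetical names made up for illustration. The only convention relied on is that IPython-style frontends call `_repr_html_()` when an object defines it, and (as far as I understand the formatter) treat a `None` return as "no HTML representation available".

```python
from __future__ import annotations


class FakeNative:
    """Hypothetical stand-in for a native object that ships its own HTML repr."""

    def _repr_html_(self) -> str:
        return "<table><tr><td>1</td></tr></table>"


class Wrapper:
    """Hypothetical wrapper that defers to the native HTML repr when present."""

    def __init__(self, native: object) -> None:
        self._native = native

    def __repr__(self) -> str:
        # Plain-text fallback, always available.
        return f"Wrapper({self._native!r})"

    def _repr_html_(self) -> str | None:
        native_repr = getattr(self._native, "_repr_html_", None)
        if native_repr is None:
            # The native object has no HTML repr, so we offer none either.
            return None
        return native_repr()
```

With this shape, `Wrapper(FakeNative())` renders as a table in a notebook, while `Wrapper(object())` falls back to the plain `repr`.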
style_css = (
    ".dataframe caption { "
    "caption-side: bottom; "
    "text-align: center; "
    "font-weight: bold; "
    "padding-top: 8px;"
    "}"
)
If anyone has any suggestions for styling - feel free to experiment/comment
The only decision I'd made so far was putting the `<caption>` below the table.
With the default `polars` formatting, it appeared between the table and the shape tuple when above - which I thought looked odd
That's very reasonable!
- `pandas` reuses the eager version
- `pyarrow` doesn't support it
- `ibis` requires changing global config, so skipping that
- `dask` does have a `_repr_html_`, but it doesn't parse well
narwhals/_utils.py
Outdated
if header == "Narwhals LazyFrame" and "LazyFrame" in native_html:
    html = native_html.replace("LazyFrame", "LazyFrame.to_native()")
    return f"{html}<p><b>{header}</b></p>"
Had to add this branch for `pl.LazyFrame` as it wasn't parsing with my naive wrapper:
import io
import xml.etree.ElementTree as ET
import polars as pl
data = {"a": [1, 2, 3], "b": ["fdaf", "fda", "cf"]}
ldf = pl.LazyFrame(data)
>>> ET.parse(io.StringIO(ldf._repr_html_()))
ParseError: junk after document element: line 1, column 25
Seems to fail on the first `<p>` in https://github.com/pola-rs/polars/blob/dfa5efe71156c654a1ba3a54b865eae723a818e9/py-polars/polars/lazyframe/frame.py#L783
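The same class of error can be reproduced without `polars`. The sketch below uses a made-up fragment (not the real `pl.LazyFrame` output) with two top-level elements. `xml.etree.ElementTree` expects a single root, so anything after the first closed element is reported as junk. Wrapping the fragment in one root element is shown as one possible workaround, not necessarily what this PR does.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two sibling elements and no single root.
fragment = "<h4>title</h4><p>text</p>"

try:
    ET.fromstring(fragment)
except ET.ParseError as err:
    message = str(err)
else:
    message = "parsed without error"

# Wrapping the fragment in one root element makes it parseable.
wrapped = ET.fromstring(f"<div>{fragment}</div>")
child_tags = [child.tag for child in wrapped]
```

Here `message` carries the "junk after document element" text, and `child_tags` lists the two original elements under the new root.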
- `pandas` only supports it for `pd.DataFrame`
Possible follow-ups
Just some loose ideas, nothing I'm planning to work on any time soon
Thought I'd do one last check before closing this one, here's a few options to choose from:
No worries if we don't want it
@dangotbanned thanks to your ping, I was reminded that I once started looking at this and never finished. I am not against having good support, but I am neither very useful nor opinionated about this.
I would try to aim for a Pareto optimum that balances usefulness and maintainability
narwhals/_utils.py
Outdated
    header: Literal["Narwhals DataFrame", "Narwhals LazyFrame", "Narwhals Series"],
    native_html: str,
) -> str | None:  # pragma: no cover
    if header == "Narwhals LazyFrame" and "LazyFrame" in native_html:
        html = native_html.replace("LazyFrame", "LazyFrame.to_native()")
        return f"{html}<p><b>{header}</b></p>"
I am mostly nitpicking here but... isn't the `header` actually a `footer`?
You're quite right
It started as a `header` until I ran into (#2776 (comment))
I should've updated that to `footer` or `caption`
tree.getroot().insert(0, style)
buf = io.BytesIO()
tree.write(buf, "utf-8", method="html")
return buf.getvalue().decode()
Everything else in this function is a new language to me - I am not very helpful
Ah yeah, `xml.etree.ElementTree` is a bit of a strange one
I had to learn a bit of `lxml` once to fix a particularly broken file.
The API of that is based on this stdlib module, but was more ergonomic than this mess
To simplify this:
- Element: an HTML element
- Tree: refers to a document/webpage, but in this case it is just a table
So I'm essentially doing a fancy find/replace, but trying to preserve the structure of the document
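To make the Element/Tree distinction concrete, here is a small self-contained sketch of that "fancy find/replace". It is only an approximation of the approach discussed here: the `native_html` string is a hypothetical, well-formed stand-in for a native repr, and the caption text and one-line CSS rule are placeholders.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a native HTML table repr.
native_html = '<div><table class="dataframe"><tr><td>1</td></tr></table></div>'

# Tree: the whole parsed document, here a <div> holding a single table.
tree = ET.parse(io.StringIO(native_html))
root = tree.getroot()
table = root.find("table")

# Element: one HTML element. Build a <caption> and a <style> from scratch.
caption = ET.Element("caption")
caption.text = "Narwhals DataFrame"
style = ET.Element("style")
style.text = ".dataframe caption { caption-side: bottom; }"

# Insert the caption as the table's first child and the style before the table.
table.insert(0, caption)
root.insert(0, style)

# Serialize the modified tree back to an HTML string.
buf = io.BytesIO()
tree.write(buf, "utf-8", method="html")
result = buf.getvalue().decode()
```

Although the `<caption>` is the first child of the table in the markup, the `caption-side: bottom` rule asks the browser to draw it below the table.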
I've started #2925 and came up against the
import pyarrow as pa
import narwhals as nw
>>> nw.Series.from_iterable("a", [4, 1, 3, 2], dtype=nw.UInt32, backend=pa)
┌───────────────────────────────────────────────────────┐
|                    Narwhals Series                    |
|-------------------------------------------------------|
|<pyarrow.lib.ChunkedArray object at 0x0000017129497880>|
|[                                                      |
|  [                                                    |
|    4,                                                 |
|    1,                                                 |
|    3,                                                 |
|    2                                                  |
|  ]                                                    |
|]                                                      |
└───────────────────────────────────────────────────────┘
Even if we don't go ahead with The
>>> nw.Series.from_iterable("a", [4, 1, 3, 2], dtype=nw.UInt32, backend="polars")
┌─────────────────┐
| Narwhals Series |
|-----------------|
|shape: (4,)      |
|Series: 'a' [u32]|
|[                |
|    4            |
|    1            |
|    3            |
|    2            |
|]                |
└─────────────────┘
If we just wanted
shape: (365,)
dtype: Datetime(time_unit='us', time_zone=None)
name: 'time series'
nw.Series[pyarrow]
[
2009-01-02 00:00:00
2009-01-03 00:00:00
2009-01-04 00:00:00
2009-01-05 00:00:00
2009-01-06 00:00:00
…
2009-12-28 00:00:00
2009-12-29 00:00:00
2009-12-30 00:00:00
2009-12-31 00:00:00
2010-01-01 00:00:00
]

shape: (30,)
dtype: UInt32
name: 'lower max rows'
nw.Series[pyarrow]
[
0
1
2
…
27
28
29
]

shape: (30,)
dtype: Int16
name: 'oh pandas too???'
nw.Series[pandas]
[
29
28
27
26
25
24
…
5
4
3
2
1
0
]

Would be nicer-er if we used the short type codes from
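As a rough illustration of the layout in the sample outputs above, the following sketch produces the same kind of text from a plain Python sequence. Everything in it is hypothetical: `sketch_series_repr` and its parameters are invented for this example and are not part of narwhals, and the even head/tail split around the ellipsis is an assumption inferred from the samples.

```python
from __future__ import annotations

from typing import Any, Sequence


def sketch_series_repr(
    values: Sequence[Any],
    *,
    name: str,
    dtype: str,
    backend: str,
    max_rows: int = 10,
) -> str:
    """Hypothetical formatter imitating the layout of the samples above."""
    lines = [
        f"shape: ({len(values)},)",
        f"dtype: {dtype}",
        f"name: {name!r}",
        f"nw.Series[{backend}]",
        "[",
    ]
    n = len(values)
    if n > max_rows:
        # Assumed truncation rule: half the budget from each end.
        head = max_rows // 2
        tail = max_rows - head
        shown = [*values[:head], "…", *values[n - tail:]]
    else:
        shown = list(values)
    lines.extend(str(v) for v in shown)
    lines.append("]")
    return "\n".join(lines)


text = sketch_series_repr(
    range(30), name="lower max rows", dtype="UInt32", backend="pyarrow", max_rows=6
)
```

Calling it with `max_rows=6` on thirty values keeps three rows from each end, as in the second sample.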
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
I discovered this method in (#2572), when I was trying to work out why `polars.Expr` looked so much better than what I had
Thinking we can get more immediate benefits now by allowing this option when a backend supports it for:
- `DataFrame` (`pandas`, `polars`)
- `LazyFrame` (`pandas`, `polars`)
- `Series` (`pandas`, `polars`)