@dangotbanned dangotbanned commented Jul 3, 2025

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

I discovered this method in (#2572) while trying to work out why polars.Expr looked so much better than what I had 😅

Thinking we can get more immediate benefits now by allowing this option when a backend supports it for:

  • DataFrame
    • (pandas, polars)

image

  • LazyFrame
    • (pandas, polars)

image

  • Series
    • (pandas, polars)

image
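
A minimal sketch of the delegation idea described above, not the PR's actual code. `HtmlReprWrapper`, `WithHtml` and `WithoutHtml` are hypothetical names used only for illustration. It assumes that a `_repr_html_` returning `None` makes a notebook front end fall back to the plain `__repr__`, which is how IPython behaves as far as I know.

```python
from __future__ import annotations

from typing import Any


class HtmlReprWrapper:
    """Hypothetical stand-in for a wrapper around a native object."""

    def __init__(self, native: Any) -> None:
        self._native = native

    def __repr__(self) -> str:
        return f"HtmlReprWrapper({self._native!r})"

    def _repr_html_(self) -> str | None:
        # Look the hook up on the native object; not every backend has one.
        native_hook = getattr(self._native, "_repr_html_", None)
        if native_hook is None:
            # No HTML available: signal "no rich repr" to the caller.
            return None
        return native_hook()


class WithHtml:
    """Fake backend object that supports the HTML hook."""

    def _repr_html_(self) -> str:
        return "<table><tr><td>1</td></tr></table>"


class WithoutHtml:
    """Fake backend object with no HTML hook."""


supported = HtmlReprWrapper(WithHtml())._repr_html_()
unsupported = HtmlReprWrapper(WithoutHtml())._repr_html_()
print(supported)
print(unsupported)
```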

@dangotbanned dangotbanned added the enhancement (New feature or request), pandas-like (Issue is related to pandas-like backends), and polars (Issue is related to polars backend) labels Jul 3, 2025
Comment on lines 1586 to 1593
style_css = (
".dataframe caption { "
"caption-side: bottom; "
"text-align: center; "
"font-weight: bold; "
"padding-top: 8px;"
"}"
)
Member Author

@dangotbanned dangotbanned Jul 3, 2025

If anyone has any suggestions for styling - feel free to experiment/comment 🙂

The only decision I'd made so far was putting the <caption> below the table

With the default polars formatting, it appeared between the table and the shape tuple when above - which I thought looked odd

image

Member

That's very reasonable!

Member Author

Realized I never followed this up with an example

Now that I'm looking at it again, maybe above isn't so bad?

image

- `pandas` reuses the eager version
- `pyarrow` doesn't support it
- `ibis` requires changing global config, so skipping that
- `dask` does have a `_repr_html_`, but doesn't parse well
Comment on lines 1586 to 1588
if header == "Narwhals LazyFrame" and "LazyFrame" in native_html:
html = native_html.replace("LazyFrame", "LazyFrame.to_native()")
return f"{html}<p><b>{header}</b></p>"
Member Author

Had to add this branch for pl.LazyFrame as it wasn't parsing with my naive wrapper:

import io
import xml.etree.ElementTree as ET

import polars as pl

data = {"a": [1, 2, 3], "b": ["fdaf", "fda", "cf"]}
ldf = pl.LazyFrame(data)

>>> ET.parse(io.StringIO(ldf._repr_html_()))
ParseError: junk after document element: line 1, column 25

Seems to fail on the first <p> in https://github.com/pola-rs/polars/blob/dfa5efe71156c654a1ba3a54b865eae723a818e9/py-polars/polars/lazyframe/frame.py#L783
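
The error above is what `xml.etree.ElementTree` reports when the input has more than one top-level element. Here is a small sketch that reproduces it without polars. The two HTML strings are hypothetical stand-ins, not the real `LazyFrame._repr_html_()` output, and `try_parse` is an illustrative helper.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical fragments for illustration only.
single_root = "<div><table><tr><td>1</td></tr></table></div>"
multi_root = "<i>naive plan</i><p></p><div>plan</div>"


def try_parse(html: str) -> str:
    """Parse `html` as XML and describe the outcome as a string."""
    try:
        root = ET.parse(io.StringIO(html)).getroot()
    except ET.ParseError as exc:
        return f"ParseError: {exc}"
    return f"ok: <{root.tag}>"


result_single = try_parse(single_root)
result_multi = try_parse(multi_root)
# Wrapping the fragment in one synthetic root turns it into a single document.
result_wrapped = try_parse(f"<div>{multi_root}</div>")

print(result_single)
print(result_multi)
print(result_wrapped)
```

Wrapping in a synthetic root is only one possible workaround; the branch in this PR sidesteps parsing for that case and works on the string directly.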

- `pandas` only supports it for `pd.DataFrame`
@dangotbanned
Member Author

dangotbanned commented Jul 9, 2025

Possible follow-ups

Just some loose ideas, nothing I'm planning to work on any time soon 😅

@dangotbanned dangotbanned marked this pull request as ready for review July 9, 2025 16:53
@dangotbanned
Member Author

@MarcoGorelli, @FBruzzesi

Thought I'd do one last check before closing this one, here are a few options to choose from:

  1. Don't support this
  2. Do it, but less (defer entirely to polars, pandas)
  3. Do it, but style differently (feat: Use _repr_html_ when native supports it #2776 (comment))
  4. Do it, but increase the scope and make pyarrow + pd.Series look pretty too (feat: Use _repr_html_ when native supports it #2776 (comment))

No worries if we don't want it 🙂

Member

@FBruzzesi FBruzzesi left a comment

@dangotbanned thanks for the ping - it reminded me that I once started to look at this and never finished. I am not against having good support for this, but I am neither very useful nor very opinionated here.

I would try to aim for a Pareto optimum that balances usefulness and maintainability 😂

Comment on lines 1584 to 1589
header: Literal["Narwhals DataFrame", "Narwhals LazyFrame", "Narwhals Series"],
native_html: str,
) -> str | None: # pragma: no cover
if header == "Narwhals LazyFrame" and "LazyFrame" in native_html:
html = native_html.replace("LazyFrame", "LazyFrame.to_native()")
return f"{html}<p><b>{header}</b></p>"
Member

I am mostly nitpicking here but... isn't the header actually a footer? 😂

Member Author

You're quite right 😂

It started as a header until I ran into (#2776 (comment))

I should've updated that to footer or caption

Member Author

Oh I forgot, the name header actually came from generate_repr

def generate_repr(header: str, native_repr: str) -> str:

Anyway - updated it in (57e333d)

tree.getroot().insert(0, style)
buf = io.BytesIO()
tree.write(buf, "utf-8", method="html")
return buf.getvalue().decode()
Member

Everything else in this function is a new language to me - I am not very helpful

Member Author

@dangotbanned dangotbanned Aug 2, 2025

Ah yeah, xml.etree.ElementTree is a bit of a strange one

I had to learn a bit of lxml once to fix a particularly broken file.
The API of that is based on this stdlib module, but was more ergonomic than this mess 😄

To simplify this:

  • Element: an HTML element
  • Tree: a document/webpage, but in this case it is just a table

So I'm essentially doing a fancy find/replace, but trying to preserve the structure of the document
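
To make the "fancy find/replace" concrete, here is a rough sketch under stated assumptions; it is not the function from this PR. `add_caption` and `STYLE_CSS` are hypothetical names, the input is assumed to be well-formed markup with a single wrapper element around the table, and the CSS is copied from the `style_css` snippet discussed in this review.

```python
import io
import xml.etree.ElementTree as ET

# CSS taken from the snippet under review; the constant name is illustrative.
STYLE_CSS = (
    ".dataframe caption { "
    "caption-side: bottom; "
    "text-align: center; "
    "font-weight: bold; "
    "padding-top: 8px;"
    "}"
)


def add_caption(native_html: str, caption_text: str) -> str:
    """Insert a <caption> into the first <table> and a <style> into the root."""
    tree = ET.parse(io.StringIO(native_html))
    root = tree.getroot()
    # `iter` also yields the root itself, so a bare <table> is found too.
    table = next(root.iter("table"), None)
    if table is None:
        return native_html  # nothing to annotate, leave the input untouched

    caption = ET.Element("caption")
    caption.text = caption_text
    table.insert(0, caption)  # a caption is the first child of its table

    style = ET.Element("style")
    style.text = STYLE_CSS
    root.insert(0, style)

    buf = io.BytesIO()
    tree.write(buf, "utf-8", method="html")
    return buf.getvalue().decode()


native_html = '<div><table class="dataframe"><tr><td>1</td></tr></table></div>'
result = add_caption(native_html, "Narwhals DataFrame")
print(result)
```

Working on the parsed tree rather than the raw string means the caption lands inside the `<table>` regardless of what attributes or whitespace the backend emits around it.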

@dangotbanned
Member Author

#2776 (comment)

  4. Do it, but increase the scope and make pyarrow + pd.Series look pretty too

I've started #2925 and came up against the pyarrow.ChunkedArray repr again, while writing an example 🤦‍♂️

import pyarrow as pa

import narwhals as nw

>>> nw.Series.from_iterable("a", [4, 1, 3, 2], dtype=nw.UInt32, backend=pa)
┌───────────────────────────────────────────────────────┐
|                    Narwhals Series                    |
|-------------------------------------------------------|
|<pyarrow.lib.ChunkedArray object at 0x0000017129497880>|
|[                                                      |
|  [                                                    |
|    4,                                                 |
|    1,                                                 |
|    3,                                                 |
|    2                                                  |
|  ]                                                    |
|]                                                      |
└───────────────────────────────────────────────────────┘

Even if we don't go ahead with _repr_html_ - I'd really like to be displaying Series.name and Series.dtype in __repr__

The polars one manages to fit in both of those + shape, while taking up waaaay less horizontal space and one fewer line:

>>> nw.Series.from_iterable("a", [4, 1, 3, 2], dtype=nw.UInt32, backend="polars")
┌─────────────────┐
| Narwhals Series |
|-----------------|
|shape: (4,)      |
|Series: 'a' [u32]|
|[                |
|        4        |
|        1        |
|        3        |
|        2        |
|]                |
└─────────────────┘
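
For reference, the box width in both outputs is driven by the widest line of the native repr, which is why the pyarrow object line makes the first one so wide. The sketch below reproduces that layout. `boxed_repr` is a hypothetical helper written for this illustration and is not claimed to match how `generate_repr` is implemented.

```python
def boxed_repr(header: str, native_repr: str) -> str:
    """Draw a box around `native_repr`, with a centred `header` row on top."""
    lines = native_repr.expandtabs().splitlines()
    # The box is as wide as the widest line (or the header, if that is wider).
    width = max([len(header), *(len(line) for line in lines)])
    top = "┌" + "─" * width + "┐"
    bottom = "└" + "─" * width + "┘"
    rows = [f"|{header.center(width)}|", f"|{'-' * width}|"]
    rows += [f"|{line.ljust(width)}|" for line in lines]
    return "\n".join([top, *rows, bottom])


boxed = boxed_repr("Narwhals Series", "shape: (4,)\nSeries: 'a' [u32]")
print(boxed)
```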

@dangotbanned dangotbanned marked this pull request as draft August 3, 2025 15:22
@dangotbanned dangotbanned mentioned this pull request Aug 11, 2025
10 tasks
@dangotbanned
Member Author

dangotbanned commented Aug 15, 2025

#2776 (comment)

If we just wanted pa.ChunkedArray to look nicer, I've got a very naive new repr (not html) for nw.Series:

shape: (365,)
dtype: Datetime(time_unit='us', time_zone=None)
name: 'time series'
nw.Series[pyarrow]
[
	2009-01-02 00:00:00
	2009-01-03 00:00:00
	2009-01-04 00:00:00
	2009-01-05 00:00:00
	2009-01-06 00:00:00
	…
	2009-12-28 00:00:00
	2009-12-29 00:00:00
	2009-12-30 00:00:00
	2009-12-31 00:00:00
	2010-01-01 00:00:00
]
shape: (30,)
dtype: UInt32
name: 'lower max rows'
nw.Series[pyarrow]
[
	0
	1
	2
	…
	27
	28
	29
]
shape: (30,)
dtype: Int16
name: 'oh pandas too???'
nw.Series[pandas]
[
	29
	28
	27
	26
	25
	24
	…
	5
	4
	3
	2
	1
	0
]

Would be nicer-er if we used the short type codes from polars
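
A backend-agnostic sketch of how output in the shape shown above could be produced, assuming the values are already available as a plain Python sequence. `format_series` and its `max_rows` parameter are hypothetical and are not the implementation referred to in this comment.

```python
from __future__ import annotations

from collections.abc import Sequence


def format_series(
    values: Sequence[object],
    *,
    name: str,
    dtype: str,
    backend: str,
    max_rows: int = 10,
) -> str:
    """Render shape, dtype, name and a head/tail preview of `values`."""
    n = len(values)
    if n > max_rows:
        # Keep the first and last `half` values, with an ellipsis in between.
        half = max_rows // 2
        shown = [*map(str, values[:half]), "…", *map(str, values[n - half:])]
    else:
        shown = [str(value) for value in values]
    lines = [
        f"shape: ({n},)",
        f"dtype: {dtype}",
        f"name: {name!r}",
        f"nw.Series[{backend}]",
        "[",
        *(f"\t{item}" for item in shown),
        "]",
    ]
    return "\n".join(lines)


text = format_series(
    list(range(30)),
    name="lower max rows",
    dtype="UInt32",
    backend="pyarrow",
    max_rows=6,
)
print(text)
```

Keeping the header lines independent of the backend is what lets the same layout serve both `pyarrow` and `pandas`, as in the examples above.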
