Description
Would you consider adding a parameter to export_to_dataframe to enable exporting to pandas, polars or pyarrow?
Motivation
Lately I avoid installing pandas whenever I can get away with polars only (there are quite a few reasons for this).
docling-core requires pandas as a strict dependency only to enable the export_to_dataframe feature. Making pandas optional would lower the dependency burden, including all the transitive dependencies that come with it.
Proposal
I am one of the maintainers of Narwhals (an extremely lightweight and extensible compatibility layer between dataframe libraries) and I would be happy to submit a PR to enable exporting to different dataframe libraries. Here is the branch/changes
Note that Narwhals has no dependencies of its own, so it has no real impact on the dependency tree; it is up to the user to install the library they want to export to.
Narwhals is used for the very same purpose by libraries such as altair, plotly, bokeh, marimo and many others (see ecosystem to learn more).
Concretely, the changes would look something like the following (for a full diff, you can check the branch/changes on my fork):
+ import narwhals.stable.v2 as nw

  def export_to_dataframe(
      self,
      doc: Optional["DoclingDocument"] = None,
+     return_type: Literal["pandas", "polars", "pyarrow"] = "pandas"
  ):
      ...
+     df = nw.from_dict(data, backend=return_type)  # <- this is a narwhals DataFrame, backed by either pandas, polars or pyarrow
+     return df.to_native()  # <- this is the native dataframe
Guarantees
We try to make two guarantees for projects:
- Stable versions of the library, under our perfect backwards compatibility policy. TL;DR: we (almost) never make breaking changes on stable versions.
- In our CI, we test downstream libraries that use Narwhals (see downstream_tests.yml).