feat: Allow export_to_dataframe to return different kinds of dataframes #382

@FBruzzesi

Description

Would you consider adding a parameter to export_to_dataframe to enable exporting to pandas, polars or pyarrow?

Motivation

Lately, I avoid installing pandas whenever I can get away with polars only (there are quite a few reasons for this).

docling-core requires pandas as a strict dependency only to enable the export_to_dataframe feature. Making it optional would lighten the dependency burden, including all the transitive dependencies that pandas brings in.

Proposal

I am one of the maintainers of Narwhals (an extremely lightweight and extensible compatibility layer between dataframe libraries), and I would be happy to submit a PR to enable exporting to different dataframe libraries. Here is the branch/changes.

Note that Narwhals has no dependencies of its own, so it has no real impact on the dependency tree; it is up to the user to install the library they want to export to.

Narwhals is used for this very same purpose by libraries such as altair, plotly, bokeh, marimo and many others (see ecosystem to know more).


Concretely, the changes would look something like the following (for a full diff, you can check the branch/changes on my fork):

+ import narwhals.stable.v2 as nw

  def export_to_dataframe(
      self,
      doc: Optional["DoclingDocument"] = None,
+     return_type: Literal["pandas", "polars", "pyarrow"] = "pandas",
  ):
      ...
+     df = nw.from_dict(data, backend=return_type)  # <- this is a narwhals DataFrame, backed by either pandas, polars or pyarrow
+     return df.to_native()  # <- this is the native dataframe
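To make the idea easier to try out in isolation, here is a minimal, self-contained sketch of the same conversion step outside of docling-core. The function name `table_to_dataframe` and the `Backend` alias are hypothetical and only for illustration; the up-front checks on `return_type` are my assumption of how a friendly error could be surfaced, not something the branch necessarily does. Narwhals is imported lazily so that the module itself imports without any dataframe library installed.

```python
from importlib.util import find_spec
from typing import Any, Literal, get_args

# Hypothetical alias mirroring the proposed `return_type` annotation.
Backend = Literal["pandas", "polars", "pyarrow"]


def table_to_dataframe(data: dict[str, list[Any]], return_type: Backend = "pandas") -> Any:
    """Hypothetical stand-in for the final step of export_to_dataframe."""
    # Reject unknown backends early with a clear message (assumed behaviour).
    if return_type not in get_args(Backend):
        raise ValueError(
            f"return_type must be one of {get_args(Backend)}, got {return_type!r}"
        )
    # The chosen backend must be installed by the user; it is not a dependency here.
    if find_spec(return_type) is None:
        raise ImportError(f"{return_type} is not installed; install it to export to it")

    # Imported lazily so that importing this module needs no dataframe library.
    import narwhals.stable.v2 as nw

    # Build a narwhals DataFrame backed by the requested library...
    df = nw.from_dict(data, backend=return_type)
    # ...and hand the caller the native object (pandas/polars DataFrame or pyarrow Table).
    return df.to_native()
```

With polars installed, for instance, `table_to_dataframe({"a": [1, 2]}, return_type="polars")` would return a native polars DataFrame, and pandas would never need to be imported.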

Guarantees

We try to make two guarantees for projects:

  1. Stable versions of the library (our Perfect backwards compatibility policy), which TL;DR means: we (almost) never make breaking changes on stable versions.
  2. In our CI, we test downstream projects that depend on Narwhals (see downstream_tests.yml).

Related issues

docling-project/docling#498
