Altair DataFrame class #3083

@jonmmease

In #3081, we are integrating VegaFusion to make it possible to extract the transformed data from an Altair Chart object. As discussed in #3081 (comment), it will even be possible to extract the transformed data from a chart that has no mark configured. For example:

import altair as alt
import pandas as pd

# No mark is configured: the chart only carries data and transforms.
chart = alt.Chart(
    pd.DataFrame({"a": [1, 2, 3], "b": ["A", "BB", "CCC"]})
)
chart.transform_filter("datum.a > 1")._transformed_data()

   a    b
0  2   BB
1  3  CCC

In this case, the Chart object is actually being used more like a lazy DataFrame than a chart. What if we added an alt.DataFrame class that includes a subset of the alt.Chart methods? In particular:

  • The .transform_* methods, which would return a new alt.DataFrame. We could perhaps even drop the transform_ prefix.
  • The .mark_* methods, which would return a new alt.Chart.
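
To make the two bullets above concrete, here is a minimal sketch of how such a wrapper might delegate to a chart. The class name LazyFrame and its behavior are hypothetical and are not part of Altair. It assumes only that the wrapped object exposes transform_* and mark_* methods, as alt.Chart does, and it drops the transform_ prefix as suggested above.

```python
class LazyFrame:
    """Hypothetical stand-in for the proposed alt.DataFrame (not an Altair API)."""

    def __init__(self, chart):
        # Any object exposing transform_* and mark_* methods, e.g. an alt.Chart.
        self._chart = chart

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)

        # .mark_* leaves the lazy frame and returns whatever the chart returns.
        if name.startswith("mark_"):
            return getattr(self._chart, name)

        # Anything else is looked up with the transform_ prefix restored,
        # so .filter(...) forwards to .transform_filter(...).
        target = getattr(self._chart, "transform_" + name, None)
        if target is None:
            raise AttributeError(name)

        def wrapped(*args, **kwargs):
            # Transforms stay lazy: wrap the new chart in a new LazyFrame.
            return LazyFrame(target(*args, **kwargs))

        return wrapped
```

With Altair installed, LazyFrame(alt.Chart(df)).filter("datum.a > 1") would be expected to give another LazyFrame, and calling .mark_point() on it would hand back a regular alt.Chart. A real implementation would more likely list the supported methods explicitly so that they show up in documentation and autocompletion.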

VegaFusion doesn't do this efficiently yet, but I'd also picture supporting a .dtypes property that would return the output pandas data types for the alt.DataFrame. We could even use these output dtypes for encoding type inference, which we currently do only for pandas DataFrames.

Alternative: maybe this functionality could be combined with the existing alt.Data subclasses, so that you could do things like:

alt.UrlData("https://path/to/file.csv").filter("datum.a > 1").transformed_data()
alt.UrlData("https://path/to/file.csv").filter("datum.a > 1").mark_point().encode(...)
