
Using altair with a large dataset #1748

@cddesja-fda


Altair's default data transformer refuses datasets with more than 5000 rows. The error message suggests enabling the VegaFusion data transformer to lift this limit. For example, running the following simple Shiny app:

import shiny.express
from shinywidgets import render_altair
import altair as alt
import numpy as np
import pandas as pd

# Generate bivariate normal distribution
mean = [0, 0]
cov = [[1, 0.5], [0.5, 1]]
data = np.random.multivariate_normal(mean, cov, 10000)
df = pd.DataFrame(data, columns=['x', 'y'])


@render_altair
def scatterplot():
    return alt.Chart(df).mark_circle(size=60, color='#b6377a').encode(
        x='x',
        y='y',
    )

raises this error:

The number of rows in your dataset is greater than the maximum allowed (5000).

Try enabling the VegaFusion data transformer which raises this limit by pre-evaluating data
transformations in Python.
    >> import altair as alt
    >> alt.data_transformers.enable("vegafusion")

Or, see https://altair-viz.github.io/user_guide/large_datasets.html for additional information
on how to plot large datasets.
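For context, this first error comes from the row check in Altair's default data transformer (altair.utils.data.limit_rows, which raises MaxRowsError). The stdlib-only sketch below is a simplified, hypothetical imitation of that check, not Altair's actual code. It shows why the 10000-row example fails, and that a limit of None turns the check off, which is roughly what alt.data_transformers.disable_max_rows() does.

```python
# Simplified, hypothetical stand-in for Altair's row-limit check;
# not Altair's actual implementation.


class MaxRowsError(Exception):
    """Raised when a dataset has more rows than the configured limit."""


def limit_rows(rows, max_rows=5000):
    """Return rows unchanged, or raise if there are more than max_rows.

    Passing max_rows=None disables the check entirely.
    """
    if max_rows is not None and len(rows) > max_rows:
        raise MaxRowsError(
            "The number of rows in your dataset is greater than the "
            f"maximum allowed ({max_rows})."
        )
    return rows


rows = [(0.0, 0.0)] * 10000  # same row count as the example app

try:
    limit_rows(rows)
except MaxRowsError as err:
    print(err)  # the 5000-row limit message

print(len(limit_rows(rows, max_rows=None)))  # 10000
```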

The error message recommends calling alt.data_transformers.enable("vegafusion"). Here is the Shiny app with that call added:

import shiny.express
from shinywidgets import render_altair
import altair as alt
import numpy as np
import pandas as pd
alt.data_transformers.enable("vegafusion")

# Generate bivariate normal distribution
mean = [0, 0]
cov = [[1, 0.5], [0.5, 1]]
data = np.random.multivariate_normal(mean, cov, 10000)
df = pd.DataFrame(data, columns=['x', 'y'])


@render_altair
def scatterplot():
    return alt.Chart(df).mark_circle(size=60, color='#b6377a').encode(
        x='x',
        y='y',
    )

Running this version results in the following error:

TypeError: Invalid tag item type: <class 'altair.utils.plugin_registry.PluginEnabler'>. Consider calling str() on this value before treating it as a tag item.

Is this a bug in Shiny, in Altair, or both, and is there a workaround?
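As far as I can tell, the TypeError comes from Shiny Express rather than from Altair. Express passes the value of every top-level expression statement to a display hook and adds it to the page. alt.data_transformers.enable(...) returns a PluginEnabler object, which htmltools cannot render as a tag. The stdlib-only sketch below imitates that behavior using Python's "single" compile mode, which calls sys.displayhook for expression statements (as the REPL does). FakePluginEnabler, fake_enable and run_like_express are hypothetical stand-ins, not Shiny's or Altair's code. The sketch suggests a likely workaround: assign the return value, for example _ = alt.data_transformers.enable("vegafusion"), so the line is no longer a bare expression.

```python
import ast
import sys


class FakePluginEnabler:
    """Hypothetical stand-in for what alt.data_transformers.enable() returns."""


def fake_enable(name):
    # Hypothetical stand-in for alt.data_transformers.enable(name).
    return FakePluginEnabler()


def run_like_express(source):
    """Run source one top-level statement at a time and return the values
    that reach sys.displayhook (an assumed model of Shiny Express)."""
    displayed = []

    def hook(value):
        # Like the default displayhook, ignore None results.
        if value is not None:
            displayed.append(value)

    namespace = {"fake_enable": fake_enable}
    old_hook = sys.displayhook
    sys.displayhook = hook
    try:
        for node in ast.parse(source).body:
            if isinstance(node, ast.Expr):
                # "single" mode sends the expression's value to sys.displayhook.
                code = compile(ast.Interactive(body=[node]), "<app>", "single")
            else:
                code = compile(ast.Module(body=[node], type_ignores=[]), "<app>", "exec")
            exec(code, namespace)
    finally:
        sys.displayhook = old_hook
    return displayed


# Bare call: its return value is handed to the display hook.
print(run_like_express('fake_enable("vegafusion")'))

# Assigned call: nothing is displayed.
print(run_like_express('_ = fake_enable("vegafusion")'))  # []
```

In the real app, the equivalent change would be replacing the bare alt.data_transformers.enable("vegafusion") line with an assignment. This is untested here against Shiny itself.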
