ENH: Support non-categorical values for pandas bar plots when x axis is datetime values #59543

@kdheepak

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When the x-axis is all dates, and a user tries to plot a bar plot, pandas treats the dates as categorical values.

Take a simple pandas DataFrame with datetime values and plot it as a bar plot:

import pandas as pd

df = pd.DataFrame(
    dict(
        date=pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS"),
        data=[1,2,3,4,5,6,7,8,9,10,11,12]
    ), 
)

import matplotlib as mpl
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

fig = mpl.figure.Figure(constrained_layout=True)
axs = fig.subplot_mosaic("a")

ax = axs["a"]

df.plot.bar(x="date", y="data", ax=ax, legend=False) # incorrect year -> 1970 instead of 2020

formatter = mdates.DateFormatter("%Y - %b")
ax.xaxis.set_major_formatter(formatter)

fig

You'll get this:

image

There's unfortunately no way to bypass this.

Using x_compat doesn't do anything:

with pd.plotting.plot_params.use("x_compat", True):
    df.plot.bar()

And passing it directly raises an error:

df.plot.bar(x_compat=True)

image

If I change the df to this (i.e. more data points):

import pandas as pd
import numpy as np

date = pd.date_range(start="2020-01-01", end="2050-12-31", freq="MS")

df = pd.DataFrame(
    dict(
        date=date,
        data=[i for i, x in enumerate(date)]
    ), 
)

I get this:

image

imho, this is a bad user experience.

  1. It takes significantly longer to plot because pandas generates a text label for every data point.

  2. The plot labels are not useful to the user.

  3. Users have no way to modify this plot to "fix" it because the x-axis data interval is categorical, i.e. 0 to N, where N is an integer corresponding to the last time period.

    image
  4. Users cannot easily annotate this plot because the x positions are on a categorical axis instead of datetime values.
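
To make points 3 and 4 concrete, here is a minimal sketch that reuses the 12-month frame from the first example and inspects where pandas places the bars. It assumes the bars are the only patches on the axes. If Matplotlib's default date epoch (1970-01-01) is in effect, this would also explain the 1970 labels: a date formatter reads positions 0 to N as days since the epoch.

```python
import pandas as pd
from matplotlib.figure import Figure

df = pd.DataFrame(
    {
        "date": pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS"),
        "data": list(range(1, 13)),
    }
)

fig = Figure(constrained_layout=True)
ax = fig.subplots()
df.plot.bar(x="date", y="data", ax=ax, legend=False)

# Each bar is a Rectangle patch; its center is the x position pandas assigned.
centers = [patch.get_x() + patch.get_width() / 2 for patch in ax.patches]
print(centers)          # integer-valued positions, not dates
print(ax.get_xticks())  # one tick per row of the frame
```

Any annotation placed on this axes has to use those integer positions, which is why a datetime value cannot be used as an x coordinate.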


fwiw, matplotlib does the right thing when the x-axis values are all dates:

image
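
For comparison, a minimal sketch of the Matplotlib behavior referred to here, calling `ax.bar` directly on the same 12-month data. The 20-day bar width is an arbitrary choice for illustration; everything else mirrors the first example.

```python
import matplotlib.dates as mdates
import pandas as pd
from matplotlib.figure import Figure

df = pd.DataFrame(
    {
        "date": pd.date_range(start="2020-01-01", end="2020-12-31", freq="MS"),
        "data": list(range(1, 13)),
    }
)

fig = Figure(constrained_layout=True)
ax = fig.subplots()

# Passing datetimes straight to Matplotlib keeps the x axis in date units.
# The width is given as a timedelta so it is unambiguous on a date axis.
ax.bar(df["date"], df["data"], width=pd.Timedelta(days=20))

# The date formatter now sees real dates rather than positions 0..N.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y - %b"))
```

Because the x positions are real dates, a line plot or an annotation keyed on a datetime can share this axes without any index translation.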

Feature Description

Add a new option to df.plot.bar(...) that skips treating datetime values as categorical data. df.plot(...) already has use_index=False and x_compat=True. The former option is not useful imo but adding the latter option for bar plots would be great.

Alternative Solutions

Alternatively, consider passing datetime values to matplotlib always without considering them as categorical data.

This may be slightly breaking though?

Additional Context

This is currently the source of quite a bit of confusion when plotting bar plots with timeseries and line plots on the same ax.

e.g.: https://stackoverflow.com/q/39560099

Suggestions include

  1. using ax.twinx() and setting the bar plot's ax to invisible:

This is currently the best solution to this problem but imo is a little bit of a hack.

  2. using use_index=False for the line plot:

This makes the line plot difficult to annotate further (the x-axis values are still 0 to N, so a user cannot use a datetime to place annotation text), and the user will still run into issues with a large number of categorical datetime labels.

For context, this enhancement proposal came about because I didn't understand that bar plots always use categorical values in pandas, and I posted this question on Stack Overflow: https://stackoverflow.com/q/78882352/5451769


Labels: Enhancement, Needs Triage
