Feature request: Parallel coordinates plots #3124

EwoutH · 2022-10-27T13:14:14Z

When visualizing high-dimensional datasets, parallel coordinates plots are sometimes very useful. I would love for Seaborn to have a built-in function to do this!

Resources

Wikipedia: Parallel coordinates
Python Graph Gallery: Parallel coordinate plot
plotly: Parallel Coordinates Plot in Python
Pandas docs: pandas.plotting.parallel_coordinates

mwaskom · 2022-10-28T23:06:21Z

Maintainer

I don't think this is a great fit for seaborn. It's already in pandas (as you note) and also Yellowbrick, so there'd need to be something substantial that seaborn could add beyond those implementations. I don't really see much case for that here: parallel coordinates is a kind of one-off plot type so it doesn't fall into the existing grouping of functions and also wouldn't really compose with any other seaborn features (e.g. faceting, pair grid, etc.). Do you have in mind something that a "seaborn parallel coordinates plot" could add beyond what already exists in other libraries?
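For comparison, this is roughly what the existing pandas implementation looks like in use. A minimal sketch, assuming a tiny made-up wide-form table (one column per feature plus a class column); `pandas.plotting.parallel_coordinates` draws one polyline per row and colors it by `class_column`:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs outside a notebook

import pandas as pd
from pandas.plotting import parallel_coordinates

# Made-up wide-form data: one row per observation, one column per feature,
# plus a class column used only for coloring.
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 6.0, 5.1],
    "species": ["setosa", "setosa", "virginica", "virginica"],
})

# Draws one line per row across the feature axes and returns the matplotlib Axes.
ax = parallel_coordinates(df, class_column="species")
```

Note that pandas does not rescale the feature columns, so features on very different scales share a single y axis.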

FirefoxMetzger · 2022-10-29T10:04:38Z

@mwaskom Coincidentally, I might have an interesting use case for this, where it would be beneficial to have an easy way to add additional axes (or at least a second one, similar to ax.twinx()).

I want to visualize the result of a grid search on a regression model while tracking two metrics/scores. The catch is that one metric (max_error) is absolute, and the other (MAPE) is a percentage. For me, both metrics are useful because they give me an estimate of both overall performance and worst-case performance.
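The two metrics live on different scales: max_error is in the target's units, while MAPE is a unitless fraction (often shown as a percentage). A minimal numpy sketch of both, using our own helper names (scikit-learn provides `sklearn.metrics.max_error` and `sklearn.metrics.mean_absolute_percentage_error` if you'd rather not hand-roll them):

```python
import numpy as np


def max_error(y_true, y_pred):
    """Worst-case absolute error, in the same units as the target."""
    return float(np.max(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.05 means 5%).

    Undefined when any true value is 0, since it divides by y_true.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
print(max_error(y_true, y_pred))  # 10.0
print(mape(y_true, y_pred))       # about 0.05
```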

One way I can currently do this is by using a facet over metrics:

(
    so.Plot(grid_result, x="max_depth", y="score")
    .facet(col="metric")
    .add(so.Line(), so.Agg())
    .add(so.Band())
    .share(y=False)
)

This is nice, but a bit hard to read, because I need to go back and forth between subplots. With base matplotlib, I can use ax.twinx() and plot both in the same Axes, but with two scales (which, btw, fails if I use seaborn's new objects API, so here it is using the functional API instead):

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

sns.lineplot(grid_result.query("metric == 'mape'"), x="max_depth", y="score", color="tab:blue", ax=ax1)
sns.lineplot(grid_result.query("metric == 'max_error'"), x="max_depth", y="score", color="tab:red", ax=ax2)

ax1.set_ylabel("mape (blue)")
ax2.set_ylabel("max_error (red)")

It would be nice if we could get this done in seaborn without having to drop down to matplotlib, especially because this would free up the facet dimensions for other variables, e.g., grid search parameters.

mwaskom · 2022-10-29T12:09:02Z

Maintainer

I think a Plot.twin operation is in scope — it could re-use most of the abstractions that support Plot.pair — but I don't really see what that has to do with a parallel coordinates plot?

FirefoxMetzger · 2022-10-29T18:30:14Z

but I don't really see what that has to do with a parallel coordinates plot?

Isn't Plot.twin just a special case (N=2) of parallel coordinates? If I had 3 metrics in the example above, I might want a plot with 3 axes. Going from 1, 2, or 3 to N axes isn't that far-fetched, and plotting N axes is essentially a parallel coordinates plot, unless I am missing something.
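Extending the twinx() approach to N metrics is possible in plain matplotlib by offsetting each extra right-hand spine. A minimal sketch; the helper name `plot_on_twin_axes` and the third metric (`rmse`) are hypothetical, and the values are made up. Note that every series is still plotted against a shared x variable (`max_depth`):

```python
import numpy as np
from matplotlib.figure import Figure


def plot_on_twin_axes(x, series, colors):
    """Plot each series against a shared x, each on its own y-axis.

    series maps a label to y values (same length as x). The first series uses
    the host Axes; every further series gets a twinx() Axes whose right spine
    is pushed outward so the tick labels don't overlap.
    """
    fig = Figure(figsize=(7, 4))
    host = fig.subplots()
    axes = [host]
    for i in range(1, len(series)):
        twin = host.twinx()
        # First twin sits at the usual right edge; later ones move out by 12% each.
        twin.spines["right"].set_position(("axes", 1.0 + 0.12 * (i - 1)))
        axes.append(twin)
    for ax, (label, y), color in zip(axes, series.items(), colors):
        ax.plot(x, y, color=color)
        ax.set_ylabel(label, color=color)
    host.set_xlabel("max_depth")
    return fig, axes


depth = np.arange(1, 9)
fig, axes = plot_on_twin_axes(
    depth,
    {"mape": 0.3 / depth, "max_error": 40 / depth, "rmse": 25 / depth},
    ["tab:blue", "tab:red", "tab:green"],
)
```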

mwaskom · 2022-10-29T20:22:28Z

Maintainer

Isn't Plot.twin just a special case (N=2) of parallel coordinates?

I'm having trouble seeing it that way. In a parallel coordinates plot there isn't a separate x variable that you're showing a relationship with. Or you can think of it in terms of melting your data matrix and then plotting value against variable:

(
    sns.load_dataset("iris")
    .rename_axis("example")
    .reset_index()
    .melt(["example", "species"])
    .pipe(so.Plot, x="variable", y="value", color="species")
    .add(so.Lines(alpha=.5), group="example")
)
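To see what the melt step produces, here is the same chain on a made-up three-row table. Each original row (`example`) turns into one row per measured variable, so plotting `value` against `variable` with `group="example"` connects one polyline per original observation:

```python
import pandas as pd

# Made-up wide-form data: 3 observations, 2 features, 1 label column.
wide = pd.DataFrame({
    "sepal_length": [5.1, 6.3, 5.8],
    "petal_length": [1.4, 6.0, 5.1],
    "species": ["setosa", "virginica", "virginica"],
})

long = (
    wide.rename_axis("example")    # name the row index...
    .reset_index()                 # ...and turn it into an "example" column
    .melt(["example", "species"])  # other columns -> "variable"/"value" rows
)
print(long)
```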

BTW

but with two scales (which, btw, fails if I use seaborn's new objects API)

This seems to work for me? (Of course it has the same limitations of not playing nicely with faceting, etc., as the function interface)

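import matplotlib.pyplot as plt
import seaborn as sns
import seaborn.objects as so

healthexp = sns.load_dataset("healthexp")
f, ax1 = plt.subplots()
ax2 = ax1.twinx()
p = so.Plot(healthexp, x="Year", group="Country")
p.add(so.Line(), so.Agg(), y="Spending_USD").on(ax1).plot()
p.add(so.Line(color="r"), so.Agg(), y="Life_Expectancy").on(ax2).plot()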

EwoutH · 2022-10-29T21:47:17Z

Author

In the first plot above, would it be possible to (minmax) normalise the data on the Y-axis?

FirefoxMetzger · 2022-10-30T07:44:00Z

Or you can think of it in terms of melting your data matrix and then plotting value against variable

Right! I had indeed misunderstood parallel coordinates plots; they are a separate thing. Sorry about that.

@mwaskom Should I create a new issue/feature request to track twin?

This seems to work for me? (Of course it has the same limitations of not playing nicely with faceting, etc., as the function interface)

Cool! Then this was user error on my side. I didn't call .plot in the end which then resulted in only one of the two bars showing. Also I created two so.Plot objects, which resulted in two plots being created in the notebook.

healthexp = sns.load_dataset("healthexp")

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

(
    so.Plot(healthexp, x="Year", group="Country", y="Spending_USD")
    .add(so.Line(color="tab:blue"), so.Agg())
    .on(ax1)
)

(
    so.Plot(healthexp, x="Year", group="Country", y="Life_Expectancy")
    .add(so.Line(color="tab:red"), so.Agg())
    .on(ax2)
)

In the first plot above, would it be possible to (minmax) normalise the data on the Y-axis?

@EwoutH Absolutely. Just transform your data before handing it over to the plot :)

import numpy as np
import pandas as pd
import seaborn as sns
import seaborn.objects as so

iris: pd.DataFrame = sns.load_dataset("iris")


def normalize(df, columns):
    normalized = df.loc[:, columns].apply(
        # min/max normalization of a column
        lambda data: (data - np.min(data)) / np.ptp(data)
    )

    return df.assign(**{col: normalized[col] for col in normalized})


(
    iris.rename_axis("example")
    .reset_index()
    # .pipe calls normalize(df, columns=...) on the whole DataFrame;
    # .transform would first try applying it column by column
    .pipe(
        normalize,
        columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
    )
    .melt(["example", "species"])
    .pipe(so.Plot, x="variable", y="value", color="species")
    .add(so.Lines(alpha=0.5), group="example")
)
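One caveat with dividing by the range (`np.ptp`): a column whose values are all equal has zero range, so 0/0 yields NaN and that axis would drop out of the plot. A hedged variant of the helper; the name `minmax` and the choice to map constant columns to 0 are assumptions of this sketch:

```python
import pandas as pd


def minmax(df, columns):
    """Scale each listed column to [0, 1]; other columns are left untouched.

    Constant columns (zero range) are mapped to 0 instead of NaN.
    """
    sub = df[columns]
    lo = sub.min()
    span = sub.max() - lo
    # Replace zero spans by 1 so constant columns give (x - lo) / 1 == 0.
    scaled = (sub - lo) / span.where(span != 0, 1.0)
    return df.assign(**{col: scaled[col] for col in columns})


demo = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [5.0, 5.0, 5.0], "label": ["x", "y", "z"]})
print(minmax(demo, ["a", "b"]))
```

It can be dropped into the method chain in place of `normalize`, e.g. `.pipe(minmax, columns=[...])`.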

mwaskom · 2022-10-30T13:07:49Z

Maintainer

Should I create a new issue/feature request to track twin?

This isn't good enough tracking for you? :)

seaborn/seaborn/_core/plot.py, line 602 in 021a20f:

# TODO def twin()?

I didn't call .plot in the end which then resulted in only one of the two bars showing. Also I created two so.Plot objects, which resulted in two plots being created in the notebook.

You don't need to invoke so.Plot twice to get this to work, and that's not why you're seeing two outputs. The duplicate output is because 1) plt.subplots() activates pyplot, and the inline backend automatically collects and shows any open figures after cell execution, and 2) the last line of your cell returns a so.Plot object, so it gets displayed. Up to you whether you'd rather solve this by closing the pyplot figure (plt.close(f)), avoiding pyplot altogether (f = mpl.figure.Figure(); ax = f.subplots()), or suppressing display of the so.Plot object (catching it with a variable, using a semi-colon, etc.). You probably want to defer to the so.Plot display because it will be retina-scaled by default.

The key thing is explicitly calling Plot.plot for each component that you want to appear in the final figure. This is documented here although I think it requires a fair amount of understanding of what's happening behind the scenes to be intuitive.

mwaskom · 2022-10-30T13:12:35Z

Maintainer

Just transform your data before handing it over to the plot :)

You could also do this with a move transform:

class NormByOrient(so.Move):
    def __call__(self, df, groupby, orient, scales):
        other = {"x": "y", "y": "x"}[orient]
        return df.assign(**{
            other: df.groupby(orient)[other]
            .transform(lambda x: (x - x.min()) / (x.max() - x.min()))
        })

(
    iris
    .rename_axis("example")
    .reset_index()
    .melt(["example", "species"])
    .pipe(so.Plot, x="variable", y="value", color="species", group="example")
    .add(so.Lines(alpha=.5), NormByOrient())
)

I'm 👎 on adding a move transform that does this specifically but open to having it work within a more general operation. The existing Norm move doesn't quite do what you want here so that would be the right place to start. (In practice I've found that object a little hard to work with since I wrote it).
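To make the difference concrete, a pandas-only sketch on made-up long-form data: dividing by a group aggregate (roughly what `so.Norm`'s default `func="max"` does, here with the variable as the group) versus min/max scaling per variable (what `NormByOrient` above does). Only the latter maps each variable's minimum to 0:

```python
import pandas as pd

# Made-up long-form data: two variables on very different scales.
long = pd.DataFrame({
    "variable": ["a", "a", "a", "b", "b", "b"],
    "value": [2.0, 3.0, 4.0, 100.0, 150.0, 200.0],
})
by_var = long.groupby("variable")["value"]

# Divisive scaling: value / per-variable max. The smallest value maps to 2/4 = 0.5, not 0.
divided = long["value"] / by_var.transform("max")

# Min/max scaling per variable: smallest -> 0, largest -> 1.
lo, hi = by_var.transform("min"), by_var.transform("max")
minmaxed = (long["value"] - lo) / (hi - lo)

print(pd.DataFrame({"variable": long["variable"], "divided": divided, "minmaxed": minmaxed}))
```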

But also I suspect that in most cases where you're doing a parallel coordinates plot your data are going to be in "wide form" as that's how you'd hand them to an ML library so the X, y interface that yellowbrick offers would probably continue to be more convenient for most people.
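If the data start as ML-style arrays, a small adapter can produce the long form that the `so.Plot` spec above expects. A sketch; `xy_to_long` is a hypothetical helper, not part of seaborn or Yellowbrick:

```python
import numpy as np
import pandas as pd


def xy_to_long(X, y, feature_names):
    """Hypothetical adapter: ML-style X (n_samples x n_features) plus labels y
    -> long-form table with one row per (sample, feature)."""
    wide = pd.DataFrame(np.asarray(X), columns=feature_names)
    wide["label"] = np.asarray(y)
    return (
        wide.rename_axis("example")
        .reset_index()
        .melt(["example", "label"], var_name="variable", value_name="value")
    )


X = np.array([[5.1, 1.4], [6.3, 6.0]])
y = np.array(["setosa", "virginica"])
long_df = xy_to_long(X, y, ["sepal_length", "petal_length"])
print(long_df)
```

The result could then be passed as `so.Plot(long_df, x="variable", y="value", color="label")` with `group="example"` on the `so.Lines` layer.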

FirefoxMetzger · 2022-10-30T13:47:22Z

The key thing is explicitly calling Plot.plot for each component that you want to appear in the final figure. This is documented here although I think it requires a fair amount of understanding of what's happening behind the scenes to be intuitive.

Indeed, that's the crux. I actually think the documentation is fine as is; the information is just easy to miss because it sits in the detailed explanation of Plot.on. One option would be to add (or duplicate) it in the Notes section, which is usually where I look for, and document, gotchas like this. One might also consider adding Plot.plot to the See Also section. Plot.plot's documentation is currently sparse, but I'd assume that it will grow over time.

If you are willing to accept a PR for this, I can look into it.

mwaskom · 2022-10-30T14:23:42Z

Maintainer

Duplicating the information doesn't sound like a great idea, but maybe "Notes" would be a better section. Then again, the numpydoc standard says:

Extended Summary
A few sentences giving an extended description. This section should be used to clarify functionality, not to discuss implementation detail or background theory, which should rather be explored in the Notes section below. You may refer to the parameters and the function name, but parameter descriptions still belong in the Parameters section.

Of course, the docs don't really adhere to that standard religiously...
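For reference, a skeleton showing where numpydoc places these sections (See Also comes before Notes). The function is hypothetical and its docstring is not seaborn's actual `Plot.on` text:

```python
def on_example(target):
    """Render onto an existing matplotlib target (hypothetical example).

    A few sentences clarifying what the method does, without implementation
    detail or background theory.

    Parameters
    ----------
    target : object
        The matplotlib object to draw on.

    Returns
    -------
    object
        The same target, for chaining in this toy example.

    See Also
    --------
    plot : Render the specification (hypothetical cross-reference).

    Notes
    -----
    Gotchas and background belong here, e.g. that nothing is drawn until
    the plot is explicitly rendered.
    """
    return target
```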
