Skip to content

Conversation

@jnumainville
Copy link
Collaborator

Adds filtering to dx with the filter_by and required_filter_by arguments that take columns and allow them to be filtered on with input filters and linkers.
This PR is sufficient to work with input filters. deephaven/web-client-ui#2456 is required for linkers.

@jnumainville jnumainville requested review from Copilot and mofojed June 4, 2025 22:30
@github-actions github-actions bot requested a review from margaretkennedy June 4, 2025 22:30
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces filtering support for dx charts by adding two new parameters – filter_by and required_filter_by – that enable users to filter data based on specific column(s) when rendering plots. Key changes include new tests verifying widget sendMessage behavior with filters, a new FilterColumn type for handling filter metadata, and modifications across multiple plot functions and rendering utilities to integrate filtering.

Reviewed Changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
plugins/plotly-express/src/js/src/PlotlyExpressChartModel.test.ts Added tests to validate widget messaging when optional and required filters are applied; also introduced an import that appears unused.
plugins/plotly-express/src/deephaven/plot/express/types/utility.py Introduced the FilterColumn namedtuple for filter metadata.
plugins/plotly-express/src/deephaven/plot/express/plots/* Updated plot function signatures and docstrings to document filter_by and required_filter_by parameters.
plugins/plotly-express/src/deephaven/plot/express/plots/_private_utils.py Added helper functions to process filter_by arguments for both normal and PartitionedTables.
plugins/plotly-express/src/deephaven/plot/express/plots/_layer.py, PartitionManager.py, and DeephavenFigure* Integrated filter column handling into figure generation, layering, and graph management.
plugins/plotly-express/docs/filter-by.md Added documentation and examples for using the new filtering functionality.

if filters is None and (filter_by or required_filter_by):
# if there are input filters wait for them before creating the proper chart
# the python figure is created, then the filters are sent from the client
self.send_default_figure = True
Copy link

Copilot AI Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The logic for setting 'send_default_figure' based on the presence of 'filters' along with filter_by and required_filter_by is a bit nested; consider refactoring or adding explanatory comments to improve maintainability.

Copilot uses AI. Check for mistakes.
jnumainville added a commit to deephaven/web-client-ui that referenced this pull request Jun 6, 2025
This requires deephaven/deephaven-plugins#1185
for full support, but this is all that's needed to support linker
directly with those changes.
Copy link
Member

@mofojed mofojed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some things to cleanup.

For my comment about how it will tie in with sharing tables between workers... I know that's a bit more of a complicated problem, but it's something I think we should think about/discuss soon.

PlotlyChartWidgetData,
setDefaultValueFormat,
} from './PlotlyExpressChartUtils';
import { l } from 'vite/dist/node/types.d-aGj9QkWt';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Copilot - what is this import? I'm guessing it was auto-added accidentally.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


To plot a subset of a table based on a column value, use the `filter_by` and `required_filter_by` parameters. These parameters accept column(s) denoting variables to filter on in the dataset. The plot shows only the data that matches the filter criteria. `filter_by` does not require the [input filter](https://deephaven.io/core/docs/how-to-guides/user-interface/filters/#input-filters) or [linker](https://deephaven.io/core/docs/how-to-guides/user-interface/filters/#linker) to be set on that column whereas `required_filter_by` does.

Under the hood, the Deephaven query engine performs a `partition_by` table operation on the given filter column. This efficient implementation means that plots with many groups can be filtered and redrawn quickly, even with large datasets.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly different than how one_click currently works on plots, and I think it's an improvement, but I want to acknowledge the difference.
Existing one_click documentation: https://deephaven.io/core/docs/reference/plot/one-click/
I rewrote the example to show it with express:

from deephaven import read_csv
from deephaven.plot.selectable_dataset import one_click
from deephaven.plot.figure import Figure

source = read_csv(
    "https://media.githubusercontent.com/media/deephaven/examples/main/CryptoCurrencyHistory/CSV/CryptoTrades_20210922.csv"
)
oc = one_click(t=source, by=["Instrument"])
plot = Figure().plot_xy(series_name="Plot", t=oc, x="Timestamp", y="Price").show()
flp2 = dx.line(source, x="Timestamp", y="Price", filter_by="Instrument")

And the two plots look quite different initially, with plot looking dumb because it's not filtered, and flp2 looking good just showing all the different partitions:
image

Just want to acknowledge it. Once filtered they look the same.

Copy link
Collaborator Author

@jnumainville jnumainville Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely a good thing to point out, and has gotten me thinking. I've definitely seen that dumb example before and it doesn't make sense since it's implying continuity where there is none. I think if you're dealing with categorical values to filter on, the new behavior is way better. It's possible, however, you would want the old behavior, especially in cases of numeric columns that have lots of different values in the column. In that case, the new behavior is definitely worse in terms of performance and usability.

I think this calls for other arguments that are also filters, but behave differently. They probably don't have to be implemented now, although they could be and will touch much of the same code.

These arguments could be called filter (if we don't mind that there could be confusion) or even where (with required counterparts in either case). I'd lean just naming them filter, as it forms a natural progression.
filter just filters on the column -> filter_by filters and partitions on the columns -> required_filter_by filters, partitions, and is required
required_filter is also a natural progression of filter, in that it is required. It's a bit of an odd one in that it will often look the same as required_filter_by, but won't do the partition so is much better suited for cases when the partition isn't desired.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a little note about this for now. If we want to add the separate filter capability let me know and I will create a ticket.

Copy link
Collaborator Author

@jnumainville jnumainville Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth adding that even a numeric column creates a partitioned table, so really adding this sort of filter behavior would be a new backend behavior that can emulate the old frontend behavior.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created DH-19810 just in case we want to some day add this.

Comment on lines 14 to 19
import deephaven.plot.express as dx

stocks = dx.data.stocks() # import the example stocks data set

# specify `x` and `y` columns, as well as additional filter variables with `filter_by`
filtered_line_plot = dx.line(stocks, x="Timestamp", y="Price", filter_by="Sym")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now if you do this and then filter on a value that does not exist, we just show a loading spinner:
image

It would be better if we showed either a blank figure (as one_clicks would) or even show an error message indicating that value does not exist.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

# specify `x` and `y` columns, as well as additional filter variables with `filter_by` and `required_filter_by`
filtered_line_plot = dx.line(
stocks, x="Timestamp", y="Price", filter_by="Sym", required_filter_by="Exchange"
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random - I thought we fixed the title margin? Wonder why it's still so big?
image
I thought this was fixed with deephaven/web-client-ui#2381 but still seeing it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed removing it from the default figure as the default figure was rarely sent before. Fixed.


# specify `x` and `y` columns, as well as additional filter variables with `filter_by` and `required_filter_by`
filtered_line_plot = dx.line(
stocks, x="Timestamp", y="Price", filter_by="Sym", required_filter_by="Exchange"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm a jerk and specify the same column for both filter_by and required_filter_by, it just treats it as a required filter. Maybe we should just raise an exception right away instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

`make_subplots` maintains any `filter_by` and `required_filter_by` filter columns originally passed into the subplots.

> [!WARNING]
> Multiple filters with the same name but different types are not currently supported. Rename columns so that they are unique if necessary.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice note.

Comment on lines +176 to +185
elif message["type"] == "FILTER":
self._figure.update_filters(message["filterMap"])
revision = self._revision_manager.get_revision()
# updating the filters automatically recreates the figure, so it's ready to send
figure = self._get_figure()
try:
self._connection.on_data(*self._build_figure_message(figure, revision))
except RuntimeError:
# trying to send data when the connection is closed, ignore
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something we're going to want to think about going forward is how this ties into sharing tables between workers, and resolving on the client: https://deephaven.atlassian.net/issues/DH-19001
In context of plotting, I'm imagining something like:

  • WorkerA has a table t
  • WorkerB uses this lazy resolve API and passes it into the plotting API, e.g. p = dx.line(uri.client_resolve("pq://WorkerA/t"), x="X", y="Y"). Importantly, the WorkerB does not fetch the table from WorkerA, just keeps a reference to it.
  • Client fetches p from WorkerB, which sends a plot figure definition and tells the client to fetch the table/data from WorkerA

Of course this flow becomes more complicated the more server side processing we do before passing the data table to the client. We're a ways away from this but it's something I want to keep in my mind...

Copy link
Collaborator Author

@jnumainville jnumainville Jun 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as you allude to the problem we're really going to run into is the server side processing. Ultimately, dx was built with the idea that it has access to tables that it can transform when running.

I think it would be relatively straightforward to detect that there is a URI passed in, suspend processing, and save off arguments, similar to what happens in the background with plot by charts.

So now we're back to retrieving the suspended chart, and doing the table processing. It seems like the problem really is we don't want to retrieve a table then have all the table processing happen in a users code studio in case the table is just too big. In my fantasy land, what I'd really want is some sort of identical table interface where I can just say "use this worker to do everything with this table but I can still pull out what I need here". Maybe like a pydeephaven table? Really I wish we would have that for every case we deal with, because the idea that we have to build logic to handle this sort of thing in every plugin we write seems like a pain. And if it's a pain for us, what about external developers trying to build custom plugins? This seems like something we want to strive for a general purpose abstraction for.

Otherwise, I think there just has to be the concept of a worker that handles the table transformations. As in I could pass in worker="WorkerC", suspend processing as mentioned above, then when I actually ask for that chart, all the processing happens in WorkerC because it has to happen somewhere.

Comment on lines 856 to 868
fireFilterUpdated(filterMap: FilterMap): void {
// Only send the filter update if filters are not required
// They will either be set or none are required
if (!this.isFilterRequired()) {
this.widget?.sendMessage(
JSON.stringify({
type: 'FILTER',
filterMap: Object.fromEntries(filterMap),
})
);
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this function name to sendFilterUpdated. The fire* functions are messages that are emitted by this model, whereas this is a directed message to the underlying widget.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jnumainville
Copy link
Collaborator Author

@mofojed I'm wondering if instead of trying to refactor for the new filter hooks, I should just address your comments and merge for now.

The thing is we will still need to use the existing overlay, and until DH-19613 is done there is going to still be some weirdness. I added a comment to DH-19613 about adding the types to the overlay as that is really where the inconsistency comes in.

The new hooks can allow both a unique name and type combo, but the overlay can't currently, so we just end up in a case where the overlay would show something that isn't correct.

@github-actions
Copy link

plotly-express docs preview (Available for 14 days)

@github-actions
Copy link

plotly-express docs preview (Available for 14 days)

@github-actions
Copy link

plotly-express docs preview (Available for 14 days)

@github-actions
Copy link

plotly-express docs preview (Available for 14 days)

@jnumainville jnumainville requested a review from mofojed June 26, 2025 15:29
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with these snapshot updates? I see the legend now has a title, and in indicator the indicators are rounded to no decimal points. Are those changes expected? Not sure why they'd be related to filter_by.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The legend title was added. Instead of using chart title like one_click charts do this is how someone can see what columns they are filtering on. Worth noting that px already does it. I just missed it before and realized it would be good to add especially for filters as a replacement for the chart title.
I can pull out out to a PR if really necessary but I thought it would be fine here since it it especially important to have with filters.

I didn't see those rounding changes, I will take a look as I'm not sure why that would happen.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a fix for the rounding changes as what was happening was the client would ask for a new chart render when it shouldn't. I think that combined with DH-19811 caused it.

@github-actions
Copy link

github-actions bot commented Jul 1, 2025

plotly-express docs preview (Available for 14 days)

@github-actions
Copy link

github-actions bot commented Jul 1, 2025

plotly-express docs preview (Available for 14 days)

@jnumainville jnumainville requested a review from mofojed July 1, 2025 20:01
mofojed
mofojed previously approved these changes Jul 2, 2025
> If you are familiar with the `one_click` API it works similarly to `filter_by`, but there are some differences in behavior:
> In the `one_click` API, if filters are provided but not set then one trace is charted.
> In the `filter_by` API, if filters are provided but not set then all values within the filter columns are charted on separate traces.
> This provides a consistent experience with plot by behavior, but may not be optimal if filtering on numeric columns with many unique values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plot_by?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by argument, made that more explicit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@margaretkennedy is this clearer now? Just want to make sure

@github-actions
Copy link

github-actions bot commented Jul 2, 2025

plotly-express docs preview (Available for 14 days)

@github-actions
Copy link

github-actions bot commented Jul 2, 2025

plotly-express docs preview (Available for 14 days)

@jnumainville jnumainville merged commit 905945e into deephaven:main Jul 14, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants