feat: DH-18281: Add dx filter support #1185

jnumainville · 2025-06-04T22:30:11Z

Adds filtering to dx through the filter_by and required_filter_by arguments, which take one or more columns that can then be filtered with input filters and linkers.
This PR is sufficient to work with input filters. deephaven/web-client-ui#2456 is required for linkers.

Copilot

Pull Request Overview

This PR introduces filtering support for dx charts by adding two new parameters – filter_by and required_filter_by – that enable users to filter data based on specific column(s) when rendering plots. Key changes include new tests verifying widget sendMessage behavior with filters, a new FilterColumn type for handling filter metadata, and modifications across multiple plot functions and rendering utilities to integrate filtering.

Reviewed Changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 2 comments.


File	Description
plugins/plotly-express/src/js/src/PlotlyExpressChartModel.test.ts	Added tests to validate widget messaging when optional and required filters are applied; also introduced an import that appears unused.
plugins/plotly-express/src/deephaven/plot/express/types/utility.py	Introduced the FilterColumn namedtuple for filter metadata.
plugins/plotly-express/src/deephaven/plot/express/plots/*	Updated plot function signatures and docstrings to document filter_by and required_filter_by parameters.
plugins/plotly-express/src/deephaven/plot/express/plots/_private_utils.py	Added helper functions to process filter_by arguments for both normal and PartitionedTables.
plugins/plotly-express/src/deephaven/plot/express/plots/_layer.py, PartitionManager.py, and DeephavenFigure*	Integrated filter column handling into figure generation, layering, and graph management.
plugins/plotly-express/docs/filter-by.md	Added documentation and examples for using the new filtering functionality.

plugins/plotly-express/src/js/src/PlotlyExpressChartModel.test.ts

Copilot · 2025-06-04T22:30:47Z

plugins/plotly-express/src/deephaven/plot/express/plots/PartitionManager.py

+        if filters is None and (filter_by or required_filter_by):
+            # if there are input filters wait for them before creating the proper chart
+            # the python figure is created, then the filters are sent from the client
+            self.send_default_figure = True


[nitpick] The logic for setting 'send_default_figure' based on the presence of 'filters' along with filter_by and required_filter_by is a bit nested; consider refactoring or adding explanatory comments to improve maintainability.
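One way to address this nitpick is to pull the condition into a small named predicate, so the intent ("filter columns were requested, but the client hasn't sent any filter values yet") lives in one place. This is a minimal sketch; the helper name `should_send_default_figure` and the `Columns` alias are hypothetical and not part of the PR.

```python
from typing import Optional, Sequence, Union

# Hypothetical alias: a single column name, several names, or nothing.
Columns = Union[str, Sequence[str], None]


def should_send_default_figure(
    filters: Optional[dict],
    filter_by: Columns,
    required_filter_by: Columns,
) -> bool:
    """Decide whether to send a placeholder figure while waiting on the client.

    Filter columns were requested, but no filter values have arrived yet, so
    the real chart cannot be built. A default figure is sent first, and the
    proper chart is rebuilt once the client sends its filters.
    """
    has_filter_columns = bool(filter_by) or bool(required_filter_by)
    waiting_for_client_filters = filters is None
    return waiting_for_client_filters and has_filter_columns
```

The body is equivalent in truthiness to the original `filters is None and (filter_by or required_filter_by)` expression, but always returns a `bool`.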

This requires deephaven/deephaven-plugins#1185 for full support, but this is all that's needed to support linker directly with those changes.

mofojed

Some things to cleanup.

For my comment about how it will tie in with sharing tables between workers... I know that's a bit more of a complicated problem, but it's something I think we should think about/discuss soon.

mofojed · 2025-06-12T14:14:23Z

plugins/plotly-express/src/js/src/PlotlyExpressChartModel.test.ts

  PlotlyChartWidgetData,
  setDefaultValueFormat,
 } from './PlotlyExpressChartUtils';
+import { l } from 'vite/dist/node/types.d-aGj9QkWt';


I agree with Copilot - what is this import? I'm guessing it was auto-added accidentally.

mofojed · 2025-06-12T14:34:09Z

plugins/plotly-express/docs/filter-by.md

+
+To plot a subset of a table based on a column value, use the `filter_by` and `required_filter_by` parameters. These parameters accept column(s) denoting variables to filter on in the dataset. The plot shows only the data that matches the filter criteria. `filter_by` does not require the [input filter](https://deephaven.io/core/docs/how-to-guides/user-interface/filters/#input-filters) or [linker](https://deephaven.io/core/docs/how-to-guides/user-interface/filters/#linker) to be set on that column whereas `required_filter_by` does.
+
+Under the hood, the Deephaven query engine performs a `partition_by` table operation on the given filter column. This efficient implementation means that plots with many groups can be filtered and redrawn quickly, even with large datasets.
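As a rough plain-Python analogy (not how the Deephaven engine actually implements `partition_by`), partitioning once up front turns every later filter change into a lookup of a prebuilt bucket rather than a rescan of the whole table. The function names and the toy rows below are made up for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Key = Tuple[Any, ...]


def partition_by(rows: List[Row], columns: Sequence[str]) -> Dict[Key, List[Row]]:
    """Group rows into one bucket per distinct combination of `columns` values."""
    partitions: Dict[Key, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[tuple(row[c] for c in columns)].append(row)
    return dict(partitions)


def select_partition(partitions: Dict[Key, List[Row]], key: Key) -> List[Row]:
    """Applying a filter value is a dictionary lookup, not a rescan of the rows."""
    return partitions.get(key, [])


# Toy data, invented for this example.
rows = [
    {"Sym": "AAPL", "Price": 190.0},
    {"Sym": "MSFT", "Price": 410.0},
    {"Sym": "AAPL", "Price": 191.5},
]
parts = partition_by(rows, ["Sym"])
aapl = select_partition(parts, ("AAPL",))
```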


This is slightly different from how one_click currently works on plots, and I think it's an improvement, but I want to acknowledge the difference.
Existing one_click documentation: https://deephaven.io/core/docs/reference/plot/one-click/
I rewrote the example to show it with express:

```python
import deephaven.plot.express as dx
from deephaven import read_csv
from deephaven.plot.figure import Figure
from deephaven.plot.selectable_dataset import one_click

source = read_csv(
    "https://media.githubusercontent.com/media/deephaven/examples/main/CryptoCurrencyHistory/CSV/CryptoTrades_20210922.csv"
)

# Existing approach: one_click dataset plotted with the Figure API
oc = one_click(t=source, by=["Instrument"])
plot = Figure().plot_xy(series_name="Plot", t=oc, x="Timestamp", y="Price").show()

# dx equivalent using filter_by
flp2 = dx.line(source, x="Timestamp", y="Price", filter_by="Instrument")
```

The two plots look quite different initially: `plot` looks dumb because it's not filtered, while `flp2` looks good, just showing all the different partitions.

Just want to acknowledge it. Once filtered they look the same.

This is definitely a good thing to point out, and has gotten me thinking. I've definitely seen that dumb example before and it doesn't make sense since it's implying continuity where there is none. I think if you're dealing with categorical values to filter on, the new behavior is way better. It's possible, however, you would want the old behavior, especially in cases of numeric columns that have lots of different values in the column. In that case, the new behavior is definitely worse in terms of performance and usability.

I think this calls for other arguments that are also filters, but behave differently. They probably don't have to be implemented now, although they could be and will touch much of the same code.

These arguments could be called filter (if we don't mind the potential confusion) or even where, with required counterparts in either case. I'd lean toward just naming them filter, as it forms a natural progression:
filter just filters on the column -> filter_by filters and partitions on the columns -> required_filter_by filters, partitions, and is required.
required_filter is also a natural progression from filter, in that it is required. It's a bit of an odd one in that it will often look the same as required_filter_by, but it won't do the partition, so it is much better suited for cases where the partition isn't desired.

I added a little note about this for now. If we want to add the separate filter capability let me know and I will create a ticket.

Worth adding that even a numeric column creates a partitioned table, so really adding this sort of filter behavior would be a new backend behavior that can emulate the old frontend behavior.

I created DH-19810 just in case we want to add this someday.

mofojed · 2025-06-12T14:36:02Z

plugins/plotly-express/docs/filter-by.md

+import deephaven.plot.express as dx
+
+stocks = dx.data.stocks()  # import the example stocks data set
+
+# specify `x` and `y` columns, as well as additional filter variables with `filter_by`
+filtered_line_plot = dx.line(stocks, x="Timestamp", y="Price", filter_by="Sym")


Right now, if you do this and then filter on a value that does not exist, we just show a loading spinner.

It would be better if we showed either a blank figure (as one_clicks would) or even show an error message indicating that value does not exist.
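The two options suggested here can be sketched as a lookup policy on the partitioned data: return no rows (so a blank figure can be drawn, as one_click does) or raise an error that can be surfaced as a message. This is a hypothetical helper for illustration only; the name `rows_for_filter` and the `on_missing` modes are assumptions, not code from the PR.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[Any, ...]


def rows_for_filter(
    partitions: Dict[Key, List[Any]], key: Key, on_missing: str = "empty"
) -> List[Any]:
    """Resolve a filter value to its partition without leaving the client waiting.

    on_missing="empty": return no rows, so a blank figure can be drawn.
    on_missing="error": raise, so a message can be shown instead of a spinner.
    """
    if key in partitions:
        return partitions[key]
    if on_missing == "empty":
        return []
    if on_missing == "error":
        raise LookupError(f"no data for filter value {key!r}")
    raise ValueError(f"unknown on_missing mode: {on_missing!r}")
```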

mofojed · 2025-06-12T14:41:10Z

plugins/plotly-express/docs/filter-by.md

+# specify `x` and `y` columns, as well as additional filter variables with `filter_by` and `required_filter_by`
+filtered_line_plot = dx.line(
+    stocks, x="Timestamp", y="Price", filter_by="Sym", required_filter_by="Exchange"
+)


Random - I thought we fixed the title margin? Wonder why it's still so big?

I thought this was fixed with deephaven/web-client-ui#2381 but still seeing it.

I missed removing it from the default figure as the default figure was rarely sent before. Fixed.

mofojed · 2025-06-12T14:43:27Z

plugins/plotly-express/docs/filter-by.md

+
+# specify `x` and `y` columns, as well as additional filter variables with `filter_by` and `required_filter_by`
+filtered_line_plot = dx.line(
+    stocks, x="Timestamp", y="Price", filter_by="Sym", required_filter_by="Exchange"


If I'm a jerk and specify the same column for both filter_by and required_filter_by, it just treats it as a required filter. Maybe we should just raise an exception right away instead?
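Failing fast could look roughly like the check below, run before any partitioning happens. It is a sketch under the assumption that both arguments accept `None`, a single column name, or a sequence of names; the function names are hypothetical.

```python
from typing import List, Sequence, Union

Columns = Union[str, Sequence[str], None]


def as_column_list(columns: Columns) -> List[str]:
    """Normalize a column argument (None, one name, or many) to a list."""
    if columns is None:
        return []
    if isinstance(columns, str):
        return [columns]
    return list(columns)


def validate_filter_columns(filter_by: Columns, required_filter_by: Columns) -> None:
    """Reject any column that appears in both filter_by and required_filter_by."""
    overlap = sorted(
        set(as_column_list(filter_by)) & set(as_column_list(required_filter_by))
    )
    if overlap:
        raise ValueError(
            f"Columns passed to both filter_by and required_filter_by: {overlap}"
        )
```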

mofojed · 2025-06-12T14:45:16Z

plugins/plotly-express/docs/filter-by.md

+`make_subplots` maintains any `filter_by` and `required_filter_by` filter columns originally passed into the subplots.
+
+> [!WARNING]
+> Multiple filters with the same name but different types are not currently supported. Rename columns so that they are unique if necessary.
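To illustrate the limitation in the warning, here is a sketch of merging filter columns collected from several subplots: a name seen twice with the same type collapses into one filter, while the same name with a different type is rejected. `FilterSpec` is a hypothetical stand-in with assumed fields; the PR's actual `FilterColumn` namedtuple may look different.

```python
from typing import Dict, Iterable, NamedTuple


class FilterSpec(NamedTuple):
    """Hypothetical stand-in for filter metadata: a column name and its type."""
    name: str
    type: str


def merge_subplot_filters(
    subplots: Iterable[Iterable[FilterSpec]],
) -> Dict[str, FilterSpec]:
    """Combine filter columns from several subplots, keyed by column name.

    The same name with the same type is merged into a single filter. The same
    name with a different type raises, matching the documented limitation.
    """
    merged: Dict[str, FilterSpec] = {}
    for filters in subplots:
        for spec in filters:
            existing = merged.get(spec.name)
            if existing is None:
                merged[spec.name] = spec
            elif existing.type != spec.type:
                raise ValueError(
                    f"Filter {spec.name!r} has conflicting types: "
                    f"{existing.type} and {spec.type}"
                )
    return merged
```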


mofojed · 2025-06-12T15:10:30Z

plugins/plotly-express/src/deephaven/plot/express/communication/DeephavenFigureListener.py

+        elif message["type"] == "FILTER":
+            self._figure.update_filters(message["filterMap"])
+            revision = self._revision_manager.get_revision()
+            # updating the filters automatically recreates the figure, so it's ready to send
+            figure = self._get_figure()
+            try:
+                self._connection.on_data(*self._build_figure_message(figure, revision))
+            except RuntimeError:
+                # trying to send data when the connection is closed, ignore
+                pass
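The control flow of this handler can be exercised in isolation with stub objects: a FILTER message updates the figure's filters (which rebuilds it), the result is sent back, and a `RuntimeError` from an already-closed connection is swallowed. This is a simplified sketch; `FigureStub`, `ConnectionStub`, `handle_message`, and the payload shape are all hypothetical, and the real listener builds its message with `_build_figure_message` and the revision manager instead.

```python
import json
from typing import Any, Dict, List


class FigureStub:
    """Stand-in for the server-side figure: updating filters rebuilds it."""

    def __init__(self) -> None:
        self.filters: Dict[str, Any] = {}
        self.rebuilds = 0

    def update_filters(self, filter_map: Dict[str, Any]) -> None:
        self.filters = dict(filter_map)
        self.rebuilds += 1


class ConnectionStub:
    """Stand-in for the client connection; raises once it has been closed."""

    def __init__(self, closed: bool = False) -> None:
        self.closed = closed
        self.sent: List[str] = []

    def on_data(self, payload: str) -> None:
        if self.closed:
            raise RuntimeError("connection closed")
        self.sent.append(payload)


def handle_message(
    raw: str, figure: FigureStub, connection: ConnectionStub, revision: int
) -> None:
    message = json.loads(raw)
    if message["type"] == "FILTER":
        # Updating filters recreates the figure, so it is ready to send.
        figure.update_filters(message["filterMap"])
        try:
            connection.on_data(
                json.dumps({"revision": revision, "filters": figure.filters})
            )
        except RuntimeError:
            pass  # the client already disconnected; nothing to send to
```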


Something we're going to want to think about going forward is how this ties into sharing tables between workers, and resolving on the client: https://deephaven.atlassian.net/issues/DH-19001
In context of plotting, I'm imagining something like:

WorkerA has a table t

WorkerB uses this lazy resolve API and passes it into the plotting API, e.g. p = dx.line(uri.client_resolve("pq://WorkerA/t"), x="X", y="Y"). Importantly, the WorkerB does not fetch the table from WorkerA, just keeps a reference to it.

Client fetches p from WorkerB, which sends a plot figure definition and tells the client to fetch the table/data from WorkerA

Of course this flow becomes more complicated the more server side processing we do before passing the data table to the client. We're a ways away from this but it's something I want to keep in my mind...

I think, as you allude to, the problem we're really going to run into is the server-side processing. Ultimately, dx was built on the assumption that it has access to tables it can transform when running.

I think it would be relatively straightforward to detect that there is a URI passed in, suspend processing, and save off arguments, similar to what happens in the background with plot by charts.

So now we're back to retrieving the suspended chart and doing the table processing. It seems like the real problem is that we don't want to retrieve a table and then have all the table processing happen in a user's code studio, in case the table is just too big. In my fantasy land, what I'd really want is some sort of identical table interface where I can just say "use this worker to do everything with this table, but I can still pull out what I need here". Maybe like a pydeephaven table? Really, I wish we had that for every case we deal with, because having to build logic to handle this sort of thing in every plugin we write seems like a pain. And if it's a pain for us, what about external developers trying to build custom plugins? This seems like a case where we should strive for a general-purpose abstraction.

Otherwise, I think there just has to be the concept of a worker that handles the table transformations. As in I could pass in worker="WorkerC", suspend processing as mentioned above, then when I actually ask for that chart, all the processing happens in WorkerC because it has to happen somewhere.

mofojed · 2025-06-13T18:04:57Z

plugins/plotly-express/src/js/src/PlotlyExpressChartModel.ts

+  fireFilterUpdated(filterMap: FilterMap): void {
+    // Only send the filter update if filters are not required
+    // They will either be set or none are required
+    if (!this.isFilterRequired()) {
+      this.widget?.sendMessage(
+        JSON.stringify({
+          type: 'FILTER',
+          filterMap: Object.fromEntries(filterMap),
+        })
+      );
+    }
+  }
+


Change this function name to sendFilterUpdated. The fire* functions are messages that are emitted by this model, whereas this is a directed message to the underlying widget.
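To keep all new examples in one language, here is a Python sketch of what the client-side method decides: convert the filter map into a plain object (the role of `Object.fromEntries(filterMap)`) and hold off while any required filter is still unset. It assumes `isFilterRequired()` means "some required filter column has no value yet", which the snippet above implies but does not show; `build_filter_message` is a hypothetical name, not the model's API.

```python
import json
from typing import Iterable, Optional, Sequence, Tuple


def build_filter_message(
    filter_entries: Iterable[Tuple[str, str]],
    required_columns: Sequence[str],
) -> Optional[str]:
    """Return the FILTER message to send to the widget, or None to hold off.

    dict(...) plays the role of Object.fromEntries(filterMap) on the client.
    Nothing is sent while any required filter column still lacks a value.
    """
    filters = dict(filter_entries)
    if any(not filters.get(column) for column in required_columns):
        return None
    return json.dumps({"type": "FILTER", "filterMap": filters})
```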

jnumainville · 2025-06-24T18:06:11Z

@mofojed I'm wondering if instead of trying to refactor for the new filter hooks, I should just address your comments and merge for now.

The thing is we will still need to use the existing overlay, and until DH-19613 is done there is going to still be some weirdness. I added a comment to DH-19613 about adding the types to the overlay as that is really where the inconsistency comes in.

The new hooks can allow both a unique name and type combo, but the overlay can't currently, so we just end up in a case where the overlay would show something that isn't correct.

github-actions · 2025-06-25T19:47:47Z

plotly-express docs preview (Available for 14 days)


mofojed · 2025-06-30T14:44:46Z

tests/express.spec.ts-snapshots/Histogram-loads-1-webkit-linux.png

What's up with these snapshot updates? I see the legend now has a title, and in indicator the indicators are rounded to no decimal points. Are those changes expected? Not sure why they'd be related to filter_by.

The legend title was added. Instead of using the chart title the way one_click charts do, this is how someone can see which columns they are filtering on. Worth noting that px already does this. I just missed it before, and realized it would be good to add, especially for filters, as a replacement for the chart title.
I can pull it out into a separate PR if really necessary, but I thought it would be fine here since it is especially important to have with filters.

I didn't see those rounding changes, I will take a look as I'm not sure why that would happen.

I added a fix for the rounding changes: the client was asking for a new chart render when it shouldn't have. I think that, combined with DH-19811, caused it.


plugins/plotly-express/docs/filter.md

margaretkennedy · 2025-07-02T14:40:48Z

plugins/plotly-express/docs/filter.md

+> If you are familiar with the `one_click` API it works similarly to `filter_by`, but there are some differences in behavior:
+> In the `one_click` API, if filters are provided but not set then one trace is charted.
+> In the `filter_by` API, if filters are provided but not set then all values within the filter columns are charted on separate traces.
+> This provides a consistent experience with plot by behavior, but may not be optimal if filtering on numeric columns with many unique values.
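The behavioral difference described in this note can be modeled with plain rows: when the filter column has no value set, one_click charts everything on a single trace, while filter_by produces one trace per distinct value, like a plot by. This is a toy model, not either API; the function name, the `api` switch, and the sample rows are invented for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def traces_with_unset_filter(rows: List[Row], by: str, api: str) -> Dict[str, List[Row]]:
    """Group rows into traces when a filter column has no value set yet.

    api="one_click": all rows are charted on a single trace.
    api="filter_by": one trace per distinct value of `by`, as with a plot by.
    """
    if api == "one_click":
        return {"all rows": list(rows)}
    if api == "filter_by":
        traces: Dict[str, List[Row]] = defaultdict(list)
        for row in rows:
            traces[str(row[by])].append(row)
        return dict(traces)
    raise ValueError(f"unknown api: {api!r}")


# Toy data, invented for this example.
trades = [
    {"Instrument": "BTC", "Price": 1.0},
    {"Instrument": "ETH", "Price": 2.0},
    {"Instrument": "BTC", "Price": 3.0},
]
```

With a numeric column holding many unique values, the `filter_by` branch creates one trace per value, which is the performance concern the note mentions.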


by argument, made that more explicit

@margaretkennedy is this clearer now? Just want to make sure

plugins/plotly-express/docs/filter.md

Co-authored-by: margaretkennedy <[email protected]>


jnumainville added 20 commits May 15, 2025 09:49

wip: 5b8778b, 06911e9, c0a37d6, e4c9a27, f43002f, 3d0635e, 11e69df, e36026c, 2eec07c, 4fa3681, 84868e5, 5bb9743, fa707f7, 71277c9, 596ec5b, a8e5da7, 668b19d, 1f20601, 81c8a74, e2ecc81

jnumainville requested review from Copilot and mofojed June 4, 2025 22:30

github-actions bot requested a review from margaretkennedy June 4, 2025 22:30

Copilot AI reviewed Jun 4, 2025


jnumainville mentioned this pull request Jun 4, 2025

feat: DH-18840: Add dx linker support deephaven/web-client-ui#2456

Merged

wip

9e7ccb0

mofojed requested changes Jun 13, 2025


wip

461a95d

jnumainville added 3 commits June 25, 2025 14:14

Merge remote-tracking branch 'origin/main' into 18281_input_filters

c37bfa1

wip

c001add

wip

4e18718

wip

067fc7b

wip

9f2cdcb

wip

9074d01

jnumainville requested a review from mofojed June 26, 2025 15:29

mofojed reviewed Jun 30, 2025


jnumainville added 2 commits June 30, 2025 12:14

wip

088a363

wip

010cdca

wip

c58a176

jnumainville requested a review from mofojed July 1, 2025 20:01

mofojed previously approved these changes Jul 2, 2025


margaretkennedy reviewed Jul 2, 2025


Apply suggestions from code review

eb773b1

Co-authored-by: margaretkennedy <[email protected]>

jnumainville dismissed mofojed’s stale review via eb773b1 July 2, 2025 15:58

wip

3d2e1fb

jnumainville requested review from margaretkennedy and mofojed July 2, 2025 16:06

mofojed approved these changes Jul 3, 2025


margaretkennedy approved these changes Jul 14, 2025


jnumainville merged commit 905945e into deephaven:main Jul 14, 2025
17 checks passed

