Merged
Changes from 31 commits
169 changes: 169 additions & 0 deletions plugins/plotly-express/docs/filter.md
@@ -0,0 +1,169 @@
# Filter By

To plot a subset of a table based on a column value, use the `filter_by` and `required_filter_by` parameters. These parameters accept one or more columns denoting the variables to filter on. The plot shows only the data that matches the filter criteria. `filter_by` does not require the [input filter](https://deephaven.io/core/docs/how-to-guides/user-interface/filters/#input-filters) or [linker](https://deephaven.io/core/docs/how-to-guides/user-interface/filters/#linker) to be set on that column, whereas `required_filter_by` does.

Under the hood, the Deephaven query engine performs a `partition_by` table operation on the given filter column. This efficient implementation means that plots with many groups can be filtered and redrawn quickly, even with large datasets.
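
As a rough mental model only (this is not the Deephaven engine's implementation), partitioning groups the rows once by the filter column, so applying a filter becomes a lookup instead of a rescan. The sketch below uses plain Python dictionaries. `partition_rows` and `select_partitions` are hypothetical names, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def partition_rows(rows, key_column):
    """Group rows by the value of key_column (a stand-in for a partition_by)."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key_column]].append(row)
    return dict(partitions)


def select_partitions(partitions, filter_value=None):
    """Return the partitions to chart: one if a filter is set, all otherwise."""
    if filter_value is None:
        return partitions
    return {key: rows for key, rows in partitions.items() if key == filter_value}


# made-up sample rows, loosely shaped like the stocks example
rows = [
    {"Sym": "CAT", "Price": 10.0},
    {"Sym": "DOG", "Price": 20.0},
    {"Sym": "CAT", "Price": 11.0},
]
partitions = partition_rows(rows, "Sym")
cat_only = select_partitions(partitions, "CAT")  # filter set: one group
everything = select_partitions(partitions)  # filter not set: all groups
```

Because the grouping is done up front, changing the filter value only changes which existing groups are selected.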

> [!NOTE]
> If you are familiar with the `one_click` API, `filter_by` works similarly, but the behavior differs when filters are provided but not set:
>
> - In the `one_click` API, one trace is charted.
> - In the `filter_by` API, all values within the filter columns are charted on separate traces.
>
> This is consistent with [plot by](plot-by.md) behavior (the `by` argument), but may not be optimal when filtering on numeric columns with many unique values.
Contributor

plot_by?

Collaborator Author

by argument, made that more explicit

Collaborator Author

@margaretkennedy is this clearer now? Just want to make sure


## Examples

### Filter by a categorical variable

To filter on a single column, provide a column to `filter_by`. The chart is filtered to match the value of the filter variable from the corresponding input filter or link. If the input filter or link is not set, all groups within the column are shown.

```python skip-test
import deephaven.plot.express as dx

stocks = dx.data.stocks() # import the example stocks data set

# specify `x` and `y` columns, as well as additional filter variables with `filter_by`
filtered_line_plot = dx.line(stocks, x="Timestamp", y="Price", filter_by="Sym")
```

### Filter by multiple categorical variables

To filter on multiple columns, provide a list of columns to `filter_by`. The chart is filtered to match the values of the filter variables from the corresponding input filters or links. If the input filters or links are not set, all groups of variables are shown.

```python skip-test
import deephaven.plot.express as dx

stocks = dx.data.stocks() # import the example stocks data set

# specify `x` and `y` columns, as well as additional filter variables with `filter_by`
filtered_line_plot = dx.line(
    stocks, x="Timestamp", y="Price", filter_by=["Sym", "Exchange"]
)
```

### Filter by a required variable

To require a filter on a column, provide a column to `required_filter_by`. The chart is filtered to match the value of the filter variable from the corresponding input filter or link. If the input filter or link is not set, no data is shown.

```python skip-test
import deephaven.plot.express as dx

stocks = dx.data.stocks() # import the example stocks data set

# specify `x` and `y` columns, as well as additional filter variables with `required_filter_by`
filtered_line_plot = dx.line(stocks, x="Timestamp", y="Price", required_filter_by="Sym")
```

### Filter by optional and required variables

To mix optional and required filters, provide columns to both `filter_by` and `required_filter_by`. The chart is filtered to match the values of the filter variables from the corresponding input filters or links. If only the `required_filter_by` input filter or link is not set, no data is shown. If only the `filter_by` input filter or link is not set, all groups within the `filter_by` column are shown.

> [!NOTE]
> Mixing optional and required filters displays an overlay prompting for filters on all columns. Only the `required_filter_by` filters are actually required, and the overlay is dismissed once all of those are provided.

```python skip-test
import deephaven.plot.express as dx

stocks = dx.data.stocks() # import the example stocks data set

# specify `x` and `y` columns, as well as additional filter variables with `filter_by` and `required_filter_by`
filtered_line_plot = dx.line(
    stocks, x="Timestamp", y="Price", filter_by="Sym", required_filter_by="Exchange"
)
```
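
The rules for set and unset filters described in this section can be summarized in a small, plain-Python sketch. It models only the documented behavior, not how the plugin is implemented. `resolve_filters` is a hypothetical helper, and the filter values are illustrative.

```python
def resolve_filters(filter_by, required_filter_by, filter_values):
    """Decide which filters apply, given the values currently set.

    Returns None when any required filter is unset (no data is shown).
    Otherwise returns the filters that are set; an optional filter that
    is unset is simply absent, so all of its groups are shown.
    """
    missing = [col for col in required_filter_by if col not in filter_values]
    if missing:
        return None
    return {
        col: filter_values[col]
        for col in [*required_filter_by, *filter_by]
        if col in filter_values
    }


# required "Exchange" filter not set: nothing is charted
no_data = resolve_filters(["Sym"], ["Exchange"], {"Sym": "CAT"})

# optional "Sym" filter not set: every "Sym" group within the exchange is charted
all_syms = resolve_filters(["Sym"], ["Exchange"], {"Exchange": "NYPE"})

# both set: a single group is charted
one_sym = resolve_filters(
    ["Sym"], ["Exchange"], {"Sym": "CAT", "Exchange": "NYPE"}
)
```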

### Filter by and plot by

To mix a filter and [plot by](plot-by.md), provide columns to both `filter_by` and `by`. By default, all grouping variables within the columns are shown for `by` and `filter_by`.

```python skip-test
import deephaven.plot.express as dx

stocks = dx.data.stocks() # import the example stocks data set

# specify `x` and `y` columns, a plot by variable with `by`, and a filter variable with `filter_by`
filtered_line_plot = dx.line(
    stocks, x="Timestamp", y="Price", by="Sym", filter_by="Exchange"
)
```

### `PartitionedTable` filter by

Providing a `PartitionedTable` defaults to a [plot by](plot-by.md) for the key columns that the table is partitioned on. Set `filter_by=True` to make the columns filters instead.

```python skip-test
import deephaven.plot.express as dx

# import and partition on example stocks data set
stocks = dx.data.stocks()
partitioned_stocks = stocks.partition_by(["Sym", "Exchange"])

# specify `x` and `y` columns, and make "Sym" and "Exchange" filters
filtered_line_plot = dx.line(
    partitioned_stocks, x="Timestamp", y="Price", filter_by=True
)
```

### `PartitionedTable` required filter by

Providing a `PartitionedTable` defaults to a [plot by](plot-by.md) for the key columns that the table is partitioned on. Set `required_filter_by=True` to make the columns required filters instead.

```python skip-test
import deephaven.plot.express as dx

# import and partition on example stocks data set
stocks = dx.data.stocks()
partitioned_stocks = stocks.partition_by(["Sym", "Exchange"])

# specify `x` and `y` columns, and make "Sym" and "Exchange" required filters
filtered_line_plot = dx.line(
    partitioned_stocks, x="Timestamp", y="Price", required_filter_by=True
)
```

### `PartitionedTable` filter by and plot by

Providing a `PartitionedTable` defaults to a [plot by](plot-by.md) for the key columns that the table is partitioned on. Set `filter_by` to a subset of the key columns to make those columns filters instead.

```python skip-test
import deephaven.plot.express as dx

# import and partition on example stocks data set
stocks = dx.data.stocks()
partitioned_stocks = stocks.partition_by(["Sym", "Exchange"])

# specify `x` and `y` columns, and make "Sym" a filter, maintaining "Exchange" as a plot by
filtered_line_plot = dx.line(
    partitioned_stocks, x="Timestamp", y="Price", filter_by="Sym"
)
```

### Subplot filter by

`make_subplots` maintains any `filter_by` and `required_filter_by` filter columns originally passed into the subplots.

> [!WARNING]
> Multiple filters with the same name but different types are not currently supported. Rename columns so that they are unique if necessary.

```python skip-test
import deephaven.plot.express as dx

stocks = dx.data.stocks() # import the example stocks data set

# specify `x` and `y` columns, as well as additional filter variables with `filter_by`
filtered_sym_line_plot = dx.line(
    stocks,
    x="Timestamp",
    y="Price",
    filter_by="Sym",
)

# specify `x` and `y` columns, as well as additional required filter variables with `required_filter_by`
filtered_exchange_line_plot = dx.line(
    stocks, x="Timestamp", y="Price", by="Sym", required_filter_by="Exchange"
)

# make subplots, maintaining the filters
filtered_plots = dx.make_subplots(
    filtered_sym_line_plot, filtered_exchange_line_plot, rows=2
)
```
4 changes: 4 additions & 0 deletions plugins/plotly-express/docs/sidebar.json
@@ -123,6 +123,10 @@
   "label": "Plot by",
   "path": "plot-by.md"
 },
+{
+  "label": "Filter",
+  "path": "filter.md"
+},
 {
   "label": "Sub plot",
   "path": "sub-plots.md"
@@ -3,6 +3,7 @@
 import json
 from functools import partial
 from typing import Any
+import io

 from deephaven.plugin.object_type import MessageStream
 from deephaven.table_listener import listen, TableUpdate
@@ -169,9 +170,19 @@ def process_message(

         """
         # need to create a new exporter for each message
-        message = json.loads(payload.decode())
+        message = json.loads(io.BytesIO(payload).read().decode())
         if message["type"] == "RETRIEVE":
             return self._handle_retrieve_figure()
+        elif message["type"] == "FILTER":
+            self._figure.update_filters(message["filterMap"])
+            revision = self._revision_manager.get_revision()
+            # updating the filters automatically recreates the figure, so it's ready to send
+            figure = self._get_figure()
+            try:
+                self._connection.on_data(*self._build_figure_message(figure, revision))
+            except RuntimeError:
+                # trying to send data when the connection is closed, ignore
+                pass
Comment on lines +176 to +185
Member

Something we're going to want to think about going forward is how this ties into sharing tables between workers, and resolving on the client: https://deephaven.atlassian.net/issues/DH-19001
In the context of plotting, I'm imagining something like:

  • WorkerA has a table t
  • WorkerB uses this lazy resolve API and passes it into the plotting API, e.g. `p = dx.line(uri.client_resolve("pq://WorkerA/t"), x="X", y="Y")`. Importantly, WorkerB does not fetch the table from WorkerA; it just keeps a reference to it.
  • Client fetches p from WorkerB, which sends a plot figure definition and tells the client to fetch the table/data from WorkerA

Of course this flow becomes more complicated the more server side processing we do before passing the data table to the client. We're a ways away from this but it's something I want to keep in my mind...

Collaborator Author

@jnumainville jnumainville Jun 25, 2025

I think, as you allude to, the problem we're really going to run into is the server-side processing. Ultimately, dx was built with the idea that it has access to tables that it can transform when running.

I think it would be relatively straightforward to detect that there is a URI passed in, suspend processing, and save off arguments, similar to what happens in the background with plot by charts.

So now we're back to retrieving the suspended chart and doing the table processing. It seems like the real problem is that we don't want to retrieve a table and then have all the table processing happen in a user's code studio, in case the table is just too big. In my fantasy land, what I'd really want is some sort of identical table interface where I can just say "use this worker to do everything with this table but I can still pull out what I need here". Maybe like a pydeephaven table? Really, I wish we had that for every case we deal with, because the idea that we have to build logic to handle this sort of thing in every plugin we write seems like a pain. And if it's a pain for us, what about external developers trying to build custom plugins? This seems like something we want a general-purpose abstraction for.

Otherwise, I think there just has to be the concept of a worker that handles the table transformations. As in I could pass in worker="WorkerC", suspend processing as mentioned above, then when I actually ask for that chart, all the processing happens in WorkerC because it has to happen somewhere.

         return b"", []

     def __del__(self):
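
For readers who want to try the `FILTER` branch from the hunk above in isolation, here is a minimal, self-contained sketch. It is not the plugin's code: `FigureStub`, `handle_message`, and the `send` callback are hypothetical stand-ins for the listener's figure and connection, and the flat shape of `filterMap` is an assumption, since the diff does not show it.

```python
import json


class FigureStub:
    """Hypothetical stand-in for the figure object held by the listener."""

    def __init__(self):
        self.filters = {}

    def update_filters(self, filter_map):
        self.filters = filter_map


def handle_message(payload, figure, send):
    """Dispatch on the message "type", mirroring the FILTER branch only."""
    message = json.loads(payload.decode())
    if message["type"] == "FILTER":
        figure.update_filters(message["filterMap"])
        try:
            # stand-in for sending the rebuilt figure to the client
            send(figure.filters)
        except RuntimeError:
            # the connection may already be closed; ignore, as in the hunk
            pass
    return b"", []


sent = []
figure = FigureStub()
# assumed message shape: a flat mapping of column name to filter value
payload = json.dumps({"type": "FILTER", "filterMap": {"Sym": "CAT"}}).encode()
result = handle_message(payload, figure, sent.append)
```

Swallowing the `RuntimeError` means a filter update that arrives after the client disconnects is dropped instead of raising from the handler.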