feat: DH-18281: Add dx filter support #1185
Changes from 31 commits
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,169 @@ | ||
| # Filter By | ||
|
|
||
| To plot a subset of a table based on a column value, use the `filter_by` and `required_filter_by` parameters. Each parameter accepts one or more columns to filter on. The plot shows only the data that matches the filter criteria. `filter_by` does not require the [input filter](https://deephaven.io/core/docs/how-to-guides/user-interface/filters/#input-filters) or [linker](https://deephaven.io/core/docs/how-to-guides/user-interface/filters/#linker) to be set on that column, whereas `required_filter_by` does. | ||
|
|
||
| Under the hood, the Deephaven query engine performs a `partition_by` table operation on the given filter column. This efficient implementation means that plots with many groups can be filtered and redrawn quickly, even with large datasets. | ||
|
|
||
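The idea behind partitioning can be sketched in plain Python. This is an illustration only, not Deephaven's implementation: `partition_rows` and `select_partitions` are hypothetical helpers, and the rows are made-up sample data. The table is split once by the filter column, so applying a filter value becomes a lookup instead of a rescan. With no filter set, every group is returned, which mirrors the `filter_by` behavior described in this document.

```python
from collections import defaultdict


def partition_rows(rows, key):
    """Group rows (dicts) by the value of `key`, once, up front."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def select_partitions(partitions, filter_value=None):
    """Return the groups to chart: the matching group if a filter is set, all otherwise."""
    if filter_value is None:
        return partitions
    # A filter value with no matching group yields no groups rather than an error.
    return {k: v for k, v in partitions.items() if k == filter_value}


# Made-up sample rows for illustration
rows = [
    {"Sym": "CAT", "Price": 10.0},
    {"Sym": "DOG", "Price": 20.0},
    {"Sym": "CAT", "Price": 11.0},
]
by_sym = partition_rows(rows, "Sym")
print(sorted(select_partitions(by_sym)))  # group keys when no filter is set
print(select_partitions(by_sym, "CAT"))  # rows for a single group
```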
| > [!NOTE] | ||
| > If you are familiar with the `one_click` API, `filter_by` works similarly, but there are some differences in behavior: | ||
|
||
| > In the `one_click` API, if filters are provided but not set, one trace is charted. | ||
|
||
| > In the `filter_by` API, if filters are provided but not set, all values within the filter columns are charted on separate traces. | ||
|
||
| > This provides a consistent experience with plot by behavior, but may not be optimal if filtering on numeric columns with many unique values. | ||
|
||
|
|
||
| ## Examples | ||
|
|
||
| ### Filter by a categorical variable | ||
|
|
||
| To filter on a single column, provide a column to `filter_by`. The chart is filtered to match the value of the filter variable from the corresponding input filter or link. If the input filter or link is not set, all groups within the column are shown. | ||
|
|
||
| ```python skip-test | ||
| import deephaven.plot.express as dx | ||
|
|
||
| stocks = dx.data.stocks() # import the example stocks data set | ||
|
|
||
| # specify `x` and `y` columns, as well as additional filter variables with `filter_by` | ||
| filtered_line_plot = dx.line(stocks, x="Timestamp", y="Price", filter_by="Sym") | ||
| ``` | ||
|
|
||
| ### Filter by multiple categorical variables | ||
|
|
||
| To filter on multiple columns, provide columns to `filter_by`. The chart is filtered to match the values of the filter variables from the corresponding input filters or links. If the input filters or links are not set, all groups of variables are shown. | ||
|
|
||
| ```python skip-test | ||
| import deephaven.plot.express as dx | ||
|
|
||
| stocks = dx.data.stocks() # import the example stocks data set | ||
|
|
||
| # specify `x` and `y` columns, as well as additional filter variables with `filter_by` | ||
| filtered_line_plot = dx.line( | ||
|     stocks, x="Timestamp", y="Price", filter_by=["Sym", "Exchange"] | ||
| ) | ||
| ``` | ||
|
|
||
| ### Filter by a required variable | ||
|
|
||
| To require a filter on a column, provide a column to `required_filter_by`. The chart is filtered to match the value of the filter variable from the corresponding input filter or link. If the input filter or link is not set, no data is shown. | ||
|
|
||
| ```python skip-test | ||
| import deephaven.plot.express as dx | ||
|
|
||
| stocks = dx.data.stocks() # import the example stocks data set | ||
|
|
||
| # specify `x` and `y` columns, as well as additional filter variables with `required_filter_by` | ||
| filtered_line_plot = dx.line(stocks, x="Timestamp", y="Price", required_filter_by="Sym") | ||
| ``` | ||
|
|
||
| ### Filter by optional and required variables | ||
|
|
||
| To mix optional and required filters, provide columns to both `filter_by` and `required_filter_by`. The chart is filtered to match the values of the filter variables from the corresponding input filters or links. If only the `required_filter_by` input filter or link is not set, no data is shown. If only the `filter_by` input filter or link is not set, all groups within the `filter_by` column are shown. | ||
|
|
||
| > [!NOTE] | ||
| > Mixing optional and required filters displays overlays prompting for filters on all columns. Only the `required_filter_by` filters are actually required, and the overlay message is dismissed once all of those are provided. | ||
|
|
||
| ```python skip-test | ||
| import deephaven.plot.express as dx | ||
|
|
||
| stocks = dx.data.stocks() # import the example stocks data set | ||
|
|
||
| # specify `x` and `y` columns, as well as additional filter variables with `filter_by` and `required_filter_by` | ||
| filtered_line_plot = dx.line( | ||
|     stocks, x="Timestamp", y="Price", filter_by="Sym", required_filter_by="Exchange" | ||
| ) | ||
| ``` | ||
|
|
||
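The rules for optional and required filters can be summarized in a small, self-contained sketch. It models the behavior described in this section and is not the `dx` implementation: `resolve_groups` is a hypothetical helper, the group values are made-up sample data, and `filters` stands in for whatever the input filters or links currently provide.

```python
def resolve_groups(groups, filters, filter_by=(), required_filter_by=()):
    """Return the groups to chart, or [] when a required filter is unset.

    groups: list of dicts mapping column name -> value, one dict per group.
    filters: dict of column name -> value currently set by input filters or links.
    """
    # Any unset required filter means no data is shown.
    if any(col not in filters for col in required_filter_by):
        return []
    active = [c for c in (*filter_by, *required_filter_by) if c in filters]
    # Unset optional filters place no constraint, so all of their groups pass.
    return [g for g in groups if all(g[c] == filters[c] for c in active)]


# Made-up groups for illustration
groups = [
    {"Sym": "CAT", "Exchange": "NYPE"},
    {"Sym": "DOG", "Exchange": "NYPE"},
    {"Sym": "CAT", "Exchange": "PETX"},
]

# Required filter unset: nothing is charted
print(resolve_groups(groups, {}, ["Sym"], ["Exchange"]))
# Required filter set, optional filter unset: every "Sym" within the exchange
print(resolve_groups(groups, {"Exchange": "NYPE"}, ["Sym"], ["Exchange"]))
```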
| ### Filter by and plot by | ||
|
|
||
| To mix a filter and [plot by](plot-by.md), provide columns to both `filter_by` and `by`. By default, all grouping variables within the columns are shown for `by` and `filter_by`. | ||
|
|
||
| ```python skip-test | ||
| import deephaven.plot.express as dx | ||
|
|
||
| stocks = dx.data.stocks() # import the example stocks data set | ||
|
|
||
| # specify `x` and `y` columns, as well as a plot by variable with `by` and a filter variable with `filter_by` | ||
| filtered_line_plot = dx.line( | ||
|     stocks, x="Timestamp", y="Price", by="Sym", filter_by="Exchange" | ||
| ) | ||
| ``` | ||
|
|
||
| ### `PartitionedTable` filter by | ||
|
|
||
| Providing a `PartitionedTable` defaults to a [plot by](plot-by.md) for the key columns that the table is partitioned on. Set `filter_by=True` to make the columns filters instead. | ||
|
||
|
|
||
| ```python skip-test | ||
| import deephaven.plot.express as dx | ||
|
|
||
| # import and partition on example stocks data set | ||
| stocks = dx.data.stocks() | ||
| partitioned_stocks = stocks.partition_by(["Sym", "Exchange"]) | ||
|
|
||
| # specify `x` and `y` columns, and make "Sym" and "Exchange" filters | ||
| filtered_line_plot = dx.line( | ||
|     partitioned_stocks, x="Timestamp", y="Price", filter_by=True | ||
| ) | ||
| ``` | ||
|
|
||
| ### `PartitionedTable` required filter by | ||
|
|
||
| Providing a `PartitionedTable` defaults to a [plot by](plot-by.md) for the key columns that the table is partitioned on. Set `required_filter_by=True` to make the columns required filters instead. | ||
|
||
|
|
||
| ```python skip-test | ||
| import deephaven.plot.express as dx | ||
|
|
||
| # import and partition on example stocks data set | ||
| stocks = dx.data.stocks() | ||
| partitioned_stocks = stocks.partition_by(["Sym", "Exchange"]) | ||
|
|
||
| # specify `x` and `y` columns, and make "Sym" and "Exchange" required filters | ||
| filtered_line_plot = dx.line( | ||
|     partitioned_stocks, x="Timestamp", y="Price", required_filter_by=True | ||
| ) | ||
| ``` | ||
|
|
||
| ### `PartitionedTable` filter by and plot by | ||
|
|
||
| Providing a `PartitionedTable` defaults to a [plot by](plot-by.md) for the key columns that the table is partitioned on. Set `filter_by` to a subset of the key columns to make those columns filters instead. | ||
|
||
|
|
||
| ```python skip-test | ||
| import deephaven.plot.express as dx | ||
|
|
||
| # import and partition on example stocks data set | ||
| stocks = dx.data.stocks() | ||
| partitioned_stocks = stocks.partition_by(["Sym", "Exchange"]) | ||
|
|
||
| # specify `x` and `y` columns, and make "Sym" a filter, maintaining "Exchange" as a plot by | ||
| filtered_line_plot = dx.line( | ||
|     partitioned_stocks, x="Timestamp", y="Price", filter_by="Sym" | ||
| ) | ||
| ``` | ||
|
|
||
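The three `PartitionedTable` cases above differ only in how the key columns are divided between plot by and filter roles. The sketch below models that split as described in this document. `split_key_columns` is a hypothetical helper for illustration, not part of the `dx` API, and it assumes that `True` means "all key columns" and that a string or list names a subset.

```python
def split_key_columns(key_columns, filter_by=None, required_filter_by=None):
    """Split partition key columns into plot by, filter, and required filter roles."""

    def as_list(arg):
        if arg is True:
            return list(key_columns)  # True selects every key column
        if arg is None or arg is False:
            return []
        return [arg] if isinstance(arg, str) else list(arg)

    filters = as_list(filter_by)
    required = as_list(required_filter_by)
    # Key columns not claimed by either filter argument remain plot by columns.
    by = [c for c in key_columns if c not in filters and c not in required]
    return {"by": by, "filter_by": filters, "required_filter_by": required}


keys = ["Sym", "Exchange"]
print(split_key_columns(keys))  # default: all key columns are plot by
print(split_key_columns(keys, filter_by=True))  # all key columns become filters
print(split_key_columns(keys, filter_by="Sym"))  # "Exchange" stays a plot by
```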
| ### Subplot filter by | ||
|
|
||
| `make_subplots` maintains any `filter_by` and `required_filter_by` filter columns originally passed into the subplots. | ||
|
|
||
| > [!WARNING] | ||
| > Multiple filters with the same name but different types are not currently supported. Rename columns so that they are unique if necessary. | ||
|
||
|
|
||
| ```python skip-test | ||
| import deephaven.plot.express as dx | ||
|
|
||
| stocks = dx.data.stocks() # import the example stocks data set | ||
|
|
||
| # specify `x` and `y` columns, as well as additional filter variables with `filter_by` | ||
| filtered_sym_line_plot = dx.line( | ||
|     stocks, | ||
|     x="Timestamp", | ||
|     y="Price", | ||
|     filter_by="Sym", | ||
| ) | ||
|
|
||
| # specify `x` and `y` columns, as well as additional required filter variables with `required_filter_by` | ||
| filtered_exchange_line_plot = dx.line( | ||
|     stocks, x="Timestamp", y="Price", by="Sym", required_filter_by="Exchange" | ||
| ) | ||
|
|
||
| # make subplots, maintaining the filters | ||
| filtered_plots = dx.make_subplots( | ||
|     filtered_sym_line_plot, filtered_exchange_line_plot, rows=2 | ||
| ) | ||
| ``` | ||
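Because filters with the same name but different types are not supported, it may help to check for conflicts before combining plots. The following is a minimal sketch of such a pre-check. `find_conflicting_filters` is a hypothetical helper, not part of `dx`, and each schema is assumed to be a plain mapping from filter column name to a type name.

```python
def find_conflicting_filters(*schemas):
    """Return filter column names that appear with more than one type.

    Each schema maps a filter column name to a type name (a string here).
    """
    seen = {}
    for schema in schemas:
        for name, type_name in schema.items():
            seen.setdefault(name, set()).add(type_name)
    return sorted(name for name, types in seen.items() if len(types) > 1)


# Same name and type in both plots: no conflict
print(find_conflicting_filters({"Sym": "String"}, {"Sym": "String", "Exchange": "String"}))
# Same name with different types: rename one of the columns before plotting
print(find_conflicting_filters({"Id": "int"}, {"Id": "String"}))
```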
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,6 +3,7 @@ | |
| import json | ||
| from functools import partial | ||
| from typing import Any | ||
| import io | ||
|
|
||
| from deephaven.plugin.object_type import MessageStream | ||
| from deephaven.table_listener import listen, TableUpdate | ||
|
|
@@ -169,9 +170,19 @@ def process_message( | |
|
|
||
|         """ | ||
|         # need to create a new exporter for each message | ||
|         message = json.loads(payload.decode()) | ||
|         message = json.loads(io.BytesIO(payload).read().decode()) | ||
|         if message["type"] == "RETRIEVE": | ||
|             return self._handle_retrieve_figure() | ||
|         elif message["type"] == "FILTER": | ||
|             self._figure.update_filters(message["filterMap"]) | ||
|             revision = self._revision_manager.get_revision() | ||
|             # updating the filters automatically recreates the figure, so it's ready to send | ||
|             figure = self._get_figure() | ||
|             try: | ||
|                 self._connection.on_data(*self._build_figure_message(figure, revision)) | ||
|             except RuntimeError: | ||
|                 # trying to send data when the connection is closed, ignore | ||
|                 pass | ||
|
Comment on lines +176 to +185
Member
Something we're going to want to think about going forward is how this ties into sharing tables between workers, and resolving on the client: https://deephaven.atlassian.net/issues/DH-19001
Of course this flow becomes more complicated the more server side processing we do before passing the data table to the client. We're a ways away from this but it's something I want to keep in my mind...
Collaborator
Author
I think, as you allude to, the problem we're really going to run into is the server-side processing. Ultimately, I think it would be relatively straightforward to detect that there is a URI passed in, suspend processing, and save off arguments, similar to what happens in the background with plot by charts. So now we're back to retrieving the suspended chart, and doing the table processing. It seems like the problem really is that we don't want to retrieve a table and then have all the table processing happen in a user's code studio, in case the table is just too big. In my fantasy land, what I'd really want is some sort of identical table interface where I can just say "use this worker to do everything with this table but I can still pull out what I need here". Maybe like a Otherwise, I think there just has to be the concept of a worker that handles the table transformations. As in I could pass in |
||
|         return b"", [] | ||
|
|
||
|     def __del__(self): | ||
|
|
||
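The message handling in the diff above can be exercised in isolation with a small sketch. Everything here is an assumption made for illustration: `FigureStub`, `handle_payload`, the `send` callback, and the `"FIGURE"` reply type are hypothetical, and `filterMap` is assumed to be a simple mapping of column name to value. The sketch keeps the same shape as the change: decode the JSON payload, dispatch on `type`, and ignore a `RuntimeError` raised when sending on a closed connection.

```python
import io
import json


class FigureStub:
    """Hypothetical stand-in for the figure object; records the active filters."""

    def __init__(self):
        self.filters = {}

    def update_filters(self, filter_map):
        self.filters = dict(filter_map)


def handle_payload(figure, payload, send):
    """Decode a JSON payload and dispatch on its "type" field."""
    message = json.loads(io.BytesIO(payload).read().decode())
    if message["type"] == "RETRIEVE":
        # Reply directly with the current state of the figure.
        return json.dumps({"type": "FIGURE", "filters": figure.filters}).encode(), []
    elif message["type"] == "FILTER":
        figure.update_filters(message["filterMap"])
        try:
            send({"type": "FIGURE", "filters": figure.filters})
        except RuntimeError:
            # The connection may already be closed; drop the update.
            pass
    return b"", []


sent = []
figure = FigureStub()
filter_payload = json.dumps({"type": "FILTER", "filterMap": {"Sym": "CAT"}}).encode()
handle_payload(figure, filter_payload, sent.append)
print(sent)
```

Passing the sender in as a callback keeps the sketch testable without a live connection; the real plugin sends through its own connection object.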