|
19 | 19 | "cell_type": "markdown", |
20 | 20 | "metadata": {}, |
21 | 21 | "source": [ |
22 | | - "The `hvsampledata.synthetic_clusters` dataset is in many examples below." |
| 22 | + "The `hvsampledata.synthetic_clusters` dataset is used in many examples below. It is returned as a DataFrame that combines five sub-datasets. In each sub-dataset, the x and y coordinates are drawn from a normal distribution centered at a specific (x, y) location, with standard deviations derived from a power law, resulting in clusters that range from very dense to very scattered. Each point also carries a `val` (`0` to `4`) and a `cat` (`d1` to `d5`) column identifying its sub-dataset and category. The dataset contains 1,000,000 points in total, evenly split across the five distributions." |
23 | 23 | ] |
24 | 24 | }, |
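To make the structure of this dataset concrete, the sketch below builds a DataFrame with the same layout using NumPy and pandas. The helper name `make_synthetic_clusters`, the cluster centers and the standard deviations are hypothetical placeholders, not the values used by `hvsampledata`; the examples below always load the real data with `hvsampledata.synthetic_clusters("pandas")`.

```python
import numpy as np
import pandas as pd


def make_synthetic_clusters(n_per_cluster=200_000, seed=1):
    """Hypothetical stand-in for hvsampledata.synthetic_clusters("pandas")."""
    rng = np.random.default_rng(seed)
    # Hypothetical centers and spreads (tight to loose), NOT the real dataset's values.
    centers = [(2, 2), (2, -2), (-2, -2), (-2, 2), (0, 0)]
    stds = [0.03, 0.1, 0.5, 1.0, 3.0]
    frames = []
    for i, ((cx, cy), std) in enumerate(zip(centers, stds)):
        frames.append(pd.DataFrame({
            "x": rng.normal(cx, std, n_per_cluster),
            "y": rng.normal(cy, std, n_per_cluster),
            "val": np.full(n_per_cluster, i),  # 0 to 4, identifies the sub-dataset
            "cat": f"d{i + 1}",                # d1 to d5, broadcast to every row
        }))
    df = pd.concat(frames, ignore_index=True)
    # Categorical reductions in Datashader generally expect a categorical dtype.
    df["cat"] = df["cat"].astype("category")
    return df


df = make_synthetic_clusters()
print(len(df))  # 1000000
print(df["cat"].value_counts().sort_index())
```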
25 | 25 | { |
|
61 | 61 | "- Selection of data from a dimension of the supplied dataset, or the index of the corresponding row in the dataset, including: `'first'`, `'last'`, `'min'`, `'max'`.\n", |
62 | 62 | "\n", |
63 | 63 | "`aggregator` accepts either:\n", |
64 | | - "- A [Datashader reduction object](https://datashader.org/api.html#reductions), such as `ds.count()` or `ds.mean('val')`.\n", |
| 64 | + "- A [Datashader reduction instance](https://datashader.org/api.html#reductions), such as `ds.count()` or `ds.mean('val')`.\n", |
65 | 65 | "- A string (e.g. `'mean'`, `'count'`, `'min'`, `'max'`, etc.), in which case the aggregated dimension can be defined by setting the [`color`](option-color) option (if not, the first non-coordinate variable found is used).\n", |
66 | 66 | "\n", |
67 | 67 | "The `'count_cat'` or `'by'` aggregators can be used for categorical data. `ds.by(<column>, <reduction>)` lets you define the per-category reduction function (default is `count`). Alternatively, setting the [`by`](option-by) option to a categorical column is equivalent to setting `aggregator=ds.by(<cat_column>)`.\n", |
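Conceptually, an aggregator reduces all the points falling into each pixel of a canvas to a single value. The NumPy-only sketch below mimics what `ds.count()`, `ds.mean('val')` and `ds.by(<cat_column>)` compute on a hypothetical 20x20 canvas. It is not how Datashader implements these reductions, the data is random, and `val` doubles as the category column since, in the synthetic dataset, `val` and `cat` both identify the sub-dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(0, 1, n)
y = rng.normal(0, 1, n)
val = rng.integers(0, 5, n)  # stand-in for the `val` column (0 to 4)

# Hypothetical 20x20 canvas covering [-3, 3] on both axes; points outside are dropped.
edges = np.linspace(-3, 3, 21)

# Like ds.count(): number of points per pixel (rows index y, columns index x).
count, _, _ = np.histogram2d(y, x, bins=[edges, edges])

# Like ds.mean('val'): per-pixel sum of `val` divided by the per-pixel count.
val_sum, _, _ = np.histogram2d(y, x, bins=[edges, edges], weights=val)
with np.errstate(invalid="ignore", divide="ignore"):
    mean_val = val_sum / count  # NaN for empty pixels

# Like ds.by('cat', ds.count()): one count layer per category.
by_cat = np.stack([
    np.histogram2d(y[val == k], x[val == k], bins=[edges, edges])[0]
    for k in range(5)
])
print(count.shape, mean_val.shape, by_cat.shape)  # (20, 20) (20, 20) (5, 20, 20)
```

Summing the per-category layers of `by_cat` gives back the plain `count` aggregate, which is why a categorical aggregation carries strictly more information than a single count.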
|
133 | 133 | "The next examples show how to leverage `ds.summary()` and `ds.where()`. Hover over the plots to see what information is made available in the tooltip." |
134 | 134 | ] |
135 | 135 | }, |
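To clarify what these two reductions return, here is a pandas sketch of their per-pixel semantics on a hypothetical 4x4 canvas: `ds.summary()` bundles several reductions into one aggregate, while `ds.where(ds.min('s'), 'val')` returns, for each pixel, the `val` of the row whose `s` is minimal. The data is random and the `groupby` emulation only reproduces the results, not Datashader's implementation.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.uniform(0, 1, 1_000),
    "y": rng.uniform(0, 1, 1_000),
    "s": rng.uniform(0, 1, 1_000),
    "val": rng.integers(0, 5, 1_000),
})

# Pixel coordinates on a hypothetical 4x4 canvas spanning [0, 1) in x and y.
df["px"] = (df["x"] * 4).astype(int)
df["py"] = (df["y"] * 4).astype(int)
pixels = df.groupby(["py", "px"])

# Like ds.summary(min_s=ds.min('s'), min_val=ds.min('val')): several reductions at once.
summary = pixels.agg(min_s=("s", "min"), min_val=("val", "min"))

# Like ds.where(ds.min('s'), 'val'): `val` taken from the row where `s` is minimal.
where = df.loc[pixels["s"].idxmin(), ["py", "px", "val"]].set_index(["py", "px"])["val"]

print(summary.head())
print(where.head())
```

Note that `summary["min_val"]` and `where` generally differ: the first is the smallest `val` in the pixel, the second is the `val` of one specific row.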
136 | | - { |
137 | | - "cell_type": "code", |
138 | | - "execution_count": null, |
139 | | - "metadata": {}, |
140 | | - "outputs": [], |
141 | | - "source": [ |
142 | | - "ds.summary(min_s=ds.min('s'), min_val=ds.min('val'))" |
143 | | - ] |
144 | | - }, |
145 | | - { |
146 | | - "cell_type": "code", |
147 | | - "execution_count": null, |
148 | | - "metadata": {}, |
149 | | - "outputs": [], |
150 | | - "source": [ |
151 | | - "ds.where(ds.min('s'), 'val')" |
152 | | - ] |
153 | | - }, |
154 | | - { |
155 | | - "cell_type": "code", |
156 | | - "execution_count": null, |
157 | | - "metadata": {}, |
158 | | - "outputs": [], |
159 | | - "source": [ |
160 | | - "ds.summary(min_s=ds.min('s'), min_val=ds.min('val'))" |
161 | | - ] |
162 | | - }, |
163 | 136 | { |
164 | 137 | "cell_type": "code", |
165 | 138 | "execution_count": null, |
|
199 | 172 | "This approach can turn even the largest datasets into an image that captures patterns such as density or value distribution, making it ideal for high-volume scatter plots. When `datashade=True`, hvPlot returns a [`DynamicMap`](inv:holoviews#reference/containers/bokeh/DynamicMap) containing an [`RGB`](inv:holoviews#reference/elements/bokeh/RGB) instead of individual glyphs.\n", |
200 | 173 | "\n", |
201 | 174 | ":::{tip}\n", |
202 | | - "Since `datashade=True` produces an RGB image, the underlying data (e.g. the aggregated values per pixel) is not directly available to the plot. Enabling the `'hover'` [tool](options-hover) (disabled by default when `datashade=True`) would only show the RGB value per pixel, and no meaningful colorbar can be attached to the plot. To let the frontend apply colormapping instead of the backend, and as a consequence expose the underlying data, we recommend setting [`rasterize=True`](option-rasterize) instead of `datashade=True`.\n", |
| 175 | + "Since `datashade=True` produces an RGB image, the underlying data (e.g. the aggregated values per pixel) is not directly available to the plot. Enabling the `'hover'` [tool](options-hover) (disabled by default when `datashade=True` unless [`selector`](option-selector) is set) would only show the RGB value per pixel, and no meaningful colorbar can be attached to the plot. To let the frontend apply colormapping instead of the backend, and as a consequence expose the underlying data, we recommend setting [`rasterize=True`](option-rasterize) instead of `datashade=True`.\n", |
203 | 176 | ":::\n", |
204 | 177 | "\n", |
205 | 178 | "The [`cnorm`](option-cnorm) option defaults to `'eq_hist'` when `datashade=True`." |
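The sketch below contrasts how `'linear'`, `'log'` and `'eq_hist'` normalization map a heavy-tailed array of made-up counts to the [0, 1] range used to pick colors. The helper `normalize` is hypothetical and simplified (here `'eq_hist'` is a plain empirical CDF); Datashader's own implementations are more elaborate, so treat this as an illustration of the idea only.

```python
import numpy as np


def normalize(agg, how="eq_hist"):
    """Map non-empty (> 0) pixels of an aggregate to (0, 1]; empty pixels become NaN."""
    out = np.full(agg.shape, np.nan)
    mask = agg > 0
    values = agg[mask].astype(float)
    if values.size == 0:
        return out
    if how == "linear":
        out[mask] = values / values.max()
    elif how == "log":
        out[mask] = np.log1p(values) / np.log1p(values.max())
    elif how == "eq_hist":
        # Empirical CDF: fraction of non-empty pixels whose value is <= each value.
        ranks = np.searchsorted(np.sort(values), values, side="right")
        out[mask] = ranks / values.size
    else:
        raise ValueError(f"unknown cnorm {how!r}")
    return out


rng = np.random.default_rng(0)
counts = np.floor(rng.pareto(1.0, size=(100, 100)) * 10)  # heavy-tailed, like dense clusters

for how in ("linear", "log", "eq_hist"):
    print(how, round(float(np.nanmedian(normalize(counts, how))), 3))
```

With heavy-tailed counts the linear median lands close to 0 (most pixels get nearly the same faint color), while the equalized median lands near 0.5, spreading pixels evenly across the colormap.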
|
216 | 189 | "\n", |
217 | 190 | "df = hvsampledata.synthetic_clusters(\"pandas\")\n", |
218 | 191 | "\n", |
219 | | - "df.hvplot.scatter(\n", |
| 192 | + "df.hvplot.points(\n", |
220 | 193 | " x='x', y='y', datashade=True, data_aspect=1, frame_height=250,\n", |
221 | | - " title='Datashaded scatter plot with\\n\"count\" aggregator and\\n\"eq_hist\" cnorm'\n", |
| 194 | + " title='Datashaded points plot with\\n\"count\" aggregator and\\n\"eq_hist\" cnorm'\n", |
222 | 195 | ")" |
223 | 196 | ] |
224 | 197 | }, |
|
305 | 278 | " x='x', y='y', frame_height=250, data_aspect=1,\n", |
306 | 279 | " xlim=(-5.5, -5), ylim=(2.5, 3),\n", |
307 | 280 | ")\n", |
308 | | - "df.hvplot.scatter(\n", |
| 281 | + "df.hvplot.points(\n", |
309 | 282 | " rasterize=True, dynspread=False,\n", |
310 | 283 | " title=\"Datashade without dynspread\", **plot_opts,\n", |
311 | 284 | ") +\\\n", |
312 | | - "df.hvplot.scatter(\n", |
| 285 | + "df.hvplot.points(\n", |
313 | 286 | " rasterize=True, dynspread=True,\n", |
314 | 287 | " title=\"Datashade with dynspread\", **plot_opts,\n", |
315 | 288 | ")" |
|
339 | 312 | " x='x', y='y', frame_height=250, data_aspect=1,\n", |
340 | 313 | " xlim=(-5.5, -5), ylim=(2.5, 3),\n", |
341 | 314 | ")\n", |
342 | | - "df.hvplot.scatter(\n", |
| 315 | + "df.hvplot.points(\n", |
343 | 316 | " rasterize=True, dynspread=True,\n", |
344 | 317 | " title=\"Dynspread with max_px=3 (default)\", **plot_opts,\n", |
345 | 318 | ") +\\\n", |
346 | | - "df.hvplot.scatter(\n", |
| 319 | + "df.hvplot.points(\n", |
347 | 320 | " rasterize=True, dynspread=True, max_px=8,\n", |
348 | 321 | " title=\"Dynspread with max_px=8\", **plot_opts\n", |
349 | 322 | ")" |
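The idea behind `dynspread` can be sketched with NumPy: grow every non-empty pixel into a square of side `2 * px + 1`, starting at `px = 0` and increasing `px` until the fraction of non-empty pixels touching another non-empty pixel reaches `threshold`, or `max_px` is reached. This is a simplified, hypothetical re-implementation (square shape, max compositing, made-up helpers `spread`, `density` and `dynspread_sketch`); Datashader's own `dynspread` may use a different shape and density heuristic.

```python
import numpy as np


def spread(img, px):
    """Grow each non-empty pixel into a (2*px+1)-wide square, keeping the max value."""
    h, w = img.shape
    padded = np.pad(img, px)  # zero padding so edge pixels can spread outward
    out = np.zeros_like(img)
    for dy in range(2 * px + 1):
        for dx in range(2 * px + 1):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out


def density(img):
    """Fraction of non-empty pixels that have at least one non-empty 8-neighbour."""
    filled = img > 0
    if not filled.any():
        return 0.0
    h, w = filled.shape
    padded = np.pad(filled, 1)
    has_neighbour = np.zeros_like(filled)
    for dy in range(3):
        for dx in range(3):
            if (dy, dx) != (1, 1):  # skip the pixel itself
                has_neighbour |= padded[dy:dy + h, dx:dx + w]
    return (filled & has_neighbour).sum() / filled.sum()


def dynspread_sketch(img, threshold=0.5, max_px=3):
    """Return the spread image and the px used, stopping once density >= threshold."""
    for px in range(max_px + 1):
        out = spread(img, px)
        if density(out) >= threshold:
            return out, px
    return out, max_px


# A sparse 50x50 aggregate with three isolated pixels (hypothetical data).
img = np.zeros((50, 50))
img[5, 5], img[25, 30], img[40, 10] = 1, 4, 2

for threshold in (0.0, 0.5, 1.0):
    _, px = dynspread_sketch(img, threshold=threshold)
    print(threshold, px)
```

In this sketch, `max_px` only acts as a cap: raising it changes nothing once the density target is met at a smaller `px`.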
|
383 | 356 | "\n", |
384 | 357 | "df = hvsampledata.synthetic_clusters(\"pandas\")\n", |
385 | 358 | "\n", |
386 | | - "df.hvplot.scatter(\n", |
| 359 | + "df.hvplot.points(\n", |
387 | 360 | " x='x', y='y', datashade=True, pixel_ratio=0.1, frame_height=250,\n", |
388 | 361 | " data_aspect=1, title=\"Datashade with low pixel ratio\"\n", |
389 | 362 | ")" |
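A low `pixel_ratio` means the data is aggregated onto a canvas with fewer cells than the plot frame has pixels, so each aggregated cell is stretched over several screen pixels. The sketch below assumes a square 250-pixel frame and a canvas size equal to the frame size multiplied by `pixel_ratio` (250 x 0.1 = 25 cells per side); it illustrates the effect with `np.histogram2d`, not the exact sizing logic of hvPlot or HoloViews.

```python
import numpy as np

frame_width = frame_height = 250  # assumed square frame
pixel_ratio = 0.1

# Assumed rule: canvas size = frame size scaled by pixel_ratio (at least 1 cell).
canvas_w = max(1, int(frame_width * pixel_ratio))
canvas_h = max(1, int(frame_height * pixel_ratio))

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 100_000)
y = rng.normal(0, 1, 100_000)

# Aggregate onto the coarse canvas (rows index y, columns index x).
counts, _, _ = np.histogram2d(y, x, bins=[canvas_h, canvas_w])

# Stretch each cell over frame_size // canvas_size screen pixels for display.
display = np.repeat(
    np.repeat(counts, frame_height // canvas_h, axis=0),
    frame_width // canvas_w,
    axis=1,
)
print(counts.shape, display.shape)  # (25, 25) (250, 250)
```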
|
430 | 403 | "\n", |
431 | 404 | "df = hvsampledata.synthetic_clusters(\"pandas\")\n", |
432 | 405 | "\n", |
433 | | - "df.hvplot.scatter(\n", |
| 406 | + "df.hvplot.points(\n", |
434 | 407 | " x='x', y='y', rasterize=True, data_aspect=1, frame_height=250, cnorm='log',\n", |
435 | | - " title='Rasterized scatter with count aggregator\\nand log cnorm'\n", |
| 408 | + " title='Rasterized points with count aggregator\\nand log cnorm'\n", |
436 | 409 | ")" |
437 | 410 | ] |
438 | 411 | }, |
|
464 | 437 | "\n", |
465 | 438 | "df = hvsampledata.synthetic_clusters(\"pandas\")\n", |
466 | 439 | "\n", |
467 | | - "df.hvplot.scatter(\n", |
| 440 | + "df.hvplot.points(\n", |
468 | 441 | " x='x', y='y', rasterize=True, resample_when=1_000,\n", |
469 | 442 | " data_aspect=1, frame_height=250, cnorm='log',\n", |
470 | 443 | " title=\"Rasterize only when >1000 points in view\"\n", |
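The decision made by `resample_when` can be pictured as a count of the points inside the current viewport: at or below the threshold the raw points are drawn, above it the data is rasterized. The function `should_rasterize` below is a hypothetical NumPy sketch of that rule, following the `>1000 points` wording of the example title; how HoloViews actually tracks the viewport and performs the comparison is not shown here.

```python
import numpy as np


def should_rasterize(x, y, xlim, ylim, resample_when=1_000):
    """Return True when more than `resample_when` points fall inside the viewport."""
    in_view = (
        (x >= xlim[0]) & (x <= xlim[1])
        & (y >= ylim[0]) & (y <= ylim[1])
    )
    return int(in_view.sum()) > resample_when


rng = np.random.default_rng(0)
x = rng.normal(0, 1, 100_000)
y = rng.normal(0, 1, 100_000)

print(should_rasterize(x, y, xlim=(-3, 3), ylim=(-3, 3)))      # zoomed out: True
print(should_rasterize(x, y, xlim=(0, 0.05), ylim=(0, 0.05)))  # zoomed in: False
```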
|
478 | 451 | "When running the code above, you will notice that after zooming in enough, the original data points appear. This gives a hybrid experience: raw points at low density, rasterized aggregates when zoomed out." |
479 | 452 | ] |
480 | 453 | }, |
| 454 | + { |
| 455 | + "cell_type": "markdown", |
| 456 | + "metadata": {}, |
| 457 | + "source": [ |
| 458 | + "(option-selector)=\n", |
| 459 | + "## `selector`\n", |
| 460 | + "\n", |
| 461 | + ":::{versionadded} 0.12.0\n", |
| 462 | + "Requires `holoviews>=1.21`.\n", |
| 463 | + "Requires `bokeh>=3.7`.\n", |
| 464 | + ":::\n", |
| 465 | + "\n", |
| 466 | + "When a Datashader operation is applied, with [`datashade=True`](option-datashade) or [`rasterize=True`](option-rasterize), the `selector` option lets you augment the tooltip with information computed (*selected*) from variables other than the aggregated one, effectively showing a sample of the dataset in the tooltip.\n", |
| 467 | + "\n", |
| 468 | + "Datashader operations make it easy to identify *macro-level patterns* in large datasets by aggregating the data appropriately. However, by default they do not expose information about *individual data points*. Take for example a simple scatter plot with `rasterize=True`: hovering over the image only displays the aggregated value per pixel (`'count'` by default), with no way to learn more about each point (unless [`resample_when`](option-resample_when) is enabled and the user zooms in enough). Setting `selector` in this case augments the tooltip with sample information from other variables, selected from *one unique row* of the dataset. Find out more about `selector` in HoloViews' [Interactive Hover for Big Data guide](https://dev.holoviews.org/user_guide/Interactive_Hover_for_Big_Data.html).\n", |
| 469 | + "\n", |
| 470 | + "Like the [`aggregator`](option-aggregator) option, a `selector` refers to a [Datashader `Reduction` object](https://datashader.org/api.html#reductions). However, unlike `aggregator`, which accepts reductions that combine the data in a pixel (e.g. `'mean'` or `'count'`), `selector` only accepts reductions that *select* values, including `'first'`, `'last'`, `'min'`, and `'max'`. Valid options include:\n", |
| 471 | + "- A string, for reductions that do not require a variable name, including `'first'` and `'last'`.\n", |
| 472 | + "- A 2-tuple with a reduction name and a variable name, for reductions that require a variable name, including `'min'` and `'max'` (e.g. `('min', 'column')`).\n", |
| 473 | + "- A reduction instance, including `ds.first()`, `ds.last()`, `ds.min()`, and `ds.max()`.\n", |
| 474 | + "\n", |
| 475 | + ":::{note}\n", |
| 476 | + "The hover tooltip always requires a live kernel when `selector` is set, as the displayed values need to be sent by the Python server. Without a live kernel, as on this webpage, all the values are displayed as `'undefined'`.\n", |
| 477 | + ":::\n", |
| 478 | + "\n", |
| 479 | + "When you hover over the first plot below, you will see a value for `s`, `val`, and `cat` in the bottom part of the tooltip. All these values originate from the same row in the DataFrame, namely the first row found among the points that fall within the hovered pixel. In the second plot, the values displayed are derived from the row where `val` is minimum within the hovered pixel." |
| 480 | + ] |
| 481 | + }, |
| 482 | + { |
| 483 | + "cell_type": "code", |
| 484 | + "execution_count": null, |
| 485 | + "metadata": {}, |
| 486 | + "outputs": [], |
| 487 | + "source": [ |
| 488 | + "import hvplot.pandas # noqa\n", |
| 489 | + "import hvsampledata\n", |
| 490 | + "\n", |
| 491 | + "df = hvsampledata.synthetic_clusters(\"pandas\")\n", |
| 492 | + "\n", |
| 493 | + "plot_opts = dict(x='x', y='y', rasterize=True, data_aspect=1, frame_height=250, cnorm='log')\n", |
| 494 | + "(\n", |
| 495 | + " df.hvplot.points(selector='first', title='selector=\"first\"', **plot_opts) +\n", |
| 496 | + " df.hvplot.points(selector=('min', 'val'), title='selector=(\"min\", \"val\")', **plot_opts)\n", |
| 497 | + ")" |
| 498 | + ] |
| 499 | + }, |
| 500 | + { |
| 501 | + "cell_type": "markdown", |
| 502 | + "metadata": {}, |
| 503 | + "source": [ |
| 504 | + "`datashade=True` plots get their hover tool enabled by default when `selector` is set." |
| 505 | + ] |
| 506 | + }, |
| 507 | + { |
| 508 | + "cell_type": "code", |
| 509 | + "execution_count": null, |
| 510 | + "metadata": {}, |
| 511 | + "outputs": [], |
| 512 | + "source": [ |
| 513 | + "import datashader as ds\n", |
| 514 | + "import hvplot.pandas # noqa\n", |
| 515 | + "import hvsampledata\n", |
| 516 | + "\n", |
| 517 | + "df = hvsampledata.synthetic_clusters(\"pandas\")\n", |
| 518 | + "\n", |
| 519 | + "df.hvplot.points(\n", |
| 520 | + " x='x', y='y', data_aspect=1, frame_height=250, cnorm='log',\n", |
| 521 | + " datashade=True, selector=ds.min('val'), title='datashade=True',\n", |
| 522 | + ")" |
| 523 | + ] |
| 524 | + }, |
| 525 | + { |
| 526 | + "cell_type": "markdown", |
| 527 | + "metadata": {}, |
| 528 | + "source": [ |
| 529 | + "`selector` can also be set when datashading categorical data." |
| 530 | + ] |
| 531 | + }, |
| 532 | + { |
| 533 | + "cell_type": "code", |
| 534 | + "execution_count": null, |
| 535 | + "metadata": {}, |
| 536 | + "outputs": [], |
| 537 | + "source": [ |
| 538 | + "import hvplot.pandas # noqa\n", |
| 539 | + "import hvsampledata\n", |
| 540 | + "import datashader as ds\n", |
| 541 | + "\n", |
| 542 | + "df = hvsampledata.synthetic_clusters(\"pandas\")\n", |
| 543 | + "\n", |
| 544 | + "df.hvplot.points(\n", |
| 545 | + " x='x', y='y', data_aspect=1, frame_height=250, colorbar=False,\n", |
| 546 | + " rasterize=True, aggregator=ds.by('cat'), selector='first',\n", |
| 547 | + " title=\"Categorical rasterizing with\\n'count' aggregator\",\n", |
| 548 | + ")" |
| 549 | + ] |
| 550 | + }, |
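To make the accepted forms of `selector` and their effect more concrete, the pandas sketch below normalizes a string or 2-tuple selector and then picks *one full row* per pixel, which is the kind of information the tooltip exposes. The helpers `normalize_selector` and `selected_rows` and the tiny DataFrame are hypothetical; reduction instances such as `ds.first()` are left out to keep the sketch free of Datashader. In hvPlot, the selection itself is performed by Datashader and HoloViews, and the row values are sent by the Python server on hover.

```python
import numpy as np
import pandas as pd

SELECT_REDUCTIONS = {"first", "last", "min", "max"}
NEEDS_COLUMN = {"min", "max"}


def normalize_selector(selector):
    """Turn 'first' or ('min', 'val') into a (reduction_name, column) pair."""
    if isinstance(selector, str):
        name, column = selector, None
    elif isinstance(selector, tuple) and len(selector) == 2:
        name, column = selector
    else:
        raise TypeError(f"unsupported selector: {selector!r}")
    if name not in SELECT_REDUCTIONS:
        raise ValueError(f"{name!r} does not select a single row")
    if name in NEEDS_COLUMN and column is None:
        raise ValueError(f"{name!r} needs a column, e.g. ({name!r}, 'val')")
    return name, column


def selected_rows(df, px, py, selector):
    """Return one complete row of `df` per (py, px) pixel, chosen by `selector`."""
    name, column = normalize_selector(selector)
    keys = [py, px]
    positions = pd.Series(np.arange(len(df)))
    if name == "first":
        rows = positions.groupby(keys).min()
    elif name == "last":
        rows = positions.groupby(keys).max()
    else:
        grouped = pd.Series(df[column].to_numpy()).groupby(keys)
        rows = grouped.idxmin() if name == "min" else grouped.idxmax()
    out = df.iloc[rows.to_numpy()].copy()
    out.index = rows.index  # one row per pixel, like one tooltip per pixel
    return out


df = pd.DataFrame({
    "x": [0.1, 0.2, 0.6, 0.7],
    "y": [0.1, 0.1, 0.1, 0.1],
    "val": [3, 1, 4, 2],
    "cat": ["d1", "d2", "d3", "d4"],
})
# Pixel indices on a hypothetical 2x2 canvas spanning [0, 1).
px = (df["x"].to_numpy() * 2).astype(int)
py = (df["y"].to_numpy() * 2).astype(int)

print(selected_rows(df, px, py, "first"))
print(selected_rows(df, px, py, ("min", "val")))
```

Whatever the selector, every column of the returned row comes from the same record, which is why the tooltip values for `s`, `val`, and `cat` are always consistent with each other.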
481 | 551 | { |
482 | 552 | "cell_type": "markdown", |
483 | 553 | "metadata": {}, |
|
503 | 573 | " x='x', y='y', datashade=True, dynspread=True,\n", |
504 | 574 | " data_aspect=1, frame_width=200, xlim=(-2, 0), ylim=(7, 9),\n", |
505 | 575 | ")\n", |
506 | | - "df.hvplot.scatter(threshold=0.0, title=\"Dynspread threshold=0.0\", **plot_opts) +\\\n", |
507 | | - "df.hvplot.scatter(threshold=0.5, title=\"Dynspread threshold=0.5\", **plot_opts) +\\\n", |
508 | | - "df.hvplot.scatter(threshold=1.0, title=\"Dynspread threshold=1.0\", **plot_opts)" |
| 576 | + "df.hvplot.points(threshold=0.0, title=\"Dynspread threshold=0.0\", **plot_opts) +\\\n", |
| 577 | + "df.hvplot.points(threshold=0.5, title=\"Dynspread threshold=0.5\", **plot_opts) +\\\n", |
| 578 | + "df.hvplot.points(threshold=1.0, title=\"Dynspread threshold=1.0\", **plot_opts)" |
509 | 579 | ] |
510 | 580 | }, |
511 | 581 | { |
|
529 | 599 | "\n", |
530 | 600 | "df = hvsampledata.synthetic_clusters(\"pandas\")\n", |
531 | 601 | "\n", |
532 | | - "df.hvplot.scatter(\n", |
| 602 | + "df.hvplot.points(\n", |
533 | 603 | " x='x', y='y', rasterize=True, x_sampling=0.1, y_sampling=0.1,\n", |
534 | 604 | " data_aspect=1, cnorm='log', xlim=(0, 1), ylim=(0, 1), frame_height=250,\n", |
535 | 605 | " title='Zoomed in rasterized plot\\nwith custom x/y-sampling'\n", |
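`x_sampling` and `y_sampling` set the smallest sampling interval in data units, which caps the number of bins once you zoom in: with `xlim=(0, 1)` and `x_sampling=0.1`, at most 10 bins span the x-axis regardless of the frame size. The helper `canvas_size` below is a hypothetical sketch of that capping rule, assuming a 250-pixel frame; HoloViews' own computation may round differently.

```python
import numpy as np


def canvas_size(frame_pixels, lo, hi, sampling=None):
    """Number of bins along one axis: the frame size, capped by range / sampling."""
    if sampling is None:
        return frame_pixels
    return max(1, min(frame_pixels, int((hi - lo) / sampling)))


width = canvas_size(250, 0, 1, sampling=0.1)   # x_sampling=0.1, xlim=(0, 1)
height = canvas_size(250, 0, 1, sampling=0.1)  # y_sampling=0.1, ylim=(0, 1)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10_000)
y = rng.uniform(0, 1, 10_000)
counts, _, _ = np.histogram2d(y, x, bins=[height, width], range=[[0, 1], [0, 1]])
print(width, height, counts.shape)  # 10 10 (10, 10)
```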
|