|
| 1 | +--- |
| 2 | +title: bokeh |
| 3 | +author: "anthropic claude-3-5-sonnet-latest" |
| 4 | +date: 2025-01-06 |
| 5 | +--- |
| 6 | + |
| 7 | +## Question |
| 8 | + |
| 9 | +How can we create a working scatter plot matrix (SPLOM) of the iris dataset using Bokeh? |
| 10 | + |
| 11 | +## Overview |
| 12 | + |
| 13 | +We’ll create an interactive scatter plot matrix visualization of the iris dataset using Bokeh, with correct color mapping for different species. |
| 14 | + |
| 15 | +Note: The Bokeh dark theme helper is incomplete, possibly due to a lack of documentation. |
| 16 | + |
| 17 | +## Code |
| 18 | + |
| 19 | +```{python} |
| 20 | +#| echo: false |
| 21 | +from quarto import theme_brand_bokeh |
| 22 | + |
| 23 | +light_theme = theme_brand_bokeh('light-brand.yml') |
| 24 | +dark_theme = theme_brand_bokeh('dark-brand.yml') |
| 25 | +``` |
| 26 | + |
| 27 | +```{python} |
| 28 | +#| echo: false |
| 29 | +#| renderings: [light, dark] |
| 30 | +from bokeh.plotting import figure, show |
| 31 | +from bokeh.layouts import gridplot |
| 32 | +from bokeh.io import output_notebook |
| 33 | +from bokeh.sampledata.iris import flowers |
| 34 | +from bokeh.models import ColumnDataSource |
| 35 | +from bokeh.transform import factor_cmap |
| 36 | + |
| 37 | +# Enable notebook output; hiding banner helps the first plot |
| 38 | +# with issue described below |
| 39 | +output_notebook(hide_banner=True) |
| 40 | + |
| 41 | +# Create ColumnDataSource for the data |
| 42 | +source = ColumnDataSource(flowers) |
| 43 | + |
| 44 | +# Define the features we want to plot |
| 45 | +features = ['petal_length', 'petal_width', 'sepal_length', 'sepal_width'] |
| 46 | + |
| 47 | +# Create color mapper |
| 48 | +color_mapper = factor_cmap('species', |
| 49 | + ['#1f77b4', '#ff7f0e', '#2ca02c'], |
| 50 | + ['setosa', 'versicolor', 'virginica']) |
| 51 | + |
| 52 | +# Create the plots matrix |
| 53 | +plots = [] |
| 54 | +tooltips = [ |
| 55 | + ('Species', '@species'), |
| 56 | + ('Value', '$data_x, $data_y') |
| 57 | +] |
| 58 | + |
| 59 | +for i, y in enumerate(features): |
| 60 | + row = [] |
| 61 | + for x in features: |
| 62 | + plot = figure(width=200, height=200, |
| 63 | + tooltips=tooltips, |
| 64 | + title="" if x != features[0] or i != 0 else "Iris SPLOM") |
| 65 | + |
| 66 | + # Add scatter points with proper color mapping |
| 67 | + plot.scatter(x, y, |
| 68 | + color=color_mapper, |
| 69 | + size=8, |
| 70 | + alpha=0.5, |
| 71 | + legend_field='species', |
| 72 | + source=source) |
| 73 | + |
| 74 | + # Configure axes |
| 75 | + if i != len(features)-1: |
| 76 | + plot.xaxis.visible = False |
| 77 | + else: |
| 78 | + plot.xaxis.axis_label = x |
| 79 | + |
| 80 | + if x != features[0]: |
| 81 | + plot.yaxis.visible = False |
| 82 | + else: |
| 83 | + plot.yaxis.axis_label = y |
| 84 | + |
| 85 | + # Show legend only on top-right plot |
| 86 | + if i != 0 or x != features[-1]: |
| 87 | + plot.legend.visible = False |
| 88 | + else: |
| 89 | + plot.legend.click_policy = "hide" |
| 90 | + |
| 91 | + plot.grid.grid_line_color = None |
| 92 | + row.append(plot) |
| 93 | + plots.append(row) |
| 94 | + |
| 95 | +# Create and show the grid |
| 96 | +grid = gridplot(plots) |
| 97 | + |
| 98 | +light_theme() |
| 99 | +show(grid) |
| 100 | + |
| 101 | +dark_theme() |
| 102 | +show(grid) |
| 103 | +``` |
| 104 | + |
| 105 | +Bokeh has a problem with emitting extra outputs. Quarto partly works around this, but the second plot currently does not work with `renderings`: |
| 106 | + |
| 107 | +```{python} |
| 108 | +#| renderings: [light, dark] |
| 109 | +light_theme() |
| 110 | +show(grid) |
| 111 | + |
| 112 | +dark_theme() |
| 113 | +show(grid) |
| 114 | +``` |
| 115 | + |
| 116 | +## Explanation |
| 117 | + |
| 118 | +This code creates a scatter plot matrix (SPLOM) of the four iris measurements using Bokeh, with points colored by species. Here's a breakdown of what the code does: |
| 119 | + |
| 120 | +1. We start by importing the necessary Bokeh modules for plotting, layout, notebook output, and color mapping, along with the iris sample data. |
| 121 | + |
| 122 | +2. We take the iris dataset from `bokeh.sampledata.iris` (the `flowers` DataFrame) and wrap it in a `ColumnDataSource` that all plots share. |
| 123 | + |
| 124 | +3. We define the four features to plot and use `factor_cmap` to map each species to a fixed color. |
| 125 | + |
| 126 | +4. We define the hover tooltips, which show the species and the x and y values. |
| 127 | + |
| 128 | +5. For each pair of features, we: |
| 129 | + - Create a 200 x 200 pixel Bokeh figure, titling only the top-left plot. |
| 130 | + - Draw the points with `scatter`, using the shared source and the species color mapping. |
| 131 | + - Show the x-axis only on the bottom row and the y-axis only on the left column. |
| 132 | + - Show the legend only on the top-right plot, with its click policy set to `hide`. |
| 133 | + |
| 134 | +6. We remove the grid lines from each plot and arrange the rows of plots into a grid with `gridplot`. |
| 135 | + |
| 136 | +7. Finally, we apply the light or dark theme and display the grid using Bokeh's `show` function. |
| 137 | + |
| 138 | +The resulting SPLOM shows every pairwise combination of the four features, one scatter plot per pair. Each row shares a y feature and each column shares an x feature, so the plots on the diagonal plot a feature against itself. This allows us to compare how the species are distributed across all feature pairs at once rather than one pair at a time. |
| 139 | + |
| 140 | +This visualization can help us identify differences between the species. For example, we might see that one species forms a cluster that is clearly separated from the others in some feature pairs, while the other species overlap. We might also observe correlations between features or other patterns that wouldn't be apparent from simple summary statistics. |
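The axis, legend, and title rules used in the nested loop above can be checked without Bokeh. The following is a minimal sketch in plain Python; `cell_rules` is a hypothetical helper introduced here for illustration and is not part of the original code or of Bokeh. It assumes the same four feature names and mirrors the conditions applied to each plot in the grid.

```python
# Feature order copied from the SPLOM code above.
features = ["petal_length", "petal_width", "sepal_length", "sepal_width"]


def cell_rules(features):
    """Hypothetical helper: map each (row, col) grid cell to its display rules.

    Rows iterate over y features and columns over x features,
    matching the nested loops in the plotting code.
    """
    last = len(features) - 1
    rules = {}
    for i, y in enumerate(features):
        for j, x in enumerate(features):
            rules[(i, j)] = {
                "x": x,
                "y": y,
                # x-axis only on the bottom row
                "show_xaxis": i == last,
                # y-axis only on the left column
                "show_yaxis": j == 0,
                # legend only on the top-right plot
                "show_legend": i == 0 and j == last,
                # title only on the top-left plot
                "title": "Iris SPLOM" if (i == 0 and j == 0) else "",
            }
    return rules


rules = cell_rules(features)

# Print a compact map of the grid: X = x-axis, Y = y-axis, L = legend, T = title.
for i in range(len(features)):
    cells = []
    for j in range(len(features)):
        r = rules[(i, j)]
        flags = "".join(
            flag if on else "."
            for flag, on in [
                ("X", r["show_xaxis"]),
                ("Y", r["show_yaxis"]),
                ("L", r["show_legend"]),
                ("T", bool(r["title"])),
            ]
        )
        cells.append(flags)
    print(" ".join(cells))
```

Keeping these rules in one place makes it easier to confirm that exactly one plot carries the legend and the title, and that axis labels appear only along the left and bottom edges of the grid.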