Commit 91c4b7b

adapting a gallery example moves to vega-altair; closes #274
1 parent 2eca864 commit 91c4b7b

2 files changed

+218
-113
lines changed

content/plotting-matplotlib.md

Lines changed: 0 additions & 110 deletions
Original file line numberDiff line numberDiff line change
@@ -392,116 +392,6 @@ ax.tick_params(labelsize=15)
392392
```
393393
````
394394

395-
````{challenge} Exercise Customization-3: adapting a gallery example
396-
**This is a great exercise which is very close to real life.**
397-
398-
- Your task is to select one visualization library (some need to be installed first; if in
399-
doubt, choose Matplotlib or Seaborn since they are part of the Anaconda installation):
400-
- [Matplotlib](https://matplotlib.org/stable/gallery/index.html):
401-
probably the most standard and most widely used
402-
- [Seaborn](https://seaborn.pydata.org/examples/index.html):
403-
high-level interface to Matplotlib, statistical functions built in
404-
- [Vega-Altair](https://altair-viz.github.io/gallery/index.html):
405-
declarative visualization, statistics built in
406-
(we have an [entire lesson about data visualization using Vega-Altair](https://coderefinery.github.io/data-visualization-python/))
407-
- [Plotly](https://plotly.com/python/):
408-
interactive graphs
409-
- [Bokeh](https://demo.bokeh.org/):
410-
also good for interactivity
411-
- [plotnine](https://plotnine.readthedocs.io/):
412-
implementation of a grammar of graphics in Python, it is based on [ggplot2](https://ggplot2.tidyverse.org/)
413-
- [ggplot](https://yhat.github.io/ggpy/):
414-
R users will be more at home
415-
- [PyNGL](https://www.pyngl.ucar.edu/Examples/gallery.shtml):
416-
used in the weather forecast community
417-
- [K3D](https://k3d-jupyter.org/gallery/index.html):
418-
Jupyter Notebook extension for 3D visualization
419-
420-
- Browse the various example galleries (links above).
421-
- Select one example that is close to your recent visualization project or simply interests you.
422-
- Note that you might need to install additional Python packages in order to make use of the libraries.
423-
This could be the visualization library itself, and in addition also any required dependency package.
424-
- First try to reproduce this example in the Jupyter Notebook.
425-
- Then try to print out the data that is used in this example just before the call of the plotting function
426-
to learn about its structure. Is it a pandas dataframe? Is it a NumPy array? Is it a dictionary? A list?
427-
A list of lists?
428-
- Then try to modify the data a bit.
429-
- If you have time, try to feed it different, simplified data.
430-
This will be key for adapting the examples to your projects.
431-
432-
Example "solution" for such an exploration below.
433-
````
434-
435-
````{solution} An example exploration
436-
- Let us imagine we were browsing <https://seaborn.pydata.org/examples/index.html>
437-
- And this example plot caught our eye: <https://seaborn.pydata.org/examples/simple_violinplots.html>
438-
- Try to run it in the notebook.
439-
- The `d` seems to be the data. Right before the call to `sns.violinplot`, add a `print(d)`:
440-
```{code-block} python
441-
---
442-
emphasize-lines: 12
443-
---
444-
import numpy as np
445-
import seaborn as sns
446-
447-
sns.set_theme()
448-
449-
# Create a random dataset across several variables
450-
rs = np.random.default_rng(0)
451-
n, p = 40, 8
452-
d = rs.normal(0, 2, (n, p))
453-
d += np.log(np.arange(1, p + 1)) * -5 + 10
454-
455-
print(d)
456-
457-
# Show each distribution with both violins and points
458-
sns.violinplot(data=d, palette="light:g", inner="points", orient="h")
459-
```
460-
- The print reveals that `d` is a NumPy array and looks like a two-dimensional list:
461-
```text
462-
[[10.25146044 6.27005437 5.78778386 3.27832843 0.88147169 1.76439276 2.87844934 1.49695422]
463-
[ 8.59252953 4.00342116 3.26038963 3.15118015 -2.69725111 0.60361933 -2.22137264 -1.86174242]
464-
... many more lines ...
465-
[12.45950762 4.32352988 6.56724895 3.42215312 0.34419915 0.46123886 -1.56953795 0.95292133]]
466-
```
467-
- Now let's try with a much simplified two-dimensional list:
468-
```{code-block} python
469-
---
470-
emphasize-lines: 12, 13
471-
---
472-
# import numpy as np
473-
import seaborn as sns
474-
475-
sns.set_theme()
476-
477-
# # Create a random dataset across several variables
478-
# rs = np.random.default_rng(0)
479-
# n, p = 40, 8
480-
# d = rs.normal(0, 2, (n, p))
481-
# d += np.log(np.arange(1, p + 1)) * -5 + 10
482-
483-
d = [[1.0, 2.0, 2.0, 3.0, 3.0, 3.0],
484-
[1.0, 1.0, 1.0, 2.0, 2.0, 3.0]]
485-
486-
# Show each distribution with both violins and points
487-
sns.violinplot(data=d, palette="light:g", inner="points", orient="h")
488-
```
489-
- Seems to work! And finally we arrive at a working example with our own data with all
490-
the "clutter" removed:
491-
```python
492-
import seaborn as sns
493-
494-
# l1 and l2 are not great names but they will do for a quick test
495-
l1 = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0]
496-
l2 = [1.0, 1.0, 1.0, 2.0, 2.0, 3.0]
497-
498-
sns.violinplot(data=[l1, l2], palette="light:g", inner="points", orient="h")
499-
```
500-
- And now we can focus the rest of our work on reading our real data.
501-
- Finally we can customize the plot, e.g. web search for "seaborn violin plot axis labels"
502-
and add `ax.set_yticklabels(['dataset 1', 'dataset 2'])`.
503-
````
504-
505395
---
506396

507397
```{discussion}

content/plotting-vega-altair.md

Lines changed: 218 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -413,14 +413,229 @@ remember everything so this strategy is useful to practice:
413413
- Select one example that is close to your current/recent visualization project
414414
or simply interests you.
415415
- First try to reproduce this example, as-is, in the Jupyter Notebook.
416+
- **If you get the error "ModuleNotFoundError: No module named
417+
'vega_datasets'", then try one of these examples:** (they do not need the "vega_datasets" module)
418+
- [Slider cutoff](https://altair-viz.github.io/gallery/slider_cutoff.html)
419+
(**below you can find a walk-through for this example**)
420+
- [Multi-Line tooltip](https://altair-viz.github.io/gallery/multiline_tooltip_standard.html)
421+
- [Heatmap](https://altair-viz.github.io/gallery/simple_heatmap.html)
422+
- [Layered histogram](https://altair-viz.github.io/gallery/layered_histogram.html)
416423
- Then try to print out the data that is used in this example just before the call of the plotting function
417424
to learn about its structure.
418425
- Then try to modify the data a bit.
419426
- If you have time, try to feed it different, simplified data.
420-
This will be key for adapting the examples to your projects.
427+
**This will be key for adapting the examples to your projects.**
421428

422-
:::{solution} Example walk-through
423-
(work in progress)
429+
:::{solution} Example walk-through for the slider cutoff example
430+
In this walk-through I imagine browsing: <https://altair-viz.github.io/gallery/index.html>
431+
432+
Then this example caught my eye: <https://altair-viz.github.io/gallery/slider_cutoff.html>
433+
434+
I then copy-paste the example code into a notebook and try to run it and I get
435+
the same result.
436+
437+
If you get stuck below, **you can also browse all the steps** in a [notebook
438+
using
439+
nbviewer](https://nbviewer.org/github/AaltoSciComp/python-for-scicomp/blob/master/resources/notebooks/plotting-exercise-2.ipynb).
440+
441+
Next, there is a lot of code that I don't (need to) understand yet but my eyes are trying to find
442+
`alt.Chart` which tells me that the data must be the "df" in `alt.Chart(df)`:
443+
```{code-block} python
444+
---
445+
emphasize-lines: 15
446+
---
447+
import altair as alt
448+
import pandas as pd
449+
import numpy as np
450+
451+
rand = np.random.RandomState(42)
452+
453+
df = pd.DataFrame({
454+
'xval': range(100),
455+
'yval': rand.randn(100).cumsum()
456+
})
457+
458+
slider = alt.binding_range(min=0, max=100, step=1)
459+
cutoff = alt.param(bind=slider, value=50)
460+
461+
alt.Chart(df).mark_point().encode(
462+
x='xval',
463+
y='yval',
464+
color=alt.condition(
465+
alt.datum.xval < cutoff,
466+
alt.value('red'), alt.value('blue')
467+
)
468+
).add_params(
469+
cutoff
470+
)
471+
```
472+
473+
My next step will be to print out the data `df` just before the call to `alt.Chart`:
474+
```{code-block} python
475+
---
476+
emphasize-lines: 15
477+
---
478+
import altair as alt
479+
import pandas as pd
480+
import numpy as np
481+
482+
rand = np.random.RandomState(42)
483+
484+
df = pd.DataFrame({
485+
'xval': range(100),
486+
'yval': rand.randn(100).cumsum()
487+
})
488+
489+
slider = alt.binding_range(min=0, max=100, step=1)
490+
cutoff = alt.param(bind=slider, value=50)
491+
492+
print(df)
493+
494+
alt.Chart(df).mark_point().encode(
495+
x='xval',
496+
y='yval',
497+
color=alt.condition(
498+
alt.datum.xval < cutoff,
499+
alt.value('red'), alt.value('blue')
500+
)
501+
).add_params(
502+
cutoff
503+
)
504+
```
505+
506+
The print reveals that `df` is a dataframe which contains x and y values:
507+
```text
508+
xval yval
509+
0 0 0.496714
510+
1 1 0.358450
511+
2 2 1.006138
512+
3 3 2.529168
513+
4 4 2.295015
514+
.. ... ...
515+
95 95 -10.712354
516+
96 96 -10.416233
517+
97 97 -10.155178
518+
98 98 -10.150065
519+
99 99 -10.384652
520+
521+
[100 rows x 2 columns]
522+
```
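A bare `print` does not always make the type obvious, so I sometimes ask Python directly. The following is only a minimal sketch: the helper `describe_data` is a made-up name for this illustration (it is not part of pandas, NumPy, or Altair), and `df` is rebuilt here exactly as in the gallery example so that the block runs on its own.

```python
import numpy as np
import pandas as pd


def describe_data(obj):
    """Return a one-line description of a data container.

    Hypothetical helper, written only for this illustration.
    """
    if isinstance(obj, pd.DataFrame):
        return f"DataFrame, shape={obj.shape}, columns={list(obj.columns)}"
    if isinstance(obj, np.ndarray):
        return f"ndarray, shape={obj.shape}, dtype={obj.dtype}"
    if isinstance(obj, dict):
        return f"dict, keys={list(obj.keys())}"
    if isinstance(obj, (list, tuple)):
        return f"{type(obj).__name__}, length={len(obj)}"
    return type(obj).__name__


# Same data as in the gallery example
rand = np.random.RandomState(42)
df = pd.DataFrame({
    'xval': range(100),
    'yval': rand.randn(100).cumsum()
})

print(describe_data(df))
print(df.dtypes)
```

The same helper can be pointed at whatever object a different gallery example passes to its plotting function.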
523+
524+
The next thing that often helps me is to save the data to a comma-separated
525+
values (CSV) file:
526+
```python
527+
import pandas as pd
528+
529+
df.to_csv("data.csv", index=False)
530+
```
531+
532+
I then open the file in an editor and see that it contains 100 rows:
533+
```text
534+
xval,yval
535+
0,0.4967141530112327
536+
1,0.358449851840048
537+
2,1.0061383899407406
538+
3,2.5291682463487657
539+
4,2.2950148716254297
540+
5,2.060877914676249
541+
6,3.6400907301836405
542+
7,4.407525459336549
543+
8,3.938051073401597
544+
9,4.4806111169875615
545+
...
546+
```
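Opening the file in an editor is one way to look at it; the same check can also be done from Python. This is a sketch under my own assumptions: it writes to a temporary directory instead of the working directory (so no "data.csv" is left behind) and reads the file back with the standard-library `csv` module.

```python
import csv
import os
import tempfile

import numpy as np
import pandas as pd

# Same data as in the gallery example
rand = np.random.RandomState(42)
df = pd.DataFrame({
    'xval': range(100),
    'yval': rand.randn(100).cumsum()
})

# Write to a temporary directory so that no file is left behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.csv")
    df.to_csv(path, index=False)

    # Read the raw text back, without pandas, to see what is really in the file
    with open(path, newline="") as f:
        rows = list(csv.reader(f))

header, records = rows[0], rows[1:]
print(header)        # column names from the first line
print(len(records))  # number of data rows, not counting the header
print(records[:3])   # the first few rows, as lists of strings
```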
547+
548+
Saving the data to file often helps me to see the structure of the data and now
549+
I am in a position to replace this with my own data. I create a file called
550+
"mydata.csv" and there I use the maximum temperatures for months 1-10 from the
551+
Tromso monthly data which we used further up:
552+
```text
553+
xval,yval
554+
01,7.7
555+
02,6.6
556+
03,4.5
557+
04,9.8
558+
05,17.7
559+
06,25.4
560+
07,26.7
561+
08,25.1
562+
09,19.3
563+
10,9.8
564+
```
565+
566+
In the notebook I then verify that the reading of the data works:
567+
```python
568+
mydata = pd.read_csv("mydata.csv")
569+
570+
mydata
571+
```
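Beyond looking at the table, it can help to check how pandas interpreted each column. The sketch below keeps the contents of "mydata.csv" in a string (my own shortcut, so that the block runs without the file) and shows that the zero-padded months are read as plain integers. The second call reads that column as text, in case the leading zeros matter.

```python
import io

import pandas as pd

# Same content as "mydata.csv", held in memory for this sketch
csv_text = """xval,yval
01,7.7
02,6.6
03,4.5
04,9.8
05,17.7
06,25.4
07,26.7
08,25.1
09,19.3
10,9.8
"""

mydata = pd.read_csv(io.StringIO(csv_text))
print(mydata.dtypes)
print(mydata["xval"].tolist())  # the leading zeros are gone

# Reading the column as text keeps "01", "02", ...
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"xval": str})
print(as_text["xval"].tolist())
```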
572+
573+
Now I can replace the example with my own data (note how I could comment out
574+
some code that I don't need any longer):
575+
```{code-block} python
576+
---
577+
emphasize-lines: 16
578+
---
579+
import altair as alt
580+
import pandas as pd
581+
# import numpy as np
582+
583+
# rand = np.random.RandomState(42)
584+
585+
# df = pd.DataFrame({
586+
# 'xval': range(100),
587+
# 'yval': rand.randn(100).cumsum()
588+
# })
589+
590+
slider = alt.binding_range(min=0, max=100, step=1)
591+
cutoff = alt.param(bind=slider, value=50)
592+
593+
# print(df)
594+
df = pd.read_csv("mydata.csv")
595+
596+
alt.Chart(df).mark_point().encode(
597+
x='xval',
598+
y='yval',
599+
color=alt.condition(
600+
alt.datum.xval < cutoff,
601+
alt.value('red'), alt.value('blue')
602+
)
603+
).add_params(
604+
cutoff
605+
)
606+
```
607+
608+
Seems to work! I then make a few more adjustments (I want the slider to work on
609+
the y-axis and have a more reasonable default):
610+
```{code-block} python
611+
---
612+
emphasize-lines: 4,5,13
613+
---
614+
import altair as alt
615+
import pandas as pd
616+
617+
slider = alt.binding_range(min=0, max=30, step=1)
618+
cutoff = alt.param(bind=slider, value=15)
619+
620+
df = pd.read_csv("mydata.csv")
621+
622+
alt.Chart(df).mark_point().encode(
623+
x='xval',
624+
y='yval',
625+
color=alt.condition(
626+
alt.datum.yval < cutoff,
627+
alt.value('red'), alt.value('blue')
628+
)
629+
).add_params(
630+
cutoff
631+
)
632+
```
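To convince myself of what the condition does, I can reproduce it outside the chart. This is only a sketch with plain pandas and NumPy, not how Altair evaluates it (the chart evaluates the condition in the browser each time the slider moves). The data is typed in from "mydata.csv" and `cutoff` is fixed at the slider default.

```python
import numpy as np
import pandas as pd

# Values typed in from "mydata.csv"
df = pd.DataFrame({
    "xval": range(1, 11),
    "yval": [7.7, 6.6, 4.5, 9.8, 17.7, 25.4, 26.7, 25.1, 19.3, 9.8],
})

cutoff = 15  # the slider's default value

# Mirrors alt.condition(alt.datum.yval < cutoff, red, blue)
df["color"] = np.where(df["yval"] < cutoff, "red", "blue")
print(df)
```

With the default cutoff, months 1-4 and month 10 end up red and months 5-9 blue.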
633+
634+
My next steps would then be to change axis titles, display the month names, add
635+
a legend, and refine from here.
636+
637+
**You can also browse all the steps** in a [notebook using
638+
nbviewer](https://nbviewer.org/github/AaltoSciComp/python-for-scicomp/blob/master/resources/notebooks/plotting-exercise-2.ipynb).
424639
:::
425640
::::
426641
