@@ -392,116 +392,6 @@ ax.tick_params(labelsize=15)
392392```
393393````
394394
395- ```` {challenge} Exercise Customization-3: adapting a gallery example
396- **This is a great exercise which is very close to real life.**
397-
398- - Your task is to select one visualization library (some need to be installed first; if in
399- doubt, choose Matplotlib or Seaborn since they are part of the Anaconda installation):
400- - [Matplotlib](https://matplotlib.org/stable/gallery/index.html):
401- probably the most standard and most widely used
402- - [Seaborn](https://seaborn.pydata.org/examples/index.html):
403- high-level interface to Matplotlib, statistical functions built in
404- - [Vega-Altair](https://altair-viz.github.io/gallery/index.html):
405- declarative visualization, statistics built in
406- (we have an [entire lesson about data visualization using Vega-Altair](https://coderefinery.github.io/data-visualization-python/))
407- - [Plotly](https://plotly.com/python/):
408- interactive graphs
409- - [Bokeh](https://demo.bokeh.org/):
410- also good for interactivity
411- - [plotnine](https://plotnine.readthedocs.io/):
412- implementation of a grammar of graphics in Python, based on [ggplot2](https://ggplot2.tidyverse.org/)
413- - [ggplot](https://yhat.github.io/ggpy/):
414- R users will be more at home
415- - [PyNGL](https://www.pyngl.ucar.edu/Examples/gallery.shtml):
416- used in the weather forecast community
417- - [K3D](https://k3d-jupyter.org/gallery/index.html):
418- Jupyter Notebook extension for 3D visualization
419-
420- - Browse the various example galleries (links above).
421- - Select one example that is close to your recent visualization project or simply interests you.
422- - Note that you might need to install additional Python packages in order to make use of the libraries.
423- This could be the visualization library itself, and in addition also any required dependency package.
424- - First try to reproduce this example in the Jupyter Notebook.
425- - Then try to print out the data that is used in this example just before the call of the plotting function
426- to learn about its structure. Is it a pandas dataframe? Is it a NumPy array? Is it a dictionary? A list?
427- A list of lists?
428- - Then try to modify the data a bit.
429- - If you have time, try to feed it different, simplified data.
430- This will be key for adapting the examples to your projects.
431-
432- Example "solution" for such an exploration below.
433- ````
434-
435- ```` {solution} An example exploration
436- - Let us imagine we were browsing <https://seaborn.pydata.org/examples/index.html>
437- - And this example plot caught our eye: <https://seaborn.pydata.org/examples/simple_violinplots.html>
438- - Try to run it in the notebook.
439- - The `d` seems to be the data. Right before the call to `sns.violinplot`, add a `print(d)`:
440- ```{code-block} python
441- ---
442- emphasize-lines: 12
443- ---
444- import numpy as np
445- import seaborn as sns
446-
447- sns.set_theme()
448-
449- # Create a random dataset across several variables
450- rs = np.random.default_rng(0)
451- n, p = 40, 8
452- d = rs.normal(0, 2, (n, p))
453- d += np.log(np.arange(1, p + 1)) * -5 + 10
454-
455- print(d)
456-
457- # Show each distribution with both violins and points
458- sns.violinplot(data=d, palette="light:g", inner="points", orient="h")
459- ```
460- - The print reveals that `d` is a NumPy array and looks like a two-dimensional list:
461- ```text
462- [[10.25146044 6.27005437 5.78778386 3.27832843 0.88147169 1.76439276 2.87844934 1.49695422]
463- [ 8.59252953 4.00342116 3.26038963 3.15118015 -2.69725111 0.60361933 -2.22137264 -1.86174242]
464- ... many more lines ...
465- [12.45950762 4.32352988 6.56724895 3.42215312 0.34419915 0.46123886 -1.56953795 0.95292133]]
466- ```
467- - Now let's try with a much simplified two-dimensional list:
468- ```{code-block} python
469- ---
470- emphasize-lines: 12, 13
471- ---
472- # import numpy as np
473- import seaborn as sns
474-
475- sns.set_theme()
476-
477- # # Create a random dataset across several variables
478- # rs = np.random.default_rng(0)
479- # n, p = 40, 8
480- # d = rs.normal(0, 2, (n, p))
481- # d += np.log(np.arange(1, p + 1)) * -5 + 10
482-
483- d = [[1.0, 2.0, 2.0, 3.0, 3.0, 3.0],
484- [1.0, 1.0, 1.0, 2.0, 2.0, 3.0]]
485-
486- # Show each distribution with both violins and points
487- sns.violinplot(data=d, palette="light:g", inner="points", orient="h")
488- ```
489- - Seems to work! And finally we arrive at a working example with our own data with all
490- the "clutter" removed:
491- ```python
492- import seaborn as sns
493-
494- # l1 and l2 are not great names but they will do for a quick test
495- l1 = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0]
496- l2 = [1.0, 1.0, 1.0, 2.0, 2.0, 3.0]
497-
498- sns.violinplot(data=[l1, l2], palette="light:g", inner="points", orient="h")
499- ```
500- - And now we can focus the rest of our work on reading our real data.
501- - Finally we can customize the plot, e.g. web search for "seaborn violin plot axis labels",
502- store the returned axes with `ax = sns.violinplot(...)`, and add `ax.set_yticklabels(['dataset 1', 'dataset 2'])`.
503- ````
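
The "print the data to learn about its structure" step in the exercise can also be done with a small helper instead of reading raw output. The sketch below is only an illustration: `describe` is a hypothetical name, not part of the lesson or of any library, and it relies on the assumption that array-like objects (NumPy arrays, pandas dataframes) expose a `.shape` attribute while plain containers do not.

```python
def describe(data):
    """Return a short summary of an object that is about to be plotted."""
    kind = type(data).__name__
    # NumPy arrays and pandas dataframes expose .shape; lists and dicts do not
    shape = getattr(data, "shape", None)
    if shape is not None:
        return f"{kind} with shape {tuple(shape)}"
    if isinstance(data, dict):
        return f"dict with keys {list(data)}"
    if isinstance(data, (list, tuple)):
        # A non-empty list whose items are all lists/tuples is a "list of lists"
        if data and all(isinstance(row, (list, tuple)) for row in data):
            return f"{kind} of {len(data)} rows with lengths {[len(row) for row in data]}"
        return f"{kind} of length {len(data)}"
    return kind


# The simplified data from the example exploration
d = [[1.0, 2.0, 2.0, 3.0, 3.0, 3.0],
     [1.0, 1.0, 1.0, 2.0, 2.0, 3.0]]

print(describe(d))  # list of 2 rows with lengths [6, 6]
```

Calling `describe(d)` right before the plotting function answers the same questions as `print(d)`, and the answer stays readable when the dataset is large.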
504-
505395---
506396
507397``` {discussion}