@@ -24,17 +24,18 @@ From [Claus O. Wilke: "Fundamentals of Data Visualization"](https://clauswilke.c
2424> (which should also be automated), and they should come out of the pipeline
2525> ready to be sent to the printer, no manual post-processing needed.*
2626
27- - **No manual post-processing**. This will bite you when you need to regenerate 50
27+ - **Try to minimize manual post-processing**. This could bite you when you need to regenerate 50
2828 figures one day before submission deadline or regenerate a set of figures
2929 after the person who created them left the group.
3030- There is not the one perfect language and **not the one perfect library** for everything.
3131- Within Python, many libraries exist:
32- - [Matplotlib](https://matplotlib.org/gallery/index.html):
32+ - [Matplotlib](https://matplotlib.org/stable/gallery/index.html):
3333 probably the most standard and most widely used
3434 - [Seaborn](https://seaborn.pydata.org/examples/index.html):
3535 high-level interface to Matplotlib, statistical functions built in
36- - [Altair](https://altair-viz.github.io/gallery/index.html):
37- declarative visualization (R users will be more at home), statistics built in
36+ - [Vega-Altair](https://altair-viz.github.io/gallery/index.html):
37+ declarative visualization, statistics built in
38+ (we have an [entire lesson about data visualization using Vega-Altair](https://coderefinery.github.io/data-visualization-python/))
3839 - [Plotly](https://plotly.com/python/):
3940 interactive graphs
4041 - [Bokeh](https://demo.bokeh.org/):
@@ -45,17 +46,16 @@ From [Claus O. Wilke: "Fundamentals of Data Visualization"](https://clauswilke.c
4546 R users will be more at home
4647 - [PyNGL](https://www.pyngl.ucar.edu/Examples/gallery.shtml):
4748 used in the weather forecast community
48- - [K3D](https://k3d-jupyter.org/gallery/):
49- Jupyter notebook extension for 3D visualization
49+ - [K3D](https://k3d-jupyter.org/gallery/index.html):
50+ Jupyter Notebook extension for 3D visualization
5051 - ...
51- - Two main families of libraries: procedural (e.g. Matplotlib) and declarative
52- (using grammar of graphics).
52+ - Two main families of libraries: procedural (e.g. Matplotlib) and declarative.
5353
5454
5555## Why are we starting with Matplotlib?
5656
57- - Matplotlib is perhaps the most "standard" Python plotting library.
58- - Many libraries build on top of Matplotlib.
57+ - Matplotlib is perhaps the most popular Python plotting library.
58+ - Many libraries build on top of Matplotlib (example: [Seaborn](https://seaborn.pydata.org/examples/index.html)).
5959- MATLAB users will feel familiar.
6060- Even if you choose to use another library (see above list), chances are high
6161 that you need to adapt a Matplotlib plot of somebody else.
@@ -67,12 +67,12 @@ drawing (in terms of abstractions, not in terms of quality) and does not
6767provide statistical functions. Some figures require typing and tweaking many lines of code.
6868
6969Many other visualization libraries exist with their own strengths; it is also a
70- matter of personal preferences. **Later we will also try other libraries.**
70+ matter of personal preferences.
7171
7272
7373## Getting started with Matplotlib
7474
75- We can start in a Jupyter notebook since notebooks are typically a good fit
75+ We can start in a Jupyter Notebook since notebooks are typically a good fit
7676for data visualizations. But if you prefer to run this as a script, this is
7777also OK.
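
If you go the script route, a minimal sketch could look like the following. It assumes that you want to write the figure to a file rather than open a window, so it selects the non-interactive `Agg` backend. The data values and the file name `first-plot.png` are made up for illustration and are not part of this lesson.

```python
# minimal sketch: using Matplotlib from a plain Python script
import matplotlib

# "Agg" only renders to files, so no display is required
# (assumption: we save the figure instead of showing it)
matplotlib.use("Agg")

import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# small made-up dataset, only for illustration
ax.scatter(x=[1.0, 2.0, 3.0], y=[2.0, 4.0, 8.0])
ax.set_xlabel("x")
ax.set_ylabel("y")

# hypothetical output file name
fig.savefig("first-plot.png")
```

In a notebook the figure is displayed automatically, whereas in a script it needs to be saved (as here) or shown explicitly.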
7878
@@ -81,10 +81,7 @@ Let us create our first plot using
8181{obj}`~matplotlib.axes.Axes.scatter`, and some other methods on the
8282{obj}`~matplotlib.axes.Axes` object:
8383
84- ``` {code-block} python
85- # this line tells Jupyter to display matplotlib figures in the notebook
86- %matplotlib inline
87-
84+ ``` python
8885import matplotlib.pyplot as plt
8986
9087# this is dataset 1 from
@@ -106,6 +103,7 @@ ax.set_title("some title")
106103
107104``` {figure} data-visualization/first-plot/getting-started.png
108105:alt: Result of our first plot
106+ :width: 80%
109107
110108This is the result of our first plot.
111109```
@@ -135,26 +133,27 @@ matplotlib.use("Agg")
135133 by 2.0.
136134 ```python
137135 # here we multiply all elements of data2_y by 2.0
138- data2_y_scaled = [y* 2.0 for y in data2_y]
136+ data2_y_scaled = [y * 2.0 for y in data2_y]
139137 ```
140138
141139- Try to add a legend to the plot with {meth}`matplotlib.axes.Axes.legend` and search the web for clues on
142140 how to add labels to each dataset.
141+ You can also consult this great
142+ [quick start guide](https://matplotlib.org/stable/users/explain/quick_start.html).
143143
144144- At the end it should look like this one:
145- ```{figure} data-visualization/first-plot/exercise.png
146- :alt: Result of the exercise
147- ```
145+ ```{figure} data-visualization/first-plot/exercise.png
146+ :alt: Result of the exercise
147+ ```
148+
149+ - Experiment also by using named colors (e.g. "red") instead of the hex-codes.
148150````
149151
150152```` {solution}
151153```{code-block} python
152154---
153- emphasize-lines: 12, 15, 20-21, 26
155+ emphasize-lines: 9, 12, 17-18, 23
154156---
155- # this line tells Jupyter to display matplotlib figures in the notebook
156- %matplotlib inline
157-
158157import matplotlib.pyplot as plt
159158
160159# this is dataset 1 from
@@ -166,13 +165,13 @@ data_y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
166165data2_y = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
167166
168167# here we multiply all elements of data2_y by 2.0
169- data2_y_scaled = [y* 2.0 for y in data2_y]
168+ data2_y_scaled = [y * 2.0 for y in data2_y]
170169
171170fig, ax = plt.subplots()
172171
173- ax.scatter(x=data_x, y=data_y, c="#E69F00", label='set 1')
174- ax.scatter(x=data_x, y=data2_y, c="#56B4E9", label='set 2')
175- ax.scatter(x=data_x, y=data2_y_scaled, c="#009E73", label='set 2 (scaled)')
172+ ax.scatter(x=data_x, y=data_y, c="#E69F00", label="set 1")
173+ ax.scatter(x=data_x, y=data2_y, c="#56B4E9", label="set 2")
174+ ax.scatter(x=data_x, y=data2_y_scaled, c="#009E73", label="set 2 (scaled)")
176175
177176ax.set_xlabel("we should label the x axis")
178177ax.set_ylabel("we should label the y axis")
@@ -188,7 +187,7 @@ ax.legend()
188187This qualitative color palette is optimized for all color-vision
189188deficiencies, see <https://clauswilke.com/dataviz/color-pitfalls.html> and
190189[Okabe, M., and K. Ito. 2008. "Color Universal Design (CUD):
191- How to Make Figures and Presentations That Are Friendly to Colorblind People. "](http://jfly.iam.u-tokyo.ac.jp/color/).
190+ How to Make Figures and Presentations That Are Friendly to Colorblind People"](http://jfly.iam.u-tokyo.ac.jp/color/).
192191```
193192
194193---
@@ -199,7 +198,7 @@ When plotting with Matplotlib, it is useful to know and understand that
199198there are **two approaches**, even though the reasons for this dual approach are
200199outside the scope of this lesson.
201200
202- - The more modern option is an **object-oriented interface** (the
201+ - The more modern option is an **object-oriented interface** or **explicit interface** (the
203202 {class}`fig <matplotlib.figure.Figure>` and {class}`ax <matplotlib.axes.Axes>` objects
204203 can be configured separately and passed around to functions):
205204 ``` {code-block} python
@@ -223,7 +222,7 @@ outside the scope of this lesson.
223222 ```
224223
225224- The more traditional option mimics MATLAB plotting and uses the
226- **pyplot interface** ({mod}`plt <matplotlib.pyplot>` carries
225+ **pyplot interface** or **implicit interface** ({mod}`plt <matplotlib.pyplot>` carries
227226 the global settings):
228227 ``` {code-block} python
229228 ---
@@ -254,7 +253,7 @@ into these functions and there is less risk that adjusting figures changes
254253settings also for unrelated figures created in other functions.
255254
256255When using the pyplot interface, settings are modified for the entire
257- {mod}`matplotlib.pyplot` package. The latter is acceptable for linear scripts but may yield
256+ {mod}`matplotlib.pyplot` package. The latter is acceptable for simple scripts but may yield
258257surprising results when introducing functions to enhance/abstract Matplotlib
259258calls.
260259```
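
The point about passing axes into functions can be sketched as follows. The helper `plot_series` and the data are hypothetical and only serve to show that each `ax` carries its own settings. This is an illustration under those assumptions, not the only way to structure plotting code.

```python
import matplotlib.pyplot as plt


def plot_series(ax, x, y, label):
    """Draw one dataset onto the given axes (hypothetical helper)."""
    ax.scatter(x=x, y=y, label=label)
    ax.set_xlabel("x")
    ax.set_ylabel("y")


# two independent figures, each with its own axes object
fig1, ax1 = plt.subplots()
fig2, ax2 = plt.subplots()

plot_series(ax1, [1, 2, 3], [1, 4, 9], label="squares")
plot_series(ax2, [1, 2, 3], [1, 8, 27], label="cubes")

# these settings only affect the first figure;
# the second one is left without title and legend on purpose
ax1.set_title("first figure")
ax1.legend()
```

Because the function receives the axes explicitly, it does not depend on which figure happens to be "current" in {mod}`matplotlib.pyplot`.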
@@ -263,12 +262,13 @@ calls.
263262
264263## Styling and customizing plots
265264
266- - **Do not customize "manually"** using a graphical program (not easily repeatable/reproducible).
267- - **No manual post-processing**. This will bite you when you need to regenerate 50
268- figures one day before submission deadline or regenerate a set of figures
269- after the person who created them left the group.
265+ - Before you customize plots "manually" using a graphical program, please
266+ consider how this affects reproducibility.
267+ - **Try to minimize manual post-processing**. This might bite you when you
268+ need to regenerate 50 figures one day before submission deadline or
269+ regenerate a set of figures after the person who created them left the group.
270270- Matplotlib and also all the other libraries allow you to customize almost every aspect of a plot.
271- - It is useful to study [Matplotlib parts of a figure](https://matplotlib.org/stable/tutorials/introductory/quick_start.html#parts-of-a-figure)
271+ - It is useful to study [Matplotlib parts of a figure](https://matplotlib.org/stable/users/explain/quick_start.html#parts-of-a-figure)
272272 so that we know what to search for to customize things.
273273- Matplotlib cheatsheets: <https://github.com/matplotlib/cheatsheets>
274274- You can also select among pre-defined themes/
@@ -287,10 +287,10 @@ how the plot looks** (exercises 1 and 2) or to **modify the input data** (exampl
287287
288288This is very close to real life: there are so many options and possibilities and it is
289289almost impossible to remember everything so this strategy is useful to practice:
290- - select an example that is close to what you have in mind
291- - being able to adapt it to your needs
292- - being able to search for help
293- - being able to understand help request answers (not easy)
290+ - Select an example that is close to what you have in mind
291+ - Adapt it to your needs
292+ - Search for help
293+ - Understand the answers to help requests (not easy)
294294
295295```` {challenge} Exercise Customization-1: log scale in Matplotlib (15 min)
296296In this exercise we will learn how to use log scales.
@@ -299,12 +299,12 @@ In this exercise we will learn how to use log scales.
299299 ```python
300300 import pandas as pd
301301
302- url = "https://raw.githubusercontent.com/plotly/datasets/master/gapminder_with_codes.csv"
303- data = pd.read_csv(url)
302+ url = (
303+ "https://raw.githubusercontent.com/plotly/datasets/master/gapminder_with_codes.csv"
304+ )
305+ gapminder_data = pd.read_csv(url).query("year == 2007")
304306
305- data_2007 = data[data["year"] == 2007]
306-
307- data_2007
307+ gapminder_data
308308 ```
309309- Try the above snippet in a notebook and it will give you an overview of the data.
310310
@@ -314,10 +314,10 @@ In this exercise we will learn how to use log scales.
314314
315315 fig, ax = plt.subplots()
316316
317- ax.scatter(x=data_2007["gdpPercap"], y=data_2007["lifeExp"], alpha=0.5)
317+ ax.scatter(x=gapminder_data["gdpPercap"], y=gapminder_data["lifeExp"], alpha=0.5)
318318
319- ax.set_xlabel("GDP (USD) per capita")
320- ax.set_ylabel("life expectancy (years)")
319+ ax.set_xlabel("GDP per capita (PPP dollars)")
320+ ax.set_ylabel("Life expectancy (years)")
321321 ```
322322
323323 This is the result but we realize that a linear scale is not ideal here:
@@ -342,12 +342,12 @@ emphasize-lines: 5
342342---
343343fig, ax = plt.subplots()
344344
345- ax.scatter(x=data_2007["gdpPercap"], y=data_2007["lifeExp"], alpha=0.5)
345+ ax.scatter(x=gapminder_data["gdpPercap"], y=gapminder_data["lifeExp"], alpha=0.5)
346346
347347ax.set_xscale("log")
348348
349- ax.set_xlabel("GDP (USD) per capita")
350- ax.set_ylabel("life expectancy (years)")
349+ ax.set_xlabel("GDP per capita (PPP dollars)")
350+ ax.set_ylabel("Life expectancy (years)")
351351```
352352* {obj}`alpha <matplotlib.artist.Artist.set_alpha>` sets transparency
353353 of points.
@@ -362,7 +362,7 @@ For figures that go to print it is good practice to look at them at the size
362362they will be printed in and then often fonts and tickmarks are too small.
363363
364364Your task is to make the tickmarks and the axis label font larger, using
365- [Matplotlib parts of a figure](https://matplotlib.org/stable/tutorials/introductory/quick_start.html#parts-of-a-figure)
365+ [Matplotlib parts of a figure](https://matplotlib.org/stable/users/explain/quick_start.html#parts-of-a-figure)
366366and web search, and to arrive at this:
367367
368368```{figure} data-visualization/customizing/gapminder-larger-font.png
@@ -375,16 +375,17 @@ See {meth}`ax.tick_params <matplotlib.axes.Axes.tick_params>`.
375375
376376```{code-block} python
377377---
378- emphasize-lines: 7-11
378+ emphasize-lines: 7-8, 10-12
379379---
380380fig, ax = plt.subplots()
381381
382- ax.scatter(x=data_2007["gdpPercap"], y=data_2007["lifeExp"], alpha=0.5)
382+ ax.scatter(x="gdpPercap", y="lifeExp", alpha=0.5, data=gapminder_data)
383383
384384ax.set_xscale("log")
385385
386- ax.set_xlabel("GDP (USD) per capita", fontsize=15)
387- ax.set_ylabel("life expectancy (years)", fontsize=15)
386+ ax.set_xlabel("GDP per capita (PPP dollars)", fontsize=15)
387+ ax.set_ylabel("Life expectancy (years)", fontsize=15)
388+
388389ax.tick_params(which="major", length=10)
389390ax.tick_params(which="minor", length=5)
390391ax.tick_params(labelsize=15)
@@ -400,8 +401,9 @@ ax.tick_params(labelsize=15)
400401 probably the most standard and most widely used
401402 - [Seaborn](https://seaborn.pydata.org/examples/index.html):
402403 high-level interface to Matplotlib, statistical functions built in
403- - [Altair](https://altair-viz.github.io/gallery/index.html):
404- declarative visualization (R users will be more at home), statistics built in
404+ - [Vega-Altair](https://altair-viz.github.io/gallery/index.html):
405+ declarative visualization, statistics built in
406+ (we have an [entire lesson about data visualization using Vega-Altair](https://coderefinery.github.io/data-visualization-python/))
405407 - [Plotly](https://plotly.com/python/):
406408 interactive graphs
407409 - [Bokeh](https://demo.bokeh.org/):
@@ -412,14 +414,14 @@ ax.tick_params(labelsize=15)
412414 R users will be more at home
413415 - [PyNGL](https://www.pyngl.ucar.edu/Examples/gallery.shtml):
414416 used in the weather forecast community
415- - [K3D](https://k3d-jupyter.org/showcase/):
416- Jupyter notebook extension for 3D visualization
417+ - [K3D](https://k3d-jupyter.org/gallery/index.html):
418+ Jupyter Notebook extension for 3D visualization
417419
418420- Browse the various example galleries (links above).
419421- Select one example that is close to your recent visualization project or simply interests you.
420422- Note that you might need to install additional Python packages in order to make use of the libraries.
421423 This could be the visualization library itself, and in addition also any required dependency package.
422- - First try to reproduce this example in the Jupyter notebook.
424+ - First try to reproduce this example in the Jupyter Notebook.
423425- Then try to print out the data that is used in this example just before the call of the plotting function
424426 to learn about its structure. Is it a pandas dataframe? Is it a NumPy array? Is it a dictionary? A list?
425427 A list of lists?
@@ -509,8 +511,30 @@ clarify questions at this point before moving on.
509511
510512---
511513
514+ ## Matplotlib and pandas DataFrames
515+
516+ In the above exercises we have sent individual columns of the `gapminder_data` DataFrame
517+ into `ax.scatter()` like this:
518+ ``` python
519+ fig, ax = plt.subplots()
520+
521+ ax.scatter(x=gapminder_data["gdpPercap"], y=gapminder_data["lifeExp"], alpha=0.5)
522+ ```
523+
524+ It is possible to do this instead and let Matplotlib "unpack" the columns:
525+ ``` python
526+ fig, ax = plt.subplots()
527+
528+ ax.scatter(x="gdpPercap", y="lifeExp", alpha=0.5, data=gapminder_data)
529+ ```
530+
531+ Other input types are possible. See [Types of inputs to plotting
532+ functions](https://matplotlib.org/stable/users/explain/quick_start.html#types-of-inputs-to-plotting-functions).
533+
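To see that both call styles end up plotting the same points, here is a small self-contained sketch. It does not download the Gapminder data; the DataFrame `df` and its three rows are made up and merely reuse the column names from above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# small made-up table standing in for gapminder_data (values are illustrative)
df = pd.DataFrame(
    {
        "gdpPercap": [500.0, 5000.0, 50000.0],
        "lifeExp": [50.0, 70.0, 80.0],
    }
)

fig, (ax_left, ax_right) = plt.subplots(ncols=2)

# left: columns are selected explicitly and passed as data
ax_left.scatter(x=df["gdpPercap"], y=df["lifeExp"], alpha=0.5)

# right: column names are passed as strings and looked up in `data`
ax_right.scatter(x="gdpPercap", y="lifeExp", alpha=0.5, data=df)

for ax in (ax_left, ax_right):
    ax.set_xscale("log")
    ax.set_xlabel("GDP per capita (PPP dollars)")
    ax.set_ylabel("Life expectancy (years)")
```

The second form keeps the plotting call shorter, which can help readability when the DataFrame name is long.
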
534+ ---
535+
512536``` {keypoints}
513- - Avoid manual post-processing, script everything.
537+ - Minimize manual post-processing, script everything.
514538- Browse a number of example galleries to help you choose the library
515539 that best fits your work/style.
516540- Figures for presentation slides and figures for manuscripts have