Commit df205a4

Merge pull request #264 from AaltoSciComp/radovan/matplotlib-episode
Update Matplotlib episode; closes #262
2 parents 83f3b42 + 3c2d06a commit df205a4

6 files changed: +87 -63 lines changed

content/data-visualization.md

Lines changed: 87 additions & 63 deletions
@@ -24,17 +24,18 @@ From [Claus O. Wilke: "Fundamentals of Data Visualization"](https://clauswilke.c
2424
> (which should also be automated), and they should come out of the pipeline
2525
> ready to be sent to the printer, no manual post-processing needed.*
2626
27-
- **No manual post-processing**. This will bite you when you need to regenerate 50
27+
- **Try to minimize manual post-processing**. This could bite you when you need to regenerate 50
2828
figures one day before submission deadline or regenerate a set of figures
2929
after the person who created them left the group.
3030
- There is not the one perfect language and **not the one perfect library** for everything.
3131
- Within Python, many libraries exist:
32-
- [Matplotlib](https://matplotlib.org/gallery/index.html):
32+
- [Matplotlib](https://matplotlib.org/stable/gallery/index.html):
3333
probably the most standard and most widely used
3434
- [Seaborn](https://seaborn.pydata.org/examples/index.html):
3535
high-level interface to Matplotlib, statistical functions built in
36-
- [Altair](https://altair-viz.github.io/gallery/index.html):
37-
declarative visualization (R users will be more at home), statistics built in
36+
- [Vega-Altair](https://altair-viz.github.io/gallery/index.html):
37+
declarative visualization, statistics built in
38+
(we have an [entire lesson about data visualization using Vega-Altair](https://coderefinery.github.io/data-visualization-python/))
3839
- [Plotly](https://plotly.com/python/):
3940
interactive graphs
4041
- [Bokeh](https://demo.bokeh.org/):
@@ -45,17 +46,16 @@ From [Claus O. Wilke: "Fundamentals of Data Visualization"](https://clauswilke.c
4546
R users will be more at home
4647
- [PyNGL](https://www.pyngl.ucar.edu/Examples/gallery.shtml):
4748
used in the weather forecast community
48-
- [K3D](https://k3d-jupyter.org/gallery/):
49-
Jupyter notebook extension for 3D visualization
49+
- [K3D](https://k3d-jupyter.org/gallery/index.html):
50+
Jupyter Notebook extension for 3D visualization
5051
- ...
51-
- Two main families of libraries: procedural (e.g. Matplotlib) and declarative
52-
(using grammar of graphics).
52+
- Two main families of libraries: procedural (e.g. Matplotlib) and declarative.
5353

5454

5555
## Why are we starting with Matplotlib?
5656

57-
- Matplotlib is perhaps the most "standard" Python plotting library.
58-
- Many libraries build on top of Matplotlib.
57+
- Matplotlib is perhaps the most popular Python plotting library.
58+
- Many libraries build on top of Matplotlib (example: [Seaborn](https://seaborn.pydata.org/examples/index.html)).
5959
- MATLAB users will feel familiar.
6060
- Even if you choose to use another library (see above list), chances are high
6161
that you need to adapt a Matplotlib plot of somebody else.
@@ -67,12 +67,12 @@ drawing (in terms of abstractions, not in terms of quality) and does not
6767
provide statistical functions. Some figures require typing and tweaking many lines of code.
6868

6969
Many other visualization libraries exist with their own strengths, it is also a
70-
matter of personal preferences. **Later we will also try other libraries.**
70+
matter of personal preferences.
7171

7272

7373
## Getting started with Matplotlib
7474

75-
We can start in a Jupyter notebook since notebooks are typically a good fit
75+
We can start in a Jupyter Notebook since notebooks are typically a good fit
7676
for data visualizations. But if you prefer to run this as a script, this is
7777
also OK.
7878

@@ -81,10 +81,7 @@ Let us create our first plot using
8181
{obj}`~matplotlib.axes.Axes.scatter`, and some other methods on the
8282
{obj}`~matplotlib.axes.Axes` object:
8383

84-
```{code-block} python
85-
# this line tells Jupyter to display matplotlib figures in the notebook
86-
%matplotlib inline
87-
84+
```python
8885
import matplotlib.pyplot as plt
8986

9087
# this is dataset 1 from
@@ -106,6 +103,7 @@ ax.set_title("some title")
106103

107104
```{figure} data-visualization/first-plot/getting-started.png
108105
:alt: Result of our first plot
106+
:width: 80%
109107
110108
This is the result of our first plot.
111109
```
@@ -135,26 +133,27 @@ matplotlib.use("Agg")
135133
by 2.0.
136134
```python
137135
# here we multiply all elements of data2_y by 2.0
138-
data2_y_scaled = [y*2.0 for y in data2_y]
136+
data2_y_scaled = [y * 2.0 for y in data2_y]
139137
```
140138
141139
- Try to add a legend to the plot with {meth}`matplotlib.axes.Axes.legend` and searching the web for clues on
142140
how to add labels to each dataset.
141+
You can also consult this great
142+
[quick start guide](https://matplotlib.org/stable/users/explain/quick_start.html).
143143
144144
- At the end it should look like this one:
145-
```{figure} data-visualization/first-plot/exercise.png
146-
:alt: Result of the exercise
147-
```
145+
```{figure} data-visualization/first-plot/exercise.png
146+
:alt: Result of the exercise
147+
```
148+
149+
- Experiment also by using named colors (e.g. "red") instead of the hex-codes.
148150
````
149151

150152
````{solution}
151153
```{code-block} python
152154
---
153-
emphasize-lines: 12, 15, 20-21, 26
155+
emphasize-lines: 9, 12, 17-18, 23
154156
---
155-
# this line tells Jupyter to display matplotlib figures in the notebook
156-
%matplotlib inline
157-
158157
import matplotlib.pyplot as plt
159158
160159
# this is dataset 1 from
@@ -166,13 +165,13 @@ data_y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
166165
data2_y = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
167166
168167
# here we multiply all elements of data2_y by 2.0
169-
data2_y_scaled = [y*2.0 for y in data2_y]
168+
data2_y_scaled = [y * 2.0 for y in data2_y]
170169
171170
fig, ax = plt.subplots()
172171
173-
ax.scatter(x=data_x, y=data_y, c="#E69F00", label='set 1')
174-
ax.scatter(x=data_x, y=data2_y, c="#56B4E9", label='set 2')
175-
ax.scatter(x=data_x, y=data2_y_scaled, c="#009E73", label='set 2 (scaled)')
172+
ax.scatter(x=data_x, y=data_y, c="#E69F00", label="set 1")
173+
ax.scatter(x=data_x, y=data2_y, c="#56B4E9", label="set 2")
174+
ax.scatter(x=data_x, y=data2_y_scaled, c="#009E73", label="set 2 (scaled)")
176175
177176
ax.set_xlabel("we should label the x axis")
178177
ax.set_ylabel("we should label the y axis")
@@ -188,7 +187,7 @@ ax.legend()
188187
This qualitative color palette is optimized for all color-vision
189188
deficiencies, see <https://clauswilke.com/dataviz/color-pitfalls.html> and
190189
[Okabe, M., and K. Ito. 2008. "Color Universal Design (CUD):
191-
How to Make Figures and Presentations That Are Friendly to Colorblind People."](http://jfly.iam.u-tokyo.ac.jp/color/).
190+
How to Make Figures and Presentations That Are Friendly to Colorblind People"](http://jfly.iam.u-tokyo.ac.jp/color/).
192191
```
193192

194193
---
@@ -199,7 +198,7 @@ When plotting with Matplotlib, it is useful to know and understand that
199198
there are **two approaches** even though the reasons for this dual approach are
200199
outside the scope of this lesson.
201200

202-
- The more modern option is an **object-oriented interface** (the
201+
- The more modern option is an **object-oriented interface** or **explicit interface** (the
203202
{class}`fig <matplotlib.figure.Figure>` and {class}`ax <matplotlib.axes.Axes>` objects
204203
can be configured separately and passed around to functions):
205204
```{code-block} python
@@ -223,7 +222,7 @@ outside the scope of this lesson.
223222
```
224223

225224
- The more traditional option mimics MATLAB plotting and uses the
226-
**pyplot interface** ({mod}`plt <matplotlib.pyplot>` carries
225+
**pyplot interface** or **implicit interface** ({mod}`plt <matplotlib.pyplot>` carries
227226
the global settings):
228227
```{code-block} python
229228
---
@@ -254,7 +253,7 @@ into these functions and there is less risk that adjusting figures changes
254253
settings also for unrelated figures created in other functions.
255254
256255
When using the pyplot interface, settings are modified for the entire
257-
{mod}`matplotlib.pyplot` package. The latter is acceptable for linear scripts but may yield
256+
{mod}`matplotlib.pyplot` package. The latter is acceptable for simple scripts but may yield
258257
surprising results when introducing functions to enhance/abstract Matplotlib
259258
calls.
260259
```
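The point about passing `fig` and `ax` into functions can be illustrated with a minimal sketch using the explicit interface. The helper `plot_scaled` and the sample data are hypothetical and not part of the lesson. The `Agg` backend is chosen here only so the sketch also runs as a plain script without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, assumed here for script use
import matplotlib.pyplot as plt


def plot_scaled(ax, x, y, factor, label):
    """Hypothetical helper: draw y scaled by factor onto the given Axes only."""
    ax.scatter(x=x, y=[value * factor for value in y], label=label)
    ax.legend()


# made-up sample data, for illustration only
x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 8.0]

fig, (ax_left, ax_right) = plt.subplots(ncols=2)

plot_scaled(ax_left, x, y, factor=1.0, label="original")
plot_scaled(ax_right, x, y, factor=2.0, label="scaled")

# adjusting one Axes object leaves the other one untouched
ax_right.set_yscale("log")
```

Because each call receives the `Axes` it should draw on, no function depends on which figure happens to be "current" in `matplotlib.pyplot`.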
@@ -263,12 +262,13 @@ calls.
263262

264263
## Styling and customizing plots
265264

266-
- **Do not customize "manually"** using a graphical program (not easily repeatable/reproducible).
267-
- **No manual post-processing**. This will bite you when you need to regenerate 50
268-
figures one day before submission deadline or regenerate a set of figures
269-
after the person who created them left the group.
265+
- Before you customize plots "manually" using a graphical program, please
266+
consider how this affects reproducibility.
267+
- **Try to minimize manual post-processing**. This might bite you when you
268+
need to regenerate 50 figures one day before submission deadline or
269+
regenerate a set of figures after the person who created them left the group.
270270
- Matplotlib and also all the other libraries allow to customize almost every aspect of a plot.
271-
- It is useful to study [Matplotlib parts of a figure](https://matplotlib.org/stable/tutorials/introductory/quick_start.html#parts-of-a-figure)
271+
- It is useful to study [Matplotlib parts of a figure](https://matplotlib.org/stable/users/explain/quick_start.html#parts-of-a-figure)
272272
so that we know what to search for to customize things.
273273
- Matplotlib cheatsheets: <https://github.com/matplotlib/cheatsheets>
274274
- You can also select among pre-defined themes/
@@ -287,10 +287,10 @@ how the plot looks** (exercises 1 and 2) or to **modify the input data** (exampl
287287

288288
This is very close to real life: there are so many options and possibilities and it is
289289
almost impossible to remember everything so this strategy is useful to practice:
290-
- select an example that is close to what you have in mind
291-
- being able to adapt it to your needs
292-
- being able to search for help
293-
- being able to understand help request answers (not easy)
290+
- Select an example that is close to what you have in mind
291+
- Being able to adapt it to your needs
292+
- Being able to search for help
293+
- Being able to understand help request answers (not easy)
294294

295295
````{challenge} Exercise Customization-1: log scale in Matplotlib (15 min)
296296
In this exercise we will learn how to use log scales.
@@ -299,12 +299,12 @@ In this exercise we will learn how to use log scales.
299299
```python
300300
import pandas as pd
301301
302-
url = "https://raw.githubusercontent.com/plotly/datasets/master/gapminder_with_codes.csv"
303-
data = pd.read_csv(url)
302+
url = (
303+
"https://raw.githubusercontent.com/plotly/datasets/master/gapminder_with_codes.csv"
304+
)
305+
gapminder_data = pd.read_csv(url).query("year == 2007")
304306
305-
data_2007 = data[data["year"] == 2007]
306-
307-
data_2007
307+
gapminder_data
308308
```
309309
- Try the above snippet in a notebook and it will give you an overview over the data.
310310
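The hunk above replaces a boolean-mask filter with `DataFrame.query`. As a small sketch, the two forms should select the same rows. The DataFrame below is made up for illustration and only mimics the gapminder column names; it is not the real dataset.

```python
import pandas as pd

# made-up rows that mimic the gapminder column names used in the lesson
df = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [2002, 2007, 2002, 2007],
        "lifeExp": [70.0, 71.5, 60.0, 62.5],
    }
)

by_mask = df[df["year"] == 2007]  # boolean-mask form (the removed lines)
by_query = df.query("year == 2007")  # query form (the added line)

# both selections keep the same rows and the same index labels
same_rows = by_mask.equals(by_query)
print(same_rows)
```

The `query` form allows the read and the filter to be chained in a single expression, which is how the updated snippet uses it.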
@@ -314,10 +314,10 @@ In this exercise we will learn how to use log scales.
314314
315315
fig, ax = plt.subplots()
316316
317-
ax.scatter(x=data_2007["gdpPercap"], y=data_2007["lifeExp"], alpha=0.5)
317+
ax.scatter(x=gapminder_data["gdpPercap"], y=gapminder_data["lifeExp"], alpha=0.5)
318318
319-
ax.set_xlabel("GDP (USD) per capita")
320-
ax.set_ylabel("life expectancy (years)")
319+
ax.set_xlabel("GDP per capita (PPP dollars)")
320+
ax.set_ylabel("Life expectancy (years)")
321321
```
322322
323323
This is the result but we realize that a linear scale is not ideal here:
@@ -342,12 +342,12 @@ emphasize-lines: 5
342342
---
343343
fig, ax = plt.subplots()
344344
345-
ax.scatter(x=data_2007["gdpPercap"], y=data_2007["lifeExp"], alpha=0.5)
345+
ax.scatter(x=gapminder_data["gdpPercap"], y=gapminder_data["lifeExp"], alpha=0.5)
346346
347347
ax.set_xscale("log")
348348
349-
ax.set_xlabel("GDP (USD) per capita")
350-
ax.set_ylabel("life expectancy (years)")
349+
ax.set_xlabel("GDP per capita (PPP dollars)")
350+
ax.set_ylabel("Life expectancy (years)")
351351
```
352352
* {obj}`alpha <matplotlib.artist.Artist.set_alpha>` sets transparency
353353
of points.
@@ -362,7 +362,7 @@ For figures that go to print it is good practice to look at them at the size
362362
they will be printed in and then often fonts and tickmarks are too small.
363363
364364
Your task is to make the tickmarks and the axis label font larger, using
365-
[Matplotlib parts of a figure](https://matplotlib.org/stable/tutorials/introductory/quick_start.html#parts-of-a-figure)
365+
[Matplotlib parts of a figure](https://matplotlib.org/stable/users/explain/quick_start.html#parts-of-a-figure)
366366
and web search, and to arrive at this:
367367
368368
```{figure} data-visualization/customizing/gapminder-larger-font.png
@@ -375,16 +375,17 @@ See {meth}`ax.tick_params <matplotlib.axes.Axes.tick_params>`.
375375
376376
```{code-block} python
377377
---
378-
emphasize-lines: 7-11
378+
emphasize-lines: 7-8, 10-12
379379
---
380380
fig, ax = plt.subplots()
381381
382-
ax.scatter(x=data_2007["gdpPercap"], y=data_2007["lifeExp"], alpha=0.5)
382+
ax.scatter(x="gdpPercap", y="lifeExp", alpha=0.5, data=gapminder_data)
383383
384384
ax.set_xscale("log")
385385
386-
ax.set_xlabel("GDP (USD) per capita", fontsize=15)
387-
ax.set_ylabel("life expectancy (years)", fontsize=15)
386+
ax.set_xlabel("GDP per capita (PPP dollars)", fontsize=15)
387+
ax.set_ylabel("Life expectancy (years)", fontsize=15)
388+
388389
ax.tick_params(which="major", length=10)
389390
ax.tick_params(which="minor", length=5)
390391
ax.tick_params(labelsize=15)
@@ -400,8 +401,9 @@ ax.tick_params(labelsize=15)
400401
probably the most standard and most widely used
401402
- [Seaborn](https://seaborn.pydata.org/examples/index.html):
402403
high-level interface to Matplotlib, statistical functions built in
403-
- [Altair](https://altair-viz.github.io/gallery/index.html):
404-
declarative visualization (R users will be more at home), statistics built in
404+
- [Vega-Altair](https://altair-viz.github.io/gallery/index.html):
405+
declarative visualization, statistics built in
406+
(we have an [entire lesson about data visualization using Vega-Altair](https://coderefinery.github.io/data-visualization-python/))
405407
- [Plotly](https://plotly.com/python/):
406408
interactive graphs
407409
- [Bokeh](https://demo.bokeh.org/):
@@ -412,14 +414,14 @@ ax.tick_params(labelsize=15)
412414
R users will be more at home
413415
- [PyNGL](https://www.pyngl.ucar.edu/Examples/gallery.shtml):
414416
used in the weather forecast community
415-
- [K3D](https://k3d-jupyter.org/showcase/):
416-
Jupyter notebook extension for 3D visualization
417+
- [K3D](https://k3d-jupyter.org/gallery/index.html):
418+
Jupyter Notebook extension for 3D visualization
417419
418420
- Browse the various example galleries (links above).
419421
- Select one example that is close to your recent visualization project or simply interests you.
420422
- Note that you might need to install additional Python packages in order to make use of the libraries.
421423
This could be the visualization library itself, and in addition also any required dependency package.
422-
- First try to reproduce this example in the Jupyter notebook.
424+
- First try to reproduce this example in the Jupyter Notebook.
423425
- Then try to print out the data that is used in this example just before the call of the plotting function
424426
to learn about its structure. Is it a pandas dataframe? Is it a NumPy array? Is it a dictionary? A list?
425427
a list of lists?
@@ -509,8 +511,30 @@ clarify questions at this point before moving on.
509511

510512
---
511513

514+
## Matplotlib and pandas DataFrames
515+
516+
In the above exercises we have sent individual columns of the `gapminder_data` DataFrame
517+
into `ax.scatter()` like this:
518+
```python
519+
fig, ax = plt.subplots()
520+
521+
ax.scatter(x=gapminder_data["gdpPercap"], y=gapminder_data["lifeExp"], alpha=0.5)
522+
```
523+
524+
It is possible to do this instead and let Matplotlib "unpack" the columns:
525+
```python
526+
fig, ax = plt.subplots()
527+
528+
ax.scatter(x="gdpPercap", y="lifeExp", alpha=0.5, data=gapminder_data)
529+
```
530+
531+
Other input types are possible. See [Types of inputs to plotting
532+
functions](https://matplotlib.org/stable/users/explain/quick_start.html#types-of-inputs-to-plotting-functions).
533+
534+
---
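A minimal sketch of the two call styles from the new section, checking that they place the points at the same coordinates. The small table is made up for illustration and only reuses the gapminder column names. The `Agg` backend is an assumption so that the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, assumed here for script use
import matplotlib.pyplot as plt
import pandas as pd

# made-up values, only the column names follow the lesson
table = pd.DataFrame(
    {
        "gdpPercap": [500.0, 5000.0, 50000.0],
        "lifeExp": [50.0, 65.0, 80.0],
    }
)

fig, (ax_columns, ax_data) = plt.subplots(ncols=2)

# style 1: pass the columns themselves
points_columns = ax_columns.scatter(x=table["gdpPercap"], y=table["lifeExp"], alpha=0.5)

# style 2: pass column names and let Matplotlib look them up in `data`
points_data = ax_data.scatter(x="gdpPercap", y="lifeExp", alpha=0.5, data=table)

# a plain dictionary also works as `data`, since it supports lookup by key
as_dict = {"gdpPercap": [500.0, 5000.0], "lifeExp": [50.0, 65.0]}
points_dict = ax_data.scatter(x="gdpPercap", y="lifeExp", data=as_dict)
```

The `data=` form keeps the plotting call shorter when the same DataFrame is used for several arguments.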
535+
512536
```{keypoints}
513-
- Avoid manual post-processing, script everything.
537+
- Minimize manual post-processing, script everything.
514538
- Browse a number of example galleries to help you choose the library
515539
that fits best your work/style.
516540
- Figures for presentation slides and figures for manuscripts have
18.4 KB
Loading
17 KB
Loading
14.5 KB
Loading
13.1 KB
Loading
9.36 KB
Loading

0 commit comments

Comments
 (0)