Current Canadian sentiment is at a low, with a high cost of living, global political instability, and sweeping layoffs across multiple sectors. For the [2025 `plotnine` contest](https://posit.co/blog/announcing-the-2025-table-and-plotnine-contests/), I wanted to explore current official Canadian labour statistics using `plotnine`, a data visualization library for Python.
13
+
14
+
# Introduction
15
+
16
+
I am so happy that `plotnine`, a relatively new Python data visualization package, exists. `plotnine` is based on `ggplot2`, an R package that I have been using for almost a decade.
17
+
18
+
In this tutorial, I'll walk through the process of creating my `plotnine` 2025 contest submission. The plot shows employment across Canadian industries, ranked by their percent change in monthly employment. To help visualize data across different industries, industry-specific plots are laid out in a "pseudo" interactive manner.
19
+
20
+
# Setup
21
+
22
+
## Data
23
+
24
+
The data can be downloaded using this bash [script](https://github.com/wvictor14/labourcan/blob/main/data/downloadLabourData.sh), or directly from [StatCan's website](https://www150.statcan.gc.ca/t1/tbl1/en/tv.action?pid=1410035502).
13
25
14
26
## Parameters
15
27
28
+
In this initial code chunk we initialize some parameters so that, if needed later, we can rerun this entire notebook with different parameters (e.g. different years).
29
+
30
+
`pyprojroot` is similar to R's `here` package: it lets us construct file paths relative to the project root. This is very convenient, especially for Quarto projects with complex file organization.
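To illustrate the idea of root-relative paths, here is a minimal standard-library sketch. It is not how `pyprojroot` is implemented: the helpers `find_root` and `root_path` and the marker file names are assumptions made purely for illustration.

```python
from pathlib import Path


def find_root(start: Path, markers=(".git", "_quarto.yml")) -> Path:
    """Walk up from `start` until a folder containing a marker is found.

    Hypothetical helper: the marker names are assumptions, not pyprojroot's rules.
    """
    start = start.resolve()
    for folder in (start, *start.parents):
        if any((folder / marker).exists() for marker in markers):
            return folder
    raise FileNotFoundError(f"no project root found above {start}")


def root_path(start: Path, *parts: str) -> Path:
    """Build a path relative to the detected project root."""
    return find_root(start).joinpath(*parts)
```

The benefit is that a notebook nested several folders deep can refer to a data file the same way as a script at the top level.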
31
+
16
32
```{python}
17
33
from pyprojroot import here
18
34
```
@@ -26,21 +42,28 @@ FILTER_YEAR = (2018, 2025)
26
42
## Libraries
27
43
28
44
```{python}
45
+
# Data manipulation
29
46
import polars as pl
30
47
import polars.selectors as cs
48
+
49
+
# Visualization
50
+
from plotnine import *
51
+
52
+
# Mizani helps customize the text and breaks on axes
31
53
from mizani.bounds import squish
32
54
import mizani.labels as ml
33
55
import mizani.breaks as mb
34
-
import textwrap
35
-
from pyprojroot import here
36
-
from great_tables import GT, md, html
37
-
from plotnine import *
38
-
from labourcan.data_processing import read_labourcan,calculate_centered_rank
56
+
import textwrap # for wrapping long lines of text
57
+
58
+
# Custom extract and transform functions for plot data
59
+
from labourcan.data_processing import read_labourcan, calculate_centered_rank
39
60
```
40
61
41
-
## Read data
62
+
## Read and process data for graphing
42
63
43
-
[`read_labourcan`](../py/labourcan/data_processing.py) returns a polars with:
64
+
The visualization required a fair amount of data processing, which is detailed on this [page](01_develop_data_processing.html). The steps are summarized here:
65
+
66
+
[`read_labourcan`](../py/labourcan/data_processing.py) returns a `polars.DataFrame` with:
44
67
45
68
- Unused columns removed
46
69
- Filtered to seasonally adjusted estimates only
@@ -55,93 +78,76 @@ labour = read_labourcan(LABOUR_DATA_FILE)
The type of visual that's being developed here is something like a heatmap of employment numbers.
59
84
60
-
Let's take a stab at a first visual.
85
+
We want a clean separation of industries that are growing or shrinking. For that we rank industries by their % monthly change. This is not an ordinary rank, though: we center it around 0 so that sectors that are growing (% change > 0) have a positive rank and those that are shrinking have a negative rank.
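The idea of a rank centered on 0 can be sketched in plain Python. This is only an illustration under my own assumptions (zero gets rank 0, ties are broken by input order); the function name `centered_rank` is hypothetical and the packaged `calculate_centered_rank` may handle these cases differently.

```python
def centered_rank(pct_changes):
    """Rank values around zero: gains get 1, 2, ... and losses get -1, -2, ...

    Hypothetical sketch: zero is assigned rank 0, ties keep input order.
    """
    ranks = [0] * len(pct_changes)
    # Smallest gain gets rank 1, so larger gains sit further from zero.
    gains = sorted(
        (i for i, v in enumerate(pct_changes) if v > 0),
        key=lambda i: pct_changes[i],
    )
    # Smallest loss gets rank -1, so larger losses sit further from zero.
    losses = sorted(
        (i for i, v in enumerate(pct_changes) if v < 0),
        key=lambda i: pct_changes[i],
        reverse=True,
    )
    for rank, i in enumerate(gains, start=1):
        ranks[i] = rank
    for rank, i in enumerate(losses, start=1):
        ranks[i] = -rank
    return ranks


changes = [0.8, -0.3, 0.1, -1.2, 0.0]  # made-up % changes for five industries
print(centered_rank(changes))  # → [2, -1, 1, -2, 0]
```

Plotting this rank on the y-axis stacks growing industries above the zero line and shrinking ones below it.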
86
+
87
+
`scale_fill_gradient2` is a great option because it lets us specify `midpoint=0`.
This is looking pretty good. I added `height = 0.95` to add some whitespace between tiles vertically.
127
-
I actually wanted to remove whitespace completely, but I discovered `width` for `geom_tile` doesn't
128
-
work the same as it does for `ggplot2`. If I set `width=1` it seems to make the tiles smaller, instead of wider.
129
-
136
+
1. I added `height = 0.95` to add some whitespace between tiles vertically. To remove horizontal whitespace, we need to specify a `width`. Because we are using a `datetime` axis, the width must be given in days. Each tile here spans one month, so we approximate a month as 30 days, hence `width = 30*0.95`.
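A quick check of the arithmetic behind that width, using only the standard library. The month starts below are made-up sample dates, and 30 days is an approximation, since real months run from 28 to 31 days.

```python
from datetime import date

# Hypothetical month starts covering one year (2024 is a leap year).
month_starts = [date(2024, m, 1) for m in range(1, 13)] + [date(2025, 1, 1)]
days_per_month = [(b - a).days for a, b in zip(month_starts, month_starts[1:])]

SHRINK = 0.95  # same shrink factor as used for `height`
tile_width = 30 * SHRINK  # tile width expressed in days

# The gap left next to each tile varies a little with the month length.
gaps = [days - tile_width for days in days_per_month]
print(min(days_per_month), max(days_per_month))
print(tile_width, min(gaps), max(gaps))
```

Because the gap depends on the month length, the spacing between tiles is slightly uneven, but at this scale the difference is hard to see.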
130
137
131
138
## Explicit color mapping with `scale_fill_manual`
132
139
133
140
I am fairly happy with the `scale_fill_gradient2` used with `squish`. We get a really nice palette
134
-
that's centered around 0. However `scale_fill_gradient2` is limited to 3 colors (high, midpoint, low),
141
+
that's centered around 0. However `scale_fill_gradient2` is limited to 3 colors (`high`, `mid`, `low`),
135
142
which does not quite enable the more dynamic color palette that I'm seeking.
136
143
137
-
To be more explicit with the colors, I will bin the `PDIFF` and map colors manually
138
-
using `scale_fill_manual`
144
+
To be more explicit with the colors, I will bin the % change variable and then map each bin to a color manually using `scale_fill_manual`.
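Binning can be sketched without any data frame library. The bin edges and labels below are assumptions for illustration only (the cut points actually used are not shown here), and `bin_pdiff` is a hypothetical helper.

```python
from bisect import bisect_right

# Hypothetical bin edges in percent; each bin is closed on the left.
EDGES = [-1.0, -0.5, 0.0, 0.5, 1.0]
LABELS = ["< -1", "-1 to -0.5", "-0.5 to 0", "0 to 0.5", "0.5 to 1", ">= 1"]


def bin_pdiff(pdiff: float) -> str:
    """Return the label of the bin that contains `pdiff`."""
    return LABELS[bisect_right(EDGES, pdiff)]


for value in (-2.0, -0.2, 0.25, 1.0):
    print(value, "->", bin_pdiff(value))
```

Once every value carries a bin label, each label can be tied to exactly one color.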
### `scale_fill_manual` for explicit color mapping
192
198
193
-
Now we need to order the levels, and map explicit colors
199
+
Now we need to order the levels and map them to a specific color palette.
194
200
195
-
We will make PDIFF=0%to be gray, positive values to have a green and blue colors (job growth = good), and negative values to have warmer (alarming, bad) colors.
201
+
We will make `PDIFF=0%` (no change) gray, give positive values `green` and `blue` colors (*growth* = *good*), and give negative values `red` and `orange` colors (*contraction* = *bad*).
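One way to keep the ordering and the colors together is a dictionary keyed by level. The level names and hex codes below are placeholders I made up for this sketch, not the palette used in the final plot.

```python
# Hypothetical bin labels, ordered from largest loss to largest gain.
LEVELS = ["-1.0%", "-0.5%", "0%", "0.5%", "1.0%"]

# Hypothetical hex codes: warm colors for contraction, gray for no change,
# green and blue for growth.
COLORS = ["#d7301f", "#fdae61", "#bdbdbd", "#66c2a4", "#2b8cbe"]

# A dict keyed by level ties each color to its bin, whatever the row order.
palette = dict(zip(LEVELS, COLORS))

for level, color in palette.items():
    print(f"{level:>6} -> {color}")
```

As far as I know, `scale_fill_manual` accepts such a dict for `values`, which avoids relying on the colors and levels happening to line up by position.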
That looks great. The power of `scale_fill_manual` enables much more control over
245
-
the color palette. However, the cost was that it takes a lot more effort and lines of code
246
-
to create a custom mapping.
249
+
1. map `fill` to `PDIFF_BINNED`
250
+
2. provide explicit color mapping to `scale_fill_manual`
251
+
252
+
The power of `scale_fill_manual` is that it enables much more explicit control over
253
+
how color is mapped to data. However, the cost is that it takes a lot more effort and more lines of code than `scale_fill_gradient2`, which works well out of the box.
247
254
248
255
## The legend
249
256
250
-
...is extremely accurate, however we are going to simplify it and nicer to look at.
257
+
...is mathematically accurate; however, we are going to make it nicer to look at.
251
258
252
259
First let's make the text more concise: we don't need every bin to be labelled, and instead of listing the range, we can just describe the midpoint.
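Here is a small sketch of how such a sparse label list could be built. The bin ranges are made up, `legend_label` is a hypothetical helper, and labelling every other bin is just one possible choice.

```python
def legend_label(lower: float, upper: float, show: bool) -> str:
    """Describe a bin by its midpoint, or return an empty label."""
    if not show:
        return ""
    midpoint = (lower + upper) / 2
    return f"{midpoint:+.2f}%"


# Hypothetical bins as (lower, upper) percent ranges, ordered low to high.
bins = [(-1.0, -0.5), (-0.5, 0.0), (0.0, 0.5), (0.5, 1.0)]

# Label every other bin and leave the rest blank.
legend_labels = [
    legend_label(lower, upper, show=(i % 2 == 0))
    for i, (lower, upper) in enumerate(bins)
]
print(legend_labels)  # → ['-0.75%', '', '+0.25%', '']
```

The list keeps one entry per bin so that it stays aligned with the color keys, with empty strings where no text should appear.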
Looks much better than my first attempt with a [horizontal legend](#horizontal-legend-with-horizontal-legend-text)
303
+
1. provide the list `legend_labels` to `scale_fill_manual`
304
+
305
+
I originally wanted to make a [horizontal legend](#horizontal-legend-with-horizontal-legend-text), but this works much better.
297
306
298
307
## Text and fonts
299
308
300
-
Next up is the text and fonts. I played with a few fonts on [google fonts](https://fonts.google.com/) before settling on two.
309
+
Next up is the text and fonts. I played with a few fonts on [Google Fonts](https://fonts.google.com/) before settling on two. Note that this website uses these fonts with the help of [brand.yml](_brand.yml).
301
310
302
-
First, install the fonts:
311
+
Install the fonts:
303
312
304
313
```{python}
305
314
FONT_PRIMARY = "Playfair Display"
@@ -309,7 +318,9 @@ fk.install(FONT_PRIMARY)
309
318
fk.install(FONT_SECONDARY)
310
319
```
311
320
312
-
plotnine breaks and labels for the scales can be easily adjusted using `mizani`, which is like the `scales` equivalent to `ggplot2`
321
+
### `mizani` for axis breaks and labels
322
+
323
+
`plotnine` breaks and labels for the scales can be easily adjusted using [`mizani`](https://mizani.readthedocs.io/en/stable/), which plays the same role for `plotnine` that [`scales`](https://scales.r-lib.org/) plays for `ggplot2`.
313
324
314
325
We're going to use `mizani.breaks.breaks_date_width` to put breaks for each year, and `mizani.labels.label_date` to drop the "month" part of the date.
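To make the intended result concrete, here is a standard-library sketch of yearly breaks with year-only labels. It does not call `mizani`, the helper names are hypothetical, and the exact break positions `mizani` picks may differ from this.

```python
from datetime import datetime


def yearly_breaks(start: datetime, end: datetime) -> list[datetime]:
    """One break on January 1 of each year from start.year to end.year."""
    return [datetime(year, 1, 1) for year in range(start.year, end.year + 1)]


def year_labels(breaks: list[datetime]) -> list[str]:
    """Format each break with "%Y" so that the month part is dropped."""
    return [b.strftime("%Y") for b in breaks]


# Assumed date range, chosen to match the 2018 to 2025 filter.
breaks = yearly_breaks(datetime(2018, 1, 1), datetime(2025, 6, 1))
print(year_labels(breaks))
```

One tick per year keeps the x-axis readable even though the data itself is monthly.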
labels=ml.label_date("%Y"), # Format labels to show only the year
363
+
labels=ml.label_date("%Y"), # <2>
353
364
expand=(0, 0),
354
-
breaks=mb.breaks_date_width("1 years"),
365
+
breaks=mb.breaks_date_width("1 years"), # <2>
355
366
)
356
-
+ labs(
367
+
+ labs( # <3>
357
368
title="Sector Shifts: Where Canada's Jobs Are Moving",
358
369
subtitle=textwrap.fill(
359
370
"Track the number of industries gaining or losing jobs each month. Boxes are shaded based on percentage change from previous month in each industry's employment levels.",
@@ -366,25 +377,33 @@ plot = (
366
377
plot
367
378
```
368
379
380
+
1. Apply font family changes to the primary font in `theme(...)`
381
+
2. Use `mizani` to format labels to show only the year in `scale_x_datetime`
382
+
3. Add `title`, `subtitle` and wrap long lines with the help of `textwrap`
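As a standalone illustration of the wrapping step, `textwrap.fill` inserts line breaks so that no line exceeds a given width. The `width=60` below is an assumed value for this sketch, not necessarily the one used in the plot.

```python
import textwrap

subtitle = (
    "Track the number of industries gaining or losing jobs each month. "
    "Boxes are shaded based on percentage change from previous month "
    "in each industry's employment levels."
)

# width=60 is an assumption; tune it to the figure size and font.
wrapped = textwrap.fill(subtitle, width=60)
print(wrapped)
```

The wrapped string contains plain newline characters, so it can be passed straight to `labs(subtitle=...)`.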
383
+
369
384
## Highlighting an Industry
370
385
371
-
For more deeper insights, I would like to see where each individual ranks in the graphic.
386
+
For more industry-specific insights, I would like to see where each individual industry ranks in the graphic.
Initially I wanted a horizontal legend for the colors. But in order to remove the whitespace between keys, I discovered that the text needs to be smaller than the legend keys; otherwise it "pushes" the legend keys apart in an uneven manner. I attempted (*unsuccessfully*) to address this by making the legend text small, eliminating as much text as possible (e.g. removing the "%" characters for `-0.50` and `0.50`), and lastly increasing the legend key size.