@@ -51,15 +51,351 @@ From [Claus O. Wilke: "Fundamentals of Data Visualization"](https://clauswilke.c
5151- "Simple, friendly and consistent API" allows us to focus on the data
5252 visualization part and get started without too much Python knowledge
5353- The way it **combines visual channels with data columns** can feel intuitive
54- - Interfaces very nicely with [pandas](https://pandas.pydata.org/)
54+ - Interfaces very nicely with [Pandas](https://pandas.pydata.org/)
5555- Easy to change figures
5656- Good documentation
5757- Open source
5858- Makes it easy to save figures in a number of formats
5959- Easy to save interactive visualizations to be used in websites
6060
6161
62- ## Exercise
62+ ## Example data: Weather data from two Norwegian cities
63+
64+ We will experiment with some example weather data obtained from [Norsk
65+ KlimaServiceSenter](https://seklima.met.no/observations/), Meteorologisk
66+ institutt (MET) (CC BY 4.0). The data is in CSV format (comma-separated
67+ values) and contains daily and monthly weather data for two cities in Norway:
68+ Oslo and Tromsø. You can browse the data
69+ [here](https://github.com/AaltoSciComp/python-for-scicomp/tree/master/resources/data/plotting).
70+
71+ We will use the Pandas library to read the data into a dataframe. We have learned about Pandas in an {ref}`earlier episode <pandas>`.
72+
73+ Pandas can read from and write to a large set of formats
74+ ([overview of input/output functions and formats](https://pandas.pydata.org/pandas-docs/stable/reference/io.html)).
75+ We will load a CSV file directly from the web. Instead of a web URL we
76+ could also pass a local file name.
77+
78+ Pandas dataframes are a great data structure for **tabular data** and tabular
79+ data turns out to be a great input format for data visualization libraries.
80+ Vega-Altair understands Pandas dataframes and can plot them directly.
81+
82+
83+ ## Reading data into a dataframe
84+
85+ We can try this together in a notebook:
86+ Using Pandas we can **merge, join, concatenate, and compare**
87+ dataframes, see <https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html>.
88+
89+ Let us try to **concatenate** two dataframes: one for Tromsø weather data (we
90+ will now load monthly values) and one for Oslo:
91+ ```{code-block} python
92+ ---
93+ emphasize-lines: 8
94+ ---
95+ import pandas as pd
96+
97+ url_prefix = "https://raw.githubusercontent.com/AaltoSciComp/python-for-scicomp/master/resources/data/plotting/"
98+
99+ data_tromso = pd.read_csv(url_prefix + "tromso-monthly.csv")
100+ data_oslo = pd.read_csv(url_prefix + "oslo-monthly.csv")
101+
102+ data_monthly = pd.concat([data_tromso, data_oslo], axis=0)
103+
104+ # let us print the combined result
105+ data_monthly
106+ ```
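To see what row-wise concatenation does without downloading anything, here is a minimal sketch on two tiny made-up dataframes. The names `toy_tromso`, `toy_oslo`, and the precipitation numbers are illustrative assumptions, not values from the real CSV files.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two monthly CSV files
# (the numbers are invented for illustration only).
toy_tromso = pd.DataFrame(
    {"name": ["Tromso", "Tromso"], "date": ["01.2023", "02.2023"], "precipitation": [90.0, 70.0]}
)
toy_oslo = pd.DataFrame(
    {"name": ["Oslo", "Oslo"], "date": ["01.2023", "02.2023"], "precipitation": [50.0, 40.0]}
)

# axis=0 stacks the rows of the second dataframe below the first one.
# The original row labels are kept, so the index contains duplicates.
toy_monthly = pd.concat([toy_tromso, toy_oslo], axis=0)

# ignore_index=True renumbers the rows from 0 instead.
toy_monthly_renumbered = pd.concat([toy_tromso, toy_oslo], axis=0, ignore_index=True)

print(toy_monthly)
print(toy_monthly_renumbered)
```

Both results hold the same four rows. Whether the duplicated row labels matter depends on what you do next; for plotting with Vega-Altair they are typically not a problem.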
107+
108+ Before plotting the data, there is a problem which we may not see yet: the dates
109+ are not in a standard date format (YYYY-MM-DD). We can fix this:
110+ ```python
111+ # convert mm.yyyy strings to proper datetime values
112+ data_monthly["date"] = pd.to_datetime(list(data_monthly["date"]), format="%m.%Y")
113+ ```
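The effect of the `format` argument can be checked on a few hand-written strings. This is a small sketch with made-up input values; `raw_dates` and `parsed` are names chosen only for this illustration.

```python
import pandas as pd

# Made-up month strings in the same mm.yyyy layout as the monthly data.
raw_dates = pd.Series(["01.2023", "02.2023", "12.2022"])

# %m is the two-digit month and %Y the four-digit year.
# Since the strings carry no day, each date falls on the first of the month.
parsed = pd.to_datetime(raw_dates, format="%m.%Y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
# → ['2023-01-01', '2023-02-01', '2022-12-01']
```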
114+
115+ With Pandas it is possible to do a lot more (adjusting missing values, fixing
116+ inconsistencies, changing format).
117+
118+
119+ ## Plotting the data
120+
121+ Now let's plot the data. We will start with a plot that is not optimal and
122+ then we will explore and improve a bit as we go:
123+
124+ ```python
125+ import altair as alt
126+
127+ alt.Chart(data_monthly).mark_bar().encode(
128+     x="date",
129+     y="precipitation",
130+     color="name",
131+ )
132+ ```
133+
134+ :::{figure} plotting-vega-altair/precipitation-on-top.svg
135+ :alt: Monthly precipitation for the cities Oslo and Tromsø over the course of a year.
136+
137+ Monthly precipitation for the cities Oslo and Tromsø over the course of a year.
138+ :::
139+
140+ :::{discussion} Let us pause and explain the code
141+ - `alt` is a short-hand for `altair` which we imported at the top of the notebook
142+ - `Chart()` is a function defined inside `altair` which takes the data as argument
143+ - `mark_bar()` is a function that produces bar charts
144+ - `encode()` is a function which encodes data columns to **visual channels**
145+
146+ Observe how we connect (encode) **visual channels** to data columns:
147+ - x-coordinate with "date"
148+ - y-coordinate with "precipitation"
149+ - color with "name" (name of weather station; city)
150+ :::
151+
152+ We can improve the plot by telling Vega-Altair that the x-axis is a
153+ **time series** (temporal type, `T`) and that we would like to see the year
154+ and month (`yearmonth`):
155+ ```{code-block} python
156+ ---
157+ emphasize-lines: 2
158+ ---
159+ alt.Chart(data_monthly).mark_bar().encode(
160+     x="yearmonth(date):T",
161+     y="precipitation",
162+     color="name",
163+ )
164+ ```
165+
166+ :::{figure} plotting-vega-altair/precipitation-on-top-yearmonth.svg
167+ :alt: Monthly precipitation for the cities Oslo and Tromsø over the course of a year.
168+
169+ Monthly precipitation for the cities Oslo and Tromsø over the course of a year.
170+ :::
171+
172+ Let us improve the plot with another one-line change:
173+ ```{code-block} python
174+ ---
175+ emphasize-lines: 5
176+ ---
177+ alt.Chart(data_monthly).mark_bar().encode(
178+     x="yearmonth(date):T",
179+     y="precipitation",
180+     color="name",
181+     column="name",
182+ )
183+ ```
184+
185+ :::{figure} plotting-vega-altair/precipitation-side.svg
186+ :alt: Monthly precipitation for the cities Oslo and Tromsø over the course of a year with both cities plotted side by side.
187+
188+ Monthly precipitation for the cities Oslo and Tromsø over the course of a year with both cities plotted side by side.
189+ :::
190+
191+ With another one-line change we can place the bars for the two cities next to each other within each month (a grouped bar chart):
192+ ```{code-block} python
193+ ---
194+ emphasize-lines: 5
195+ ---
196+ alt.Chart(data_monthly).mark_bar().encode(
197+     x="yearmonth(date):T",
198+     y="precipitation",
199+     color="name",
200+     xOffset="name",
201+ )
202+ ```
203+ :::{figure} plotting-vega-altair/precipitation-stacked-x.svg
204+ :alt: Monthly precipitation for the cities Oslo and Tromsø over the course of a year plotted as grouped bar chart.
205+
206+ Monthly precipitation for the cities Oslo and Tromsø over the course of a year
207+ plotted as grouped bar chart.
208+ :::
209+
210+ This is not publication-quality yet but a really good start!
211+
212+ Let us try one more example where we can nicely see how Vega-Altair
213+ is able to map visual channels to data columns:
214+ ```python
215+ alt.Chart(data_monthly).mark_area(opacity=0.5).encode(
216+     x="yearmonth(date):T",
217+     y="max temperature",
218+     y2="min temperature",
219+     color="name",
220+ )
221+ ```
222+
223+ :::{figure} plotting-vega-altair/temperature-ranges-combined.svg
224+ :alt: Monthly temperature ranges for two cities in Norway.
225+
226+ Monthly temperature ranges for two cities in Norway.
227+ :::
228+
229+
230+ ## Exercise: Using visual channels to re-arrange plots
231+
232+ ::::{exercise} Plotting-1: Using visual channels to re-arrange plots
233+ 1. Try to reproduce the above plots if they are not already in your notebook.
234+
235+ 1. Above we have plotted the monthly precipitation for two cities side by side
236+    using the `xOffset` channel. Try to arrive at the following plot where months are
237+    along the y-axis and the precipitation amount is along the x-axis:
238+    :::{figure} plotting-vega-altair/precipitation-stacked-y.svg
239+    :::
240+
241+ 1. Use a web search or an AI assistant to find out how to change the axis title
242+    from "precipitation" to "Precipitation (mm)".
243+
244+ 1. Modify the temperature range plot to show the temperature ranges for the
245+    two cities side by side like this:
246+    :::{figure} plotting-vega-altair/temperature-ranges-side.svg
247+    :::
248+
249+ :::{solution}
250+ 1. Copy-paste code blocks from above.
251+
252+ 1. Basically we switched x and y:
253+    ```{code-block} python
254+    ---
255+    emphasize-lines: 2,3,5
256+    ---
257+    alt.Chart(data_monthly).mark_bar().encode(
258+        y="yearmonth(date):T",
259+        x="precipitation",
260+        color="name",
261+        yOffset="name",
262+    )
263+    ```
264+
265+ 1. This can be done with the following modification:
266+    ```{code-block} python
267+    ---
268+    emphasize-lines: 3
269+    ---
270+    alt.Chart(data_monthly).mark_bar().encode(
271+        y="yearmonth(date):T",
272+        x=alt.X("precipitation").title("Precipitation (mm)"),
273+        color="name",
274+        yOffset="name",
275+    )
276+    ```
277+
278+ 1. We added one line:
279+    ```{code-block} python
280+    ---
281+    emphasize-lines: 6
282+    ---
283+    alt.Chart(data_monthly).mark_area(opacity=0.5).encode(
284+        x="yearmonth(date):T",
285+        y="max temperature",
286+        y2="min temperature",
287+        color="name",
288+        column="name",
289+    )
290+    ```
291+ :::
292+ ::::
293+
294+
295+ ## Using visual channels
296+
297+ Now we will try to **plot the daily data and look at snow depths**. We first
298+ read and concatenate two datasets:
299+ ```python
300+ url_prefix = "https://raw.githubusercontent.com/AaltoSciComp/python-for-scicomp/master/resources/data/plotting/"
301+
302+ data_tromso = pd.read_csv(url_prefix + "tromso-daily.csv")
303+ data_oslo = pd.read_csv(url_prefix + "oslo-daily.csv")
304+
305+ data_daily = pd.concat([data_tromso, data_oslo], axis=0)
306+ ```
307+
308+ We adjust the data a bit:
309+ ```python
310+ # convert dd.mm.yyyy strings to proper datetime values
311+ data_daily["date"] = pd.to_datetime(list(data_daily["date"]), format="%d.%m.%Y")
312+
313+ # keep only dates after 1 December 2022 and before 1 May 2023
314+ data_daily = data_daily[
315+     (data_daily["date"] > "2022-12-01") & (data_daily["date"] < "2023-05-01")
316+ ]
317+ ```
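The filter above uses strict comparisons, so the two boundary dates themselves are dropped. A minimal sketch on made-up rows (the dataframe `toy_daily` and its snow depths are invented for illustration) shows which dates survive and how `>=` would keep the first day:

```python
import pandas as pd

# Made-up daily rows around the two boundary dates.
toy_daily = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["30.11.2022", "01.12.2022", "02.12.2022", "30.04.2023", "01.05.2023"],
            format="%d.%m.%Y",
        ),
        "snow depth": [0, 5, 6, 20, 18],
    }
)

# Strict comparisons: 1 December and 1 May are both excluded.
strict = toy_daily[(toy_daily["date"] > "2022-12-01") & (toy_daily["date"] < "2023-05-01")]

# Using >= on the lower bound keeps 1 December as well.
inclusive_start = toy_daily[
    (toy_daily["date"] >= "2022-12-01") & (toy_daily["date"] < "2023-05-01")
]

print(strict["date"].dt.strftime("%Y-%m-%d").tolist())
# → ['2022-12-02', '2023-04-30']
print(inclusive_start["date"].dt.strftime("%Y-%m-%d").tolist())
# → ['2022-12-01', '2022-12-02', '2023-04-30']
```

For the snow depth plot, one day more or less makes little visual difference, but it is worth knowing which variant you are using.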
318+
319+ Now we can plot the snow depths for the months December to May for the two
320+ cities:
321+ ```python
322+ alt.Chart(data_daily).mark_bar().encode(
323+     x="date",
324+     y="snow depth",
325+     column="name",
326+ )
327+ ```
328+
329+ :::{figure} plotting-vega-altair/snow-depth.svg
330+ :alt: Snow depth (in cm) for the months December 2022 to May 2023 for two cities in Norway.
331+
332+ Snow depth (in cm) for the months December 2022 to May 2023 for two cities in Norway.
333+ :::
334+
335+ What happens if we try to color the plot by the "max temperature" values?
336+ ```{code-block} python
337+ ---
338+ emphasize-lines: 4
339+ ---
340+ alt.Chart(data_daily).mark_bar().encode(
341+     x="date",
342+     y="snow depth",
343+     color="max temperature",
344+     column="name",
345+ )
346+ ```
347+
348+ The result looks neat:
349+ :::{figure} plotting-vega-altair/snow-depth-color.svg
350+ :alt: Snow depth (in cm) for the months December 2022 to May 2023 for two cities in Norway. Colored by daily max temperature.
351+
352+ Snow depth (in cm) for the months December 2022 to May 2023 for two cities in Norway. Colored by daily max temperature.
353+ :::
354+
355+ We can change the color scheme ([available color schemes](https://vega.github.io/vega/docs/schemes/)):
356+ ```{code-block} python
357+ ---
358+ emphasize-lines: 4
359+ ---
360+ alt.Chart(data_daily).mark_bar().encode(
361+     x="date",
362+     y="snow depth",
363+     color=alt.Color("max temperature").scale(scheme="plasma"),
364+     column="name",
365+ )
366+ ```
367+
368+ With the following result:
369+ :::{figure} plotting-vega-altair/snow-depth-plasma.svg
370+ :alt: Snow depth (in cm) for the months December 2022 to May 2023 for two cities in Norway. Colored by daily max temperature. Warmer days are often followed by reduced snow depth.
371+
372+ Snow depth (in cm) for the months December 2022 to May 2023 for two cities in Norway. Colored by daily max temperature. Warmer days are often followed by reduced snow depth.
373+ :::
374+
375+ :::{discussion} What other marks and other visual channels exist?
376+ - [Overview of available marks](https://altair-viz.github.io/user_guide/marks/index.html)
377+ - [Overview of available visual channels](https://altair-viz.github.io/user_guide/encodings/channels.html)
378+ - [Gallery of examples](https://altair-viz.github.io/gallery/index.html)
379+ :::
380+
381+
382+ ## Themes
383+
384+ In [Vega-Altair](https://altair-viz.github.io/) you can change the theme and
385+ select from a long [list of themes](https://github.com/vega/vega-themes). At
386+ the top of your notebook try to add:
387+ ```python
388+ alt.themes.enable('dark')
389+ ```
390+ Then re-run all cells. Later you can try some other themes such as:
391+ - `fivethirtyeight`
392+ - `latimes`
393+ - `urbaninstitute`
394+
395+ You can even define your own themes!
396+
397+
398+ ## Exercise: Adapting a gallery example
63399
64400In this exercise we can try to adapt existing scripts to either **tweak how the
65401plot looks** or to **modify the input data**. This is very close to real life:
@@ -70,7 +406,7 @@ remember everything so this strategy is useful to practice:
70406- Being able to search for help
71407- Being able to understand help request answers (not easy)
72408
73- :::{challenge} Exercise Customization-1: Adapting a gallery example
409+ ::::{exercise} Plotting-2: Adapting a gallery example
74410**This is a great exercise which is very close to real life.**
75411
76412- Browse the [Vega-Altair example gallery](https://altair-viz.github.io/gallery/index.html).
@@ -82,7 +418,11 @@ remember everything so this strategy is useful to practice:
82418- Then try to modify the data a bit.
83419- If you have time, try to feed it different, simplified data.
84420 This will be key for adapting the examples to your projects.
421+
422+ :::{solution} Example walk-through
423+ (work in progress)
85424:::
425+ ::::
86426
87427---
88428