Commit ccf532b

Merge pull request #5 from calculquebec/nouveau-chap4
New chapter 4 - Time series line charts
2 parents fd86dc3 + d6fb7f3 commit ccf532b

1 file changed

+359
-0
lines changed

src/04-mark_line.ipynb

Lines changed: 359 additions & 0 deletions
@@ -34,6 +34,365 @@
3434
"* Create different types of plots."
3535
]
3636
},
37+
{
38+
"cell_type": "code",
39+
"execution_count": null,
40+
"id": "7886f873-0800-4a2f-afa7-9f57f6cdb3fa",
41+
"metadata": {
42+
"lang": "fr"
43+
},
44+
"outputs": [],
45+
"source": [
46+
"import pandas as pd\n",
47+
"\n",
48+
"# Charger les données nettoyées\n",
49+
"surveys_complet = pd.read_csv('../data/surveys_0_NA.csv')\n",
50+
"surveys_complet"
51+
]
52+
},
53+
{
54+
"cell_type": "code",
55+
"execution_count": null,
56+
"id": "ace4ab7d-b2d7-42d2-bc7b-d29646f2fd81",
57+
"metadata": {
58+
"lang": "en"
59+
},
60+
"outputs": [],
61+
"source": [
62+
"import pandas as pd\n",
63+
"\n",
64+
"# Load the cleaned data\n",
65+
"surveys_complete = pd.read_csv('../data/surveys_0_NA.csv')\n",
66+
"surveys_complete"
67+
]
68+
},
69+
{
70+
"cell_type": "code",
71+
"execution_count": null,
72+
"id": "dcaa744e-e541-4e99-b385-43820d64bbb1",
73+
"metadata": {
74+
"lang": "en,fr"
75+
},
76+
"outputs": [],
77+
"source": [
78+
"import altair as alt\n",
79+
"alt.data_transformers.disable_max_rows()"
80+
]
81+
},
82+
{
83+
"cell_type": "markdown",
84+
"id": "29c791ed-c9c1-4d28-8a8d-12f43fbc1b7e",
85+
"metadata": {
86+
"lang": "fr"
87+
},
88+
"source": [
89+
"## Visualiser des données selon le temps\n",
90+
"* Nombre d'enregistrements par type d'espèce pour chaque année :"
91+
]
92+
},
93+
{
94+
"cell_type": "markdown",
95+
"id": "edb24b16-97a3-4b47-8146-41cc9a9336b5",
96+
"metadata": {
97+
"lang": "en"
98+
},
99+
"source": [
100+
"## Plotting time series data\n",
101+
"* Let’s visualize the number of records per year for each species"
102+
]
103+
},
104+
{
105+
"cell_type": "code",
106+
"execution_count": null,
107+
"id": "273a8a39-c161-4f74-94f0-50334927174a",
108+
"metadata": {
109+
"lang": "fr"
110+
},
111+
"outputs": [],
112+
"source": [
113+
"alt.Chart(surveys_complet).mark_line().encode(\n",
114+
" x=alt.X('year').type('ordinal'),\n",
115+
" y=alt.Y('count()').scale(type='log', base=2),\n",
116+
" color=alt.Color('species_id'),\n",
117+
")"
118+
]
119+
},
120+
{
121+
"cell_type": "code",
122+
"execution_count": null,
123+
"id": "c34f9aa4-ed84-4c02-85e8-dca890636433",
124+
"metadata": {
125+
"lang": "en"
126+
},
127+
"outputs": [],
128+
"source": [
129+
"alt.Chart(surveys_complete).mark_line().encode(\n",
130+
" x=alt.X('year').type('ordinal'),\n",
131+
" y=alt.Y('count()').scale(type='log', base=2),\n",
132+
" color=alt.Color('species_id'),\n",
133+
")"
134+
]
135+
},
136+
{
137+
"cell_type": "markdown",
138+
"id": "8bc40c68-66ae-408e-8d9c-bcd16e8a9924",
139+
"metadata": {
140+
"lang": "fr"
141+
},
142+
"source": [
143+
"* Poids médian par type d'espèce pour chaque mois :"
144+
]
145+
},
146+
{
147+
"cell_type": "markdown",
148+
"id": "d0f084a6-c69d-4896-8a7e-796ff77be155",
149+
"metadata": {
150+
"lang": "en"
151+
},
152+
"source": [
153+
"* And now, the median weight per month for each species"
154+
]
155+
},
156+
{
157+
"cell_type": "code",
158+
"execution_count": null,
159+
"id": "fef50d4d-305a-4b5d-b8b9-4895f4ad9d50",
160+
"metadata": {
161+
"lang": "fr"
162+
},
163+
"outputs": [],
164+
"source": [
165+
"alt.Chart(surveys_complet).mark_line().encode(\n",
166+
" x=alt.X('month').type('ordinal'),\n",
167+
" y=alt.Y('weight').aggregate('median'),\n",
168+
" color=alt.Color('species_id'),\n",
169+
" tooltip=['species_id'],\n",
170+
")"
171+
]
172+
},
173+
{
174+
"cell_type": "code",
175+
"execution_count": null,
176+
"id": "b95897bc-7434-4405-8b48-557e38af53c8",
177+
"metadata": {
178+
"lang": "en"
179+
},
180+
"outputs": [],
181+
"source": [
182+
"alt.Chart(surveys_complete).mark_line().encode(\n",
183+
" x=alt.X('month').type('ordinal'),\n",
184+
" y=alt.Y('weight').aggregate('median'),\n",
185+
" color=alt.Color('species_id'),\n",
186+
" tooltip=['species_id'],\n",
187+
")"
188+
]
189+
},
190+
{
191+
"cell_type": "markdown",
192+
"id": "ccbd4df4-8152-4810-85bd-260180dbc34c",
193+
"metadata": {
194+
"lang": "fr"
195+
},
196+
"source": [
197+
"### Exercice - Visualisation selon le temps\n",
198+
"`1`. Utilisez la fonction `pd.to_datetime()` pour générer une colonne\n",
199+
" de dates à partir des colonnes `year`, `month` et `day`. (3 min.)"
200+
]
201+
},
202+
{
203+
"cell_type": "markdown",
204+
"id": "fec3f053-3465-449d-81dd-dd2dbd5a8c69",
205+
"metadata": {
206+
"lang": "en"
207+
},
208+
"source": [
209+
"### Exercise - Plotting time series data\n",
210+
"`1`. Use the `pd.to_datetime()` function to generate a new\n",
211+
"`date` column from the columns `year`, `month` and `day`. (3 min.)"
212+
]
213+
},
214+
{
215+
"cell_type": "code",
216+
"execution_count": null,
217+
"id": "54217ca2-08d7-4c56-8063-a168387c2a9b",
218+
"metadata": {
219+
"lang": "fr",
220+
"tags": [
221+
"soln"
222+
]
223+
},
224+
"outputs": [],
225+
"source": [
226+
"# Décennie 1990 - pour éviter les 31 avril et 31 septembre 2000\n",
227+
"dec_1990 = surveys_complet[\n",
228+
" surveys_complet['year'].isin(range(1990, 2000))].copy()\n",
229+
"\n",
230+
"dec_1990['date'] = pd.to_datetime(dec_1990[['year', 'month', 'day']])\n",
231+
"dec_1990['date']"
232+
]
233+
},
234+
{
235+
"cell_type": "code",
236+
"execution_count": null,
237+
"id": "eae426f2-07be-4495-9cb1-a3c6fe6a6af7",
238+
"metadata": {
239+
"lang": "fr",
240+
"tags": [
241+
"exer"
242+
]
243+
},
244+
"outputs": [],
245+
"source": [
246+
"# Décennie 1990 - pour éviter les 31 avril et 31 septembre 2000\n",
247+
"dec_1990 = surveys_complet[\n",
248+
" surveys_complet['year'].isin(range(1990, 2000))].copy()\n",
249+
"\n",
250+
"dec_1990['date'] = ###['year', 'month', 'day']###\n",
251+
"dec_1990['date']"
252+
]
253+
},
254+
{
255+
"cell_type": "code",
256+
"execution_count": null,
257+
"id": "253efd92-63aa-44dd-9364-afb0afd46655",
258+
"metadata": {
259+
"lang": "en",
260+
"tags": [
261+
"soln"
262+
]
263+
},
264+
"outputs": [],
265+
"source": [
266+
"# Decade 1990 - avoid invalid dates April 31 and September 31, 2000\n",
267+
"dec_1990 = surveys_complete[\n",
268+
" surveys_complete['year'].isin(range(1990, 2000))].copy()\n",
269+
"\n",
270+
"dec_1990['date'] = pd.to_datetime(dec_1990[['year', 'month', 'day']])\n",
271+
"dec_1990['date']"
272+
]
273+
},
274+
{
275+
"cell_type": "code",
276+
"execution_count": null,
277+
"id": "348236bf-aaa5-4f3b-8058-0b024e150bec",
278+
"metadata": {
279+
"lang": "en",
280+
"tags": [
281+
"exer"
282+
]
283+
},
284+
"outputs": [],
285+
"source": [
286+
"# Decade 1990 - avoid invalid dates April 31 and September 31, 2000\n",
287+
"dec_1990 = surveys_complete[\n",
288+
" surveys_complete['year'].isin(range(1990, 2000))].copy()\n",
289+
"\n",
290+
"dec_1990['date'] = ###['year', 'month', 'day']###\n",
291+
"dec_1990['date']"
292+
]
293+
},
294+
{
295+
"cell_type": "markdown",
296+
"id": "9de12028-9813-44b6-bc36-a85d83bc106f",
297+
"metadata": {
298+
"lang": "fr"
299+
},
300+
"source": [
301+
"`2`. Affichez le poids médian de chaque espèce selon la `date`.\n",
302+
"(3 min.)"
303+
]
304+
},
305+
{
306+
"cell_type": "markdown",
307+
"id": "1d77e0e8-0663-4f73-a2a3-10f32ea3337f",
308+
"metadata": {
309+
"lang": "en"
310+
},
311+
"source": [
312+
"`2`. Visualize the median weight of each species by `date`.\n",
313+
"(3 min.)"
314+
]
315+
},
316+
{
317+
"cell_type": "code",
318+
"execution_count": null,
319+
"id": "e67bd045-bf33-4a54-aa4b-8a001509d9fc",
320+
"metadata": {
321+
"lang": "en,fr",
322+
"tags": [
323+
"soln"
324+
]
325+
},
326+
"outputs": [],
327+
"source": [
328+
"alt.Chart(dec_1990).mark_line().encode(\n",
329+
" x=alt.X('date'),\n",
330+
" y=alt.Y('weight').aggregate('median'),\n",
331+
" color=alt.Color('species_id'),\n",
332+
" tooltip=['species_id', 'date'],\n",
333+
")"
334+
]
335+
},
336+
{
337+
"cell_type": "code",
338+
"execution_count": null,
339+
"id": "2e3acb45-f733-4c70-94ef-14771a7f50ef",
340+
"metadata": {
341+
"lang": "en,fr",
342+
"tags": [
343+
"exer"
344+
]
345+
},
346+
"outputs": [],
347+
"source": [
348+
"alt.Chart(###).mark_line().encode(\n",
349+
" x=alt.X(###),\n",
350+
" y=alt.Y('weight').###('median'),\n",
351+
" color=alt.Color('species_id'),\n",
352+
" tooltip=['species_id', 'date'],\n",
353+
")"
354+
]
355+
},
356+
{
357+
"cell_type": "markdown",
358+
"id": "8b198859-0736-4b92-b56c-67cb3e102156",
359+
"metadata": {
360+
"lang": "fr"
361+
},
362+
"source": [
363+
"## Résumé technique\n",
364+
"* **Création d'une colonne de dates**\n",
365+
" * `df['date'] = pd.to_datetime(df[['year', 'month', 'day']])`\n",
366+
"* **Choix du type de marqueurs** à afficher\n",
367+
" * `graphique.mark_line()`\n",
368+
"* **Assigner des variables** à des canaux du graphique\n",
369+
" * `graphique.encode(...)`\n",
370+
" * Différents canaux :\n",
371+
" * `y=alt.Y('count()')`\n",
372+
" * `y=alt.Y('varY').aggregate('stat')`,\n",
373+
" avec les statistiques `'mean'`, `'median'`, etc."
374+
]
375+
},
376+
{
377+
"cell_type": "markdown",
378+
"id": "d9cf858b-cf70-44f3-962e-016144764737",
379+
"metadata": {
380+
"lang": "en"
381+
},
382+
"source": [
383+
"## Key points\n",
384+
"* **Creating a column of dates**\n",
385+
" * `df['date'] = pd.to_datetime(df[['year', 'month', 'day']])`\n",
386+
"* **Choosing a type of chart**\n",
387+
" * `chart.mark_line()`\n",
388+
"* **Assigning data fields to encoding channels**\n",
389+
" * `chart.encode(...)`\n",
390+
" * Encoding channels:\n",
391+
" * `y=alt.Y('count()')`\n",
392+
" * `y=alt.Y('field_for_Y').aggregate(...)`,\n",
393+
" with an aggregate such as `'mean'` or `'median'`."
394+
]
395+
},
37396
{
38397
"cell_type": "code",
39398
"execution_count": null,
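
Review note on the date-column cells in this diff: the solution cells restrict the data to the 1990s so that `pd.to_datetime()` never meets an impossible calendar date (April 31 or September 31, 2000). A minimal sketch of one possible alternative, assuming a small made-up DataFrame named `demo` rather than the real `surveys_0_NA.csv`, is to pass `errors='coerce'` so that impossible dates become `NaT` and can be dropped afterwards. This is only an illustration and not the approach chosen in the notebook.

```python
import pandas as pd

# Hypothetical sample rows (not taken from surveys_0_NA.csv).
# The second row is an impossible calendar date: April 31, 2000.
demo = pd.DataFrame({
    'year': [1995, 2000, 2000],
    'month': [7, 4, 9],
    'day': [16, 31, 30],
})

# errors='coerce' replaces dates that cannot be built with NaT
# instead of raising an error.
demo['date'] = pd.to_datetime(demo[['year', 'month', 'day']], errors='coerce')

# Keep only the rows whose date could be assembled.
valid = demo.dropna(subset=['date'])
print(valid)
```

Filtering by decade, as the notebook does, keeps the exercise simple. Coercing would instead keep every year and discard only the impossible dates.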
