Commit 141da43

Add three example notebooks
1 parent 773234a commit 141da43

3 files changed

+1328
-0
lines changed

Lines changed: 291 additions & 0 deletions
@@ -0,0 +1,291 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# Example notebook - SQL filters and loading data into pandas\n",
8+
"\n",
9+
"The goal of this notebook is to provide examples of pre-filtering data with SQL before loading it into a pandas DataFrame.\n"
10+
]
11+
},
12+
{
13+
"cell_type": "code",
14+
"execution_count": 1,
15+
"metadata": {},
16+
"outputs": [],
17+
"source": [
18+
"import pandas as pd\n",
19+
"\n",
20+
"pd.set_option(\"display.max_columns\", None) # show all cols\n",
21+
"pd.set_option(\"display.max_colwidth\", None) # show the full content of each displayed column\n",
22+
"pd.set_option(\n",
23+
" \"display.expand_frame_repr\", False\n",
24+
") # keep all columns on one line instead of wrapping the frame"
25+
]
26+
},
27+
{
28+
"cell_type": "code",
29+
"execution_count": null,
30+
"metadata": {},
31+
"outputs": [],
32+
"source": [
33+
"# Import the libraries needed for the analysis, then open a read-only connection to the DuckDB file.\n",
34+
"\n",
35+
"import duckdb\n",
36+
"from pipelines.tasks.config.common import DUCKDB_FILE\n",
37+
"\n",
38+
"con = duckdb.connect(database=DUCKDB_FILE, read_only=True)"
39+
]
40+
},
41+
{
42+
"cell_type": "markdown",
43+
"metadata": {},
44+
"source": [
45+
"## Filters\n",
46+
"\n",
47+
"1. Filter on samples taken in 2024\n"
48+
]
49+
},
50+
{
51+
"cell_type": "code",
52+
"execution_count": null,
53+
"metadata": {},
54+
"outputs": [],
55+
"source": [
56+
"query_2024 = \"\"\"\n",
57+
"SELECT * FROM edc_prelevements\n",
58+
"WHERE dateprel >= '2024-01-01' AND dateprel < '2025-01-01'\n",
59+
"\"\"\"\n",
60+
"\n",
61+
"prelevements_2024 = con.sql(query_2024)\n",
62+
"prelevements_2024_df = prelevements_2024.df()\n",
63+
"prelevements_2024_df.head(2)"
64+
]
65+
},
66+
{
67+
"cell_type": "markdown",
68+
"metadata": {},
69+
"source": [
70+
"2. Filter on non-compliant samples in 2024\n"
71+
]
72+
},
73+
{
74+
"cell_type": "code",
75+
"execution_count": null,
76+
"metadata": {},
77+
"outputs": [],
78+
"source": [
79+
"where_clause = \"\"\"\n",
80+
"\"dateprel\" >= '2024-01-01' AND \"dateprel\" < '2025-01-01'\n",
81+
" AND (\n",
82+
" (\n",
83+
" \"plvconformitebacterio\" = 'N'\n",
84+
" )\n",
85+
" OR (\n",
86+
" \"plvconformitechimique\" = 'N'\n",
87+
" )\n",
88+
" OR (\n",
89+
" \"plvconformitereferencebact\" = 'N'\n",
90+
" )\n",
91+
" OR (\n",
92+
" \"plvconformitereferencechim\" = 'N'\n",
93+
" )\n",
94+
" )\n",
95+
"\"\"\"\n",
96+
"query_non_conforme = f\"\"\"\n",
97+
"SELECT\n",
98+
" *\n",
99+
"FROM \"edc_prelevements\"\n",
100+
"WHERE\n",
101+
" {where_clause}\n",
102+
"\"\"\"\n",
103+
"prelevements_2024_non_conforme = con.sql(query_non_conforme)\n",
104+
"prelevements_2024_non_conforme_df = prelevements_2024_non_conforme.df()\n",
105+
"prelevements_2024_non_conforme_df.head(2)"
106+
]
107+
},
108+
{
109+
"cell_type": "markdown",
110+
"metadata": {},
111+
"source": [
112+
"## Select columns before running the query\n"
113+
]
114+
},
115+
{
116+
"cell_type": "markdown",
117+
"metadata": {},
118+
"source": [
119+
"Selecting columns before loading the data makes the query run faster and reduces memory usage.\n"
120+
]
121+
},
122+
{
123+
"cell_type": "code",
124+
"execution_count": null,
125+
"metadata": {},
126+
"outputs": [],
127+
"source": [
128+
"query_preselected = f\"\"\"\n",
129+
"SELECT\n",
130+
" \"referenceprel\",\n",
131+
" \"dateprel\",\n",
132+
" \"nomcommuneprinc\",\n",
133+
" \"plvconformitebacterio\"\n",
134+
"FROM \"edc_prelevements\"\n",
135+
"WHERE\n",
136+
" {where_clause}\n",
137+
"\"\"\"\n",
138+
"preselected = con.sql(query_preselected)\n",
139+
"preselected_df = preselected.df()\n",
140+
"preselected_df.head(2)"
141+
]
142+
},
143+
{
144+
"cell_type": "markdown",
145+
"metadata": {},
146+
"source": [
147+
"## Join\n",
148+
"\n",
149+
"Join edc_prelevements and edc_resultats on referenceprel to get the results associated with each sample:\n"
150+
]
151+
},
152+
{
153+
"cell_type": "code",
154+
"execution_count": null,
155+
"metadata": {},
156+
"outputs": [],
157+
"source": [
158+
"query = f\"\"\"\n",
159+
"SELECT\n",
160+
" \"edc_prelevements\".\"referenceprel\",\n",
161+
" \"edc_prelevements\".\"dateprel\",\n",
162+
" \"edc_prelevements\".\"nomcommuneprinc\",\n",
163+
" \"edc_resultats\".\"libmajparametre\",\n",
164+
" \"edc_resultats\".\"insituana\",\n",
165+
" \"edc_resultats\".\"rqana\",\n",
166+
" \"edc_resultats\".\"cdunitereferencesiseeaux\"\n",
167+
"FROM (\n",
168+
" SELECT\n",
169+
" *\n",
170+
" FROM \"edc_prelevements\" \n",
171+
" WHERE\n",
172+
" {where_clause}\n",
173+
") AS edc_prelevements\n",
174+
"INNER JOIN \"edc_resultats\"\n",
175+
" ON \"edc_prelevements\".\"referenceprel\" = \"edc_resultats\".\"referenceprel\"\n",
176+
"\"\"\"\n",
177+
"\n",
178+
"\n",
179+
"joined = con.sql(query)\n",
180+
"joined_df = joined.df()\n",
181+
"joined_df"
182+
]
183+
},
184+
{
185+
"cell_type": "markdown",
186+
"metadata": {},
187+
"source": [
188+
"## Group by and aggregates\n",
189+
"\n",
190+
"Total number of non-compliant samples per municipality in 2024\n"
191+
]
192+
},
193+
{
194+
"cell_type": "code",
195+
"execution_count": null,
196+
"metadata": {},
197+
"outputs": [],
198+
"source": [
199+
"query = f\"\"\"\n",
200+
"SELECT\n",
201+
" \"nomcommuneprinc\",\n",
202+
" COUNT(\"referenceprel\") AS \"nb_prelevements_non_conformes\"\n",
203+
"FROM (\n",
204+
" SELECT\n",
205+
" *\n",
206+
" FROM \"edc_prelevements\" \n",
207+
" WHERE\n",
208+
" {where_clause}\n",
209+
") \n",
210+
"GROUP BY\n",
211+
" 1\n",
212+
"\"\"\"\n",
213+
"grouped = con.sql(query)\n",
214+
"grouped_df = grouped.df()\n",
215+
"grouped_df.sort_values(\"nb_prelevements_non_conformes\", ascending=False)"
216+
]
217+
},
218+
{
219+
"cell_type": "markdown",
220+
"metadata": {},
221+
"source": [
222+
"## Other examples\n"
223+
]
224+
},
225+
{
226+
"cell_type": "code",
227+
"execution_count": null,
228+
"metadata": {},
229+
"outputs": [],
230+
"source": [
231+
"# Example taken from the first example notebook: exemple.ipynb\n",
232+
"# Run a SQL query with duckdb through the Python library to list the substances that were tested for\n",
233+
"# and sort them by decreasing number of occurrences\n",
234+
"\n",
235+
"con.sql(\"\"\"\n",
236+
" SELECT libmajparametre, COUNT(*) as count\n",
237+
" FROM edc_resultats\n",
238+
" GROUP BY libmajparametre\n",
239+
" ORDER BY count DESC\n",
240+
"\"\"\").show()"
241+
]
242+
},
243+
{
244+
"cell_type": "code",
245+
"execution_count": null,
246+
"metadata": {},
247+
"outputs": [],
248+
"source": [
249+
"# Example taken from the first example notebook: exemple.ipynb\n",
250+
"\n",
251+
"# Finally, list the samples taken in a given municipality\n",
252+
"\n",
253+
"nomcommune = \"TOULOUSE\"\n",
254+
"\n",
255+
"con.sql(f\"\"\"\n",
256+
" SELECT *\n",
257+
" FROM edc_prelevements\n",
258+
" WHERE nomcommuneprinc = '{nomcommune}'\n",
259+
"\"\"\").show()"
260+
]
261+
},
262+
{
263+
"cell_type": "code",
264+
"execution_count": null,
265+
"metadata": {},
266+
"outputs": [],
267+
"source": []
268+
}
269+
],
270+
"metadata": {
271+
"kernelspec": {
272+
"display_name": ".venv",
273+
"language": "python",
274+
"name": "python3"
275+
},
276+
"language_info": {
277+
"codemirror_mode": {
278+
"name": "ipython",
279+
"version": 3
280+
},
281+
"file_extension": ".py",
282+
"mimetype": "text/x-python",
283+
"name": "python",
284+
"nbconvert_exporter": "python",
285+
"pygments_lexer": "ipython3",
286+
"version": "3.12.7"
287+
}
288+
},
289+
"nbformat": 4,
290+
"nbformat_minor": 4
291+
}
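
The notebook reuses a single `where_clause` string across several queries. As a minimal sketch (not part of this commit), that clause could be generated from a year and a list of columns. The helper names `year_bounds` and `non_conformity_where` are hypothetical, the column names are copied from the notebook, and the exclusive upper date bound assumes that "2024" means the calendar year only.

```python
from datetime import date

# Column names copied from the notebook's where_clause.
CONFORMITY_COLUMNS = (
    "plvconformitebacterio",
    "plvconformitechimique",
    "plvconformitereferencebact",
    "plvconformitereferencechim",
)


def year_bounds(year: int) -> tuple[str, str]:
    """Return ISO dates for the half-open interval [Jan 1 of year, Jan 1 of year + 1)."""
    return date(year, 1, 1).isoformat(), date(year + 1, 1, 1).isoformat()


def non_conformity_where(year: int, columns=CONFORMITY_COLUMNS) -> str:
    """Build a WHERE body: sampled during `year` AND at least one column equal to 'N'."""
    start, end = year_bounds(year)
    any_non_compliant = " OR ".join(f"\"{col}\" = 'N'" for col in columns)
    return (
        f"\"dateprel\" >= '{start}' AND \"dateprel\" < '{end}' "
        f"AND ({any_non_compliant})"
    )


# Hypothetical drop-in for the hand-written clause used in the notebook's f-strings.
where_clause = non_conformity_where(2024)
print(where_clause)
```

The resulting string can be interpolated into the queries in the same way as the hand-written clause. Only trusted, hard-coded column names should be passed to it, since they are inserted into the SQL text directly.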

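The last notebook cell interpolates `nomcommune` into the SQL text with an f-string, which fails for names containing an apostrophe and is unsafe for untrusted input. The sketch below illustrates the alternative, a bound `?` parameter. It deliberately uses the standard-library `sqlite3` module as a stand-in so it runs without the project's DuckDB file; the table rows and the helper `samples_for_commune` are made up for illustration. DuckDB's Python API also accepts `?` placeholders through `con.execute(query, [value])`, so the same pattern should carry over.

```python
import sqlite3

# Stand-in database: an in-memory SQLite table with made-up rows,
# used only so the example runs without the project's DuckDB file.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edc_prelevements (referenceprel TEXT, nomcommuneprinc TEXT)")
con.executemany(
    "INSERT INTO edc_prelevements VALUES (?, ?)",
    [("A1", "TOULOUSE"), ("A2", "LYON"), ("A3", "L'HAY-LES-ROSES")],
)


def samples_for_commune(connection, nomcommune):
    """Return the rows for one municipality, passing the name as a bound parameter."""
    query = (
        "SELECT referenceprel, nomcommuneprinc "
        "FROM edc_prelevements WHERE nomcommuneprinc = ?"
    )
    return connection.execute(query, [nomcommune]).fetchall()


toulouse_rows = samples_for_commune(con, "TOULOUSE")
# A name containing an apostrophe would break the f-string version of the query;
# as a bound parameter it is handled like any other value.
apostrophe_rows = samples_for_commune(con, "L'HAY-LES-ROSES")
print(toulouse_rows, apostrophe_rows)
```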