Commit 5dc7711

post_tr2_SG
1 parent 36fc0bf commit 5dc7711

7 files changed, +1179 -28 lines changed

polars-groupby/README.md

Lines changed: 4 additions & 3 deletions
@@ -4,10 +4,11 @@ You should create a new folder named aggregations on your computer and place eac
 
 Your download bundle contains the following files:
 
-course.parquet - This file contains all data from both `maths.parquet` and `portuguese.parquet`.
-maths.parquet - This file contains data relating to a mathematics class.
+course.parquet - This file contains all data from both `math.parquet` and `portuguese.parquet`.
+math.parquet - This file contains data relating to a mathematics class.
 portuguese.parquet - This file contains data relating to a Portuguese language class.
-maths_classes.parquet - This file contains data used in the time series analysis examples.
+student.txt - This file defines every field in the above three files.
+math_classes.parquet - This file contains data used in the time series analysis examples.
 
 tutorial_code.ipynb - This file contains the code used in the tutorial.
 solutions.ipynb - This file contains the solutions to the exercises.

polars-groupby/post_tr2.md

Lines changed: 946 additions & 0 deletions
Large diffs are not rendered by default.

polars-groupby/solutions.ipynb

Lines changed: 8 additions & 8 deletions
@@ -20,7 +20,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 6,
+"execution_count": 3,
 "id": "a6fc4149-3f32-477f-ae54-f529aea7c9d9",
 "metadata": {},
 "outputs": [
@@ -34,30 +34,30 @@
 " white-space: pre-wrap;\n",
 "}\n",
 "</style>\n",
-"<small>shape: (1, 3)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>middle_result</th><th>most_frequent_result</th><th>variance</th></tr><tr><td>f64</td><td>i64</td><td>f64</td></tr></thead><tbody><tr><td>11.0</td><td>10</td><td>20.989616</td></tr></tbody></table></div>"
+"<small>shape: (1, 3)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>median_result</th><th>most_frequent_result</th><th>variance</th></tr><tr><td>f64</td><td>i64</td><td>f64</td></tr></thead><tbody><tr><td>11.0</td><td>10</td><td>20.989616</td></tr></tbody></table></div>"
 ],
 "text/plain": [
 "shape: (1, 3)\n",
 "┌───────────────┬──────────────────────┬───────────┐\n",
-"│ middle_result ┆ most_frequent_result ┆ variance │\n",
+"│ median_result ┆ most_frequent_result ┆ variance │\n",
 "│ --- ┆ --- ┆ --- │\n",
 "│ f64 ┆ i64 ┆ f64 │\n",
 "╞═══════════════╪══════════════════════╪═══════════╡\n",
 "│ 11.0 ┆ 10 ┆ 20.989616 │\n",
 "└───────────────┴──────────────────────┴───────────┘"
 ]
 },
-"execution_count": 6,
+"execution_count": 3,
 "metadata": {},
 "output_type": "execute_result"
 }
 ],
 "source": [
-"math_students = pl.read_parquet(\"maths.parquet\")\n",
+"math_students = pl.read_parquet(\"math.parquet\")\n",
 "\n",
 "(\n",
 " math_students.select(\n",
-" middle_result=pl.col(\"G3\").median(),\n",
+" median_result=pl.col(\"G3\").median(),\n",
 " most_frequent_result=pl.col(\"G3\").mode(),\n",
 " variance=pl.col(\"G3\").var(),\n",
 " )\n",
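The rename from `middle_result` to `median_result` matches what `pl.col("G3").median()` actually computes. As a stdlib cross-check of the three aggregations (a sketch with hypothetical toy grades, not the tutorial's code), Python's `statistics` module uses the same definitions. One detail worth noting: Polars' `var()` defaults to `ddof=1`, the sample variance, which corresponds to `statistics.variance` rather than `statistics.pvariance`. The function name `summarize_grades` is hypothetical.

```python
import statistics


def summarize_grades(grades: list[int]) -> dict:
    """Stdlib counterparts of pl.col("G3").median(), .mode() and .var().

    Assumes a single most frequent grade: statistics.mode returns the first
    mode it meets when there are ties, whereas Polars' mode() returns them all.
    """
    return {
        # Middle value of the sorted grades (average of the two middle
        # values when the count is even).
        "median_result": float(statistics.median(grades)),
        # Most frequent grade.
        "most_frequent_result": statistics.mode(grades),
        # Sample variance: sum of squared deviations divided by n - 1,
        # matching the ddof=1 default of Polars' var().
        "variance": float(statistics.variance(grades)),
    }


# Hypothetical toy grades, not taken from math.parquet.
print(summarize_grades([10, 12, 10, 15, 8]))
# {'median_result': 10.0, 'most_frequent_result': 10, 'variance': 7.0}
```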
@@ -183,10 +183,10 @@
 }
 ],
 "source": [
-"maths_attendance = pl.read_parquet(\"maths_classes.parquet\")\n",
+"math_attendance = pl.read_parquet(\"math_classes.parquet\")\n",
 "\n",
 "(\n",
-" maths_attendance.group_by_dynamic(\n",
+" math_attendance.group_by_dynamic(\n",
 " index_column=\"class_start\",\n",
 " every=\"1mo\",\n",
 " closed=\"both\",\n",

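The `group_by_dynamic` hunk is cut off after `closed="both"`, so any further arguments are unknown. As a rough, stdlib-only illustration (not Polars' implementation), the sketch below shows what inclusive boundaries mean for calendar-month windows. It assumes `period` is left at its default (equal to `every`) and that windows begin at the month containing the earliest `class_start`; the helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def month_start(ts: datetime) -> datetime:
    """Truncate a timestamp to midnight on the first day of its month."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def previous_month_start(start: datetime) -> datetime:
    """Return the first day of the month before `start` (already a month start)."""
    if start.month == 1:
        return start.replace(year=start.year - 1, month=12)
    return start.replace(month=start.month - 1)


def monthly_windows_closed_both(timestamps: list[datetime]) -> dict[datetime, list[datetime]]:
    """Assign timestamps to [month_start, next_month_start] windows.

    Both edges are inclusive, mimicking closed="both": a timestamp falling
    exactly on a month boundary is also the closing edge of the previous
    month's window, so it is counted in both windows.
    """
    if not timestamps:
        return {}
    # Assumption: the first window starts at the month of the earliest timestamp.
    first_start = month_start(min(timestamps))
    windows: dict[datetime, list[datetime]] = defaultdict(list)
    for ts in sorted(timestamps):
        start = month_start(ts)
        windows[start].append(ts)
        prev = previous_month_start(start)
        if ts == start and prev >= first_start:
            windows[prev].append(ts)
    return dict(sorted(windows.items()))


# Hypothetical class start times: the 1 February class lands in both windows.
classes = [datetime(2024, 1, 15, 9), datetime(2024, 2, 1), datetime(2024, 2, 20, 9)]
for window_start, members in monthly_windows_closed_both(classes).items():
    print(window_start.date(), len(members))
# 2024-01-01 2
# 2024-02-01 2
```

With the default `closed="left"`, the boundary class would be counted only once, in the February window.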
polars-groupby/student.txt

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Attributes for both student-mat.csv (Math course) and student-por.csv (Portuguese language course) datasets:
+0 subject - subject (binary: "M" - Math or "P" - Portuguese language)
+1 school - student's school (binary: "GP" - Gabriel Pereira or "MS" - Mousinho da Silveira)
+2 sex - student's sex (binary: "F" - female or "M" - male)
+3 age - student's age (numeric: from 15 to 22)
+4 address - student's home address type (binary: "U" - urban or "R" - rural)
+5 famsize - family size (binary: "LE3" - less or equal to 3 or "GT3" - greater than 3)
+6 Pstatus - parent's cohabitation status (binary: "T" - living together or "A" - apart)
+7 Medu - mother's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
+8 Fedu - father's education (numeric: 0 - none, 1 - primary education (4th grade), 2 – 5th to 9th grade, 3 – secondary education or 4 – higher education)
+9 Mjob - mother's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")
+10 Fjob - father's job (nominal: "teacher", "health" care related, civil "services" (e.g. administrative or police), "at_home" or "other")
+11 reason - reason to choose this school (nominal: close to "home", school "reputation", "course" preference or "other")
+12 guardian - student's guardian (nominal: "mother", "father" or "other")
+13 traveltime - home to school travel time (numeric: 1 - <15 min., 2 - 15 to 30 min., 3 - 30 min. to 1 hour, or 4 - >1 hour)
+14 studytime - weekly study time (numeric: 1 - <2 hours, 2 - 2 to 5 hours, 3 - 5 to 10 hours, or 4 - >10 hours)
+15 failures - number of past class failures (numeric: n if 1<=n<3, else 4)
+16 schoolsup - extra educational support (binary: yes or no)
+17 famsup - family educational support (binary: yes or no)
+18 paid - extra paid classes within the course subject (Math or Portuguese) (binary: yes or no)
+19 activities - extra-curricular activities (binary: yes or no)
+20 nursery - attended nursery school (binary: yes or no)
+21 higher - wants to take higher education (binary: yes or no)
+22 internet - Internet access at home (binary: yes or no)
+23 romantic - with a romantic relationship (binary: yes or no)
+24 famrel - quality of family relationships (numeric: from 1 - very bad to 5 - excellent)
+25 freetime - free time after school (numeric: from 1 - very low to 5 - very high)
+26 goout - going out with friends (numeric: from 1 - very low to 5 - very high)
+27 Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
+28 Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
+29 health - current health status (numeric: from 1 - very bad to 5 - very good)
+30 absences - number of school absences (numeric: from 0 to 93)
+
+# these grades are related with the course subject, Math or Portuguese:
+31 G1 - first period grade (numeric: from 0 to 20)
+32 G2 - second period grade (numeric: from 0 to 20)
+33 G3 - final grade (numeric: from 0 to 20, output target)
+
+Additional note: there are several (382) students that belong to both datasets.
+These students can be identified by searching for identical attributes
+that characterize each student, as shown in the annexed R file.
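The note above says students who appear in both datasets can be found by matching identical attributes. The annexed R file is not shown here, so the choice of key columns below is an assumption for illustration, as are the helper names. This plain-Python sketch pairs Math and Portuguese records whose identifying attributes agree; with Polars, an inner `DataFrame.join` on the same columns would express the same idea.

```python
# Assumed identifying attributes (hypothetical choice; adjust to match the R file).
KEY_FIELDS = ("school", "sex", "age", "address", "famsize", "Pstatus",
              "Medu", "Fedu", "Mjob", "Fjob", "reason", "nursery", "internet")


def student_key(record: dict) -> tuple:
    """Build a matching key from the identifying attributes of one student."""
    return tuple(record[field] for field in KEY_FIELDS)


def students_in_both(math_rows: list[dict], portuguese_rows: list[dict]) -> list[tuple[dict, dict]]:
    """Pair each Math record with every Portuguese record sharing its key.

    Like an inner join, a key that appears several times on both sides
    produces every combination of those rows.
    """
    by_key: dict[tuple, list[dict]] = {}
    for row in portuguese_rows:
        by_key.setdefault(student_key(row), []).append(row)
    pairs = []
    for row in math_rows:
        for match in by_key.get(student_key(row), []):
            pairs.append((row, match))
    return pairs
```

Because the match is only on attributes, two different students with identical values in every key column would be paired as well; the accuracy of the result depends on how distinctive the chosen columns are.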
