|
2 | 2 |
|
3 | 3 | ## Classification strategies {#sec-classification-strategies} |
4 | 4 |
|
5 | | -Gait event detection is performed by evaluating and comparing two strategies to classify the observations. |
| 5 | +The objective of this work is to identify which timepoints of the unit QTS correspond to the RHS, LTO, LHS and RTO events. To this end, we treat timepoints as statistical units (observations) and aim to label them by means of classification models. We first create the data set that will be used for training. The following code achieves this by binding together all timepoints from all walking sessions while attaching to each timepoint:
6 | 6 |
|
7 | | -Strategy E: Predicting gait [E]{.underline}vents |
8 | | - |
9 | | -: The strategy E pertains to directly predicting the gait events occuring when walking. Specifically, time points are viewed as statistical units (observations) and we aim at classifying them into five categories: |
10 | | - |
11 | | -- *Right Heel Strike*, |
12 | | -- *Left Toe Off*, |
13 | | -- *Left Heel Strike*, |
14 | | -- *Right Toe Off*, |
15 | | -- *None* (all other times not corresponding to a certain event). |
16 | | - |
17 | | -The first four events (RHS, LTO, LHS and RTO) are coined *events of interest* while the last one encodes the so-called *negative* class. While conveniently aiming at directly predicting the occurrence of gait events of interest, this strategy suffers from a severe class imbalance issue, with the *None* (negative) class being widely over-represented as summarized in @tbl-class-imbalance. |
| 7 | +- an `event_type`, which assigns it to one of the five gait events defined in @sec-gaitrite-data;
| 8 | +- a `phase_type`, which assigns it to one of the four gait phases defined in @sec-gaitrite-data.
18 | 9 |
|
19 | 10 | ```{r} |
20 | | -#| label: tbl-class-imbalance |
21 | | -#| tbl-cap: "Strategy E: Count and proportion of observations in each class." |
22 | | -#| tbl-pos: "H" |
23 | | -tibble::tibble( |
24 | | - class = c( |
25 | | - "Right Heel Strike", |
26 | | - "Left Toe Off", |
27 | | - "Left Heel Strike", |
28 | | - "Right Toe Off", |
29 | | - "None" |
30 | | - ), |
31 | | - nb_obs = c("973", "1004", "994", "982", "158401"), |
32 | | - prop = c("0.60%", "0.62%", "0.61%", "0.60%", "97.57%") |
33 | | -) |> |
34 | | - gt::gt() |> |
35 | | - gt::cols_label( |
36 | | - class = "Class", |
37 | | - nb_obs = "Number of observations", |
38 | | - prop = "Proportion" |
39 | | - ) |> |
40 | | - gt::cols_align(align = "center") |> |
41 | | - gt::tab_style( |
42 | | - style = list(gt::cell_text(style = "italic")), |
43 | | - locations = gt::cells_body(columns = class) |
44 | | - ) |> |
45 | | - gt::tab_options(column_labels.background.color = "#616161") |
46 | | -``` |
47 | | - |
48 | | -[AST] TO MODIFY |
49 | | - |
50 | | -Finally, the following code creates the labelled data set that we will use to elaborate the feature space and produces @tbl-class-summary which exhibits class frequencies whether we focus on gait events () or gait phases (). |
51 | | - |
52 | | -```{r} |
53 | | -#| label: tbl-class-summary |
54 | | -#| tbl-cap: Two tables |
55 | | -#| tbl-subcap: ["mtcars", "Just cars"] |
56 | | -#| layout-ncol: 2 |
57 | | -#| classes: plain |
58 | 11 | events_to_phases <- function(events) { |
59 | 12 | events_of_interest <- events != "None" |
60 | 13 | first_event <- events[events_of_interest][1] |
@@ -110,14 +63,82 @@ labelled_gait_data <- purrr::map(1:nrow(bhg), \(session_index) { |
110 | 63 | levels = c("Pre-Stance", "Stance", "Pre-Swing", "Swing") |
111 | 64 | ) |
112 | 65 | ) |
| 66 | +class(labelled_gait_data) <- class(labelled_gait_data)[-1] |
| 67 | +head(labelled_gait_data) |
| 68 | +``` |
| 69 | + |
| 70 | +We first need to decide what the models should predict. Two strategies can be adopted.
| 71 | + |
| 72 | +The most straightforward approach is to predict the gait events of interest themselves. We call it **Strategy E**, where **E** stands for [E]{.underline}vents. This strategy requires a multiclass prediction model with 5 classes (RHS, LTO, LHS, RTO and None) as defined in @sec-gaitrite-data. The first four classes (RHS, LTO, LHS and RTO) are termed *events of interest* while the last one encodes the so-called *negative* class. Although it conveniently predicts the occurrence of gait events of interest directly, this strategy suffers from severe class imbalance, with the *None* (negative) class being heavily over-represented, as shown in @tbl-e-counts.
| 73 | + |
| 74 | +One way to mitigate this severe class imbalance is to predict gait phases instead of events, at the cost of a post-processing step needed to identify the occurrences of RHS, LTO, LHS and RTO from the predicted phases. As defined in @sec-gaitrite-data, there are four phases to predict (pre-stance, stance, pre-swing and swing). We call this **Strategy P**, where **P** stands for [P]{.underline}hases. @tbl-p-counts reports the frequency of timepoints in each phase, which shows that this strategy substantially reduces class imbalance.
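
The post-processing step required by Strategy P can be sketched as follows. This is a minimal illustration and not the method used in this work: the function `phases_to_events()` and the lookup vector `transition_to_event` are hypothetical names, and both the transition-to-event mapping and the choice to place each event at the first timepoint of the new phase are assumptions. The actual correspondence is the one defined in @sec-gaitrite-data.

```r
# ASSUMPTION: illustrative mapping from phase transitions to gait events.
# Replace it with the correspondence defined in @sec-gaitrite-data.
transition_to_event <- c(
  "Swing->Pre-Stance"  = "RHS",
  "Pre-Stance->Stance" = "LTO",
  "Stance->Pre-Swing"  = "LHS",
  "Pre-Swing->Swing"   = "RTO"
)

# Hypothetical helper: turn a sequence of predicted phases into a sequence
# of event labels of the same length.
phases_to_events <- function(phases, mapping = transition_to_event) {
  phases <- as.character(phases)
  n <- length(phases)
  events <- rep("None", n)
  if (n < 2) {
    return(events)
  }
  # Indices of timepoints whose phase differs from the previous timepoint
  change_idx <- which(phases[-1] != phases[-n]) + 1
  # Build "from->to" keys and look them up; unknown transitions give NA
  keys <- paste(phases[change_idx - 1], phases[change_idx], sep = "->")
  matched <- unname(mapping[keys])
  # ASSUMPTION: the event is placed at the first timepoint of the new phase
  events[change_idx] <- ifelse(is.na(matched), "None", matched)
  events
}

# Toy sequence of predicted phases, for illustration only
toy_phases <- c(
  "Swing", "Swing", "Pre-Stance", "Pre-Stance",
  "Stance", "Pre-Swing", "Swing"
)
phases_to_events(toy_phases)
```

Transitions that are absent from the mapping (for instance a spurious jump caused by a misclassified timepoint) are labelled `None` here. A real pipeline would likely need to smooth the predicted phases before detecting transitions.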
| 75 | + |
| 76 | +```{r} |
| 77 | +#| label: tbl-class-imbalance |
| 78 | +#| tbl-cap: "Strategy E: Count and proportion of observations in each class." |
| 79 | +#| tbl-pos: "H" |
| 80 | +tibble::tibble( |
| 81 | + class = c( |
| 82 | + "Right Heel Strike", |
| 83 | + "Left Toe Off", |
| 84 | + "Left Heel Strike", |
| 85 | + "Right Toe Off", |
| 86 | + "None" |
| 87 | + ), |
| 88 | + nb_obs = c("973", "1004", "994", "982", "158401"), |
| 89 | + prop = c("0.60%", "0.62%", "0.61%", "0.60%", "97.57%") |
| 90 | +) |> |
| 91 | + gt::gt() |> |
| 92 | + gt::cols_label( |
| 93 | + class = "Class", |
| 94 | + nb_obs = "Number of observations", |
| 95 | + prop = "Proportion" |
| 96 | + ) |> |
| 97 | + gt::cols_align(align = "center") |> |
| 98 | + gt::tab_style( |
| 99 | + style = list(gt::cell_text(style = "italic")), |
| 100 | + locations = gt::cells_body(columns = class) |
| 101 | + ) |> |
| 102 | + gt::tab_options(column_labels.background.color = "#616161") |
| 103 | +``` |
| 104 | + |
| 105 | +[AST] TO MODIFY |
| 106 | + |
| 107 | +Finally, the following code creates the labelled data set that we will use to elaborate the feature space and produces @tbl-class-summary which exhibits class frequencies whether we focus on gait events () or gait phases (). |
113 | 108 |
|
| 109 | +```{r} |
| 110 | +#| label: tbl-class-summary |
| 111 | +#| tbl-cap: Two tables |
| 112 | +#| tbl-subcap: ["mtcars", "Just cars"] |
| 113 | +#| layout-ncol: 2 |
| 114 | +#| html-table-processing: none |
114 | 115 | labelled_gait_data |> |
115 | 116 | dplyr::count(event_type) |> |
116 | | - gt::gt() |
| 117 | + gt::gt() |> |
| 118 | + gt::cols_label( |
| 119 | + event_type = "Event", |
| 120 | + n = "Frequency" |
| 121 | + ) |> |
| 122 | + gt::opt_stylize(style = 6, color = 'gray') |> |
| 123 | + gt::cols_align(align = "center") |> |
| 124 | + gt::tab_style( |
| 125 | + style = "vertical-align:top", |
| 126 | + locations = gt::cells_column_labels() |
| 127 | + ) |
117 | 128 |
|
118 | 129 | labelled_gait_data |> |
119 | 130 | dplyr::count(phase_type) |> |
120 | | - gt::gt() |
| 131 | + gt::gt() |> |
| 132 | + gt::cols_label( |
| 133 | + phase_type = "Phase", |
| 134 | + n = "Frequency" |
| 135 | + ) |> |
| 136 | + gt::opt_stylize(style = 6, color = 'gray') |> |
| 137 | + gt::cols_align(align = "center") |> |
| 138 | + gt::tab_style( |
| 139 | + style = "vertical-align:top", |
| 140 | + locations = gt::cells_column_labels() |
| 141 | + ) |
121 | 142 | ``` |
122 | 143 |
|
123 | 144 | We can notice that gait events of interest are largely under-represented while this class imbalance issue is moderate when we put the focus on gait phases. |
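
To put a number on the imbalance under Strategy E, one can compute class proportions and a simple imbalance ratio from the class counts. The sketch below uses the Strategy E counts reported earlier in this section. The variable names `event_counts`, `event_props` and `imbalance_ratio` are illustrative, and the ratio of largest to smallest class is only one possible summary.

```r
# Strategy E class counts, as reported earlier in this section
event_counts <- c(
  RHS  = 973,
  LTO  = 1004,
  LHS  = 994,
  RTO  = 982,
  None = 158401
)

# Proportion of timepoints in each class
event_props <- event_counts / sum(event_counts)
round(100 * event_props, 2)

# Ratio between the largest and the smallest class
imbalance_ratio <- max(event_counts) / min(event_counts)
imbalance_ratio
```

The same computation applied to the output of `dplyr::count(phase_type)` gives the corresponding figures for Strategy P.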
|