
Commit 0e9c5a8

Start using data hosted on zenodo
1 parent 32c8dae commit 0e9c5a8

7 files changed: +55 -23 lines changed

_02_Material.qmd

Lines changed: 38 additions & 15 deletions
Original file line number | Diff line number | Diff line change
@@ -1,6 +1,6 @@
11
# Material {#sec-material}
22

3-
This section is dedicated to describing the data that we used for segmenting gait in the present work. Essentially, we used two sources of data. First, we clipped a 9-axis inertial measurement unit (IMU) sensor at the level of the right hip and measured its orientation (that we assimilate to the hip orientation) over time during walking sessions. The data is recorded in the form of a unit quaternion time series. @sec-quaternions provides a brief overview of unit quaternions and their properties. Second, we used a pressure-sensitive walkway (GAITRite© mat) as a gold standard to label the gait events. @sec-data-acquisition elicits the data acquisition protocol while @sec-data-sets summarizes the two data sets. Finally, @sec-feature-space details the feature space that we constructed from the raw data to feed our machine learning models.
3+
This section is dedicated to describing the data that we used for segmenting gait in the present work. Essentially, we used two sources of data. First, we clipped a 9-axis inertial measurement unit (IMU) sensor at the level of the right hip and measured its orientation (that we assimilate to the hip orientation) over time during walking sessions. The data is recorded in the form of a unit quaternion time series. @sec-quaternions provides a brief overview of unit quaternions and their properties. Second, we used a pressure-sensitive walkway (GAITRite® mat) as a gold standard to label the gait events. @sec-data-acquisition elicits the data acquisition protocol while @sec-data-sets summarizes the two data sets. Finally, @sec-feature-space details the feature space that we constructed from the raw data to feed our machine learning models.
44

55
## Unit quaternions {#sec-quaternions}
66

@@ -48,22 +48,27 @@ We used an IMU sensor to record the hip orientation over time. The IMU was compo
4848
4949
![Sensor positionned on the right hip.](images/sensor-position.png){#fig-sensor-position width=150}
5050
51-
We asked the participants to walk on the GAITRite© mat, a gold standard in gait analysis [@Menz2004] while wearing the IMU sensor on their right hip. This choice, required to have a labeling of gait events from the walkway, constrained the path followed by the participants to a straight line of approximately nine meters. The GAITRite© device gives various information thanks to pressure sensors contained in the mat such as the time points where the subject feet touch and leave the ground at each step, which are exactly the gait events of interest to train our segmentation models for the IMU sensor. To use the two devices simultaneously, they were started at the same time by the same person with their two index fingers, allowing a good synchronization between devices [@de1992stability].
51+
We asked the participants to walk on the GAITRite® mat, a gold standard in gait analysis [@Menz2004] while wearing the IMU sensor on their right hip. This choice, required to have a labeling of gait events from the walkway, constrained the path followed by the participants to a straight line of approximately nine meters. The GAITRite® device gives various information thanks to pressure sensors contained in the mat such as the time points where the subject feet touch and leave the ground at each step, which are exactly the gait events of interest to train our segmentation models for the IMU sensor. To use the two devices simultaneously, they were started at the same time by the same person with their two index fingers, allowing a good synchronization between devices [@de1992stability].
5252
5353
We included six subjects in this study (three men and three women) of different ages and walking at different speeds to have a variety of gait data. @tbl-subjects summarized the demographic characteristics of the participants. We recorded a total of 174 walking sessions between June and September 2024.
5454
5555
::: {#tbl-subjects tbl-pos="H"}
5656
5757
```{r}
58-
data.frame(
59-
id = c("MBA", "MBO", "MSI", "MTR", "NNE", "TDE"),
60-
gender = c("M", "F", "F", "M", "F", "M"),
61-
age = c("50-60", "20-30", "20-30", "20-30", "20-30", "50-60"),
62-
slow = c(60, 73, 67, 77, 57, 61),
63-
intermediate = c(116, 122, 115, NA, 116, NA),
64-
preferential = c(145, 145, 148, 132, 147, 120),
65-
fast = c(199, 188, 179, 185, 190, 193)
66-
) |>
58+
bhg |>
59+
dplyr::group_by(subject, gender, age, condition) |>
60+
dplyr::summarise(speed = round(mean(speed), digits = 0), .groups = "drop") |>
61+
tidyr::pivot_wider(names_from = condition, values_from = speed) |>
62+
janitor::clean_names() |>
63+
dplyr::select(
64+
id = subject,
65+
gender,
66+
age,
67+
slow,
68+
intermediate,
69+
preferential = base,
70+
fast
71+
) |>
6772
gt::gt() |>
6873
gt::tab_spanner(
6974
label = "Walking speed (cm/s)",
@@ -99,7 +104,25 @@ Raw data
99104
100105
: As mentioned before, the IMU sensor records unit QTS representing the orientation of the hip over time, allowing the visualization of each coordinate time series (see @fig-timeserie) at each recorded time point. The four coordinates match the definition of a quaternion provided in @eq-quaternions. It is important to observe that @fig-timeserie displays the gait of a healthy individuals with nicely depicted gait cycles that are consistent over time. Impaired gait will not necessarily exhibit the same regularity.
101106
102-
![Data recorded by the IMU sensor as a four-coordinate unit quaternion time series.](images/time-serie.png){#fig-timeserie width=350}
107+
::: {#fig-timeserie fig-pos="H"}
108+
109+
```{r}
110+
bhg$egait[[33]] |>
111+
dplyr::filter(time >= 3) |>
112+
autoplot() +
113+
theme_bw() +
114+
labs(title = "", x = "Time (seconds)")
115+
```
116+
117+
Data recorded by the IMU sensor as a four-coordinate unit quaternion time series.
118+
119+
:::
120+
121+
::: {.callout-tip title="The [{squat}](https://cran.r-project.org/package=squat) package"}
122+
We developed a dedicated R package coined [{squat}](https://cran.r-project.org/package=squat) for **S**tatistics for **QUA**ternion **T**emporal Data which defines a specific data structure of class `qts` to store and manipulate unit QTS data. In particular, `bhg$egait` is a list of objects of class `qts` which stores the IMU sensor data that we collected. We implemented both a `graphics::plot()` and `ggplot2::autoplot()` S3 specializations for objects of class `qts` in [{squat}](https://cran.r-project.org/package=squat). We implemented many other S3 specializations for QTS as explained in the dedicated website[^squat-website].
123+
124+
[^squat-website]: <https://lmjl-alea.github.io/squat/>.
125+
:::
103126
104127
Centered data
105128
@@ -117,11 +140,11 @@ $$ {#eq-centring-qts}
117140
118141
B-spline representation
119142
120-
: The raw QTS data recorded by the sensor can be noisy due to small movements of the sensor during walking or electronic noise. To reduce this noise, we applied a smoothing step on the centered QTS using cubic splines. This step requires to choose a smoothness parameter that controls the amount of smoothing applied to the original data. A higher value of this parameter results in a smoother QTS but may also remove relevant information. In details, we fit a smoothing cubic spline representation separately to each component of the logarithm of the centered QTS using the `smooth.spline()` from the R **stats** package. The functional representation of the QTS itself is then obtained by exponentiating the smoothed logarithm back to the unit quaternion space. We used the default settings of the `stats::smooth.spline()` function except for the *spar* parameter which we tuned as a hyper-parameter (see @sec-feature-space).
143+
: The raw QTS data recorded by the sensor can be noisy due to small movements of the sensor during walking or electronic noise. To reduce this noise, we applied a smoothing step on the centered QTS using cubic splines. This step requires to choose a smoothness parameter that controls the amount of smoothing applied to the original data. A higher value of this parameter results in a smoother QTS but may also remove relevant information. In details, we fit a smoothing cubic spline representation separately to each component of the logarithm of the centered QTS using the `smooth.spline()` from the R {stats} package. The functional representation of the QTS itself is then obtained by exponentiating the smoothed logarithm back to the unit quaternion space. We used the default settings of the `stats::smooth.spline()` function except for the *spar* parameter which we tuned as a hyper-parameter (see @sec-feature-space).
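
The smoothing step described in the line above can be sketched in base R as follows. This is a minimal illustration and not the implementation used in the paper or in {squat}: the helper names `quat_log()`, `quat_exp()` and `smooth_qts()` are hypothetical, quaternions are assumed to be stored row-wise as `(w, x, y, z)`, the input is assumed to be already centered, and the data are simulated.

```r
# Logarithm of unit quaternions stored row-wise as (w, x, y, z).
# Returns the 3-column vector part; the scalar part of the log is zero.
quat_log <- function(q) {
  v <- q[, 2:4, drop = FALSE]
  nv <- sqrt(rowSums(v^2))
  theta <- atan2(nv, q[, 1])
  scale <- ifelse(nv > 1e-12, theta / nv, 1)
  v * scale
}

# Exponential of pure quaternions given by their 3-column vector part.
# Returns unit quaternions stored row-wise as (w, x, y, z).
quat_exp <- function(u) {
  nu <- sqrt(rowSums(u^2))
  scale <- ifelse(nu > 1e-12, sin(nu) / nu, 1)
  cbind(cos(nu), u * scale)
}

# Smooth each component of the log separately, then map back.
# `spar` is the smoothing parameter of stats::smooth.spline().
smooth_qts <- function(time, q, spar = 0.5) {
  u <- quat_log(q)
  u_smooth <- apply(u, 2, function(comp) {
    fit <- stats::smooth.spline(time, comp, spar = spar)
    stats::predict(fit, time)$y
  })
  quat_exp(u_smooth)
}

# Simulated example: a noisy oscillating rotation about the z axis.
set.seed(1)
time <- seq(0, 2, length.out = 200)
angle <- 0.3 * sin(2 * pi * time)
q <- cbind(cos(angle / 2), 0, 0, sin(angle / 2))
q_noisy <- q + matrix(rnorm(length(q), sd = 0.01), ncol = 4)
q_noisy <- q_noisy / sqrt(rowSums(q_noisy^2))
q_smooth <- smooth_qts(time, q_noisy, spar = 0.5)
```

Because the smoothing happens in the tangent space and the result is mapped back with the exponential, every row of `q_smooth` is a unit quaternion by construction, so no renormalization is needed afterwards.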
121144
122145
### Pressure mat data
123146
124-
The GAITRite© mat records the positions of the feet on the mat through pressure-sensitive sensors hidden beneath the mat. It returns a table of spatio-temporal parameters such as stride duration, stride length, walking speed, etc. @tbl-gaitrite-params in the Appendix provides the exhaustive list of all spatio-temporal gait parameters that the walkway outputs. It also returns the time of each event happening during a gait cycle such as the time where a foot touches or leaves the ground. These are the times we use to label our data to predict these events. Since the two devices were triggered simultaneously, the IMU sensor and the GAITRite© mat are assumed to share the same time clock. We use the pressure mat as a gold standard to label the observations between the different classes and train models on this labeled data.
147+
The GAITRite® mat records the positions of the feet on the mat through pressure-sensitive sensors hidden beneath the mat. It returns a table of spatio-temporal parameters such as stride duration, stride length, walking speed, etc. @tbl-gaitrite-params in the Appendix provides the exhaustive list of all spatio-temporal gait parameters that the walkway outputs. It also returns the time of each event happening during a gait cycle such as the time where a foot touches or leaves the ground. These are the times we use to label our data to predict these events. Since the two devices were triggered simultaneously, the IMU sensor and the GAITRite® mat are assumed to share the same time clock. We use the pressure mat as a gold standard to label the observations between the different classes and train models on this labeled data.
125148
126149
## Feature space {#sec-feature-space}
127150
@@ -180,7 +203,7 @@ where $\arctan\!2(y, x)$ computes the angle $\theta$ (in radians) between the po
180203
181204
Walking speed
182205
183-
: It is likely that the shape of the QTS can be quite different according to the walking speed. We therefore included this information in the feature space from the GAITRite© mat output. Since this predictor comes from the gold standard, it is not available when predicting on new time series where patients only used the wearable sensor. To counter this issue, we can estimate the walking speed from the mean angular velocity with a simple linear regression.
206+
: It is likely that the shape of the QTS can be quite different according to the walking speed. We therefore included this information in the feature space from the GAITRite® mat output. Since this predictor comes from the gold standard, it is not available when predicting on new time series where patients only used the wearable sensor. To counter this issue, we can estimate the walking speed from the mean angular velocity with a simple linear regression.
184207
185208
186209
Hyper-parameters
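
The "Walking speed" item in the hunk above mentions estimating the speed from the mean angular velocity with a simple linear regression. A minimal sketch of that idea, under stated assumptions, could look like this: the column names `mean_angular_velocity` and `speed` are hypothetical, and the training values are simulated rather than taken from the study data.

```r
set.seed(42)

# Hypothetical training sessions: mean angular velocity (rad/s) from the IMU
# and walking speed (cm/s) from the walkway. Simulated values, not study data.
train <- data.frame(mean_angular_velocity = runif(40, 0.5, 2.5))
train$speed <- 40 + 60 * train$mean_angular_velocity + rnorm(40, sd = 5)

# Simple linear regression of walking speed on mean angular velocity.
fit <- stats::lm(speed ~ mean_angular_velocity, data = train)

# New sessions recorded with the wearable sensor only: the walkway speed is
# unavailable, so it is replaced by the regression estimate.
new_sessions <- data.frame(mean_angular_velocity = c(0.8, 1.6))
predicted_speed <- stats::predict(fit, newdata = new_sessions)
```

The predicted values can then stand in for the walkway speed in the feature space when only the IMU sensor is worn.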

_03_Methods.qmd

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ Strategy E: Predicting gait [E]{.underline}vents
1414
- *Right Toe Off*,
1515
- *None* (all other times not corresponding to a certain event).
1616

17-
This strategy aims at directly predicting the occurrence of gait events of interest. However, it inherits by construction of a severe class imbalance effect, with the *None* class widely over-represented. Indeed, the class that represents all the times that do not belong to an event is clearly larger than the four other ones (see @tbl-class-imbalance). It is clearly represented on @fig-event-timeserie where we can see the event times from the GAITRite© mat overlaid on the QTS recorded by the IMU. Each colored point represents a different event and all other times belong to the *None* class.
17+
This strategy aims at directly predicting the occurrence of gait events of interest. However, it inherits by construction of a severe class imbalance effect, with the *None* class widely over-represented. Indeed, the class that represents all the times that do not belong to an event is clearly larger than the four other ones (see @tbl-class-imbalance). It is clearly represented on @fig-event-timeserie where we can see the event times from the GAITRite® mat overlaid on the QTS recorded by the IMU. Each colored point represents a different event and all other times belong to the *None* class.
1818

1919
::: {#tbl-class-imbalance tbl-pos="H"}
2020

_04_Results.qmd

Lines changed: 3 additions & 3 deletions
@@ -233,13 +233,13 @@ Performance metrics on the test set for both strategies.
233233

234234
:::
235235

236-
We kept one model per strategy to maximize the performance metric evaluated on the test set. For the strategy E, we selected the neural network model which maximizes the GWYI without doubts compared to all other models. For the strategy P, four models have accuracy above 88% with very similar scores. One would be tempted to choose the neural network to have the same type of model for both strategies. However, after testing all four models on the test set by comparing them with the GAITRite mat data (comparing the number of event detected and the event times predicted), the boosted trees model (which maximizes the accuracy) was the only one not to miss any event. Hence, we selected the boosted trees model.
236+
We kept one model per strategy to maximize the performance metric evaluated on the test set. For the strategy E, we selected the neural network model which maximizes the GWYI without doubts compared to all other models. For the strategy P, four models have accuracy above 88% with very similar scores. One would be tempted to choose the neural network to have the same type of model for both strategies. However, after testing all four models on the test set by comparing them with the GAITRite® mat data (comparing the number of event detected and the event times predicted), the boosted trees model (which maximizes the accuracy) was the only one not to miss any event. Hence, we selected the boosted trees model.
237237

238238
Now that we selected a model for each strategy, the next step is to compare the two strategies to choose the best one in its ability to detect gait events.
239239

240240
## Predicting phases or events?
241241

242-
We now have two models to compare: the Neural Network from the strategy E predicting directly the occurences of the gait events of interest and the Boosted Trees from the strategy P predicting the gait phases instead. To choose one over the other, we will compare their predictions on the test set with the real events given by the gold standard. The latter is provided by the GAITRite© mat that the subjects walked on during the data collection sessions.
242+
We now have two models to compare: the Neural Network from the strategy E predicting directly the occurences of the gait events of interest and the Boosted Trees from the strategy P predicting the gait phases instead. To choose one over the other, we will compare their predictions on the test set with the real events given by the gold standard. The latter is provided by the GAITRite® mat that the subjects walked on during the data collection sessions.
243243

244244
As any measurement tool, this mat is not perfect and can miss some contact points at the start and the end of the walking session. Hence, we need to preprocess the time series before comparing the predicted events with the real ones. Specifically, we start by shortening the time series to keep only the time window where the gold standard detected all four events perfectly. At the start of the session, we search for the first *Heel Strike* event which is followed by the three other events in the correct order. At the end of the session, we remove all points after the first missing event.
245245

@@ -251,7 +251,7 @@ Next, no matter the classification strategy (phases v.s. events), we need to ext
251251

252252
![Strategy P: Predictions made by the boosted trees.](images/preds_phases_and_real_events.png){#fig-preds-phases width=270}
253253

254-
Predictions from both classification strategies. The bigger and darker points represent the true occurences of gait events of interest as provided by the GAITRite© mat (gold standard).
254+
Predictions from both classification strategies. The bigger and darker points represent the true occurences of gait events of interest as provided by the GAITRite® mat (gold standard).
255255
:::
256256

257257
In any case, to provide a fair assessment of both models' performance, we devised a common procedure to get a single estimated event time from the predicted time window (Strategy E) or phase (Strategy P). Specifically, we first make sure to correctly identify predicted time windows or phases, using the two following rules:
