diff --git a/.gitignore b/.gitignore index 817c11e..39ebbab 100644 --- a/.gitignore +++ b/.gitignore @@ -8,9 +8,11 @@ *_cache/ /cache/ *_files/ -template-computo-R.html +*.html logo_text_white.pdf template-computo-R.pdf /.quarto/ _extensions/ -renv/ \ No newline at end of file +renv/ +logo_notext_white.png +**/*.quarto_ipynb diff --git a/_01_Introduction.qmd b/_01_Introduction.qmd new file mode 100644 index 0000000..c806820 --- /dev/null +++ b/_01_Introduction.qmd @@ -0,0 +1,49 @@ +# Introduction {#sec-introduction} + +## Gait analysis + +The study of human gait has shown great promise in a number of health applications, such as monitoring patients diagnosed with neurodegenerative disorders or patients in rehabilitation programs, notably among the elderly. Indeed, locomotor functional deficits are very good predictors of progression towards loss of autonomy and dependence. It is crucial to identify, as early as possible, elderly people at risk of falls and cognitive decline (the two being linked) by evaluating their cognitive-motor interactions with mobility tests [@montero2019consensus]. It has been shown that gait parameters such as walking speed and gait variability are very good indicators of general health, both physical and cognitive [@beauchet2016poor; @annweiler2009risk]. + +To monitor gait, methods such as motion capture systems or pressure mats are widely used due to their reliability, but they require patients to travel to a health center equipped with such a device. Furthermore, the measurements are made in front of medical staff, leading to an observational bias in the patient's performance. To address these issues, wearable sensors are increasingly used for gait analysis: they are smaller, cheaper and more convenient than traditional methods, as they can be used outside of medical facilities. 
+ +Gait analysis is based on the cyclical phenomenon of walking, which can be decomposed into a sequence of *strides*, each composed of two *steps*. A stride is more formally called a *gait cycle*, a precise biomechanical concept that constitutes the elementary unit of gait analysis. We define a gait cycle as the set of movements performed between two consecutive contacts of the heel of the same foot with the ground. The goal of this paper is to present a novel method to detect key events happening during gait cycles. The benefits of such a method are two-fold: (i) to segment the gait into strides and (ii) to compute a number of useful gait parameters. + +More precisely, the key events are the moments when a foot touches or leaves the ground. They are called *Heel Strike* and *Toe Off* respectively and they can be used to segment the gait cycle into phases. The stance phase corresponds to the period when the foot is in contact with the ground and, when a cycle starts with the right foot, extends between the *Right Heel Strike* and *Right Toe Off* events. The stance phase generally lasts about 60% of the entire stride and is followed by the swing phase, during which the right foot is no longer in contact with the ground (see @fig-walking-cycle). + +![Events and phases of a typical gait cycle. [^jacquelin-perry]](images/walking-cycles.png){#fig-walking-cycle width=400} + +[^jacquelin-perry]: Figure annotated from Jacquelin Perry's original on [Wikipedia](https://commons.wikimedia.org/wiki/File:GaitCycle_by_JaquelinPerry.jpg). + +In healthy gaits, a gait cycle usually lasts one second and the time difference between two events can be as small as 100 ms. It is therefore critical to design a very precise segmentation model to identify the four events and, ultimately, provide an accurate estimate of gait parameters. 
+ +## State of the art + +The aim of our study is to build a model segmenting strides and detecting key gait events from hip rotation data represented as unit quaternion time series. In the past, several methods for gait cycle segmentation and event detection have been studied based on signals recorded from wearable sensors placed on different parts of the body. In this section, we lay out an overview of proposed methods by grouping them into categories, with variations related to the position of sensors on the body, the type of sensor measurement system used, and the characteristics of the signal. + +Search for feature points: Peak Detection and zero-crossing + +: Peak detection algorithms exploit the semi-periodic properties of the gait cycle, assuming that specific events during a gait cycle typically match the maxima or minima of a recorded signal. Several algorithms based on this method have been developed. Spatio-temporal gait parameters have been determined from the acceleration signal of a sensor by using peak detection and zero-crossing methods to identify gait cycles [@zijlstra2003assessment]. Some peak detection algorithms have been developed to work on any kind of signal [@jiang2017robust]. Others are instead designed to detect gait events on a specific signal, often Heel-Strike on acceleration data [@lueken2019peak]. Most methods detect Heel-Strike and Toe-Off events by identifying minima, maxima or zero-crossings of acceleration time series [@gonzalez2010real;@mariani2013quantitative;@panebianco2018analysis]. These events, also called Initial Contact and Terminal Contact, have likewise been identified on the foot inclination angle in the sagittal plane [@nazarahari2022foot]. + +Analysis of dynamic properties: Wavelet Transform + +: Another method to detect gait events in recorded signals is the wavelet transform, as it allows the detection of a specified frequency at a specified time. 
This method has been used to detect *Heel Strike* and *Toe Off* events on an angular velocity signal by using multi-resolution wavelet decomposition [@aminian2002spatio]. The signal is decomposed into wavelet packets, using high-pass and low-pass filters. In @mccamley2012enhanced, the vertical acceleration signal is smoothed by integrating and then differentiating using a Gaussian continuous wavelet transform (CWT). Events are found at minima and maxima of the differentiated signal. Furthermore, a method called sparsity-assisted wavelet denoising (SAWD) has been developed to detect three events in the signal: mid-stance, toe-off and heel-strike [@prateek2019gait]. It uses a combination of linear time-invariant filters, wavelets and sparsity-based methods to extract a vector of discrete wavelet transform (DWT) coefficients, generating a sparse template from moving segments of gyroscope measurements. + +Pattern-matching: Dynamic Time Warping (DTW) + +: Another method to segment signals is pattern-matching, for which dynamic time warping is widely used. It allows the identification of patterns of different lengths by matching signals non-linearly, and is thus commonly used to evaluate the similarity between time series. This method has been used on an acceleration signal to identify gait cycles [@ghersi2020gait]. In another study, multi-dimensional subsequence DTW (msDTW) was used to segment strides using information from different axes of an accelerometer and a gyroscope at the same time [@barth2015stride]. + +Hidden Markov Models + +: Hidden Markov Models (HMMs) are machine learning models that can be used to simultaneously segment and classify data. In gait analysis, the hidden states of the HMM are viewed as activity classes or gait phases. Continuous HMMs with Gaussian mixture models are often used to model the outputs. 
Each cycle can be modeled with a circular left-to-right HMM, included in a more global HMM to classify walking activities [@panahandeh2013continuous]. Other algorithms aim at detecting four gait events, namely Heel-Strike, Flat-Foot, Heel-Off and Toe-Off. They use a four-state left-to-right HMM with observations following a multivariate Gaussian distribution [@mannini2011hidden;@garcia2022adaptive]. Hierarchical HMMs have been shown to perform better than peak detection and dynamic time warping algorithms on walking data [@haji2018segmentation]. + +Other Machine Learning methods + +: Many studies on gait analysis use wearable sensors and machine learning. A review showed that the most used methods for this purpose are SVMs and CNNs, largely applied to detect diseases or recognize activities when analyzing gait data [@saboor2020latest]. SVMs divide a set of objects into classes with the widest possible margin at the hyperplane boundaries. Neural networks use interconnected nodes organized in layers to process information, learn patterns through back-propagation and make predictions using activation functions. + +: Studies show good results when using machine learning to classify the gait of patients having trouble walking, for instance patients with Parkinson's disease [@tahir2012parkinson;@wahid2015classification], or to extract gait parameters [@rampp2014inertial;@hannink2016sensor]. These studies mostly use artificial neural networks but do not address the problem of segmenting the signal recorded by the sensors. Similarly, in @farah2019design, a logistic regression tree is used to classify gait phases within a stride but not to directly segment the signal into strides. + +: For the specific task of segmentation, recurrent neural networks have shown good results for identifying Heel-Strike and Toe-Off events from pressure sensors, accelerations and Euler angles [@prado2019gait]. 
Another study compared unsupervised machine learning (k-means) with supervised methods (SVM and CNN), showing that CNNs performed best at predicting stance and swing phases [@potluri2019machine]. Lastly, in another study, 24 time series from three different sensors (4 accelerometer and 4 gyroscope time series per sensor location) were used as inputs to a six-layer CNN to estimate the likelihood of the corresponding input sample, given a specific gait event [@gadaleta2019deep]. For instance, the signal can be segmented at the initial contact of the right foot if this gait event is chosen in the model. This method was shown to perform better than a wavelet transform algorithm. + +This review shows that many methods have been devised in the literature to either segment gait data or classify gait events or phases. However, none of them has been designed to work with orientation data: they mostly use acceleration or angular velocity signals. In addition, most approaches focus either on stride segmentation or on event detection inside already segmented strides. In this work, we propose a method that performs both tasks at the same time using a single model able to process orientation data represented as unit quaternion time series. + +The paper is structured as follows. @sec-material focuses on an in-depth introduction of the data type used in this work, from data acquisition to raw data properties to feature space elaboration. @sec-methods describes the strategies and methods used (i) to implement models that account for the specificities of our data and (ii) to tune and evaluate them appropriately. @sec-results presents a systematic comparison of implemented models with a motivated choice that leads to the evaluation of a final model. Finally, @sec-discussion summarises the present work and provides a discussion about its limitations and about next steps to further validate the model. 
All code was developed in R [@Rlanguage] using the [{tidymodels}](https://www.tidymodels.org) framework [@tidymodels]. \ No newline at end of file diff --git a/_02_Material.qmd b/_02_Material.qmd new file mode 100644 index 0000000..9b29ed6 --- /dev/null +++ b/_02_Material.qmd @@ -0,0 +1,190 @@ +# Material {#sec-material} + +This section is dedicated to describing the data that we used for segmenting gait in the present work. Essentially, we used two sources of data. First, we clipped a 9-axis inertial measurement unit (IMU) sensor at the level of the right hip and measured its orientation (which we assimilate to the hip orientation) over time during walking sessions. The data is recorded in the form of a unit quaternion time series. @sec-quaternions provides a brief overview of unit quaternions and their properties. Second, we used a pressure-sensitive walkway (GAITRite© mat) as a gold standard to label the gait events. @sec-data-acquisition details the data acquisition protocol while @sec-data-sets summarizes the two data sets. Finally, @sec-feature-space details the feature space that we constructed from the raw data to feed our machine learning models. + +## Unit quaternions {#sec-quaternions} + +Unit quaternions can be used to represent the 3D rotation of an object over time [@hamilton1844;@voight2021quaternion]. This representation has several advantages, such as being a compact representation of rotations and avoiding the gimbal lock issue. We therefore chose to measure the sensor orientation as a unit quaternion. + +Quaternions are four-dimensional vectors denoted by $\mathbf{q} = \left( q_w, q_x, q_y, q_z \right)$, but can also be viewed as hypercomplex numbers of rank $4$. 
Unit quaternions have a norm of $1$ and encode a 3D rotation with rotation angle $\theta \in [0, 2\pi]$ and rotation axis $\mathbf{u} = (u_x, u_y, u_z) \in S^2$, where $S^2$ is the 2-sphere, using the following formula: + +$$ +\mathbf{q} = \left( q_w, q_x, q_y, q_z \right) += \cos \left( \frac{\theta}{2} \right) + \left( u_x i + u_y j + u_z k \right) \cdot \sin \left( \frac{\theta}{2} \right), +$$ {#eq-quaternions} + +where $i$, $j$, and $k$ generalize the imaginary number $i$ and are such that $i^2 = j^2 = k^2 = ijk = -1$. Note that it is straightforward to show that quaternions defined through @eq-quaternions have unit norm. + +The set of unit quaternions, denoted by $\mathbb{H}_u$, has a number of interesting properties. First, the quaternions $\mathbf{q}$ and $-\mathbf{q}$ represent the same rotation, making $\mathbb{H}_u$ a double cover of the space $\mathrm{SO}_3(\mathbb{R})$ of 3D rotations. Also, $\mathbb{H}_u$ forms a Lie group, that is, a group which is also a differentiable Riemannian manifold. As a group, it is in particular equipped with an identity element $\mathbf{q}^{(0)} = (1,0,0,0)$ which corresponds to the identity rotation, such that, for any $\mathbf{q} \in \mathbb{H}_u$, we have $\mathbf{q} \mathbf{q}^{(0)} = \mathbf{q}^{(0)}\mathbf{q} = \mathbf{q}$, and with a composition law $\circ$ given by the Hamilton product: + +$$ +\begin{aligned} +\mathbf{q}_1 \circ \mathbf{q}_2 = ( &q_{1w} q_{2w} - q_{1x} q_{2x} - q_{1y} q_{2y} - q_{1z} q_{2z}, \\ +&q_{1w} q_{2x} + q_{1x} q_{2w} + q_{1y} q_{2z} - q_{1z} q_{2y}, \\ +&q_{1w} q_{2y} - q_{1x} q_{2z} + q_{1y} q_{2w} + q_{1z} q_{2x}, \\ +&q_{1w} q_{2z} + q_{1x} q_{2y} - q_{1y} q_{2x} + q_{1z} q_{2w}). +\end{aligned} +$$ {#eq-hamilton-product} + +This is known as quaternion multiplication and, for simplicity, the symbol $\circ$ is often omitted and we simply write $\mathbf{q}_1 \mathbf{q}_2$. Note that the composition law is not commutative. As such, $\mathbb{H}_u$ is a non-Abelian group. 
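As a quick sanity check, @eq-hamilton-product can be transcribed in a few lines of R; the sketch below (the function name is ours, not from any package) also makes the non-commutativity explicit:

```r
# Hamilton product of two quaternions stored as numeric vectors
# q = c(qw, qx, qy, qz); a direct transcription of the formula above.
qprod <- function(q1, q2) {
  c(
    q1[1] * q2[1] - q1[2] * q2[2] - q1[3] * q2[3] - q1[4] * q2[4],
    q1[1] * q2[2] + q1[2] * q2[1] + q1[3] * q2[4] - q1[4] * q2[3],
    q1[1] * q2[3] - q1[2] * q2[4] + q1[3] * q2[1] + q1[4] * q2[2],
    q1[1] * q2[4] + q1[2] * q2[3] - q1[3] * q2[2] + q1[4] * q2[1]
  )
}

# Two 90-degree rotations, about the x- and y-axis respectively
qx <- c(cos(pi / 4), sin(pi / 4), 0, 0)
qy <- c(cos(pi / 4), 0, sin(pi / 4), 0)

qprod(qx, qy)  # (0.5, 0.5, 0.5,  0.5)
qprod(qy, qx)  # (0.5, 0.5, 0.5, -0.5): the order matters
```

Composing the same two rotations in either order yields quaternions that differ in their last component, illustrating that $\circ$ is not commutative.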
The inverse of a unit quaternion $\mathbf{q}$ is given by its conjugate $\mathbf{q}^{-1} = \mathbf{q}^\star = (q_w, -q_x, -q_y, -q_z)$. + +As a differentiable Riemannian manifold, $\mathbb{H}_u$ is also equipped with a natural distance, known as the *geodesic distance*, that can be used to measure the distance between two unit quaternions. For any two quaternions $\mathbf{q}_1$ and $\mathbf{q}_2$, it reads: + +$$ +d_g \left( \mathbf{q}_1, \mathbf{q}_2 \right) = 2 \left\| \ln \left( \mathbf{q}_1^{-1} \mathbf{q}_2 \right) \right\| = \arccos \left( 2 \left( \mathbf{q}_1 \cdot \mathbf{q}_2 \right)^2 -1 \right), +$$ {#eq-geodesic-distance} + +where the unit quaternion logarithm is defined as $\ln(\mathbf{q}) = \mathbf{u} \dfrac{\theta}{2}$, so that $\left\| \ln \left( \mathbf{q}_1^{-1} \mathbf{q}_2 \right) \right\|$ is half the rotation angle separating the two orientations. + +In our application, the sensor orientation is the rotation between the reference frame of origin, here the Earth's reference frame $R_f=(f_1, f_2, f_3)$, and its own reference frame $R_s=(s_1, s_2, s_3)$ formed by the accelerometer, gyroscope, and magnetometer (see @fig-sensor-axis). + +![Sensor reference frame and axes. [@drouin2023semi]](images/sensor-axis.png){#fig-sensor-axis width=200} + +A *quaternion time series* (QTS) is an ordered set $\mathbf{Q}$ of unit quaternions along a temporal grid $t_1 < t_2 < \dots < t_p$. We will denote by $\mathbf{q}_{j} = \mathbf{Q}(t_j)$ the unit quaternion of the QTS $\mathbf{Q}$ recorded at time $t_j$. In our application, it represents the orientation of the hip at that time. + +## Data acquisition {#sec-data-acquisition} + +We used an IMU sensor to record the hip orientation over time. The IMU was composed of a 3-axis accelerometer, a 3-axis gyroscope and a 3-axis magnetometer. Subjects wore the sensor on their right hip (see @fig-sensor-position). The sampling frequency of the sensor is $100$ Hz, which corresponds to recording its orientation every $10$ ms. With this device, the data acquired is in the form of unit QTS as presented in @sec-quaternions. 
+ +![Sensor positioned on the right hip.](images/sensor-position.png){#fig-sensor-position width=150} + +We asked the participants to walk on the GAITRite© mat, a gold standard in gait analysis [@Menz2004], while wearing the IMU sensor on their right hip. This choice, required to obtain a labeling of gait events from the walkway, constrained the path followed by the participants to a straight line of approximately nine meters. The GAITRite© device provides various measurements thanks to pressure sensors embedded in the mat, such as the time points at which the subject's feet touch and leave the ground at each step, which are exactly the gait events of interest for training our segmentation models for the IMU sensor. To use the two devices simultaneously, they were started at the same time by the same person using their two index fingers, allowing a good synchronization between devices [@de1992stability]. + +We included six subjects in this study (three men and three women) of different ages and walking at different speeds, so as to obtain a variety of gait data. @tbl-subjects summarizes the demographic characteristics of the participants. We recorded a total of 174 walking sessions between June and September 2024. 
+ +::: {#tbl-subjects tbl-pos="H"} + +```{r} +data.frame( + id = c("MBA", "MBO", "MSI", "MTR", "NNE", "TDE"), + gender = c("M", "F", "F", "M", "F", "M"), + age = c("50-60", "20-30", "20-30", "20-30", "20-30", "50-60"), + slow = c(60, 73, 67, 77, 57, 61), + intermediate = c(116, 122, 115, NA, 116, NA), + preferential = c(145, 145, 148, 132, 147, 120), + fast = c(199, 188, 179, 185, 190, 193) +) |> + gt::gt() |> + gt::tab_spanner( + label = "Walking speed (cm/s)", + columns = c(slow, intermediate, preferential, fast) + ) |> + gt::cols_label( + id = "ID", + gender = "Gender", + age = "Age range (years)", + slow = "Slow", + intermediate = "Intermediate", + preferential = "Preferential", + fast = "Fast" + ) |> + gt::sub_missing() |> + gt::opt_stylize(style = 6, color = 'gray') |> + gt::cols_align(align = "center") |> + gt::tab_style( + style = "vertical-align:top", + locations = gt::cells_column_labels() + ) +``` + +Summary of subjects and walking speeds used to train the model. + +::: + +## Data sets {#sec-data-sets} + +### Sensor data + +Raw data + +: As mentioned before, the IMU sensor records unit QTS representing the orientation of the hip over time, allowing the visualization of each coordinate time series at each recorded time point (see @fig-timeserie). The four coordinates match the definition of a quaternion provided in @eq-quaternions. It is important to observe that @fig-timeserie displays the gait of a healthy individual, with nicely depicted gait cycles that are consistent over time. Impaired gait will not necessarily exhibit the same regularity. + +![Data recorded by the IMU sensor as a four-coordinate unit quaternion time series.](images/time-serie.png){#fig-timeserie width=350} + +Centered data + +: A centring step is applied to the QTS to mitigate sensor shifts in position along the belt. This preprocessing step ensures that the mean quaternion of each QTS is the identity. 
In detail, let $\mathbf{Q}_1, \dots, \mathbf{Q}_n$ be $n$ QTS observed on the same time grid $t_1 < t_2 < \dots < t_p$. Recall that we write $\mathbf{Q}_i(t_j) = \mathbf{q}_{ij} \in \mathbb{H}_u$, for any $i \in [\![ 1, n ]\!]$ and $j \in [\![ 1, p ]\!]$. We use the Fréchet mean associated with the geodesic distance $d_g$ (see @eq-geodesic-distance) [@moakher02; @manton04] to compute the mean $\mathbf{q}_j^{(m)}$ of all quaternions $\mathbf{q}_{1j}, \dots, \mathbf{q}_{nj}$ at a given time point $t_j$ as: + +$$ +\mathbf{q}_j^{(m)} = \underset{\mathbf{q} \in \mathbb{H}_u}{\mathrm{argmin}} \sum_{i=1}^n d_g^2(\mathbf{q}_{ij}, \mathbf{q}), \hspace{5mm} j \in [\![ 1, p ]\!]. +$$ {#eq-mean-qts} + +The centered QTS $\mathbf{Q}_1^{(c)}, \dots, \mathbf{Q}_n^{(c)}$ can then be computed as: + +$$ +\mathbf{q}_{ij}^{(c)} = \mathbf{Q}_i^{(c)} (t_j) = \left( \mathbf{q}_j^{(m)} \right)^{-1} \mathbf{q}_{ij}, \hspace{5mm} j \in [\![ 1, p ]\!],\hspace{2mm} i \in [\![ 1, n ]\!]. +$$ {#eq-centring-qts} + +B-spline representation + +: The raw QTS data recorded by the sensor can be noisy due to small movements of the sensor during walking or electronic noise. To reduce this noise, we applied a smoothing step to the centered QTS using cubic splines. This step requires choosing a smoothness parameter that controls the amount of smoothing applied to the original data. A higher value of this parameter results in a smoother QTS but may also remove relevant information. In detail, we fit a smoothing cubic spline representation separately to each component of the logarithm of the centered QTS using the `smooth.spline()` function from the R **stats** package. The functional representation of the QTS itself is then obtained by exponentiating the smoothed logarithm back to the unit quaternion space. We used the default settings of the `stats::smooth.spline()` function except for the *spar* parameter, which we tuned as a hyper-parameter (see @sec-feature-space). 
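The log-smooth-exp pipeline just described can be sketched in a few lines of R. This is a minimal illustration, not our production code: `smooth_qts` is a name of our own choosing, and `qts` is assumed to be a $p \times 4$ matrix of unit quaternions (columns $q_w, q_x, q_y, q_z$) observed on a time grid `t`:

```r
# Smooth a unit quaternion time series: map each quaternion to its
# logarithm (a 3-vector), smooth each coordinate with a cubic smoothing
# spline, then exponentiate back to the unit quaternion space.
smooth_qts <- function(qts, t, spar = 0.5) {
  # Quaternion logarithm: ln(q) = u * theta / 2
  qlog <- t(apply(qts, 1, function(q) {
    half_theta <- acos(min(max(q[1], -1), 1))
    v <- q[2:4]
    nv <- sqrt(sum(v^2))
    if (nv < 1e-12) c(0, 0, 0) else v / nv * half_theta
  }))
  # Smooth each coordinate of the logarithm separately
  qlog_smooth <- apply(qlog, 2, function(x) {
    stats::smooth.spline(t, x, spar = spar)$y
  })
  # Quaternion exponential maps the smoothed logs back to unit norm
  t(apply(qlog_smooth, 1, function(v) {
    nv <- sqrt(sum(v^2))
    if (nv < 1e-12) c(1, 0, 0, 0) else c(cos(nv), v / nv * sin(nv))
  }))
}
```

By construction, every row of the output is again a unit quaternion; increasing `spar` towards $1$ yields smoother (hence more stable) derivatives at the cost of fidelity to the raw signal.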
+ +### Pressure mat data + +The GAITRite© mat records the positions of the feet on the mat through pressure-sensitive sensors hidden beneath the mat. It returns a table of spatio-temporal parameters such as stride duration, stride length, walking speed, etc. @tbl-gaitrite-params in the Appendix provides the exhaustive list of all spatio-temporal gait parameters that the walkway outputs. It also returns the time of each event happening during a gait cycle, such as the times at which a foot touches or leaves the ground. These are the times that we use to label our data with the events to predict. Since the two devices were triggered simultaneously, the IMU sensor and the GAITRite© mat are assumed to share the same time clock. We use the pressure mat as a gold standard to assign the observations to the different classes and train models on the labeled data. + +## Feature space {#sec-feature-space} + +The feature space is a key component of machine learning models as it defines the data that will be used to train them. In our application, we constructed a feature space from the raw QTS data recorded by the IMU sensor. First, we need to define what an observation is in our context. An observation corresponds to the data recorded at a given time point $t_j$. Since QTS are ordered sets of unit quaternions, for the $j$-*th* observation (row) of the feature space, we can then use features computed from the data observed at time $t_j$ or any other time points preceding $t_j$. One column is of course the label of the observation, which we get from the gold standard. This variable is available for the train/test process but will not be available at prediction time since it corresponds to the events that we want to predict. The other features are predictors that we compute from the sensor data. 
The ultimate goal is to design a feature space to label each observation $t_j$ with the gait event happening at that time (if any) using only the predictors computed from the QTS at time points $t_k \le t_j$, with $1 \le k \le j$. In the remainder of this section, we describe the predictors that we computed to build our feature space. + +Angular velocity and acceleration vectors + +: Given the first and second time derivatives of a QTS, we can compute the *angular velocity vector* $\pmb{\Omega}$ and *angular acceleration vector* $\dot{\pmb{\Omega}}$ [@narayan2017]. These vectors are aligned with the axis of rotation and carry in their norm the angular velocity and acceleration respectively. The angular velocity vector is computed as follows: + +$$ +\begin{bmatrix} +0 \\ +\pmb{\Omega} +\end{bmatrix} += 2 \mathbf{q}^{-1} \dot{\mathbf{q}} \hspace{3mm} \text{with} \hspace{3mm} \dot{\mathbf{q}} = \frac{d \mathbf{q}}{dt} = \frac{1}{2} \mathbf{q} \begin{bmatrix} +0 \\ +\pmb{\Omega} +\end{bmatrix}. +$$ {#eq-angular-vel} + +The angular acceleration vector $\dot{\pmb{\Omega}}$ is then computed as the derivative of the angular velocity vector: + +$$ +\begin{bmatrix} +0 \\ +\dot{\pmb{\Omega}} +\end{bmatrix} += 2 \left( \mathbf{q}^{-1} \ddot{\mathbf{q}} - (\mathbf{q}^{-1}\dot{\mathbf{q}})^2 \right) \hspace{3mm} \text{with} \hspace{3mm} \ddot{\mathbf{q}} = \frac{d^2 \mathbf{q}}{dt^2} = \frac{1}{2} \left( \dot{\mathbf{q}} \begin{bmatrix} +0 \\ +\pmb{\Omega} +\end{bmatrix} + \mathbf{q} \begin{bmatrix} +0 \\ +\dot{\pmb{\Omega}} +\end{bmatrix} \right). +$$ {#eq-angular-acc} + +Euler angles + +: The angles named *Roll*, *Pitch* and *Yaw* represent rotations around the three principal axes. 
They are computed from a unit quaternion $\mathbf{q} = (q_w, q_x, q_y, q_z)$ using the following formulas: + +$$ +\begin{bmatrix} +\mathrm{Roll} \\ +\mathrm{Pitch} \\ +\mathrm{Yaw} +\end{bmatrix} += +\begin{bmatrix} +\arctan\!2 \left(2(q_w q_x + q_y q_z), 1-2(q_x^2 + q_y^2) \right) \\ +\arcsin \left(2(q_w q_y - q_x q_z) \right) \\ +\arctan\!2 \left(2(q_w q_z + q_x q_y), 1-2(q_y^2 + q_z^2) \right) +\end{bmatrix} +$$ {#eq-rpy} + +where $\arctan\!2(y, x)$ computes the angle $\theta$ (in radians) between the positive $x$-axis and the ray from the origin to the point $(x, y)$ in the Cartesian plane. + +Walking speed + +: The shape of the QTS is likely to differ noticeably depending on the walking speed. We therefore included this information in the feature space, taken from the GAITRite© mat output. Since this predictor comes from the gold standard, it is not available when predicting on new time series for which patients only used the wearable sensor. To address this issue, we can estimate the walking speed from the mean angular velocity with a simple linear regression. + + +Hyper-parameters + +: The feature space depends on a number of hyper-parameters. The first one is a smoothness parameter called *spar* that represents the amount of smoothing that we seek to achieve when fitting cubic splines to the original QTS. Since time derivatives are used to compute some predictors, it is indeed useful to smooth the QTS sufficiently so that the derivative values remain stable and are not dominated by noise. The *spar* parameter takes its values in the interval $(0,1]$. The second hyper-parameter is the *lag* parameter. To keep information from the past as we predict at a given time point $t_j$, we include features computed at $t_j$ as well as the same features computed at time points $t_{j-1}, \dots, t_{j-\mathrm{lag}}$. The size of the feature space therefore depends on the *lag* parameter as it contains $10 + 9 \times \mathrm{lag}$ predictors. 
The first ten predictors are the three components of the angular velocity vector, the three components of the angular acceleration vector, the three angles Roll, Pitch and Yaw, and the walking speed. We then add nine predictors per *lag*: the angular velocity and acceleration components and the Roll-Pitch-Yaw angles from previous time points. + +Finally, a last hyper-parameter is used when we label the observations with the gait event times from the gold standard. It is called *k* and controls the number of points around a gait event that we label as part of the event. In @sec-classification-strategies, we describe two strategies for building segmentation models, one of which does not use this parameter. Therefore, we defer further details about this parameter to the dedicated section. \ No newline at end of file diff --git a/_03_Methods.qmd b/_03_Methods.qmd new file mode 100644 index 0000000..c355750 --- /dev/null +++ b/_03_Methods.qmd @@ -0,0 +1,190 @@ +# Methods {#sec-methods} + +## Classification strategies {#sec-classification-strategies} + +Gait event detection is performed by evaluating and comparing two strategies to classify the observations. + +Strategy E: Predicting gait [E]{.underline}vents + +: Strategy E pertains to directly predicting the gait events occurring during walking. Specifically, we classify the observations into five categories: + +- *Right Heel Strike*, +- *Left Toe Off*, +- *Left Heel Strike*, +- *Right Toe Off*, +- *None* (all other times, corresponding to no event). + +This strategy aims at directly predicting the occurrence of the gait events of interest. However, by construction it suffers from a severe class imbalance, with the *None* class widely over-represented. Indeed, the class that gathers all the times that do not belong to an event is clearly larger than the four other ones (see @tbl-class-imbalance). This is clearly visible in @fig-event-timeserie, where the event times from the GAITRite© mat are overlaid on the QTS recorded by the IMU. 
Each colored point represents a different event and all other times belong to the *None* class. + +::: {#tbl-class-imbalance tbl-pos="H"} + +```{r} +tibble::tibble( + class = c( + "Right Heel Strike", + "Left Toe Off", + "Left Heel Strike", + "Right Toe Off", + "None" + ), + nb_obs = c("973", "1004", "994", "982", "158401"), + prop = c("0.60%", "0.62%", "0.61%", "0.60%", "97.57%") +) |> + gt::gt() |> + gt::cols_label( + class = "Class", + nb_obs = "Number of observations", + prop = "Proportion" + ) |> + gt::cols_align(align = "center") |> + gt::tab_style( + style = list(gt::cell_text(style = "italic")), + locations = gt::cells_body(columns = class) + ) |> + gt::tab_options(column_labels.background.color = "#616161") +``` + +Strategy E: Count and proportion of observations in each class. + +::: + +![Gait event occurrences overlaid on a unit QTS.](images/events_on_time_serie_MSI_N_R2.png){#fig-event-timeserie width=350} + +To mitigate this effect, when the mat records the occurrence of a gait event of interest at a specific time point, we also label surrounding time points with the same event. The size of this window is controlled by a parameter $k$ which sets the number of points on each side of the event that are also labelled as the event of interest. It also allows us to account for some uncertainty, as the time between two consecutive points is only 10 ms. For instance, if we set $k = 1$, we label $3$ observations as part of the event rather than just one. In this case, we consider that the event happens within a window of $20$ ms. + +Strategy P: Predicting gait [P]{.underline}hases + +: Strategy P pertains to predicting the gait phases in the signal that are delimited by the gait events. 
We label each part of the signal between two events as a phase, leading to four classes for our observations: + +- *Pre-Stance* phase: between *Right Heel Strike* and *Left Toe Off*, +- *Stance* phase: between *Left Toe Off* and *Left Heel Strike*, +- *Pre-Swing* phase: between *Left Heel Strike* and *Right Toe Off*, +- *Swing* phase: between *Right Toe Off* and *Right Heel Strike*. + +The definition of these phases matches a typical gait cycle starting when the right foot hits the ground. This strategy generates prediction classes that are less imbalanced, as opposed to the first strategy in which the *None* class dominates the dataset. The downside is that some post-processing is needed: the segmentation models predict gait phases, and we need to further determine the exact occurrence of the gait events of interest within the predicted phases. + +![Gait phases overlaid on a unit QTS.](images/phases_on_time_serie_MSI_N_R2.png){#fig-phases-timeserie width=350 fig-pos="H"} + +@fig-phases-timeserie shows the gait phases overlaid on a QTS. Each gait phase is mapped to a suitable color. We can see that some phases are longer than others: the *pre-stance* and *pre-swing* phases are indeed shorter than the *stance* and *swing* phases. However, class imbalance is not as severe as in the first strategy. + +## Supervised classification models {#sec-supervised-classification-models} + +In addition to the two classification strategies presented in @sec-classification-strategies, the literature presents a wide variety of machine learning models that can be used for supervised classification. In this section, we detail the different models that we selected to compare for our task of gait event detection and phase prediction. We chose these models among those available in the [{tidymodels}](https://www.tidymodels.org) framework in R [@tidymodels] in order to guarantee a consistent and identical implementation for their training and evaluation. 
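For illustration, two of the models described below can be specified as follows in the {tidymodels} framework; the engines and parameter values shown here are illustrative placeholders, not the tuned settings used in our experiments:

```r
library(parsnip)

# Penalized multinomial regression (glmnet engine); mixture = 1 gives
# an L1 (lasso) penalty, mixture = 0 an L2 (ridge) penalty.
multinom_spec <- multinom_reg(penalty = 0.01, mixture = 1) |>
  set_engine("glmnet") |>
  set_mode("classification")

# Single decision tree (rpart engine) with a capped depth
tree_spec <- decision_tree(tree_depth = 10) |>
  set_engine("rpart") |>
  set_mode("classification")
```

Every specification then exposes the same `fit()`/`predict()` interface, which is what makes a consistent training and evaluation pipeline possible across all compared models.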
+
+Multinomial Regression [@hosmer2013applied]
+
+: A *multinomial regression* is built by fitting a set of independent binary logistic regressions. We want to classify our observations into $K$ classes. We choose one category to be the baseline and name it $k_0$. We build $K-1$ logit equations modeling the log-odds of a class $k$ versus the class $k_0$. The logit equation for a class $k$ is:
+
+$$
+\log \left( \frac{\mathbb{P}(Y = k | X)}{\mathbb{P}(Y = k_0 | X)} \right) = \beta_{0k} + \beta_k^\top X,
+$$ {#eq-logit}
+
+with $Y$ the class of the observation, $X$ the predictors, $\beta_{0k}$ the intercept and $\beta_{1k}, \dots, \beta_{pk}$ the regression coefficients.
+
+The softmax function is then used to convert the $K-1$ logit scores into probabilities for each class. Finally, the observation is classified in the class with the highest probability. To fit the model, i.e. to find the best values for the coefficients, we maximize the log-likelihood function.
+
+When we have a large set of variables, we can regularize the estimation by introducing penalty terms in the log-likelihood function [@hastie2015statistical]. The two main types of penalization are the L1 regularization (also called a Lasso model) and the L2 regularization (also called a Ridge model). The Ridge regression prevents overfitting by adding the term $\lambda \sum_{k=1}^K \beta_k^2$ to the loss function, while the Lasso model selects features by adding the term $\lambda \sum_{k=1}^K |\beta_k|$ to the loss, giving a sparse solution with only some non-zero coordinates. The parameter $\lambda$ is called the regularization parameter.
+
+Decision tree [@kuhn2013applied; @breiman2017classification]
+
+: *Decision trees* consist of a nested sequence of if-then statements on the predictors that classifies the data. Observations are assigned to their class according to the values of their variables.
+
+The goal is to classify the observations into smaller and more homogeneous groups, partitioning the predictor space into rectangular regions. Homogeneity is defined here as how pure the splitting nodes are, meaning that each node contains a higher proportion of a single class. We can use the Gini index to measure purity. In a scenario with $K$ classes with probabilities $p_1, \dots, p_K$, the Gini impurity of a given node is defined as:
+
+$$
+G(p) = \sum_{k=1}^K p_k \sum_{j \ne k}^K p_j = \sum_{k=1}^K p_k (1-p_k) = 1 - \sum_{k=1}^K (p_k)^2.
+$$ {#eq-gini}
+
+This index is minimal when one of the probabilities tends toward one and all others toward zero, indicating that the node is pure. To build the tree, the model creates splitting nodes by evaluating each candidate splitting point (i.e. values taken by variables to split observations into groups) to find the one minimizing the Gini index, until a stopping criterion is met, such as a maximum tree depth.
+
+Bagged trees, Random forest and Boosted trees [@kuhn2013applied]
+
+: A *bagged trees* model is an ensemble of $M$ decision trees trained in parallel. Each decision tree is built with a bootstrap sample of the original data, i.e. a sample of the same size as the original data obtained by drawing random observations with replacement. Each of these trees then generates a prediction for a new observation. Finally, the observation is classified in the class that collects the greatest number of votes from all the trees.
+
+A *random forest* is a similar ensemble of decision trees trained in parallel on bootstrap samples of the original data. The difference is that, at each split, a random subset of variables is selected and the best predictor is searched for within this subset only. The prediction is made as before, with each tree voting for a class and the majority class being assigned to the observation.
+
+Finally, *boosted trees* train a number of decision trees sequentially: each tree learns from the mistakes of the previous ones by putting a higher weight on previously misclassified observations. The trees are also trained on bootstrapped samples, selecting a random subset of variables at each node. The final prediction is the weighted sum of the predictions from each tree, with weights determined by the performance of the trees.
+
+$k$-Nearest Neighbors [@cover1967nearest]
+
+: A *$k$-Nearest Neighbors* model finds the $k$ closest data points to a given input and makes a prediction based on the majority class among these neighbors. This model is non-parametric as it makes no assumption about the underlying data.
+
+To calculate the distance between the input and its neighbors, the Minkowski distance can be used. It is defined as:
+
+$$
+d(x, y) = \left[ \sum_{i=1}^n |x_i-y_i|^p \right]^{1/p}.
+$$ {#eq-minkowski}
+
+This distance is a generalization of multiple well-known distances: we recover the Euclidean distance when $p = 2$ and the Manhattan distance when $p = 1$.
+
+Neural networks [@varma2018]
+
+: Neural networks are deep learning models inspired by the human brain. They consist of interconnected nodes organized in layers which process the data by passing it from one layer to the next. The model contains two kinds of parameters: weights on each connection and biases on each node. The input data is given to the input layer and then passes through the hidden layers. At each layer, a pre-activation is computed from the weights and biases and is transformed into an activation by a non-linear function (usually the *ReLU* function). Finally, the result goes into the output layer which uses an output function (the *softmax* function for more than two classes) to return the predicted class for each observation.
The network learns patterns in the data by adjusting its parameters during forward and backward propagation for each observation, so as to minimize a loss that represents the difference between real and predicted values. Most often, the loss used is the cross-entropy, defined as follows for an observation $m$:
+
+$$
+\mathcal{L}(m) = - \sum_{k=1}^K t_k(m) \log (y_k(m)),
+$$ {#eq-crossentropy}
+
+with $t_k$ the ground truth and $y_k$ the classification probability.
+
+These models are complex and rely on an efficient algorithm for the parameter updates and a suitable activation function, as well as techniques such as batch normalization, regularization or dropout to find a model whose complexity suits the data. Tuning is performed to find the best hyper-parameters, such as the number of layers and nodes or the learning rate.
+
+## Dealing with class imbalance {#sec-class-imbalance}
+
+Classification strategy E leads to a strong class imbalance as the *None* class contains most of the observations. Even strategy P has some imbalance between the classes, although less severe. This is an important issue in machine learning because, in such situations, models learn quite well to predict the majority class which, in our application, is not of great interest, but fail to correctly predict the minority classes which are the ones we want to detect. In addition, performance metrics such as accuracy are not adapted to evaluate models in this context because a model predicting only the majority class would achieve a high accuracy while being useless.
+
+Many sampling algorithms exist to mitigate this issue, such as random oversampling or random undersampling, which add or remove observations from certain classes to create a more balanced dataset. More complex methods exist, among which the popular SMOTE [@chawla2002] and ADASYN [@haibo2008] algorithms that create synthetic data by considering the classes of the nearest neighbors of random observations.
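+
+As a minimal illustration with a hypothetical label vector (not our actual data), random oversampling simply resamples the minority class with replacement until the classes are balanced:
+
+```{r}
+#| eval: false
+# Hypothetical labels: 950 "None" points and 50 event points
+set.seed(1)
+y <- c(rep("None", 950), rep("Right Heel Strike", 50))
+minority <- which(y == "Right Heel Strike")
+# Draw minority observations with replacement until both classes have 950 points
+extra <- sample(minority, size = 950 - length(minority), replace = TRUE)
+y_balanced <- c(y, y[extra])
+table(y_balanced)
+```
+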
Although these algorithms are very popular, they did not perform well in our application. We conjecture that our data is too imbalanced for these methods to be effective.
+
+An effective way to make the model pay more attention to the patterns in the minority classes is to put a larger weight on the observations belonging to these classes. More precisely, the weight given to each observation specifies how much this observation influences the model's estimation. In order for the model to be biased towards observations deemed more important for the application at hand (minority classes in our case), the weights of the observations are integrated into the cost function. Weights naturally impact the cost of misclassification in the sense that misclassifying more important observations should cost more, encouraging the model to avoid these types of mistakes [@hashemi2018].
+
+When the classes are not too imbalanced, such as in Strategy P (phase prediction), we can use the inverse class frequency weights [@huang2016learning] where the weight on the class $k$ is given by:
+
+$$
+\omega_k = \frac{n}{n_\mathrm{classes} \hspace{1mm} n_k}
+$$ {#eq-inv-class-frequency}
+
+with $n$ the total number of observations, $n_\mathrm{classes}$ the number of classes and $n_k$ the number of observations in the class $k$. This formula yields balanced weights, but class weights can also be chosen manually and tuned.
+
+## Performance metrics {#sec-performance-metrics}
+
+To tune and evaluate the models, we need to use appropriate metrics according to the classification strategy and the amount of class imbalance.
+
+### Metrics for Strategy P (Phase prediction)
+
+In this strategy, we aim at predicting phases. All classes are therefore equally important and they are not severely imbalanced. Consequently, we use the accuracy metric to tune and evaluate models under this paradigm.
It is the most widely used metric in machine learning and is defined as the number of correct predictions divided by the total number of predictions.
+
+### Metrics for Strategy E (Event prediction)
+
+In this strategy, we aim at predicting occurrences of gait events directly. As a result, four of the five classes are the classes of interest while the majority class is not relevant. Furthermore, the majority class contains the vast majority of observations (for instance, $97.57\%$ of the observations for $k=0$ and $92.70\%$ of the observations for $k=1$), which represents a case of severe class imbalance. Accuracy is not adapted in this case because a model which always predicts the majority class will be deemed a *good* model in the sense of accuracy, while completely missing the classes of interest. We therefore need to use other metrics that are adapted to this situation.
+
+An important tool for designing performance metrics of classification methods is the *confusion matrix* $C$, a $K \times K$ matrix for a $K$-class classification problem such that the element $c_{i,j}$ represents the number of observations from the class $i$ that have been predicted in the class $j$. This matrix summarizes the performance of a classification model by showing how many observations have been correctly or incorrectly classified in each class. In a binary classification problem, the matrix reports the number of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN). Two of the metrics that can be calculated from this matrix are known as *sensitivity* and *specificity*. The sensitivity Se = TP / (TP + FN) measures the proportion of accurate predictions for the positive class among all actual positive instances. Hence, a low sensitivity indicates that the model misses many instances of the positive class.
Conversely, the specificity Sp = TN / (TN + FP) measures the proportion of accurate predictions for the negative class among all actual negative instances. In scenarios with class imbalance, sensitivity is particularly crucial as it reveals how effectively the model identifies instances of the minority class, which is often designated as the positive class.
+
+In a multiclass classification problem, we can extend these metrics to handle more than one positive class. For detecting gait events of interest, we have four classes of interest which correspond to the minority classes, compared to time points at which no relevant gait event occurs. We therefore define a $5 \times 5$ confusion matrix as exhibited in @fig-confusion-matrix. The element $c_{i,j}$ is the number of observations from the class $i$ that have been predicted in the class $j$. Therefore, $c_{i,\cdot}$ represents the number of observations from the class $i$, while $c_{\cdot,j}$ represents the number of observations that the model predicted to be in the class $j$.
+
+![Confusion matrix for five classes including the *None* one represented by the letter N and four classes of interest named P1 to P4.](images/confusion-matrix.png){#fig-confusion-matrix width=300 pos="H"}
+
+In this work, we use generalizations of sensitivity and specificity coined macro-average sensitivity and macro-average specificity [@mortaz2020]. For a $K$-class classification problem where the $K$-*th* class is not of interest, these metrics read:
+
+$$
+\mathrm{maSe} = \frac{1}{K-1} \sum_{i=1}^{K-1} \frac{c_{i,i}}{c_{i,\cdot}}, \quad \mathrm{maSp} = \frac{1}{K} \sum_{i=1}^K \frac{\sum_{j \ne i} \sum_{k \ne i} c_{j,k}}{\sum_{j \ne i} c_{j,\cdot}}.
+$$ {#eq-ma-metrics}
+
+Tuning a model is achieved by optimizing a single metric. We therefore need to combine the information from the MA Sensitivity and MA Specificity into a single metric.
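+
+A minimal sketch of these macro-averaged metrics, assuming a confusion matrix `C` whose rows are the true classes and whose last class is *None*:
+
+```{r}
+#| eval: false
+ma_se <- function(C) {
+  K <- nrow(C)
+  # Average the per-class sensitivities over the K - 1 classes of interest
+  mean(diag(C)[-K] / rowSums(C)[-K])
+}
+ma_sp <- function(C) {
+  K <- nrow(C)
+  # Per-class specificity: true negatives over all actual negatives
+  mean(sapply(seq_len(K), function(i) sum(C[-i, -i]) / sum(C[-i, ])))
+}
+```
+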
The weighted Youden index [@li2013weighted] was proposed for binary classification problems as a convex combination of sensitivity and specificity. We generalize it to multiclass classification problems by plugging in the macro-average versions of sensitivity and specificity. The resulting generalized weighted Youden index (GWYI) is defined as:
+
+$$
+\mathrm{GWYI} = 2(w \times \mathrm{maSe} + (1-w) \times \mathrm{maSp}) - 1,
+$$ {#eq-youden}
+
+with $w \in [0,1]$ representing the importance given to the accurate prediction of the minority classes. In our application, we chose to put a weight of $0.7$ on the macro-average sensitivity to favor a good proportion of correct predictions for the minority classes, at the cost of over-predicting those classes.
+
+## Data splitting and tuning strategy
+
+To correctly tune and evaluate the models, data is initially split into two sets: the *training set*, which contains 80% of the observations, and the *test set*, which contains the remaining 20%. The critical point is to never use the test set during training or tuning, so that the trained and tuned model can be evaluated on data it has never seen. We chose to keep entire walking sessions in each set so that the model would never see any observations coming from sessions in the test set during training. With this strategy, we included 139 sessions in the training set and 35 sessions in the test set.
+
+For some models such as neural networks, the training set is further divided into a train and a validation set, containing 80% and 20% of the training set respectively. The model computes the predictions on the train set and the loss is evaluated on the validation set to monitor the training process and avoid overfitting.
+
+For tuning, the training set is divided into five *folds*, i.e. a random partition of the training set into five equally-sized subsets.
In each fold, data is then split into a train and an evaluation set, containing 90% and 10% of the data respectively (see @fig-datasplitting). The strategy of $k$-fold cross-validation for tuning models is particularly effective because it provides $k$ estimates of the performance metric on which models are tuned, which reduces the variance of the estimate compared to a single train/evaluation split.
+
+![A schematic view of the data splitting process.](images/data-splitting.png){#fig-datasplitting width=600}
+
+To tune the hyper-parameters, we used automated grid search, i.e. an exhaustive search over the hyper-parameter space. For each hyper-parameter, we defined a grid of plausible values. Then, for each combination of such values, we evaluated the performance metric of interest (see @sec-performance-metrics) and chose the best possible combination of hyper-parameters as the one maximizing the metric. For each combination of hyper-parameters, the performance metric was computed by averaging the metric over the five folds of cross-validation.
\ No newline at end of file
diff --git a/_04_Results.qmd b/_04_Results.qmd
new file mode 100644
index 0000000..a54cb9e
--- /dev/null
+++ b/_04_Results.qmd
@@ -0,0 +1,314 @@
+# Results {#sec-results}
+
+In this section, we compare both the two classification strategies defined in @sec-classification-strategies and the models described in @sec-supervised-classification-models, in order to select the final model best able to detect gait events from the raw sensor data.
+
+## Tuning of hyper-parameters
+
+Since we used two different performance metrics to evaluate the two strategies (the GWYI for the strategy E and accuracy for the strategy P, see @sec-performance-metrics), we first selected one best model per strategy before comparing the two strategies to choose a final model.
+
+@tbl-grid-fs summarizes the hyper-parameter values that we tested for the feature space tuning.
Recall that we used two hyper-parameters for the feature space construction: the smoothing curve parameter *spar* and the number of past time points kept, *n_lag*. These two hyper-parameters are used in both strategies. Additionally, for the strategy E, we also used the hyper-parameter *k*, which labels the time points surrounding each occurrence of a relevant gait event identified by the mat as possible occurrences of that same event (see @sec-feature-space).
+
+::: {#tbl-grid-fs tbl-pos="H"}
+
+```{r}
+tibble::tibble(
+  param = c("spar", "n_lag", "k"),
+  description = c(
+    "Smoothing curve parameter",
+    "Kept times in the past",
+    "Number of labeled points"
+  ),
+  strategy = c("Both", "Both", "Strategy E"),
+  values = c("0.3, 0.4, 0.5, 0.6, 0.7", "1, 2, 3, 4, 5", "1, 2, 3")
+) |>
+  gt::gt() |>
+  gt::cols_label(
+    param = "Parameter",
+    strategy = "Strategy",
+    description = "Description",
+    values = "Tuning Grid"
+  ) |>
+  gt::cols_align(
+    align = "left",
+    columns = values
+  ) |>
+  gt::tab_style(
+    style = list(gt::cell_text(style = "italic")),
+    locations = gt::cells_body(columns = param)
+  ) |>
+  gt::tab_options(column_labels.background.color = "#616161")
+```
+
+Chosen grids for the hyper-parameters used for feature space construction.
+
+:::
+
+Next, for classification strategy E, we also tuned the class weights used to address the class imbalance issue. To this end, we fixed the weight of the class *None* to $1$, and tested three different values for the classes corresponding to gait events of interest. These values are reported in @tbl-grid-class-weights. For classification strategy P, the class imbalance is less pronounced. Hence, as mentioned in @sec-class-imbalance, we used the inverse class frequency method which computes weights according to @eq-inv-class-frequency.
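+
+As a small numerical illustration with hypothetical phase counts, the weights of @eq-inv-class-frequency can be computed as:
+
+```{r}
+#| eval: false
+# Hypothetical phase counts; shorter phases get larger weights
+n_k <- c(pre_stance = 400, stance = 1600, pre_swing = 400, swing = 1600)
+w_k <- sum(n_k) / (length(n_k) * n_k)
+w_k
+```
+
+With these counts, the two shorter phases receive a weight of $2.5$ and the two longer ones $0.625$.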
+
+::: {#tbl-grid-class-weights tbl-pos="H"}
+
+```{r}
+is_empty <- function(x) {
+  return(x == "")
+}
+
+options(gt.html_tag_check = FALSE)
+tibble::tibble(
+  param = c("Weight on class *None*", "Weight on event classes"),
+  values = c("1", "40, 60, 80")
+) |>
+  gt::gt() |>
+  gt::cols_label(
+    param = "Parameter",
+    values = "Tuning Grid"
+  ) |>
+  gt::tab_options(column_labels.background.color = "#616161") |>
+  gt::fmt_markdown(columns = param)
+```
+
+Chosen grids for the weights of the classes of interest (occurrences of gait events) in Strategy E.
+
+:::
+
+Finally, we built tuning grids for the model-specific hyper-parameters, which are common to both strategies. @tbl-grid-model summarizes the grids used for tuning the models.
+
+::: {#tbl-grid-model tbl-pos="H"}
+
+```{r}
+tibble::tibble(
+  model = c(
+    "Multinomial Regression",
+    "Multinomial Regression",
+    "Decision Tree",
+    "Decision Tree",
+    "Bagged Trees",
+    "Bagged Trees",
+    "Random Forest",
+    "Random Forest",
+    "Boosted Trees",
+    "Boosted Trees",
+    "Boosted Trees",
+    "Boosted Trees",
+    "KNN",
+    "KNN",
+    "Neural Network",
+    "Neural Network",
+    "Neural Network"
+  ),
+  param = c(
+    "`penalty`",
+    "`mixture`",
+    "`tree_depth`",
+    "`min_n`",
+    "`tree_depth`",
+    "`min_n`",
+    "`trees`",
+    "`min_n`",
+    "`trees`",
+    "`min_n`",
+    "`tree_depth`",
+    "`learn_rate`",
+    "`neighbors`",
+    "`dist_power`",
+    "`hidden_units`",
+    "`learn_rate`",
+    "`dropout`"
+  ),
+  description = c(
+    "Regularization parameter",
+    "Regularization type (Lasso, Ridge, Elastic Net)",
+    "Tree depth",
+    "Minimal number of observations before splitting",
+    "Tree depth",
+    "Minimal number of observations before splitting",
+    "Number of trees",
+    "Minimal number of observations before splitting",
+    "Number of trees",
+    "Minimal number of observations before splitting",
+    "Tree depth",
+    "Learning rate",
+    "Number of nearest neighbors",
+    "Minkowski distance order",
+    "Hidden layers and units",
+    "Learning rate",
+    "Rate of randomly deactivated 
nodes" + ), + values = c( + "0.01, 0.001, 0.0001", + "0, 0.25, 0.5, 0.75, 1", + "1:15", + "2:40", + "1:15", + "2:40", + "100, 300, 500, 700, 900", + "2:40", + "400, 600, 800, 1000", + "20, 40", + "5, 8, 12", + "0.01", + "3, 5, 10, 20", + "1, 2", + "hidden layers: 2:5; nodes: 100, 500", + "0.01, 0.001", + "0.1, 0.3" + ) +) |> + dplyr::group_by(model) |> + dplyr::mutate(model = ifelse(dplyr::row_number() == 1, model, "")) |> + dplyr::ungroup() |> + gt::gt() |> + gt::tab_style( + style = list( + gt::cell_borders( + sides = c("top", "bottom"), + weight = gt::px(0) + ) + ), + locations = list( + gt::cells_body( + columns = model, + rows = is_empty(model) + ) + ) + ) |> + gt::cols_label( + model = "Model", + param = "Parameter", + description = "Description", + values = "Tuning Grid" + ) |> + gt::cols_width( + model ~ "25%", + param ~ "17%", + description ~ "38%", + values ~ "25%" + ) |> + gt::tab_options(column_labels.background.color = "#616161") |> + gt::fmt_markdown(columns = param) +``` + +Chosen grids for model-specific hyper-parameters. + +::: + +For each strategy and each model, we get the optimal parameters after tuning (see @tbl-tuning-res-events and @tbl-tuning-res-phases in the Appendix for the list of optimal parameters for each model). We then evaluate each model on the test set. @tbl-res gives the performance metrics computed on the test set for both strategies. 
+
+::: {#tbl-res tbl-pos="H"}
+
+```{r}
+data.frame(
+  model = c(
+    "Multinomial Regression",
+    "Decision Tree",
+    "Bagged Trees",
+    "Random Forest",
+    "Boosted Trees",
+    "KNN",
+    "Neural Network"
+  ),
+  youden = c("67.6%", "60.0%", "70.3%", "70.5%", "83.0%", "24.0%", "88.7%"),
+  accuracy = c("82.7%", "73.9%", "74.9%", "89.4%", "89.7%", "88.0%", "89.4%")
+) |>
+  gt::gt() |>
+  gt::cols_label(
+    model = "Model",
+    youden = "Strategy E: GWYI",
+    accuracy = "Strategy P: accuracy"
+  ) |>
+  gt::tab_options(column_labels.background.color = "#616161") |>
+  gt::tab_style(
+    style = gt::cell_fill(color = 'lightgreen'),
+    locations = gt::cells_body(columns = c(youden), rows = 7)
+  ) |>
+  gt::tab_style(
+    style = gt::cell_fill(color = 'lightgreen'),
+    locations = gt::cells_body(columns = c(accuracy), rows = 5)
+  ) |>
+  gt::cols_width(
+    everything() ~ gt::px(200)
+  )
+```
+
+Performance metrics on the test set for both strategies.
+
+:::
+
+For each strategy, we kept the model maximizing the performance metric evaluated on the test set. For the strategy E, we selected the neural network model, which maximizes the GWYI by a clear margin over all other models. For the strategy P, four models have an accuracy above 88% with very similar scores. One would be tempted to choose the neural network to have the same type of model for both strategies. However, after testing all four models on the test set by comparing them with the GAITRite mat data (comparing the number of events detected and the predicted event times), the boosted trees model (which maximizes the accuracy) was the only one not to miss any event. Hence, we selected the boosted trees model.
+
+Now that we have selected a model for each strategy, the next step is to compare the two strategies and choose the one best able to detect gait events.
+
+## Predicting phases or events?
+
+We now have two models to compare: the Neural Network from the strategy E, directly predicting the occurrences of the gait events of interest, and the Boosted Trees from the strategy P, predicting the gait phases instead. To choose one over the other, we compare their predictions on the test set with the real events given by the gold standard. The latter is provided by the GAITRite© mat that the subjects walked on during the data collection sessions.
+
+Like any measurement tool, this mat is not perfect and can miss some contact points at the start and the end of the walking session. Hence, we need to preprocess the time series before comparing the predicted events with the real ones. Specifically, we start by shortening the time series to keep only the time window where the gold standard detected all four events perfectly. At the start of the session, we search for the first *Heel Strike* event which is followed by the three other events in the correct order. At the end of the session, we remove all points after the first missing event.
+
+Next, no matter the classification strategy (phases vs. events), we need to extract estimated event times from the model predictions to compare them with the true occurrences of gait events of interest given by the gold standard. Indeed, the model does not detect a single point matching the event of interest but rather a window of points containing the event, as shown by @fig-preds-both-strategies. In the strategy P (phase predictions), the event corresponding to the beginning of a gait phase should in principle be the initial time point of that phase. But we observed that this is not the case with our model predictions. In the strategy E (event predictions), we could reasonably think instead that the exact occurrence of the event of interest lies at the center of the predicted time window.
+
+::: {#fig-preds-both-strategies layout="[45, -10, 45]" fig-pos="H"}
+
+![Strategy E: Predictions made by the neural network.](images/preds_events_and_real_events.png){#fig-preds-events width=270}
+
+![Strategy P: Predictions made by the boosted trees.](images/preds_phases_and_real_events.png){#fig-preds-phases width=270}
+
+Predictions from both classification strategies. The larger, darker points represent the true occurrences of gait events of interest as provided by the GAITRite© mat (gold standard).
+:::
+
+In any case, to provide a fair assessment of both models' performance, we devised a common procedure to get a single estimated event time from the predicted time window (Strategy E) or phase (Strategy P). Specifically, we first make sure to correctly identify predicted time windows or phases, using the two following rules:
+
+1. A threshold to select only the observations with a high probability of class membership: $p > 0.4$.
+2. A minimal time gap between two predicted windows or phases: $t = 40$ ms.
+
+These two rules, applied sequentially, allow us to define unique non-overlapping predicted windows or phases. They also allow windows to contain points that are not necessarily consecutive, as we require at least $40$ ms between observations to consider them in distinct windows. Then, in each of these clearly separated windows or phases, we define the estimated occurrence of the gait event of interest as the point that has the highest probability of class membership according to the prediction model.
+
+The last step is to compare these estimated events with the ones given by the gold standard to decide which of the two strategies leads to the best results and choose a definitive model. The first and simplest criterion that comes to mind is the number of each event detected during the walking sessions. For both strategies, the model predicts the exact number of events for every class in each session.
This means that both models seem to perform well, but we still cannot tell them apart. The second criterion that can be compared is the time difference between the estimated and true occurrences of gait events of interest. To that effect, @fig-boxplot-comparison shows the distributions of such time differences on the test set for each gait event of interest and for each strategy.
+
+![Mean time difference between true and predicted occurrences of gait events of interest for both strategies.](images/boxplot_both_strategies.png){#fig-boxplot-comparison width=600}
+
+In the strategy P, the *Heel-Strike* events seem to be well estimated, but this is not the case for the *Toe-Off* events, which are predicted too late, about 150 ms after the actual time of the event. This could mean that, for these events, better results would be obtained by taking one of the first points of the predicted phase instead of the point with the highest class membership probability. However, we do not have a sound theoretical explanation of this phenomenon. Hence, we cannot currently justify using a different procedure for estimating the occurrence of heel strike events compared to toe-off events. On the other hand, in the strategy E, all of the event times are estimated close to the actual ones regardless of the type of event. The median estimated time is approximately 50 ms before the actual event, with a few outliers. Overall, @fig-boxplot-comparison suggests that the model associated with the strategy E is better at estimating the occurrences of gait events of interest.
+
+## Detailed performance of the selected model
+
+The final selected model is the neural network from the strategy E, directly predicting the event times. In this section, we give a more in-depth evaluation of this model. @tbl-final-eval summarizes the values of the three performance metrics adapted to this strategy: the macro-average sensitivity and specificity and the generalized weighted Youden index.
+
+::: {#tbl-final-eval tbl-pos="H"}
+
+```{r}
+tibble::tibble(
+  youden = c("88.7%"),
+  sen = c("93.4%"),
+  spe = c("96.4%")
+) |>
+  gt::gt() |>
+  gt::cols_label(
+    youden = "Generalized Weighted Youden Index",
+    sen = "MA Sensitivity",
+    spe = "MA Specificity"
+  ) |>
+  gt::tab_options(column_labels.background.color = "#616161")
+```
+
+Performance metrics of the selected Neural Network on the test set.
+
+:::
+
+
+For a thorough comparison of our model with the gold standard, we use state-of-the-art spatio-temporal gait parameters [@hollman2011normative] that can be computed from the four gait events of the walking cycle that we are now able to estimate.
+
+Specifically, we focused on (i) the number of cycles in a session, (ii) the cycle duration and (iii) the percentage of the stance phase. We computed the mean value of these parameters per foot on each session from the test set (hence 70 observations: one per session and per foot in the test set).
+
+For the number of cycles in each session, we found the exact same number of cycles using our model and the gold standard, which is not surprising since we previously showed that the model detected the correct number of each event in every session. For the mean cycle duration and the mean percentage of stance phase, @fig-bland-altman presents the results in the form of Bland-Altman plots [@bland_altman].
+
+::: {#fig-bland-altman layout="[48, -4, 48]" fig-pos="H"}
+
+![Comparison of the mean cycle duration (seconds).](images/ba_cycle_length.png){#fig-ba-cycle-length width=270}
+
+![Comparison of the mean stance percentage (%).](images/ba_stance_percent.png){#fig-ba-stance-percent width=270}
+
+Bland-Altman plots to compare our model with the gold standard on classical spatio-temporal gait parameters, for both feet on the test set.
+:::
+
+For both parameters, the bias represented by the dashed line in the blue interval is very close to zero, meaning that our model gives similar results to the gold standard.
More precisely, for the mean cycle duration, the differences between the two methods are between $-26$ ms and $+21$ ms, as shown by the dashed lines in the red and green intervals. Since a gait cycle typically lasts one second, this difference interval seems small enough to consider our method comparable to the gold standard. For the mean stance percentage, the differences are between $-8.5\%$ and $+6.5\%$, while the stance phase is known to occupy about $60\%$ of the gait cycle [@LARIBI202083].

Hence, once again, our model gives consistent results that are close to the parameters given by the reference method.
\ No newline at end of file
diff --git a/_05_Discussion.qmd b/_05_Discussion.qmd
new file mode 100644
index 0000000..8bf586e
--- /dev/null
+++ b/_05_Discussion.qmd
@@ -0,0 +1,9 @@
# Discussion and conclusions {#sec-discussion}

In this study, we used rotation data in the form of unit quaternion time series to analyze the hip rotation of subjects walking in a straight line. We used the GAITRite© sensor mat as the reference method to label data acquired with a wearable sensor placed at the right hip to measure rotations over time. The goal was to detect key gait events (*Heel Strike* and *Toe Off* events) in the signal in order to compute from them spatio-temporal parameters giving results similar to the gold standard.

We compared two strategies to detect events during gait cycles: directly predicting the gait events, or predicting the sub-phases between these events. We selected a model for each strategy, a neural network for the first one and a boosted trees model for the second one, by tuning and evaluating various classification models on different metrics: the Generalized Weighted Youden Index for event detection and accuracy for sub-phase detection.
By comparing the two models with the reference events given by the gold standard GAITRite©, we chose to keep the strategy that directly detects the events, as it showed more consistent results. The selected model is a neural network with four hidden layers of 500 nodes each, with higher class weights on the minority classes of interest to address the issue of class imbalance. This model performs well on the test set, as we obtain results similar to those provided by the reference method for the number of gait cycles, their duration, and the percentage of stance phase in each gait cycle.

Since this model was trained on healthy subjects, we expect lower performance on data from patients with gait disorders, as their gait is less regular. Indeed, the model was tested on a data set containing gait data from 44 multiple sclerosis patients, each walking approximately seven meters while wearing the IMU. As we do not have reference event times for this data set, the evaluation of the model's predictions was visual. For each time series, we plotted the predictions and rated them in three classes: perfect performance (the model detects all four events in the correct order for all the sessions), moderate performance (the model misses a few events but correctly segments the signal into cycles) and poor performance (the model misses most events). Among the 44 patients, we counted 20 perfect performances, 19 moderate performances and 5 poor performances; hence the model works perfectly on 45% of the data set and poorly on only 11% of it. We also note that some of the recorded signals seem to have anomalies such as abnormal spikes, aside from the fact that pathological gaits are not necessarily regular. Therefore, the results are promising.

To improve the model so that it generalizes better to various subjects, especially those with gait disorders, we could acquire data on more subjects in various age groups.
We could also use transfer learning to go from this model, trained on healthy subjects, to a model specialized for subjects with gait disorders. Given the particular nature of the data, we could perform a rotation correction or use rotation-invariant features to address the issue of the reference frame differing between subjects. Finally, we could encode in the model the order in which the events should occur (*Right Heel Strike*, then *Left Toe Off*, then *Left Heel Strike*, then *Right Toe Off*, etc.), as this could prevent the model from predicting erroneous events.
\ No newline at end of file
diff --git a/_Appendix.qmd b/_Appendix.qmd
new file mode 100644
index 0000000..4016d4a
--- /dev/null
+++ b/_Appendix.qmd
@@ -0,0 +1,314 @@
# Appendix {.unnumbered}

::: {#tbl-gaitrite-params tbl-pos="H"}

```{r}
#| layout: [45, -10, 45]

tibble::tibble(
  param = c(
    "Total distance (cm)",
    "Total duration (s)",
    "Speed (cm/s)",
    "Normalized speed",
    "Number of steps",
    "Walking pace (step/m)",
    "Difference of step duration (s)",
    "Difference of step length (cm)",
    "Difference of cycle duration (s)"
  )
) |>
  gt::gt() |>
  gt::cols_label(
    param = "Global Parameters"
  ) |>
  gt::tab_options(column_labels.background.color = "#616161")

tibble::tibble(
  param = c(
    "Step duration (s)",
    "Cycle duration (s)",
    "Step length (cm)",
    "Cycle length (cm)",
    "Step width (cm)",
    "Simple support phase (%)",
    "Double support phase (%)",
    "Stance phase (%)",
    "Swing phase (%)",
    "Normalized step",
    "Foot rotation (°)"
  )
) |>
  gt::gt() |>
  gt::cols_label(
    param = "Bilateral Parameters"
  ) |>
  gt::tab_options(column_labels.background.color = "#616161")
```

Spatio-temporal parameters returned by the GAITRite© pressure mat.
:::


::: {#tbl-tuning-res-events tbl-pos="H"}

```{r}
tibble::tibble(
  model = c(
    rep("Multinomial Regression", 6),
    rep("Decision Tree", 6),
    rep("Bagged Trees", 6),
    rep("Random Forest", 6),
    rep("Boosted Trees", 8),
    rep("KNN", 5),
    rep("Neural Network", 7)
  ),
  param = c(
    "spar",
    "n_lag",
    "k",
    "weight on positive classes",
    "`penalty`",
    "`mixture`",
    "spar",
    "n_lag",
    "k",
    "weight on positive classes",
    "`tree_depth`",
    "`min_n`",
    "spar",
    "n_lag",
    "k",
    "weight on positive classes",
    "`tree_depth`",
    "`min_n`",
    "spar",
    "n_lag",
    "k",
    "weight on positive classes",
    "`trees`",
    "`min_n`",
    "spar",
    "n_lag",
    "k",
    "weight on positive classes",
    "`trees`",
    "`min_n`",
    "`tree_depth`",
    "`learn_rate`",
    "spar",
    "n_lag",
    "k",
    "`neighbors`",
    "`dist_power`",
    "spar",
    "n_lag",
    "k",
    "weight on positive classes",
    "`hidden_units`",
    "`learn_rate`",
    "`dropout`"
  ),
  values = c(
    "0.4",
    "5",
    "1",
    "80",
    "0.0001",
    "1",
    "0.3",
    "5",
    "1",
    "80",
    "5",
    "7",
    "0.4",
    "4",
    "1",
    "60",
    "6",
    "32",
    "0.6",
    "4",
    "2",
    "80",
    "900",
    "40",
    "0.4",
    "4",
    "2",
    "80",
    "400",
    "40",
    "5",
    "0.01",
    "0.4",
    "5",
    "5",
    "5",
    "2",
    "0.4",
    "5",
    "1",
    "80",
    "rep(500, 4)",
    "0.01",
    "0.1"
  )
) |>
  dplyr::group_by(model) |>
  dplyr::mutate(model = ifelse(dplyr::row_number() == 1, model, "")) |>
  dplyr::ungroup() |>
  gt::gt() |>
  gt::tab_style(
    style = list(
      gt::cell_borders(
        sides = c("top", "bottom"),
        weight = gt::px(0)
      )
    ),
    locations = list(
      gt::cells_body(
        columns = model,
        rows = model == ""  # rows whose model name was blanked above
      )
    )
  ) |>
  gt::tab_style(
    style = list(gt::cell_text(style = "italic")),
    locations = gt::cells_body(
      columns = param,
      rows = param %in% c("spar", "n_lag", "k")
    )
  ) |>
  gt::cols_label(
    model = "Model",
    param = "Parameter",
    values = "Optimal value"
  ) |>
  gt::tab_options(column_labels.background.color = "#616161") |>
  gt::fmt_markdown(columns = param)
```

Optimal parameters for event detection for each tuned model.

:::

::: {#tbl-tuning-res-phases tbl-pos="H"}

```{r}
tibble::tibble(
  model = c(
    rep("Multinomial Regression", 4),
    rep("Decision Tree", 4),
    rep("Bagged Trees", 4),
    rep("Random Forest", 4),
    rep("Boosted Trees", 6),
    rep("KNN", 4),
    rep("Neural Network", 5)
  ),
  param = c(
    "spar",
    "n_lag",
    "`penalty`",
    "`mixture`",
    "spar",
    "n_lag",
    "`tree_depth`",
    "`min_n`",
    "spar",
    "n_lag",
    "`tree_depth`",
    "`min_n`",
    "spar",
    "n_lag",
    "`trees`",
    "`min_n`",
    "spar",
    "n_lag",
    "`trees`",
    "`min_n`",
    "`tree_depth`",
    "`learn_rate`",
    "spar",
    "n_lag",
    "`neighbors`",
    "`dist_power`",
    "spar",
    "n_lag",
    "`hidden_units`",
    "`learn_rate`",
    "`dropout`"
  ),
  values = c(
    "0.4",
    "5",
    "0.0001",
    "1",
    "0.6",
    "4",
    "6",
    "13",
    "0.4",
    "4",
    "8",
    "40",
    "0.7",
    "5",
    "900",
    "2",
    "0.7",
    "5",
    "900",
    "3",
    "12",
    "0.1",
    "0.7",
    "5",
    "10",
    "1",
    "0.4",
    "4",
    "rep(500, 3)",
    "0.1",
    "0.1"
  )
) |>
  dplyr::group_by(model) |>
  dplyr::mutate(model = ifelse(dplyr::row_number() == 1, model, "")) |>
  dplyr::ungroup() |>
  gt::gt() |>
  gt::tab_style(
    style = list(
      gt::cell_borders(
        sides = c("top", "bottom"),
        weight = gt::px(0)
      )
    ),
    locations = list(
      gt::cells_body(
        columns = model,
        rows = model == ""  # rows whose model name was blanked above
      )
    )
  ) |>
  gt::tab_style(
    style = list(gt::cell_text(style = "italic")),
    locations = gt::cells_body(
      columns = param,
      rows = param %in% c("spar", "n_lag")
    )
  ) |>
  gt::cols_label(
    model = "Model",
    param = "Parameter",
    values = "Optimal value"
  ) |>
  gt::tab_options(column_labels.background.color = "#616161") |>
  gt::fmt_markdown(columns = param)
```

Optimal parameters for phase detection for each tuned model.
+ +::: \ No newline at end of file diff --git a/_quarto.yml b/_quarto.yml index fbaab41..c7a4d5d 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -11,6 +11,10 @@ author: email: lise.bellanger@univ-ubs.fr affiliations: - name: Lab-STICC Laboratory, UMR CNRS 6285, South Brittany University + - name: Thibault Deschamps + email: thibault.deschamps@univ-nantes.fr + affiliations: + - name: Department of Movement, Interactions and Performance, UR 4334, Nantes University, France - name: Aymeric Stamm corresponding: true email: aymeric.stamm@cnrs.fr diff --git a/references.bib b/references.bib index ab3780c..682c779 100644 --- a/references.bib +++ b/references.bib @@ -2,6 +2,17 @@ @Preamble{ ----------------------- Context ------------------------------------------------ +@article{montero2019consensus, + title={Consensus on shared measures of mobility and cognition: from the Canadian Consortium on Neurodegeneration in Aging (CCNA)}, + author={Montero-Odasso, Manuel and Almeida, Quincy J and Bherer, Louis and Burhan, Amer M and Camicioli, Richard and Doyon, Julien and Fraser, Sarah and Muir-Hunter, Susan and Li, Karen ZH and Liu-Ambrose, Teresa and others}, + journal={The Journals of Gerontology: Series A}, + volume={74}, + number={6}, + pages={897--909}, + year={2019}, + publisher={Oxford University Press US} +} + @article{beauchet2016poor, title={Poor gait performance and prediction of dementia: results from a meta-analysis}, author={Beauchet, Olivier and Annweiler, Cédric and Callisaya, Michele L and De Cock, Anne-Marie and Helbostad, Jorunn L and Kressig, Reto W and Srikanth, Velandai and Steinmetz, Jean-Paul and Blumen, Helena M and Verghese, Joe and others}, @@ -359,6 +370,17 @@ @article{Menz2004 author = {Hylton B Menz and Mark D Latt and Anne Tiedemann and Marcella {Mun San Kwan} and Stephen R Lord}, } +@article{de1992stability, + title={Stability of choice reaction time and synchronicity of peak EMG values during bimanual reactions}, + author={De Brabander, Bert and 
Gerits, Pol and Boone, Christophe}, + journal={Perceptual and motor skills}, + volume={75}, + number={1}, + pages={165--166}, + year={1992}, + publisher={Sage Publications Sage CA: Los Angeles, CA} +} + @book{ramsay2005, title={Functional Data Analysis}, author={J. O. Ramsay, B. W. Silverman}, @@ -367,6 +389,27 @@ @book{ramsay2005 doi={https://doi.org/10.1007/b98888} } +@article{moakher02, + title = {Means and averaging in the group of rotations}, + author = {M. Moakher}, + journal = {SIAM Journal on Matrix Analysis and Applications}, + volume = {24}, + number = {1}, + pages = {1--16}, + year = {2002}, + publisher = {Citeseer}, +} + +@inproceedings{manton04, + title={A globally convergent numerical algorithm for computing the centre of mass on compact Lie groups}, + author={Manton, Jonathan H}, + booktitle={Control, Automation, Robotics and Vision Conference, 2004. ICARCV 2004 8th}, + volume={3}, + pages={2211--2216}, + year={2004}, + organization={IEEE} +} + @electronic{narayan2017, author = {Ashwin Narayan}, title = {How to Integrate Quaternions}, @@ -460,6 +503,14 @@ @article{hashemi2018 doi = {10.19139/soic.v6i4.479} } +@inproceedings{huang2016learning, + title={Learning deep representation for imbalanced classification}, + author={Huang, Chen and Li, Yining and Loy, Chen Change and Tang, Xiaoou}, + booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition}, + pages={5375--5384}, + year={2016} +} + @article{mortaz2020, title={Imbalance accuracy metric for model selection in multi-class imbalance classification problems}, author={Mortaz, Ebrahim}, @@ -481,6 +532,30 @@ @article{li2013weighted publisher={LWW} } +@article{hollman2011normative, + title={Normative spatiotemporal gait parameters in older adults}, + author={Hollman, John H and McDade, Eric M and Petersen, Ronald C}, + journal={Gait \& posture}, + volume={34}, + number={1}, + pages={111--118}, + year={2011}, + publisher={Elsevier} +} + +@incollection{LARIBI202083, + title 
= {Chapter 4 - Human lower limb operation tracking via motion capture systems}, + editor = {Marco Ceccarelli and Giuseppe Carbone}, + booktitle = {Design and Operation of Human Locomotion Systems}, + publisher = {Academic Press}, + pages = {83-107}, + year = {2020}, + isbn = {978-0-12-815659-9}, + doi = {https://doi.org/10.1016/B978-0-12-815659-9.00004-4}, + url = {https://www.sciencedirect.com/science/article/pii/B9780128156599000044}, + author = {Med Amine Laribi and Said Zeghloul} +} + @article{bland_altman, title = {Statistical Methods For Assessing Agreemment Between Two Methods Of Clinical Measurement}, journal = {The Lancet}, diff --git a/template-computo-R.qmd b/template-computo-R.qmd index 3035816..55d8cf6 100644 --- a/template-computo-R.qmd +++ b/template-computo-R.qmd @@ -1,855 +1,18 @@ -\newpage \tableofcontents -\newpage -# Introduction +{{< include _01_Introduction.qmd >}} -## Gait analysis +{{< include _02_Material.qmd >}} -Study of the human gait has been shown to be important in various health applications, such as study of general health in elderly populations. It has been shown that gait parameters including walking speed or gait variability are very good indicators of their general health, whether it is their physical health or their cognition [@beauchet2016poor; @annweiler2009risk]. +{{< include _03_Methods.qmd >}} -To monitor gait, various methods such as motion capture systems or pressure mats are widely used due to their reliability but they require patients to travel to a health center owning a device. Furthermore, the measures are made in front of medical workers leading to an observational bias in the patient's performance. To address these issues, wearable sensors are increasingly used for gait analysis as they are smaller, cheaper and more convenient than more traditional methods as they can be used outside of medical structures. 
+{{< include _04_Results.qmd >}} -Gait analysis is based on the cyclical phenomenon of walking as it can be decomposed as a number of strides, each composed of two steps. A stride is more formally called a gait cycle and is a precise biomechanical concept which is the elementary element of gait analysis. We define a gait cycle as the set of movements performed between two consecutive contacts of the heel of the same foot with the ground. The goal of our method is to detect key events happening during gait cycles both to segment the gait as strides and to further study the gait by computing spatio-temporal parameters based on the events. - -More precisely, the key events are the moments where a foot touches or leaves the ground. They are called *Heel Strike* and *Toe Off* respectively and they can be used to segment the gait cycle in phases. The stance phase corresponds to the moments when the foot is in contact with the ground and can be found between the *Right Heel Strike* and *Left Toe Off* events when a cycle starts with the right foot. The stance phase lasts generally 60% of the entire stride and is followed by the swing phase when the right foot is not anymore in contact with the ground (see @fig-walking-cycle). - -![Events and phases of a typical gait cycle. [^jacquelin-perry]](images/walking-cycles.png){#fig-walking-cycle width=400} - -[^jacquelin-perry]: Figure annotated from Jaquelin Perry's one on [Wikipedia](https://commons.wikimedia.org/wiki/File:GaitCycle_by_JaquelinPerry.jpg). - -In healthy gaits, a gait cycle usually lasts one second and the time difference between two events can be as little as 100 ms. The events require a precise detection for an accurate analysis of gait parameters. - - -## State of the art - -In the past, several methods for gait cycle segmentation and event detection have been studied based on signals recorded from wearable sensors placed on different parts of the body. 
We try in this section to give an overview of the proposed methods by grouping them into categories, with variations related to the position of the sensors on the body, the type of sensor measurement system used, and the characteristics of the signal.

Search for feature points: Peak Detection and zero-crossing

: Peak detection algorithms exploit the semi-periodic properties of the gait cycle, assuming that specific events during a gait cycle typically match the maxima or minima of a recorded signal. Several algorithms based on this method have been developed. Spatio-temporal gait parameters have been determined from the acceleration signal of a sensor by using peak detection and zero-crossing methods to identify stride cycles [@zijlstra2003assessment]. Some peak detection algorithms have been developed to work on any kind of signal [@jiang2017robust], while others are specifically designed to detect gait events in a signal, often Heel-Strike events in acceleration data [@lueken2019peak]. Most methods detect Heel-Strike and Toe-Off events by identifying minima, maxima or zero-crossings of acceleration time-series [@gonzalez2010real;@mariani2013quantitative;@panebianco2018analysis]. These events, also called Initial Contact and Terminal Contact, have also been identified on the foot inclination angle in the sagittal plane [@nazarahari2022foot].

Analysis of dynamic properties: Wavelet Transform

: Another method used to detect gait events in recorded signals is the wavelet transform, because it allows the detection of a specified frequency at a specified time. This method has been used to detect *Heel Strike* and *Toe Off* events on an angular velocity signal by using multi-resolution wavelet decomposition [@aminian2002spatio]. The signal is decomposed into wavelet packets, using high-scale and low-scale filters.
In @mccamley2012enhanced, the vertical acceleration signal is smoothed by integrating and then differentiating it using a Gaussian Continuous Wavelet Transform (CWT). Events are found at the minima and maxima of the differentiated signal. Furthermore, a method called sparsity-assisted wavelet denoising (SAWD) has been developed to segment the signal into three events: mid-stance, toe-off and heel-strike [@prateek2019gait]. It uses a combination of linear time-invariant filters, wavelets and sparsity-based methods to extract a coefficient vector of the Discrete Wavelet Transform (DWT), which generates a sparse template of moving segments of gyroscope measurements.

Pattern-matching: Dynamic Time Warping (DTW)

: Another way to segment signals is pattern-matching, for which Dynamic Time Warping is widely used. It allows the identification of patterns of different lengths and matches signals non-linearly, so it is commonly used to evaluate the similarity between time-series. This method has been used on an acceleration signal to identify gait cycles [@ghersi2020gait]. In another study, multi-dimensional subsequence DTW (msDTW) was used to segment strides using information from different axes of an accelerometer and a gyroscope at the same time [@barth2015stride].

Hidden Markov Models

: Hidden Markov Models (HMMs) are machine learning models that can be used to simultaneously segment and classify data. In gait analysis, the hidden states of the HMM are viewed as activity classes or gait phases. Continuous HMMs with Gaussian mixture models are often used to model the outputs. Each cycle can be modeled with a circular left-to-right HMM, included in a more global HMM to classify walking activities [@panahandeh2013continuous]. Other algorithms aim to detect four gait events, namely Heel-Strike, Flat-Foot, Heel-Off and Toe-Off.
They use a four-state left-to-right HMM with observations following multivariate Gaussian distributions [@mannini2011hidden;@garcia2022adaptive]. Hierarchical HMMs have been shown to perform better than peak detection and Dynamic Time Warping algorithms on walking data [@haji2018segmentation].

Other Machine Learning methods

: Many studies on gait analysis use wearable sensors and machine learning. A study showed that the most commonly used methods for this purpose are SVMs and CNNs, largely applied to detect diseases or recognize activities when analyzing gait data [@saboor2020latest]. SVMs divide a set of objects into classes with the widest possible margin at the hyperplane boundaries. Neural networks use interconnected nodes organized in layers to process information, learn patterns through back-propagation, and make predictions using activation functions.

: Studies show good results when using machine learning to classify the gait of patients having trouble walking, for instance patients with Parkinson's disease [@tahir2012parkinson;@wahid2015classification], or to extract gait parameters [@rampp2014inertial;@hannink2016sensor]. These studies mostly use Artificial Neural Networks but do not address the problem of segmenting the signal recorded by the sensors. Similarly, in @farah2019design, a logistic regression tree is used to classify gait phases within a stride but not to directly segment the signal into strides.

: Regarding segmentation, Recurrent Neural Networks have shown good results for identifying Heel-Strike and Toe-Off events from pressure sensors, accelerations and Euler angles [@prado2019gait]. Another study compared unsupervised machine learning (k-means) with supervised methods (SVM and CNN), showing that CNNs performed best at predicting stance and swing phases [@potluri2019machine].
Lastly, in another study, 24 time-series from the accelerometers and gyroscopes of three sensors were used as input to a six-layer CNN to estimate the likelihood of the corresponding input sample given a specific gait event [@gadaleta2019deep]. For instance, the signal can be segmented with the initial contact of the right foot if this gait event is chosen in the model. This method was shown to perform better than a Wavelet Transform algorithm.

We did not find in the literature a method based on hip rotation data represented as unit quaternion time-series, which is the particularity of the data type used in our work. Furthermore, most of the methods presented either segment strides during a walking session or detect events inside already segmented strides. We propose here a method that segments the gait cycles and detects key gait events at the same time with a single model.

The paper is structured as follows. The second section focuses on the data type used in the study, describing the data acquisition, the raw data properties and the creation of the feature space. The third section describes the strategies and methods used to implement the models, taking the data specificity into account, as well as to accurately tune and evaluate them. The fourth section presents the results leading to the choice and evaluation of a final model. We complete our work with a discussion and conclusion on the study. All code was developed in R [@Rlanguage] using the {tidymodels} framework [@tidymodels].

# Material

This section details the data employed in this study. As unit quaternion time-series are used to represent rotations, we start by defining them and linking them to our goal.

## Unit quaternions

Unit quaternions can be used to represent the 3D rotation of an object over time [@hamilton1844;@voight2021quaternion].
This representation has several advantages: it is rather compact, containing only four values, and it avoids the gimbal lock problem present in other representations. Unit quaternions were therefore chosen as the data type returned by the sensor.

Quaternions are four-dimensional vectors denoted as $\mathbf{q} = \left( q_w, q_x, q_y, q_z \right)$, but can also be viewed as hypercomplex numbers of rank 4. Unit quaternions have a norm of 1 and can encode a 3D rotation with a rotation angle $\theta \in [0, 2\pi]$ and a rotation axis $\mathbf{u} = (u_x, u_y, u_z) \in S^2$, where $S^2$ is the 2-sphere, using the following formula:

$$
\mathbf{q} = \left( q_w, q_x, q_y, q_z \right)
= \text{cos}\left(\frac{\theta}{2}\right) + \left(u_x i + u_y j + u_z k \right) \text{sin}\left(\frac{\theta}{2}\right)
$$ {#eq-quaternions}

with:

- $i$, $j$, and $k$ generalizing the imaginary number $i$, as $i^2=j^2=k^2=ijk=-1$;
- $||\mathbf{q}|| = \sqrt{\mathbf{qq}^t} = 1$.

The set of unit quaternions, denoted $\mathbb{H}_u$, possesses interesting properties. The quaternions $\mathbf{q}$ and $-\mathbf{q}$ represent the same rotation. The group is equipped with an identity element $\mathbf{q}^{(0)} = (1,0,0,0)$ which corresponds to the identity rotation, such that $\mathbf{q} \mathbf{q}^{(0)} = \mathbf{q}^{(0)}\mathbf{q} = \mathbf{q}$.

It is possible to use the geodesic distance $d_g$ between two quaternions $\mathbf{q}_1$ and $\mathbf{q}_2$ to define a metric space $(\mathbb{H}_u, d_g)$, with:

$$
d_g (\mathbf{q}_1, \mathbf{q}_2) = ||\text{ln}(\mathbf{q}_1^{-1} \mathbf{q}_2)|| = \text{arccos}\left(2(\mathbf{q}_1 \cdot \mathbf{q}_2)^2 -1\right)
$$ {#eq-dist-geodesique}

with the quaternion logarithm defined as $\text{ln}(\mathbf{q}) = \mathbf{u}\dfrac{\theta}{2}$, since a unit quaternion can be written $\mathbf{q} = \text{exp}\left(\mathbf{u}\dfrac{\theta}{2}\right)$ by @eq-quaternions.
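To make the geodesic distance concrete, here is a minimal R sketch of its arccos form; the function name is ours, and quaternions are represented as plain length-4 numeric vectors.

```r
# Sketch: geodesic distance between two unit quaternions via the arccos form;
# quaternions are length-4 numeric vectors (w, x, y, z).
quat_geodesic <- function(q1, q2) {
  d <- sum(q1 * q2)                      # 4D dot product
  acos(pmin(pmax(2 * d^2 - 1, -1), 1))   # clamp to [-1, 1] against rounding noise
}

q_id  <- c(1, 0, 0, 0)                     # identity rotation
q_90z <- c(cos(pi / 4), 0, 0, sin(pi / 4)) # 90-degree rotation about the z axis

quat_geodesic(q_id, q_90z)    # pi / 2
quat_geodesic(q_90z, -q_90z)  # 0, since q and -q encode the same rotation
```

Note that the distance coincides with the angle of the relative rotation and, thanks to the squared dot product, is invariant to the sign ambiguity of unit quaternions.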
In this application, the sensor orientation is the rotation between the origin reference frame, here the Earth's reference frame $R_f=(f_1, f_2, f_3)$, and its own reference frame $R_s=(s_1, s_2, s_3)$ formed by the accelerometer, gyroscope, and magnetometer (see @fig-sensor-axis).

![Sensor reference frame and axes. [@drouin2023semi]](images/sensor-axis.png){#fig-sensor-axis width=200}

A quaternion time series is a set of unit quaternions on a temporal grid $t_1, \dots, t_p$, denoted $\mathbf{Q}_i = (\mathbf{q}_{1i}, \dots, \mathbf{q}_{pi})$. It represents the consecutive 3D hip rotations over time as four-component vectors.

## Data acquisition

A wearable sensor was used to record the hip rotation. It contains an accelerometer, a gyroscope and a magnetometer. Subjects wore the sensor on their right hip (see @fig-sensor-position). The sampling frequency of the sensor is 100 Hz, so data is acquired every 10 ms. With this device, the data acquired is in the form of unit quaternion time-series, as presented previously.

![Sensor positioned on the right hip.](images/sensor-position.png){#fig-sensor-position width=150}

During acquisitions, subjects walked on the GAITRite© mat, a gold standard in gait analysis [@Menz2004], while wearing the wearable sensor on their right hip. This implies that subjects walked approximately nine meters in a straight line. The GAITRite© device provides various pieces of information thanks to the pressure sensors contained in the mat, such as the times when the subject's feet touch and leave the ground at each step; these times tell us when the gait events actually happened. This data is then used to label the quaternion time-series to train the model. To use the two devices simultaneously, they were started at the same time by the same person using both index fingers, ensuring good synchronization between the devices.
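As a toy illustration of this data layout (our own sketch, not the acquisition code), a recording can be stored as a $p \times 4$ matrix on the 100 Hz time grid, with each row re-normalized to unit norm to guard against numerical noise:

```r
# Toy sketch: a quaternion time series as a p x 4 matrix on the sensor's
# 100 Hz grid; each row is projected back onto the unit 3-sphere.
p <- 500                                     # 5 seconds of recording at 100 Hz
t_grid <- seq(0, by = 0.01, length.out = p)  # one sample every 10 ms

set.seed(1)
raw <- matrix(rnorm(4 * p), ncol = 4)  # fake measurements, not unit-norm
Q <- raw / sqrt(rowSums(raw^2))        # row-wise normalization

all(abs(rowSums(Q^2) - 1) < 1e-12)  # TRUE: every sample is a unit quaternion
```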
Six subjects were included in this study (three men and three women), walking at different speeds in order to obtain a variety of gait data (see @tbl-subjects). In total, we have 174 walking sessions acquired between June and September 2024.

::: {#tbl-subjects tbl-pos="H"}

```{r}
data.frame(
  id = c("MBA", "MBO", "MSI", "MTR", "NNE", "TDE"),
  gender = c("M", "F", "F", "M", "F", "M"),
  age = c("50-60", "20-30", "20-30", "20-30", "20-30", "50-60"),
  slow = c(60, 73, 67, 77, 57, 61),
  intermediate = c(116, 122, 115, NA, 116, NA),
  preferential = c(145, 145, 148, 132, 147, 120),
  fast = c(199, 188, 179, 185, 190, 193)
) |>
  gt::gt() |>
  gt::tab_spanner(
    label = "Walking speed (cm/s)",
    columns = c(slow, intermediate, preferential, fast)
  ) |>
  gt::cols_label(
    id = "ID",
    gender = "Gender",
    age = "Age range (years)",
    slow = "Slow",
    intermediate = "Intermediate",
    preferential = "Preferential",
    fast = "Fast"
  ) |>
  gt::sub_missing() |>
  gt::opt_stylize(style = 6, color = 'gray') |>
  gt::cols_align(align = "center") |>
  gt::tab_style(
    style = "vertical-align:top",
    locations = gt::cells_column_labels()
  )
```

Summary of subjects and walking speeds used to train the model.

:::

## Data presentation

Sensor data

: As mentioned before, the wearable sensor returns unit quaternion time-series representing the rotation of the hip over time, allowing visualization of each coordinate time series (see @fig-timeserie) at each recorded time. The four coordinates come from the definition of quaternions given in @eq-quaternions. It is important to note that in this figure the gait cycles are clearly apparent and consistent over time, as the figure represents a healthy gait; this would not necessarily be the case for subjects with gait disorders.
![Data returned by the wearable sensor as a four-coordinate unit quaternion time series.](images/time-serie.png){#fig-timeserie width=350}

Sensor data preprocessing

: A centering step is applied to the quaternion time-series to center them around a mean. Supposing we have $n$ time-series $\mathbf{Q}_1, \dots, \mathbf{Q}_n$ on the same time grid $t_1, \dots, t_p$, we can write $\mathbf{Q}_i(t_j) = \mathbf{q}_{ij} \in \mathbb{H}_u$, with $i \in [\![ 1, n ]\!]$ and $j \in [\![ 1, p ]\!]$. We use the Fréchet mean associated with the geodesic distance $d_g$ (see @eq-dist-geodesique) to compute the mean of the quaternions $\mathbf{q}_{1j}, \dots, \mathbf{q}_{nj}$ at each time $t_j$:

: $$
\mathbf{q}_j^{(m)} = \mathbf{Q}^{(m)} (t_j) = \underset{\mathbf{q} \in \mathbb{H}_u}{\mathrm{argmin}} \sum_{i=1}^n d_g^2(\mathbf{q}_{ij}, \mathbf{q}), \hspace{5mm} j \in [\![ 1, p ]\!]
$$ {#eq-mean-qts}

: The centered time-series $\mathbf{Q}_1^{(c)}, \dots, \mathbf{Q}_n^{(c)}$ can then be computed:

: $$
\mathbf{q}_{ij}^{(c)} = \mathbf{Q}_i^{(c)} (t_j) = \left( \mathbf{q}_j^{(m)} \right)^{-1} \mathbf{q}_{ij}, \hspace{5mm} j \in [\![ 1, p ]\!],\hspace{2mm} i \in [\![ 1, n ]\!]
$$ {#eq-centring-qts}

: The other pre-processing step is to switch to a functional representation using cubic splines, so as to be able to compute derivatives [@ramsay2005].


Pressure mat data

: The GAITRite© mat measures the position of the feet with its sensors while the subject walks on it. It directly returns spatio-temporal parameters such as stride duration, stride length or walking speed (see @tbl-gaitrite-params in the Appendix for a list of all parameters). It also returns the time of each event happening during a gait cycle, such as the times when a foot touches or leaves the ground. These are the times we use to label our data to predict these events.
Since the two devices were triggered simultaneously, the same time range is used to associate each time on the IMU time-series with a gait event recorded by the mat. We use the pressure mat as a gold standard to label the observations into the different classes and train the model on this labeled data.
-
-
-## Feature space
-
-The feature space is defined by a set of variables that captures the time-varying characteristics of hip rotation. It forms the data used to build machine learning models that learn its patterns to detect the event associated with each time.
-
-Angular velocity and acceleration
-
-: Supposing we can compute first and second derivatives of a quaternion time-series over time, we can compute angular velocity and acceleration [@narayan2017]. The angular velocity $\pmb{\Omega}$ is a vector whose direction is the axis of rotation and whose norm is the rotation speed.
-
-: $$
-\pmb{\Omega} = 2 \mathbf{q}^{-1} \dot{\mathbf{q}} \hspace{3mm} \text{with} \hspace{3mm} \dot{\mathbf{q}} = \frac{d \mathbf{q}}{dt} = \frac{1}{2} \mathbf{q} \hspace{1mm} \pmb{\Omega}
-$$ {#eq-angular-vel}
-
-: Similarly, the angular acceleration $\dot{\pmb{\Omega}}$ is the derivative of the angular velocity.
-
-: $$
-\dot{\pmb{\Omega}} = 2 \left( \mathbf{q}^{-1} \ddot{\mathbf{q}} - (\mathbf{q}^{-1}\dot{\mathbf{q}})^2 \right) \hspace{3mm} \text{with} \hspace{3mm} \ddot{\mathbf{q}} = \frac{d^2 \mathbf{q}}{dt^2} = \frac{1}{2} \left( \dot{\mathbf{q}} \hspace{1mm} \pmb{\Omega} + \mathbf{q} \hspace{1mm} \dot{\pmb{\Omega}} \right)
-$$ {#eq-angular-acc}
-
-Euler angles
-
-: The angles named Roll, Pitch and Yaw represent rotations around the three principal axes. They are computed from the quaternion time-series, using the following formulas to go from the quaternion $\mathbf{q} = (q_w, q_x, q_y, q_z)$ to the angles.
-
-$$
-\begin{bmatrix}
-\text{Roll} \\
-\text{Pitch} \\
-\text{Yaw}
-\end{bmatrix}
-=
-\begin{bmatrix}
-\text{atan2} \left(2(q_w q_x + q_y q_z), 1-2(q_x^2 + q_y^2) \right) \\
-\text{asin} \left(2(q_w q_y - q_x q_z) \right) \\
-\text{atan2} \left(2(q_w q_z + q_x q_y), 1-2(q_y^2 + q_z^2) \right)
-\end{bmatrix}
-$$ {#eq-rpy}
-
-with $\text{atan2}(y, x)$ returning the angle $\theta$ (in radians) between the positive $x$-axis and the ray from the origin to the point $(x, y)$ in the Cartesian plane.
-
-
-Walking speed
-
-: One of our hypotheses is that gait can differ depending on whether the subject walks slowly or fast. This variable, obtained from the GAITRite© mat output, was therefore also added to the feature space.
-
-Feature Space Hyper-parameters
-
-: The feature space depends on a number of hyper-parameters. The first one is a smoothness parameter called *spar* for the time-series curves. Since differentiation is used to compute some predictors, it is useful to smooth the curves so that the values are stable and not dominated by noise. The *spar* parameter takes its values in the interval $]0,1]$ and is used to fit a cubic smoothing spline to the data. The second hyper-parameter is the *lag* parameter. To keep information from the past when predicting at a certain time, we add to the observation at $t_j$ the predictor values from the times $t_{j-1}, \dots, t_{j-lag}$. The feature space size depends on the *lag* parameter, as it contains $10 + 9 \times lag$ variables. Finally, a last hyper-parameter is used when we label the observations with the event times from the gold standard. It is called *k* and controls the number of points that we label as part of the event. Since it is used in only one of the two compared strategies, it is explained in the next section where we describe the two strategies.
-
-
-# Methods
-
-## Classification strategies
-
-Gait event detection is performed by evaluating and comparing two strategies to classify the observations.
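Before detailing the strategies, the lagged predictor construction from the feature-space section can be sketched in R. This is a minimal illustration, not the actual implementation: the function name `add_lags` and the column layout are ours, and we assume the time-varying predictors are stored column-wise on a regular time grid.

```r
# Sketch of the lag construction (illustrative, not the study's code):
# each time point t_j gets the predictor values from the `lag` previous
# time points appended as extra columns, padded with NA at the start.
add_lags <- function(X, lag) {
  stopifnot(lag >= 1)
  lagged <- lapply(seq_len(lag), function(l) {
    # shift the predictor matrix down by l rows, padding the top with NA
    Xl <- rbind(
      matrix(NA, nrow = l, ncol = ncol(X)),
      X[seq_len(nrow(X) - l), , drop = FALSE]
    )
    colnames(Xl) <- paste0(colnames(X), "_lag", l)
    Xl
  })
  do.call(cbind, c(list(X), lagged))
}
```

With nine time-varying predictors lagged this way (walking speed, being constant within a session, presumably excluded from lagging), one recovers the $10 + 9 \times lag$ variable count given above.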
-
-Strategy E: Predicting gait Events
-
-: The strategy E is to directly predict the gait events happening during the walk. We classify the observations into five classes depending on whether the time corresponds to an event or not:
-
-: - *Right Heel Strike*
-- *Left Toe Off*
-- *Left Heel Strike*
-- *Right Toe Off*
-- *None* (all other times not corresponding to an event)
-
-: This strategy is the most direct one, as we obtain the event times from the gold standard. However, this classification raises an issue of severe class imbalance, with the *None* class widely over-represented. Indeed, the class gathering all the times that do not belong to an event is much larger than the four other ones. This is clearly visible in @fig-event-timeserie, where the event times from the GAITRite© mat are overlaid on the time-series recorded by the IMU. Each colored point represents a different event and all other times belong to the *None* class.
-
-: ![Time of events represented on a unit quaternion time-series.](images/events_on_time_serie_MSI_N_R2.png){#fig-event-timeserie width=350}
-
-: When predicting events, we choose to label a number of points $k$ around the precise event time given by the gold standard as corresponding to the event. This accounts for some uncertainty, as the time step between two points is only 10 ms. For instance, if we have $k=1$, we label 3 observations as part of the event rather than just one. In this case we consider that the event happens in a window of 20 ms instead of at a single precise time. This strategy also somewhat reduces the imbalance between the class proportions.
-
-\newpage
-
-Strategy P: Predicting sub-Phases
-
-: The strategy P consists of predicting sub-phases in the signal that are delimited by the gait events.
We label each part of the signal between two events as a sub-phase, leading to four classes for our observations:
-
-: - *Pre-Stance* phase: between *Right Heel Strike* and *Left Toe Off*
-- *Stance* phase: between *Left Toe Off* and *Left Heel Strike*
-- *Pre-Swing* phase: between *Left Heel Strike* and *Right Toe Off*
-- *Swing* phase: between *Right Toe Off* and *Right Heel Strike*
-
-: These class names refer to a typical cycle starting with the right foot: a stance phase while the right foot is on the ground, followed by a swing phase. This classification gives us four classes that are less imbalanced, without a class that does not contain events like the *None* class of the first strategy. The downside is that we lose some precision on the event times and need to recover the events from the predicted phases after the model predictions.
-
-: ![Example of phases represented on a unit quaternion time-series.](images/phases_on_time_serie_MSI_N_R2.png){#fig-phases-timeserie width=350 fig-pos="H"}
-
-: The sub-phases are shown on the IMU time-series in @fig-phases-timeserie, where each sub-phase is represented by a different color. Some sub-phases are longer than others: in this example, the *Pre-Stance* and *Pre-Swing* sub-phases are clearly shorter than the *Stance* and *Swing* ones.
-
-
-## Supervised classification models
-
-Now that we have defined our feature space predictors and the different classes used to classify the data, we can build supervised classification models. We define in this part the machine learning models that we compare for both strategies.
-
-Multinomial Regression [@hosmer2013applied]
-
-: A multinomial regression is built by fitting a set of independent binary logistic regressions. We want to classify our observations into $K$ classes. We choose one category to be the baseline and name it $k_0$.
We build $K-1$ logit equations modeling the log-odds of a class $k$ versus the class $k_0$. The logit equation for a class $k$ is:
-
-: $$
-\log\left( \frac{\mathbb{P}(Y = k | X)}{\mathbb{P}(Y = k_0 | X)} \right) = \beta_{0k} + \beta_k^T X
-$${#eq-logit}
-
-: with $Y$ the class of the observation, $X$ the predictors, $\beta_{0k}$ the intercept and $\beta_{1k}, \dots, \beta_{pk}$ the regression coefficients.
-
-: The softmax function is then used to convert the $K-1$ logit scores into probabilities for each class. Finally, the observation is classified in the class with the highest probability. To fit the model, i.e. to find the best values for the coefficients, we maximize the log-likelihood function.
-
-: When we have a large set of variables, we can regularize the estimation by introducing penalty terms in the log-likelihood function [@hastie2015statistical]. The two main types of penalization are the L1 regularization (also called a Lasso model) and the L2 regularization (also called a Ridge model). The Ridge regression prevents overfitting by adding the term $\lambda \sum_{k=1}^K \|\beta_k\|_2^2$ to the loss function, while the Lasso model selects features by adding the term $\lambda \sum_{k=1}^K \|\beta_k\|_1$ to the loss, giving a sparse solution with only some non-zero coordinates. The parameter $\lambda$ is called the regularization parameter.
-
-
-Decision tree [@kuhn2013applied; @breiman2017classification]
-
-: Decision trees consist of a nested sequence of if-then statements on the predictors that classifies the data. Observations are assigned to a class according to their variable values.
-
-
-
-
-: The goal is to classify the observations into smaller and more homogeneous groups, forming rectangular regions in the predictor space. Homogeneity is defined here as how pure the splitting nodes are, meaning that each node contains a higher proportion of one class. We can use the Gini index to measure purity.
In a scenario with $K$ classes, with respective probabilities $p_1, \dots, p_K$, the Gini impurity for a given node is defined as:
-
-: $$
-G(p) = \sum_{k=1}^K p_k \sum_{j \ne k}^K p_j = \sum_{k=1}^K p_k (1-p_k) = 1 - \sum_{k=1}^K (p_k)^2
-$${#eq-gini}
-
-: This index is minimal when one of the probabilities tends toward one, indicating that the node is pure. To build the tree, the model creates splitting nodes by evaluating each splitting point (i.e., values taken by variables to split observations into groups) to find the one minimizing the Gini index, until a stopping criterion is met, such as a maximum tree depth.
-
-Bagged trees, Random forest and Boosted trees [@kuhn2013applied]
-
-: A bagged trees model is an ensemble of $M$ decision trees trained in parallel. Each decision tree is built with a bootstrap sample of the original data, that is, a sample of the same size as the original data obtained by drawing random observations with replacement. Then, each of these trees is used to generate a prediction for a new observation. Finally, the observation is classified in the class that has collected the greatest number of votes from all the trees.
-
-: A random forest is a similar model, also training a number of decision trees in parallel on bootstrap samples of the original data. The difference is that, at each split, a random subset of variables is selected and the best split is searched within this subset. The prediction is made as before, with each tree voting for a class and the majority vote classifying the observation.
-
-: Finally, boosted trees train a number of decision trees sequentially, each tree learning from the previous trees' mistakes by putting a higher weight on previously misclassified observations. The trees are also trained on bootstrapped samples, selecting a random subset of variables at each node.
The final prediction is the weighted sum of the predictions from each tree, with the weights determined by the performance of the trees.
-
-
-$k$-Nearest Neighbors [@cover1967nearest]
-
-: A $k$-Nearest Neighbors model finds the $k$ closest data points to a given input and makes a prediction based on the majority class among these neighbors. This model is non-parametric as it does not make assumptions about the underlying data.
-
-
-
-: To calculate the distance between the input and its neighbors, the Minkowski distance can be used. It is defined as:
-
-: $$
-d(x, y) = \left[ \sum_{i=1}^n |x_i-y_i|^p \right]^{1/p}
-$${#eq-minkowski}
-
-: This distance is a generalization of multiple well-known distances, as we recover the Euclidean distance when $p = 2$ and the Manhattan distance when $p = 1$.
-
-
-Neural networks [@varma2018]
-
-: Neural Networks are deep learning models inspired by the human brain. They consist of interconnected nodes organized in layers which process the data by passing it from one layer to the next. The model contains two essential types of parameters: weights on each connection and biases on each node. The input data is given to the input layer and then passes through the hidden layers. At each layer, a pre-activation is computed with the weights and biases and is transformed into an activation with non-linear functions (usually the *ReLU* function). Finally, the result goes into the output layer, which uses an output function (the *softmax* function for more than two classes) to return the predicted class for each observation. The model learns data patterns by adjusting the parameters during forward and backward propagation for each observation to minimize a loss that represents the difference between real and predicted values. Most often, the loss used is the cross-entropy, which is defined as follows for the observation $m$:
-
-: $$
-\ell(m) = - \sum_{k=1}^K t_k(m) \log(y_k(m))
-$$ {#eq-crossentropy}
-
-: with:
-
-: - $t_k$ the ground truth.
-: - $y_k$ the classification probability.
-
-: These models are complex and rely on an efficient algorithm for the parameter updates and a suitable activation function, as well as the possible use of batch normalization, regularization or dropout to find a model with a complexity adapted to the data. Tuning is performed to find the best hyper-parameters, such as the number of layers and nodes or the learning rate.
-
-## Dealing with imbalanced data
-
-By classifying specific times in a time-series, we work with imbalanced data. Especially in the strategy E, when we predict events, a small proportion of the data is in the target classes and most of the data points are in the negative class. In this situation, the model learns adequate information about the majority class but does not have enough information on the minority classes. This leads to poor predictions on the target classes because the model misses them.
-
-Many sampling algorithms exist to address this issue, such as random oversampling or random undersampling, which add or remove observations from certain classes to create a balanced dataset. More complex methods exist; we can cite the popular SMOTE [@chawla2002] and ADASYN [@haibo2008] algorithms, which create synthetic data by considering the classes of the nearest neighbors of random observations. Although these algorithms are very popular, they did not show good results on our data. We hypothesize that our data is too imbalanced for these methods.
-
-To tell the model to pay more attention to the patterns in the minority classes, an effective way is to put a larger weight on these classes compared to the negative one.
More precisely, the weight given to each observation specifies how much this observation influences the model's estimation. In order for the model to be biased in favor of the observations considered more important, in our case those belonging to the minority classes, the weights of the observations are integrated into the cost function. This makes it possible to regulate the cost of misclassification, in the sense that misclassifying more important observations is more costly, encouraging the model to avoid this situation [@hashemi2018].
-
-When the classes are not too imbalanced, as in our second strategy of phase prediction, we can use the inverse class frequency weights, where the weight on the class $k$ is the following:
-
-$$
-\omega_k = \frac{n}{n_{classes} \hspace{1mm} n_k}
-$$ {#eq-inv-class-frequency}
-
-with:
-
-- $n$ the total number of observations.
-- $n_{classes}$ the number of classes.
-- $n_k$ the number of observations in the class $k$.
-
-This formula gives balanced weights, but we can also choose class weights manually and tune them as one of our model parameters.
-
-
-## Performance metrics
-
-To tune and evaluate the models, we need metrics adapted to the data. Here, depending on the strategy used, we cannot use the same metrics, as the labeled classes are very different.
-
-First of all, for the strategy P predicting sub-phases, we have four classes that are not severely imbalanced and are all equally important. In this case, we can simply use the accuracy metric to tune and evaluate our models. It is the most widely used metric in machine learning and is defined as the number of correct predictions divided by the total number of predictions.
-
-For the strategy E, where we directly predict events, we are in a case of severe class imbalance, so we cannot use accuracy to correctly evaluate the model.
Indeed, since the model will predict almost only the majority class, which is the negative one, the predictions in this class will be good and the accuracy will be high because most of the observations will be well classified. However, this does not account for the observations in the minority classes, which are missed.
-
-We need other metrics adapted to this situation. An important tool is the confusion matrix, which indicates the number of predictions in each class compared to the real classes of the observations.
-
-In a binary classification, the matrix contains the values of the True Positives ($TP$), False Positives ($FP$), True Negatives ($TN$) and False Negatives ($FN$). Two of the metrics that can be calculated from this matrix are known as Sensitivity and Specificity. The Sensitivity measures how well the model has found all occurrences of our positive class, meaning that a low sensitivity implies many missed observations in this class. The Specificity measures how well the model predicts the negative class. In a class imbalance problem, the Sensitivity is essential to know whether the model managed to correctly predict our minority class, represented as the positive class.
-
-In a multiclass classification, we can extend these metrics to handle a different number of positive classes. In our case, we have four classes containing fewer observations that must be correctly predicted, while the remaining class is the majority class.
-
-We define a confusion matrix with multiple positive classes and one negative class (see @fig-confusion-matrix). The element $c_{i,j}$ is the number of observations from the class $i$ that have been predicted in the class $j$. Therefore, $c_{i,.}$ represents the number of observations from the class $i$, while $c_{.,j}$ represents the number of observations that the model predicted to be in the class $j$.
-
-![Confusion matrix for five classes including the *None* one represented by the letter N and four classes of interest named P1 to P4.](images/confusion-matrix.png){#fig-confusion-matrix width=300 pos="H"}
-
-We now define our extensions of Sensitivity and Specificity with these notations, called Macro-Average Sensitivity and Macro-Average Specificity [@mortaz2020]. The formulas are generalized for a total of $K$ classes including one class not containing any relevant information, in our case the *None* class.
-
-$$
-\text{MA Sensitivity} = \frac{1}{K-1} \sum_{i=1}^{K-1} \frac{c_{i,i}}{c_{i,.}}
-$${#eq-masensitivity}
-
-$$
-\text{MA Specificity} = \frac{1}{K} \sum_{i=1}^K \frac{\sum_{j \ne k \ne i} c_{j,k} + \sum_{j \ne i} c_{j,j}}{\sum_{j \ne i} c_{.,j}}
-$${#eq-maspecificity}
-
-These metrics can be used to evaluate our model and check that the classes of interest are correctly predicted.
-
-To tune a model, we need a metric that combines the information from the previously defined MA Sensitivity and MA Specificity. The weighted Youden index [@li2013weighted] serves this purpose, allowing the user to put different weights on the Sensitivity and the Specificity. We generalize the weighted Youden index with the Macro-Average versions of the metrics and name it the Generalized Weighted Youden Index (GWYI):
-
-$$
-\text{GWYI} = 2(w \times \text{MA Sensitivity} + (1-w) \times \text{MA Specificity}) -1
-$${#eq-youden}
-
-with $w \in [0,1]$ representing the weight put on the MA Sensitivity.
-
-The weight parameter controls the importance of each metric: the greater it is, the more weight we put on the MA Sensitivity. In our case, we chose a weight of $0.7$ on the MA Sensitivity because we wanted to make sure not to miss observations in the minority classes, even at the cost of predicting too many of them.
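As a minimal sketch, the three metrics can be computed from a confusion matrix in R as follows. The function name `ma_metrics` is illustrative, and we assume true classes in rows, predictions in columns, and the *None* class stored as the last row and column.

```r
# Sketch of the Macro-Average metrics and the GWYI (illustrative names).
# C: K x K confusion matrix, true classes in rows, predictions in
# columns, *None* class last.
ma_metrics <- function(C, w = 0.7) {
  K <- nrow(C)
  # MA Sensitivity: average recall over the K - 1 classes of interest
  ma_sens <- mean(diag(C)[-K] / rowSums(C)[-K])
  # MA Specificity, following the formula above: correct predictions
  # outside class i over the total predicted outside class i
  ma_spec <- mean(sapply(seq_len(K), function(i) {
    sum(C[-i, -i]) / sum(colSums(C)[-i])
  }))
  c(
    ma_sensitivity = ma_sens,
    ma_specificity = ma_spec,
    gwyi = 2 * (w * ma_sens + (1 - w) * ma_spec) - 1
  )
}
```

With a perfect classifier (a diagonal confusion matrix), all three values equal 1.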
-
-
-## Data splitting and Tuning strategy
-
-To correctly tune and evaluate the model, data is initially split into two sets: the training set, containing 80% of the observations, and the test set, containing the remaining 20%. The key is to never use the test set during training or tuning, so that the model can be evaluated on data never seen before. We chose to keep entire walking sessions in each set so that the model would never see, during training, any observation from the sessions contained in the test set. We had 139 sessions in the training set and 35 sessions in the test set.
-
-In certain models such as neural networks, the training set is further divided into a train and a validation set, containing 80% and 20% of the training set respectively. In this case, the model is fitted on the train set while the loss is computed on the validation set to determine the model parameters.
-
-For tuning, the training set is used to create five folds, which are random partitions of the training set into five equal-sized sub-samples. In each fold, data is then split into a train and an evaluation set, containing 90% and 10% of the data respectively (see @fig-datasplitting). This ensures that the tuning result is relevant, since its outcome is the average of the scores obtained on the five folds.
-
-![Data splitting.](images/data-splitting.png){#fig-datasplitting width=600}
-
-To tune hyper-parameters, automated tuning was performed by grid search, which is an exhaustive search in the hyper-parameter space. For each of our parameters, we set a grid of values to be tested. Every combination of the values is then evaluated to get the best possible combination of hyper-parameter values for a given metric. As mentioned before, we used the accuracy for the phase prediction and the GWYI for the event prediction.
-
-Now that we have defined all methods and tools necessary to build machine learning models on our data, we can present our results.
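The tuning loop described above can be sketched in base R. This is an illustration only: `fit_and_score` is a hypothetical stand-in for fitting a model on a fold's train split and returning the tuning metric on its evaluation split.

```r
# Sketch of exhaustive grid search with five-fold averaging.
# `fit_and_score(params, fold)` is hypothetical: it would train the
# model with hyper-parameters `params` on the fold's train split and
# return the tuning metric (accuracy or GWYI) on its evaluation split.
grid_search <- function(grid, fit_and_score, n_folds = 5) {
  scores <- apply(grid, 1, function(params) {
    # average the metric over the cross-validation folds
    mean(sapply(seq_len(n_folds), function(fold) fit_and_score(params, fold)))
  })
  grid[which.max(scores), , drop = FALSE]  # best combination found
}

# Example grid matching the feature-space tuning values
grid <- expand.grid(spar = c(0.3, 0.4, 0.5, 0.6, 0.7), n_lag = 1:5)
```

Every row of `grid` is evaluated, so the cost grows multiplicatively with each added hyper-parameter, which is why the grids are kept small.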
-
-# Results
-
-Various models are built and evaluated to find the best model for each strategy. Then, both strategies are compared to select the best one, leading to a final model.
-
-## Tuning of hyper-parameters
-
-In order to choose the best model for each strategy, we tuned the hyper-parameters of each model for both strategies. Since we do not use the same metric to tune and evaluate models in each strategy (the GWYI in the strategy E and the accuracy in the strategy P), we cannot directly choose a strategy at this step. Therefore, we select one model per strategy and compare the two strategies later.
-
-We choose a set of values to test for each feature-space hyper-parameter (see @tbl-grid-fs). We recall that the hyper-parameters *spar* and *n_lag* are used in both strategies but the hyper-parameter *k* is used only in the strategy E to label events.
-
-::: {#tbl-grid-fs tbl-pos="H"}
-
-```{r}
-tibble::tibble(
-  param = c("spar", "n_lag", "k"),
-  description = c("Smoothing curve parameter", "Kept times in the past", "Number of labeled points"),
-  strategy = c("Both", "Both", "Strategy E"),
-  values = c("0.3, 0.4, 0.5, 0.6, 0.7", "1, 2, 3, 4, 5", "1, 2, 3")
-) |>
-  gt::gt() |>
-  gt::cols_label(
-    param = "Parameter",
-    strategy = "Strategy",
-    description = "Description",
-    values = "Tuning Grid"
-  ) |>
-  gt::cols_align(
-    align = "left",
-    columns = values
-  ) |>
-  gt::tab_style(
-    style = list(gt::cell_text(style = "italic")),
-    locations = gt::cells_body(columns = param)
-  ) |>
-  gt::tab_options(column_labels.background.color = "#616161")
-```
-
-Feature space parameters tuning grid values.
-
-:::
-
-For the strategy E, we also tune the class weights used to address the class imbalance issue. To do so, we fix the weight of the class *None* to 1, and we test different weights for the classes of interest (see @tbl-grid-class-weights).
For the strategy P, as the class imbalance is less pronounced, we only use the weights from the inverse class frequency method defined in @eq-inv-class-frequency.
-
-::: {#tbl-grid-class-weights tbl-pos="H"}
-
-```{r}
-is_empty <- function(x) {
-  return(x == "")
-}
-
-options(gt.html_tag_check = FALSE)
-tibble::tibble(
-  param = c("Weight on class *None*", "Weight on event classes"),
-  values = c("1", "40, 60, 80")
-) |> gt::gt() |>
-  gt::cols_label(
-    param = "Parameter",
-    values = "Tuning Grid"
-  ) |> gt::tab_options(column_labels.background.color = "#616161") |>
-  gt::fmt_markdown(columns = param)
-```
-
-Strategy E: Class weight tuning grid values.
-
-:::
-
-Finally, we build tuning grid values for all model hyper-parameters, which are common to both strategies (see @tbl-grid-model).
-
-::: {#tbl-grid-model tbl-pos="H"}
-
-```{r}
-tibble::tibble(model = c(
-  "Multinomial Regression", "Multinomial Regression",
-  "Decision Tree", "Decision Tree",
-  "Bagged Trees", "Bagged Trees",
-  "Random Forest", "Random Forest",
-  "Boosted Trees", "Boosted Trees", "Boosted Trees", "Boosted Trees",
-  "KNN", "KNN",
-  "Neural Network", "Neural Network", "Neural Network"
-),
-param = c(
-  "`penalty`", "`mixture`",
-  "`tree_depth`", "`min_n`",
-  "`tree_depth`", "`min_n`",
-  "`trees`", "`min_n`",
-  "`trees`", "`min_n`", "`tree_depth`", "`learn_rate`",
-  "`neighbors`", "`dist_power`",
-  "`hidden_units`", "`learn_rate`", "`dropout`"
-),
-description = c(
-  "Regularization parameter", "Regularization type (Lasso, Ridge, Elastic Net)",
-  "Tree depth", "Minimal number of observations before splitting",
-  "Tree depth", "Minimal number of observations before splitting",
-  "Number of trees", "Minimal number of observations before splitting",
-  "Number of trees", "Minimal number of observations before splitting", "Tree depth", "Learning rate",
-  "Number of nearest neighbors", "Minkowski distance order",
-  "Hidden layers and units", "Learning rate", "Rate of randomly deactivated nodes"
-),
-values = c( - "0.01, 0.001, 0.0001", "0, 0.25, 0.5, 0.75, 1", - "1:15", "2:40", - "1:15", "2:40", - "100, 300, 500, 700, 900", "2:40", - "400, 600, 800, 1000", "20, 40", "5, 8, 12", "0.01", - "3, 5, 10, 20", "1, 2", - "hidden layers: 2:5; nodes: 100, 500", "0.01, 0.001", "0.1, 0.3" -)) |> - dplyr::group_by(model) |> - dplyr::mutate(model = ifelse(dplyr::row_number() == 1, model, "")) |> - dplyr::ungroup() |> - gt::gt() |> - gt::tab_style( - style = list( - gt::cell_borders( - sides = c("top", "bottom"), - weight = gt::px(0) - )), - locations = list( - gt::cells_body( - columns = model, - rows = is_empty(model) - ) - ) - ) |> gt::cols_label( - model = "Model", - param = "Parameter", - description = "Description", - values = "Tuning Grid" - ) |> - gt::cols_width( - model ~ "25%", - param ~ "17%", - description ~ "38%", - values ~ "25%" - ) |> - gt::tab_options(column_labels.background.color = "#616161") |> - gt::fmt_markdown(columns = param) -``` - -Model hyper-parameter tuning grid values. - -::: - -For each strategy and each model, we get the optimal parameters thanks to tuning (see @tbl-tuning-res-events and @tbl-tuning-res-phases in the Appendix for the detail of chosen parameters for each model). We then evaluate each model on the test set which was never seen before to find the best model for each strategy (see @tbl-res). 
-
-::: {#tbl-res tbl-pos="H"}
-
-```{r}
-data.frame(
-  model = c("Multinomial Regression", "Decision Tree", "Bagged Trees", "Random Forest", "Boosted Trees", "KNN", "Neural Network"),
-  youden = c("67.6%", "60.0%", "70.3%", "70.5%", "83.0%", "24.0%", "88.7%"),
-  accuracy = c("82.7%", "73.9%", "74.9%", "89.4%", "89.7%", "88.0%", "89.4%")
-) |>
-  gt::gt() |>
-  gt::cols_label(
-    model = "Model",
-    youden = "Strategy E: GWYI",
-    accuracy = "Strategy P: accuracy"
-  ) |>
-  gt::tab_options(column_labels.background.color = "#616161") |>
-  gt::tab_style(
-    style = gt::cell_fill(color = 'lightgreen'),
-    locations = gt::cells_body(columns = c(youden), rows = 7)
-  ) |>
-  gt::tab_style(
-    style = gt::cell_fill(color = 'lightgreen'),
-    locations = gt::cells_body(columns = c(accuracy), rows = 5)
-  ) |> gt::cols_width(
-    everything() ~ px(200)
-  )
-```
-
-Performance metrics on the test set for both strategies.
-
-:::
-
-For each strategy, we keep the model maximizing the performance metric evaluated on the test set. For the strategy E, we select the Neural Network model, which maximizes the GWYI. For the strategy P, we select the Boosted Trees model, which maximizes accuracy. We can note that for the strategy P, three models (the random forest, the boosted trees and the neural network) had close results, between 89.4% and 89.7%. However, since all models made predictions on the test set very quickly, computing time was not a criterion to choose one model over another, so we simply chose the model giving the highest score.
-
-Now that we have selected a model for each strategy, the next step is to compare the two strategies to choose the best one, leading to a final model.
-
-## Selecting a strategy
-
-To choose a final model, we need to determine the best strategy. To do so, we compare the predictions made on the test set by the two previously selected models with the real events given by the gold standard to see which model gives closer results.
We still use the test set to evaluate the strategies.
-
-First, we shorten the time-series by keeping only the time range where the gold standard detected the four events perfectly. Indeed, at the start and the end of the session, this tool can miss contact points because the subject is not walking on the mat sensors. At the start of the session, we search for the first *Heel Strike* event which is followed by the three other events in the correct order. At the end of the session, we remove all points after the first missing event.
-
-In both strategies, the model does not detect a single point corresponding to the event but a window of points containing it (see @fig-preds-both-strategies). This means that we need a procedure to extract precise estimated event times from the model predictions. In the strategy P, when predicting sub-phases, the event corresponding to the sub-phase is theoretically located at the start of the sub-phase, but this is not always the case in our model's predictions. Hence we use the same process for both strategies to choose one point, either in the windows detected with the strategy E or in the sub-phases predicted with the strategy P.
-
-
-::: {#fig-preds-both-strategies layout="[45, -10, 45]" fig-pos="H"}
-
-![Strategy E: Predictions made by the neural network.](images/preds_events_and_real_events.png){#fig-preds-events width=270}
-
-![Strategy P: Predictions made by the boosted trees.](images/preds_phases_and_real_events.png){#fig-preds-phases width=270}
-
-Predictions from both strategies. The bigger and darker points represent the real time of events given by the GAITRite© gold standard.
-:::
-
-In each window or phase detected, the following process is applied to keep one point representing the event.
We first need to identify each of them correctly, using the two following parameters:
-
-- A threshold to first select only the observations with a high probability of being in the class: $p > 0.4$.
-- A time gap to distinguish two predicted windows or phases: if the first point of the window $i$ is at more than $t = 40$ ms from the last point of the window $i-1$, we consider the windows distinct. This enables us to define unique windows from points that are not necessarily consecutive when only one event should be detected in this time range.
-
-Then, in each of these clearly separated windows or phases, we keep the point having the highest probability of being in this class according to the model. This point is the time of the estimated event. The last step is to compare these estimated events with the ones given by the gold standard to see which strategy gives better results and select a final strategy.
-
-The first point of comparison is the number of each event detected during the walking sessions. For both strategies, the model predicts the exact number of events for every class in each session. It means that both models seem to perform well, but it does not allow us to tell them apart. The second item that can be compared is the time at which the events happen. We plot the mean time difference between real and predicted events for each session in the test set (see @fig-boxplot-comparison).
-
-![Mean time difference between real and predicted events for both strategies.](images/boxplot_both_strategies.png){#fig-boxplot-comparison width=600}
-
-In the strategy P, the *Heel-Strike* events seem to be well estimated, but it is not the case for the *Toe-Off* events, which are predicted too late, about 150 ms after the real time of the event. This could mean that, for these events, taking one of the first points of the detected sub-phase instead of the point with the highest model probability would give better results.
Since we have no theoretical explanation for this phenomenon, we cannot justify using a different procedure for the *Heel Strike* and *Toe Off* events. On the other hand, in strategy E, all the event times are estimated near the real ones, with no distinction between classes. For each type of event, the median estimated time is approximately 50 ms before the real event, with a few outliers. Overall, the results given by the event detection are closer to the times given by the gold standard, so we retain strategy E. - -\newpage - -## Detailed performance of the selected model - -The final selected model is the neural network from strategy E, which predicts the event times directly. We give a more in-depth evaluation of this model in this section. We evaluate it on the test set with the three metrics adapted to this strategy: the Macro-Average Sensitivity and Specificity and the GWYI (see @tbl-final-eval). We also perform a deeper comparison of the event times estimated by the model with the real ones given by the gold standard. - -::: {#tbl-final-eval tbl-pos="H"} - -```{r} -tibble::tibble( - youden = c("88.7%"), - sen = c("93.4%"), - spe = c("96.4%") -) |> - gt::gt() |> - gt::cols_label( - youden = "Generalized Weighted Youden Index", - sen = "MA Sensitivity", - spe = "MA Specificity" - ) |> - gt::tab_options(column_labels.background.color = "#616161") -``` - -Performance metrics of the selected neural network on the test set. - -::: - -For a complete comparison of our model with the gold standard, we use spatio-temporal parameters that can be computed from the four events occurring during a walking cycle: the number of cycles in a session, the cycle duration and the percentage of stance phase. We compute the mean value of these parameters on each session from the test set. 
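As an illustration, these parameters can be derived from the estimated event times; the sketch below uses hypothetical inputs and assumes the right-foot stance phase runs from a *Right Heel Strike* to the following *Right Toe Off*:

```r
# Hypothetical sketch: mean cycle duration and mean stance percentage for
# right-foot cycles.
# `rhs`: times (s) of successive Right Heel Strike events;
# `rto`: times (s) of the Right Toe Off event inside each complete cycle.
cycle_parameters <- function(rhs, rto) {
  duration <- diff(rhs)                       # one duration per complete cycle
  stance <- (rto - head(rhs, -1)) / duration  # stance as a fraction of the cycle
  data.frame(
    n_cycles = length(duration),
    mean_cycle_duration = mean(duration),
    mean_stance_percent = 100 * mean(stance)
  )
}

# Three cycles of about one second, with stance around 60% of the cycle.
cycle_parameters(
  rhs = c(0.00, 1.00, 2.02, 3.00),
  rto = c(0.61, 1.62, 2.61)
)
```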
We differentiate the results for each foot by computing the gait cycles starting with the right and with the left foot; we thus have 70 observations for this comparison (one per session and per foot in the test set). - -For the number of cycles in each session, we find exactly the same number of cycles for our model and the gold standard, which is not surprising since we previously showed that the model detected the correct number of each event in every session. For the mean cycle duration and the mean percentage of stance phase, we use Bland-Altman plots [@bland_altman] to compare the results (see @fig-bland-altman). - -::: {#fig-bland-altman layout="[48, -4, 48]" fig-pos="H"} - -![Comparison of the mean cycle duration (seconds).](images/ba_cycle_length.png){#fig-ba-cycle-length width=270} - -![Comparison of the mean stance percentage (%).](images/ba_stance_percent.png){#fig-ba-stance-percent width=270} - -Bland-Altman plots comparing our model with the gold standard on spatio-temporal parameters. -::: - -For both parameters, the bias, represented by the dashed line in the blue interval, is very close to zero, meaning that our model gives results similar to the gold standard. More precisely, for the mean cycle duration, the differences between the two methods lie between −26 ms and 21 ms, represented by the dashed lines in the red and green intervals. Since a gait cycle typically lasts one second, this interval seems narrow enough to consider our method comparable to the gold standard. For the mean stance percentage, the differences lie between −8.5% and 6.5%, and we know that a stance phase generally lasts about 60% of the gait cycle. Once again, our model gives consistent results that are close to the parameters given by the reference method. - -\newpage - -# Discussion and conclusions - -In this study, we used rotation data in the form of unit-quaternion time series to analyze the hip rotation of subjects walking in a straight line. 
We used the GAITRite© sensor mat as the reference method to label data acquired with a wearable sensor placed at the right hip to measure rotations over time. The goal was to detect key gait events (*Heel Strike* and *Toe Off*) in the signal and to compute from them spatio-temporal parameters giving results similar to the gold standard. - -We compared two strategies to detect events during gait cycles: predicting the gait events directly or predicting the sub-phases between these events. We selected a model for each strategy, a neural network for the first and a boosted trees model for the second, by tuning and evaluating various classification models on different metrics: the Generalized Weighted Youden Index for event detection and accuracy for sub-phase detection. By comparing the two models with the reference events given by the GAITRite© gold standard, we chose to keep the strategy detecting the events directly, as it showed more consistent results. The selected model is a neural network with four hidden layers of 500 nodes each, with higher class weights on the minority classes of interest to address the issue of class imbalance. This model performs well on the test set, as we obtain results similar to those provided by the reference method for the number of gait cycles, their duration, and the percentage of stance phase in each gait cycle. - -Since this model was trained on healthy subjects, we expect lower performance on data from patients with gait disorders, as their gait is less regular. Indeed, the model was tested on a data set containing gait data from 44 multiple sclerosis patients, each walking approximately seven meters while wearing the IMU. As we do not have reference points for this data set, the evaluation of the model's predictions was visual. 
For each time series, we plotted the predictions and rated them in three classes: perfect performance (the model detects all four events in the correct order throughout the session), moderate performance (the model misses a few events but correctly segments the signal into cycles) and poor performance (the model misses most events). Among the 44 patients, we counted 20 perfect, 19 moderate and 5 poor performances; hence the model works perfectly on 45% of the data set and poorly on only 11% of it. We can note that some of the recorded signals seem to have anomalies such as abnormal spikes, aside from the fact that unhealthy gaits are not necessarily regular. The results are therefore promising. - -To improve our model so that it generalizes better to various subjects, especially those with gait disorders, we could acquire data on more subjects in various age groups. We could also use transfer learning to go from this model trained on healthy subjects to a model specialized for subjects with gait disorders. Finally, we could perform rotation correction or use rotation-invariant features to counter the issue of the reference frame differing between subjects. 
- -\newpage +{{< include _05_Discussion.qmd >}} # References {.unnumbered} ::: {#refs} ::: - -# Appendix {.unnumbered} - -::: {#tbl-gaitrite-params tbl-pos="H"} - -```{r} -#| layout: [45, -10, 45] - -tibble::tibble( - param = c("Total distance (cm)", - "Total duration (s)", - "Speed (cm/s)", - "Normalized speed", - "Number of steps", - "Walking pace (step/m)", - "Difference of step duration (s)", - "Difference of step length (cm)", - "Difference of cycle duration (s)" - ) -) |> gt::gt() |> - gt::cols_label( - param = "Global Parameters" - ) |> gt::tab_options(column_labels.background.color = "#616161") - -tibble::tibble( - param = c("Step duration (s)", - "Cycle duration (s)", - "Step length (cm)", - "Cycle length (cm)", - "Step width (cm)", - "Simple support phase (%)", - "Double support phase (%)", - "Stance phase (%)", - "Swing phase (%)", - "Normalized step", - "Foot rotation (°)" - ) -) |> gt::gt() |> - gt::cols_label( - param = "Bilateral Parameters" - ) |> gt::tab_options(column_labels.background.color = "#616161") -``` - -Spatio-temporal parameters returned by the GAITRite© pressure mat. 
- -::: - - -::: {#tbl-tuning-res-events tbl-pos="H"} - -```{r} -tibble::tibble(model = c( - rep("Multinomial Regression", 6), - rep("Decision Tree", 6), - rep("Bagged Trees", 6), - rep("Random Forest", 6), - rep("Boosted Trees", 8), - rep("KNN", 5), - rep("Neural Network", 7) -), -param = c( - "spar", "n_lag", "k", "weight on positive classes", "`penalty`", "`mixture`", - "spar", "n_lag", "k", "weight on positive classes", "`tree_depth`", "`min_n`", - "spar", "n_lag", "k", "weight on positive classes", "`tree_depth`", "`min_n`", - "spar", "n_lag", "k", "weight on positive classes", "`trees`", "`min_n`", - "spar", "n_lag", "k", "weight on positive classes", "`trees`", "`min_n`", "`tree_depth`", "`learn_rate`", - "spar", "n_lag", "k", "`neighbors`", "`dist_power`", - "spar", "n_lag", "k", "weight on positive classes", "`hidden_units`", "`learn_rate`", "`dropout`" -), -values = c( - "0.4", "5", "1", "80", "0.0001", "1", - "0.3", "5", "1", "80", "5", "7", - "0.4", "4", "1", "60", "6", "32", - "0.6", "4", "2", "80", "900", "40", - "0.4", "4", "2", "80", "400", "40", "5", "0.01", - "0.4", "5", "5", "5", "2", - "0.4", "5", "1", "80", "rep(500, 4)", "0.01", "0.1" -)) |> - dplyr::group_by(model) |> - dplyr::mutate(model = ifelse(dplyr::row_number() == 1, model, "")) |> - dplyr::ungroup() |> - gt::gt() |> - gt::tab_style( - style = list( - gt::cell_borders( - sides = c("top", "bottom"), - weight = gt::px(0) - )), - locations = list( - gt::cells_body( - columns = model, - rows = model == "" - ) - ) - ) |> - gt::tab_style( - style = list(gt::cell_text(style = "italic")), - locations = gt::cells_body(columns = param, rows = param %in% c("spar", "n_lag", "k")) - ) |> - gt::cols_label( - model = "Model", - param = "Parameter", - values = "Optimal value" - ) |> gt::tab_options(column_labels.background.color = "#616161") |> - gt::fmt_markdown(columns = param) -``` - -Optimal parameters for event detection for each tuned model. 
- -::: - -::: {#tbl-tuning-res-phases tbl-pos="H"} - -```{r} -tibble::tibble(model = c( - rep("Multinomial Regression", 4), - rep("Decision Tree", 4), - rep("Bagged Trees", 4), - rep("Random Forest", 4), - rep("Boosted Trees", 6), - rep("KNN", 4), - rep("Neural Network", 5) -), -param = c( - "spar", "n_lag", "`penalty`", "`mixture`", - "spar", "n_lag", "`tree_depth`", "`min_n`", - "spar", "n_lag", "`tree_depth`", "`min_n`", - "spar", "n_lag", "`trees`", "`min_n`", - "spar", "n_lag", "`trees`", "`min_n`", "`tree_depth`", "`learn_rate`", - "spar", "n_lag", "`neighbors`", "`dist_power`", - "spar", "n_lag", "`hidden_units`", "`learn_rate`", "`dropout`" -), -values = c( - "0.4", "5", "0.0001", "1", - "0.6", "4", "6", "13", - "0.4", "4", "8", "40", - "0.7", "5", "900", "2", - "0.7", "5", "900", "3", "12", "0.1", - "0.7", "5", "10", "1", - "0.4", "4", "rep(500, 3)", "0.1", "0.1" -)) |> - dplyr::group_by(model) |> - dplyr::mutate(model = ifelse(dplyr::row_number() == 1, model, "")) |> - dplyr::ungroup() |> - gt::gt() |> - gt::tab_style( - style = list( - gt::cell_borders( - sides = c("top", "bottom"), - weight = gt::px(0) - )), - locations = list( - gt::cells_body( - columns = model, - rows = model == "" - ) - ) - ) |> - gt::tab_style( - style = list(gt::cell_text(style = "italic")), - locations = gt::cells_body(columns = param, rows = param %in% c("spar", "n_lag")) - ) |> gt::cols_label( - model = "Model", - param = "Parameter", - values = "Optimal value" - ) |> gt::tab_options(column_labels.background.color = "#616161") |> - gt::fmt_markdown(columns = param) -``` - -Optimal parameters for phase detection for each tuned model. - -::: +{{< include _Appendix.qmd >}} \ No newline at end of file