Reproducible Package for "A Study on Training Set Size and Model Performance in Smartphone- and Smartwatch-Based Human Activity Recognition"
This repository is the reproducibility package for the journal paper "A Study on Training Set Size and Model Performance in Smartphone- and Smartwatch-Based Human Activity Recognition", authored by Miguel Matey-Sanz, Joaquín Torres-Sospedra, Sven Casteleyn and Carlos Granell.
Matey-Sanz, M., Torres-Sospedra, J., Casteleyn, S. & Granell, C. "A Study on Training Set Size and Model Performance in Smartphone- and Smartwatch-Based Human Activity Recognition".
The repository includes all the data, code and other resources employed throughout the development of the paper:
- 01_DATA: contains the source (dataset) and intermediate (raw results of the scripts) data used for obtaining the results presented in the paper.
- 02_RESULTS: contains the final results presented in the paper, generated from analysing the raw results obtained from executing the experiments.
- lib: Python library containing all the code employed to execute the experiments (lib/pipeline/) and analyses (lib/analysis/) presented in the paper.
- *.ipynb files: Jupyter notebooks containing the analyses whose results are presented in the paper.
- requirements.txt: Python libraries employed to execute the experiments and analyses. All experiments and analyses have been executed using Python 3.9.
- Dockerfile: file to build a Docker image with a computational environment to reproduce the experiments and analyses.
This repository contains all the required data (except the dataset, which can be downloaded from its source), code and scripts to reproduce the experiments and results presented in the paper.
Several options to set up a computational environment to reproduce the analyses are offered, both online and locally.
Binder allows creating custom computing environments in the cloud that can be shared with many remote users. To open the Binder computing environment, click on the "Binder" badge above.
Note
Building the computing environment in Binder can be slow.
Install Python 3.9, download or clone the repository, open a command line in the root of the repository and install the required software by executing the following command:
pip install -r requirements.txt
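Optionally, the dependencies can be installed into an isolated virtual environment to avoid interfering with other Python installations; a minimal sketch using Python's built-in venv module (the environment name .venv is an arbitrary choice):
python3.9 -m venv .venv          # create the environment (name is arbitrary)
source .venv/bin/activate        # activate it in the current shell
pip install -r requirements.txt  # install the dependencies into it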
Install Docker to build an image, based on the provided .docker/Dockerfile, with a Jupyter environment, and to run a container based on that image.
Download or clone the repository, open a command line in its root directory and:
- Build the image:
docker build . --tag har-performance-study
- Run the image:
docker run -it -p 8888:8888 har-performance-study
- Click on the login link shown in the console (or copy and paste it into the browser) to access a Jupyter environment.
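By default, work done inside the container is lost when it stops. To persist changes (e.g., executed notebooks) on the host, the repository can be mounted as a volume; a hypothetical invocation, assuming the image follows the standard Jupyter Docker stack layout with /home/jovyan as the working directory:
# The mount target assumes the standard Jupyter Docker stack home (/home/jovyan).
docker run -it -p 8888:8888 -v "$(pwd)":/home/jovyan har-performance-study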
The Python scripts employed to execute the experiments described in the paper are located in lib/pipeline/[n]_*.py, where n determines the order in which the scripts must be executed (see the sketch after the caution below). Re-executing these scripts is not needed, since their outputs are already stored in the 01_DATA/02_GRID-SEARCH/ and 01_DATA/03_MODEL-REPORTS/ directories.
Note
When executing a script with a random component (e.g., ML model training), the obtained results might differ from the reported ones.
Caution
Executing these scripts is not recommended, since they can run for hours, days or even weeks depending on the computer's hardware.
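Should you nevertheless want to regenerate the raw results from scratch, the scripts can be run in numeric order; a minimal sketch, assuming each script is standalone and takes no command-line arguments:
# Runs every numbered pipeline script; shell globs expand in lexicographic
# order, so the numeric prefixes are executed in sequence.
for script in lib/pipeline/[0-9]*_*.py; do
    python "$script"
done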
To reproduce the outcomes presented in the paper, open the desired Jupyter Notebook (*.ipynb) file and execute its cells (or run it non-interactively; see the command after the list below) to generate the reported results from the data produced by the experiments (the lib/pipeline/[n]_*.py scripts). More concretely, the Jupyter Notebooks are the following:
- 0_grid-search.ipynb: contains the results of the Grid Search hyperparameter optimization process, i.e., the results generated by executing lib/pipeline/02_hyperparameter-optimization.py. These results are reported in the paper's Table II (Section III-C).
- 1_training-data.ipynb: shows the accuracy evolution of the selected models as training data is added. It analyses the data generated by the lib/pipeline/03_incremental-loso.py script. These results are reported in the paper's Figures 2 and 3 (Section IV-A).
- 2_data-sources.ipynb: shows the difference in performance across the employed data sources for each selected model and amount of training data, i.e., which data source provides better results. It analyses the data generated by the lib/pipeline/03_incremental-loso.py script. These results are reported in the paper's Figure 4 (Section IV-B).
- 3_models.ipynb: shows the difference in performance across the employed model types for each data source and amount of training data, i.e., which model architecture provides better results. It analyses the data generated by the lib/pipeline/03_incremental-loso.py script. These results are reported in the paper's Figure 5 (Section IV-C).
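Alternatively, each notebook can be executed non-interactively from the command line with jupyter nbconvert; for example, for the first notebook:
jupyter nbconvert --to notebook --execute --inplace 0_grid-search.ipynb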
All the code contained in the .ipynb notebooks and the lib folder is licensed under the Apache License 2.0.
The remaining documents included in this repository are licensed under the Creative Commons Attribution-ShareAlike (CC BY-SA 4.0).
This work has been funded by the Spanish Ministry of Universities (grant FPU19/05352), by the Spanish Ministry of Science and Innovation (MCIN/AEI/10.13039/501100011033) and "ERDF/EU" (grants PID2020-120250RB-I00, PID2022-140475OB-C21 and PID2022-140475OB-C22), and partially funded by the Department of Innovation, Universities, Science, and Digital Society of the Valencian Government, Spain (grant CIAICO/2022/111).