# [Can Population-based Engagement Improve Personalisation? A Novel Dataset and Experiment](https://doi.org/10.5281/zenodo.6853185)

This is a project constructor for the paper [*Can Population-based Engagement Improve Personalisation? A Novel Dataset and Experiment*](https://doi.org/10.5281/zenodo.6853185) by Sahan Bulathwela, Meghana Verma, [María Pérez-Ortiz](https://orcid.org/0000-0003-1302-6093), [Emine Yilmaz](https://orcid.org/0000-0003-4734-4532), [John Shawe-Taylor](https://orcid.org/0000-0002-2030-0073).

### Associated Metadata

#### Tested Systems

#### Languages

#### Resources

* [Can Population-based Engagement Improve Personalisation? A Novel Dataset and Experiment](https://doi.org/10.5281/zenodo.6853185) (Public)
    * Contains paper under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)
* [GitHub](https://github.com/sahanbull/VLE-Dataset) (Public)
    * Contains data under ARR
    * Contains materials under ARR

## Project Files

The constructor downloads the following files:
* [Cloned GitHub](https://github.com/ahaim5357/VLE-Dataset) under ARR

## Setup Instructions

### Method 1: Docker

This project contains the files necessary to set up a [Docker container][docker]. Make sure you have Docker installed before attempting anything below.

To build the Docker image, navigate to this directory and run the following command:

```sh
docker build -t <image_name> .
```

`image_name` should be replaced with whatever name you would like to refer to the Docker image as. Building the image takes around 30 minutes to an hour.

From there, you can open a shell in the container via:

```sh
docker run --rm -itv <local_directory>:/volume <image_name> sh
```

A `volume` directory will be created within the container and linked to the specified `local_directory`. You can pass the current working directory via `${PWD}`.
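
For example, assuming the image was tagged `vle-repro` (an illustrative name, not one mandated by the project), mounting the current directory looks like:

```sh
# Mount the current working directory as /volume inside the container
docker run --rm -itv ${PWD}:/volume vle-repro sh
```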

> We are loading into the terminal instead of into Python so that any generated figures can be copied onto the local machine, as they cannot otherwise be easily viewed.

Once in the container's terminal, you can run the Python script via:

```sh
python3 ./helper_code/models/regression/train_rf_regression_full_cv.py --training-data-filepath VLE_datasets/v1/VLE_12k_dataset_v1.csv --output-dir ./results
```

You can look through the terminal output and compare the numbers with those in the paper. To view the figures on the local machine, copy them to the volume via:

```sh
cp -R ./results /volume
```

### Method 2: Local Setup

This project uses the Python package `jammies[all]` to set up and fix any issues in the codebase. For instructions on how to download and generate the project from this directory, see the [`jammies`][jammies] repository.

You will also need a version of [Java][java] to run Spark, which the codebase uses. Any version of Java 8+ will work, though this setup guide recommends the latest LTS, which is 17 as of this writing.
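
To confirm a suitable Java is on the path (a quick sanity check, not part of the project's own tooling):

```sh
# Print the installed Java version, or a hint if none is found
command -v java >/dev/null && java -version 2>&1 | head -n 1 \
    || echo "Java not found: install a Java 8+ runtime first"
```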

Spark can also take advantage of [Apache Hadoop][hadoop], but it is not necessary to run the codebase, nor does it affect the outcomes, so it will not be used in this guide.

The following instructions have been reproduced using [Python][python] 3.11.4; no guarantees are made that they will work with other versions. Make sure you have Python, along with gcc for Cython, before attempting anything below.
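
A quick check that the expected toolchain is available (versions other than 3.11.4 may still work, but are untested here):

```sh
# Confirm the Python interpreter version used for this guide
python3 --version

# Cython needs a C compiler; warn rather than fail if gcc is absent
command -v gcc >/dev/null || echo "gcc not found: install it before continuing"
```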

First, navigate to the generated `src` directory. Then install the required dependencies, either into the global Python instance or into a virtual environment, via:

```sh
python3 -m pip install .
```

> `python3` is replaced with `py` on Windows machines. Additionally, the `python3 -m` prefix is unnecessary if `pip` is properly added to the path.
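
If you prefer the virtual-environment route, a minimal sketch looks like the following (the `.venv` directory name is an arbitrary choice):

```sh
# Create an isolated environment and activate it (POSIX shells)
python3 -m venv .venv
. .venv/bin/activate

# pip now resolves to the environment's interpreter
python3 -m pip --version
```

With the environment active, run `python3 -m pip install .` from the `src` directory as above.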

After installing the required dependencies, run the Python script via:

```sh
python3 ./helper_code/models/regression/train_rf_regression_full_cv.py --training-data-filepath VLE_datasets/v1/VLE_12k_dataset_v1.csv --output-dir ./results
```

You can look through the `results` directory and compare the numbers with those in the paper.

[docker]: https://www.docker.com/
[jammies]: https://github.com/ahaim5357/jammies
[java]: https://adoptium.net/temurin/releases/?version=17
[hadoop]: http://apache.github.io/hadoop/
[python]: https://www.python.org/

## Issues

None of the generated results match anything reported in the paper. The generated `results.csv` reports the RMSE, but not for the 12k dataset, so while the code may run, its output cannot be directly compared against the results in the paper.
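
To see what the script actually produced, you can dump the head of the generated file; the `./results/results.csv` path follows the `--output-dir` used in the commands above, and the snippet degrades gracefully if the file is absent:

```sh
# Print the header and first few rows of the generated results, if present
if [ -f ./results/results.csv ]; then
    head -n 5 ./results/results.csv
else
    echo "./results/results.csv not found; run the training script first."
fi
```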

As such, the results reported in the paper could not be consistently reproduced from this codebase.

*[ARR]: All Rights Reserved
*[Cloned GitHub]: Cloned GitHub Repository
*[GitHub]: GitHub Repository
*[CC-BY-4.0]: Creative Commons Attribution 4.0 International