Welcome to this RAMP challenge using skore and skrub, two recent libraries developed by the scikit-learn team at Probabl!
Links:
- skore: GitHub repository, documentation, demo videos
- skrub: GitHub repository, documentation
- First of all, fork and clone this GitHub repository.
To run a submission and the notebook, you will need the dependencies listed
in requirements.txt.
For that, it is recommended to create a new environment for this project and to install those dependencies inside this new environment.
- You can create a new conda environment named ramp-bike-skore using:
conda create --name ramp-bike-skore python=3.12
- Then, activate this new environment and install the dependencies in it using pip:
conda activate ramp-bike-skore
pip install -r requirements.txt
As an alternative to pip, you can also create the environment with conda via the environment.yml file:
conda env create -f environment.yml
Later on, when you work on your project, you need to use the ramp-bike-skore
environment in any terminal session. This is once again done with:
conda activate ramp-bike-skore
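If you want to confirm from Python that the right environment is active, a minimal sketch is shown below. It relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; the helper name in_expected_env is hypothetical and not part of this repository.

```python
import os

EXPECTED_ENV = "ramp-bike-skore"  # environment name used in this README


def in_expected_env(environ=os.environ, expected=EXPECTED_ENV):
    """Return True if conda reports `expected` as the active environment."""
    # `conda activate` stores the active environment's name in CONDA_DEFAULT_ENV.
    return environ.get("CONDA_DEFAULT_ENV") == expected


if __name__ == "__main__":
    if in_expected_env():
        print(f"OK: '{EXPECTED_ENV}' is active")
    else:
        print(f"Run `conda activate {EXPECTED_ENV}` first")
```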
Download the data files with the download_data.py script. It will untar the
archive and put the train and test files in a data folder:
conda activate ramp-bike-skore
python download_data.py
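To check that the download worked, you can list what ended up in the data folder. The sketch below makes no assumption about the file names produced by download_data.py; the helper list_data_files is hypothetical and only walks the folder.

```python
from pathlib import Path


def list_data_files(data_dir="data"):
    """Return the sorted relative paths of all files found under `data_dir`."""
    root = Path(data_dir)
    if not root.is_dir():
        # The folder is created by the download script described above.
        raise FileNotFoundError(
            f"{root} not found: run `python download_data.py` first"
        )
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
```

Calling list_data_files() from the repository root should return the train and test files once the download has completed.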
The challenge is hosted on RAMP: you submit a Python script containing your model, the model is run on the hidden test data to obtain predictions, and the leaderboard displays your resulting scores.
- Create an account on https://ramp.studio using a valid email, then click on the received link.
- Subscribe to the RAMP challenge about skore and skrub.
- If not already done: clone this repository, install the ramp-bike-skore environment, and download the data.
Get started on this RAMP with the dedicated notebook.
First install Jupyter:
pip install jupyter
then launch the notebook using:
jupyter notebook ./bike_counters_starting_kit.ipynb
Your submissions need to be located in the submissions folder. For instance
for my_submission, it should be located in submissions/my_submission.
To run a specific submission, you can use the ramp-test command line:
ramp-test --submission my_submission
For instance, you can run the provided starting_kit submission example with:
ramp-test --submission starting_kit
You should get an output similar to the following one:
Example output
Testing Bike count prediction
Reading train and test files from ./data/ ...
Reading cv ...
Training submissions/starting_kit ...
CV fold 0
score rmse time
train 0.610 0.084952
valid 0.983 0.408040
test 0.703 0.033141
CV fold 1
score rmse time
train 0.663 0.106090
valid 0.852 0.399937
test 0.759 0.032243
CV fold 2
score rmse time
train 0.682 0.170388
valid 0.891 0.324898
test 0.771 0.025760
CV fold 3
score rmse time
train 0.705 0.208704
valid 0.844 0.324345
test 0.875 0.024143
CV fold 4
score rmse time
train 0.728 0.233596
valid 0.804 0.319224
test 0.872 0.024262
CV fold 5
score rmse time
train 0.737 0.280230
valid 0.939 0.320182
test 0.863 0.024391
CV fold 6
score rmse time
train 0.763 0.327653
valid 1.131 0.316819
test 0.843 0.025528
CV fold 7
score rmse time
train 0.793 0.376762
valid 0.896 0.324821
test 0.767 0.024473
----------------------------
Mean CV scores
----------------------------
score rmse time
train 0.71 +- 0.0546 0.2 +- 0.1
valid 0.917 +- 0.0962 0.3 +- 0.04
test 0.807 +- 0.0607 0.0 +- 0.0
----------------------------
Bagged scores
----------------------------
score rmse
valid 0.923
test 0.765
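To read the "Mean CV scores" block, it helps to recompute it from the per-fold values. The sketch below copies the RMSE of each fold from the example output and takes the mean and the population standard deviation, which appears to be what the summary reports (an observation from these numbers, not a statement about ramp-test internals). The bagged scores are not reproduced, since they cannot be derived from per-fold scores alone.

```python
from statistics import mean, pstdev

# Per-fold RMSE values copied from the example output above (folds 0 to 7).
rmse_by_fold = {
    "train": [0.610, 0.663, 0.682, 0.705, 0.728, 0.737, 0.763, 0.793],
    "valid": [0.983, 0.852, 0.891, 0.844, 0.804, 0.939, 1.131, 0.896],
    "test": [0.703, 0.759, 0.771, 0.875, 0.872, 0.863, 0.843, 0.767],
}


def summarize(scores):
    """Return (mean, population standard deviation) of a list of scores."""
    return mean(scores), pstdev(scores)


summary = {step: summarize(scores) for step, scores in rmse_by_fold.items()}
for step, (avg, std) in summary.items():
    print(f"{step:<5} {avg:.3f} +- {std:.4f}")
```

The results agree with the summary table up to rounding, because the per-fold values printed in the log are themselves rounded to three decimals.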
You can get more information regarding this command line:
ramp-test --help
On the UI of the challenge on RAMP studio, you can go to the sandbox, then:
- either upload your Python script containing your model,
- or copy-paste your Python code directly in the UI.
You can find more information regarding ramp-workflow in the
dedicated documentation.
You can find the description of the columns present in the external_data.csv
in parameter-description-weather-external-data.pdf. For more information about this
dataset, see the Meteo France website (in French).