This repository contains the files needed to run feature selection on exogenous variables across several example datasets. The goal of this project is to identify the most relevant exogenous features in a dataset and to evaluate performance across various time series forecasting models, using data from Notional API.
The following files are included in this repository:
- `utils.py`: Utility functions used by the other scripts for data preprocessing and feature engineering.
- `requirement.txt`: The dependencies required to run the project. Install them before executing the code.
- `bike_sharing/demo_bike_dataset.ipynb`: A Jupyter Notebook that demonstrates the feature selection process on the bike dataset. It loads the processed data file (`bike_sharing/data/bike_sharing_day.csv`), applies feature selection techniques, and evaluates the selected features. It also shows how to use the bulk API from Notional API to get the exogenous features.
- `calculate_feature_score.py`: Calculates scores for a batch of features. It lives in a separate file to support multiprocessing, which allows efficient computation.
- `feature_selection.py`: Runs the feature selection process. Within the main training loop, it calls the `calculate_feature_score` function from `calculate_feature_score.py`. The technique is metric-based: features are evaluated with a specific metric (scoring function), and the script returns a ranked list of the feature subset deemed most relevant for the dataset.
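
To make the metric-based idea concrete, here is a minimal sketch of scoring features in batches and ranking them. It is not the code in `calculate_feature_score.py` or `feature_selection.py`: the function names `score_feature_batch` and `rank_features`, the batch layout, and the choice of metric (held-out mean squared error of a one-feature linear fit) are all assumptions made for illustration.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def score_feature_batch(args):
    """Score each feature in one batch (hypothetical helper).

    The assumed metric is the mean squared error on the held-out tail of
    the series, so a lower score means a more useful feature.
    """
    X_batch, names, y, n_train = args
    scores = {}
    for j, name in enumerate(names):
        # Design matrix: intercept plus the single candidate feature.
        A = np.column_stack([np.ones(len(y)), X_batch[:, j]])
        # Fit on the earliest rows only, so the split respects time order.
        coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
        residual = y[n_train:] - A[n_train:] @ coef
        scores[name] = float(np.mean(residual ** 2))
    return scores


def rank_features(X, y, names, batch_size=2, train_frac=0.8, max_workers=None):
    """Return (name, score) pairs sorted from best (lowest MSE) to worst."""
    n_train = int(len(y) * train_frac)
    batches = [
        (X[:, i:i + batch_size], names[i:i + batch_size], y, n_train)
        for i in range(0, len(names), batch_size)
    ]
    if max_workers:
        # Batches are distributed across worker processes.
        with ProcessPoolExecutor(max_workers=max_workers) as pool:
            results = list(pool.map(score_feature_batch, batches))
    else:
        # Serial fallback, convenient for debugging and small inputs.
        results = [score_feature_batch(batch) for batch in batches]
    merged = {name: s for batch in results for name, s in batch.items()}
    return sorted(merged.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Synthetic data: only the third column actually drives the target.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))
    y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)
    for name, score in rank_features(X, y, ["a", "b", "c", "d"]):
        print(f"{name}: {score:.4f}")
```

Keeping the batch scorer as a top-level function in its own module matters for multiprocessing, because worker processes need to import and pickle it. Passing `max_workers` switches the sketch from the serial path to a process pool.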
To use this project, follow these steps:
Install the dependencies listed in `requirement.txt` with your preferred package manager:

- pip: `pip install -r requirement.txt`
- conda: `conda install --file requirement.txt`

You can then check out the example on the bike sharing dataset:
- Open the `bike_sharing/demo_bike_dataset.ipynb` notebook in Jupyter Notebook.
- Execute the cells in the notebook sequentially to see the step-by-step process of feature selection on the bike dataset.
Contributions to this project are welcome. If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.