LANL Earthquake Prediction Challenge: https://www.kaggle.com/c/LANL-Earthquake-Prediction#description
Date: 11 Feb 2019
| Name | Contact | Responsibilities |
|---|---|---|
| Denish Shitov | - | - |
| Alexey Shaymanov | - | - |
| Pavel Tolmachev | - | - |
| Sergey Iakovlev | siakovlev@student.unimelb.edu.au | - |
Under the /src directory there is the following structure:
- `/configs` - configs for data processing, model training and validation.
- `/data_processing`:
  - `dp_run.py` - main data processing script with parameters specified in `dp_config.json`.
  - `feature.py` - `Feature` class implementation.
  - `dp_utils.py` -
- `/folds` - directory with a custom fold implementation.
- `/models` - custom models that follow the `model.py` interface.
- `/preproc` - ?
- `/test_directory` - the main script generating test prediction results.
- `/validation` - directory with training and validation scripts:
  - `train_single_model.py` - script for training a single model with parameters specified in `/configs/train_config.json`.
  - `validation_run.py` - script for training single/multiple models with parameters specified in `/configs/validation_config.json`.
Data processing, model training and generation of test results can be managed via the following configs (in the `/configs` dir):

- `dp_config.json`:
  - A user needs to specify the directory with the original dataframe (`data_dir`), the output directory for the processed dataframe (`data_processed_dir`), and their names (`data_fname` and `data_processed_fname`).
  - There are two global parameters, `window_size` and `window_stride`, that are used by default during each feature calculation. Note that these parameters can be overridden by each feature locally (see below).
  - `features` is the list of features to be calculated. In the configuration file, each feature has 3 parameters: `name` - the feature name; `on` - enables or disables the feature calculation; `functions` - a dictionary of functions with corresponding parameters from `feature.py`. An example of a single feature is provided below:

```json
{
  "name": "q_05_std_rolling_50",
  "on": true,
  "functions": {
    "r_std": { "window_size": 50, "window_stride": null },
    "w_quantile": { "q": 0.05 }
  }
}
```
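As a rough illustration of how such a feature might be computed (a sketch only; it assumes `r_std` is a rolling standard deviation and `w_quantile` takes a quantile of the resulting series — the actual implementations live in `feature.py`):

```python
import numpy as np
import pandas as pd

def q_05_std_rolling_50(signal: pd.Series, window_size: int = 50,
                        q: float = 0.05) -> float:
    """Sketch of the 'q_05_std_rolling_50' feature from the config above."""
    # r_std: rolling standard deviation over `window_size` samples
    rolled_std = signal.rolling(window_size).std().dropna()
    # w_quantile: the q-th quantile of the rolled series
    return float(rolled_std.quantile(q))

rng = np.random.default_rng(0)
sig = pd.Series(rng.normal(size=4096))   # stand-in for the acoustic signal
value = q_05_std_rolling_50(sig)
```

Chaining functions this way (a windowed statistic, then an aggregate over the window) is what the `functions` dictionary in the config expresses.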
- `train_config.json` or `validation_config.json`
- `multi_test_config.json`
Notes:
- Fold: `CustomFold(n_splits=1, shuffle=True, fragmentation=0, pad=150)`. 10 runs is equivalent to 90 model trainings.
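One possible reading of the fold note is a leave-one-segment-out scheme over earthquake cycles, where `pad` drops training samples adjacent to the held-out block to limit leakage from overlapping feature windows. The sketch below is hypothetical (the real `CustomFold` lives in `/folds`; the class name, the segment list, and the exact semantics of `pad` are assumptions):

```python
import numpy as np

class LeaveOneSegmentOut:
    """Hypothetical sketch in the spirit of CustomFold: each segment
    (earthquake cycle) is held out once; `pad` samples on each side of
    the held-out block are removed from the training indices."""

    def __init__(self, shuffle=True, pad=150, seed=0):
        self.shuffle = shuffle
        self.pad = pad
        self.rng = np.random.default_rng(seed)

    def split(self, segments):
        order = np.arange(len(segments))
        if self.shuffle:
            self.rng.shuffle(order)
        for i in order:
            val_idx = segments[i]
            train_idx = np.concatenate(
                [s for j, s in enumerate(segments) if j != i])
            # drop `pad` samples closest to the held-out block
            lo, hi = val_idx.min() - self.pad, val_idx.max() + self.pad
            yield train_idx[(train_idx < lo) | (train_idx > hi)], val_idx

# 9 contiguous segments of 1000 samples each (synthetic)
segments = [np.arange(k * 1000, (k + 1) * 1000) for k in range(9)]
folds = list(LeaveOneSegmentOut(pad=150).split(segments))
```

With 9 held-out segments per run, this would make 10 runs equal 90 model trainings, matching the note above.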
Best performing models:
| Feature config | Model | Params | 10 runs score (std) | 100 runs score (std) | 300 runs score (std) | Public score |
|---|---|---|---|---|---|---|
| e9 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 1.0, 'eta': 0.177, 'eval_metric': 'mae', 'gamma': 0.93, 'max_depth': 5, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 65001, 'silent': 1, 'subsample': 0.65, 'tree_method': 'gpu_hist'} | 2.0092 | - | 2.1364 (0.7928) | 1.647 |
| e9 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 1.00, 'eta': 0.165, 'eval_metric': 'mae', 'gamma': 0.95, 'max_depth': 5, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 0, 'silent': 1, 'subsample': 0.60, 'tree_method': 'gpu_hist'} | 2.0096 | - | 2.1368 (0.7929) | 1.646 |
| e7 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 0.59, 'eta': 0.273, 'eval_metric': 'mae', 'gamma': 0.82, 'max_depth': 4, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 0, 'silent': 1, 'subsample': 0.78, 'tree_method': 'gpu_hist'} | 2.0105 | 2.0917 (0.7735) | 2.1393 (0.7966) | 1.650 |
Other models:
| Feature config | Model | Params | 10 runs score (std) | 100 runs score (std) | 300 runs score (std) | Public score |
|---|---|---|---|---|---|---|
| e1 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 0.8, 'eta': 0.07, 'eval_metric': 'mae', 'gamma': 0.6, 'max_depth': 4, 'min_child_weight': 3, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 0, 'silent': 1, 'subsample': 1.0, 'tree_method': 'gpu_hist'} | 2.011 | 2.092 | 2.1397 (0.7959) | 1.650 |
| e6 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 0.95, 'eta': 0.015, 'eval_metric': 'mae', 'gamma': 0.75, 'max_depth': 6, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 314159265, 'silent': 1, 'subsample': 0.95, 'tree_method': 'gpu_hist'} | 2.013 | 2.094 | 2.1418 | 1.680 |
| e3 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 0.7, 'eta': 0.15, 'eval_metric': 'mae', 'gamma': 0.75, 'max_depth': 4, 'min_child_weight': 4, 'n_estimators': 21, 'objective': 'gpu:reg:linear', 'seed': 314159265, 'silent': 1, 'subsample': 0.8, 'tree_method': 'gpu_hist'} | 2.031 | 2.1099 | 2.1556 (0.7787) | - |
| e6 | AdaBoost | {'learning_rate': 0.23, 'loss': 'linear', 'n_estimators': 13, 'random_state': 0} | 2.0728 | - | - | - |
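For reference, the AdaBoost row can be reconstructed with scikit-learn (an assumption: "AdaBoost" in the table is taken to be `sklearn.ensemble.AdaBoostRegressor`; the feature matrix below is synthetic stand-in data, not the real e6 features):

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

# Hyperparameters copied from the table; the data is synthetic.
model = AdaBoostRegressor(learning_rate=0.23, loss='linear',
                          n_estimators=13, random_state=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # stand-in feature matrix
y = rng.uniform(0.0, 16.0, size=200)  # time-to-failure-like target, seconds

model.fit(X, y)
mae = float(np.mean(np.abs(model.predict(X) - y)))
```

The competition is scored on mean absolute error of the predicted time to failure, which is why `eval_metric: 'mae'` appears in every XGBoost row above.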