LANL Earthquake Prediction Challenge: https://www.kaggle.com/c/LANL-Earthquake-Prediction#description
Date: 11 Feb 2019
| Name | Contact | Responsibilities |
|---|---|---|
| Denish Shitov | - | - |
| Alexey Shaymanov | - | - |
| Pavel Tolmachev | - | - |
| Sergey Iakovlev | siakovlev@student.unimelb.edu.au | - |
Under the /src directory there is the following structure:
- `/configs` - configs for data processing, model training and validation.
- `/data_processing`:
  - `dp_run.py` - main data processing script with parameters specified in `dp_config.json`.
  - `feature.py` - `Feature` class implementation.
  - `dp_utils.py` -
- `/folds` - directory with a custom fold implementation.
- `/models` - custom models that follow the `model.py` interface.
- `/preproc` - ?
- `/test_directory` - the main script generating test prediction results.
- `/validation` - directory with training and validation scripts:
  - `train_single_model.py` - script for training a single model with parameters specified in `/configs/train_config.json`.
  - `validation_run.py` - script for training single/multiple models with parameters specified in `/configs/validation_config.json`.
Data processing, model training and generation of test results can be managed via the following configs (in the `/configs` dir):

- `dp_config.json`:
  - A user needs to specify the directory with the original dataframe (`data_dir`), the output directory for the processed dataframe (`data_processed_dir`), and their names (`data_fname` and `data_processed_fname`).
  - There are two global parameters, `window_size` and `window_stride`, that are used by default during each feature calculation. Note that these parameters can be overridden by each feature locally (see below).
  - `features` is the list of features to be calculated. In the configuration file, each feature has 3 parameters: `name` - the feature name; `on` - enables or disables the feature calculation; `functions` - a dictionary of functions with corresponding parameters from `feature.py`. An example of a single feature is provided below:

```json
{
  "name": "q_05_std_rolling_50",
  "on": true,
  "functions": {
    "r_std": { "window_size": 50, "window_stride": null },
    "w_quantile": { "q": 0.05 }
  }
}
```
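As a rough illustration of how such a feature might be computed (a sketch only; it assumes `r_std` is a rolling standard deviation and `w_quantile` takes a quantile of the resulting series — the actual implementations live in `feature.py`):

```python
import numpy as np
import pandas as pd

def q_05_std_rolling_50(signal: pd.Series, window_size: int = 50,
                        q: float = 0.05) -> float:
    """Sketch of the 'q_05_std_rolling_50' feature from the config above."""
    # r_std: rolling standard deviation over `window_size` samples
    rolled_std = signal.rolling(window_size).std().dropna()
    # w_quantile: the q-th quantile of the rolled series
    return float(rolled_std.quantile(q))

rng = np.random.default_rng(0)
sig = pd.Series(rng.normal(size=4096))   # stand-in for the acoustic signal
value = q_05_std_rolling_50(sig)
```

Chaining functions this way (a windowed statistic, then an aggregate over the window) is what the `functions` dictionary in the config expresses.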
- `train_config.json` or `validation_config.json`
- `multi_test_config.json`
Notes:
- Fold: `CustomFold(n_splits=1, shuffle=True, fragmentation=0, pad=150)`. 10 runs is equivalent to 90 model trainings.
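One possible reading of the fold note is a leave-one-segment-out scheme over earthquake cycles, where `pad` drops training samples adjacent to the held-out block to limit leakage from overlapping feature windows. The sketch below is hypothetical (the real `CustomFold` lives in `/folds`; the class name, the segment list, and the exact semantics of `pad` are assumptions):

```python
import numpy as np

class LeaveOneSegmentOut:
    """Hypothetical sketch in the spirit of CustomFold: each segment
    (earthquake cycle) is held out once; `pad` samples on each side of
    the held-out block are removed from the training indices."""

    def __init__(self, shuffle=True, pad=150, seed=0):
        self.shuffle = shuffle
        self.pad = pad
        self.rng = np.random.default_rng(seed)

    def split(self, segments):
        order = np.arange(len(segments))
        if self.shuffle:
            self.rng.shuffle(order)
        for i in order:
            val_idx = segments[i]
            train_idx = np.concatenate(
                [s for j, s in enumerate(segments) if j != i])
            # drop `pad` samples closest to the held-out block
            lo, hi = val_idx.min() - self.pad, val_idx.max() + self.pad
            yield train_idx[(train_idx < lo) | (train_idx > hi)], val_idx

# 9 contiguous segments of 1000 samples each (synthetic)
segments = [np.arange(k * 1000, (k + 1) * 1000) for k in range(9)]
folds = list(LeaveOneSegmentOut(pad=150).split(segments))
```

With 9 held-out segments per run, this would make 10 runs equal 90 model trainings, matching the note above.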
Best performing models:
| Feature config | Model | Params | 10 runs score (std) | 100 runs score (std) | 300 runs score (std) | Public score |
|---|---|---|---|---|---|---|
| e9 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 1.0, 'eta': 0.177, 'eval_metric': 'mae', 'gamma': 0.93, 'max_depth': 5, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 65001, 'silent': 1, 'subsample': 0.65, 'tree_method': 'gpu_hist'} | 2.0092 | - | 2.1364 (0.7928) | 1.647 |
| e9 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 1.00, 'eta': 0.165, 'eval_metric': 'mae', 'gamma': 0.95, 'max_depth': 5, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 0, 'silent': 1, 'subsample': 0.60, 'tree_method': 'gpu_hist'} | 2.0096 | - | 2.1368 (0.7929) | 1.646 |
| e7 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 0.59, 'eta': 0.273, 'eval_metric': 'mae', 'gamma': 0.82, 'max_depth': 4, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 0, 'silent': 1, 'subsample': 0.78, 'tree_method': 'gpu_hist'} | 2.0105 | 2.0917 (0.7735) | 2.1393 (0.7966) | 1.650 |
Other models:
| Feature config | Model | Params | 10 runs score (std) | 100 runs score (std) | 300 runs score (std) | Public score |
|---|---|---|---|---|---|---|
| e1 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 0.8, 'eta': 0.07, 'eval_metric': 'mae', 'gamma': 0.6, 'max_depth': 4, 'min_child_weight': 3, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 0, 'silent': 1, 'subsample': 1.0, 'tree_method': 'gpu_hist'} | 2.011 | 2.092 | 2.1397 (0.7959) | 1.650 |
| e6 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 0.95, 'eta': 0.015, 'eval_metric': 'mae', 'gamma': 0.75, 'max_depth': 6, 'min_child_weight': 10, 'n_estimators': 20, 'objective': 'gpu:reg:linear', 'seed': 314159265, 'silent': 1, 'subsample': 0.95, 'tree_method': 'gpu_hist'} | 2.013 | 2.094 | 2.1418 | 1.680 |
| e3 | XGBRegressor | {'booster': 'gbtree', 'colsample_bytree': 0.7, 'eta': 0.15, 'eval_metric': 'mae', 'gamma': 0.75, 'max_depth': 4, 'min_child_weight': 4, 'n_estimators': 21, 'objective': 'gpu:reg:linear', 'seed': 314159265, 'silent': 1, 'subsample': 0.8, 'tree_method': 'gpu_hist'} | 2.031 | 2.1099 | 2.1556 (0.7787) | - |
| e6 | AdaBoost | {'learning_rate': 0.23, 'loss': 'linear', 'n_estimators': 13, 'random_state': 0} | 2.0728 | - | - | - |
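For reference, the AdaBoost row can be reconstructed with scikit-learn (an assumption: "AdaBoost" in the table is taken to be `sklearn.ensemble.AdaBoostRegressor`; the feature matrix below is synthetic stand-in data, not the real e6 features):

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

# Hyperparameters copied from the table; the data is synthetic.
model = AdaBoostRegressor(learning_rate=0.23, loss='linear',
                          n_estimators=13, random_state=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # stand-in feature matrix
y = rng.uniform(0.0, 16.0, size=200)  # time-to-failure-like target, seconds

model.fit(X, y)
mae = float(np.mean(np.abs(model.predict(X) - y)))
```

The competition is scored on mean absolute error of the predicted time to failure, which is why `eval_metric: 'mae'` appears in every XGBoost row above.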