Commit 58540f7

Csi500 example (#1126)
* Stage code
* Update results and scripts
1 parent 3e6e286 commit 58540f7

5 files changed: +210 -6 lines changed

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 pandas==1.1.2
 numpy==1.21.0
-lightgbm==3.1.0
+lightgbm
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+qlib_init:
+    provider_uri: "~/.qlib/qlib_data/cn_data"
+    region: cn
+market: &market csi500
+benchmark: &benchmark SH000905
+data_handler_config: &data_handler_config
+    start_time: 2008-01-01
+    end_time: 2020-08-01
+    fit_start_time: 2008-01-01
+    fit_end_time: 2014-12-31
+    instruments: *market
+port_analysis_config: &port_analysis_config
+    strategy:
+        class: TopkDropoutStrategy
+        module_path: qlib.contrib.strategy
+        kwargs:
+            model: <MODEL>
+            dataset: <DATASET>
+            topk: 50
+            n_drop: 5
+    backtest:
+        start_time: 2017-01-01
+        end_time: 2020-08-01
+        account: 100000000
+        benchmark: *benchmark
+        exchange_kwargs:
+            limit_threshold: 0.095
+            deal_price: close
+            open_cost: 0.0005
+            close_cost: 0.0015
+            min_cost: 5
+task:
+    model:
+        class: LGBModel
+        module_path: qlib.contrib.model.gbdt
+        kwargs:
+            loss: mse
+            colsample_bytree: 0.8879
+            learning_rate: 0.2
+            subsample: 0.8789
+            lambda_l1: 205.6999
+            lambda_l2: 580.9768
+            max_depth: 8
+            num_leaves: 210
+            num_threads: 20
+    dataset:
+        class: DatasetH
+        module_path: qlib.data.dataset
+        kwargs:
+            handler:
+                class: Alpha158
+                module_path: qlib.contrib.data.handler
+                kwargs: *data_handler_config
+            segments:
+                train: [2008-01-01, 2014-12-31]
+                valid: [2015-01-01, 2016-12-31]
+                test: [2017-01-01, 2020-08-01]
+    record:
+        - class: SignalRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+              model: <MODEL>
+              dataset: <DATASET>
+        - class: SigAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+              ana_long_short: False
+              ann_scaler: 252
+        - class: PortAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+              config: *port_analysis_config
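The `exchange_kwargs` above set a buy-side rate (`open_cost`), a sell-side rate (`close_cost`) and a per-order floor (`min_cost`). The sketch below shows how such settings are commonly applied. It assumes a "proportional fee with a minimum" model; the helper `trade_cost` and the example order sizes are hypothetical and are not taken from Qlib's exchange implementation.

```python
# Assumed cost model: fee = max(trade value * rate, minimum fee per order).
# `trade_cost` is a hypothetical helper written for illustration only.
OPEN_COST = 0.0005   # open_cost from the config (buy side)
CLOSE_COST = 0.0015  # close_cost from the config (sell side)
MIN_COST = 5.0       # min_cost from the config


def trade_cost(trade_value: float, rate: float, min_cost: float = MIN_COST) -> float:
    """Return the fee for one order under the assumed cost model."""
    return max(trade_value * rate, min_cost)


# Illustrative order size: the 100,000,000 account split evenly over topk=50 names.
position_value = 100_000_000 / 50

buy_fee = trade_cost(position_value, OPEN_COST)
sell_fee = trade_cost(position_value, CLOSE_COST)
small_order_fee = trade_cost(1_000, OPEN_COST)  # the floor applies to small orders

print(buy_fee, sell_fee, small_order_fee)
```

Under this assumption the sell side is three times as expensive as the buy side, and the floor only matters for very small orders.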
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+qlib_init:
+    provider_uri: "~/.qlib/qlib_data/cn_data"
+    region: cn
+market: &market csi500
+benchmark: &benchmark SH000905
+data_handler_config: &data_handler_config
+    start_time: 2008-01-01
+    end_time: 2020-08-01
+    fit_start_time: 2008-01-01
+    fit_end_time: 2014-12-31
+    instruments: *market
+    infer_processors: []
+    learn_processors:
+        - class: DropnaLabel
+        - class: CSRankNorm
+          kwargs:
+              fields_group: label
+    label: ["Ref($close, -2) / Ref($close, -1) - 1"]
+port_analysis_config: &port_analysis_config
+    strategy:
+        class: TopkDropoutStrategy
+        module_path: qlib.contrib.strategy
+        kwargs:
+            signal:
+                - <MODEL>
+                - <DATASET>
+            topk: 50
+            n_drop: 5
+    backtest:
+        start_time: 2017-01-01
+        end_time: 2020-08-01
+        account: 100000000
+        benchmark: *benchmark
+        exchange_kwargs:
+            limit_threshold: 0.095
+            deal_price: close
+            open_cost: 0.0005
+            close_cost: 0.0015
+            min_cost: 5
+task:
+    model:
+        class: LGBModel
+        module_path: qlib.contrib.model.gbdt
+        kwargs:
+            loss: mse
+            colsample_bytree: 0.8879
+            learning_rate: 0.0421
+            subsample: 0.8789
+            lambda_l1: 205.6999
+            lambda_l2: 580.9768
+            max_depth: 8
+            num_leaves: 210
+            num_threads: 20
+    dataset:
+        class: DatasetH
+        module_path: qlib.data.dataset
+        kwargs:
+            handler:
+                class: Alpha360
+                module_path: qlib.contrib.data.handler
+                kwargs: *data_handler_config
+            segments:
+                train: [2008-01-01, 2014-12-31]
+                valid: [2015-01-01, 2016-12-31]
+                test: [2017-01-01, 2020-08-01]
+    record:
+        - class: SignalRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+              model: <MODEL>
+              dataset: <DATASET>
+        - class: SigAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+              ana_long_short: False
+              ann_scaler: 252
+        - class: PortAnaRecord
+          module_path: qlib.workflow.record_temp
+          kwargs:
+              config: *port_analysis_config
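The label expression `Ref($close, -2) / Ref($close, -1) - 1` in this config can be read as the return from the close one day ahead to the close two days ahead, since a negative offset in `Ref` looks forward in time. The sketch below reproduces that arithmetic on a plain list of closes for a single instrument. The function name `forward_return_labels` and the sample prices are made up for illustration, and the code does not use Qlib's expression engine.

```python
from typing import List, Optional


def forward_return_labels(close: List[float]) -> List[Optional[float]]:
    """For each day t, compute close[t+2] / close[t+1] - 1.

    Days without two future closes get None, which is the kind of row
    a processor such as DropnaLabel would remove before training.
    """
    labels: List[Optional[float]] = []
    for t in range(len(close)):
        if t + 2 < len(close):
            labels.append(close[t + 2] / close[t + 1] - 1)
        else:
            labels.append(None)
    return labels


# Hypothetical closing prices for one instrument on four consecutive days.
close = [10.0, 11.0, 12.1, 12.1]
labels = forward_return_labels(close)
print(labels)
```

The last two days have no label because the required future closes do not exist yet, which is why the config pairs this label with `DropnaLabel` in `learn_processors`.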

examples/benchmarks/README.md

Lines changed: 36 additions & 2 deletions
@@ -20,7 +20,9 @@ The numbers shown below demonstrate the performance of the entire `workflow` of
 > NOTE:
 > We have very limited resources to implement and finetune the models. We did our best to compare these models fairly, but some models may have greater potential than the table below suggests. Contributions that explore their potential are highly welcome.
 
-## Alpha158 dataset
+## Results on CSI300
+
+### Alpha158 dataset
 
 | Model Name | Dataset | IC | ICIR | Rank IC | Rank ICIR | Annualized Return | Information Ratio | Max Drawdown |
 |------------------------------------------|-------------------------------------|-------------|-------------|-------------|-------------|-------------------|-------------------|--------------|
@@ -44,7 +46,7 @@ The numbers shown below demonstrate the performance of the entire `workflow` of
 | DoubleEnsemble(Chuheng Zhang, et al.) | Alpha158 | 0.0544±0.00 | 0.4340±0.00 | 0.0523±0.00 | 0.4284±0.01 | 0.1168±0.01 | 1.3384±0.12 | -0.1036±0.01 |
 
 
-## Alpha360 dataset
+### Alpha360 dataset
 
 | Model Name | Dataset | IC | ICIR | Rank IC | Rank ICIR | Annualized Return | Information Ratio | Max Drawdown |
 |-------------------------------------------|----------|-------------|-------------|-------------|-------------|-------------------|-------------------|--------------|
@@ -79,6 +81,38 @@ The numbers shown below demonstrate the performance of the entire `workflow` of
 - Signal-based evaluation: IC, ICIR, Rank IC, Rank ICIR
 - Portfolio-based metrics: Annualized Return, Information Ratio, Max Drawdown
 
+## Results on CSI500
+The results on CSI500 are not complete. PRs for models on CSI500 are welcome!
+
+Transferring the previous models from CSI300 to CSI500 is quite easy. You can try a model with just the few commands below.
+```
+cd examples/benchmarks/LightGBM
+pip install -r requirements.txt
+
+# create a new config and set the market and benchmark to csi500
+cp workflow_config_lightgbm_Alpha158.yaml workflow_config_lightgbm_Alpha158_csi500.yaml
+sed -i "s/csi300/csi500/g" workflow_config_lightgbm_Alpha158_csi500.yaml
+sed -i "s/SH000300/SH000905/g" workflow_config_lightgbm_Alpha158_csi500.yaml
+
+# you can either run the model once
+qrun workflow_config_lightgbm_Alpha158_csi500.yaml
+
+# or run it multiple times automatically and get the summarized results
+cd ../../
+python run_all_model.py run 3 lightgbm Alpha158 csi500 # for models with randomness, please run it 20 times
+```
+
+### Alpha158 dataset
+
+| Model Name | Dataset | IC | ICIR | Rank IC | Rank ICIR | Annualized Return | Information Ratio | Max Drawdown |
+|------------|----------|-------------|-------------|-------------|-------------|-------------------|-------------------|--------------|
+| LightGBM | Alpha158 | 0.0377±0.00 | 0.3860±0.00 | 0.0448±0.00 | 0.4675±0.00 | 0.1151±0.00 | 1.3884±0.00 | -0.0898±0.00 |
+
+### Alpha360 dataset
+| Model Name | Dataset | IC | ICIR | Rank IC | Rank ICIR | Annualized Return | Information Ratio | Max Drawdown |
+|------------|----------|-------------|-------------|-------------|-------------|-------------------|-------------------|--------------|
+| LightGBM | Alpha360 | 0.0400±0.00 | 0.3605±0.00 | 0.0536±0.00 | 0.5431±0.00 | 0.0505±0.00 | 0.7658±0.02 | -0.1880±0.00 |
+
 
 # Contributing
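The `cp` + `sed` steps in the README hunk above amount to two plain text substitutions on a copy of the CSI300 config. For readers on platforms where `sed -i` behaves differently, here is a rough Python equivalent. The function name `to_csi500` is hypothetical, and the sample text is a two-line excerpt rather than a full config.

```python
def to_csi500(config_text: str) -> str:
    """Apply the same two substitutions as the sed commands:
    csi300 -> csi500 (market) and SH000300 -> SH000905 (benchmark index)."""
    return config_text.replace("csi300", "csi500").replace("SH000300", "SH000905")


# Two-line excerpt standing in for the CSI300 workflow config.
sample = "market: &market csi300\nbenchmark: &benchmark SH000300\n"
converted = to_csi500(sample)
print(converted)

# To convert a real file, read the source config and write the result to a
# new path ending in `_csi500.yaml`, for example:
#   from pathlib import Path
#   src = Path("workflow_config_lightgbm_Alpha158.yaml")
#   dst = src.with_name(src.stem + "_csi500.yaml")
#   dst.write_text(to_csi500(src.read_text()))
```

The `_csi500` suffix on the output name matters because `run_all_model.py` selects the config by that suffix when a universe is passed.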

examples/run_all_model.py

Lines changed: 21 additions & 3 deletions
@@ -117,8 +117,10 @@ def get_all_folders(models, exclude) -> dict:
 
 
 # function to get all the files under the model folder
-def get_all_files(folder_path, dataset) -> (str, str):
-    yaml_path = str(Path(f"{folder_path}") / f"*{dataset}*.yaml")
+def get_all_files(folder_path, dataset, universe="") -> (str, str):
+    if universe != "":
+        universe = f"_{universe}"
+    yaml_path = str(Path(f"{folder_path}") / f"*{dataset}{universe}.yaml")
     req_path = str(Path(f"{folder_path}") / f"*.txt")
     yaml_file = glob.glob(yaml_path)
     req_file = glob.glob(req_path)
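The pattern change in `get_all_files` also drops the trailing `*` after the dataset name, so the default call no longer picks up universe-specific configs. The sketch below illustrates this with `fnmatch.fnmatchcase`, which applies the same wildcard rules that `glob` uses for file names. The helper `yaml_pattern` is a hypothetical stand-in for the pattern-building lines of the new function, and the file list is limited to the two config names mentioned in this commit.

```python
from fnmatch import fnmatchcase


def yaml_pattern(dataset: str, universe: str = "") -> str:
    """Build the file-name pattern the way the new get_all_files does."""
    suffix = f"_{universe}" if universe != "" else ""
    return f"*{dataset}{suffix}.yaml"


files = [
    "workflow_config_lightgbm_Alpha158.yaml",
    "workflow_config_lightgbm_Alpha158_csi500.yaml",
]

# New behaviour: each call selects exactly one of the two configs.
default_matches = [f for f in files if fnmatchcase(f, yaml_pattern("Alpha158"))]
csi500_matches = [f for f in files if fnmatchcase(f, yaml_pattern("Alpha158", "csi500"))]

# Old behaviour: the pattern "*Alpha158*.yaml" matches both files.
old_matches = [f for f in files if fnmatchcase(f, "*Alpha158*.yaml")]

print(default_matches, csi500_matches, old_matches)
```

With the old pattern, adding a `_csi500` config to a model folder would have made the dataset match ambiguous, which is presumably why the wildcard was tightened.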
@@ -224,6 +226,7 @@ def run(
         times=1,
         models=None,
         dataset="Alpha360",
+        universe="",
         exclude=False,
         qlib_uri: str = "git+https://github.com/microsoft/qlib#egg=pyqlib",
         exp_folder_name: str = "run_all_model_records",
@@ -245,6 +248,9 @@ def run(
             determines whether the model being used is excluded or included.
         dataset : str
             determines the dataset to be used for each model.
+        universe : str
+            the stock universe of the dataset.
+            default "" indicates that no universe suffix is appended to the config file name
         qlib_uri : str
             the uri to install qlib with pip
             it could be a URL on the web or a local path (NOTE: the local path must be an absolute path)
@@ -259,6 +265,15 @@ def run(
         -------
         Here are some use cases of the function in bash:
 
+        run_all_model will decide which config to run based on `models`, `dataset` and `universe`.
+        Example 1):
+
+            models="lightgbm", dataset="Alpha158", universe="" will result in running the following config
+            examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158.yaml
+
+            models="lightgbm", dataset="Alpha158", universe="csi500" will result in running the following config
+            examples/benchmarks/LightGBM/workflow_config_lightgbm_Alpha158_csi500.yaml
+
         .. code-block:: bash
 
             # Case 1 - run all models multiple times
@@ -279,6 +294,9 @@ def run(
             # Case 6 - run all models except those given as arguments, one time
             python run_all_model.py run --models=[mlp,tft,sfm] --exclude=True
 
+            # Case 7 - run the lightgbm model on csi500 three times
+            python run_all_model.py run 3 lightgbm Alpha158 csi500
+
         """
         self._init_qlib(exp_folder_name)

@@ -290,7 +308,7 @@ def run(
         for fn in folders:
             # get all files
             sys.stderr.write("Retrieving files...\n")
-            yaml_path, req_path = get_all_files(folders[fn], dataset)
+            yaml_path, req_path = get_all_files(folders[fn], dataset, universe=universe)
             if yaml_path is None:
                 sys.stderr.write(f"There is no {dataset}.yaml file in {folders[fn]}")
                 continue
