8 files changed +24 -18 lines changed
Original file line number Diff line number Diff line change
1414
1515
1616 runner:
17-   raw_file_dir: "path" # raw_data dir
17+   raw_file_dir: "raw_file/train" # raw_data dir
1818   raw_filled_file_dir: "./raw_data" # raw_data_filled dir
1919   train_data_dir: "./train_data_full" # train datasets
2020   test_data_dir: "./test_data_full" # test datasets
@@ -59,7 +59,7 @@ def __init__(self, config):
5959         self.min_threshold = self.config.get("runner.min_threshold")
6060         self.feature_map_cache = self.config.get("runner.feature_map_cache")
6161
62-         # self.filled_raw()
62+         self.filled_raw()
6363
6464         self.init()
6565
22 #### 1. Get raw datasets:
33 You can go to: [https://www.kaggle.com/c/avazu-ctr-prediction/data](https://www.kaggle.com/c/avazu-ctr-prediction)
44
5- Set the directory of the downloaded raw data in data_config.yaml, then run the command to generate the full dataset.
5+ After unzipping the downloaded data, keep only the training set and name it `train`.
66
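77 | Name | Description |
88 | -------- | -------- |
9- | raw_file_dir | Raw dataset directory |
9+ | raw_file | Raw dataset directory |
1010 | raw_filled_file_dir | Directory for raw data after missing-value handling |
1111 | train_data_dir | Training set directory |
1212 | test_data_dir | Test set directory |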
@@ -15,9 +15,9 @@ you can go to:[https://www.kaggle.com/c/avazu-ctr-prediction/data](https://www
1515 | feature_map_cache | Feature cache data |
1616
1717
18-
18+ Then run the script:
1919``` bash
20- sh data_process.sh
20+ sh run.sh
2121```
2222 #### 2. Get preprocessed datasets:
2323 You can also go to: [AiStudio dataset](https://aistudio.baidu.com/aistudio/datasetdetail/125200)
1+ mkdir train_data_full
2+ mkdir test_data_full
3+ mkdir raw_file
4+ mkdir raw_filled_file_dir
5+ mv train ./raw_file
6+
17python preprocess.py -m data_config.yaml
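
The setup steps this script adds (create four directories, then move `train` into `raw_file`) can also be sketched in Python. This is a minimal sketch and not part of the commit: the helper name `prepare_layout` is hypothetical, and it assumes the unzipped training file is named `train` and sits in the base directory, as the README change describes. Unlike plain `mkdir`, it does not fail when a directory already exists, so it can be re-run safely.

```python
import shutil
from pathlib import Path

# Directory names taken from the mkdir lines added in this commit.
DIRS = ("train_data_full", "test_data_full", "raw_file", "raw_filled_file_dir")


def prepare_layout(base: Path) -> Path:
    """Create the data directories under `base` and move `train` into raw_file/.

    Hypothetical helper: mirrors the mkdir/mv steps but is safe to re-run.
    Returns the expected location of the raw training file.
    """
    for name in DIRS:
        # exist_ok=True turns a repeated run into a no-op instead of an error.
        (base / name).mkdir(parents=True, exist_ok=True)

    source = base / "train"
    target = base / "raw_file" / "train"
    # Move only when the file is still in its original place.
    if source.is_file() and not target.exists():
        shutil.move(str(source), str(target))
    return target
```

Calling `prepare_layout(Path("."))` from the dataset directory would reproduce the layout that `raw_file_dir: "raw_file/train"` points to; the preprocessing command itself is left out of the sketch.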
44
55```
66 ├── data # sample data
7- ├── sample_data # sample data
8- ├── train
9- ├── sample_train.txt # sample training data
7+ ├── sample_train.txt # sample training data
108 ├── __init__.py
119 ├── README.md # documentation
1210 ├── config.yaml # sample data configuration
@@ -63,8 +63,14 @@ os : windows/linux/macos
6363
6464 ## Quick Start
6565
66-
67- This document provides the [FLEN-Paddle AiStudio project](https://aistudio.baidu.com/aistudio/projectdetail/3247609) for a quick trial; open the project to get started.
66+ This document provides sample data for a quick trial, and the commands can be run from any directory. The quick-start commands for the FLEN model directory are:
67+ ``` bash
68+ # Enter the model directory
69+ # cd models/rank/flen # can be run from any directory
70+ # Dynamic-graph training
71+ python -u ../../../tools/trainer.py -m config.yaml # for the full dataset, use config_bigdata.yaml
72+ # Dynamic-graph inference
73+ python -u ../../../tools/infer.py -m config.yaml # for the full dataset, use config_bigdata.yaml
6874
6975
7076 ## Model Architecture
1414
1515
1616 runner:
17-   train_data_dir: "./data/sample_data/dataset"
17+   train_data_dir: "./data/sample_data/train"
1818   train_reader_path: "avazu_reader" # importlib format
1919   use_gpu: False
2020   use_auc: True
@@ -25,7 +25,7 @@ runner:
2525
2626 # model_init_path: "output_model/0" # init model
2727   model_save_path: "output_model_flen"
28-   test_data_dir: "./data/sample_data/dataset" # "../../../../data/test"
28+   test_data_dir: "./data/sample_data/train" # "../../../../data/test"
2929   infer_reader_path: "avazu_reader" # importlib format
3030   infer_batch_size: 3 # 512
3131   infer_load_path: "output_model_flen"
@@ -41,7 +41,7 @@ hyper_parameters:
4141   learning_rate: 0.04
4242   strategy: async
4343   # user-defined <key, value> pairs
44-   sparse_inputs_slots: 23
44+   sparse_inputs_slots: 22
4545   sparse_feature_number: 20 # 1544488
4646   sparse_num_field: 3
4747   sparse_feature_dim: 32
66
77```
88 ├── data # sample data
9- ├── train
10- ├── train.txt
11- ├── test
12- ├── test.txt
139 ├── ratings.txt
1410 ├── trusts.txt
1511├── __init__.py