Commit 616e9e4

[ModelZoo] Update modelzoo README. (#916)
Models: bst/dbmtl/dcn/deepfm/dien/din/dlrm/dssm/esmm/mmoe/ple/simple_multitask/wide_and_deep

Signed-off-by: candy.dc <[email protected]>
1 parent 56cc51e commit 616e9e4

13 files changed, with 267 additions and 268 deletions.

modelzoo/bst/README.md

Lines changed: 21 additions & 21 deletions
@@ -127,7 +127,7 @@ input:
 - `--data_location`: Full path of train & eval data, default to `./data`.
 - `--steps`: Set the number of steps on train dataset. Default will be set to 100 epoch.
 - `--no_eval`: Do not evaluate trained model by eval dataset.
-- `--batch_size`: Batch size to train. Default to 512.
+- `--batch_size`: Batch size to train. Default to 2048.
 - `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
 - `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
 - `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
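Review note: the flags in this hunk can be pictured as an `argparse` parser. The sketch below is a hypothetical reconstruction from the README text only, not the actual training script. The function name `build_parser` and the use of `None` as a stand-in for the computed `--steps` and `--checkpoint` defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in the README hunk."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented training flags")
    parser.add_argument("--data_location", default="./data",
                        help="Full path of train & eval data.")
    # Assumption: None stands in for "derive the step count from the epoch default".
    parser.add_argument("--steps", type=int, default=None,
                        help="Number of steps on the train dataset.")
    parser.add_argument("--no_eval", action="store_true",
                        help="Do not evaluate the trained model on the eval dataset.")
    # The documented default after this commit is 2048 (it was 512 before).
    parser.add_argument("--batch_size", type=int, default=2048,
                        help="Batch size to train.")
    parser.add_argument("--output_dir", default="./result",
                        help="Output directory for logs and saved model.")
    # Assumption: None stands in for the templated default path in the README.
    parser.add_argument("--checkpoint", default=None,
                        help="Checkpoints input/output directory.")
    parser.add_argument("--save_steps", type=int, default=0,
                        help="Steps between checkpoints; zero disables saving.")
    return parser


if __name__ == "__main__":
    # Reproduce the pre-commit batch size explicitly and skip evaluation.
    args = build_parser().parse_args(["--batch_size", "512", "--no_eval"])
    print(args.batch_size, args.no_eval, args.output_dir)
```

With a parser shaped like this, anyone who relied on the old default would need to pass `--batch_size 512` explicitly to reproduce earlier runs.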
@@ -157,21 +157,20 @@ input:
 ## Benchmark
 ### Stand-alone Training
 #### Test Environment
-The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
+The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
 - Hardware
-- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
-- CPU(s): 8
+- Model name: Intel(R) Xeon(R) Platinum 8475B
+- CPU(s): 16
 - Socket(s): 1
-- Core(s) per socket: 4
+- Core(s) per socket: 8
 - Thread(s) per core: 2
-- Memory: 32G

+- Memory: 64G

 - Software
-- kernel: 4.18.0-348.2.1.el8_5.x86_64
-- OS: CentOS Linux release 8.5.2111
-- GCC: 8.5.0
-- Docker: 20.10.12
-- Python: 3.6.8
+- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
+- OS: Ubuntu 22.04.2 LTS
+- GCC: 11.3.0
+- Docker: 20.10.21

 #### Performance Result

@@ -182,33 +181,34 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
 <td>DType</td>
 <td>Accuracy</td>
 <td>AUC</td>
-<td>Globalsetp/Sec</td>
+<td>Throughput</td>
 </tr>
 <tr>
 <td rowspan="3">BST</td>
 <td>Community TensorFlow</td>
 <td>FP32</td>
-<td></td>
-<td></td>
-<td></td>
+<td>0.912500</td>
+<td>0.499316</td>
+<td>16924.47(baseline)</td>
 </tr>
 <tr>
 <td>DeepRec w/ oneDNN</td>
 <td>FP32</td>
-<td></td>
-<td></td>
-<td></td>
+<td>0.894900</td>
+<td>0.499316</td>
+<td>22143.04(1.30x)</td>
 </tr>
 <tr>
 <td>DeepRec w/ oneDNN</td>
 <td>FP32+BF16</td>
-<td></td>
-<td></td>
-<td></td>
+<td>0.909099</td>
+<td>0.499316</td>
+<td>28686.70(1.69x)</td>
 </tr>
 </table>

 - Community TensorFlow version is v1.15.5.
+- Due to the small size of the dataset, the results did not converge, leading to limited reference value for ACC and AUC.

 ### Distributed Training
 #### Test Environment

modelzoo/dbmtl/README.md

Lines changed: 22 additions & 21 deletions
@@ -121,7 +121,7 @@ Context │ │────►│ │
 - `--data_location`: Full path of train & eval data. Default is `./data`.
 - `--steps`: Set the number of steps on train dataset. When default(`0`) is used, the number of steps is computed based on dataset size and number of epochs equals 1000.
 - `--no_eval`: Do not evaluate trained model by eval dataset.
-- `--batch_size`: Batch size to train. Default is `512`.
+- `--batch_size`: Batch size to train. Default is `2048`.
 - `--output_dir`: Full path to output directory for logs and saved model. Default is `./result`.
 - `--checkpoint`: Full path to checkpoints output directory. Default is `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMP)`
 - `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to `None`.
@@ -151,20 +151,20 @@ Context │ │────►│ │
 ### Stand-alone Training

 #### Test Environment
-The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
-- Hardware
-- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
-- CPU(s): 8
+The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
+- Hardware
+- Model name: Intel(R) Xeon(R) Platinum 8475B
+- CPU(s): 16
 - Socket(s): 1
-- Core(s) per socket: 4
+- Core(s) per socket: 8
 - Thread(s) per core: 2
-- Memory: 32G
+- Memory: 64G

 - Software
-- kernel: 4.18.0-305.12.1.el8_4.x86_64
-- OS: CentOS Linux release 8.4.2105
-- Docker: 20.10.12
-- Python: 3.6.12
+- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
+- OS: Ubuntu 22.04.2 LTS
+- GCC: 11.3.0
+- Docker: 20.10.21

 #### Performance Result

@@ -175,33 +175,34 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
 <td>DType</td>
 <td>Accuracy</td>
 <td>AUC</td>
-<td>Globalsetp/Sec</td>
+<td>Throughput</td>
 </tr>
 <tr>
 <td rowspan="3">DBMTL</td>
 <td>Community TensorFlow</td>
 <td>FP32</td>
-<td></td>
-<td></td>
-<td></td>
+<td>0.973150</td>
+<td>0.753008</td>
+<td>63220.87(baseline)</td>
 </tr>
 <tr>
 <td>DeepRec w/ oneDNN</td>
 <td>FP32</td>
-<td></td>
-<td></td>
-<td></td>
+<td>0.973150</td>
+<td>0.753070</td>
+<td>77383.57(1.22x)</td>
 </tr>
 <tr>
 <td>DeepRec w/ oneDNN</td>
 <td>FP32+BF16</td>
-<td></td>
-<td></td>
-<td></td>
+<td>0.973150</td>
+<td>0.753070</td>
+<td>137581.54(2.17x)</td>
 </tr>
 </table>

 - Community TensorFlow version is v1.15.5.
+- Due to the small size of the dataset, the results did not converge, leading to limited reference value for ACC and AUC.

 ## Dataset
 Train & eval dataset using ***Taobao dataset***.

modelzoo/dcn/README.md

Lines changed: 19 additions & 20 deletions
@@ -98,7 +98,7 @@ The following is a brief directory structure and description for this example:
 - `--data_location`: Full path of train & eval data, default to `./data`.
 - `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
 - `--no_eval`: Do not evaluate trained model by eval dataset.
-- `--batch_size`: Batch size to train. Default to 512.
+- `--batch_size`: Batch size to train. Default to 2048.
 - `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
 - `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
 - `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -128,21 +128,20 @@ The following is a brief directory structure and description for this example:
 ## Benchmark
 ### Stand-alone Training
 #### Test Environment
-The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
+The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
 - Hardware
-- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
-- CPU(s): 8
+- Model name: Intel(R) Xeon(R) Platinum 8475B
+- CPU(s): 16
 - Socket(s): 1
-- Core(s) per socket: 4
+- Core(s) per socket: 8
 - Thread(s) per core: 2
-- Memory: 32G
+- Memory: 64G

 - Software
-- kernel: 4.18.0-348.2.1.el8_5.x86_64
-- OS: CentOS Linux release 8.5.2111
-- GCC: 8.5.0
-- Docker: 20.10.12
-- Python: 3.6.8
+- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
+- OS: Ubuntu 22.04.2 LTS
+- GCC: 11.3.0
+- Docker: 20.10.21

 #### Performance Result

@@ -159,23 +158,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
 <td rowspan="3">DCN</td>
 <td>Community TensorFlow</td>
 <td>FP32</td>
-<td>0.775859</td>
-<td>0.768275</td>
-<td></td>
+<td>0.776260</td>
+<td>0.769636</td>
+<td>24524.91(baseline)</td>
 </tr>
 <tr>
 <td>DeepRec w/ oneDNN</td>
 <td>FP32</td>
-<td></td>
-<td></td>
-<td></td>
+<td>0.775738</td>
+<td>0.769095</td>
+<td>31917.35(1.30x)</td>
 </tr>
 <tr>
 <td>DeepRec w/ oneDNN</td>
 <td>FP32+BF16</td>
-<td></td>
-<td></td>
-<td></td>
+<td>0.775738</td>
+<td>0.768651</td>
+<td>55753.15(2.27x)</td>
 </tr>
 </table>


modelzoo/deepfm/README.md

Lines changed: 19 additions & 20 deletions
@@ -123,7 +123,7 @@ input: | |
 - `--data_location`: Full path of train & eval data, default to `./data`.
 - `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
 - `--no_eval`: Do not evaluate trained model by eval dataset.
-- `--batch_size`: Batch size to train. Default to 512.
+- `--batch_size`: Batch size to train. Default to 2048.
 - `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
 - `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
 - `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -153,21 +153,20 @@ input: | |
 ## Benchmark
 ### Stand-alone Training
 #### Test Environment
-The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
+The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
 - Hardware
-- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
-- CPU(s): 8
+- Model name: Intel(R) Xeon(R) Platinum 8475B
+- CPU(s): 16
 - Socket(s): 1
-- Core(s) per socket: 4
+- Core(s) per socket: 8
 - Thread(s) per core: 2
-- Memory: 32G
+- Memory: 64G

 - Software
-- kernel: 4.18.0-348.2.1.el8_5.x86_64
-- OS: CentOS Linux release 8.5.2111
-- GCC: 8.5.0
-- Docker: 20.10.12
-- Python: 3.6.8
+- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
+- OS: Ubuntu 22.04.2 LTS
+- GCC: 11.3.0
+- Docker: 20.10.21

 #### Performance Result

@@ -184,23 +183,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
 <td rowspan="3">DeepFM</td>
 <td>Community TensorFlow</td>
 <td>FP32</td>
-<td>0.784695</td>
-<td>0.781548</td>
-<td>18848.64(baseline)</td>
+<td>0.782777</td>
+<td>0.776113</td>
+<td>61230.80(baseline)</td>
 </tr>
 <tr>
 <td>DeepRec w/ oneDNN</td>
 <td>FP32</td>
-<td>0.782755</td>
-<td>0.777158</td>
-<td>31260.00(1.65x)</td>
+<td>0.780460</td>
+<td>0.773281</td>
+<td>74380.35(1.22x)</td>
 </tr>
 <tr>
 <td>DeepRec w/ oneDNN</td>
 <td>FP32+BF16</td>
-<td>0.782659</td>
-<td>0.776537</td>
-<td>34627.46(1.84x)</td>
+<td>0.780460</td>
+<td>0.775249</td>
+<td>95107.32(1.55x)</td>
 </tr>
 </table>


modelzoo/dien/README.md

Lines changed: 19 additions & 20 deletions
@@ -108,7 +108,7 @@ The following is a brief directory structure and description for this example:
 - `--data_location`: Full path of train & eval data, default to `./data`.
 - `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
 - `--no_eval`: Do not evaluate trained model by eval dataset.
-- `--batch_size`: Batch size to train. Default to 512.
+- `--batch_size`: Batch size to train. Default to 2048.
 - `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
 - `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
 - `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -138,21 +138,20 @@ The following is a brief directory structure and description for this example:
 ## Benchmark
 ### Stand-alone Training
 #### Test Environment
-The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
+The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
 - Hardware
-- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
-- CPU(s): 8
+- Model name: Intel(R) Xeon(R) Platinum 8475B
+- CPU(s): 16
 - Socket(s): 1
-- Core(s) per socket: 4
+- Core(s) per socket: 8
 - Thread(s) per core: 2
-- Memory: 32G
+- Memory: 64G

 - Software
-- kernel: 4.18.0-348.2.1.el8_5.x86_64
-- OS: CentOS Linux release 8.5.2111
-- GCC: 8.5.0
-- Docker: 20.10.12
-- Python: 3.6.8
+- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
+- OS: Ubuntu 22.04.2 LTS
+- GCC: 11.3.0
+- Docker: 20.10.21

 #### Performance Result

@@ -169,23 +168,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
 <td rowspan="3">DIEN</td>
 <td>Community TensorFlow</td>
 <td>FP32</td>
-<td>0.681824</td>
-<td>0.757496</td>
-<td>2822.78(baseline)</td>
+<td>0.575529</td>
+<td>0.597272</td>
+<td>6327.50(baseline)</td>
 </tr>
 <tr>
 <td>DeepRec w/ oneDNN</td>
 <td>FP32</td>
-<td>0.692499</td>
-<td>0.767193</td>
-<td>3834.05(1.36x)</td>
+<td>0.543935</td>
+<td>0.5972728</td>
+<td>10094.21(1.60x)</td>
 </tr>
 <tr>
 <td>DeepRec w/ oneDNN</td>
 <td>FP32+BF16</td>
-<td>0.693011</td>
-<td>0.768412</td>
-<td>3862.06(1.37x)</td>
+<td>0.551233</td>
+<td>0.597272</td>
+<td>11565.63(1.83x)</td>
 </tr>
 </table>

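Review note: the speedup factors in parentheses in the updated tables appear to be each configuration's throughput divided by the Community TensorFlow FP32 baseline. The sketch below recomputes those ratios from the figures added in this commit for the five models shown here. The names `THROUGHPUT`, `LISTED` and `speedup` are illustrative and not part of the repository.

```python
# Throughput figures copied from the added (+) rows of the tables in this diff:
# (Community TensorFlow FP32 baseline, DeepRec FP32, DeepRec FP32+BF16).
THROUGHPUT = {
    "BST": (16924.47, 22143.04, 28686.70),
    "DBMTL": (63220.87, 77383.57, 137581.54),
    "DCN": (24524.91, 31917.35, 55753.15),
    "DeepFM": (61230.80, 74380.35, 95107.32),
    "DIEN": (6327.50, 10094.21, 11565.63),
}

# Speedup factors as printed in the READMEs (DeepRec FP32, DeepRec FP32+BF16).
LISTED = {
    "BST": (1.30, 1.69),
    "DBMTL": (1.22, 2.17),
    "DCN": (1.30, 2.27),
    "DeepFM": (1.22, 1.55),
    "DIEN": (1.60, 1.83),
}


def speedup(baseline: float, value: float) -> float:
    """Return the throughput of a configuration relative to the baseline."""
    return value / baseline


if __name__ == "__main__":
    for model, (base, fp32, bf16) in THROUGHPUT.items():
        listed_fp32, listed_bf16 = LISTED[model]
        print(f"{model}: FP32 {speedup(base, fp32):.3f}x (listed {listed_fp32:.2f}x), "
              f"FP32+BF16 {speedup(base, bf16):.3f}x (listed {listed_bf16:.2f}x)")
```

Every recomputed ratio agrees with the listed factor to within 0.01. The READMEs do not seem to round consistently, though: BST FP32 works out to roughly 1.308 but is listed as 1.30x, while DIEN FP32 works out to roughly 1.595 and is listed as 1.60x.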
