modelzoo/bst/README.md
+21 -21 (21 additions & 21 deletions)
Original file line number
Diff line number
Diff line change
@@ -127,7 +127,7 @@ input:
127
127
- `--data_location`: Full path of train & eval data, default to `./data`.
128
128
- `--steps`: Set the number of steps on train dataset. Default will be set to 100 epoch.
129
129
- `--no_eval`: Do not evaluate trained model by eval dataset.
130
-
- `--batch_size`: Batch size to train. Default to 512.
130
+
- `--batch_size`: Batch size to train. Default to 2048.
131
131
- `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
132
132
- `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
133
133
- `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -157,21 +157,20 @@ input:
157
157
## Benchmark
158
158
### Stand-alone Training
159
159
#### Test Environment
160
-
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
160
+
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
161
161
- Hardware
162
-
- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
163
-
- CPU(s): 8
162
+
- Model name: Intel(R) Xeon(R) Platinum 8475B
163
+
- CPU(s): 16
164
164
- Socket(s): 1
165
-
- Core(s) per socket: 4
165
+
- Core(s) per socket: 8
166
166
- Thread(s) per core: 2
167
-
- Memory: 32G
167
+
- Memory: 64G
168
168
169
169
- Software
170
-
- kernel: 4.18.0-348.2.1.el8_5.x86_64
171
-
- OS: CentOS Linux release 8.5.2111
172
-
- GCC: 8.5.0
173
-
- Docker: 20.10.12
174
-
- Python: 3.6.8
170
+
- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
171
+
- OS: Ubuntu 22.04.2 LTS
172
+
- GCC: 11.3.0
173
+
- Docker: 20.10.21
175
174
176
175
#### Performance Result
177
176
@@ -182,33 +181,34 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
182
181
<td>DType</td>
183
182
<td>Accuracy</td>
184
183
<td>AUC</td>
185
-
<td>Globalsetp/Sec</td>
184
+
<td>Throughput</td>
186
185
</tr>
187
186
<tr>
188
187
<td rowspan="3">BST</td>
189
188
<td>Community TensorFlow</td>
190
189
<td>FP32</td>
191
-
<td></td>
192
-
<td></td>
193
-
<td></td>
190
+
<td>0.912500</td>
191
+
<td>0.499316</td>
192
+
<td>16924.47(baseline)</td>
194
193
</tr>
195
194
<tr>
196
195
<td>DeepRec w/ oneDNN</td>
197
196
<td>FP32</td>
198
-
<td></td>
199
-
<td></td>
200
-
<td></td>
197
+
<td>0.894900</td>
198
+
<td>0.499316</td>
199
+
<td>22143.04(1.30x)</td>
201
200
</tr>
202
201
<tr>
203
202
<td>DeepRec w/ oneDNN</td>
204
203
<td>FP32+BF16</td>
205
-
<td></td>
206
-
<td></td>
207
-
<td></td>
204
+
<td>0.909099</td>
205
+
<td>0.499316</td>
206
+
<td>28686.70(1.69x)</td>
208
207
</tr>
209
208
</table>
210
209
211
210
- Community TensorFlow version is v1.15.5.
211
+
- Due to the small size of the dataset, the results did not converge, leading to limited reference value for ACC and AUC.
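The multipliers in parentheses in the Throughput column appear to be each configuration's throughput divided by the Community TensorFlow baseline, truncated (not rounded) to two decimals. The truncation is inferred from the reported numbers and is not stated in the README. A minimal Python sketch of that calculation, where `speedup_label` is a hypothetical helper name:

```python
import math


def speedup_label(throughput, baseline, digits=2):
    """Return a label such as '1.30x' for throughput relative to baseline.

    Assumption: the ratio is truncated, not rounded, because that is what
    reproduces the figures shown in the tables.
    """
    ratio = throughput / baseline
    factor = 10 ** digits
    truncated = math.floor(ratio * factor) / factor
    return f"{truncated:.{digits}f}x"


# BST throughput values taken from the table above.
bst_baseline = 16924.47
print(speedup_label(22143.04, bst_baseline))  # DeepRec w/ oneDNN, FP32
print(speedup_label(28686.70, bst_baseline))  # DeepRec w/ oneDNN, FP32+BF16
```

Rounding instead of truncating would give 1.31x for the FP32 row, so the two conventions are not interchangeable when comparing against the table.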
modelzoo/dbmtl/README.md
+22 -21 (22 additions & 21 deletions)
@@ -121,7 +121,7 @@ Context │ │────►│ │
121
121
- `--data_location`: Full path of train & eval data. Default is `./data`.
122
122
- `--steps`: Set the number of steps on train dataset. When default(`0`) is used, the number of steps is computed based on dataset size and number of epochs equals 1000.
123
123
- `--no_eval`: Do not evaluate trained model by eval dataset.
124
-
- `--batch_size`: Batch size to train. Default is `512`.
124
+
- `--batch_size`: Batch size to train. Default is `2048`.
125
125
- `--output_dir`: Full path to output directory for logs and saved model. Default is `./result`.
126
126
- `--checkpoint`: Full path to checkpoints output directory. Default is `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMP)`
127
127
- `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to `None`.
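The options listed in this hunk could be declared along the following lines. This is only a sketch that assumes an `argparse`-based entry point. The function name `build_parser` is hypothetical, the defaults mirror the README text after this change, and the real training script may define these options differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options documented above."""
    parser = argparse.ArgumentParser(description="DBMTL training options (sketch)")
    parser.add_argument("--data_location", default="./data",
                        help="Full path of train & eval data.")
    # 0 means: derive the step count from dataset size and epoch count.
    parser.add_argument("--steps", type=int, default=0,
                        help="Number of steps on the train dataset.")
    parser.add_argument("--no_eval", action="store_true",
                        help="Do not evaluate the trained model.")
    parser.add_argument("--batch_size", type=int, default=2048,
                        help="Batch size to train.")
    parser.add_argument("--output_dir", default="./result",
                        help="Output directory for logs and saved model.")
    # Assumption: None stands in for the derived path under the output directory.
    parser.add_argument("--checkpoint", default=None,
                        help="Checkpoints output directory.")
    parser.add_argument("--save_steps", type=int, default=None,
                        help="Steps between checkpoint saves; zero to close.")
    return parser


args = build_parser().parse_args(["--batch_size", "512", "--no_eval"])
print(args.batch_size, args.no_eval, args.output_dir)
```

Passing `--batch_size 512` as shown restores the previous default for runs that need to stay comparable with older results.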
@@ -151,20 +151,20 @@ Context │ │────►│ │
151
151
### Stand-alone Training
152
152
153
153
#### Test Environment
154
-
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
155
-
- Hardware
156
-
- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
157
-
- CPU(s): 8
154
+
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
155
+
- Hardware
156
+
- Model name: Intel(R) Xeon(R) Platinum 8475B
157
+
- CPU(s): 16
158
158
- Socket(s): 1
159
-
- Core(s) per socket: 4
159
+
- Core(s) per socket: 8
160
160
- Thread(s) per core: 2
161
-
- Memory: 32G
161
+
- Memory: 64G
162
162
163
163
- Software
164
-
- kernel: 4.18.0-305.12.1.el8_4.x86_64
165
-
- OS: CentOS Linux release 8.4.2105
166
-
- Docker: 20.10.12
167
-
- Python: 3.6.12
164
+
- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
165
+
- OS: Ubuntu 22.04.2 LTS
166
+
- GCC: 11.3.0
167
+
- Docker: 20.10.21
168
168
169
169
#### Performance Result
170
170
@@ -175,33 +175,34 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
175
175
<td>DType</td>
176
176
<td>Accuracy</td>
177
177
<td>AUC</td>
178
-
<td>Globalsetp/Sec</td>
178
+
<td>Throughput</td>
179
179
</tr>
180
180
<tr>
181
181
<td rowspan="3">DBMTL</td>
182
182
<td>Community TensorFlow</td>
183
183
<td>FP32</td>
184
-
<td></td>
185
-
<td></td>
186
-
<td></td>
184
+
<td>0.973150</td>
185
+
<td>0.753008</td>
186
+
<td>63220.87(baseline)</td>
187
187
</tr>
188
188
<tr>
189
189
<td>DeepRec w/ oneDNN</td>
190
190
<td>FP32</td>
191
-
<td></td>
192
-
<td></td>
193
-
<td></td>
191
+
<td>0.973150</td>
192
+
<td>0.753070</td>
193
+
<td>77383.57(1.22x)</td>
194
194
</tr>
195
195
<tr>
196
196
<td>DeepRec w/ oneDNN</td>
197
197
<td>FP32+BF16</td>
198
-
<td></td>
199
-
<td></td>
200
-
<td></td>
198
+
<td>0.973150</td>
199
+
<td>0.753070</td>
200
+
<td>137581.54(2.17x)</td>
201
201
</tr>
202
202
</table>
203
203
204
204
- Community TensorFlow version is v1.15.5.
205
+
- Due to the small size of the dataset, the results did not converge, leading to limited reference value for ACC and AUC.
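The hardware entries in the Test Environment hunk can be sanity-checked against each other, assuming they follow `lscpu`-style semantics in which `CPU(s)` counts logical CPUs. That reading is an assumption, and `logical_cpus` is a hypothetical helper name:

```python
def logical_cpus(sockets, cores_per_socket, threads_per_core):
    """Logical CPU count implied by the socket/core/thread topology."""
    return sockets * cores_per_socket * threads_per_core


# New environment (ecs.g8i.4xlarge): 1 socket, 8 cores per socket, 2 threads per core.
print(logical_cpus(1, 8, 2))
# Previous environment (ecs.hfg7.2xlarge): 1 socket, 4 cores per socket, 2 threads per core.
print(logical_cpus(1, 4, 2))
```

Both results agree with the `CPU(s)` values listed before and after the change (8 and 16), so the updated hardware block is internally consistent.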
modelzoo/dcn/README.md
+19 -20 (19 additions & 20 deletions)
@@ -98,7 +98,7 @@ The following is a brief directory structure and description for this example:
98
98
- `--data_location`: Full path of train & eval data, default to `./data`.
99
99
- `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
100
100
- `--no_eval`: Do not evaluate trained model by eval dataset.
101
-
- `--batch_size`: Batch size to train. Default to 512.
101
+
- `--batch_size`: Batch size to train. Default to 2048.
102
102
- `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
103
103
- `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
104
104
- `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -128,21 +128,20 @@ The following is a brief directory structure and description for this example:
128
128
## Benchmark
129
129
### Stand-alone Training
130
130
#### Test Environment
131
-
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
131
+
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
132
132
- Hardware
133
-
- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
134
-
- CPU(s): 8
133
+
- Model name: Intel(R) Xeon(R) Platinum 8475B
134
+
- CPU(s): 16
135
135
- Socket(s): 1
136
-
- Core(s) per socket: 4
136
+
- Core(s) per socket: 8
137
137
- Thread(s) per core: 2
138
-
- Memory: 32G
138
+
- Memory: 64G
139
139
140
140
- Software
141
-
- kernel: 4.18.0-348.2.1.el8_5.x86_64
142
-
- OS: CentOS Linux release 8.5.2111
143
-
- GCC: 8.5.0
144
-
- Docker: 20.10.12
145
-
- Python: 3.6.8
141
+
- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
142
+
- OS: Ubuntu 22.04.2 LTS
143
+
- GCC: 11.3.0
144
+
- Docker: 20.10.21
146
145
147
146
#### Performance Result
148
147
@@ -159,23 +158,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
modelzoo/deepfm/README.md
+19 -20 (19 additions & 20 deletions)
@@ -123,7 +123,7 @@ input: | |
123
123
- `--data_location`: Full path of train & eval data, default to `./data`.
124
124
- `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
125
125
- `--no_eval`: Do not evaluate trained model by eval dataset.
126
-
- `--batch_size`: Batch size to train. Default to 512.
126
+
- `--batch_size`: Batch size to train. Default to 2048.
127
127
- `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
128
128
- `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
129
129
- `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -153,21 +153,20 @@ input: | |
153
153
## Benchmark
154
154
### Stand-alone Training
155
155
#### Test Environment
156
-
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
156
+
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
157
157
- Hardware
158
-
- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
159
-
- CPU(s): 8
158
+
- Model name: Intel(R) Xeon(R) Platinum 8475B
159
+
- CPU(s): 16
160
160
- Socket(s): 1
161
-
- Core(s) per socket: 4
161
+
- Core(s) per socket: 8
162
162
- Thread(s) per core: 2
163
-
- Memory: 32G
163
+
- Memory: 64G
164
164
165
165
- Software
166
-
- kernel: 4.18.0-348.2.1.el8_5.x86_64
167
-
- OS: CentOS Linux release 8.5.2111
168
-
- GCC: 8.5.0
169
-
- Docker: 20.10.12
170
-
- Python: 3.6.8
166
+
- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
167
+
- OS: Ubuntu 22.04.2 LTS
168
+
- GCC: 11.3.0
169
+
- Docker: 20.10.21
171
170
172
171
#### Performance Result
173
172
@@ -184,23 +183,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
modelzoo/dien/README.md
+19 -20 (19 additions & 20 deletions)
@@ -108,7 +108,7 @@ The following is a brief directory structure and description for this example:
108
108
- `--data_location`: Full path of train & eval data, default to `./data`.
109
109
- `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
110
110
- `--no_eval`: Do not evaluate trained model by eval dataset.
111
-
- `--batch_size`: Batch size to train. Default to 512.
111
+
- `--batch_size`: Batch size to train. Default to 2048.
112
112
- `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
113
113
- `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
114
114
- `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -138,21 +138,20 @@ The following is a brief directory structure and description for this example:
138
138
## Benchmark
139
139
### Stand-alone Training
140
140
#### Test Environment
141
-
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
141
+
The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
142
142
- Hardware
143
-
- Model name: Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
144
-
- CPU(s): 8
143
+
- Model name: Intel(R) Xeon(R) Platinum 8475B
144
+
- CPU(s): 16
145
145
- Socket(s): 1
146
-
- Core(s) per socket: 4
146
+
- Core(s) per socket: 8
147
147
- Thread(s) per core: 2
148
-
- Memory: 32G
148
+
- Memory: 64G
149
149
150
150
- Software
151
-
- kernel: 4.18.0-348.2.1.el8_5.x86_64
152
-
- OS: CentOS Linux release 8.5.2111
153
-
- GCC: 8.5.0
154
-
- Docker: 20.10.12
155
-
- Python: 3.6.8
151
+
- kernel: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
152
+
- OS: Ubuntu 22.04.2 LTS
153
+
- GCC: 11.3.0
154
+
- Docker: 20.10.21
156
155
157
156
#### Performance Result
158
157
@@ -169,23 +168,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa