DeepRec-AI
diff --git a/modelzoo/bst/README.md b/modelzoo/bst/README.md
Lines changed: 21 additions & 21 deletions
diff --git a/modelzoo/dbmtl/README.md b/modelzoo/dbmtl/README.md
Lines changed: 22 additions & 21 deletions
diff --git a/modelzoo/dcn/README.md b/modelzoo/dcn/README.md
Lines changed: 19 additions & 20 deletions
diff --git a/modelzoo/deepfm/README.md b/modelzoo/deepfm/README.md
Lines changed: 19 additions & 20 deletions
diff --git a/modelzoo/dien/README.md b/modelzoo/dien/README.md
Lines changed: 19 additions & 20 deletions
@@ -127,7 +127,7 @@ input:
       - `--data_location`: Full path of train & eval data, default to `./data`.
       - `--steps`: Set the number of steps on train dataset. Default will be set to 100 epoch.
       - `--no_eval`: Do not evaluate trained model by eval dataset.
-      - `--batch_size`: Batch size to train. Default to 512.
+      - `--batch_size`: Batch size to train. Default to 2048.
       - `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
       - `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
       - `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -157,21 +157,20 @@ input:
 ## Benchmark
 ### Stand-alone Training
 #### Test Environment
-The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
+The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
 - Hardware 
-  - Model name:          Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
-  - CPU(s):              8
+  - Model name:          Intel(R) Xeon(R) Platinum 8475B
+  - CPU(s):              16
   - Socket(s):           1
-  - Core(s) per socket:  4
+  - Core(s) per socket:  8
   - Thread(s) per core:  2
-  - Memory:              32G
+  - Memory:              64G
 
 - Software
-  - kernel:                 4.18.0-348.2.1.el8_5.x86_64
-  - OS:                     CentOS Linux release 8.5.2111
-  - GCC:                    8.5.0
-  - Docker:                 20.10.12
-  - Python:                 3.6.8
+  - kernel:                 Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
+  - OS:                     Ubuntu 22.04.2 LTS
+  - GCC:                    11.3.0
+  - Docker:                 20.10.21
 
 #### Performance Result
 
@@ -182,33 +181,34 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
         <td>DType</td>
         <td>Accuracy</td>
         <td>AUC</td>
-        <td>Globalsetp/Sec</td>
+        <td>Throughput</td>
     </tr>
     <tr>
         <td rowspan="3">BST</td>
         <td>Community TensorFlow</td>
         <td>FP32</td>
-        <td></td>
-        <td></td>
-        <td></td>
+        <td>0.912500</td>
+        <td>0.499316</td>
+        <td>16924.47(baseline)</td>
     </tr>
     <tr>
         <td>DeepRec w/ oneDNN</td>
         <td>FP32</td>
-        <td></td>
-        <td></td>
-        <td></td>
+        <td>0.894900</td>
+        <td>0.499316</td>
+        <td>22143.04(1.30x)</td>
     </tr>
     <tr>
         <td>DeepRec w/ oneDNN</td>
         <td>FP32+BF16</td>
-        <td></td>
-        <td></td>
-        <td></td>
+        <td>0.909099</td>
+        <td>0.499316</td>
+        <td>28686.70(1.69x)</td>
     </tr>
 </table>
 
 - Community TensorFlow version is v1.15.5.
+- Due to the small size of the dataset, the results did not converge, leading to limited reference value for ACC and AUC.
 
 ### Distributed Training
 #### Test Environment
 
@@ -121,7 +121,7 @@ Context     │      │────►│                  │
       - `--data_location`: Full path of train & eval data. Default is `./data`.
       - `--steps`: Set the number of steps on train dataset. When default(`0`) is used, the number of steps is computed based on dataset size and number of epochs equals 1000.
       - `--no_eval`: Do not evaluate trained model by eval dataset.
-      - `--batch_size`: Batch size to train. Default is `512`.
+      - `--batch_size`: Batch size to train. Default is `2048`.
       - `--output_dir`: Full path to output directory for logs and saved model. Default is `./result`.
       - `--checkpoint`: Full path to checkpoints output directory. Default is `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMP)`
       - `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to `None`.
@@ -151,20 +151,20 @@ Context     │      │────►│                  │
 ### Stand-alone Training
 
 #### Test Environment
-The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
-- Hardware
-  - Model name:          Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
-  - CPU(s):              8
+The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
+- Hardware 
+  - Model name:          Intel(R) Xeon(R) Platinum 8475B
+  - CPU(s):              16
   - Socket(s):           1
-  - Core(s) per socket:  4
+  - Core(s) per socket:  8
   - Thread(s) per core:  2
-  - Memory:              32G
+  - Memory:              64G
 
 - Software
-  - kernel:                 4.18.0-305.12.1.el8_4.x86_64
-  - OS:                     CentOS Linux release 8.4.2105
-  - Docker:                 20.10.12
-  - Python:                 3.6.12
+  - kernel:                 Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
+  - OS:                     Ubuntu 22.04.2 LTS
+  - GCC:                    11.3.0
+  - Docker:                 20.10.21
 
 #### Performance Result
 
@@ -175,33 +175,34 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
         <td>DType</td>
         <td>Accuracy</td>
         <td>AUC</td>
-        <td>Globalsetp/Sec</td>
+        <td>Throughput</td>
     </tr>
     <tr>
         <td rowspan="3">DBMTL</td>
         <td>Community TensorFlow</td>
         <td>FP32</td>
-        <td></td>
-        <td></td>
-        <td></td>
+        <td>0.973150</td>
+        <td>0.753008</td>
+        <td>63220.87(baseline)</td>
     </tr>
     <tr>
         <td>DeepRec w/ oneDNN</td>
         <td>FP32</td>
-        <td></td>
-        <td></td>
-        <td></td>
+        <td>0.973150</td>
+        <td>0.753070</td>
+        <td>77383.57(1.22x)</td>
     </tr>
     <tr>
         <td>DeepRec w/ oneDNN</td>
         <td>FP32+BF16</td>
-        <td></td>
-        <td></td>
-        <td></td>
+        <td>0.973150</td>
+        <td>0.753070</td>
+        <td>137581.54(2.17x)</td>
     </tr>
 </table>
 
 - Community TensorFlow version is v1.15.5.
+- Due to the small size of the dataset, the results did not converge, leading to limited reference value for ACC and AUC.
 
 ## Dataset
 Train & eval dataset using ***Taobao dataset***.
 
@@ -98,7 +98,7 @@ The following is a brief directory structure and description for this example:
       - `--data_location`: Full path of train & eval data, default to `./data`.
       - `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
       - `--no_eval`: Do not evaluate trained model by eval dataset.
-      - `--batch_size`: Batch size to train. Default to 512.
+      - `--batch_size`: Batch size to train. Default to 2048.
       - `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
       - `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
       - `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -128,21 +128,20 @@ The following is a brief directory structure and description for this example:
 ## Benchmark
 ### Stand-alone Training
 #### Test Environment
-The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
+The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
 - Hardware 
-  - Model name:          Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
-  - CPU(s):              8
+  - Model name:          Intel(R) Xeon(R) Platinum 8475B
+  - CPU(s):              16
   - Socket(s):           1
-  - Core(s) per socket:  4
+  - Core(s) per socket:  8
   - Thread(s) per core:  2
-  - Memory:              32G
+  - Memory:              64G
 
 - Software
-  - kernel:                 4.18.0-348.2.1.el8_5.x86_64
-  - OS:                     CentOS Linux release 8.5.2111
-  - GCC:                    8.5.0
-  - Docker:                 20.10.12
-  - Python:                 3.6.8
+  - kernel:                 Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
+  - OS:                     Ubuntu 22.04.2 LTS
+  - GCC:                    11.3.0
+  - Docker:                 20.10.21
 
 #### Performance Result
 
@@ -159,23 +158,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
         <td rowspan="3">DCN</td>
         <td>Community TensorFlow</td>
         <td>FP32</td>
-        <td>0.775859</td>
-        <td>0.768275</td>
-        <td></td>
+        <td>0.776260</td>
+        <td>0.769636</td>
+        <td>24524.91(baseline)</td>
     </tr>
     <tr>
         <td>DeepRec w/ oneDNN</td>
         <td>FP32</td>
-        <td></td>
-        <td></td>
-        <td></td>
+        <td>0.775738</td>
+        <td>0.769095</td>
+        <td>31917.35(1.30x)</td>
     </tr>
     <tr>
         <td>DeepRec w/ oneDNN</td>
         <td>FP32+BF16</td>
-        <td></td>
-        <td></td>
-        <td></td>
+        <td>0.775738</td>
+        <td>0.768651</td>
+        <td>55753.15(2.27x)</td>
     </tr>
 </table>
 
 
@@ -123,7 +123,7 @@ input:                                  |               |
       - `--data_location`: Full path of train & eval data, default to `./data`.
       - `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
       - `--no_eval`: Do not evaluate trained model by eval dataset.
-      - `--batch_size`: Batch size to train. Default to 512.
+      - `--batch_size`: Batch size to train. Default to 2048.
       - `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
       - `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
       - `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -153,21 +153,20 @@ input:                                  |               |
 ## Benchmark
 ### Stand-alone Training
 #### Test Environment
-The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
+The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
 - Hardware 
-  - Model name:          Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
-  - CPU(s):              8
+  - Model name:          Intel(R) Xeon(R) Platinum 8475B
+  - CPU(s):              16
   - Socket(s):           1
-  - Core(s) per socket:  4
+  - Core(s) per socket:  8
   - Thread(s) per core:  2
-  - Memory:              32G
+  - Memory:              64G
 
 - Software
-  - kernel:                 4.18.0-348.2.1.el8_5.x86_64
-  - OS:                     CentOS Linux release 8.5.2111
-  - GCC:                    8.5.0
-  - Docker:                 20.10.12
-  - Python:                 3.6.8
+  - kernel:                 Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
+  - OS:                     Ubuntu 22.04.2 LTS
+  - GCC:                    11.3.0
+  - Docker:                 20.10.21
 
 #### Performance Result
 
@@ -184,23 +183,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
         <td rowspan="3">DeepFM</td>
         <td>Community TensorFlow</td>
         <td>FP32</td>
-        <td>0.784695</td>
-        <td>0.781548</td>
-        <td>18848.64(baseline)</td>
+        <td>0.782777</td>
+        <td>0.776113</td>
+        <td>61230.80(baseline)</td>
     </tr>
     <tr>
         <td>DeepRec w/ oneDNN</td>
         <td>FP32</td>
-        <td>0.782755</td>
-        <td>0.777158</td>
-        <td>31260.00(1.65x)</td>
+        <td>0.780460</td>
+        <td>0.773281</td>
+        <td>74380.35(1.22x)</td>
     </tr>
     <tr>
         <td>DeepRec w/ oneDNN</td>
         <td>FP32+BF16</td>
-        <td>0.782659</td>
-        <td>0.776537</td>
-        <td>34627.46(1.84x)</td>
+        <td>0.780460</td>
+        <td>0.775249</td>
+        <td>95107.32(1.55x)</td>
     </tr>
 </table>
 
 
@@ -108,7 +108,7 @@ The following is a brief directory structure and description for this example:
       - `--data_location`: Full path of train & eval data, default to `./data`.
       - `--steps`: Set the number of steps on train dataset. Default will be set to 1 epoch.
       - `--no_eval`: Do not evaluate trained model by eval dataset.
-      - `--batch_size`: Batch size to train. Default to 512.
+      - `--batch_size`: Batch size to train. Default to 2048.
       - `--output_dir`: Full path to output directory for logs and saved model, default to `./result`.
       - `--checkpoint`: Full path to checkpoints input/output directory, default to `$(OUTPUT_DIR)/model_$(MODEL_NAME)_$(TIMESTAMPS)`
       - `--save_steps`: Set the number of steps on saving checkpoints, zero to close. Default will be set to 0.
@@ -138,21 +138,20 @@ The following is a brief directory structure and description for this example:
 ## Benchmark
 ### Stand-alone Training
 #### Test Environment
-The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.hfg7.2xlarge**](https://help.aliyun.com/document_detail/25378.html?spm=5176.2020520101.vmBInfo.instanceType.4a944df5PvCcED#hfg7).
+The benchmark is performed on the [Alibaba Cloud ECS general purpose instance family with high clock speeds - **ecs.g8i.4xlarge**](https://help.aliyun.com/document_detail/25378.html#g8i).
 - Hardware 
-  - Model name:          Intel(R) Xeon(R) Platinum 8369HC CPU @ 3.30GHz
-  - CPU(s):              8
+  - Model name:          Intel(R) Xeon(R) Platinum 8475B
+  - CPU(s):              16
   - Socket(s):           1
-  - Core(s) per socket:  4
+  - Core(s) per socket:  8
   - Thread(s) per core:  2
-  - Memory:              32G
+  - Memory:              64G
 
 - Software
-  - kernel:                 4.18.0-348.2.1.el8_5.x86_64
-  - OS:                     CentOS Linux release 8.5.2111
-  - GCC:                    8.5.0
-  - Docker:                 20.10.12
-  - Python:                 3.6.8
+  - kernel:                 Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101)(AMX patched)
+  - OS:                     Ubuntu 22.04.2 LTS
+  - GCC:                    11.3.0
+  - Docker:                 20.10.21
 
 #### Performance Result
 
@@ -169,23 +168,23 @@ The benchmark is performed on the [Alibaba Cloud ECS general purpose instance fa
         <td rowspan="3">DIEN</td>
         <td>Community TensorFlow</td>
         <td>FP32</td>
-        <td>0.681824</td>
-        <td>0.757496</td>
-        <td>2822.78(baseline)</td>
+        <td>0.575529</td>
+        <td>0.597272</td>
+        <td>6327.50(baseline)</td>
     </tr>
     <tr>
         <td>DeepRec w/ oneDNN</td>
         <td>FP32</td>
-        <td>0.692499</td>
-        <td>0.767193</td>
-        <td>3834.05(1.36x)</td>
+        <td>0.543935</td>
+        <td>0.5972728</td>
+        <td>10094.21(1.60x)</td>
     </tr>
     <tr>
         <td>DeepRec w/ oneDNN</td>
         <td>FP32+BF16</td>
-        <td>0.693011</td>
-        <td>0.768412</td>
-        <td>3862.06(1.37x)</td>
+        <td>0.551233</td>
+        <td>0.597272</td>
+        <td>11565.63(1.83x)</td>
     </tr>
 </table>
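
Review note: the multipliers in parentheses in the Throughput columns appear to be each row's throughput divided by the Community TensorFlow FP32 baseline for the same model. The sketch below recomputes them from the numbers added in this diff, as a sanity check. It assumes plain rounding to two decimals, which the READMEs do not state. The names `THROUGHPUT`, `speedup`, and `speedups` are illustrative and are not part of DeepRec.

```python
# Throughput values copied from the "+" rows of the tables in this diff.
# Tuple order per model: Community TensorFlow FP32 (baseline),
# DeepRec w/ oneDNN FP32, DeepRec w/ oneDNN FP32+BF16.
THROUGHPUT = {
    "BST": (16924.47, 22143.04, 28686.70),
    "DBMTL": (63220.87, 77383.57, 137581.54),
    "DCN": (24524.91, 31917.35, 55753.15),
    "DeepFM": (61230.80, 74380.35, 95107.32),
    "DIEN": (6327.50, 10094.21, 11565.63),
}


def speedup(throughput, baseline):
    """Return throughput relative to baseline, rounded to two decimals.

    Assumption: the tables use ordinary rounding; the READMEs do not say.
    """
    return round(throughput / baseline, 2)


def speedups(table):
    """Map each model to its (FP32, FP32+BF16) speedup over the baseline."""
    return {
        model: tuple(speedup(value, row[0]) for value in row[1:])
        for model, row in table.items()
    }


if __name__ == "__main__":
    for model, (fp32, bf16) in speedups(THROUGHPUT).items():
        print(f"{model}: FP32 {fp32:.2f}x, FP32+BF16 {bf16:.2f}x")
```

Under that rounding assumption, most recomputed ratios match the tables. A few differ in the last digit (BST FP32, DBMTL FP32+BF16, DeepFM FP32), which may be worth re-checking against the raw benchmark logs before merging.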