TensorFlow/Recommendation/WideAndDeep/README.md
23 additions & 27 deletions
@@ -52,7 +52,7 @@ The differences between this Wide & Deep Recommender Model and the model from th
The model enables you to train a recommender model that combines the memorization of the Wide part and the generalization of the Deep part of the network.
-This model is trained with mixed precision using Tensor Cores on NVIDIA Volta, Turing and the NVIDIA Ampere GPU architectures. Therefore, researchers can get results 1.43 times faster than training without Tensor Cores, while experiencing the benefits of mixed precision training. This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.
+This model is trained with mixed precision using Tensor Cores on NVIDIA Volta, Turing and the NVIDIA Ampere GPU architectures. Therefore, researchers can get results 1.49 times faster than training without Tensor Cores, while experiencing the benefits of mixed precision training. This model is tested against each NGC monthly container release to ensure consistent accuracy and performance over time.
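In the TF1-based NGC containers, automatic mixed precision can be enabled without code changes by setting the `TF_ENABLE_AUTO_MIXED_PRECISION=1` environment variable, or explicitly by wrapping the optimizer with the graph-rewrite API. A minimal sketch of the explicit route; the optimizer and learning rate below are illustrative, not this model's configuration:

```python
import tensorflow as tf

# Any TF1 optimizer works here; AdaGrad is only an example.
optimizer = tf.compat.v1.train.AdagradOptimizer(learning_rate=0.1)

# Rewrites the graph so eligible ops run in FP16 on Tensor Cores and
# adds automatic loss scaling to protect small gradient magnitudes.
optimizer = tf.train.experimental.enable_mixed_precision_graph_rewrite(optimizer)
```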
### Model architecture
@@ -168,7 +168,7 @@ The following section lists the requirements that you need to meet in order to s
This repository contains a Dockerfile that extends the TensorFlow NGC container and encapsulates some dependencies. Aside from these dependencies, ensure you have the following components:
@@ -283,9 +283,8 @@ These are the important parameters in the `trainer/task.py` script:
--linear_l1_regularization: L1 regularization for the wide part of the model
--linear_l2_regularization: L2 regularization for the wide part of the model
--deep_learning_rate: Learning rate for the deep part of the model
---deep_l1_regularization: L1 regularization for the deep part of the model
---deep_l2_regularization: L2 regularization for the deep part of the model
--deep_dropout: Dropout probability for deep model
+--deep_warmup_epochs: Number of epochs with linear learning rate warmup
--predict: Perform only the prediction on the validation set, do not train
--evaluate: Perform only the evaluation on the validation set, do not train
--gpu: Run computations on GPU
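As a shape of invocation only — the flag values below are illustrative, not the repository's defaults:

```bash
python -m trainer.task \
  --gpu \
  --deep_learning_rate 0.01 \
  --deep_dropout 0.1 \
  --deep_warmup_epochs 1
```

Passing `--evaluate` instead would only score the validation set against an existing checkpoint, without training.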
@@ -321,7 +320,7 @@ The original data is stored in several separate files:
- `promoted_content.csv` - metadata about the ads
- `document_meta.csv`, `document_topics.csv`, `document_entities.csv`, `document_categories.csv` - metadata about the documents

-During the preprocessing stage the data is transformed into 55M rows of tabular data with 54 features and is eventually saved in a pre-batched TFRecord format.
+During the preprocessing stage the data is transformed into 59M rows of tabular data with 54 features and is eventually saved in a pre-batched TFRecord format.
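For orientation, pre-batched TFRecord files of this kind are typically consumed with `tf.data`, parsing each record as one pre-assembled batch. A minimal sketch with an assumed file pattern, feature names, and pre-batch size — the real 54-feature schema comes from the preprocessing job, not from this snippet:

```python
import tensorflow as tf

PREBATCH_SIZE = 4096  # assumed; set to the size used during preprocessing

# Hypothetical two-feature spec standing in for the full 54-feature schema.
feature_spec = {
    "ad_id": tf.io.FixedLenFeature([PREBATCH_SIZE], tf.int64),
    "label": tf.io.FixedLenFeature([PREBATCH_SIZE], tf.int64),
}

def parse(serialized):
    # Each record already holds PREBATCH_SIZE examples, so no .batch() call.
    return tf.io.parse_single_example(serialized, feature_spec)

files = tf.io.gfile.glob("/outbrain/tfrecords/train/part-*")  # assumed path
dataset = tf.data.TFRecordDataset(files).map(
    parse, num_parallel_calls=tf.data.experimental.AUTOTUNE)
```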
#### Spark preprocessing
@@ -357,7 +356,7 @@ For more information about Spark, please refer to the
### Training process
The training can be started by running the `trainer/task.py` script. By default the script is in train mode. Other training-related
-configs are also present in `trainer/task.py` and can be seen using the command `python -m trainer.task --help`. Training happens for `--num_epochs` epochs with a custom estimator for the model. The model has a wide linear part and a deep feed-forward network, and the networks are built according to the default configuration.
+configs are also present in `trainer/task.py` and can be seen using the command `python -m trainer.task --help`. Training happens for `--num_epochs` epochs with a `DNNLinearCombinedClassifier` estimator for the model. The model has a wide linear part and a deep feed-forward network, and the networks are built according to the default configuration.
Two separate optimizers are used to optimize the wide and the deep part of the network:
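The hunk cuts off before the optimizer list itself. Purely as an illustration of how a `tf.estimator.DNNLinearCombinedClassifier` accepts one optimizer per tower, here is a hedged sketch; the feature columns, optimizer pairing (FTRL for the wide tower, AdaGrad for the deep tower, as in the original Wide & Deep paper), and hyperparameters are assumptions, not this repository's configuration:

```python
import tensorflow as tf

# Illustrative stand-ins for the model's 54 engineered features.
wide_columns = [tf.feature_column.categorical_column_with_hash_bucket(
    "ad_id", hash_bucket_size=250000)]
deep_columns = [tf.feature_column.numeric_column("doc_views_log")]

estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    # FTRL's L1/L2 strengths map onto --linear_l1/l2_regularization.
    linear_optimizer=tf.compat.v1.train.FtrlOptimizer(
        learning_rate=0.2,
        l1_regularization_strength=0.0,
        l2_regularization_strength=0.0),
    dnn_feature_columns=deep_columns,
    dnn_optimizer=tf.compat.v1.train.AdagradOptimizer(learning_rate=0.05),
    dnn_hidden_units=[1024, 512, 256],  # illustrative tower shape
    dnn_dropout=0.0,                    # maps onto --deep_dropout
)
```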
@@ -401,23 +400,23 @@ accuracy in training.
##### Training accuracy: NVIDIA DGX A100 (8x A100 40GB)
-Our results were obtained by running the benchmark scripts from the `scripts` directory in the TensorFlow NGC container on NVIDIA DGX A100 (8x A100 40GB) GPUs.
+Our results were obtained by running the `trainer/task.py` training script in the TensorFlow NGC container on NVIDIA DGX A100 (8x A100 40GB) GPUs.
-|**GPUs**|**Batch size / GPU**|**Accuracy - TF32 (MAP@12)**|**Accuracy - mixed precision (MAP@12)**|**Time to train - TF32 (minutes)**|**Time to train - mixed precision (minutes)**|**Time to train speedup (FP32 to mixed precision)**|
+|**GPUs**|**Batch size / GPU**|**Accuracy - TF32 (MAP@12)**|**Accuracy - mixed precision (MAP@12)**|**Time to train - TF32 (minutes)**|**Time to train - mixed precision (minutes)**|**Time to train speedup (TF32 to mixed precision)**|
To achieve the same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
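For reference, MAP@12 averages, over displays, the precision at the rank where the clicked ad appears in the model's top-12 ranking; since each Outbrain display has exactly one clicked ad, AP@12 reduces to the reciprocal rank of that ad (or 0 beyond the cutoff). A minimal sketch — the function and variable names are ours, not the repository's evaluation code:

```python
def map_at_k(rankings, clicked, k=12):
    """rankings: per-display ad ids ordered by predicted score;
    clicked: the one clicked ad id per display."""
    total = 0.0
    for ranked_ads, clicked_ad in zip(rankings, clicked):
        top = list(ranked_ads[:k])
        if clicked_ad in top:
            total += 1.0 / (top.index(clicked_ad) + 1)  # AP = 1/rank
    return total / len(rankings)

# Clicked ad 3 sits at rank 2 in the first display (AP 0.5) and is absent
# from the second (AP 0), so MAP@12 = 0.25.
print(map_at_k([[7, 3, 9], [4, 1, 2]], [3, 8]))
```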
##### Training accuracy: NVIDIA DGX-1 (8x V100 16GB)
-Our results were obtained by running the benchmark scripts from the `scripts` directory in the TensorFlow NGC container on NVIDIA DGX-1 (8x V100 16GB) GPUs.
+Our results were obtained by running the `trainer/task.py` training script in the TensorFlow NGC container on NVIDIA DGX-1 (8x V100 16GB) GPUs.
|**GPUs**|**Batch size / GPU**|**Accuracy - FP32 (MAP@12)**|**Accuracy - mixed precision (MAP@12)**|**Time to train - FP32 (minutes)**|**Time to train - mixed precision (minutes)**|**Time to train speedup (FP32 to mixed precision)**|
To achieve the same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
@@ -430,7 +429,7 @@ Models trained with FP32, TF32 and Automatic Mixed Precision (AMP) achieve simil
##### Training stability test
The Wide and Deep model was trained for 54,713 training steps, starting
-from 6 different initial random seeds for each setup. The training was performed in the 20.06-tf1-py3 NGC container on
+from 6 different initial random seeds for each setup. The training was performed in the 20.10-tf1-py3 NGC container on
NVIDIA DGX A100 40GB and DGX-1 16GB machines with and without mixed precision enabled.
After training, the models were evaluated on the validation set. The following
table summarizes the final MAP@12 score on the validation set.
@@ -448,32 +447,29 @@ table summarizes the final MAP@12 score on the validation set.
##### Training performance: NVIDIA DGX A100 (8x A100 40GB)
-Our results were obtained by running the `trainer/task.py` training script in the TensorFlow NGC container on NVIDIA DGX A100 (8x A100 40GB) GPUs. Performance numbers (in samples per second) were averaged over 50 training iterations. Improving model scaling for multi-GPU is [under development](#known-issues).
+Our results were obtained by running the benchmark scripts from the `scripts` directory in the TensorFlow NGC container on NVIDIA DGX A100 (8x A100 40GB) GPUs. Improving model scaling for multi-GPU is [under development](#known-issues).

-To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
##### Training performance: NVIDIA DGX-1 (8x V100 16GB)
-Our results were obtained by running the `trainer/task.py` training script in the TensorFlow NGC container on NVIDIA DGX-1 (8x V100 16GB) GPUs. Performance numbers (in samples per second) were averaged over 50 training iterations. Improving model scaling for multi-GPU is planned, see [known issues](#known-issues).
-
-To achieve these same results, follow the steps in the [Quick Start Guide](#quick-start-guide).
+Our results were obtained by running the benchmark scripts from the `scripts` directory in the TensorFlow NGC container on NVIDIA DGX-1 (8x V100 16GB) GPUs. Improving model scaling for multi-GPU is [under development](#known-issues).