Commit 3e59112

Update DTensor docs, lint notebooks
1 parent 3e8b654 commit 3e59112

2 files changed (+46, −43 lines)


site/en/tutorials/distribute/dtensor_keras_tutorial.ipynb

Lines changed: 28 additions & 25 deletions
@@ -69,15 +69,16 @@
 },
 "source": [
 "## Overview\n",
-"In this tutoral, you will learn how to use DTensor with Keras.\n",
+"\n",
+"In this tutorial, you will learn how to use DTensors with Keras.\n",
 "\n",
 "Through DTensor integration with Keras, you can reuse your existing Keras layers and models to build and train distributed machine learning models.\n",
 "\n",
 "You will train a multi-layer classification model with the MNIST data. Setting the layout for subclassing model, Sequential model, and functional model will be demonstrated.\n",
 "\n",
-"This tutoral assumes that you have already read the [DTensor programing guide](/guide/dtensor_overview), and are familiar with basic DTensor concepts like `Mesh` and `Layout`.\n",
+"This tutorial assumes that you have already read the [DTensor programming guide](/guide/dtensor_overview), and are familiar with basic DTensor concepts like `Mesh` and `Layout`.\n",
 "\n",
-"This tutoral is based on https://www.tensorflow.org/datasets/keras_example."
+"This tutorial is based on [Training a neural network on MNIST with Keras](https://www.tensorflow.org/datasets/keras_example)."
 ]
},
{
@@ -88,7 +89,9 @@
 "source": [
 "## Setup\n",
 "\n",
-"DTensor is part of TensorFlow 2.9.0 release."
+"DTensor (`tf.experimental.dtensor`) has been part of TensorFlow since the 2.9.0 release.\n",
+"\n",
+"First, install or upgrade TensorFlow and TensorFlow Datasets:"
 ]
},
{
@@ -99,7 +102,7 @@
 },
 "outputs": [],
 "source": [
-"!pip install --quiet --upgrade --pre tensorflow tensorflow-datasets"
+"!pip install --quiet --upgrade tensorflow tensorflow-datasets"
 ]
},
{
@@ -108,9 +111,9 @@
 "id": "VttBMZngDx8x"
 },
 "source": [
-"Next, import `tensorflow` and `tensorflow.experimental.dtensor`, and configure TensorFlow to use 8 virtual CPUs.\n",
+"Next, import `tensorflow` and `dtensor`, and configure TensorFlow to use 8 virtual CPUs.\n",
 "\n",
-"Even though this example uses CPUs, DTensor works the same way on CPU, GPU or TPU devices."
+"Even though this example uses virtual CPUs, DTensor works the same way on CPU, GPU or TPU devices."
 ]
},
{
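The virtual-device setup cell itself is outside this diff. As a rough sketch, configuring 8 virtual CPUs typically looks like the following (the helper name `configure_virtual_cpus` and the `DEVICES` list are illustrative, not part of this commit):

```python
import tensorflow as tf
from tensorflow.experimental import dtensor

def configure_virtual_cpus(ncpu):
  # Split the first physical CPU into `ncpu` logical devices so DTensor
  # has a multi-device mesh to work with on a single host.
  phy_devices = tf.config.list_physical_devices('CPU')
  tf.config.set_logical_device_configuration(
      phy_devices[0],
      [tf.config.LogicalDeviceConfiguration()] * ncpu)

configure_virtual_cpus(8)
DEVICES = [f'CPU:{i}' for i in range(8)]
```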
@@ -176,11 +179,11 @@
 "source": [
 "## Creating a Data Parallel Mesh\n",
 "\n",
-"This tutorial demonstrates Data Parallel training. Adapting to Model Parallel training and Spatial Parallel training can be as simple as switching to a different set of `Layout` objects. Refer to [DTensor in-depth ML Tutorial](https://www.tensorflow.org/tutorials/distribute/dtensor_ml_tutorial) for more information on distributed training beyond Data Parallel.\n",
+"This tutorial demonstrates Data Parallel training. Adapting to Model Parallel training and Spatial Parallel training can be as simple as switching to a different set of `Layout` objects. Refer to the [Distributed training with DTensors](dtensor_ml_tutorial.ipynb) tutorial for more information on distributed training beyond Data Parallel.\n",
 "\n",
-"Data Parallel training is a commonly used parallel training scheme, also used by for example `tf.distribute.MirroredStrategy`.\n",
+"Data Parallel training is a commonly used parallel training scheme, also used by, for example, `tf.distribute.MirroredStrategy`.\n",
 "\n",
-"With DTensor, a Data Parallel training loop uses a `Mesh` that consists of a single 'batch' dimension, where each device runs a replica of the model that receives a shard from the global batch.\n"
+"With DTensor, a Data Parallel training loop uses a `Mesh` that consists of a single 'batch' dimension, where each device runs a replica of the model that receives a shard from the global batch."
 ]
},
{
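As a concrete sketch of such a mesh, assuming the 8 virtual CPUs and `DEVICES` list from the setup sketch above:

```python
# An 8-way data parallel mesh: a single 'batch' dimension spanning all devices.
mesh = dtensor.create_mesh([('batch', 8)], devices=DEVICES)

# A fully replicated layout for rank-2 weights on this mesh.
unsharded_layout_2d = dtensor.Layout.replicated(mesh, 2)
```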
@@ -248,7 +251,7 @@
 "\n",
 "In order to configure the layout information for your layers' weights, Keras has exposed an extra parameter in the layer constructor for most of the built-in layers.\n",
 "\n",
-"The following example builds a small image classification model with fully replicated weight layout. You can specify layout information `kernel` and `bias` in `tf.keras.layers.Dense` via argument `kernel_layout` and `bias_layout`. Most of the built-in keras layers are ready for explicitly specifying the `Layout` for the layer weights."
+"The following example builds a small image classification model with a fully replicated weight layout. You can specify the layout information for `kernel` and `bias` in `tf.keras.layers.Dense` via the `kernel_layout` and `bias_layout` arguments. Most of the built-in Keras layers support explicitly specifying a `Layout` for the layer weights."
 ]
},
{
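A minimal sketch of such a model, assuming the TF 2.9 experimental Keras–DTensor integration and the `mesh` and `unsharded_layout_2d` objects from the sketch above:

```python
unsharded_layout_1d = dtensor.Layout.replicated(mesh, 1)

model = tf.keras.models.Sequential([
  tf.keras.layers.Flatten(input_shape=(28, 28)),
  tf.keras.layers.Dense(
      128, activation='relu',
      kernel_layout=unsharded_layout_2d,  # rank-2 weight, fully replicated
      bias_layout=unsharded_layout_1d),   # rank-1 weight, fully replicated
  tf.keras.layers.Dense(
      10,
      kernel_layout=unsharded_layout_2d,
      bias_layout=unsharded_layout_1d),
])
```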
@@ -315,7 +318,7 @@
 "source": [
 "## Load a dataset and build input pipeline\n",
 "\n",
-"Load a MNIST dataset and configure some pre-processing input pipeline for it. The dataset itself is not associated with any DTensor layout information. There are plans to improve DTensor Keras integration with `tf.data` in future TensorFlow releases.\n"
+"Load the MNIST dataset and configure a pre-processing input pipeline for it. The dataset itself is not associated with any DTensor layout information."
 ]
},
{
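The pipeline cell is not part of this diff. A sketch of a typical MNIST input pipeline, following the referenced [keras_example](https://www.tensorflow.org/datasets/keras_example) notebook (the batch size and mapping choices are illustrative):

```python
import tensorflow_datasets as tfds

(ds_train, ds_test), ds_info = tfds.load(
    'mnist', split=['train', 'test'], shuffle_files=True,
    as_supervised=True, with_info=True)

def normalize_img(image, label):
  # Scale uint8 pixel values to floats in [0, 1].
  return tf.cast(image, tf.float32) / 255., label

batch_size = 128
ds_train = (ds_train
            .map(normalize_img, num_parallel_calls=tf.data.AUTOTUNE)
            .cache()
            .shuffle(ds_info.splits['train'].num_examples)
            .batch(batch_size))
```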
@@ -389,9 +392,9 @@
 "source": [
 "## Define the training logic for the model\n",
 "\n",
-"Next define the training and evalution logic for the model. \n",
+"Next, define the training and evaluation logic for the model.\n",
 "\n",
-"As of TensorFlow 2.9, you have to write a custom-training-loop for a DTensor enabled Keras model. This is to pack the input data with proper layout information, which is not integrated with the standard `tf.keras.Model.fit()` or `tf.keras.Model.eval()` functions from Keras. you will get more `tf.data` support in the upcoming release. "
+"As of TensorFlow 2.9, you have to write a custom training loop for a DTensor-enabled Keras model, in order to pack the input data with the proper layout information; this packing is not integrated with the standard `tf.keras.Model.fit()` or `tf.keras.Model.evaluate()` functions from Keras. More `tf.data` support will arrive in an upcoming release."
 ]
},
{
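To illustrate what packing the input data with layout information involves, here is a hedged sketch of a repacking helper (the name `repack_local_tensor` and the batch-only sharding assumption are illustrative, not part of this commit):

```python
def repack_local_tensor(x, layout):
  # A minimal sketch: assumes `layout` shards only the first (batch) axis.
  x = tf.convert_to_tensor(x)
  num_shards = layout.num_shards(0)          # shards along the batch dimension
  shards = tf.split(x, num_shards, axis=0)   # one slice per model replica
  # Each device in the mesh receives one shard of the global batch.
  return dtensor.pack(shards, layout)
```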
@@ -467,7 +470,7 @@
 "id": "9Eb-qIJGrxB9"
 },
 "source": [
-"## Metrics and Optimizers\n",
+"## Metrics and optimizers\n",
 "\n",
 "When using DTensor API with Keras `Metric` and `Optimizer`, you will need to provide the extra mesh information, so that any internal state variables and tensors can work with variables in the model.\n",
 "\n",
@@ -497,9 +500,9 @@
 "source": [
 "## Train the model\n",
 "\n",
-"The following example shards the data from input pipeline on the batch dimension, and train with the model, which has fully replicated weights. \n",
+"The following example demonstrates how to shard the data from the input pipeline on the batch dimension and train the model, which has fully replicated weights.\n",
 "\n",
-"With 3 epochs, the model should achieve about 97% of accuracy."
+"After 3 epochs, the model should achieve about 97% accuracy:"
 ]
},
{
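A condensed sketch of that loop, assuming a `train_step` function from the training-logic section and the `repack_local_tensor` helper sketched above (all names are illustrative):

```python
num_epochs = 3

# Shard the batch dimension across the mesh. MNIST images from the pipeline
# are rank-4 (batch, height, width, channels); labels are rank-1.
image_layout = dtensor.Layout.batch_sharded(mesh, 'batch', rank=4)
label_layout = dtensor.Layout.batch_sharded(mesh, 'batch', rank=1)

for epoch in range(num_epochs):
  for images, labels in ds_train:
    d_images = repack_local_tensor(images, layout=image_layout)
    d_labels = repack_local_tensor(labels, layout=label_layout)
    loss = train_step(model, d_images, d_labels)
```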
@@ -561,13 +564,13 @@
 "\n",
 "Often you have models that work well for your use case. Specifying `Layout` information to each individual layer within the model will be a large amount of work requiring a lot of edits.\n",
 "\n",
-"To help you easily convert your existing Keras model to work with DTensor API you can use the new `dtensor.LayoutMap` API that allow you to specify the `Layout` from a global point of view.\n",
+"To help you easily convert your existing Keras model to work with the DTensor API, you can use the new `tf.keras.dtensor.experimental.LayoutMap` API, which allows you to specify the `Layout` from a global point of view.\n",
 "\n",
 "First, you need to create a `LayoutMap` instance, which is a dictionary-like object that contains all the `Layout` you would like to specify for your model weights.\n",
 "\n",
 "`LayoutMap` needs a `Mesh` instance at init, which can be used to provide default replicated `Layout` for any weights that doesn't have Layout configured. In case you would like all your model weights to be just fully replicated, you can provide empty `LayoutMap`, and the default mesh will be used to create replicated `Layout`.\n",
 "\n",
-"`LayoutMap` uses a string as key and a `Layout` as value. There is a behavior difference between a normal Python dict and this class. The string key will be treated as a regex when retrieving the value"
+"`LayoutMap` uses a string as the key and a `Layout` as the value. There is a behavior difference between a normal Python dict and this class: the string key will be treated as a regex when retrieving the value."
 ]
},
{
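A short illustration of the regex-key behavior (the key patterns are illustrative):

```python
layout_map = tf.keras.dtensor.experimental.LayoutMap(mesh=mesh)

# Keys are treated as regular expressions on retrieval, so a single entry
# can cover several weights, e.g. both 'feature.kernel' and 'feature_2.kernel'.
layout_map['feature.*kernel'] = dtensor.Layout.replicated(mesh, 2)
layout_map['feature.*bias'] = dtensor.Layout.replicated(mesh, 1)
```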
@@ -616,9 +619,9 @@
 "* `model.feature_2.kernel`\n",
 "* `model.feature_2.bias`\n",
 "\n",
-"Note: For Subclassed Models, the attribute name, rather than the `.name` attribute of layer are used as the key to retrieve the Layout from the mapping. This is consistent with the convention followed by `tf.Module` checkpointing. For complex models with more than a few layers, you can [manually inspect checkpoints](https://www.tensorflow.org/guide/checkpoint#manually_inspecting_checkpoints) to see the attribute mappings. \n",
+"Note: For subclassed Models, the attribute name, rather than the `.name` attribute of the layer, is used as the key to retrieve the Layout from the mapping. This is consistent with the convention followed by `tf.Module` checkpointing. For complex models with more than a few layers, you can [manually inspect checkpoints](https://www.tensorflow.org/guide/checkpoint#manually_inspecting_checkpoints) to view the attribute mappings. \n",
 "\n",
-"Now define the following `LayoutMap` and apply it to the model."
+"Now define the following `LayoutMap` and apply it to the model:"
 ]
},
{
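The cell that applies the map is outside this diff. A hedged sketch of what applying it plausibly looks like, assuming the TF 2.9-era `tf.keras.dtensor.experimental.layout_map_scope` helper and a `SubclassedModel` class from earlier cells:

```python
with tf.keras.dtensor.experimental.layout_map_scope(layout_map):
  # Variables created inside this scope look up their Layout in
  # `layout_map` by attribute path (e.g. 'feature.kernel').
  subclassed_model = SubclassedModel()
```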
@@ -644,7 +647,7 @@
 "id": "M32HcSp_PyWs"
 },
 "source": [
-"The model weights are created on the first call, so call the model with a DTensor input and confirm the weights have the expected layouts."
+"The model weights are created on the first call, so call the model with a DTensor input and confirm the weights have the expected layouts:"
 ]
},
{
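For illustration, a sketch of such a check (the flattened 784-feature input shape and the attribute names are assumptions based on the weight list above):

```python
# Trigger variable creation with a fully replicated DTensor input.
inputs = tf.zeros([8, 28 * 28])
d_inputs = dtensor.copy_to_mesh(inputs, dtensor.Layout.replicated(mesh, rank=2))
subclassed_model(d_inputs)

# Confirm the weights picked up the layouts from `layout_map`.
print(dtensor.fetch_layout(subclassed_model.feature.kernel))
print(dtensor.fetch_layout(subclassed_model.feature_2.bias))
```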
@@ -686,9 +689,9 @@
 "id": "6zzvTqAR2Teu"
 },
 "source": [
-"For keras functional and sequential models, you can use `LayoutMap` as well.\n",
+"For Keras Functional and Sequential models, you can use `tf.keras.dtensor.experimental.LayoutMap` as well.\n",
 "\n",
-"Note: For functional and sequential models, the mappings are slightly different. The layers in the model don't have a public attribute attached to the model (though you can access them via `model.layers` as a list). Use the string name as the key in this case. The string name is guaranteed to be unique within a model."
+"Note: For Functional and Sequential models, the mappings are slightly different. The layers in the model don't have a public attribute attached to the model (though you can access them via `Model.layers` as a list). Use the string name as the key in this case. The string name is guaranteed to be unique within a model."
 ]
},
{
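A sketch for the Functional case, keying the map by layer name (the layer names 'feature' and 'feature_2' are illustrative):

```python
layout_map = tf.keras.dtensor.experimental.LayoutMap(mesh=mesh)
layout_map['feature.*kernel'] = dtensor.Layout.replicated(mesh, 2)

with tf.keras.dtensor.experimental.layout_map_scope(layout_map):
  inputs = tf.keras.Input((784,), batch_size=16)
  x = tf.keras.layers.Dense(128, name='feature')(inputs)  # matched by layer name
  outputs = tf.keras.layers.Dense(10, name='feature_2')(x)
  functional_model = tf.keras.Model(inputs, outputs)
```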
@@ -745,7 +748,7 @@
 "metadata": {
 "colab": {
 "name": "dtensor_keras_tutorial.ipynb",
-"toc_visible": true
+"toc_visible": true
 },
 "kernelspec": {
 "display_name": "Python 3",

site/en/tutorials/distribute/dtensor_ml_tutorial.ipynb

Lines changed: 18 additions & 18 deletions
@@ -37,7 +37,7 @@
 "id": "MfBg1C5NB3X0"
 },
 "source": [
-"# Distributed Training with DTensors\n"
+"# Distributed training with DTensors"
 ]
},
{
@@ -70,25 +70,22 @@
 "source": [
 "## Overview\n",
 "\n",
-"DTensor provides a way for you to distribute the training of your model across devices to improve efficiency, reliability and scalability. For more details on DTensor concepts, see [The DTensor Programming Guide](https://www.tensorflow.org/guide/dtensor_overview).\n",
+"DTensor provides a way for you to distribute the training of your model across devices to improve efficiency, reliability and scalability. For more details, check out the [DTensor concepts](../guide/dtensor_overview.ipynb) guide.\n",
 "\n",
-"In this tutorial, you will train a Sentiment Analysis model with DTensor. Three distributed training schemes are demonstrated with this example:\n",
+"In this tutorial, you will train a sentiment analysis model using DTensors. The example demonstrates three distributed training schemes:\n",
 "\n",
 " - Data Parallel training, where the training samples are sharded (partitioned) to devices.\n",
 " - Model Parallel training, where the model variables are sharded to devices.\n",
-" - Spatial Parallel training, where the features of input data are sharded to devices. (Also known as [Spatial Partitioning](https://cloud.google.com/blog/products/ai-machine-learning/train-ml-models-on-large-images-and-3d-volumes-with-spatial-partitioning-on-cloud-tpus))\n",
+" - Spatial Parallel training, where the features of input data are sharded to devices (also known as [Spatial Partitioning](https://cloud.google.com/blog/products/ai-machine-learning/train-ml-models-on-large-images-and-3d-volumes-with-spatial-partitioning-on-cloud-tpus)).\n",
 "\n",
-"The training portion of this tutorial is inspired [A Kaggle guide on Sentiment Analysis](https://www.kaggle.com/code/anasofiauzsoy/yelp-review-sentiment-analysis-tensorflow-tfds/notebook) notebook. To learn about the complete training and evaluation workflow (without DTensor), refer to that notebook.\n",
+"The training portion of this tutorial is inspired by a Kaggle notebook called [A Kaggle guide on sentiment analysis](https://www.kaggle.com/code/anasofiauzsoy/yelp-review-sentiment-analysis-tensorflow-tfds/notebook). To learn about the complete training and evaluation workflow (without DTensor), refer to that notebook.\n",
 "\n",
 "This tutorial will walk through the following steps:\n",
 "\n",
-"- First start with some data cleaning to obtain a `tf.data.Dataset` of tokenized sentences and their polarity.\n",
-"\n",
-"- Next build an MLP model with custom Dense and BatchNorm layers. Use a `tf.Module` to track the inference variables. The model constructor takes additional `Layout` arguments to control the sharding of variables.\n",
-"\n",
-"- For training, you will first use data parallel training together with `tf.experimental.dtensor`'s checkpoint feature. Then continue with Model Parallel Training and Spatial Parallel Training.\n",
-"\n",
-"- The final section briefly describes the interaction between `tf.saved_model` and `tf.experimental.dtensor` as of TensorFlow 2.9.\n"
+"- Some data cleaning to obtain a `tf.data.Dataset` of tokenized sentences and their polarity.\n",
+"- Then, building an MLP model with custom Dense and BatchNorm layers using a `tf.Module` to track the inference variables. The model constructor will take additional `Layout` arguments to control the sharding of variables.\n",
+"- For training, you will first use data parallel training together with `tf.experimental.dtensor`'s checkpoint feature. Then, you will continue with Model Parallel Training and Spatial Parallel Training.\n",
+"- The final section briefly describes the interaction between `tf.saved_model` and `tf.experimental.dtensor` as of TensorFlow 2.9."
 ]
},
{
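To make the three schemes concrete, here is a hedged sketch of how layouts differ between them on a 2-D mesh (the mesh dimension names and shapes are illustrative, and `DEVICES` is the 8-CPU list from the setup sketch in the first notebook):

```python
# A 4x2 mesh over the 8 virtual CPUs: 'batch' for data parallelism,
# 'feature' for model/spatial parallelism.
mesh_2d = dtensor.create_mesh([('batch', 4), ('feature', 2)], devices=DEVICES)

# Data Parallel: weights fully replicated, inputs sharded on the batch axis.
w_data = dtensor.Layout([dtensor.UNSHARDED, dtensor.UNSHARDED], mesh_2d)
x_data = dtensor.Layout(['batch', dtensor.UNSHARDED], mesh_2d)

# Model Parallel: one axis of the weight is sharded over 'feature'.
w_model = dtensor.Layout([dtensor.UNSHARDED, 'feature'], mesh_2d)

# Spatial Parallel: an axis of the *input* is sharded over 'feature' instead.
x_spatial = dtensor.Layout(['batch', 'feature'], mesh_2d)
```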
@@ -99,7 +96,9 @@
 "source": [
 "## Setup\n",
 "\n",
-"DTensor is part of TensorFlow 2.9.0 release."
+"DTensor (`tf.experimental.dtensor`) has been part of TensorFlow since the 2.9.0 release.\n",
+"\n",
+"First, install or upgrade TensorFlow and TensorFlow Datasets:"
 ]
},
{
@@ -110,7 +109,7 @@
 },
 "outputs": [],
 "source": [
-"!pip install --quiet --upgrade --pre tensorflow tensorflow-datasets"
+"!pip install --quiet --upgrade tensorflow tensorflow-datasets"
 ]
},
{
@@ -119,9 +118,9 @@
 "id": "tcxP4_Zu7ciQ"
 },
 "source": [
-"Next, import `tensorflow` and `tensorflow.experimental.dtensor`. Then configure TensorFlow to use 8 virtual CPUs.\n",
+"Next, import `tensorflow` and `dtensor`, and configure TensorFlow to use 8 virtual CPUs.\n",
 "\n",
-"Even though this example uses CPUs, DTensor works the same way on CPU, GPU or TPU devices."
+"Even though this example uses virtual CPUs, DTensor works the same way on CPU, GPU or TPU devices."
 ]
},
{
@@ -139,6 +138,7 @@
 "import tensorflow as tf\n",
 "\n",
 "from tensorflow.experimental import dtensor\n",
+"\n",
 "print('TensorFlow version:', tf.__version__)"
 ]
},
@@ -170,7 +170,7 @@
 "source": [
 "## Download the dataset\n",
 "\n",
-"Download the IMDB reviews data set to train the sentiment analysis model."
+"Download the IMDB reviews data set to train the sentiment analysis model:"
 ]
},
{
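The download cell itself is outside this diff; a minimal sketch using TensorFlow Datasets (the split and batching choices are illustrative):

```python
import tensorflow_datasets as tfds

# IMDB movie reviews with binary sentiment labels.
train_data = tfds.load('imdb_reviews', split='train', shuffle_files=True, batch_size=64)
```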
@@ -1058,7 +1058,7 @@
 "colab": {
 "collapsed_sections": [],
 "name": "dtensor_ml_tutorial.ipynb",
-"toc_visible": true
+"toc_visible": true
 },
 "kernelspec": {
 "display_name": "Python 3",
