|
13 | 13 | "cell_type": "code",
|
14 | 14 | "execution_count": null,
|
15 | 15 | "metadata": {
|
| 16 | + "cellView": "form", |
16 | 17 | "id": "tuOe1ymfHZPu"
|
17 | 18 | },
|
18 | 19 | "outputs": [],
|
|
36 | 37 | "id": "MfBg1C5NB3X0"
|
37 | 38 | },
|
38 | 39 | "source": [
|
39 |
| - "# DTensor Maching Learning Tutorial\n" |
| 40 | + "# Distributed Training with DTensors\n" |
40 | 41 | ]
|
41 | 42 | },
|
42 | 43 | {
|
|
75 | 76 | " \n",
|
76 | 77 | " - Data Parallel training, where the training samples are sharded (partitioned) to devices.\n",
|
77 | 78 | " - Model Parallel training, where the model variables are sharded to devices. \n",
|
78 |
| - " - Spatial Parallel training, where the features of input data are sharded to devices.\n", |
| 79 | + " - Spatial Parallel training, where the features of input data are sharded to devices. (Also known as [Spatial Partitioning](https://cloud.google.com/blog/products/ai-machine-learning/train-ml-models-on-large-images-and-3d-volumes-with-spatial-partitioning-on-cloud-tpus))\n", |
79 | 80 | "\n",
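The three sharding schemes listed above can be illustrated with a plain NumPy sketch (an illustration only, not the DTensor API; the array shapes and the 2-device split are made up for the example):

```python
import numpy as np

# A toy batch of 8 samples, each a 4x4 feature map.
batch = np.arange(8 * 4 * 4).reshape(8, 4, 4)

# Data parallel: shard the sample (batch) axis across 2 devices.
data_shards = np.split(batch, 2, axis=0)      # two shards of shape (4, 4, 4)

# Spatial parallel: shard a feature axis of each sample across 2 devices.
spatial_shards = np.split(batch, 2, axis=1)   # two shards of shape (8, 2, 4)

# Model parallel shards the *variables* instead, e.g. a weight matrix.
weights = np.ones((4, 6))
weight_shards = np.split(weights, 2, axis=1)  # two shards of shape (4, 3)
```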
|
80 | 81 | "The training portion of this tutorial is inspired by the [A Kaggle guide on Sentiment Analysis](https://www.kaggle.com/code/anasofiauzsoy/yelp-review-sentiment-analysis-tensorflow-tfds/notebook) notebook. To learn about the complete training and evaluation workflow (without DTensor), refer to that notebook. \n",
|
81 | 82 | "\n",
|
|
237 | 238 | " 'y': dataset_y,\n",
|
238 | 239 | "})\n",
|
239 | 240 | "\n",
|
240 |
| - "dataset.take(1).get_single_element()\n", |
241 |
| - "\n" |
| 241 | + "dataset.take(1).get_single_element()\n" |
242 | 242 | ]
|
243 | 243 | },
|
244 | 244 | {
|
|
297 | 297 | "id": "PMCt-Gj3b3Jy"
|
298 | 298 | },
|
299 | 299 | "source": [
|
300 |
| - "\n", |
301 | 300 | "### Dense Layer\n",
|
302 | 301 | "\n",
|
303 | 302 | "The following custom Dense layer defines 2 layer variables: $W_{ij}$ is the variable for weights, and $b_i$ is the variable for the biases.\n",
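The forward pass this Dense layer computes can be sketched in plain NumPy (shapes here are hypothetical; the tutorial's actual layer builds $W_{ij}$ and $b_i$ as DTensor-aware variables):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))     # a batch of 2 inputs with 4 features each
W = rng.normal(size=(4, 3))     # the weight variable W_ij
b = np.zeros(3)                 # the bias variable b_i

# Forward pass: matrix multiply by the weights, then add the biases.
y = x @ W + b
```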
|
|
809 | 808 | "- The 2 devices within a single model replica receive replicated training data.\n",
|
810 | 809 | "\n",
|
811 | 810 | "\n",
|
812 |
| - "<img src=\"https://www.tensorflow.org/tutorials/distribute/images/dtensor_model_para.png\" alt=\"Model parallel mesh\" class=\"no-filter\">\n", |
813 |
| - "\n" |
| 811 | + "<img src=\"https://www.tensorflow.org/tutorials/distribute/images/dtensor_model_para.png\" alt=\"Model parallel mesh\" class=\"no-filter\">\n" |
814 | 812 | ]
|
815 | 813 | },
|
816 | 814 | {
|
|
905 | 903 | "id": "u-bK6IZ9GCS9"
|
906 | 904 | },
|
907 | 905 | "source": [
|
908 |
| - "When training data of very high dimensionality (e.g. a very large image or a video), it may be desirable to shard along the feature dimension. This is called Spatial Parallel training.\n", |
| 906 | + "When the training data is of very high dimensionality (e.g. a very large image or a video), it may be desirable to shard along the feature dimension. This is called [Spatial Partitioning](https://cloud.google.com/blog/products/ai-machine-learning/train-ml-models-on-large-images-and-3d-volumes-with-spatial-partitioning-on-cloud-tpus), which was first introduced into TensorFlow for training models with large 3-d input samples.\n", |
909 | 907 | "\n",
|
910 | 908 | "<img src=\"https://www.tensorflow.org/tutorials/distribute/images/dtensor_spatial_para.png\" alt=\"Spatial parallel mesh\" class=\"no-filter\">\n",
|
911 | 909 | "\n",
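As a toy illustration of spatial partitioning (plain NumPy with hypothetical shapes; not the DTensor or TPU partitioning API):

```python
import numpy as np

# A single large 3-D input sample, e.g. a volume of shape (depth, height, width).
volume = np.zeros((64, 128, 128))

# Spatial partitioning: split the height axis across 4 devices, so each
# device holds a (64, 32, 128) slice of the *same* sample.
slices = np.split(volume, 4, axis=1)
```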
|
|
1067 | 1065 | "Composing a model with `tf.Module` from scratch is a lot of work, and reusing existing building blocks such as layers and helper functions can drastically speed up model development.\n",
|
1068 | 1066 | "As of TensorFlow 2.9, all Keras layers under `tf.keras.layers` accept DTensor layouts as arguments, and can be used to build DTensor models. You can even directly reuse an existing Keras model with DTensor without modifying the model implementation. Refer to the [DTensor Keras Integration Tutorial](link) (TODO: add link) for information on using Keras with DTensor. "
|
1069 | 1067 | ]
|
1070 |
| - }, |
1071 |
| - { |
1072 |
| - "cell_type": "code", |
1073 |
| - "execution_count": null, |
1074 |
| - "metadata": { |
1075 |
| - "id": "A-YWPfJyHPcX" |
1076 |
| - }, |
1077 |
| - "outputs": [], |
1078 |
| - "source": [ |
1079 |
| - "" |
1080 |
| - ] |
1081 | 1068 | }
|
1082 | 1069 | ],
|
1083 | 1070 | "metadata": {
|
1084 | 1071 | "colab": {
|
1085 | 1072 | "collapsed_sections": [],
|
1086 | 1073 | "name": "dtensor_ml_tutorial.ipynb",
|
1087 |
| - "provenance": [], |
1088 | 1074 | "toc_visible": true
|
1089 | 1075 | },
|
1090 | 1076 | "kernelspec": {
|
1091 | 1077 | "display_name": "Python 3",
|
1092 | 1078 | "name": "python3"
|
1093 |
| - }, |
1094 |
| - "language_info": { |
1095 |
| - "name": "python" |
1096 | 1079 | }
|
1097 | 1080 | },
|
1098 | 1081 | "nbformat": 4,
|
|