|
13 | 13 | "cell_type": "code",
|
14 | 14 | "execution_count": null,
|
15 | 15 | "metadata": {
|
| 16 | + "cellView": "form", |
16 | 17 | "id": "tuOe1ymfHZPu"
|
17 | 18 | },
|
18 | 19 | "outputs": [],
|
|
67 | 68 | "id": "MGZuakHVlVQf"
|
68 | 69 | },
|
69 | 70 | "source": [
|
70 |
| - "\n", |
71 | 71 | "## Overview\n",
|
72 | 72 | "\n",
|
73 | 73 | "This colab introduces DTensor, an extension to TensorFlow for synchronous distributed computing.\n",
|
|
76 | 76 | "\n",
|
77 | 77 | "By decoupling the application from sharding directives, DTensor enables running the same application on a single device, multiple devices, or even multiple clients, while preserving its global semantics. \n",
|
78 | 78 | "\n",
|
79 |
| - "This guide introduces DTensor concepts for distributed computing, and how DTensor integrates with TensorFlow. To see a demo of using DTensor in model training, see [Distributed training with DTensor](https://www.tensorflow.org/tutorials/distribute/dtensor_ml_tutorial.ipynb) tutorial." |
| 79 | + "This guide introduces DTensor concepts for distributed computing, and how DTensor integrates with TensorFlow. To see a demo of using DTensor in model training, see [Distributed training with DTensor](https://www.tensorflow.org/tutorials/distribute/dtensor_ml_tutorial) tutorial." |
80 | 80 | ]
|
81 | 81 | },
|
82 | 82 | {
|
|
157 | 157 | "id": "JjiHaH0ql9yo"
|
158 | 158 | },
|
159 | 159 | "source": [
|
160 |
| - "\n", |
161 | 160 | "### Mesh\n",
|
162 | 161 | "\n",
|
163 | 162 | "`Mesh` represents a logical Cartesian topology of a set of devices. Each dimension of the Cartesian grid is called a **Mesh dimension**, and is referred to by a name. Names of mesh dimensions within the same `Mesh` must be unique.\n",
|
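As a minimal sketch of the concept (the 3x2 shape and the virtual-CPU setup here are illustrative assumptions, mirroring what the guide's setup cell does):

```python
import tensorflow as tf
from tensorflow.experimental import dtensor

# Split the one physical CPU into 6 logical devices so a multi-device
# mesh can be built on a single machine.
tf.config.set_logical_device_configuration(
    tf.config.list_physical_devices('CPU')[0],
    [tf.config.LogicalDeviceConfiguration()] * 6)

# A 3x2 mesh: two mesh dimensions with the unique names 'x' and 'y'.
mesh = dtensor.create_mesh([('x', 3), ('y', 2)],
                           devices=[f'CPU:{i}' for i in range(6)])
print(mesh)
```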
|
173 | 172 | "id": "_J6cOieEbaUw"
|
174 | 173 | },
|
175 | 174 | "source": [
|
176 |
| - "\n", |
177 | 175 | "In a 1-dimensional `Mesh`, all devices form a list along a single mesh dimension. The following example uses `dtensor.create_mesh` to create a mesh from 6 CPU devices along a mesh dimension `'x'` with a size of 6 devices:\n",
|
178 | 176 | "\n",
|
179 | 177 | "<img src=\"https://www.tensorflow.org/guide/images/dtensor_mesh_1d.png\" alt=\"A 1 dimensional mesh with 6 CPUs\" class=\"no-filter\">\n"
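The code cell that follows this text in the notebook is not part of this diff; a sketch of what such a cell might look like (reusing the 6 logical CPUs configured above, and naming the result `mesh_1d` since later sections refer back to it):

```python
# All 6 devices form a list along the single mesh dimension 'x'.
mesh_1d = dtensor.create_mesh([('x', 6)],
                              devices=[f'CPU:{i}' for i in range(6)])
print(mesh_1d)
```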
|
|
250 | 248 | "id": "fqzCNlWAbm-c"
|
251 | 249 | },
|
252 | 250 | "source": [
|
253 |
| - "\n", |
254 | 251 | "On a 1-dimensional mesh such as `[(\"x\", 6)]` (`mesh_1d` in the previous section), `Layout([\"unsharded\"], mesh_1d)` is a layout for a rank-1 tensor replicated on 6 devices.\n",
|
255 | 252 | "\n",
|
256 | 253 | "<img src=\"https://www.tensorflow.org/guide/images/dtensor_layout_rank1.png\" alt=\"Layout for a rank-1 tensor\" class=\"no-filter\">"
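A sketch of that layout in code, assuming `mesh_1d` from the previous section; `dtensor.UNSHARDED` is the library constant for the `"unsharded"` spec string:

```python
# Rank-1 layout: the single tensor axis is unsharded, so each of the
# 6 devices holds a full replica of the tensor.
layout_rank1 = dtensor.Layout([dtensor.UNSHARDED], mesh_1d)

# Materialize a DTensor with this layout and peek at its components.
replicated = dtensor.call_with_layout(tf.ones, layout_rank1, shape=(4,))
for component in dtensor.unpack(replicated):
  print(component.device, component.numpy())  # identical on every device
```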
|
|
308 | 305 | "During `Mesh` creation, each client provides its *local device list* together with the expected *global device list*. DTensor validates that both lists are consistent. Please refer to the API documentation for `dtensor.create_mesh` and `dtensor.create_distributed_mesh` for more information on multi-client mesh creation and the *global device list*.\n",
|
310 | 307 | "\n",
|
311 |
| - "Single-client can be thought of as a special case of multi-client, with 1 client. In a single-client application, the *global device list* is identical to the *local device list*.\n", |
312 |
| - "\n" |
| 308 | + "A single-client application can be thought of as a special case of multi-client, with 1 client. In that case, the *global device list* is identical to the *local device list*.\n"
313 | 309 | ]
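A hedged sketch of the two cases; the `DTENSOR_*` environment variable names and the exact `dtensor.create_distributed_mesh` keywords are assumptions to verify against the API documentation for your TensorFlow version:

```python
import os

# Multi-client runs are configured per process; with 1 client these
# settings reduce to the single-client case (assumed variable names).
os.environ.setdefault('DTENSOR_CLIENT_ID', '0')
os.environ.setdefault('DTENSOR_NUM_CLIENTS', '1')
os.environ.setdefault('DTENSOR_JOB_NAME', 'worker')
os.environ.setdefault('DTENSOR_JOBS', 'localhost:12345')

# Each client contributes its local devices; DTensor validates them
# against the global device list assembled across all clients.
mesh = dtensor.create_distributed_mesh([('x', 6)], device_type='CPU')
```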
|
314 | 310 | },
|
315 | 311 | {
|
|
454 | 450 | "source": [
|
455 | 451 | "The inverse operation of `dtensor.unpack` is `dtensor.pack`. Component tensors can be packed back into a DTensor.\n",
|
456 | 452 | "\n",
|
457 |
| - "The components must have the same rank and dtype, which will be the rank and dtype of the returned DTensor. However there is no strict requirement on the device placement of component tensors as inputs of `dtensor.unpack`: the function will automatically copy the component tensors to their respective corresponding devices. \n", |
458 |
| - "\n" |
| 453 | + "The components must have the same rank and dtype, which will be the rank and dtype of the returned DTensor. However, there is no strict requirement on the device placement of the component tensors passed to `dtensor.pack`: the function will automatically copy each component tensor to its corresponding device.\n"
459 | 454 | ]
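A round-trip sketch under the same assumptions as above (`mesh_1d` and a replicated rank-1 layout):

```python
layout = dtensor.Layout([dtensor.UNSHARDED], mesh_1d)
dt = dtensor.call_with_layout(tf.ones, layout, shape=(4,))

# unpack: DTensor -> one component tensor per mesh device.
components = dtensor.unpack(dt)

# pack: components -> DTensor. Ranks and dtypes must agree; pack copies
# each component to whichever device the layout assigns it.
repacked = dtensor.pack(components, layout)
print(dtensor.fetch_layout(repacked).sharding_specs)
```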
|
460 | 455 | },
|
461 | 456 | {
|
|
601 | 596 | "id": "T7FtZ9kQRZgE"
|
602 | 597 | },
|
603 | 598 | "source": [
|
604 |
| - "\n", |
605 | 599 | "You can inspect the component tensors of the created DTensor and verify they are indeed sharded according to your scheme. It may be helpful to illustrate the situation with a chart:\n",
|
606 | 600 | "\n",
|
607 | 601 | " <img src=\"https://www.tensorflow.org/guide/images/dtensor_hybrid_mesh.png\" alt=\"A 3x2 hybrid mesh with 6 CPUs\"\n",
|
|
712 | 706 | "print('Sharding spec:', dtensor.fetch_layout(c).sharding_specs)\n",
|
713 | 707 | "print(\"components:\")\n",
|
714 | 708 | "for component_tensor in dtensor.unpack(c):\n",
|
715 |
| - " print(component_tensor.device, component_tensor.numpy())\n", |
716 |
| - "\n" |
| 709 | + " print(component_tensor.device, component_tensor.numpy())\n" |
717 | 710 | ]
|
718 | 711 | },
|
719 | 712 | {
|
|
1039 | 1032 | "source": [
|
1040 | 1033 | "## What's next?\n",
|
1041 | 1034 | "\n",
|
1042 |
| - "In this colab, you learned about DTensor, an extension to TensorFlow for distributed computing. To try out these concepts in a tutorial, see [Distributed training with DTensor](https://www.tensorflow.org/tutorials/distribute/dtensor_ml_tutorial.ipynb)." |
| 1035 | + "In this colab, you learned about DTensor, an extension to TensorFlow for distributed computing. To try out these concepts in a tutorial, see [Distributed training with DTensor](https://www.tensorflow.org/tutorials/distribute/dtensor_ml_tutorial)." |
1043 | 1036 | ]
|
1044 | 1037 | }
|
1045 | 1038 | ],
|
1046 | 1039 | "metadata": {
|
1047 | 1040 | "colab": {
|
1048 | 1041 | "collapsed_sections": [],
|
1049 | 1042 | "name": "dtensor_overview.ipynb",
|
1050 |
| - "provenance": [], |
1051 | 1043 | "toc_visible": true
|
1052 | 1044 | },
|
1053 | 1045 | "kernelspec": {
|
1054 | 1046 | "display_name": "Python 3",
|
1055 | 1047 | "name": "python3"
|
1056 |
| - }, |
1057 |
| - "language_info": { |
1058 |
| - "name": "python" |
1059 | 1048 | }
|
1060 | 1049 | },
|
1061 | 1050 | "nbformat": 4,
|
|