Commit bb07b77

Fix a minor typo in DTensor overview ipynb file.
PiperOrigin-RevId: 507825081
1 parent 7d9aab3 commit bb07b77


site/en/guide/dtensor_overview.ipynb

Lines changed: 2 additions & 2 deletions
@@ -770,7 +770,7 @@
     "\n",
     "The global matrix product sharded under this scheme can be performed efficiently, by local matmuls that run concurrently, followed by a collective reduction to aggregate the local results. This is also the [canonical way](https://github.com/open-mpi/ompi/blob/ee87ec391f48512d3718fc7c8b13596403a09056/docs/man-openmpi/man3/MPI_Reduce.3.rst?plain=1#L265) of implementing a distributed matrix dot product.\n",
     "\n",
-    "Total number of floating point mul operations is `6 devices * 4 result * 1 = 24`, a factor of 3 reduction compared to the fully replicated case (72) above. The factor of 3 is due to the sharing along `x` mesh dimension with a size of `3` devices.\n",
+    "Total number of floating point mul operations is `6 devices * 4 result * 1 = 24`, a factor of 3 reduction compared to the fully replicated case (72) above. The factor of 3 is due to the sharding along `x` mesh dimension with a size of `3` devices.\n",
     "\n",
     "The reduction of the number of operations run sequentially is the main mechanism with which synchronous model parallelism accelerates training."
     ]
@@ -805,7 +805,7 @@
     "You can perform additional sharding on the inputs, and they are appropriately carried over to the results. For example, you can apply additional sharding of operand `a` along its first axis to the `'y'` mesh dimension. The additional sharding will be carried over to the first axis of the result `c`.\n",
     "\n",
     "\n",
-    "Total number of floating point mul operations is `6 devices * 2 result * 1 = 12`, an additional factor of 2 reduction compared to the case (24) above. The factor of 2 is due to the sharing along `y` mesh dimension with a size of `2` devices."
+    "Total number of floating point mul operations is `6 devices * 2 result * 1 = 12`, an additional factor of 2 reduction compared to the case (24) above. The factor of 2 is due to the sharding along `y` mesh dimension with a size of `2` devices."
     ]
    },
    {
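For reference, here is a minimal sketch of the two sharding schemes the edited passage describes, written against the `tf.experimental.dtensor` API. The 3x2 mesh, the operand values, and the small `from_array` helper are illustrative assumptions inferred from the FLOP counts quoted above (72 → 24 → 12); they are not code taken from this commit or from the notebook.

```python
# A sketch only: the 3x2 mesh, operand values, and `from_array` helper are
# assumptions reconstructed from the FLOP counts quoted in the diff above.
import tensorflow as tf
from tensorflow.experimental import dtensor

# Split one physical CPU into 6 logical devices so a 3x2 mesh fits locally.
phys = tf.config.list_physical_devices("CPU")
tf.config.set_logical_device_configuration(
    phys[0], [tf.config.LogicalDeviceConfiguration()] * 6)

mesh = dtensor.create_mesh([("x", 3), ("y", 2)],
                           devices=[f"CPU:{i}" for i in range(6)])

def from_array(arr, layout):
  """Replicates a host array onto the mesh, then reshards it to `layout`."""
  replicated = dtensor.Layout.replicated(layout.mesh, rank=layout.rank)
  return dtensor.relayout(
      dtensor.copy_to_mesh(tf.constant(arr, dtype=tf.float32), replicated),
      layout)

# Shard the contracted axis on 'x' (size 3): each device multiplies a slice,
# then a collective reduction over 'x' sums the partial products.
# Mul count: 6 devices * 4 result elements * 1 mul each = 24.
a = from_array([[1., 2., 3.], [4., 5., 6.]],
               dtensor.Layout([dtensor.UNSHARDED, "x"], mesh))
b = from_array([[6., 5.], [4., 3.], [2., 1.]],
               dtensor.Layout(["x", dtensor.UNSHARDED], mesh))
c = tf.matmul(a, b)

# Additionally shard a's first axis on 'y' (size 2); that sharding carries
# over to the first axis of the result: 6 devices * 2 result elements = 12.
a2 = dtensor.relayout(a, dtensor.Layout(["y", "x"], mesh))
c2 = tf.matmul(a2, b)
print(dtensor.fetch_layout(c2))  # first result axis is sharded on 'y'
```

Sharding the contracted axis on `x` turns the global matmul into per-device partial products plus a reduction over `x`; the extra `y` sharding of `a`'s first axis partitions the result rows as well, which is why the two reduction factors multiply.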
