Commit 7d5ea2e

edloper authored and copybara-github committed

RaggedTensor guide: Add a section on ragged shapes.

PiperOrigin-RevId: 446696647

1 parent 8cb1b13 commit 7d5ea2e

File tree

1 file changed: +257 -2 lines changed


site/en/guide/ragged_tensor.ipynb

Lines changed: 257 additions & 2 deletions
@@ -81,6 +81,7 @@
 },
 "outputs": [],
 "source": [
+"!pip install --pre -U tensorflow\n",
 "import math\n",
 "import tensorflow as tf"
 ]
@@ -1459,13 +1460,267 @@
 "print(\"Indexed value:\", rt[1].numpy())"
 ]
 },
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "J87jMZa0M_YW"
+},
+"source": [
+"## Ragged Shapes\n",
+"\n",
+"The shape of a tensor specifies the size of each axis. For example, the shape of `[[1, 2], [3, 4], [5, 6]]` is `[3, 2]`, since there are 3 rows and 2 columns. TensorFlow has two separate but related ways to describe shapes:\n",
+"\n",
+"* ***static shape***: Information about axis sizes that is known statically (e.g., while tracing a `tf.function`). May be partially specified.\n",
+"\n",
+"* ***dynamic shape***: Runtime information about the axis sizes."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "IOETE_OLPLZo"
+},
+"source": [
+"### Static shape\n",
+"\n",
+"A Tensor's static shape contains information about its axis sizes that is known at graph-construction time. For both `tf.Tensor` and `tf.RaggedTensor`, it is available using the `.shape` property, and is encoded using `tf.TensorShape`:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "btGDjT4uNgQy"
+},
+"outputs": [],
+"source": [
+"x = tf.constant([[1, 2], [3, 4], [5, 6]])\n",
+"x.shape # shape of a tf.Tensor"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "__OgvmrGPEjq"
+},
+"outputs": [],
+"source": [
+"rt = tf.ragged.constant([[1], [2, 3], [], [4]])\n",
+"rt.shape # shape of a tf.RaggedTensor"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "9EWnQd3qPWaw"
+},
+"source": [
+"The static shape of a ragged dimension is always `None` (i.e., unspecified). However, the inverse is not true -- if a `TensorShape` dimension is `None`, then that could indicate that the dimension is ragged, *or* it could indicate that the dimension is uniform but that its size is not statically known."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "75E9YXYMNfne"
+},
+"source": [
+"### Dynamic shape\n",
+"\n",
+"A tensor's dynamic shape contains information about its axis sizes that is known when the graph is run. It is constructed using the `tf.shape` operation. For `tf.Tensor`, `tf.shape` returns the shape as a 1D integer `Tensor`, where `tf.shape(x)[i]` is the size of axis `i`."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "kWJ7Cn1EQTD_"
+},
+"outputs": [],
+"source": [
+"x = tf.constant([['a', 'b'], ['c', 'd'], ['e', 'f']])\n",
+"tf.shape(x)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "BeZEfxwmRcSv"
+},
+"source": [
+"However, a 1D `Tensor` is not expressive enough to describe the shape of a `tf.RaggedTensor`. Instead, the dynamic shape for ragged tensors is encoded using a dedicated type, `tf.experimental.DynamicRaggedShape`. In the following example, the `DynamicRaggedShape` returned by `tf.shape(rt)` indicates that the ragged tensor has 4 rows, with lengths 1, 3, 0, and 2:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "nZc2wqgQQUFU"
+},
+"outputs": [],
+"source": [
+"rt = tf.ragged.constant([[1], [2, 3, 4], [], [5, 6]])\n",
+"rt_shape = tf.shape(rt)\n",
+"print(rt_shape)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "EphU60YvTf98"
+},
+"source": [
+"#### Dynamic shape: operations\n",
+"\n",
+"`DynamicRaggedShape`s can be used with most TensorFlow ops that expect shapes, including `tf.reshape`, `tf.zeros`, `tf.ones`, `tf.fill`, `tf.broadcast_dynamic_shape`, and `tf.broadcast_to`."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "pclAODLXT6Gr"
+},
+"outputs": [],
+"source": [
+"print(f\"tf.reshape(x, rt_shape) = {tf.reshape(x, rt_shape)}\")\n",
+"print(f\"tf.zeros(rt_shape) = {tf.zeros(rt_shape)}\")\n",
+"print(f\"tf.ones(rt_shape) = {tf.ones(rt_shape)}\")\n",
+"print(f\"tf.fill(rt_shape, 9) = {tf.fill(rt_shape, 9)}\")"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "rNP_3_btRAHj"
+},
+"source": [
+"#### Dynamic shape: indexing and slicing\n",
+"\n",
+"`DynamicRaggedShape` can also be indexed to get the sizes of uniform dimensions. For example, we can find the number of rows in a ragged tensor using `tf.shape(rt)[0]` (just as we would for a non-ragged tensor):"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "MzQvPhsxS6HN"
+},
+"outputs": [],
+"source": [
+"rt_shape[0]"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "wvr2iT6zS_e8"
+},
+"source": [
+"However, it is an error to use indexing to try to retrieve the size of a ragged dimension, since it doesn't have a single size. (Since `RaggedTensor` keeps track of which axes are ragged, this error is only thrown during eager execution or when tracing a `tf.function`; it will never be thrown when executing a concrete function.)"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "HgGMk0LeTGik"
+},
+"outputs": [],
+"source": [
+"try:\n",
+"  rt_shape[1]\n",
+"except ValueError as e:\n",
+"  print(\"Got expected ValueError:\", e)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "5QUsdawGU0SM"
+},
+"source": [
+"`DynamicRaggedShape`s can also be sliced, as long as the slice either begins with axis `0`, or contains only dense dimensions."
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "APT72EaBU70t"
+},
+"outputs": [],
+"source": [
+"rt_shape[:1]"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "a-Wl9IrQXcdY"
+},
+"source": [
+"#### Dynamic shape: encoding\n",
+"\n",
+"`DynamicRaggedShape` is encoded using two fields:\n",
+"\n",
+"* `inner_shape`: An integer vector giving the shape of a dense `tf.Tensor`.\n",
+"* `row_partitions`: A list of `tf.experimental.RowPartition` objects, describing how the outermost dimension of that inner shape should be partitioned to add ragged axes.\n",
+"\n",
+"For more information about row partitions, see the \"RaggedTensor encoding\" section below, and the API docs for `tf.experimental.RowPartition`."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "jfeY9tTcV_zL"
+},
+"source": [
+"#### Dynamic shape: construction\n",
+"\n",
+"`DynamicRaggedShape` is most often constructed by applying `tf.shape` to a `RaggedTensor`, but it can also be constructed directly:"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "NSRgD667WwIZ"
+},
+"outputs": [],
+"source": [
+"tf.experimental.DynamicRaggedShape(\n",
+"    row_partitions=[tf.experimental.RowPartition.from_row_lengths([5, 3, 2])],\n",
+"    inner_shape=[10, 8])"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "EjzVjs9MXIIA"
+},
+"source": [
+"If the lengths of all rows are known statically, `DynamicRaggedShape.from_lengths` can also be used to construct a dynamic ragged shape. (This is mostly useful for testing and demonstration code, since it's rare for the lengths of ragged dimensions to be known statically.)\n"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"metadata": {
+"id": "gMxCzADUYIjY"
+},
+"outputs": [],
+"source": [
+"tf.experimental.DynamicRaggedShape.from_lengths([4, (2, 1, 0, 8), 12])"
+]
+},
 {
 "cell_type": "markdown",
 "metadata": {
 "id": "EdljbNPq-PWS"
 },
 "source": [
-"## Broadcasting\n",
+"### Broadcasting\n",
 "\n",
 "Broadcasting is the process of making tensors with different shapes have compatible shapes for elementwise operations. For more background on broadcasting, refer to:\n",
 "\n",
@@ -1491,7 +1746,7 @@
 "id": "-S2hOUWx-PWU"
 },
 "source": [
-"### Broadcasting examples"
+"#### Broadcasting examples"
 ]
 },
 {
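As a quick end-to-end check of the behavior the added section documents, the following sketch (assuming TensorFlow 2.9 or later, where `tf.experimental.DynamicRaggedShape` is available) applies `tf.shape` to a ragged tensor, indexes the uniform outer axis, and feeds the resulting shape to `tf.zeros`:

```python
# Sketch of the dynamic-shape behavior described in the new guide section;
# assumes TensorFlow >= 2.9, which provides tf.experimental.DynamicRaggedShape.
import tensorflow as tf

rt = tf.ragged.constant([[1], [2, 3, 4], [], [5, 6]])

# For a RaggedTensor, tf.shape returns a DynamicRaggedShape.
rt_shape = tf.shape(rt)

# Axis 0 is uniform, so it can be indexed: rt has 4 rows.
num_rows = rt_shape[0]

# The shape can be passed to ops such as tf.zeros, producing a
# RaggedTensor with the same row lengths as rt.
zeros = tf.zeros(rt_shape)
print(int(num_rows))
print(zeros.to_list())
```

Indexing `rt_shape[1]` instead would raise a `ValueError`, since the ragged axis has no single size.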

0 commit comments
