3 changes: 3 additions & 0 deletions intermediate_source/transformer_building_blocks.py
@@ -79,6 +79,9 @@
# sequence lengths. They eliminate the need for the bug-prone practices of explicit
# padding and masking (think ``key_padding_mask`` in ``nn.MultiHeadAttention``).
#
# .. warning::
# Nested tensors are not currently under active development. Use at your own risk.
#
# * `scaled_dot_product_attention <https://pytorch.org/tutorials/intermediate/scaled_dot_product_attention_tutorial.html>`_
#
# ``scaled_dot_product_attention`` is a primitive for
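The hunk above adds the warning next to prose about nested tensors replacing explicit padding and ``key_padding_mask``. As a rough illustration of that claim (not part of this diff; the sequence lengths and embedding size below are made up), a jagged nested tensor carries each sequence's length itself, so no mask is needed:

```python
import torch

# Three variable-length sequences, embedding dim 8 (hypothetical sizes).
seqs = [torch.randn(n, 8) for n in (3, 5, 2)]

# Dense approach: pad to the longest sequence and track a key_padding_mask.
padded = torch.nn.utils.rnn.pad_sequence(seqs, batch_first=True)        # (3, 5, 8)
key_padding_mask = torch.tensor([[False, False, False, True,  True],
                                 [False, False, False, False, False],
                                 [False, False, True,  True,  True]])   # (3, 5)

# Nested approach: ragged lengths live in the tensor itself; no padding, no mask.
nt = torch.nested.nested_tensor(seqs, layout=torch.jagged)
print(nt.shape)  # e.g. torch.Size([3, j1, 8]) -- the middle dimension is ragged
```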
5 changes: 3 additions & 2 deletions unstable_source/nestedtensor.py
@@ -3,6 +3,9 @@
Getting Started with Nested Tensors
===============================================================
.. warning::
Nested tensors are not currently under active development. Use at your own risk.
Nested tensors generalize the shape of regular dense tensors, allowing for representation
of ragged-sized data.
@@ -21,8 +24,6 @@
they are invaluable for building transformers that can efficiently operate on ragged sequential
inputs. Below, we present an implementation of multi-head attention using nested tensors that,
combined with ``torch.compile``, out-performs operating naively on tensors with padding.
Nested tensors are currently a prototype feature and are subject to change.
"""

import numpy as np
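The docstring these hunks touch claims that multi-head attention built on nested tensors, combined with ``torch.compile``, out-performs operating on padded tensors. A minimal sketch of that nested attention path (hypothetical sizes and projection layers; the tutorial itself wraps a full MHA module in ``torch.compile``):

```python
import torch
import torch.nn.functional as F

E, H = 64, 8             # embedding dim and number of heads (made up)
lengths = [7, 3, 12, 5]  # ragged sequence lengths

# One jagged nested input of shape (batch, ragged seq, E).
x = torch.nested.nested_tensor(
    [torch.randn(n, E) for n in lengths], layout=torch.jagged
)

q_proj = torch.nn.Linear(E, E)
k_proj = torch.nn.Linear(E, E)
v_proj = torch.nn.Linear(E, E)

def split_heads(t):
    # (batch, ragged seq, E) -> (batch, heads, ragged seq, head_dim)
    return t.unflatten(-1, [H, E // H]).transpose(1, 2)

q, k, v = split_heads(q_proj(x)), split_heads(k_proj(x)), split_heads(v_proj(x))

# No padding and no mask: SDPA respects each sequence's own length. The
# tutorial's claim is that compiling this path beats running SDPA on padded
# dense tensors with an explicit mask.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)         # (batch, heads, ragged seq, head_dim)
```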