From 0162a0177054ec6ac99ad3adb5c9a22e1e52b580 Mon Sep 17 00:00:00 2001
From: Moaz Ali <32923319+moazDev1@users.noreply.github.com>
Date: Mon, 6 Oct 2025 21:52:14 -0700
Subject: [PATCH] Update 3.mdx
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The current text says the list of encoded sequences is “already of rectangular
shape,” but the example actually contains lists of different lengths (16 and 8
tokens). This means the array is not rectangular and cannot be directly
converted to a tensor without padding.
---
 chapters/en/chapter2/3.mdx | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/chapters/en/chapter2/3.mdx b/chapters/en/chapter2/3.mdx
index cf6309eb1..c39e7a761 100644
--- a/chapters/en/chapter2/3.mdx
+++ b/chapters/en/chapter2/3.mdx
@@ -277,11 +277,17 @@ encoded_sequences = [
 ]
 ```
 
-This is a list of encoded sequences: a list of lists. Tensors only accept rectangular shapes (think matrices). This "array" is already of rectangular shape, so converting it to a tensor is easy:
+This is a list of encoded sequences: a list of lists. Tensors only accept rectangular shapes.
+Because these lists have different lengths, we can **pad** the shorter ones with zeros so they all have the same size:
 
 ```py
 import torch
 
+encoded_sequences = [
+    [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102],
+    [101, 1045, 5223, 2023, 2061, 2172, 999, 102, 0, 0, 0, 0, 0, 0, 0, 0],
+]
+
 model_inputs = torch.tensor(encoded_sequences)
 ```
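
For reference, a minimal sketch (outside the patch itself) of the padding step the commit message describes: the token IDs are taken from the diff above, while the manual zero-padding loop and the printed shape are illustrative assumptions, not part of the patched chapter text. In practice the tokenizer can produce padded output directly, but this shows why the ragged list cannot be converted as-is.

```py
import torch

# Ragged input: the two sequences from the diff, before padding (16 and 8 token IDs).
encoded_sequences = [
    [101, 1045, 1005, 2310, 2042, 3403, 2005, 1037, 17662, 12172, 2607, 2026, 2878, 2166, 1012, 102],
    [101, 1045, 5223, 2023, 2061, 2172, 999, 102],
]

# torch.tensor(encoded_sequences) would fail here because the inner lists have
# different lengths, so we pad the shorter one with zeros up to the longest length.
max_len = max(len(seq) for seq in encoded_sequences)
padded = [seq + [0] * (max_len - len(seq)) for seq in encoded_sequences]

model_inputs = torch.tensor(padded)
print(model_inputs.shape)  # torch.Size([2, 16])
```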