From 86dc6abf1b58ba8119f40313a2768054bbb4a369 Mon Sep 17 00:00:00 2001
From: Sanjay Kumar Sakamuri Kamalakar
Date: Wed, 13 Aug 2025 19:38:01 +0530
Subject: [PATCH 1/2] Update about_map_batch.mdx

---
 docs/source/about_map_batch.mdx | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/docs/source/about_map_batch.mdx b/docs/source/about_map_batch.mdx
index 4ebbdf9acaf..d92999bbd07 100644
--- a/docs/source/about_map_batch.mdx
+++ b/docs/source/about_map_batch.mdx
@@ -38,3 +38,22 @@ To make it valid, you have to drop one of the columns:
 >>> len(dataset_with_duplicates)
 6
 ```
+Alternatively, you can overwrite the existing column to achieve the same result.
+For example, here’s how to duplicate every row in the dataset by overwriting column `"a"`:
+
+```py
+>>> from datasets import Dataset
+>>> dataset = Dataset.from_dict({"a": [0, 1, 2]})
+# Overwrites the existing "a" column with duplicated values
+>>> duplicated_dataset = dataset.map(
+...     lambda batch: {"a": [x for x in batch["a"] for _ in range(2)]},
+...     batched=True
+... )
+>>> duplicated_dataset
+Dataset({
+    features: ['a'],
+    num_rows: 6
+})
+>>> duplicated_dataset["a"]
+[0, 0, 1, 1, 2, 2]
+```

From 7018bf9a00146542f0921ea92221f37c2c0aaccd Mon Sep 17 00:00:00 2001
From: Sanjay Kumar Sakamuri Kamalakar
Date: Wed, 13 Aug 2025 19:46:26 +0530
Subject: [PATCH 2/2] Update about_map_batch.mdx

---
 docs/source/about_map_batch.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/about_map_batch.mdx b/docs/source/about_map_batch.mdx
index d92999bbd07..6c33fed593f 100644
--- a/docs/source/about_map_batch.mdx
+++ b/docs/source/about_map_batch.mdx
@@ -44,7 +44,7 @@ For example, here’s how to duplicate every row in the dataset by overwriting c
 ```py
 >>> from datasets import Dataset
 >>> dataset = Dataset.from_dict({"a": [0, 1, 2]})
-# Overwrites the existing "a" column with duplicated values
+# overwrites the existing "a" column with duplicated values
 >>> duplicated_dataset = dataset.map(
 ...     lambda batch: {"a": [x for x in batch["a"] for _ in range(2)]},
 ...     batched=True