Commit 4c4503e

docs: Add column overwrite example to batch mapping guide (#7737)
* Update about_map_batch.mdx
* Update about_map_batch.mdx
1 parent 910fab2 commit 4c4503e

1 file changed: +19 -0 lines changed

docs/source/about_map_batch.mdx

Lines changed: 19 additions & 0 deletions
@@ -38,3 +38,22 @@ To make it valid, you have to drop one of the columns:
 >>> len(dataset_with_duplicates)
 6
 ```
+Alternatively, you can overwrite the existing column to achieve the same result.
+For example, here’s how to duplicate every row in the dataset by overwriting column `"a"`:
+
+```py
+>>> from datasets import Dataset
+>>> dataset = Dataset.from_dict({"a": [0, 1, 2]})
+# overwrites the existing "a" column with duplicated values
+>>> duplicated_dataset = dataset.map(
+...     lambda batch: {"a": [x for x in batch["a"] for _ in range(2)]},
+...     batched=True
+... )
+>>> duplicated_dataset
+Dataset({
+    features: ['a'],
+    num_rows: 6
+})
+>>> duplicated_dataset["a"]
+[0, 0, 1, 1, 2, 2]
+```
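
The added example overwrites the only column in the dataset. As a rough sketch of how the same idea might extend to a dataset with several columns, the function below duplicates every column in a batch so that all returned columns end up the same length. The name `duplicate_rows` and the `times` parameter are hypothetical and not part of this commit. The batch is a plain dict of lists standing in for what a batched `map` function receives, so the snippet runs without the `datasets` library.

```python
def duplicate_rows(batch, times=2):
    """Return a new batch in which every row appears `times` times.

    `batch` is a dict mapping column names to equal-length lists, the
    shape a batched map function receives. (Hypothetical helper.)
    """
    return {
        column: [value for value in values for _ in range(times)]
        for column, values in batch.items()
    }


# Plain-dict stand-in for a batch with two columns.
batch = {"a": [0, 1, 2], "b": ["x", "y", "z"]}
duplicated = duplicate_rows(batch)

# Every column is overwritten, so all columns share the new length.
lengths = {column: len(values) for column, values in duplicated.items()}
print(duplicated["a"])
print(duplicated["b"])
print(lengths)
```

With the library installed, a function like this could presumably be passed as `dataset.map(duplicate_rows, batched=True)`, since it overwrites every existing column rather than leaving one at its original length.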
