Commit 2c52dee

dushyantbehl and willmj authored
Update architecture_records/004-datapreprocessor.md
Co-authored-by: Will Johnson <[email protected]>
Signed-off-by: Dushyant Behl <[email protected]>
1 parent 8aef57e commit 2c52dee

1 file changed (+1 -1 lines changed)

architecture_records/004-datapreprocessor.md

Lines changed: 1 addition & 1 deletion
@@ -220,7 +220,7 @@ Since this design allows complex preprocessing of the dataset on fly, the design
The goal that we have here is to not be slower than the HuggingFace library which our whole design is based upon, in this sense we also imagine any performance improvements that we come across to be contributed back to HF library to keep our design simple and not reimplement stuff.
- -> Handling Large Dataset
+ #### Handling Large Dataset
Our main reason for using HF [Map](https://huggingface.co/docs/datasets/en/process#map) heavily for data preprocessing is that for large datasets which are generally loaded as `IterableDatasets` the MAP API automatically performs [`lazy map operations`](https://huggingface.co/docs/datasets/en/about_mapstyle_vs_iterable#eager-data-processing-and-lazy-data-processing) and hence doesn't produce too much overhead while training.
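
The eager versus lazy distinction in the paragraph above can be illustrated with a small sketch. It deliberately uses plain Python generators as a stand-in rather than the HF `datasets` library, so `preprocess`, `eager_map`, and `lazy_map` are hypothetical names chosen for illustration and are not part of any HF API. The point is only that a lazy map defers the preprocessing function until an example is actually consumed, which is the behavior the linked HF documentation describes for iterable datasets.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Example = Dict[str, str]

calls = 0  # counts how many examples have actually been preprocessed


def preprocess(example: Example) -> Example:
    """Hypothetical preprocessing step: uppercase the text field."""
    global calls
    calls += 1
    return {"text": example["text"].upper()}


def eager_map(fn: Callable[[Example], Example], data: List[Example]) -> List[Example]:
    # Eager: every example is processed before the result is returned.
    return [fn(ex) for ex in data]


def lazy_map(fn: Callable[[Example], Example], data: Iterable[Example]) -> Iterator[Example]:
    # Lazy: a generator that applies fn only when the next example is requested.
    for ex in data:
        yield fn(ex)


data = [{"text": f"sample {i}"} for i in range(1000)]

calls = 0
eager = eager_map(preprocess, data)
eager_calls = calls  # all examples were processed up front

calls = 0
lazy = lazy_map(preprocess, data)
calls_before_iteration = calls  # nothing has been processed yet
first = next(lazy)
calls_after_first = calls  # only the one consumed example was processed
```

In a training loop that consumes examples one batch at a time, the lazy variant spreads the preprocessing cost across iteration instead of paying it all before the first step.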