Commit dd2c11a

fixing
1 parent 38bfd60 commit dd2c11a

1 file changed, +2 -2 lines changed

src/lightning/data/README.md

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@ We developed `StreamingDataset` to optimize training of large datasets stored on
 
 Specifically crafted for multi-gpu & multi-node (with [DDP](https://lightning.ai/docs/pytorch/stable/accelerators/gpu_intermediate.html), [FSDP](https://lightning.ai/docs/pytorch/stable/advanced/model_parallel/fsdp.html), etc...), distributed training with large models, it enhances accuracy, performance, and user-friendliness. Now, training efficiently is possible regardless of the data's location. Simply stream in the required data when needed.
 
-The `StreamingDataset` is compatible with any data type, including **images, text, video, audio, geo-spatial, and multimodal data** and it is a drop-in replacement for your PyTorch [IterableDataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset) class. For example, it is used by [Lit-GPT](https://github.com/Lightning-AI/lit-gpt/blob/main/litgpt/data/tinyllama.py) to pretrain LLMs.
+The `StreamingDataset` is compatible with any data type, including **images, text, video, audio, geo-spatial, and multimodal data** and it is a drop-in replacement for your PyTorch [IterableDataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.IterableDataset) class. For example, it is used by [Lit-GPT](https://github.com/Lightning-AI/litgpt/blob/main/litgpt/data/tinyllama.py) to pretrain LLMs.
 
 <br/>
 
@@ -284,7 +284,7 @@ for batch in tqdm(train_dataloader):
 
 Lightning Data provides a stateful `StreamingDataLoader`. This simplifies resuming training over large datasets.
 
-Note: The `StreamingDataLoader` is used by [Lit-GPT](https://github.com/Lightning-AI/lit-gpt/blob/main/litgpt/data/tinyllama.py) to pretrain LLMs. The statefulness still works when using a mixture of datasets with the `CombinedStreamingDataset`.
+Note: The `StreamingDataLoader` is used by [Lit-GPT](https://github.com/Lightning-AI/litgpt/blob/main/litgpt/data/tinyllama.py) to pretrain LLMs. The statefulness still works when using a mixture of datasets with the `CombinedStreamingDataset`.
 
 ```python
 import os
