Commit e69fdd0

add CHANGELOG.md to track project updates (Lightning-AI#733)
1 parent 7afd55f commit e69fdd0

7 files changed: +180 -13 lines changed

.pre-commit-config.yaml

Lines changed: 9 additions & 1 deletion
@@ -37,6 +37,10 @@ repos:
       - id: check-case-conflict
       - id: check-added-large-files
         args: ["--maxkb=350", "--enforce-all"]
+        exclude: |
+          (?x)^(
+              src/litdata/CHANGELOG.md
+          )$
       - id: detect-private-key

   - repo: https://github.com/codespell-project/codespell
@@ -66,7 +70,11 @@ repos:
           #- mdformat-black
           - mdformat_frontmatter
         args: ["--number"]
-        exclude: README.md
+        exclude: |
+          (?x)^(
+              src/litdata/CHANGELOG.md |
+              README.md
+          )$

   - repo: https://github.com/pre-commit/mirrors-prettier
     rev: v3.1.0
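
A note on the `exclude` change in this file: pre-commit, as far as I know, applies `exclude` with Python's `re.search`, so the old unanchored `README.md` would match any path containing that name, while the new `(?x)^( ... )$` form matches only the listed paths. That may be why the nested READMEs are reformatted by mdformat in this commit. The sketch below is a minimal illustration under that assumption; `OLD`, `NEW`, and `is_excluded` are hypothetical names, not part of pre-commit.

```python
import re

# Old value of the mdformat hook's `exclude` (unanchored).
OLD = re.compile(r"README.md")

# New value, copied from the diff. With (?x), whitespace and newlines
# inside the pattern are ignored, so each path can sit on its own line.
NEW = re.compile(
    r"""(?x)^(
        src/litdata/CHANGELOG.md |
        README.md
    )$"""
)


def is_excluded(pattern: re.Pattern, path: str) -> bool:
    # Assumption: the hook skips a file when re.search finds a match
    # against its repo-relative path.
    return pattern.search(path) is not None


for path in ("README.md", "src/litdata/CHANGELOG.md", "benchmarks/README.md"):
    print(path, is_excluded(OLD, path), is_excluded(NEW, path))
```

Under this reading, `benchmarks/README.md` is excluded by the old pattern but not by the new one, since `^` and `$` pin the match to the whole path.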

benchmarks/README.md

Lines changed: 2 additions & 1 deletion
@@ -15,10 +15,11 @@ You can compare both approaches for your own datasets and training pipelines.
 ## Why benchmarks?

 Benchmarks help you:
+
 - Measure data loading speed and efficiency
 - Compare different formats and pipelines
 - Choose the best setup for your training

----
+______________________________________________________________________

 For more details, check the README in each subfolder.

benchmarks/ffcv/README.md

Lines changed: 3 additions & 2 deletions
@@ -1,4 +1,4 @@
-# FFCV Benchmarks
+# FFCV Benchmarks

 This folder contains scripts to convert, write, and stream datasets using FFCV for benchmarking.

@@ -68,7 +68,7 @@ python stream_imagenet.py \
 --cfg.epochs=2
 ```

----
+______________________________________________________________________

 These scripts are easy to use and work with both local and cloud datasets. For more details, see the script docstrings or run with `--help`.

@@ -92,4 +92,5 @@ To extract the real S3 path for a dataset in your teamspace, use:
 ```sh
 python3 -c "from litdata.streaming.resolver import _resolve_dir; path=_resolve_dir('/teamspace/datasets/imagenet-1m-litdata/'); print(path.url)"
 ```
+
 You can also prepare the datasets yourself using the earlier steps if you prefer.

benchmarks/litdata/README.md

Lines changed: 1 addition & 1 deletion
@@ -32,6 +32,6 @@ python stream_imagenet.py \

 - Use `--use_pil` if you optimized with raw PIL images.

----
+______________________________________________________________________

 These scripts are easy to use and work with both local and cloud datasets. For more details, see the script docstrings or run with `--help`.

examples/multi_modal/README.md

Lines changed: 21 additions & 7 deletions
@@ -2,20 +2,34 @@

 # Installation Guide:

-````bash
+```bash
 sudo apt install libpoppler-cpp-dev
 pip install python-poppler
 sudo apt install libpoppler-cpp-dev poppler-utils
 pip install fpdf python-poppler pdf2image names
+```

 # Attention:

-Please note that the provided data and scripts are intended solely for trying out the code and demonstrating the workflow. The synthetic data used in this example is simplified and may not fully represent the complexity and variability of real-world customer emails and documents. For a proper evaluation and effective training of the model, it is crucial to use larger datasets with more diversity and noise. Real-world data often contains various imperfections, such as OCR errors, different document formats, and varied writing styles, which need to be accounted for to develop a robust and reliable model.
+Please note that the provided data and scripts are intended solely for trying out the code and demonstrating the workflow.
+
+The synthetic data used in this example is simplified and may not fully represent the complexity and variability of real-world customer emails and documents.
+
+For a proper evaluation and effective training of the model, it is crucial to use larger datasets with more diversity and noise.
+
+Real-world data often contains various imperfections, such as OCR errors, different document formats, and varied writing styles, which need to be accounted for to develop a robust and reliable model.

 # Document Classification for Customer Emails

-Document classification in an insurance company offers several practical benefits. It helps in efficiently sorting incoming documents, directing them to the appropriate departments promptly. This can reduce processing times and minimize the risk of documents being misrouted. Additionally, by categorizing documents based on their content, it can help ensure that each document type is handled by the appropriate specialists, potentially reducing errors. Moreover, it can aid in better resource allocation, distributing workloads more evenly among departments. Overall, document classification can contribute to improved efficiency and organization within the company.
-This project demonstrates a simple example of classifying customer documents into three categories: cancellations, IBAN changes, and damage reports. The classification leverages both computer vision and natural language processing (NLP) techniques. Specifically, it combines the power of a BERT model for text analysis and a ResNet18 model for image recognition. For demonstration purposes, we are only selecting three classes, though there are more classes available in the complete process.
+Document classification in an insurance company offers several practical benefits. It helps in efficiently sorting incoming documents, directing them to the appropriate departments promptly.
+
+This can reduce processing times and minimize the risk of documents being misrouted. Additionally, by categorizing documents based on their content, it can help ensure that each document type is handled by the appropriate specialists, potentially reducing errors.
+
+Moreover, it can aid in better resource allocation, distributing workloads more evenly among departments. Overall, document classification can contribute to improved efficiency and organization within the company.
+
+This project demonstrates a simple example of classifying customer documents into three categories: cancellations, IBAN changes, and damage reports. The classification leverages both computer vision and natural language processing (NLP) techniques. Specifically, it combines the power of a BERT model for text analysis and a ResNet18 model for image recognition.
+
+For demonstration purposes, we are only selecting three classes, though there are more classes available in the complete process.

 ## Table of Contents

@@ -86,15 +100,15 @@ To use this code, follow these steps:

 ```python
 python examples/multi_modal/generate.py
-````
+```

-1. **Prepare Data:**
+2. **Prepare Data:**

 ```python
 python examples/multi_modal/convert.py
 ```

-1. **Train Model:**
+3. **Train Model:**

 ```python
 python examples/multi_modal/train.py

examples/sine_function_model_prediction/README.md

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@

 - Checkout this example in [Lightning Studio](https://lightning.ai/deependu/studios/sine-function-model-prediction-with-litdata-and-pytorch-lightning)

----
+______________________________________________________________________

 ## Steps


src/litdata/CHANGELOG.md

Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
+
+---
+
+## [unreleased] - YYYY-MM-DD
+
+### Added
+
+- Introduced `CHANGELOG.md` to track changes across releases ([#733](https://github.com/lightning-ai/litdata/pull/733))
+
+### Changed
+
+### Removed
+
+### Fixed
+
+## [0.2.58] - 2025-10-07
+
+## [0.2.57] - 2025-10-06
+
+## [0.2.56] - 2025-09-23
+
+## [0.2.55] - 2025-09-19
+
+## [0.2.54] - 2025-09-10
+
+## [0.2.53] - 2025-09-09
+
+## [0.2.52] - 2025-08-12
+
+## [0.2.51] - 2025-07-29
+
+## [0.2.50] - 2025-06-27
+
+## [0.2.49] - 2025-06-04
+
+## [0.2.48] - 2025-05-24
+
+## [0.2.47] - 2025-05-13
+
+## [0.2.46] - 2025-05-03
+
+## [0.2.45] - 2025-04-14
+
+## [0.2.44] - 2025-03-26
+
+## [0.2.43] - 2025-03-25
+
+## [0.2.42] (yanked) - 2025-03-11
+
+## [0.2.41] - 2025-03-07
+
+## [0.2.40] - 2025-03-04
+
+## [0.2.39] - 2025-02-14
+
+## [0.2.38] - 2025-02-06
+
+## [0.2.37] - 2025-01-22
+
+## [0.2.36] - 2025-01-14
+
+## [0.2.35] - 2025-01-14
+
+## [0.2.34] - 2024-12-04
+
+## [0.2.33] - 2024-11-29
+
+## [0.2.32] - 2024-11-27
+
+## [0.2.31] - 2024-11-21
+
+## [0.2.30] - 2024-11-05
+
+## [0.2.29] - 2024-09-26
+
+## [0.2.28] - 2024-09-19
+
+## [0.2.27] (yanked) - 2024-09-19
+
+## [0.2.26] - 2024-09-03
+
+## [0.2.25] - 2024-08-28
+
+## [0.2.24] - 2024-08-14
+
+## [0.2.23] - 2024-08-07
+
+## [0.2.22] - 2024-08-05
+
+## [0.2.21] - 2024-08-01
+
+## [0.2.20] - 2024-08-01
+
+## [0.2.19] - 2024-07-30
+
+## [0.2.18] - 2024-07-24
+
+## [0.2.17] - 2024-07-22
+
+## [0.2.16] - 2024-07-11
+
+## [0.2.15] - 2024-07-05
+
+## [0.2.14] - 2024-06-27
+
+## [0.2.13] - 2024-06-27
+
+## [0.2.12] - 2024-06-17
+
+## [0.2.11] - 2024-06-14
+
+## [0.2.10] - 2024-06-13
+
+## [0.2.9] - 2024-06-12
+
+## [0.2.8] - 2024-06-03
+
+## [0.2.7] - 2024-05-24
+
+## [0.2.6] - 2024-05-07
+
+## [0.2.5] - 2024-04-24
+
+## [0.2.4] - 2024-04-24
+
+## [0.2.3] - 2024-04-03
+
+## [0.2.2] - 2024-03-08
+
+## [0.2.1] - 2024-03-05
+
+## [0.2.0] - 2024-02-26
+
+## [0.2.0rc2] (pre-release) - 2024-02-26
+
+## [0.2.0rc1] (pre-release) - 2024-02-24
+
+## [0.2.0rc0] (pre-release) - 2024-02-23
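
The release headings added in this file share one shape: `## [version]`, an optional parenthesised qualifier such as `(yanked)`, then ` - ` and an ISO date. As a minimal sketch, assuming every release line keeps that shape, a script could read them as follows. `Release`, `HEADING`, and `parse_releases` are hypothetical names for illustration and are not part of litdata.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Matches headings such as "## [0.2.42] (yanked) - 2025-03-11".
# The placeholder "## [unreleased] - YYYY-MM-DD" carries no real date,
# so it does not match and is skipped.
HEADING = re.compile(
    r"^## \[(?P<version>[^\]]+)\]"
    r"(?: \((?P<note>[^)]+)\))?"
    r" - (?P<date>\d{4}-\d{2}-\d{2})$"
)


@dataclass
class Release:
    version: str
    released: date
    note: Optional[str] = None  # e.g. "yanked" or "pre-release"


def parse_releases(text: str) -> List[Release]:
    """Collect dated release headings in the order they appear."""
    releases = []
    for line in text.splitlines():
        m = HEADING.match(line.strip())
        if m:
            releases.append(
                Release(m["version"], date.fromisoformat(m["date"]), m["note"])
            )
    return releases


sample = """\
## [unreleased] - YYYY-MM-DD
## [0.2.58] - 2025-10-07
## [0.2.42] (yanked) - 2025-03-11
## [0.2.0rc0] (pre-release) - 2024-02-23
"""

for release in parse_releases(sample):
    print(release.version, release.released.isoformat(), release.note)
```

Keeping the qualifier as a separate field lets a consumer filter out yanked or pre-release versions without re-parsing the heading.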
