add CHANGELOG.md to track project updates (Lightning-AI#733)

deependujha · web-flow · commit e69fdd0042de · 2025-10-13T14:13:58.000+01:00
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -37,6 +37,10 @@ repos:
       - id: check-case-conflict
       - id: check-added-large-files
         args: ["--maxkb=350", "--enforce-all"]
+        exclude: |
+          (?x)^(
+              src/litdata/CHANGELOG.md
+          )$
       - id: detect-private-key
 
   - repo: https://github.com/codespell-project/codespell
@@ -66,7 +70,11 @@ repos:
           #- mdformat-black
           - mdformat_frontmatter
         args: ["--number"]
-        exclude: README.md
+        exclude: |
+          (?x)^(
+              src/litdata/CHANGELOG.md |
+              README.md
+          )$
 
   - repo: https://github.com/pre-commit/mirrors-prettier
     rev: v3.1.0
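
Note on the two `exclude` patterns in the `.pre-commit-config.yaml` hunks above: the following is a minimal sketch, assuming pre-commit treats `exclude` as a Python regular expression and applies it with `re.search` to repo-relative paths. The helper `is_excluded` and the constant names are hypothetical and used only for illustration. It compares the old unanchored `README.md` pattern with the new verbose-mode (`(?x)`) pattern, which is anchored by `^(...)$`.

```python
import re

# Old mdformat exclude: unanchored, so it matches any path containing "README.md".
OLD_EXCLUDE = r"README.md"

# New exclude, roughly as a YAML block scalar would deliver it (newlines and
# indentation preserved). (?x) enables verbose mode, which ignores that whitespace.
NEW_EXCLUDE = """(?x)^(
    src/litdata/CHANGELOG.md |
    README.md
)$
"""


def is_excluded(pattern: str, path: str) -> bool:
    """Hypothetical helper: assumes the hook filters paths with re.search."""
    return re.search(pattern, path) is not None


paths = [
    "README.md",
    "benchmarks/README.md",
    "src/litdata/CHANGELOG.md",
    "CHANGELOG.md",
]
for path in paths:
    old = is_excluded(OLD_EXCLUDE, path)
    new = is_excluded(NEW_EXCLUDE, path)
    print(f"{path}: old={old} new={new}")
```

Under these assumptions, `benchmarks/README.md` is excluded by the old pattern but not by the new anchored one, which would be consistent with the nested README files being reformatted in this same commit. The `.` characters in the pattern are unescaped and therefore match any character, which is harmless here but slightly looser than a literal dot.
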
diff --git a/benchmarks/README.md b/benchmarks/README.md
@@ -15,10 +15,11 @@ You can compare both approaches for your own datasets and training pipelines.
 ## Why benchmarks?
 
 Benchmarks help you:
+
 - Measure data loading speed and efficiency
 - Compare different formats and pipelines
 - Choose the best setup for your training
 
----
+______________________________________________________________________
 
 For more details, check the README in each subfolder.
diff --git a/benchmarks/ffcv/README.md b/benchmarks/ffcv/README.md
@@ -1,4 +1,4 @@
-# FFCV Benchmarks  
+# FFCV Benchmarks
 
 This folder contains scripts to convert, write, and stream datasets using FFCV for benchmarking.
 
@@ -68,7 +68,7 @@ python stream_imagenet.py \
     --cfg.epochs=2
 ```
 
----
+______________________________________________________________________
 
 These scripts are easy to use and work with both local and cloud datasets. For more details, see the script docstrings or run with `--help`.
 
@@ -92,4 +92,5 @@ To extract the real S3 path for a dataset in your teamspace, use:
 ```sh
 python3 -c "from litdata.streaming.resolver import _resolve_dir; path=_resolve_dir('/teamspace/datasets/imagenet-1m-litdata/'); print(path.url)"
 ```
+
 You can also prepare the datasets yourself using the earlier steps if you prefer.
diff --git a/benchmarks/litdata/README.md b/benchmarks/litdata/README.md
@@ -32,6 +32,6 @@ python stream_imagenet.py \
 
 - Use `--use_pil` if you optimized with raw PIL images.
 
----
+______________________________________________________________________
 
 These scripts are easy to use and work with both local and cloud datasets. For more details, see the script docstrings or run with `--help`.
diff --git a/examples/multi_modal/README.md b/examples/multi_modal/README.md
@@ -2,20 +2,34 @@
 
 # Installation Guide:
 
-````bash
+```bash
 sudo apt install libpoppler-cpp-dev
 pip install python-poppler
 sudo apt install libpoppler-cpp-dev poppler-utils
 pip install fpdf python-poppler pdf2image names
+```
 
 # Attention:
 
-Please note that the provided data and scripts are intended solely for trying out the code and demonstrating the workflow. The synthetic data used in this example is simplified and may not fully represent the complexity and variability of real-world customer emails and documents. For a proper evaluation and effective training of the model, it is crucial to use larger datasets with more diversity and noise. Real-world data often contains various imperfections, such as OCR errors, different document formats, and varied writing styles, which need to be accounted for to develop a robust and reliable model.
+Please note that the provided data and scripts are intended solely for trying out the code and demonstrating the workflow.
+
+The synthetic data used in this example is simplified and may not fully represent the complexity and variability of real-world customer emails and documents.
+
+For a proper evaluation and effective training of the model, it is crucial to use larger datasets with more diversity and noise.
+
+Real-world data often contains various imperfections, such as OCR errors, different document formats, and varied writing styles, which need to be accounted for to develop a robust and reliable model.
 
 # Document Classification for Customer Emails
 
-Document classification in an insurance company offers several practical benefits. It helps in efficiently sorting incoming documents, directing them to the appropriate departments promptly. This can reduce processing times and minimize the risk of documents being misrouted. Additionally, by categorizing documents based on their content, it can help ensure that each document type is handled by the appropriate specialists, potentially reducing errors. Moreover, it can aid in better resource allocation, distributing workloads more evenly among departments. Overall, document classification can contribute to improved efficiency and organization within the company.
-This project demonstrates a simple example of classifying customer documents into three categories: cancellations, IBAN changes, and damage reports. The classification leverages both computer vision and natural language processing (NLP) techniques. Specifically, it combines the power of a BERT model for text analysis and a ResNet18 model for image recognition. For demonstration purposes, we are only selecting three classes, though there are more classes available in the complete process.
+Document classification in an insurance company offers several practical benefits. It helps in efficiently sorting incoming documents, directing them to the appropriate departments promptly.
+
+This can reduce processing times and minimize the risk of documents being misrouted. Additionally, by categorizing documents based on their content, it can help ensure that each document type is handled by the appropriate specialists, potentially reducing errors.
+
+Moreover, it can aid in better resource allocation, distributing workloads more evenly among departments. Overall, document classification can contribute to improved efficiency and organization within the company.
+
+This project demonstrates a simple example of classifying customer documents into three categories: cancellations, IBAN changes, and damage reports. The classification leverages both computer vision and natural language processing (NLP) techniques. Specifically, it combines the power of a BERT model for text analysis and a ResNet18 model for image recognition.
+
+For demonstration purposes, we are only selecting three classes, though there are more classes available in the complete process.
 
 ## Table of Contents
 
@@ -86,15 +100,15 @@ To use this code, follow these steps:
 
    ```python
    python examples/multi_modal/generate.py
-````
+   ```
 
-1. **Prepare Data:**
+2. **Prepare Data:**
 
    ```python
    python examples/multi_modal/convert.py
    ```
 
-1. **Train Model:**
+3. **Train Model:**
 
    ```python
    python examples/multi_modal/train.py
diff --git a/examples/sine_function_model_prediction/README.md b/examples/sine_function_model_prediction/README.md
@@ -5,7 +5,7 @@
 
 - Checkout this example in [Lightning Studio](https://lightning.ai/deependu/studios/sine-function-model-prediction-with-litdata-and-pytorch-lightning)
 
----
+______________________________________________________________________
 
 ## Steps
 
diff --git a/src/litdata/CHANGELOG.md b/src/litdata/CHANGELOG.md
@@ -0,0 +1,143 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
+
+---
+
+## [unreleased] - YYYY-MM-DD
+
+### Added
+
+- Introduced `CHANGELOG.md` to track changes across releases ([#733](https://github.com/lightning-ai/litdata/pull/733))
+
+### Changed
+
+### Removed
+
+### Fixed
+
+## [0.2.58] - 2025-10-07
+
+## [0.2.57] - 2025-10-06
+
+## [0.2.56] - 2025-09-23
+
+## [0.2.55] - 2025-09-19
+
+## [0.2.54] - 2025-09-10
+
+## [0.2.53] - 2025-09-09
+
+## [0.2.52] - 2025-08-12
+
+## [0.2.51] - 2025-07-29
+
+## [0.2.50] - 2025-06-27
+
+## [0.2.49] - 2025-06-04
+
+## [0.2.48] - 2025-05-24
+
+## [0.2.47] - 2025-05-13
+
+## [0.2.46] - 2025-05-03
+
+## [0.2.45] - 2025-04-14
+
+## [0.2.44] - 2025-03-26
+
+## [0.2.43] - 2025-03-25
+
+## [0.2.42] (yanked) - 2025-03-11
+
+## [0.2.41] - 2025-03-07
+
+## [0.2.40] - 2025-03-04
+
+## [0.2.39] - 2025-02-14
+
+## [0.2.38] - 2025-02-06
+
+## [0.2.37] - 2025-01-22
+
+## [0.2.36] - 2025-01-14
+
+## [0.2.35] - 2025-01-14
+
+## [0.2.34] - 2024-12-04
+
+## [0.2.33] - 2024-11-29
+
+## [0.2.32] - 2024-11-27
+
+## [0.2.31] - 2024-11-21
+
+## [0.2.30] - 2024-11-05
+
+## [0.2.29] - 2024-09-26
+
+## [0.2.28] - 2024-09-19
+
+## [0.2.27] (yanked) - 2024-09-19
+
+## [0.2.26] - 2024-09-03
+
+## [0.2.25] - 2024-08-28
+
+## [0.2.24] - 2024-08-14
+
+## [0.2.23] - 2024-08-07
+
+## [0.2.22] - 2024-08-05
+
+## [0.2.21] - 2024-08-01
+
+## [0.2.20] - 2024-08-01
+
+## [0.2.19] - 2024-07-30
+
+## [0.2.18] - 2024-07-24
+
+## [0.2.17] - 2024-07-22
+
+## [0.2.16] - 2024-07-11
+
+## [0.2.15] - 2024-07-05
+
+## [0.2.14] - 2024-06-27
+
+## [0.2.13] - 2024-06-27
+
+## [0.2.12] - 2024-06-17
+
+## [0.2.11] - 2024-06-14
+
+## [0.2.10] - 2024-06-13
+
+## [0.2.9] - 2024-06-12
+
+## [0.2.8] - 2024-06-03
+
+## [0.2.7] - 2024-05-24
+
+## [0.2.6] - 2024-05-07
+
+## [0.2.5] - 2024-04-24
+
+## [0.2.4] - 2024-04-24
+
+## [0.2.3] - 2024-04-03
+
+## [0.2.2] - 2024-03-08
+
+## [0.2.1] - 2024-03-05
+
+## [0.2.0] - 2024-02-26
+
+## [0.2.0rc2] (pre-release) - 2024-02-26
+
+## [0.2.0rc1] (pre-release) - 2024-02-24
+
+## [0.2.0rc0] (pre-release) - 2024-02-23