
[Training] Datasets - update Module#1209

Merged
dsikka merged 9 commits into main from datasets on Mar 5, 2025
Conversation


@horheynm horheynm commented Feb 27, 2025

Order of reviews:
#1206
#1207
#1209 <-- Here
#1212
#1214

SUMMARY:

  • Move dataset logic out of transformers module src/llmcompressor/transformers/finetune/data/data_helpers.py, add it to src/llmcompressor/datasets/utils.py

TEST PLAN:
Pass tests
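A module move like this is often paired with a small deprecation shim left at the old import path so existing imports keep working. The sketch below is purely illustrative — the helper name, message text, and behavior are assumptions, not the actual llm-compressor API:

```python
import warnings

def _relocated_get_raw_dataset(name):
    # Toy stand-in for the helper after it moves to the datasets utils module
    # (illustrative only; not the real llm-compressor implementation).
    return {"dataset": name}

def get_raw_dataset(name):
    """Deprecated shim kept at the old data_helpers import path."""
    warnings.warn(
        "data_helpers.get_raw_dataset has moved; import it from the new "
        "datasets utils module instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return _relocated_get_raw_dataset(name)
```

Callers importing from the old path still get the relocated behavior, plus a `DeprecationWarning` nudging them toward the new module.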

@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: the ready label is required to run the full testing suite; please add it only once the PR is code complete and local testing has been performed.

@horheynm horheynm changed the title from Datasets to [Training] Datasets - update Module on Feb 28, 2025
Signed-off-by: George Ohashi <george@neuralmagic.com>
@horheynm horheynm removed the ready label Mar 3, 2025
dsikka pushed a commit that referenced this pull request Mar 3, 2025
Order of reviews:
#1206
#1207 <-- Here
#1209 
#1212
#1214 

SUMMARY:
* Decouple arg parser to be used for both oneshot and train

TEST PLAN:
* Pass tests
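One standard way to share a single set of arguments between the oneshot and train entrypoints is argparse's `parents=` mechanism. This is only a sketch of the decoupling idea — the flag names are hypothetical, not the parser actually introduced in #1207:

```python
import argparse

def build_shared_parser():
    # Flags common to both entrypoints (names are illustrative).
    shared = argparse.ArgumentParser(add_help=False)
    shared.add_argument("--model", required=True)
    shared.add_argument("--dataset", default=None)
    return shared

def build_oneshot_parser():
    # Oneshot-specific flags layered on top of the shared ones.
    parser = argparse.ArgumentParser(prog="oneshot", parents=[build_shared_parser()])
    parser.add_argument("--recipe", default=None)
    return parser

def build_train_parser():
    # Training-specific flags layered on top of the shared ones.
    parser = argparse.ArgumentParser(prog="train", parents=[build_shared_parser()])
    parser.add_argument("--num-epochs", type=int, default=1)
    return parser
```

Each entrypoint parses its own flags plus the shared ones, so a change to the common arguments happens in exactly one place.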
dsikka added a commit that referenced this pull request Mar 5, 2025
Order of reviews:
#1206  <-- Here
#1207
#1209 
#1212
#1214 

SUMMARY:
Rename data_args to dataset_args

TEST PLAN:
Pass tests
Find `data_args` using `grep`

---------

Signed-off-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
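The grep check from the test plan above can be demonstrated on a toy file; in the real repo one would run something like `grep -rn "data_args" src/ tests/` from the repository root (those paths are assumptions). Note that `data_args` is not a substring of `dataset_args`, so a plain grep is enough to catch stale references:

```shell
# Demonstrate that grep reports no stale matches after the rename.
printf 'dataset_args = parse_args()\n' > /tmp/rename_demo.py
if grep -q "data_args" /tmp/rename_demo.py; then
  echo "stale data_args found"
else
  echo "rename complete"
fi
```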
@horheynm horheynm added the ready label Mar 5, 2025

@brian-dellabetta brian-dellabetta left a comment


cool! one nit suggestion

@dsikka dsikka enabled auto-merge (squash) March 5, 2025 17:09
@dsikka dsikka merged commit 8fc6012 into main Mar 5, 2025
16 of 17 checks passed
@dsikka dsikka deleted the datasets branch March 5, 2025 18:36
dsikka pushed a commit that referenced this pull request Mar 6, 2025
…ot (#1212)

Order of reviews:
#1206
#1207
#1209
#1212  <-- Here
#1214

SUMMARY:
* Move the preprocessing and postprocessing logic out of
`src/llmcompressor/transformers/finetune/text_generation.py` and into
`src/llmcompressor/entrypoints/utils.py`

TEST PLAN:
Pass tests
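The shape of this refactor can be sketched as a thin entrypoint delegating to shared pre/post helpers. The function names and fields below are assumptions for illustration, not the actual `entrypoints/utils.py` API:

```python
def pre_process(args: dict) -> dict:
    # Shared setup formerly done inside text_generation.py,
    # e.g. filling defaults before the real work starts.
    args = dict(args)
    args.setdefault("device", "cpu")
    return args

def post_process(state: dict) -> dict:
    # Shared teardown, e.g. saving artifacts.
    state["saved"] = True
    return state

def oneshot_entrypoint(args: dict) -> dict:
    # The entrypoint itself stays thin: setup, do the work, teardown.
    args = pre_process(args)
    state = {"model": args["model"], "device": args["device"]}  # stand-in for the real work
    return post_process(state)
```

With the helpers in one shared module, each entrypoint (oneshot, train) reuses the same setup/teardown instead of duplicating it.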
brian-dellabetta pushed a commit that referenced this pull request Mar 10, 2025
…ot (#1212)

Order of reviews:
#1206
#1207
#1209
#1212  <-- Here
#1214

SUMMARY:
* Move the preprocessing and postprocessing logic out of
`src/llmcompressor/transformers/finetune/text_generation.py` and into
`src/llmcompressor/entrypoints/utils.py`

TEST PLAN:
Pass tests

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
kylesayrs pushed a commit that referenced this pull request Mar 13, 2025
Order of reviews:
#1206
#1207
#1209
#1212
#1214 <-- Here

SUMMARY:
* Refactor Training pipeline
* Remove initialize, finalize from the session functions
* Add documentation to entrypoints/readme.md on the different
types of training that can be carried out with llm-compressor
* Decouple training from text_generation.py::main. The new logic lives
in llmcompressor/entrypoints/train.py, which follows the flow of
pre-process, run the training logic, then post-process
* Delete outdated info on transformers/finetune/readme.md
* Update session_mixin.py to use session().initialize or
session().finalize.
* Deprecate train.py in text_generation.py, raising a deprecation warning
if used.
* Update tests to use llmcompressor's train, not
llmcompressor.transformers' train

TEST PLAN:
* Pass tests

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
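The pre-process → train → post-process flow wrapped in a session's initialize/finalize could look roughly like the sketch below. Here `session()` is a hypothetical stand-in for llm-compressor's session object, not its real API:

```python
from contextlib import contextmanager

@contextmanager
def session():
    # Hypothetical session lifecycle: initialize on entry, finalize on exit,
    # even if the training body raises.
    state = {"initialized": True, "finalized": False}
    try:
        yield state
    finally:
        state["finalized"] = True

def train(args: dict) -> dict:
    # Pre-process, run the training loop inside a session, then post-process.
    epochs = args.get("num_epochs", 1)
    with session() as state:
        state["epochs_run"] = epochs  # stand-in for the actual training loop
    return state
```

Pushing initialize/finalize into the session object (rather than calling them from the training loop) is what lets the mixin code in session_mixin.py stay agnostic of the entrypoint driving it.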
