
[Training] Datasets - update Module#1209

Merged
dsikka merged 9 commits into main from datasets on Mar 5, 2025
Conversation


@horheynm horheynm commented Feb 27, 2025

Order of reviews:
#1206
#1207
#1209 <-- Here
#1212
#1214

SUMMARY:

  • Move dataset logic out of transformers module src/llmcompressor/transformers/finetune/data/data_helpers.py, add it to src/llmcompressor/datasets/utils.py

TEST PLAN:
Pass tests
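A module move like this is often paired with a small deprecation shim left at the old import path so existing imports keep working. The sketch below is purely illustrative — the helper name, message text, and behavior are assumptions, not the actual llm-compressor API:

```python
import warnings

def _relocated_get_raw_dataset(name):
    # Toy stand-in for the helper after it moves to the datasets utils module
    # (illustrative only; not the real llm-compressor implementation).
    return {"dataset": name}

def get_raw_dataset(name):
    """Deprecated shim kept at the old data_helpers import path."""
    warnings.warn(
        "data_helpers.get_raw_dataset has moved; import it from the new "
        "datasets utils module instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return _relocated_get_raw_dataset(name)
```

Callers importing from the old path still get the relocated behavior, plus a `DeprecationWarning` nudging them toward the new module.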

@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: the ready label is required to run the full testing suite; please add it only once the PR is code complete and local testing has been performed.

@horheynm horheynm changed the title from Datasets to [Training] Datasets - update Module on Feb 28, 2025
Signed-off-by: George Ohashi <george@neuralmagic.com>
@horheynm horheynm removed the ready label Mar 3, 2025
dsikka pushed a commit that referenced this pull request Mar 3, 2025
Order of reviews:
#1206
#1207 <-- Here
#1209 
#1212
#1214 

SUMMARY:
* Decouple arg parser to be used for both oneshot and train

TEST PLAN:
* Pass tests
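One standard way to share a single set of arguments between the oneshot and train entrypoints is argparse's `parents=` mechanism. This is only a sketch of the decoupling idea — the flag names are hypothetical, not the parser actually introduced in #1207:

```python
import argparse

def build_shared_parser():
    # Flags common to both entrypoints (names are illustrative).
    shared = argparse.ArgumentParser(add_help=False)
    shared.add_argument("--model", required=True)
    shared.add_argument("--dataset", default=None)
    return shared

def build_oneshot_parser():
    # Oneshot-specific flags layered on top of the shared ones.
    parser = argparse.ArgumentParser(prog="oneshot", parents=[build_shared_parser()])
    parser.add_argument("--recipe", default=None)
    return parser

def build_train_parser():
    # Training-specific flags layered on top of the shared ones.
    parser = argparse.ArgumentParser(prog="train", parents=[build_shared_parser()])
    parser.add_argument("--num-epochs", type=int, default=1)
    return parser
```

Each entrypoint parses its own flags plus the shared ones, so a change to the common arguments happens in exactly one place.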
dsikka added a commit that referenced this pull request Mar 5, 2025
Order of reviews:
#1206  <-- Here
#1207
#1209 
#1212
#1214 

SUMMARY:
Rename data_args to dataset_args

TEST PLAN:
Pass tests
Find `data_args` using `grep`

---------

Signed-off-by: George Ohashi <george@neuralmagic.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
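The grep check from the test plan above can be demonstrated on a toy file; in the real repo one would run something like `grep -rn "data_args" src/ tests/` from the repository root (those paths are assumptions). Note that `data_args` is not a substring of `dataset_args`, so a plain grep is enough to catch stale references:

```shell
# Demonstrate that grep reports no stale matches after the rename.
printf 'dataset_args = parse_args()\n' > /tmp/rename_demo.py
if grep -q "data_args" /tmp/rename_demo.py; then
  echo "stale data_args found"
else
  echo "rename complete"
fi
```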
@horheynm horheynm added the ready label Mar 5, 2025

@brian-dellabetta brian-dellabetta left a comment


cool! one nit suggestion

@dsikka dsikka enabled auto-merge (squash) March 5, 2025 17:09
@dsikka dsikka merged commit 8fc6012 into main Mar 5, 2025
16 of 17 checks passed
@dsikka dsikka deleted the datasets branch March 5, 2025 18:36
dsikka pushed a commit that referenced this pull request Mar 6, 2025
…ot (#1212)

Order of reviews:
#1206
#1207
#1209
#1212  <-- Here
#1214

SUMMARY:
* Move the preprocessing and postprocessing logic out of
`src/llmcompressor/transformers/finetune/text_generation.py` and into
`src/llmcompressor/entrypoints/utils.py`

TEST PLAN:
Pass tests
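The shape of this refactor can be sketched as a thin entrypoint delegating to shared pre/post helpers. The function names and fields below are assumptions for illustration, not the actual `entrypoints/utils.py` API:

```python
def pre_process(args: dict) -> dict:
    # Shared setup formerly done inside text_generation.py,
    # e.g. filling defaults before the real work starts.
    args = dict(args)
    args.setdefault("device", "cpu")
    return args

def post_process(state: dict) -> dict:
    # Shared teardown, e.g. saving artifacts.
    state["saved"] = True
    return state

def oneshot_entrypoint(args: dict) -> dict:
    # The entrypoint itself stays thin: setup, do the work, teardown.
    args = pre_process(args)
    state = {"model": args["model"], "device": args["device"]}  # stand-in for the real work
    return post_process(state)
```

With the helpers in one shared module, each entrypoint (oneshot, train) reuses the same setup/teardown instead of duplicating it.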
brian-dellabetta pushed a commit that referenced this pull request Mar 10, 2025
…ot (#1212)

Order of reviews:
#1206
#1207
#1209
#1212  <-- Here
#1214

SUMMARY:
* Move the preprocessing and postprocessing logic out of
`src/llmcompressor/transformers/finetune/text_generation.py` and into
`src/llmcompressor/entrypoints/utils.py`

TEST PLAN:
Pass tests

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
kylesayrs pushed a commit that referenced this pull request Mar 13, 2025
Order of reviews:
#1206
#1207
#1209
#1212
#1214 <-- Here

SUMMARY:
* Refactor Training pipeline
* Remove initialize, finalize from the session functions
* Add documentation to entrypoints/readme.md on the different
types of training that can be carried out with llm-compressor
* Decouple training from text_generation.py::main. The new logic lives
in llmcompressor/entrypoints/train.py, which follows the flow of
pre-process, run the training logic, then post-process
* Delete outdated info on transformers/finetune/readme.md
* Update session_mixin.py to use session().initialize or
session().finalize.
* Deprecate train.py in text_generation.py, raising a deprecation warning
if used.
* Update tests to use llmcompressor's train, not
llmcompressor.transformers' train

TEST PLAN:
* Pass tests

---------

Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
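The pre-process → train → post-process flow wrapped in a session's initialize/finalize could look roughly like the sketch below. Here `session()` is a hypothetical stand-in for llm-compressor's session object, not its real API:

```python
from contextlib import contextmanager

@contextmanager
def session():
    # Hypothetical session lifecycle: initialize on entry, finalize on exit,
    # even if the training body raises.
    state = {"initialized": True, "finalized": False}
    try:
        yield state
    finally:
        state["finalized"] = True

def train(args: dict) -> dict:
    # Pre-process, run the training loop inside a session, then post-process.
    epochs = args.get("num_epochs", 1)
    with session() as state:
        state["epochs_run"] = epochs  # stand-in for the actual training loop
    return state
```

Pushing initialize/finalize into the session object (rather than calling them from the training loop) is what lets the mixin code in session_mixin.py stay agnostic of the entrypoint driving it.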
