feat: Support gpt-oss class of models with flash attention 3 support #603

dushyantbehl · 2025-09-01T21:47:31Z

Description of the change

Adds the code changes required to support the new GPT-OSS models.
Adds a new Dockerfile for dev mode, based on an NVCR (NVIDIA container registry) image, that supports Flash Attention 3.
Updates most of the required package versions.
Adds a new quantization class for Mxfp4 that dequantizes the model before training.
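
For readers unfamiliar with the format, here is a rough sketch of what "dequantizing" MXFP4 means numerically. It is not the code in this PR (the new class presumably delegates the real work to the underlying library). It only assumes the standard MX layout: 4-bit E2M1 elements that share one 8-bit power-of-two (E8M0) scale per block. The names `decode_fp4` and `dequantize_block` are made up for illustration, and byte packing and the reserved NaN scale value are ignored.

```python
# Illustrative only: decode MXFP4 values into plain Python floats.
# Magnitudes representable by a 4-bit E2M1 element, indexed by the low 3 bits.
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_fp4(nibble: int) -> float:
    """Decode one 4-bit E2M1 code (0..15); bit 3 is the sign bit."""
    sign = -1.0 if nibble & 0x8 else 1.0
    return sign * FP4_E2M1_MAGNITUDES[nibble & 0x7]


def dequantize_block(nibbles: list[int], scale_e8m0: int) -> list[float]:
    """Dequantize one block: every element is multiplied by the shared scale.

    The E8M0 scale byte encodes a power of two with a bias of 127.
    (The reserved NaN encoding, 255, is not handled in this sketch.)
    """
    scale = 2.0 ** (scale_e8m0 - 127)
    return [decode_fp4(n) * scale for n in nibbles]


# A shared scale byte of 128 means "multiply by 2".
print(dequantize_block([0x1, 0x7, 0xF, 0x0], 128))  # [1.0, 12.0, -12.0, 0.0]
```

After this step the weights are ordinary floating-point values, which is why training can then proceed without any MXFP4-specific kernels.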

Related issue number

How to verify the PR

Was the PR tested

I have added >=1 unit test(s) for every new method I have added.
I have ensured all unit tests pass

github-actions · 2025-09-01T21:47:39Z

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

ashokponkumar

Nice. Some suggestions.

dushyantbehl requested review from aluu317, anhuong, fabianlim and kmehant as code owners September 1, 2025 21:47

github-actions bot added the feat label Sep 1, 2025

Enable tuning of GPT OSS models

889b8f3

enable cuda 12.8 in dockerfile and add triton kernels in pyproject

Signed-off-by: Dushyant Behl <[email protected]>

dushyantbehl commented Sep 2, 2025

tuning/config/peft_config.py (outdated, resolved)

change dockerfile structure to mimic current cicd

cf10d60

Signed-off-by: Dushyant Behl <[email protected]>

dushyantbehl force-pushed the gpt-oss branch from db31574 to cf10d60 on September 2, 2025 04:08

do not change base dockerfile till all testing is done

575c232

Signed-off-by: Dushyant Behl <[email protected]>

dushyantbehl changed the title ~~feat: Support gpt-oss class of models with flash attention support~~ feat: Support gpt-oss class of models with flash attention 3 support Sep 2, 2025

dushyantbehl added 3 commits September 2, 2025 10:08

make dev dockerfile leaner

314ac97

Signed-off-by: Dushyant Behl <[email protected]>

add dev readme

acf3c11

Signed-off-by: Dushyant Behl <[email protected]>

ensure flash attention version is not changed

7f1271a

Signed-off-by: Dushyant Behl <[email protected]>

ashokponkumar reviewed Sep 2, 2025

dev/nvcr.Dockerfile (resolved)

tuning/config/configs.py (resolved)

tuning/config/peft_config.py (outdated, resolved)

pyproject.toml (outdated, resolved)

pyproject.toml (outdated, resolved)

Fix linter complaining about quantization config.

42ae4b6

Signed-off-by: Harikrishnan Balagopal <[email protected]> Signed-off-by: Dushyant Behl <[email protected]>

HarikrishnanBalagopal reviewed Sep 2, 2025

tuning/config/peft_config.py (resolved)

dushyantbehl added 3 commits September 2, 2025 14:00

condense dockerfile and fix the dependencies

4e786f7

Signed-off-by: Dushyant Behl <[email protected]>

make model loading messages more explicit

4b19f7f

Signed-off-by: Dushyant Behl <[email protected]>

fix format

a92ad68

Signed-off-by: Dushyant Behl <[email protected]>

ashokponkumar previously approved these changes Sep 2, 2025

build/nvcr.Dockerfile (outdated, resolved)

build/nvcr.Dockerfile (outdated, resolved)

dushyantbehl dismissed ashokponkumar’s stale review via 8f367e2 September 3, 2025 08:17

change dockerfile to build from source

ab1a21f

Signed-off-by: Dushyant Behl <[email protected]>

dushyantbehl force-pushed the gpt-oss branch from 8f367e2 to ab1a21f on September 3, 2025 08:18

enable mamba in nvcr image

130dcc8

Signed-off-by: Dushyant Behl <[email protected]>

ashokponkumar approved these changes Sep 3, 2025

ashokponkumar merged commit 7e261d2 into main Sep 3, 2025
12 checks passed

dushyantbehl deleted the gpt-oss branch November 7, 2025 03:22
