[PyTorch][Training][EC2][SageMaker]PyTorch 2.8 Currency Release #5149
Merged
90 commits
d27e523
pt 2.8 training ec2
jinyan-li1 ac530dd
fix typo
jinyan-li1 1793c2a
increase cpu image baseline size, add libcudnn9-dev-cuda-12 dependency
jinyan-li1 f94d72d
add test and update dockerfiles
jinyan-li1 4e7caea
bump fastai version and change CUDA version
jinyan-li1 26fd863
update buildspecs and confest to cu129
jinyan-li1 b5cde70
modify dockerfile gpu and add logging in efa test
jinyan-li1 d2632ed
add back libcudnn9-headers-cuda-12
jinyan-li1 ca1d99b
remove version pins, update cudnn header path check, change efa test …
jinyan-li1 4079253
pin opencv-python version and simplify efa test logging
jinyan-li1 e9da304
comment out some tests and change test_path to only run test_efa
jinyan-li1 bcd8969
fix syntax
jinyan-li1 915eea2
uncomment training log
jinyan-li1 2d98fb1
change log validation
jinyan-li1 454b693
change LD_LIBRARY path and rebuild
jinyan-li1 93b6143
Remove nvjpeg patching script and rebuild with normal test path
jinyan-li1 7d5851d
Merge branch 'master' into pt-2.8-currency
jinyan-li1 c333ecb
change TE version to 2.4
jinyan-li1 4c534ef
update flash attention to 2.8.3 and reorganize dockerfile
jinyan-li1 c3557ca
fix dockerfile
jinyan-li1 5733c55
revert dockerfile organization changes
jinyan-li1 e5ad021
build with PR base dlc
jinyan-li1 8cc73fe
fix FLASH_ATTN_VERSION arg
jinyan-li1 e7eeb33
fix arg
jinyan-li1 a5c1d14
increase image size baselines
jinyan-li1 190005b
rerun test LD_LIBRARY_PATH added back, skip build
jinyan-li1 7c227a1
rebuild image with updated dockerfile
jinyan-li1 0eb5e9d
fix redundancy in dockerfile
jinyan-li1 2a58ed9
move GDRCOPY_VERSION arg to common
jinyan-li1 9ab58d2
update cudnn_version.h location in test path, add telemetry script back
jinyan-li1 f64a296
move oss compliance into a script and modify dockerfile
jinyan-li1 2743af5
move some packages to ec2 and sm stages, remove awscli and boto3 sinc…
jinyan-li1 ba58253
rebuild ec2 image with base image updated
jinyan-li1 94f3e4e
build sm image and run tests
jinyan-li1 fa82f11
update comments and rebuild ec2 image
jinyan-li1 0d999e5
rebuild sm image with updated toml config
jinyan-li1 61d1c0b
build ec2 image with base from PR with libsqlite3-dev added
jinyan-li1 1cdff14
build ec2 image with safety check test and ecr scan allowlist feature…
jinyan-li1 dc5d8b0
Merge branch 'master' into pt-2.8-currency
jinyan-li1 4d0e610
rebuild sm image with PR base image
jinyan-li1 ad2b613
uncomment efa test log, add dgl to dockfiles and rebuild sm image
jinyan-li1 03dc0ab
remove dgl from dockerfiles, skip dgl and smdebug tests
jinyan-li1 0490e9d
Merge branch 'master' into pt-2.8-currency
jinyan-li1 5c3b37e
skip smddp and smppy tests, allowlist protobuf CVE 77740 - blocked by…
jinyan-li1 74be69b
add logging in test_safety to continue when there is an exception
jinyan-li1 fa5a04e
keep debug log, rebuild sm image with released base dlc
jinyan-li1 8fa869d
rebuild ec2 image with released base dlc
jinyan-li1 aa94d04
add CVE 77740 to ignore safety ids for test
jinyan-li1 f0e7d72
rebuild sm image with updated toml config
jinyan-li1 0a1ce87
remove loggings
jinyan-li1 8fce327
fix confest for sm tests
jinyan-li1 dd85d05
Merge branch 'master' into pt-2.8-currency
jinyan-li1 7008e2c
Revert config changes
jinyan-li1 4c76dc8
Merge branch 'master' into pt-2.8-currency
jinyan-li1 d329022
Merge branch 'master' into pt-2.8-currency
jinyan-li1 07f14cf
remove allowlist files and update gpu dockerfile
jinyan-li1 04df973
rebuild ec2 image without allowlist
jinyan-li1 90fb983
rebuild sm image without allowlist
jinyan-li1 5e53acf
install TE from pypi instead of git, rebuild ec2 image
jinyan-li1 e899808
change gdrcopy installation and rebuild ec2 image
jinyan-li1 ede5e9b
change TE version to 2.5 and reogranize cpu dockerfile, rebuild ec2 i…
jinyan-li1 1593492
remove skip_smdebug_v1_test
jinyan-li1 e42556e
reorganize cpu dockerfile
jinyan-li1 f317415
remove unnecessary fixtures and skip tests directly, comment out efa …
jinyan-li1 fb99b12
unpin opencv-python and rebuild ec2 image
jinyan-li1 bccd0ed
Merge branch 'master' into pt-2.8-currency
jinyan-li1 d96f804
pin opencv-python and rebuild ec2 image
jinyan-li1 98fd21a
formatting and rebuild sm image
jinyan-li1 9071542
pin thinc
jinyan-li1 284cc8c
rebuild ec2 image
jinyan-li1 c6c8db2
rebuild sm image
jinyan-li1 23724d7
Revert config changes
jinyan-li1 97a334c
reogranize gpu dockerfile
jinyan-li1 3ee468c
rebuild ec2 image
jinyan-li1 b50bbba
rebuild sm image
jinyan-li1 6804bb8
set MAX_JOBS to 18
jinyan-li1 3bef50f
fix license file access issue and rebuild ec2 image
jinyan-li1 7fed818
rebuild sm image
jinyan-li1 0ae5d9c
Revert config file
jinyan-li1 d391144
Merge branch 'master' into pt-2.8-currency
jinyan-li1 8afc1bd
use flash attn precompiled wheel
jinyan-li1 9640b16
Rebuild ec2 image
jinyan-li1 4827448
rebuild sm image
jinyan-li1 60d1f51
Revert toml changes
jinyan-li1 9a127a9
Merge branch 'master' into pt-2.8-currency
jinyan-li1 721b88e
resolve diff with master
jinyan-li1 258b61e
run ec2 tests again
jinyan-li1 15cd9d0
formatting and revert config changes
jinyan-li1 53e4e55
Merge branch 'master' into pt-2.8-currency
jinyan-li1 f7c88bb
Merge branch 'master' into pt-2.8-currency
jinyan-li1
3 changes: 0 additions & 3 deletions
pytorch/training/docker/2.8/py3/Dockerfile.sagemaker.cpu.py_scan_allowlist.json
This file was deleted.
3 changes: 0 additions & 3 deletions
pytorch/training/docker/2.8/py3/cu129/Dockerfile.sagemaker.gpu.py_scan_allowlist.json
This file was deleted.