
[WIP] fix transformers api change at 5.2.0 #1647

Open
UbeCc wants to merge 10 commits into THUDM:main from UbeCc:fix/loss-mask

UbeCc (Member) commented Feb 28, 2026

No description provided.

UbeCc changed the title from "fix transformers api change at 5.2.0" to "[WIP] fix transformers api change at 5.2.0" on Feb 28, 2026
UbeCc added 9 commits March 1, 2026 12:48
- Create tensorboard_dir if it doesn't exist before passing to trace handler
- Create memory_snapshot_dir before profiler tries to save snapshots
- Prevents FileNotFoundError when profiling with missing directories

Fixes an issue where the OOM observer fails when memory_snapshot_dir doesn't exist.

Made-with: Cursor
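The ordering this commit enforces (create the output directories first, then start the profiler and OOM observer) can be sketched as below. `prepare_profiler_dirs` is a hypothetical helper, not code from this PR, and its two arguments simply stand in for the `tensorboard_dir` and `memory_snapshot_dir` settings named above. In the real trainer the TensorBoard directory would presumably then be handed to a trace handler such as `torch.profiler.tensorboard_trace_handler`; that call is left out so the sketch runs without PyTorch.

```python
import os
import tempfile


def prepare_profiler_dirs(tensorboard_dir: str, memory_snapshot_dir: str) -> tuple:
    """Hypothetical helper (not from the PR): create both output dirs up front.

    Doing this before the profiler or OOM observer runs avoids a
    FileNotFoundError when those components later try to write files.
    """
    for d in (tensorboard_dir, memory_snapshot_dir):
        # exist_ok=True makes the call idempotent if the directory already exists;
        # intermediate directories are created as needed.
        os.makedirs(d, exist_ok=True)
    return tensorboard_dir, memory_snapshot_dir


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    tb, snap = prepare_profiler_dirs(
        os.path.join(root, "tb", "run0"), os.path.join(root, "snapshots")
    )
    print(os.path.isdir(tb), os.path.isdir(snap))  # True True
```

Creating directories eagerly, rather than inside the trace handler or OOM callback, means a missing path is fixed once at startup instead of surfacing as an exception deep inside a profiler callback.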
…snapshot saving

- Add try-except blocks in OOM observer and stop() method with detailed logging
- Verify that snapshot files are actually created, and log their sizes
- Convert Path objects to strings explicitly for better compatibility
- Add stderr tracebacks for better error visibility
- Log snapshot path before attempting to save for debugging
- Use os.makedirs instead of Path.mkdir for more robust directory creation

This helps diagnose issues where snapshots appear to be saved but files are empty.

Made-with: Cursor
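The save-and-verify pattern this commit describes (log the target path, wrap the dump in try/except, print the traceback to stderr, then check that a non-empty file exists) might look roughly like this. `save_snapshot_checked` and its `dump_fn` argument are hypothetical; in practice `dump_fn` would be something like PyTorch's private `torch.cuda.memory._dump_snapshot`, but here any callable that writes a file works, so the sketch runs without a GPU.

```python
import logging
import os
import sys
import tempfile
import traceback
from typing import Callable

logger = logging.getLogger(__name__)


def save_snapshot_checked(path, dump_fn: Callable[[str], None]) -> int:
    """Hypothetical wrapper: save a snapshot, then confirm it landed on disk.

    Returns the file size in bytes (0 for an empty file), or -1 if the dump
    raised or returned without creating the file.
    """
    path = str(path)  # accept pathlib.Path as well as str
    logger.info("Saving memory snapshot to %s", path)
    try:
        os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
        dump_fn(path)  # assumed: a callable that writes the snapshot to `path`
    except Exception:
        # Full traceback on stderr so the failure is visible even when
        # logging is not configured in this (possibly remote) process.
        traceback.print_exc(file=sys.stderr)
        logger.error("Failed to save memory snapshot to %s", path)
        return -1

    if not os.path.exists(path):
        logger.error("Snapshot call returned but %s does not exist", path)
        return -1
    size = os.path.getsize(path)
    if size == 0:
        logger.warning("Snapshot file %s is empty", path)
    else:
        logger.info("Snapshot saved: %s (%d bytes)", path, size)
    return size


if __name__ == "__main__":
    def fake_dump(p: str) -> None:
        # Stand-in for a real snapshot dump: writes 16 zero bytes.
        with open(p, "wb") as f:
            f.write(b"\0" * 16)

    out = os.path.join(tempfile.mkdtemp(), "snap.pickle")
    print(save_snapshot_checked(out, fake_dump))  # 16
```

Returning the size instead of a boolean lets the caller distinguish "nothing was written" (-1) from "a file was written but it is empty" (0), which is the case the commit message says it is trying to diagnose.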
… actors

- Convert memory_snapshot_dir and tensorboard_dir to absolute paths using .resolve()
- Prevents issues where Ray actors save files in different working directories
- Add log message showing absolute path being used
- Convert Path objects to strings for memray compatibility

Fixes an issue where snapshot files appear to be saved but end up in the wrong location
when training is distributed across Ray actors with different working directories.

Made-with: Cursor
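The reason resolving helps: a relative path is interpreted against whatever the current working directory is at the moment the file is written, and that can differ between the driver and a Ray actor. A minimal sketch, assuming the config value is resolved once in the driver and the resulting string is what gets passed to actors; `to_abs_str` is a hypothetical name, not a function from the PR.

```python
import os
import tempfile
from pathlib import Path
from typing import Union


def to_abs_str(p: Union[str, os.PathLike]) -> str:
    """Hypothetical helper: pin a path to an absolute string.

    Path.resolve() interprets a relative path against the *current* working
    directory, so call this before the value is sent to other processes.
    Returning str keeps the value easy to serialize and to pass to APIs
    that expect plain strings.
    """
    return str(Path(p).expanduser().resolve())


if __name__ == "__main__":
    start = os.getcwd()
    driver_cwd, actor_cwd = tempfile.mkdtemp(), tempfile.mkdtemp()

    os.chdir(driver_cwd)
    pinned = to_abs_str("snapshots")  # resolved in the "driver"
    os.chdir(actor_cwd)               # simulate an actor with another cwd
    late = to_abs_str("snapshots")    # resolving late picks up the new cwd
    os.chdir(start)

    print(pinned != late)  # True
```

Resolving early is the design choice that matters here: once the driver has turned the setting into an absolute string, every actor writes to the same location regardless of its own working directory.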