Update torchtune pin to 0.4.0.dev20241010 #1300
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1300
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 6c97bd9 with merge base d1ab6e0.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Please look at the list after running the install script for ET as well, and compare to ensure we don't have conflicts.
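That comparison can be scripted rather than eyeballed. A minimal sketch (the helper names and the idea of diffing two `pip list --format=freeze` dumps taken before and after the ET install are mine, not from the repo):

```python
# Sketch: compare two `pip list --format=freeze` dumps and report packages
# whose pinned version changed between the two environments.

def parse_freeze(lines):
    """Map package name -> version from `pip list --format=freeze` lines."""
    pkgs = {}
    for line in lines:
        line = line.strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pkgs[name.lower()] = version
    return pkgs

def find_conflicts(before, after):
    """Return {name: (old, new)} for packages whose version differs."""
    return {
        name: (before[name], after[name])
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    }
```

Any non-empty result from `find_conflicts` flags a package that one install script silently re-pinned over the other.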
joecummings left a comment:
LGTM
tl;dr: not a problem within this diff, but we'll need to look at ET's installation logic. After running through the ET installation, pip list shows the same version for torchtune. However, this reveals a deeper issue: ET is not checking for/using the ROCm (or CUDA) wheel and is force-installing the CPU versions of torch and other libs. If torchtune were included as a dep, we'd run into the same issue. I don't know if this is an easy fix, since ExecuTorch isn't targeted at GPU systems and should install the CPU version of torch. Warrants further discussion, but I think that's outside the scope of this PR.
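The force-install symptom described above can be detected from the local version tag on the installed torch wheel. A hedged sketch (the helper is illustrative; neither torchchat nor ExecuTorch ships it):

```python
def wheel_variant(version: str) -> str:
    """Classify a torch wheel by its PEP 440 local version tag, e.g.
    '2.6.0.dev20241010+rocm6.2' -> 'rocm', '2.5.0+cu124' -> 'cuda',
    '2.5.0+cpu' -> 'cpu'. Returns 'unknown' when no tag is present."""
    _, sep, local = version.partition("+")
    if not sep:
        return "unknown"
    if local.startswith("rocm"):
        return "rocm"
    if local.startswith("cu"):
        return "cuda"
    return "cpu" if local.startswith("cpu") else local
```

Running this on `torch.__version__` before and after the ET install would show whether the CPU wheel clobbered a previously installed ROCm or CUDA build.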
Co-authored-by: vmpuri <[email protected]>
* add pp_dim, distributed, num_gpus, num_nodes as cmd line args
* add tp_dim
* add elastic_launch
* working, can now launch from cli
* Remove numpy < 2.0 pin to align with pytorch (#1301): Fix #1296; align with https://github.com/pytorch/pytorch/blame/main/requirements.txt#L5
* Update torchtune pin to 0.4.0-dev20241010 (#1300), Co-authored-by: vmpuri <[email protected]>
* Unbreak gguf util CI job by fixing numpy version (#1307): setting numpy version to the range required by gguf: https://github.com/ggerganov/llama.cpp/blob/master/gguf-py/pyproject.toml
* Remove apparently-unused import torchvision in model.py (#1305), Co-authored-by: vmpuri <[email protected]>
* remove global var for tokenizer type + patch tokenizer to allow list of sequences
* make pp tp visible in interface
* Add llama 3.1 to dist_run.py
* [WIP] Move dist inf into its own generator
* Add initial generator interface to dist inference
* Added generate method and placeholder scheduler
* use prompt parameter for dist generation
* Enforce tp>=2
* Build tokenizer from TokenizerArgs
* Disable torchchat format + constrain possible models for distributed
* disable calling dist_run.py directly for now
* Restore original dist_run.py for now
* disable _maybe_parallelize_model again
* Reenable arg.model_name in dist_run.py
* Use singleton logger instead of print in generate
* Address PR comments; try/except in launch_dist_inference; added comments

Co-authored-by: lessw2020 <[email protected]>
Co-authored-by: Mengwei Liu <[email protected]>
Co-authored-by: vmpuri <[email protected]>
Co-authored-by: vmpuri <[email protected]>
Co-authored-by: Scott Wolchok <[email protected]>
Update torchtune pin to 0.4.0-dev20241010. This is the newest version visible to the ROCm wheel and fixes the installation script for AMD/ROCm systems (otherwise installation would break, since 0.3.0-dev20240928 isn't visible via the ROCm 6.2 wheel).
pip list after running install_requirements.sh successfully.
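Rather than scanning the full pip list output by hand, the pin can be verified with a small standard-library check. A sketch (the function name is mine; the expected version string comes from this PR's title):

```python
from importlib import metadata

def check_pin(package: str, expected: str) -> bool:
    """Return True iff `package` is installed at exactly `expected`."""
    try:
        return metadata.version(package) == expected
    except metadata.PackageNotFoundError:
        return False
```

For example, `check_pin("torchtune", "0.4.0.dev20241010")` after running install_requirements.sh would confirm the new pin took effect.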