feat: add scanner tracker #422
Thanks for making a pull request! 😃
Force-pushed from 7991726 to 2218219
@ChanderG FYA
@dushyantbehl @ChanderG PTAL. Do we support pushing scanner data to aim/wandb etc. as well, or is that out of scope?
Force-pushed from 4615177 to 81000bb
Also @ashokponkumar @ChanderG, I tested this with our internal test and, even though the scanner output file was created, I ran into this error while the scanner was in use. I'm not sure how the other way in the prior PR of using a flag and not
@aluu317 That's weird. I think the file is created, but the write is failing? I don't think the tracker framework should be causing this. I should have printed the exception clearly in Scanner, my bad. That said, I am unable to reproduce the error. I tried running the tests and also manually ran the CLI command with the output JSON file in different places (current dir, /tmp, etc.), and it works in all cases. Tests pass, and inspecting the generated JSON files in the other cases shows output in the expected format.
@aluu317 Could you retry? It worked fine for me the last time I tried.
@ChanderG When you tried, did you use a single GPU and call sft_trainer?
Force-pushed from 81000bb to 7930625
@ChanderG Ah, how interesting! Thanks for the fix. I will test with the newer version. I think this shows that the tracker code in this PR works, independently of the JSON issue. Shall we wrap up this PR, if you're OK with reviewing/merging? It'd be nice to include this in our next fms-hf-tuning release (being worked on this week).
@ChanderG Verified with HFResourceScanner 0.1.2: the JSON file is written with content! Thank you.
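The failure mode discussed in this thread (an output file that is created but never written to) can be caught with a small check in a test. This is a hypothetical sketch: check_scanner_output is not part of HFResourceScanner or fms-hf-tuning, and it assumes only that the scanner writes a single JSON document to its configured output path.

```python
import json
import os
from typing import Any


def check_scanner_output(path: str) -> Any:
    """Hypothetical helper: verify that a scanner output file exists,
    is non-empty, and parses as JSON. Returns the parsed content."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"scanner output not found: {path}")
    if os.path.getsize(path) == 0:
        # File was created but nothing was written to it.
        raise ValueError(f"scanner output is empty: {path}")
    with open(path, encoding="utf-8") as f:
        # json.load raises json.JSONDecodeError (a ValueError subclass)
        # if the file contains malformed or truncated JSON.
        return json.load(f)
```

In a test, this would be called on the output path after training finishes; the exact filename depends on how the scanner tracker is configured.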
Signed-off-by: Angel Luu <[email protected]>
Force-pushed from ad5d0ab to ee46cd2
@ashokponkumar @kmehant @ChanderG Please review.
@aluu317 Should we run the HFResourceScanner unit tests in our CI/CD? Currently they are being skipped. WDYT?
@kmehant They are being skipped because HFResourceScanner must be installed to run the tests. It's the same behavior as the MLflow tracker and Aim stack tracker unit tests. Did you mean some other tests?
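Skipping tests when an optional dependency is missing is usually done by checking whether the package is importable. Below is a minimal sketch using only the standard library's unittest; it assumes the package is importable under the top-level module name HFResourceScanner, and the helper and test class names are hypothetical, not the project's actual tests.

```python
import importlib
import importlib.util
import unittest


def is_package_available(name: str) -> bool:
    """Return True if a top-level module named `name` can be found."""
    return importlib.util.find_spec(name) is not None


# Assumption: the package is importable as "HFResourceScanner".
SCANNER_AVAILABLE = is_package_available("HFResourceScanner")


@unittest.skipUnless(SCANNER_AVAILABLE, "HFResourceScanner is not installed")
class TestScannerTracker(unittest.TestCase):
    def test_scanner_importable(self):
        # Runs only when the optional dependency is installed.
        module = importlib.import_module("HFResourceScanner")
        self.assertIsNotNone(module)
```

Run it with `python -m unittest`. With pytest, `pytest.importorskip("HFResourceScanner")` at module level has the same effect. Either way, installing the package in the test environment turns these skips into real runs.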
kmehant left a comment
@aluu317 Do we plan to install the HFResourceScanner package and let the unit tests run? I know aim and MLflow need a bit of setup to run their unit tests, so those could be skipped. We can look at this in a separate PR as well. Thanks.
@kmehant We could install
While running the unit tests, we could install it so that the HFResourceScanner-based unit tests would run. We can change tox.ini to accommodate this here (lines 4 to 9 in 5c03aa8), for example with extras = scanner-dev? More docs, if you are interested: https://tox.wiki/en/latest/config.html#python-run
Description of the change
Extracted from this suggestion in a prior PR: adds the HFResourceScanner TrainerCallback as a tracker.
To use this, users need to install HFResourceScanner in their environment and pass in training args to enable it. See the test written in test_launch_script.py.
Related issue number
How to verify the PR
Was the PR tested