pytorch/cuda compatibility updates #6
train/exact_accuracy 0.93514 is the accuracy on the training data, not on the evaluation (test) data.
Aha - makes sense. Do you know if this repo has the script for calculating |
I do not see such a script here. You can use evaluate.py from the HRM work: https://github.com/sapientinc/HRM
copied evaluate.py from HRM and modified to match changes to pretrain.py (eg: adding config.ema logic and CPU_PROCESS_GROUP)

example use:
```bash
python evaluate.py \
  checkpoint=checkpoints/Sudoku-extreme-1k-aug-1000-ACT-torch/pretrain_att_sudoku/step_65100
```
I've also added a modified version of evaluate.py to this branch. 👍

python evaluate.py \
  checkpoint=checkpoints/Sudoku-extreme-1k-aug-1000-ACT-torch/pretrain_att_sudoku/step_65100
...
Processing batch 551: all
Completed inference in 16 steps
Running 0 evaluator(s)...
All evaluators completed!
{'all': {'accuracy': 0.91574955, 'exact_accuracy': 0.7751534, 'lm_loss': 0.19095193, 'q_halt_accuracy': 0.9998368, 'q_halt_loss': 0.004543469, 'steps': 16.0}}
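For anyone wondering why `accuracy` and `exact_accuracy` differ so much in that output, here is a toy sketch. It assumes (this is not confirmed against evaluate.py) that `accuracy` counts individual cells and `exact_accuracy` counts only puzzles where every cell is right. The function names and the tiny 4-cell "puzzles" are made up for illustration.

```python
# Toy data: 3 "puzzles" with 4 cells each (a real Sudoku grid has 81 cells).
labels = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 2, 4, 4]]
preds = [[1, 2, 3, 4], [4, 3, 2, 2], [2, 2, 4, 4]]  # one wrong cell in puzzle 2


def cell_accuracy(preds, labels):
    """Fraction of individual cells predicted correctly (assumed meaning of 'accuracy')."""
    correct = sum(p == l for ps, ls in zip(preds, labels) for p, l in zip(ps, ls))
    total = sum(len(ls) for ls in labels)
    return correct / total


def exact_accuracy(preds, labels):
    """Fraction of puzzles with every cell correct (assumed meaning of 'exact_accuracy')."""
    solved = sum(ps == ls for ps, ls in zip(preds, labels))
    return solved / len(labels)


print(cell_accuracy(preds, labels))   # 11 of 12 cells correct
print(exact_accuracy(preds, labels))  # 2 of 3 puzzles fully solved
```

Under that assumption a single wrong cell fails the whole puzzle, so the per-puzzle number can never be higher than the per-cell one.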
not super shocking, but here's a weird viz I made of what the logits look like as a trained model solves 5 Sudoku puzzles. the correct answer (label) is in green. each puzzle gets 16 iterations. note the first 2 converge right away, the next 2 converge after a few iterations, but the last one never converges and fails to find a solution.
My repo has evaluation scripts and visualizations: https://github.com/olivkoch/nano-trm

Not necessarily recommending this is merged into the codebase - but thought I would offer some pytorch / CUDA compatibility changes taken from the upstream project in case they are useful for anyone else. In my case they allow me to run this project on an older version of pytorch (2.4.0) / cuda (12.1) [don't ask 😅] - but replacing the adam-atan2 dependency is also reported to help on newer versions of cuda (RTX 5090 / 4090 / 3090 with cuda 12.8).
Anyway, it worked for me - my first run of Sudoku even had a test accuracy of 93.5% (7% better than the paper?!)
(If this does look useful but messy I can also clean it up a bit if you'd like to merge it in.)
------- 8< -------
replaced adam-atan2 with adam-atan2-pytorch which works on older and newer versions of CUDA / pytorch
sapientinc/HRM#45
Also replaced nn.Buffer with register_buffer
sapientinc/HRM#30
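
To illustrate the second change, here is a minimal sketch of the `register_buffer` pattern. As far as I can tell `nn.Buffer` only exists in more recent PyTorch releases, while `register_buffer` has been around much longer, which is why the swap helps on older installs. The `PositionTable` module and its `table` tensor are hypothetical and not taken from this repo.

```python
import torch
from torch import nn


class PositionTable(nn.Module):
    """Hypothetical module holding a precomputed, non-trainable tensor."""

    def __init__(self, dim: int = 8):
        super().__init__()
        table = torch.arange(dim, dtype=torch.float32)
        # Newer style (needs a PyTorch release that ships nn.Buffer):
        #     self.table = nn.Buffer(table, persistent=False)
        # Older-compatible equivalent:
        self.register_buffer("table", table, persistent=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The buffer is available as a normal attribute and follows .to(device).
        return x + self.table


module = PositionTable()
print(dict(module.named_buffers()).keys())  # the buffer is registered under "table"
print(module(torch.zeros(8)))
```

Either way the tensor moves with the module across devices and is not treated as a trainable parameter; `persistent=False` additionally keeps it out of the `state_dict`.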