pytorch/cuda compatibility updates #6

dribnet · 2025-10-10T10:33:50Z

I'm not necessarily recommending that this be merged into the codebase, but I thought I would offer some PyTorch / CUDA compatibility changes taken from the upstream project in case they are useful for anyone else. In my case they allow me to run this project on an older version of PyTorch (2.4.0) / CUDA (12.1) [don't ask 😅], and replacing the adam-atan2 dependency is also reported to help on newer versions of CUDA (RTX 5090 / 4090 / 3090 with CUDA 12.8).

Anyway, it worked for me - my first run of Sudoku even had a test accuracy of 93.5% (7% better than the paper?!)

wandb: Run summary:                                                                                     
wandb:            num_params 5028866                                                                    
wandb:        train/accuracy 0.98872                                                                    
wandb:           train/count 1                                                                          
wandb:  train/exact_accuracy 0.93514                                                                    
wandb:         train/lm_loss 0.50523                                                                    
wandb:              train/lr 0.0001                                                                     
wandb: train/q_halt_accuracy 0.97297                                                                    
wandb:     train/q_halt_loss 0.02499                                                                    
wandb:           train/steps 3.97297

(If this does look useful but messy I can also clean it up a bit if you'd like to merge it in.)

------- 8< -------

replaced adam-atan2 with adam-atan2-pytorch which works on older and newer versions of CUDA / pytorch

sapientinc/HRM#45

Also replaced nn.Buffer with register_buffer

sapientinc/HRM#30
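For anyone making the same change by hand, here is a minimal sketch of the buffer swap. The module name `RunningMean` and its `mean` buffer are hypothetical and not taken from this repo; the sketch assumes only that `nn.Buffer` is missing on older PyTorch releases, while `register_buffer` is available on both old and new ones.

```python
import torch
from torch import nn


class RunningMean(nn.Module):
    """Hypothetical module, used only to illustrate the buffer swap."""

    def __init__(self, dim: int):
        super().__init__()
        # Newer style (may be unavailable on older PyTorch):
        #   self.mean = nn.Buffer(torch.zeros(dim), persistent=True)
        # Portable style: registers the same non-trainable tensor.
        self.register_buffer("mean", torch.zeros(dim), persistent=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The buffer is reachable as a plain attribute either way.
        return x - self.mean


module = RunningMean(4)
```

A buffer registered this way is saved in `state_dict()` and moves with `.to(device)`, but it is not returned by `parameters()`, so the optimizer never updates it.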


vasiliyeskin · 2025-10-10T12:18:46Z

train/exact_accuracy 0.93514 is accuracy on training data, not on the evaluation (test) data

dribnet · 2025-10-10T12:52:01Z

Aha - makes sense. Do you know if this repo has the script for calculating eval/exact_accuracy from checkpoint?

vasiliyeskin · 2025-10-10T15:05:38Z

> Aha - makes sense. Do you know if this repo has the script for calculating eval/exact_accuracy from checkpoint?

I do not see such a script here. You can use evaluate.py from the HRM project: https://github.com/sapientinc/HRM

copied evaluate.py from HRM and modified to match changes to pretrain.py (eg: adding config.ema logic and CPU_PROCESS_GROUP)

example use:

```bash
python evaluate.py \
  checkpoint=checkpoints/Sudoku-extreme-1k-aug-1000-ACT-torch/pretrain_att_sudoku/step_65100
```

dribnet · 2025-10-10T19:29:51Z

I've also added a modified version of evaluate.py to this branch. 👍

python evaluate.py \
  checkpoint=checkpoints/Sudoku-extreme-1k-aug-1000-ACT-torch/pretrain_att_sudoku/step_65100
...
Processing batch 551: all                                                                               
  Completed inference in 16 steps                                                                 
                                                                               
Running 0 evaluator(s)...                                                                              
All evaluators completed!                                                                     
{'all': {'accuracy': 0.91574955, 'exact_accuracy': 0.7751534, 'lm_loss': 0.19095193, 'q_halt_accuracy':
0.9998368, 'q_halt_loss': 0.004543469, 'steps': 16.0}}
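
The gap between `accuracy` (about 0.92) and `exact_accuracy` (about 0.78) is what you would expect if the first is a per-cell score and the second only counts fully solved puzzles. The sketch below illustrates that distinction on toy data; the function names and the exact definitions are assumptions for illustration, not code copied from evaluate.py.

```python
def token_accuracy(preds, labels):
    """Mean over puzzles of the fraction of cells predicted correctly."""
    per_puzzle = [
        sum(p == t for p, t in zip(pred, label)) / len(label)
        for pred, label in zip(preds, labels)
    ]
    return sum(per_puzzle) / len(per_puzzle)


def exact_accuracy(preds, labels):
    """Fraction of puzzles where every cell is correct."""
    solved = [pred == label for pred, label in zip(preds, labels)]
    return sum(solved) / len(solved)


# Toy data: two 4-cell "puzzles"; the second has one wrong cell.
labels = [[1, 2, 3, 4], [4, 3, 2, 1]]
preds = [[1, 2, 3, 4], [4, 3, 2, 2]]

print(token_accuracy(preds, labels))  # 0.875
print(exact_accuracy(preds, labels))  # 0.5
```

A single wrong cell barely moves the per-cell score but turns the whole puzzle into a failure for the exact score, which is why the exact score is always the lower of the two.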

dribnet · 2025-10-11T09:26:23Z

not super shocking, but here's a weird viz I made of what logits look like as a trained model solves 5 Sudoku puzzles.

correct answer (label) is in green. each puzzle gets 16 iterations. note the first 2 converge right away, the next 2 converge after a few iterations, but the last one never converges and fails to find a solution.
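
The same "when does it converge" question can be asked numerically. Below is a rough sketch, assuming you have already collected the argmax prediction for one puzzle at each of the 16 iterations; `converged_at` and the toy data are hypothetical and are not part of the visualization code.

```python
def converged_at(step_preds, label):
    """Return the earliest 0-based iteration from which the prediction
    equals the label and never changes again, or None if the final
    prediction is wrong."""
    if step_preds[-1] != label:
        return None
    stable = len(step_preds) - 1
    # Walk backwards while the prediction still matches the label.
    for i in range(len(step_preds) - 2, -1, -1):
        if step_preds[i] != label:
            break
        stable = i
    return stable


label = [1, 2]
fast = [[1, 2]] * 4                          # correct from the start
slow = [[1, 0], [1, 2], [1, 2], [1, 2]]      # correct after one iteration
stuck = [[1, 0], [0, 2], [1, 0], [0, 0]]     # never settles on the label

print(converged_at(fast, label))   # 0
print(converged_at(slow, label))   # 1
print(converged_at(stuck, label))  # None
```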

olivkoch · 2025-12-15T15:18:35Z

My repo has evaluation scripts and visualizations: https://github.com/olivkoch/nano-trm

pytorch compatibility updates

1a48ae8


added evaluate.py

2c0e476


zanngujjar mentioned this pull request Oct 13, 2025

Please make this repo easier to use #9

Open
