Dear authors,
Thank you for releasing fast_l1 code together with datamodels.
While running the linear regression step of datamodels, I hit an issue with tensors not being on the same device.
After running
python -m datamodels.regression.compute_datamodels \
-C regression_config.yaml \
--data.data_path "$tmp_dir/reg_data.beton" \
--cfg.out_dir "$tmp_dir/reg_results"
I get an error similar to:
File "/path_to_python3.9/site-packages/fast_l1-0.0.1-py3.9.egg/fast_l1/regressor.py", line 221, in train_saga
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
or
File "/path_to_python3.9/site-packages/fast_l1-0.0.1-py3.9.egg/fast_l1/regressor.py", line 341, in train_saga
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
This happens because, at lines 221 and 341 of regressor.py, CPU tensors are indexed/sliced with index tensors that live on the GPU (idx and still_opt_outer, respectively):
Lines 221 to 222 in ef7d08d:

a_prev[:, :num_keep].copy_(a_table[idx, :num_keep],
                           non_blocking=True)

Line 341 in ef7d08d:

inds_to_swap = inds_to_swap[still_opt_outer[inds_to_swap]]
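A minimal sketch of the failure mode and one possible fix, assuming PyTorch (the tensor names a_table and idx mirror the excerpt above and are illustrative, not the library's actual state):

```python
import torch

# A CPU-resident table, as in regressor.py; in the failing run the index
# tensor lives on the GPU instead.
a_table = torch.zeros(10, 4)
idx = torch.tensor([0, 2, 5])

if torch.cuda.is_available():
    try:
        # Indexing a CPU tensor with a CUDA index tensor reproduces:
        #   RuntimeError: indices should be either on cpu or on the same
        #   device as the indexed tensor (cpu)
        a_table[idx.cuda(), :2]
    except RuntimeError as e:
        print(e)

# CPU indices are accepted for tensors on any device, so moving the index
# tensor to the CPU before indexing is one way to avoid the error:
rows = a_table[idx.cpu(), :2]
print(rows.shape)  # torch.Size([3, 2])
```

Alternatively, the table itself could be moved to the indices' device (e.g. `a_table.to(idx.device)`), at the cost of extra GPU memory; which side should move is a design choice for the maintainers.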
Those index tensors end up on the GPU because weight and train_loader in datamodels/datamodels/regression/compute_datamodels.py are on the GPU when train_saga is called:
regressor.train_saga(weight,
bias,
train_loader,
val_loader,
lr=lr,
start_lams=max_lam,
update_bias=(use_bias > 0),
lam_decay=np.exp(np.log(eps)/k),
num_lambdas=k,
early_stop_freq=early_stop_freq,
early_stop_eps=early_stop_eps,
                     logdir=str(log_path))