Error running cifar_example.py with InfoBatch #27

@Buenobarbie

I tried running the examples from the README.md. The first example (using the full dataset) worked as expected. However, the second example (using train_info_batch) resulted in the following error:

$ nohup /usr/bin/time -v python examples/cifar_example.py "--model r50 --optimizer lars --max-lr 5.2 --num_epoch 5 --delta 0.875 --ratio 0.5 --use_info_batch" >> log_test.log 2>&1
nohup: ignoring input
==> Building model..
use normal data parallel
Use info batch.
<class 'infobatch.infobatch.IBSampler'>

Epoch: 0, iterations 391
Traceback (most recent call last):
  File "/home/vm03/Desktop/barbara/infobatch/InfoBatch/examples/cifar_example.py", line 269, in <module>
    train_info_batch(epoch) if args.use_info_batch else train_normal(epoch)
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/vm03/Desktop/barbara/infobatch/InfoBatch/examples/cifar_example.py", line 191, in train_info_batch
    lr_scheduler.step()
  File "/home/vm03/anaconda3/envs/cp/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 241, in step
    values = self.get_lr()
             ^^^^^^^^^^^^^
  File "/home/vm03/anaconda3/envs/cp/lib/python3.11/site-packages/torch/optim/lr_scheduler.py", line 2153, in get_lr
    raise ValueError(
ValueError: Tried to step 1956 times. The specified number of total steps is 1955

To speed things up, I initially limited the number of epochs. I also ran the original example with 200 epochs and hit the same error, only with a higher number of total steps.

The error seems to originate from this loop inside the train_info_batch function, which may be running for more iterations than expected:

for batch_idx, blobs in enumerate(trainloader):
    inputs, targets = blobs
    inputs, targets = inputs.to(device), targets.to(device)
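
One way to narrow this down might be to count how many batches the loader actually yields per epoch and compare the running total with the scheduler's step budget. The total of 1955 in the error message equals 5 epochs × 391 iterations, so a single extra batch anywhere would be enough to trigger the ValueError. The sketch below is only a diagnostic under stated assumptions: `audit_loader`, `EpochAudit`, and `FakeLoader` are hypothetical names introduced for illustration, not part of InfoBatch or PyTorch, and `FakeLoader` merely simulates a loader whose `len()` disagrees with what it yields. It does not claim that this is the actual cause.

```python
from dataclasses import dataclass


@dataclass
class EpochAudit:
    epoch: int
    reported_len: int  # what len(loader) claimed at the start of the epoch
    yielded: int       # batches actually produced during the epoch
    cumulative: int    # scheduler steps that would have been taken so far


def audit_loader(loader, num_epochs, total_steps):
    """Iterate the loader without training and record per-epoch batch counts.

    Returns (audits, first_overrun), where first_overrun is the
    (epoch, batch_idx) at which the cumulative step count first exceeds
    total_steps, or None if the budget is never exceeded.
    """
    audits = []
    cumulative = 0
    first_overrun = None
    for epoch in range(num_epochs):
        reported = len(loader)  # re-read each epoch, since it may change
        yielded = 0
        for batch_idx, _ in enumerate(loader):
            yielded += 1
            cumulative += 1  # one scheduler step per batch, as in the example
            if first_overrun is None and cumulative > total_steps:
                first_overrun = (epoch, batch_idx)
        audits.append(EpochAudit(epoch, reported, yielded, cumulative))
    return audits, first_overrun


class FakeLoader:
    """Hypothetical stand-in for trainloader (assumption, for illustration only):
    len() reports `reported` batches, but iteration yields `actual` batches."""

    def __init__(self, reported, actual):
        self.reported = reported
        self.actual = actual

    def __len__(self):
        return self.reported

    def __iter__(self):
        return iter(range(self.actual))


if __name__ == "__main__":
    # Simulated mismatch: one extra batch per epoch against a 5 * 391 budget.
    loader = FakeLoader(reported=391, actual=392)
    audits, overrun = audit_loader(loader, num_epochs=5, total_steps=391 * 5)
    for audit in audits:
        print(audit)
    print("first overrun at (epoch, batch_idx):", overrun)
```

Running the same function on the real trainloader (in place of `FakeLoader`) would show whether `len(trainloader)` and the yielded batch count ever diverge. Iterating without training may not reproduce the behaviour seen in a real run if the sampler depends on values updated during training, so treat the result as a hint rather than a diagnosis.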
