Training #19

@shqmffl486

How do I train with the sdf_hand_mini and sdf_obj_mini data that you uploaded?
I think the error below occurs because I used the mini version, which does not contain one of the .npz files the data loader asks for.

(alignsdf) MS-7B23:~/mount4t/AlignSDF$ CUDA_VISIBLE_DEVICES=0 bash dist_train.sh 4 6666 -e experiments/obman/30k_1e2d_mlp5.json
do not support renderer in this machine
DeepSdf - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
DeepSdf - INFO - Training in distributed mode, 1 GPU per process. Process 0, total 1.
DeepSdf - INFO - Experiment description:
3D hand reconstruction on the mini obman dataset.
Hand branch: True
Object branch: True
Mano branch: False
Depth branch: False
Classifier Weight: 0
Penetration Loss: False
Penetration Loss Weight: 0
Additional Loss start at epoch: 1201
Contact Loss: False
Contact Loss Weight: 0
Contact Loss Sigma (m): 0.005
Independent Obj Scale: False
Ignore other: False
nb_label_class: 6
Image encoder, the branch has latent size 256
DeepSdf - INFO - Finish constructing the dataset
DeepSdf - INFO - start_epoch:1, current_rank:0
DeepSdf - INFO - epoch:1, current_rank:0
Traceback (most recent call last):
File "train.py", line 715, in <module>
main_function(exp_cfg, args.continue_from, args.local_rank, args.opt_level, args.slurm)
File "train.py", line 465, in main_function
for i, (input_iter, label_iter, meta_iter) in enumerate(sdf_loader):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
data = self._next_data()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gaeun/mount4t/AlignSDF/utils/data.py", line 162, in __getitem__
hand_samples, hand_labels = unpack_sdf_samples(self.data_source, data_key, num_sample, hand=True, clamp=self.clamp, filter_dist=self.filter_dist)
File "/home/gaeun/mount4t/AlignSDF/utils/sdf_utils.py", line 172, in unpack_sdf_samples
npz = np.load(npz_path)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/numpy/lib/npyio.py", line 405, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'data/obman/train/sdf_hand/00018168.npz'

Killing subprocess 12576
Traceback (most recent call last):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/gaeun/anaconda3/envs/alignsdf/bin/python', '-u', 'train.py', '--local_rank=0', '-e', 'experiments/obman/30k_1e2d_mlp5.json']' returned non-zero exit status 1.
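
To narrow this down, a small check like the one below could list which samples have no .npz file on disk. This is only a sketch: `find_missing_npz` and the ids passed to it are hypothetical, and how the training split enumerates its samples is an assumption, since that code is not shown here. Only the `data/obman/train/sdf_hand` directory and the `00018168.npz` file name come from the traceback above.

```python
import os
import tempfile


def find_missing_npz(sdf_dir, sample_ids):
    """Return the sample ids that have no matching <id>.npz file in sdf_dir."""
    return [
        sid
        for sid in sample_ids
        if not os.path.isfile(os.path.join(sdf_dir, f"{sid}.npz"))
    ]


# Self-contained demo on a temporary directory (stands in for
# data/obman/train/sdf_hand; the ids here are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    # Create one empty placeholder file so that one id is "present".
    open(os.path.join(tmp, "00000001.npz"), "wb").close()
    demo_missing = find_missing_npz(tmp, ["00000001", "00018168"])

print(demo_missing)
```

Run against the real directory with the ids the training split expects, a non-empty result would suggest the split refers to samples that the mini archives do not ship.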
