-
Notifications
You must be signed in to change notification settings - Fork 10
Description
How do I train with sdf_hand_mini and sdf_obj_mini that you uploaded?
I think there is a .npz file that doesn't exist because I put it in mini version.
(alignsdf) MS-7B23:~/mount4t/AlignSDF$ CUDA_VISIBLE_DEVICES=0 bash dist_train.sh 4 6666 -e experiments/obman/30k_1e2d_mlp5.json
do not support renderer in this machine
DeepSdf - INFO - Added key: store_based_barrier_key:1 to store for rank: 0
DeepSdf - INFO - Training in distributed mode, 1 GPU per process. Process 0, total 1.
DeepSdf - INFO - Experiment description:
3D hand reconstruction on the mini obman dataset.
Hand branch: True
Object branch: True
Mano branch: False
Depth branch: False
Classifier Weight: 0
Penetration Loss: False
Penetration Loss Weight: 0
Additional Loss start at epoch: 1201
Contact Loss: False
Contact Loss Weight: 0
Contact Loss Sigma (m): 0.005
Independent Obj Scale: False
Ignore other: False
nb_label_class: 6
Image encoder, the branch has latent size 256
DeepSdf - INFO - Finish constructing the dataset
DeepSdf - INFO - start_epoch:1, current_rank:0
DeepSdf - INFO - epoch:1, current_rank:0
Traceback (most recent call last):
File "train.py", line 715, in
main_function(exp_cfg, args.continue_from, args.local_rank, args.opt_level, args.slurm)
File "train.py", line 465, in main_function
for i, (input_iter, label_iter, meta_iter) in enumerate(sdf_loader):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 517, in next
data = self._next_data()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
return self._process_data(data)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
data.reraise()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/_utils.py", line 429, in reraise
raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
data = fetcher.fetch(index)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/gaeun/mount4t/AlignSDF/utils/data.py", line 162, in getitem
hand_samples, hand_labels = unpack_sdf_samples(self.data_source, data_key, num_sample, hand=True, clamp=self.clamp, filter_dist=self.filter_dist)
File "/home/gaeun/mount4t/AlignSDF/utils/sdf_utils.py", line 172, in unpack_sdf_samples
npz = np.load(npz_path)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/numpy/lib/npyio.py", line 405, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'data/obman/train/sdf_hand/00018168.npz'
Killing subprocess 12576
Traceback (most recent call last):
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 340, in
main()
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/home/gaeun/anaconda3/envs/alignsdf/lib/python3.8/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/gaeun/anaconda3/envs/alignsdf/bin/python', '-u', 'train.py', '--local_rank=0', '-e', 'experiments/obman/30k_1e2d_mlp5.json']' returned non-zero exit status 1.