Training Error

Hi @zhyever,

I recently tried out the PatchFusion model using this repo.
I'm now trying to run the training scripts to train a model myself.

Following the training steps in [docs/user_training.md](https://github.com/zhyever/PatchFusion/blob/main/docs/user_training.md), I was able to run the coarse and fine model training for the depth_anything_vitb model.
However, I get the error below when running the training for the fusion model.

> 
> [rank0]:     trainer.run()
> [rank0]:   File "PatchFusion/estimator/trainer/trainer.py", line 326, in run
> [rank0]:     self.train_epoch(epoch_idx)
> [rank0]:   File "PatchFusion/estimator/trainer/trainer.py", line 250, in train_epoch
> [rank0]:     self.optimizer_wrapper.update_params(total_loss)
> [rank0]:   File "lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 196, in update_params
> [rank0]:     self.backward(loss)
> [rank0]:   File "lib/python3.8/site-packages/mmengine/optim/optimizer/optimizer_wrapper.py", line 220, in backward
> [rank0]:     loss.backward(**kwargs)
> [rank0]:   File "lib/python3.8/site-packages/torch/_tensor.py", line 525, in backward
> [rank0]:     torch.autograd.backward(
> [rank0]:   File "lib/python3.8/site-packages/torch/autograd/__init__.py", line 260, in backward
> [rank0]:     grad_tensors_ = _make_grads(tensors, grad_tensors_, is_grads_batched=False)
> [rank0]:   File "lib/python3.8/site-packages/torch/autograd/__init__.py", line 133, in _make_grads
> [rank0]:     raise RuntimeError(
> [rank0]: RuntimeError: grad can be implicitly created only for scalar outputs


Could you please give some input on this?
Does anything in the training script or config need to be modified?
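
As far as I understand, PyTorch raises this message when `backward()` is called on a tensor with more than one element, so my guess is that `total_loss` is not a scalar in the fusion stage. I have not confirmed where that happens in the PatchFusion code. The sketch below is independent of this repo (the names `pred` and `try_backward` are made up for illustration); it reproduces the same message and shows that reducing the loss to a scalar avoids it:

```python
import torch


def try_backward(loss):
    """Call loss.backward(); return None on success, else the error text."""
    try:
        loss.backward()
        return None
    except RuntimeError as err:
        return str(err)


# Hypothetical stand-in for a model output that requires gradients.
pred = torch.ones(4, requires_grad=True)

# Per-element loss of shape (4,): backward() without arguments is rejected.
error_message = try_backward((pred - 2.0) ** 2)

# Reducing to a single value (here with mean) makes backward() valid.
scalar_error = try_backward(((pred - 2.0) ** 2).mean())

print(error_message)
print(scalar_error)
```

If the fusion loss does come back as a per-sample or per-GPU tensor, a reduction such as `.mean()` before `update_params` might be enough, but I'm not sure that is the intended fix.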


One more question related to this: is there an ONNX, TensorRT, or other deployment version of the PatchFusion model?
