Abnormal Training Phenomena and Bad Performance #22

@sunpihai-up

Dear Luigi Piccinelli,
I hope this message finds you well. I wanted to express my sincere appreciation for your exceptional article. Inspired by your work, I tried to train your project on the Eigen split of the KITTI dataset.

However, during training I ran into two abnormal phenomena that I would like to bring to your attention:

  1. The training loss decreased steadily, but the evaluation metrics plateaued very early in training.
  2. The evaluation metrics themselves were poor, and some of the values looked abnormal.

Here is a screenshot depicting the issue:
image
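
To make the second point easier to check independently, here is a minimal pure-Python sketch of the usual monocular depth metrics (abs_rel, RMSE, delta1). It assumes that these are the indicators being evaluated, that depths are in meters, and that ground-truth pixels outside a valid range are ignored. The function name `depth_metrics` and the `min_depth`/`max_depth` defaults are hypothetical and are not taken from the repository's evaluation code.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3, max_depth=80.0):
    """Compute abs_rel, rmse and delta1 over valid ground-truth pixels.

    pred and gt are flat sequences of depths in the same unit (assumed meters).
    The validity range is an assumption, not the repository's setting.
    """
    # Keep only pixels whose ground truth lies inside the valid range;
    # clamp predictions so the ratio g / p below is always defined.
    pairs = [
        (max(p, min_depth), g)
        for p, g in zip(pred, gt)
        if min_depth < g < max_depth
    ]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")

    n = len(pairs)
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # delta1: fraction of pixels whose ratio to ground truth is below 1.25
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


if __name__ == "__main__":
    gt = [10.0, 20.0, 0.0, 40.0]  # 0.0 marks a pixel without ground truth
    print(depth_metrics(gt, gt))  # perfect prediction
    # Prediction off by a constant factor, e.g. a unit mismatch
    print(depth_metrics([g * 256 for g in gt], gt))
```

Comparing the two printed results shows how a constant scale error between predictions and ground truth drives delta1 toward zero even when the relative structure of the prediction is correct. A check like this could help tell a data or unit problem apart from a training problem.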

To fit my hardware (a single machine with four RTX 3090 GPUs and no SLURM), I replaced the SLURM-based distributed launch with a standard PyTorch DDP (DistributedDataParallel) setup.
I also modified the dataloader directory to match the directory structure of my existing KITTI dataset. I do not believe these changes cause the poor results, because the code still prints the expected messages, such as "Loaded 23158 images. Totally 0 invalid pairs are filtered" and "Loaded 652 images. Totally 45 invalid pairs are filtered."
Finally, to track training with TensorBoard, I added some code to the training loop that generates and saves log information.
Apart from these adjustments, I have not made any additional modifications to the code. Specifically, the config file remains the same as the one you provided.
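
For context, the SLURM-to-DDP change amounts to reading the process rank and world size from a different set of environment variables. The sketch below is only an illustration of that idea, not the code I actually run and not the repository's code: the helper name `resolve_dist_env` is hypothetical, and it assumes the job is started either by `torchrun` (which sets `RANK`, `WORLD_SIZE` and `LOCAL_RANK`) or by SLURM (which sets `SLURM_PROCID`, `SLURM_NTASKS` and `SLURM_LOCALID`).

```python
import os


def resolve_dist_env(env=None):
    """Return rank, world size and local rank for the current process.

    Hypothetical helper: checks torchrun-style variables first, then
    SLURM-style variables, and falls back to a single process.
    """
    env = os.environ if env is None else env

    if "RANK" in env and "WORLD_SIZE" in env:
        # Variables exported by torchrun for each worker process
        return {
            "launcher": "torchrun",
            "rank": int(env["RANK"]),
            "world_size": int(env["WORLD_SIZE"]),
            "local_rank": int(env.get("LOCAL_RANK", 0)),
        }

    if "SLURM_PROCID" in env and "SLURM_NTASKS" in env:
        # Variables exported by SLURM for each task
        return {
            "launcher": "slurm",
            "rank": int(env["SLURM_PROCID"]),
            "world_size": int(env["SLURM_NTASKS"]),
            "local_rank": int(env.get("SLURM_LOCALID", 0)),
        }

    # No launcher detected: run as a single, non-distributed process
    return {"launcher": "none", "rank": 0, "world_size": 1, "local_rank": 0}


if __name__ == "__main__":
    print(resolve_dist_env())
```

With four GPUs on one machine, a launch such as `torchrun --nproc_per_node=4 train.py` (where `train.py` is a placeholder script name) would give ranks 0 to 3 and a world size of 4. Those values would then be passed to `torch.distributed.init_process_group`, and the local rank would select the GPU. One thing worth checking after such a change is whether the effective batch size (per-GPU batch size times world size) still matches what the provided config assumes.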

I would greatly appreciate your insights and guidance on these issues. If there are any specific details or additional information I can provide to help with troubleshooting, please let me know. Thank you once again for your remarkable contribution to the field.

Best regards
