!NOTE! The trained models available as weights were trained on a specific train/test split of the clusters. Validating against a different split will therefore produce significantly distorted results, because the original training clusters will likely end up in the new validation set. Either refer to the results in the report or retrain the models accordingly.
Precise depth estimation is essential for understanding 3D scenes. In this work, we present CapNet, a compositional architecture for monocular depth estimation. By combining a pretrained model with advanced sampling strategies, we increase accuracy and outperform both standard scaling methods and U-Net architectures. This provides an approach for situations where a pretrained model with mismatched dimensions is available, but simple scaling does not yield sufficient accuracy.
It's easy to try CapNet yourself. Follow these steps:
Download the Dataset from Kaggle. Our program expects the following folder structure:
data/
├── test/
│ ├── test_000000_rgb.png
│ ├── test_000001_rgb.png
│ └── ...
├── train/
│ ├── sample_000000_rgb.png
│ ├── sample_000000_depth.npy
│ ├── sample_000001_rgb.png
│ ├── sample_000001_depth.npy
│ └── ...
├── test_list.txt
└── train_list.txt
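A quick way to sanity-check that your local copy matches this layout is a small script like the following. This is a sketch, not part of the project: the `check_dataset_layout` helper and the `data` path are illustrative, and it assumes every training RGB image should have a matching `_depth.npy` file.

```python
from pathlib import Path

def check_dataset_layout(root):
    """Return a list of problems found in the expected dataset layout."""
    root = Path(root)
    problems = []
    for sub in ("train", "test"):
        if not (root / sub).is_dir():
            problems.append(f"missing folder: {sub}/")
    for listing in ("train_list.txt", "test_list.txt"):
        if not (root / listing).is_file():
            problems.append(f"missing file: {listing}")
    # Every training RGB image should have a matching depth map.
    train = root / "train"
    if train.is_dir():
        for rgb in sorted(train.glob("sample_*_rgb.png")):
            depth = rgb.with_name(rgb.name.replace("_rgb.png", "_depth.npy"))
            if not depth.is_file():
                problems.append(f"no depth map for {rgb.name}")
    return problems

# Example: report problems, or confirm the layout is fine.
issues = check_dataset_layout("data")
print(issues if issues else "dataset layout looks OK")
```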
- Install Python 3.12
- (optional) create and activate a virtual environment (venv)
- Install dependencies
pip install -r requirements.txt
Since the dataset contains many very similar images, they must be clustered first so that a clean train/validation split can be created. To do this, run:
python create_cluster.py --data-dir your/path/data/train --output-dir your/path/to/output
Note: this code is best run on a GPU. The code checks whether an accelerator is available and automatically runs on it.
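The reason for the clustering step is that near-duplicate images must never be split across train and validation (see the note at the top). A minimal sketch of such a cluster-aware split is shown below; the `cluster_split` function and its in-memory mapping are illustrative assumptions, not the actual interface of create_cluster.py.

```python
import random

def cluster_split(sample_to_cluster, val_fraction=0.2, seed=42):
    """Split samples so that no cluster appears in both train and validation.

    sample_to_cluster: dict mapping sample name -> cluster id.
    Returns (train_samples, val_samples).
    """
    rng = random.Random(seed)
    clusters = sorted(set(sample_to_cluster.values()))
    rng.shuffle(clusters)
    # Whole clusters go to validation, never individual samples.
    n_val = max(1, int(len(clusters) * val_fraction))
    val_clusters = set(clusters[:n_val])
    train, val = [], []
    for sample, cid in sample_to_cluster.items():
        (val if cid in val_clusters else train).append(sample)
    return train, val

# Toy example: four samples in two clusters of near-duplicates.
train, val = cluster_split({"a": 0, "b": 0, "c": 1, "d": 1}, val_fraction=0.5)
print(train, val)
```

Because entire clusters are assigned to one side, a near-duplicate of a training image can never leak into the validation set.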
To train the models use the following command:
python main.py [ARGS]
There are many arguments that can be passed to the code:
- data-dir: Must point to the data. In the above example the path must end with data
- model: Specify which model to use. Available models are: HRNetPixelShuffle, OmniNaiveScaling, ResNetNaiveUpsample, CapNetLite, CapNet, CapNetMax
- cluster-file: The path to the cluster file generated in the previous step.
- load-model: Path to a pre-trained model to load. The model must match the specified --model architecture.
- output-dir: Path to the output directory where results and predictions will be saved
Hyperparameters:
- seed: Random seed for reproducibility. Used for torch and random
- batch-size: Batch size for training and inference
- learning-rate: Learning rate for the optimizer
- weight-decay: Weight decay for the optimizer
- num-epochs: Number of epochs for training
- num-workers: Number of workers for data loading
- training-size: Fraction of training data to use for training (0.0 to 1.0)
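The flags above could be wired together with argparse roughly as follows. This is a hedged sketch of the command-line interface: the defaults shown are placeholders, not the project's actual values.

```python
import argparse

MODELS = ["HRNetPixelShuffle", "OmniNaiveScaling", "ResNetNaiveUpsample",
          "CapNetLite", "CapNet", "CapNetMax"]

def build_parser():
    p = argparse.ArgumentParser(description="Train a CapNet depth model")
    p.add_argument("--data-dir", required=True, help="Path ending in data/")
    p.add_argument("--model", choices=MODELS, default="CapNet")
    p.add_argument("--cluster-file", help="Cluster file from create_cluster.py")
    p.add_argument("--load-model", help="Checkpoint matching --model")
    p.add_argument("--output-dir", default="output")
    # Hyperparameters (defaults here are illustrative only)
    p.add_argument("--seed", type=int, default=42)
    p.add_argument("--batch-size", type=int, default=16)
    p.add_argument("--learning-rate", type=float, default=1e-4)
    p.add_argument("--weight-decay", type=float, default=0.0)
    p.add_argument("--num-epochs", type=int, default=30)
    p.add_argument("--num-workers", type=int, default=4)
    p.add_argument("--training-size", type=float, default=1.0)
    return p

args = build_parser().parse_args(["--data-dir", "data", "--model", "CapNetLite"])
print(args.model, args.batch_size)
```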