@@ -33,7 +33,8 @@ To encourage the model to develop this cross-view reasoning during training, we
space patch masking scheme inspired by the success of masked autoencoders and dropout.
We use a training curriculum that starts with a short warmup period where no patches are masked
(controlled by ``training.patch_mask.init_epoch`` in the config file), then increases the ratio of
- masked patches over the course of training (controlled by ``training.patch_mask.init_ratio`` and ``training.patch_mask.final_ratio``).
+ masked patches over the course of training
+ (controlled by ``training.patch_mask.init_ratio`` and ``training.patch_mask.final_ratio``).
This technique creates gradients that flow through the attention mechanism and encourage
cross-view information propagation, which in turn develops internal representations that capture
statistical relationships between the different views.
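For reference, the masking curriculum described above might be configured as in the following sketch; the field names come from the text above, but the numeric values are illustrative placeholders rather than recommended settings:

.. code-block:: yaml

   training:
     patch_mask:
       init_epoch: 10     # warmup: no patches are masked before this epoch
       init_ratio: 0.1    # fraction of patches masked once masking begins
       final_ratio: 0.5   # fraction of patches masked by the end of training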
@@ -49,8 +50,14 @@ statistical relationships between the different views.

 To turn patch masking off, set ``final_ratio: 0.0``.

- 3D augmentations and losses
- ===========================
+ 3D augmentations and loss
+ =========================
+
+ .. note::
+
+    As of March 2026, the unsupervised losses introduced in the original Lightning Pose paper have
+    not yet been implemented for the ``multi-view transformer`` model, including the
+    ``pca_multiview`` loss.

The MVT produces a 2D heatmap for each keypoint in each view.
Without explicit geometric constraints, it is possible for these individual 2D predictions to be
@@ -61,14 +68,14 @@ encourage geometric consistency in the outputs
formats for camera calibration; note also that bounding box information must be shared if the
training images are cropped from larger frames).

- The 3D losses require geometrically consistent input images, which precludes applying geometric
+ The 3D loss requires geometrically consistent input images, which precludes applying geometric
augmentations like rotation to each view independently.
Instead, we triangulate the ground truth labels and augment the 3D poses by translating and scaling in 3D space.
The augmented 3D pose is then projected back to individual 2D views.
These augmentations do not affect the camera parameters;
rather, they are equivalent to keeping the cameras fixed and scaling and translating the subject within the scene.
For each view, we then estimate the affine transformation from the original to augmented 2D keypoints,
- and apply this transformation to the original image
+ and apply this transformation to the original image.

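The augmentation recipe above (triangulate labels, move the pose in 3D, reproject, fit a per-view affine transform) can be sketched in NumPy as follows. This is a minimal sketch assuming ideal 3x4 projection matrices per view; the function names and the least-squares affine fit are illustrative assumptions, not Lightning Pose's actual API:

.. code-block:: python

   import numpy as np

   def triangulate_dlt(Ps, xs):
       """Linear (DLT) triangulation of one 3D point from its 2D observations in several views."""
       rows = []
       for P, x in zip(Ps, xs):
           rows.append(x[0] * P[2] - P[0])
           rows.append(x[1] * P[2] - P[1])
       _, _, Vt = np.linalg.svd(np.asarray(rows))
       X = Vt[-1]
       return X[:3] / X[3]

   def project(P, X):
       """Project a 3D point through a 3x4 camera matrix."""
       x = P @ np.append(X, 1.0)
       return x[:2] / x[2]

   def augment_pose_3d(Ps, keypoints_2d, scale, translation):
       """Triangulate the labels, scale/translate the 3D pose, and reproject to every view.

       keypoints_2d has shape (n_views, n_keypoints, 2); the cameras Ps stay fixed,
       which is equivalent to moving the subject within the scene.
       """
       n_kpts = keypoints_2d.shape[1]
       pose_3d = np.stack([triangulate_dlt(Ps, keypoints_2d[:, k]) for k in range(n_kpts)])
       pose_3d = pose_3d * scale + translation
       return np.stack([[project(P, X) for X in pose_3d] for P in Ps])

   def fit_affine_2d(src, dst):
       """Least-squares 2x3 affine transform mapping src (N, 2) keypoints onto dst (N, 2);
       the same transform would then be applied to the original image."""
       A = np.hstack([src, np.ones((len(src), 1))])
       M, *_ = np.linalg.lstsq(A, dst, rcond=None)
       return M.T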
To enable 3D augmentations, add the ``imgaug_3d`` field to the ``training`` section of your configuration
file and set it to ``true``:
@@ -79,30 +86,27 @@ file and set it to `true`:
  imgaug: dlc
  imgaug_3d: true

- Pairwise projection loss
- ------------------------
- To compute the 3D pairwise projection loss, we first take the soft argmax of the 2D heatmaps to get predicted coordinates.
- Then, for each keypoint, and for each pair of views, we triangulate both the ground truth keypoints
- and the predictions, and compute the mean square error between the two.
- The 3D loss is weighted by a hyperparameter, which is set in the ``losses`` section of the
- configuration file:
+ To compute the 3D reprojection loss, we:

- .. code-block:: yaml
+ 1. take the soft argmax of the 2D heatmaps to get predicted coordinates
+ 2. for each keypoint and each pair of views, triangulate the predictions into 3D
+ 3. project the predicted 3D points back into 2D coordinates for each view
+ 4. turn these reprojected coordinates into heatmaps
+ 5. compute the mean square error between the reprojected and ground truth heatmaps

- losses:
-   supervised_pairwise_projections:
-     log_weight: 0.5
-
- Reprojected heatmap loss
- ------------------------
- An alternative loss projects the predicted 3D points back into 2D coordinates for each view,
- turns these reprojected coordinates into heatmaps, and computes the mean square error between the
- reprojected and ground truth heatmaps.
The advantage of this loss is that it is on the same scale as the standard supervised heatmap loss,
which may make for easier hyperparameter tuning.

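The per-view heatmap steps above (soft argmax, rendering coordinates back into heatmaps, and the final MSE) can be sketched as follows. This is a minimal NumPy sketch under simplified assumptions (unit-height Gaussian heatmaps, one keypoint per call); it is not Lightning Pose's actual implementation:

.. code-block:: python

   import numpy as np

   def soft_argmax_2d(heatmap):
       """Step 1: softmax over the heatmap, then the expected (x, y) coordinate."""
       p = np.exp(heatmap - heatmap.max())
       p /= p.sum()
       ys, xs = np.mgrid[0:heatmap.shape[0], 0:heatmap.shape[1]]
       return np.array([(p * xs).sum(), (p * ys).sum()])

   def render_heatmap(xy, shape, sigma=1.0):
       """Step 4: render an (x, y) coordinate as a unit-height Gaussian heatmap."""
       ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
       return np.exp(-((xs - xy[0]) ** 2 + (ys - xy[1]) ** 2) / (2 * sigma ** 2))

   def reprojection_heatmap_mse(reprojected_xy, gt_xy, shape, sigma=1.0):
       """Step 5: mean square error between reprojected and ground truth heatmaps."""
       diff = render_heatmap(reprojected_xy, shape, sigma) - render_heatmap(gt_xy, shape, sigma)
       return np.mean(diff ** 2)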
+ The default ``log_weight`` value of 1.0 should be a reasonable place to start; if the training curve
+ for this loss is unstable (for example, it doesn't decrease, or spikes to a large value during training),
+ you can *decrease* the effect of the 3D loss by *increasing* the ``log_weight``; we recommend trying a
+ value of 1.5 next.

.. code-block:: yaml

   losses:
     supervised_reprojection_heatmap_mse:
-      log_weight: 0.5
+      log_weight: 1.0
+
+ To turn this loss off (but, for example, continue to use 3D augmentations), set
+ ``log_weight: null`` in the config file.
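For example, a configuration that keeps the 3D augmentations but disables the reprojection loss would combine the two settings:

.. code-block:: yaml

   training:
     imgaug_3d: true

   losses:
     supervised_reprojection_heatmap_mse:
       log_weight: null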