Commit aec39d2

pose pca loss default bugfix (#381)
1 parent e9b1c92 commit aec39d2

File tree

10 files changed (+267, -45 lines changed)


.github/workflows/lint.yml

Lines changed: 10 additions & 1 deletion

@@ -22,7 +22,11 @@ jobs:
           python-version: '3.10'

       - name: Install linters
-        run: pip install autopep8 flake8
+        run: pip install autopep8 flake8 isort
+
+      - name: Check import sorting with isort
+        run: isort --check-only --diff lightning_pose tests
+        # Reads config from [tool.isort] in pyproject.toml

       - name: Check formatting with autopep8
         run: autopep8 --diff --recursive --exit-code lightning_pose tests
@@ -31,6 +35,10 @@ jobs:
       - name: Lint with flake8 (critical errors only)
         run: flake8 lightning_pose tests --select=E9,F63,F7,F82
         # Reads config from .flake8 file
+        # E9: syntax errors; the file can't be parsed
+        # F63: invalid comparisons and assertion misuse (e.g. asserting on a tuple)
+        # F7: statement-level syntax errors (e.g. `break` outside a loop)
+        # F82: undefined names (a variable or function that was never imported or defined)

       - name: Show fix instructions if formatting needed
         if: failure()
@@ -40,6 +48,7 @@ jobs:
           echo ""
           echo "To fix formatting issues locally, run:"
           echo "  autopep8 --in-place --recursive lightning_pose tests"
+          echo "  isort lightning_pose tests"
           echo ""
           echo "To check for flake8 errors locally, run:"
           echo "  flake8 lightning_pose tests --select=E9,F63,F7,F82"

README.md

Lines changed: 18 additions & 8 deletions

@@ -10,9 +10,9 @@
 Lightning Pose is an end-to-end toolkit designed for robust multi-view and single-view animal
 pose estimation using advanced transformer architectures. It leverages Multi-View Transformers
 and patch-masking training to learn geometric relationships between views,
-resulting in strong performance on occlusions [Aharon, Lee et al. 2025](https://arxiv.org/abs/2510.09903).
-For single-view datasets it leverages temporal context and learned plausibility constraints for strong performance
-in challenging scenarios [Biderman, Whiteway et al. 2024, Nature Methods](https://rdcu.be/dLP3z).
+resulting in strong performance on occlusions [Aharon et al. 2025](https://arxiv.org/abs/2510.09903).
+For single-view datasets it leverages temporal context and learned plausibility constraints for
+strong performance in challenging scenarios [Biderman, Whiteway et al. 2024, Nature Methods](https://rdcu.be/dLP3z).
 It has a rich GUI that supports the end-to-end workflow: labeling, model management, and evaluation.


@@ -64,10 +64,20 @@ a simple and performant post-processor that works with any pose estimation packa
 Lightning Pose, DeepLabCut, and SLEAP.

 Lightning Pose is primarily maintained by
-[Karan Sikka](https://github.com/ksikka) (Columbia University),
-[Matt Whiteway](https://themattinthehatt.github.io) (Columbia University),
-and
-[Dan Biderman](https://dan-biderman.netlify.app) (Stanford University).
+[Karan Sikka](https://github.com/ksikka) (Columbia University) and
+[Matt Whiteway](https://themattinthehatt.github.io) (Columbia University).

 Lightning Pose is under active development and we welcome community contributions.
-Whether you want to implement some of your own ideas or help out with our [development roadmap](docs/roadmap.md), please get in touch with us on Discord (see contributing guidelines [here](CONTRIBUTING.md)).
+Whether you want to implement some of your own ideas or help out with our [development roadmap](docs/roadmap.md), please get in touch with us on Discord (see contributing guidelines [here](CONTRIBUTING.md)).
+
+## Funding
+
+We are grateful for support from the following:
+
+* Gatsby Charitable Foundation GAT3708
+* [NIH R50NS145433](https://reporter.nih.gov/search/Hmj4KMmLv0evcYPlPEDa-Q/project-details/11240675)
+* [NIH U19NS123716](https://reporter.nih.gov/search/Hmj4KMmLv0evcYPlPEDa-Q/project-details/11141703)
+* [NSF 1707398](https://ui.adsabs.harvard.edu/abs/2017nsf....1707398A/abstract)
+* [The NSF AI Institute for Artificial and Natural Intelligence](https://ui.adsabs.harvard.edu/abs/2023nsf....2229929Z/abstract)
+* Simons Foundation
+* Wellcome Trust 216324
+* Zuckerman Institute (Columbia University) Team Science

docs/source/directory_structure_reference/model_config_file.rst

Lines changed: 1 addition & 1 deletion

@@ -217,7 +217,7 @@ The following parameters relate to model architecture and unsupervised losses.
   * vits_dinov3: Vision Transformer (Small) pretrained on ImageNet with DINOv3
   * vitb_dino: Vision Transformer (Base) pretrained on ImageNet with DINO
   * vitb_dinov2: Vision Transformer (Base) pretrained on ImageNet with DINOv2
-  * vitb_dinov3: Vision Transformer (Base) pretrained on ImageNet with DINOv3
+  * vitb_dinov3: Vision Transformer (Base) pretrained on ImageNet with DINOv3; note this is a gated repo and you will need a Hugging Face account
   * vitb_imagenet: Vision Transformer (Base) pretrained on ImageNet with MAE loss
   * vitb_sam: Segment Anything Model (Vision Transformer Base)

docs/source/user_guide_multiview/patch_masking_3d_loss.rst

Lines changed: 27 additions & 23 deletions

@@ -33,7 +33,8 @@ To encourage the model to develop this cross-view reasoning during training, we
 space patch masking scheme inspired by the success of masked autoencoders and dropout.
 We use a training curriculum that starts with a short warmup period where no patches are masked
 (controlled by ``training.patch_mask.init_epoch`` in the config file), then increase the ratio of
-masked patches over the course of training (controlled by ``training.patch_mask.init_ratio`` and ``training.patch_mask.final_ratio``).
+masked patches over the course of training
+(controlled by ``training.patch_mask.init_ratio`` and ``training.patch_mask.final_ratio``).
 This technique creates gradients that flow through the attention mechanism and encourage
 cross-view information propagation, which in turn develops internal representations that capture
 statistical relationships between the different views.
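The masking curriculum described in the hunk above can be sketched in Python. The linear ramp, the default values, and the ``final_epoch`` parameter are illustrative assumptions here, not the exact lightning_pose schedule:

```python
def patch_mask_ratio(
    epoch: int,
    init_epoch: int = 5,      # training.patch_mask.init_epoch (warmup: no masking)
    init_ratio: float = 0.1,  # training.patch_mask.init_ratio
    final_ratio: float = 0.5, # training.patch_mask.final_ratio
    final_epoch: int = 100,   # hypothetical: epoch at which final_ratio is reached
) -> float:
    """Fraction of image patches to mask at a given epoch (linear-ramp sketch)."""
    if epoch < init_epoch:
        return 0.0  # warmup period: no patches masked
    # linearly interpolate from init_ratio to final_ratio, then hold
    t = min(1.0, (epoch - init_epoch) / max(1, final_epoch - init_epoch))
    return init_ratio + t * (final_ratio - init_ratio)
```

With ``final_ratio: 0.0`` (and a zero ``init_ratio``) this schedule masks nothing, matching the note below about turning patch masking off.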
@@ -49,8 +50,14 @@ statistical relationships between the different views.

 To turn patch masking off, set ``final_ratio: 0.0``.

-3D augmentations and losses
-===========================
+3D augmentations and loss
+=========================
+
+.. note::
+
+   As of March 2026, the unsupervised losses introduced in the original Lightning Pose paper have
+   not yet been implemented for the ``multi-view transformer`` model, including the
+   ``pca_multiview`` loss.

 The MVT produces a 2D heatmap for each keypoint in each view.
 Without explicit geometric constraints, it is possible for these individual 2D predictions to be
@@ -61,14 +68,14 @@ encourage geometric consistency in the outputs
 formats for camera calibration; note also that bounding box information must be shared if the
 training images are cropped from larger frames).

-The 3D losses require geometrically consistent input images, which precludes applying geometric
+The 3D loss requires geometrically consistent input images, which precludes applying geometric
 augmentations like rotation to each view independently.
 Instead, we triangulate the ground truth labels and augment the 3D poses by translating and scaling in 3D space.
 The augmented 3D pose is then projected back to individual 2D views.
 These augmentations do not affect the camera parameters;
 rather, they are equivalent to keeping the cameras fixed and scaling and translating the subject within the scene.
 For each view, we then estimate the affine transformation from the original to augmented 2D keypoints,
-and apply this transformation to the original image
+and apply this transformation to the original image.

 To enable 3D augmentations, add the ``imgaug_3d`` field to the ``training`` section of your configuration
 file and set it to `true`:
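The augmentation pipeline described above (scale/translate the triangulated 3D pose, reproject, then fit a per-view affine warp) can be sketched with NumPy. The function names, the scale and shift ranges, and the use of plain 3x4 projection matrices are assumptions for illustration, not the lightning_pose API:

```python
import numpy as np

def augment_pose_3d(points_3d: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Scale and translate an (N, 3) array of triangulated keypoints.
    Cameras stay fixed; this is equivalent to moving the subject in the scene."""
    scale = rng.uniform(0.8, 1.2)             # hypothetical range
    shift = rng.uniform(-10.0, 10.0, size=3)  # hypothetical range
    return points_3d * scale + shift

def project_to_view(points_3d: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Project (N, 3) world points through a 3x4 camera matrix P to (N, 2) pixels."""
    homo = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    uvw = homo @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

def fit_affine_2d(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 affine transform mapping original 2D keypoints (src)
    to augmented 2D keypoints (dst); this warp is then applied to the image."""
    A = np.hstack([src, np.ones((len(src), 1))])  # (N, 3)
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)   # (3, 2)
    return X.T                                    # rows: [a, b, tx], [c, d, ty]
```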
@@ -79,30 +86,27 @@ file and set it to `true`:
   imgaug: dlc
   imgaug_3d: true

-Pairwise projection loss
-------------------------
-To compute the 3D pairwise projection loss, we first take the soft argmax of the 2D heatmaps to get predicted coordinates.
-Then, for each keypoint, and for each pair of views, we triangulate both the ground truth keypoints
-and the predictions, and compute the mean square error between the two.
-The 3D loss is weighted by a hyperparameter, which is set in the ``losses`` section of the
-configuration file:
+To compute the 3D reprojection loss, we:

-.. code-block:: yaml
+1. take the soft argmax of the 2D heatmaps to get predicted coordinates
+2. for each keypoint, and for each pair of views, triangulate the predictions into 3D
+3. project the predicted 3D points back into 2D coordinates for each view
+4. turn these reprojected coordinates into heatmaps
+5. compute the mean square error between the reprojected and ground truth heatmaps

-  losses:
-    supervised_pairwise_projections:
-      log_weight: 0.5
-
-Reprojected heatmap loss
-------------------------
-An alternative loss projects the predicted 3D points back into 2D coordinates for each view,
-turns these reprojected coordinates into heatmaps, and computes the mean square error between the
-reprojected and ground truth heatmaps.
 The advantage of this loss is that it is on the same scale as the standard supervised heatmap loss,
 which may make for easier hyperparameter tuning.

+The default ``log_weight`` value of 1.0 should be a reasonable place to start; if the training curve
+for this loss is unstable (for example, it doesn't decrease, or spikes to a large value during training),
+you can *decrease* the effect of the 3D loss by *increasing* the ``log_weight``; we recommend
+trying a value of 1.5 next.
+
 .. code-block:: yaml

   losses:
     supervised_reprojection_heatmap_mse:
-      log_weight: 0.5
+      log_weight: 1.0
+
+To turn this loss off (but, for example, continue to use 3D augmentations), set
+``log_weight: null`` in the config file.
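Steps 1, 4, and 5 of the reprojection loss above can be sketched as follows (steps 2 and 3 use standard multi-view triangulation and projection and are omitted). Treating the normalized heatmap as a probability map for the soft argmax, and the Gaussian width ``sigma``, are illustrative choices here, not necessarily what lightning_pose does internally:

```python
import numpy as np

def soft_argmax_2d(heatmap: np.ndarray) -> np.ndarray:
    """Step 1: differentiable (x, y) estimate, taken as the expectation of the
    pixel grid under the normalized heatmap."""
    p = heatmap / heatmap.sum()
    ys, xs = np.mgrid[0:heatmap.shape[0], 0:heatmap.shape[1]]
    return np.array([(p * xs).sum(), (p * ys).sum()])

def render_heatmap(xy, shape, sigma: float = 1.5) -> np.ndarray:
    """Step 4: turn (re)projected 2D coordinates back into a Gaussian heatmap."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    d2 = (xs - xy[0]) ** 2 + (ys - xy[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def heatmap_mse(pred: np.ndarray, target: np.ndarray) -> float:
    """Step 5: mean squared error between reprojected and ground-truth heatmaps."""
    return float(np.mean((pred - target) ** 2))
```

Because the loss compares heatmaps rather than coordinates, it lives on the same scale as the supervised heatmap loss, as the doc text notes.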

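The advice above is counterintuitive (increase ``log_weight`` to decrease the loss's effect); it is consistent with an effective weight that scales as exp(-log_weight), as in uncertainty-based multi-task weighting. That mapping is an assumption here; check the lightning_pose source for the exact formula:

```python
import math

def effective_weight(log_weight: float) -> float:
    """Assumed mapping from the config's log_weight to the loss multiplier;
    larger log_weight means a smaller multiplier."""
    return math.exp(-log_weight)

# Under this assumption, the two recommended values compare as:
# effective_weight(1.0) ~ 0.37 and effective_weight(1.5) ~ 0.22 (smaller effect)
```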