
Commit 3995c59

Add iBOT to Docs with Example (#1855)
* complete lightning examples
* finish pytorch example
* finish ibot docs
1 parent: 5d63696

13 files changed: +1917 −61 lines

benchmarks/imagenet/vitb16/ibot.py

Lines changed: 3 additions & 3 deletions
@@ -258,9 +258,9 @@ def configure_gradient_clipping(
     def on_before_optimizer_step(self, optimizer: AdamW, *args) -> None:
         # Optionally zero out the learning rate of the last layer.
         if self.current_epoch < 1:
-            for param_group in optimizer.param_groups:
-                if "last_layer" in param_group["name"]:
-                    param_group["lr"] = 0.0
+            for name, param in self.student_head.named_parameters():
+                if "last_layer" in name:
+                    param.grad = None
 
         # Apply weight decay schedule
         weight_decay = cosine_schedule(
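
This change freezes the head's last layer during the first epoch by dropping its gradients instead of zeroing its learning rate: PyTorch optimizers skip any parameter whose .grad is None, so the subsequent step leaves those weights untouched. A minimal, self-contained sketch of the mechanism (a hypothetical toy layer, not the benchmark code):

import torch
from torch.optim import AdamW

layer = torch.nn.Linear(4, 4)
optimizer = AdamW(layer.parameters(), lr=1e-3)

layer(torch.randn(2, 4)).sum().backward()
before = layer.weight.detach().clone()

# AdamW skips parameters whose .grad is None, so the following
# step is a no-op for the "frozen" layer.
for _, param in layer.named_parameters():
    param.grad = None
optimizer.step()

assert torch.equal(layer.weight, before)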

docs/source/examples/ibot.rst

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
.. _ibot:

iBOT
======

iBOT (image BERT pre-training with Online Tokenizer) [0]_ is a self-supervised learning framework for visual representation learning. It is based on masked image modeling (MIM), which draws inspiration from the masked language modeling (MLM) pretext task in NLP, and it incorporates ideas from DINO [1]_ for self-distillation and cross-view learning. More specifically, iBOT trains a student network to reconstruct masked image patches by predicting the outputs of an online visual tokenizer (the teacher). The tokenizer and the representation learner are then optimized jointly through a combination of masked patch prediction and cross-view self-distillation on the class token. iBOT exhibits emerging local semantic patterns and improves over previous methods on many downstream tasks, including image classification, object detection, instance segmentation, and semantic segmentation.
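
The combined objective can be sketched in a few lines of plain PyTorch. This is an illustrative simplification (inputs are assumed to be already-projected logits and a boolean patch mask; in practice the teacher outputs are additionally centered and detached), not the code from the examples below:

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def ibot_loss(
        student_cls,      # (B, D) projected [CLS] logits from the student
        teacher_cls,      # (B, D) projected [CLS] logits from the teacher
        student_patches,  # (B, N, D) projected patch logits from the student
        teacher_patches,  # (B, N, D) projected patch logits from the teacher
        mask,             # (B, N) bool, True where a patch was masked
        student_temp=0.1,
        teacher_temp=0.04,
    ):
        # Cross-view self-distillation on the [CLS] token (DINO-style).
        cls_loss = torch.sum(
            -F.softmax(teacher_cls / teacher_temp, dim=-1)
            * F.log_softmax(student_cls / student_temp, dim=-1),
            dim=-1,
        ).mean()
        # Masked image modeling: distill teacher patch tokens,
        # averaged over masked positions only.
        patch_loss = torch.sum(
            -F.softmax(teacher_patches / teacher_temp, dim=-1)
            * F.log_softmax(student_patches / student_temp, dim=-1),
            dim=-1,
        )
        patch_loss = (patch_loss * mask).sum() / mask.sum().clamp(min=1)
        return cls_loss + patch_loss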

Key Components
--------------

- **Masked Image Modeling (MIM)**: In iBOT, the student predicts feature representations of masked patches to match the teacher's outputs, using a self-distillation objective rather than reconstructing pixel values.
- **Online Tokenizer**: iBOT introduces an online tokenizer implemented as a momentum-updated teacher network (see the sketch after this list), eliminating the need for a separate offline tokenizer.
- **Cross-View Self-Distillation**: Like DINO, iBOT uses self-distillation on the [CLS] tokens from two augmented views to encourage semantic consistency.
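
The online tokenizer is simply an exponential moving average (EMA) of the student: it receives no gradients and slowly tracks the student's weights. A minimal sketch (lightly also ships a helper for this, ``update_momentum`` in ``lightly.models.utils``):

.. code-block:: python

    import torch

    @torch.no_grad()
    def update_teacher(
        student: torch.nn.Module, teacher: torch.nn.Module, momentum: float = 0.996
    ) -> None:
        # EMA update: teacher <- momentum * teacher + (1 - momentum) * student.
        for p_s, p_t in zip(student.parameters(), teacher.parameters()):
            p_t.mul_(momentum).add_(p_s, alpha=1 - momentum)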

Good to Know
------------

- **Relation to other SSL methods**: iBOT can be seen as DINO (which only uses an image-level objective) plus a patch-level objective. Further, DINOv2 [2]_ can be seen as a combination of iBOT and the KoLeo loss, together with the centering scheme of SwAV [3]_.
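
For orientation, the KoLeo regularizer mentioned above spreads embeddings apart by maximizing the log distance from each sample to its nearest neighbor within the batch. An illustrative plain-PyTorch version (not lightly's implementation):

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def koleo_loss(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
        # x: (B, D) embeddings; after L2 normalization the squared
        # distance between rows is 2 - 2 * cosine similarity.
        x = F.normalize(x, dim=-1)
        sim = x @ x.T
        sim.fill_diagonal_(-2.0)  # exclude self-similarity
        nn_dist = (2 - 2 * sim.max(dim=1).values).clamp(min=eps).sqrt()
        return -torch.log(nn_dist + eps).mean()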

References:

.. [0] `iBOT: Image BERT Pre-Training with Online Tokenizer, 2021 <https://arxiv.org/abs/2111.07832>`_
.. [1] `Emerging Properties in Self-Supervised Vision Transformers, 2021 <https://arxiv.org/abs/2104.14294>`_
.. [2] `DINOv2: Learning Robust Visual Features without Supervision, 2023 <https://arxiv.org/abs/2304.07193>`_
.. [3] `Unsupervised Learning of Visual Features by Contrasting Cluster Assignments, 2020 <https://arxiv.org/abs/2006.09882>`_

.. tabs::

   .. tab:: PyTorch

      .. image:: https://img.shields.io/badge/Open%20in%20Colab-blue?logo=googlecolab&label=%20&labelColor=5c5c5c
         :target: https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch/ibot.ipynb

      This example can be run from the command line with::

         python lightly/examples/pytorch/ibot.py

      .. literalinclude:: ../../../examples/pytorch/ibot.py

   .. tab:: Lightning

      .. image:: https://img.shields.io/badge/Open%20in%20Colab-blue?logo=googlecolab&label=%20&labelColor=5c5c5c
         :target: https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch_lightning/ibot.ipynb

      This example can be run from the command line with::

         python lightly/examples/pytorch_lightning/ibot.py

      .. literalinclude:: ../../../examples/pytorch_lightning/ibot.py

   .. tab:: Lightning Distributed

      .. image:: https://img.shields.io/badge/Open%20in%20Colab-blue?logo=googlecolab&label=%20&labelColor=5c5c5c
         :target: https://colab.research.google.com/github/lightly-ai/lightly/blob/master/examples/notebooks/pytorch_lightning_distributed/ibot.ipynb

      This example runs on multiple GPUs using Distributed Data Parallel (DDP)
      training with PyTorch Lightning. At least one GPU must be available on
      the system. The example can be run from the command line with::

         python lightly/examples/pytorch_lightning_distributed/ibot.py

      The model differs from the non-distributed implementation in the
      following ways:

      - Distributed Data Parallel is enabled
      - Synchronized Batch Norm is used in place of standard Batch Norm
      - Distributed Sampling is used in the dataloader

      Note that Synchronized Batch Norm is optional and the model can also be
      trained without it. Without Synchronized Batch Norm, the batch norm
      statistics for each GPU are calculated based only on the features on
      that specific GPU. Distributed Sampling makes sure that each distributed
      process sees only a subset of the data.
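
      For orientation, this is roughly the plumbing Lightning sets up for you.
      An illustrative sketch, assuming ``model``, ``dataset``, and
      ``local_rank`` already exist and ``torch.distributed`` is initialized:

      .. code-block:: python

          import torch
          from torch.nn.parallel import DistributedDataParallel
          from torch.utils.data import DataLoader
          from torch.utils.data.distributed import DistributedSampler

          # Replace BatchNorm layers with their synchronized counterparts.
          model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
          model = DistributedDataParallel(model.to(local_rank), device_ids=[local_rank])

          # Each process sees a disjoint shard of the dataset.
          sampler = DistributedSampler(dataset, shuffle=True)
          dataloader = DataLoader(dataset, batch_size=256, sampler=sampler)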

      .. literalinclude:: ../../../examples/pytorch_lightning_distributed/ibot.py

docs/source/examples/models.rst

Lines changed: 1 addition & 0 deletions
@@ -18,6 +18,7 @@ for PyTorch and PyTorch Lightning to give you a headstart when implementing your
    dino.rst
    dinov2.rst
    fastsiam.rst
+   ibot.rst
    mae.rst
    mmcr.rst
    msn.rst
