Canonical Rank Adaptation (CaRA): An Efficient Fine-Tuning Strategy for Vision Transformers
**********************************************************************************************

`Lokesh Veeramacheneni <https://lokiv.dev>`__\ :sup:`1`, `Moritz
Wolter <https://www.wolter.tech/>`__\ :sup:`1`, `Hilde
Kuehne <https://hildekuehne.github.io/>`__\ :sup:`2`, and `Juergen
Gall <https://pages.iai.uni-bonn.de/gall_juergen/>`__\ :sup:`1,3`

| 1. *University of Bonn*
| 2. *University of Tübingen, MIT-IBM Watson AI Lab*
| 3. *Lamarr Institute for Machine Learning and Artificial Intelligence*
|


|License| |Arxiv| |Project|

**Keywords:** CaRA, Canonical Polyadic Decomposition, CPD, Tensor methods, ViT, LoRA

**Abstract:** Modern methods for fine-tuning a Vision Transformer (ViT), such as Low-Rank Adaptation (LoRA) and its variants, demonstrate impressive performance. However, these methods ignore the high-dimensional nature of Multi-Head Attention (MHA) weight tensors. To address this limitation, we propose Canonical Rank Adaptation (CaRA). CaRA leverages tensor mathematics: first, it tensorises the transformer into two tensors, one for the projection layers in MHA and the other for the feed-forward layers; second, the tensorised formulation is fine-tuned using a low-rank adaptation in Canonical Polyadic Decomposition (CPD) form. CaRA thereby keeps the number of trainable parameters small. Experimentally, CaRA outperforms existing Parameter-Efficient Fine-Tuning (PEFT) methods on visual classification benchmarks such as the Visual Task Adaptation Benchmark (VTAB)-1k and Fine-Grained Visual Categorization (FGVC).
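To illustrate the idea behind a CPD-form low-rank update, here is a minimal NumPy sketch. This is *not* the released CaRA code; the tensor shapes, factor names, and rank below are assumptions chosen purely for illustration of how a CP update keeps the trainable parameter count small compared to a dense update.

```python
import numpy as np

# Hypothetical shapes for a 3-way weight tensor (e.g. stacked
# per-head projection weights); all values are illustrative.
heads, d_in, d_out, rank = 12, 64, 64, 4

rng = np.random.default_rng(0)
W = rng.standard_normal((heads, d_in, d_out))  # frozen base weights

# Trainable CP factors: one factor matrix per tensor mode.
A = rng.standard_normal((heads, rank))
B = rng.standard_normal((d_in, rank))
C = rng.standard_normal((d_out, rank))

# CP update of rank `rank`:
#   delta[h, i, o] = sum_r A[h, r] * B[i, r] * C[o, r]
delta = np.einsum('hr,ir,or->hio', A, B, C)
W_adapted = W + delta  # base weights stay frozen; only A, B, C train

# Trainable parameters of the CP update vs. a dense update.
cp_params = rank * (heads + d_in + d_out)
dense_params = heads * d_in * d_out
print(cp_params, dense_params)  # the CP factors are far smaller
```

The key property sketched here is that the update costs ``rank * (heads + d_in + d_out)`` parameters instead of ``heads * d_in * d_out``, which is what makes CPD-style adaptation parameter-efficient.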


Note
****
We are committed to providing thoroughly tested and well-packaged code.
The code will be released soon, once this process is complete.


Acknowledgments
===============
The code builds on the implementation of `FacT <https://github.com/JieShibo/PETL-ViT/tree/main/FacT>`__. Thanks to `Zahra Ganji <https://github.com/ZahraGanji>`__ for reimplementing the VeRA baseline.



.. |License| image:: https://img.shields.io/badge/License-Apache_2.0-blue.svg
   :target: https://opensource.org/licenses/Apache-2.0
.. |Project| image:: https://img.shields.io/badge/Project-Website-blue
   :target: https://lokiv.dev/cara/
   :alt: Project Page
.. |Arxiv| image:: https://img.shields.io/badge/OpenReview-Paper-blue
   :target: https://openreview.net/pdf?id=vexHifrbJg
   :alt: Paper