Canonical Rank Adaptation (CaRA): An Efficient Fine-Tuning Strategy for Vision Transformers
**********************************************************************************************

`Lokesh Veeramacheneni <https://lokiv.dev>`__\ :sup:`1`, `Moritz
Wolter <https://www.wolter.tech/>`__\ :sup:`1`, `Hilde
Kuehne <https://hildekuehne.github.io/>`__\ :sup:`2`, and `Juergen
Gall <https://pages.iai.uni-bonn.de/gall_juergen/>`__\ :sup:`1,3`

| 1. *University of Bonn*
| 2. *University of Tübingen, MIT-IBM Watson AI Lab*
| 3. *Lamarr Institute for Machine Learning and Artificial Intelligence*
|

|License| |Arxiv| |Project|

**Keywords:** CaRA, Canonical Polyadic Decomposition, CPD, Tensor methods, ViT, LoRA

**Abstract:** Modern methods for fine-tuning a Vision Transformer (ViT), such as Low-Rank Adaptation (LoRA) and its variants, demonstrate impressive performance. However, these methods ignore the high-dimensional nature of Multi-Head Attention (MHA) weight tensors. To address this limitation, we propose Canonical Rank Adaptation (CaRA). CaRA leverages tensor mathematics: first, it tensorises the transformer into two different tensors, one for the projection layers in MHA and the other for the feed-forward layers; second, the tensorised formulation is fine-tuned using a low-rank adaptation in Canonical Polyadic Decomposition (CPD) form. CaRA thereby keeps the number of trainable parameters small. Experimentally, CaRA outperforms existing Parameter-Efficient Fine-Tuning (PEFT) methods on visual classification benchmarks such as the Visual Task Adaptation Benchmark (VTAB)-1k and Fine-Grained Visual Categorization (FGVC).
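The CPD-form update described above can be sketched in a few lines of NumPy. This is a minimal illustration only: the tensor shapes, the CP rank, and all variable names below are assumptions for exposition, not the paper's actual tensorisation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes (assumed, not from the paper): stack the H per-head
# projection matrices of one attention block into a 3-way weight tensor.
H, D_IN, D_OUT = 4, 16, 16   # heads, input dim, output dim
RANK = 3                     # CP rank of the adaptation

W = rng.normal(size=(H, D_IN, D_OUT))   # frozen pre-trained weights

# Trainable CP factors: one factor matrix per tensor mode.
A = rng.normal(size=(H, RANK))
B = rng.normal(size=(D_IN, RANK))
C = rng.normal(size=(D_OUT, RANK))

# CP reconstruction of the low-rank update:
# delta[h, i, o] = sum_r A[h, r] * B[i, r] * C[o, r]
delta = np.einsum("hr,ir,or->hio", A, B, C)

W_adapted = W + delta

# Why this saves parameters: a full update trains H*D_IN*D_OUT values,
# the CP-form update only RANK*(H + D_IN + D_OUT).
full_params = H * D_IN * D_OUT            # 1024
cp_params = RANK * (H + D_IN + D_OUT)     # 108
print(full_params, cp_params)
```

As with LoRA, only the small factor matrices are trained while `W` stays frozen; the CP structure is what lets a single shared rank span all three tensor modes at once.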


Note
****

We are committed to providing thoroughly tested and well-packaged code.
The code will be released once this process is complete.


Acknowledgments
===============

The code is built on the implementation of `FacT <https://github.com/JieShibo/PETL-ViT/tree/main/FacT>`__. Thanks to `Zahra Ganji <https://github.com/ZahraGanji>`__ for reimplementing the VeRA baseline.


.. |License| image:: https://img.shields.io/badge/License-Apache_2.0-blue.svg
   :target: https://opensource.org/licenses/Apache-2.0
.. |Project| image:: https://img.shields.io/badge/Project-Website-blue
   :target: https://lokiv.dev/cara/
   :alt: Project Page
.. |Arxiv| image:: https://img.shields.io/badge/OpenReview-Paper-blue
   :target: https://openreview.net/pdf?id=vexHifrbJg
   :alt: Paper
