Commit 4ad6eff

Merge pull request #401 from mchanchee/patch-2
Fix ViT image
2 parents 1b42906 + 1d0e4b9 commit 4ad6eff

1 file changed, +1 -1 lines changed

chapters/en/unit3/vision-transformers/vision-transformers-for-image-classification.mdx

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ As the Transformers architecture scaled well in Natural Language Processing, the
 To summarize, in Vision transformer, images are reorganized as 2D grids of patches. The models are trained on those patches.

 The main idea can be found at the picture below:
-![Vision Transformer](https://huggingface.co/datasets/hf-vision/course-assets/blob/main/Screenshot%20from%202024-12-27%2014-25-49.png)
+![Vision Transformer](https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/Screenshot%20from%202024-12-27%2014-25-49.png)

 But there is a catch! The Convolutional Neural Networks (CNN) are designed with an assumption missing in the VT. This assumption is based on how we perceive the objects in the images as humans. It is described in the following section.
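The only change in this diff is the `blob` path segment becoming `resolve`. On the Hugging Face Hub, a `/blob/` URL opens the file viewer page, while `/resolve/` serves the file itself, which is what a Markdown image link needs. As a rough sketch of the same rewrite, the helper below swaps that segment in a URL. The function name `blob_to_resolve` is hypothetical and not part of this commit or of any library, and the code assumes the first path segment named `blob` is the one to replace.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_resolve(url: str) -> str:
    """Rewrite a Hub file-viewer URL (/blob/) to its raw-file form (/resolve/).

    Hypothetical helper for illustration; assumes the first path segment
    equal to "blob" is the viewer marker.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Replace only the first "blob" segment; later segments (for example a
    # file that happens to be named "blob") are left untouched.
    if "blob" in segments:
        segments[segments.index("blob")] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


old_url = (
    "https://huggingface.co/datasets/hf-vision/course-assets/"
    "blob/main/Screenshot%20from%202024-12-27%2014-25-49.png"
)
new_url = blob_to_resolve(old_url)
print(new_url)
```

Applied to the old image URL from the removed line, this yields the URL on the added line. A URL that already uses `/resolve/` passes through unchanged.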
