|
62 | 62 | "\n", |
63 | 63 | "First, let's install the necessary packages:\n", |
64 | 64 | "\n", |
65 | | - "- fastdup - To analyze issues in the dataset.\n", |
| 65 | + "- [fastdup](https://github.com/visual-layer/fastdup) - To analyze issues in the dataset.\n", |
66 | 66 | "- [TIMM (PyTorch Image Models)](https://github.com/huggingface/pytorch-image-models) - To acquire pre-trained models." |
67 | 67 | ] |
68 | 68 | }, |
|
130 | 130 | "metadata": {}, |
131 | 131 | "source": [ |
132 | 132 | "## List TIMM Models\n", |
133 | | - "There are over a thousand models on TIMM. Let's list down models that match the keyword `dino`." |
| 133 | + "There are currently 1212 computer vision models on TIMM. Pick a model of your choice to compute the embedding with.\n", |
| 134 | + "\n", |
| 135 | + "For demonstration, we will use `vit_small_patch14_dinov2.lvd142m`, a relatively new model from Meta AI.\n", |
| 136 | + "\n", |
| 137 | + "Let's list the models that match the keyword `dino`." |
134 | 138 | ] |
135 | 139 | }, |
136 | 140 | { |
|
171 | 175 | "id": "633dce0c-47eb-4039-8cd4-a36874c49b8a", |
172 | 176 | "metadata": {}, |
173 | 177 | "source": [ |
174 | | - "Now, pick a model of your choice. For demonstration, we will go with a relatively new model `vit_small_patch14_dinov2.lvd142m` from MetaAI. \n", |
175 | | - "\n", |
176 | 178 | "DINOv2 models produce high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks; these visual features are robust and perform well across domains without any requirement for fine-tuning. Read more about DINOv2 [here](https://github.com/facebookresearch/dinov2).\n", |
177 | 179 | "\n", |
178 | 180 | "It makes sense for us to use DINOv2 as a model to create an embedding of the dataset." |
|
288 | 290 | "source": [ |
289 | 291 | "## Run fastdup\n", |
290 | 292 | "\n", |
291 | | - "Now what's left is to load the embeddings into fastdup and run an analysis to surface dataset issues." |
| 293 | + "Now let's load the embeddings into fastdup and run an analysis to surface dataset issues." |
292 | 294 | ] |
293 | 295 | }, |
294 | 296 | { |
|
2467 | 2469 | "metadata": {}, |
2468 | 2470 | "source": [ |
2469 | 2471 | "## Wrap Up\n", |
2470 | | - "In this tutorial, we showed how you can run fastdup using pre-computed feature vectors. Running over pre-computed feature vectors significantly reduces run time compared to running over raw image files.\n", |
| 2472 | + "In this tutorial, we showed how you can compute embeddings on your dataset using TIMM and run fastdup on top of them to surface dataset issues.\n", |
| 2473 | + "\n", |
| 2474 | + "Questions about this tutorial? Reach out to us on our [Slack channel](https://visuallayer.slack.com/)!\n", |
| 2475 | + "\n", |
| 2476 | + "\n", |
2471 | 2477 | "\n", |
2472 | 2478 | "Next, feel free to check out other tutorials -\n", |
2473 | 2479 | "\n", |
|