examples/analyzing-kaggle-datasets.ipynb: 17 additions, 7 deletions
@@ -18,13 +18,7 @@
 "[](https://colab.research.google.com/github/visual-layer/fastdup/blob/main/examples/analyzing-kaggle-datasets.ipynb)\n",
 "[](https://kaggle.com/kernels/welcome?src=https://github.com/visual-layer/fastdup/blob/main/examples/analyzing-kaggle-datasets.ipynb)\n",
 "\n",
-"This notebook shows how you can use fastdup to analyze any datasets from [Kaggle](https://kaggle.com).\n",
-"\n",
-"We will analyze an image classification dataset for:\n",
-"\n",
-"+ Duplicates / near-duplicates.\n",
-"+ Outliers.\n",
-"+ Wrong labels."
+"This notebook shows how you can use [fastdup](https://github.com/visual-layer/fastdup) to analyze any computer vision datasets from [Kaggle](https://kaggle.com)."
 ]
 },
 {
@@ -181,6 +175,14 @@
 "!unzip -q the-rvlcdip-dataset-test.zip"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "1f8d6b66-3f53-4afb-b040-c5d91a628608",
+"metadata": {},
+"source": [
+"Once completed, we should have a folder with the name `test/` which contains all the images from the dataset."
+]
+},
 {
 "cell_type": "markdown",
 "id": "41f2abee-1251-4500-8ebf-90c593b6157a",
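The cell added in this hunk says that unzipping should leave a `test/` folder containing all the images from the dataset. One way to confirm that before running fastdup is to count the image files under that folder. This is a minimal sketch: `count_images` is a hypothetical helper (not part of fastdup or of this notebook), and the set of extensions is an assumption. The original RVL-CDIP release stores its scans as TIFF files, while other Kaggle datasets often use JPEG or PNG.

```python
from pathlib import Path

# Assumed set of image extensions; adjust for the dataset you downloaded.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp"}


def count_images(folder="test/"):
    """Hypothetical helper: recursively count image files under `folder`."""
    root = Path(folder)
    if not root.is_dir():
        # Most likely the unzip step has not run yet, or extracted elsewhere.
        raise FileNotFoundError(f"{folder!r} not found; did the unzip step finish?")
    # Compare suffixes case-insensitively so that e.g. ".JPG" is also counted.
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

In the notebook, `count_images("test/")` should return a non-zero count once the extraction has finished.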
@@ -246,6 +248,14 @@
 "## Run fastdup"
 ]
 },
+{
+"cell_type": "markdown",
+"id": "a10910f4-b772-400b-96b6-f44b62b97fe0",
+"metadata": {},
+"source": [
+"To run fastdup, we only need to point `input_dir` to the folder containing images from the dataset."
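The cell added in this hunk says that running fastdup only requires pointing `input_dir` at the image folder. The following is a minimal sketch that assumes the fastdup 1.x API, where `fastdup.create(work_dir=..., input_dir=...)` returns an object with a `run()` method. The wrapper name `run_fastdup` and the `fastdup_work_dir/` output folder are hypothetical choices for illustration, so check the call against the fastdup version you have installed.

```python
from pathlib import Path


def run_fastdup(input_dir="test/", work_dir="fastdup_work_dir/"):
    """Hypothetical wrapper: run fastdup on the images in `input_dir`.

    Assumes the fastdup 1.x API (`fastdup.create(...)` followed by `.run()`).
    """
    # Fail early with a clear message if the unzip step has not produced the folder.
    if not Path(input_dir).is_dir():
        raise FileNotFoundError(
            f"input_dir {input_dir!r} does not exist; unzip the dataset first"
        )

    import fastdup  # third-party package: pip install fastdup

    # fastdup reads images from input_dir and writes its outputs to work_dir.
    fd = fastdup.create(work_dir=work_dir, input_dir=input_dir)
    fd.run()
    return fd
```

Importing fastdup inside the function keeps the folder check usable even where fastdup is not installed. In the notebook, `run_fastdup("test/")` would analyze the extracted images.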