|
19 | 19 | "\n", |
20 | 20 | "In this notebook we load satellite data from the MAFAT Challenge (https://mafatchallenge.mod.gov.il/), which consists of 16-bit grayscale images with rotated bounding boxes.\n", |
21 | 21 | "\n", |
| 22 | + "The dataset is also available on Kaggle [here](https://www.kaggle.com/datasets/dragonzhang/mafat-train-dataset).\n", |
| 23 | + "\n", |
22 | 24 | "We show how to work with this dataset using fastdup. It takes 140 seconds to process 18,000 bounding boxes and find all similarities.\n", |
23 | 25 | "\n", |
24 | 26 | "We use the components gallery to highlight suspected wrong bounding boxes as well as correct ones.\n" |
|
165 | 167 | } |
166 | 168 | ], |
167 | 169 | "source": [ |
168 | | - "# install latst fastdup (required 0.904 or up)\n", |
169 | | - "%pip install fastdup -U --force-reinstall" |
| 170 | + "!pip install fastdup -Uq" |
170 | 171 | ] |
171 | 172 | }, |
172 | 173 | { |
173 | | - "cell_type": "code", |
174 | | - "execution_count": 1, |
175 | | - "id": "62c0ac2e-cd8d-428e-b5ff-1b75c917f9e3", |
| 174 | + "cell_type": "markdown", |
| 175 | + "id": "547f2a35", |
176 | 176 | "metadata": {}, |
177 | | - "outputs": [], |
178 | 177 | "source": [ |
179 | | - "#download mafat traing data, extract the zip file and put the notebook one level below images/ folder" |
| 178 | + "Download the MAFAT training data, extract the zip file, and place this notebook one level below the images/ folder." |
180 | 179 | ] |
181 | 180 | }, |
182 | 181 | { |
183 | 182 | "cell_type": "markdown", |
184 | 183 | "id": "538d2699-4678-4f0b-a570-412d4a97c7ae", |
185 | 184 | "metadata": {}, |
186 | 185 | "source": [ |
187 | | - "# Prepare annotation for fastdup format" |
188 | | - ] |
189 | | - }, |
190 | | - { |
191 | | - "cell_type": "code", |
192 | | - "execution_count": 18, |
193 | | - "id": "f2fa9853-0765-4d0a-a474-1eb703ea0a66", |
194 | | - "metadata": {}, |
195 | | - "outputs": [], |
196 | | - "source": [ |
197 | | - "# Here we read the data as given in the competition, one annotation file per each image. We combine all files into a single flat table" |
| 186 | + "## Prepare annotations in fastdup format\n", |
| 187 | + "\n", |
| 188 | + "\n", |
| 189 | + "Here we read the data as given in the competition: one annotation file per image. We then combine all files into a single flat table." |
198 | 190 | ] |
199 | 191 | }, |
200 | 192 | { |
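
The flat-table step described above can be sketched with pandas. This is a minimal illustration, not the competition's exact schema: the `labels_demo/` directory, the column names, and the `.tiff` extension are all assumptions.

```python
import glob
import os

import pandas as pd

def combine_annotations(label_dir: str) -> pd.DataFrame:
    """Combine per-image annotation files into one flat table."""
    frames = []
    for path in sorted(glob.glob(os.path.join(label_dir, "*.csv"))):
        df = pd.read_csv(path)
        # Remember which image each row of annotations came from.
        df["filename"] = os.path.splitext(os.path.basename(path))[0] + ".tiff"
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# Tiny synthetic example so the sketch runs end to end.
os.makedirs("labels_demo", exist_ok=True)
pd.DataFrame({"x": [10, 20], "y": [30, 40]}).to_csv("labels_demo/img_001.csv", index=False)
pd.DataFrame({"x": [50], "y": [60]}).to_csv("labels_demo/img_002.csv", index=False)

flat = combine_annotations("labels_demo")
```

One flat table with a `filename` column is enough for fastdup to map each bounding box back to its source image.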
|
448 | 440 | "id": "620799ea-3318-4a74-8dd0-d74ec3f42849", |
449 | 441 | "metadata": {}, |
450 | 442 | "source": [ |
451 | | - "# Run fastdup to crop and build a model for the crops" |
| 443 | + "## Run fastdup to crop and build a model for the crops" |
452 | 444 | ] |
453 | 445 | }, |
454 | 446 | { |
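
The crop step has to turn each rotated box into a pixel window. The numpy sketch below shows the underlying geometry (rotating a `(cx, cy, w, h, angle)` box into corner points and taking its axis-aligned envelope); it is an illustration of the idea, not fastdup's internal cropping code.

```python
import numpy as np

def rotated_box_corners(cx, cy, w, h, angle_rad):
    """Return the 4 corners of a rotated box as a (4, 2) array."""
    # Half-extents of the box before rotation, corners counter-clockwise.
    dx, dy = w / 2.0, h / 2.0
    corners = np.array([[-dx, -dy], [dx, -dy], [dx, dy], [-dx, dy]])
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s], [s, c]])
    return corners @ rot.T + np.array([cx, cy])

# Axis-aligned window that encloses the rotated box (what a cropper needs).
corners = rotated_box_corners(100, 100, 40, 20, np.pi / 2)
x0, y0 = corners.min(axis=0)
x1, y1 = corners.max(axis=0)
```

At a 90-degree rotation the 40x20 box yields a 20x40 envelope, which is an easy sanity check for the corner math.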
|
531 | 523 | "id": "a834aaaa-a76c-49bc-b293-c3c3e114d7aa", |
532 | 524 | "metadata": {}, |
533 | 525 | "source": [ |
534 | | - "# Find suspected wrong bounding boxes\n", |
| 526 | + "## Find suspected wrong bounding boxes\n", |
535 | 527 | "\n", |
536 | 528 | "From - crop image name\n", |
537 | 529 | "To - similar images\n", |
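
The wrong-label search boils down to scanning a table of similar crop pairs for pairs whose labels disagree. A pandas sketch of that join follows; the column names (`from`, `to`, `distance`, `label`) and the toy values are assumptions for illustration, not fastdup's exact output schema.

```python
import pandas as pd

# Hypothetical similarity pairs: crops that were found near-identical.
sim = pd.DataFrame({
    "from": ["a.png", "b.png", "c.png"],
    "to":   ["b.png", "c.png", "d.png"],
    "distance": [0.98, 0.97, 0.95],
})
# Hypothetical crop -> class-label lookup.
labels = pd.DataFrame({
    "filename": ["a.png", "b.png", "c.png", "d.png"],
    "label":    ["tank", "tank", "truck", "truck"],
})

# Attach a label to each side of the pair, then keep disagreements:
# very similar crops with different labels are suspected wrong boxes.
pairs = (sim
         .merge(labels, left_on="from", right_on="filename")
         .merge(labels, left_on="to", right_on="filename",
                suffixes=("_from", "_to")))
suspects = pairs[pairs["label_from"] != pairs["label_to"]]
```

In the toy data only the `b.png`/`c.png` pair disagrees, so it is the single suspect surfaced.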
|
1980 | 1972 | ] |
1981 | 1973 | }, |
1982 | 1974 | { |
1983 | | - "cell_type": "code", |
1984 | | - "execution_count": 9, |
1985 | | - "id": "44174ffd-72f0-4a63-8849-6989bf982fa2", |
| 1975 | + "cell_type": "markdown", |
| 1976 | + "id": "ffa5de31", |
1986 | 1977 | "metadata": {}, |
1987 | | - "outputs": [], |
1988 | 1978 | "source": [ |
1989 | | - "# Looking at the raw cluster to link back cluster name to to file" |
| 1979 | + "Looking at the raw clusters to link each cluster name back to its file" |
1990 | 1980 | ] |
1991 | 1981 | }, |
1992 | 1982 | { |
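
Linking cluster names back to files is a groupby over the components table. A sketch under the assumption that the table has `component_id` and `filename` columns (the actual fastdup column names may differ):

```python
import pandas as pd

# Hypothetical components table: each crop assigned a cluster id.
components = pd.DataFrame({
    "component_id": [0, 0, 0, 1, 1],
    "filename": ["a.png", "b.png", "c.png", "d.png", "e.png"],
})

# Link each cluster back to its member files, largest clusters first.
clusters = (components.groupby("component_id")["filename"]
            .apply(list)
            .sort_values(key=lambda s: s.str.len(), ascending=False))
```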
|
2124 | 2114 | ] |
2125 | 2115 | }, |
2126 | 2116 | { |
2127 | | - "cell_type": "code", |
2128 | | - "execution_count": 15, |
2129 | | - "id": "bcb6a063-698c-480b-88e4-8ec3c9bfdb27", |
| 2117 | + "cell_type": "markdown", |
| 2118 | + "id": "bcc93d2e", |
2130 | 2119 | "metadata": {}, |
2131 | | - "outputs": [], |
2132 | 2120 | "source": [ |
2133 | | - "# Looking at good labels" |
| 2121 | + "Looking at good labels" |
2134 | 2122 | ] |
2135 | 2123 | }, |
2136 | 2124 | { |
|
3491 | 3479 | ] |
3492 | 3480 | }, |
3493 | 3481 | { |
3494 | | - "cell_type": "code", |
3495 | | - "execution_count": null, |
3496 | | - "id": "5b86b38f-2f3e-4ab5-911b-f43079f82e93", |
| 3482 | + "cell_type": "markdown", |
| 3483 | + "id": "b4d06ad8", |
3497 | 3484 | "metadata": {}, |
3498 | | - "outputs": [], |
3499 | 3485 | "source": [ |
3500 | | - "# Let's look on outliers on the satellite image level" |
| 3486 | + "## Outliers\n", |
| 3487 | + "\n", |
| 3488 | + "Let's look at outliers at the satellite-image level" |
3501 | 3489 | ] |
3502 | 3490 | }, |
3503 | 3491 | { |
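
Outlier selection here amounts to ranking images by how far they sit from their nearest neighbour in embedding space. A self-contained numpy sketch of that ranking, on synthetic embeddings rather than fastdup's real feature vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic embeddings: a tight cluster of 9 points plus one far-away point.
emb = np.vstack([rng.normal(0, 0.1, size=(9, 4)),
                 np.full((1, 4), 5.0)])

# Pairwise Euclidean distances; mask the self-distance on the diagonal.
d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
np.fill_diagonal(d, np.inf)

# An outlier is a point whose nearest neighbour is unusually far.
nn_dist = d.min(axis=1)
outlier_idx = int(nn_dist.argmax())
```

The far-away tenth point (index 9) has by far the largest nearest-neighbour distance, so it is the one flagged.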
|
4542 | 4530 | ] |
4543 | 4531 | }, |
4544 | 4532 | { |
4545 | | - "cell_type": "code", |
4546 | | - "execution_count": 22, |
4547 | | - "id": "f7998fe4-db21-4c06-aca6-3287119b74d2", |
| 4533 | + "cell_type": "markdown", |
| 4534 | + "id": "60ee12c8", |
4548 | 4535 | "metadata": {}, |
4549 | | - "outputs": [], |
4550 | 4536 | "source": [ |
4551 | | - "# Now we look at outliers at the crop level" |
| 4537 | + "Now we look at outliers at the crop level" |
4552 | 4538 | ] |
4553 | 4539 | }, |
4554 | 4540 | { |
|
5600 | 5586 | ] |
5601 | 5587 | }, |
5602 | 5588 | { |
5603 | | - "cell_type": "code", |
5604 | | - "execution_count": null, |
5605 | | - "id": "47cfd1cc-7db6-4256-9550-62ab7fe3e81e", |
| 5589 | + "cell_type": "markdown", |
| 5590 | + "id": "aa35647a", |
5606 | 5591 | "metadata": {}, |
5607 | | - "outputs": [], |
5608 | 5592 | "source": [ |
5609 | | - "# We look for the brightest satellite images" |
| 5593 | + "## Brightest Images\n", |
| 5594 | + "\n", |
| 5595 | + "We look for the brightest satellite images" |
5610 | 5596 | ] |
5611 | 5597 | }, |
5612 | 5598 | { |
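
Ranking by brightness is a per-image mean over pixel values. A numpy sketch on synthetic 16-bit tiles (the statistic fastdup reports may be normalized differently):

```python
import numpy as np

rng = np.random.default_rng(1)
# Three synthetic 16-bit grayscale tiles with different intensity ranges.
images = {
    "dark.tiff":   rng.integers(0, 1000, size=(8, 8), dtype=np.uint16),
    "mid.tiff":    rng.integers(20000, 30000, size=(8, 8), dtype=np.uint16),
    "bright.tiff": rng.integers(60000, 65535, size=(8, 8), dtype=np.uint16),
}

# Rank by mean pixel value; cast to float first so 16-bit sums cannot overflow.
brightness = {name: float(img.astype(np.float64).mean())
              for name, img in images.items()}
brightest = max(brightness, key=brightness.get)
```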
|
6652 | 6638 | ] |
6653 | 6639 | }, |
6654 | 6640 | { |
6655 | | - "cell_type": "code", |
6656 | | - "execution_count": null, |
6657 | | - "id": "9711f363-9d0f-4d42-b4cd-66f5f9ab1b00", |
| 6641 | + "cell_type": "markdown", |
| 6642 | + "id": "73a82a89", |
6658 | 6643 | "metadata": {}, |
6659 | | - "outputs": [], |
6660 | 6644 | "source": [ |
6661 | | - "# Now we look for the most blurry images" |
| 6645 | + "## Blurry Images\n", |
| 6646 | + "Now we look for the blurriest images" |
6662 | 6647 | ] |
6663 | 6648 | }, |
6664 | 6649 | { |
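
A common blur proxy is the variance of the image's Laplacian: blurry images have little high-frequency content, so the Laplacian response is small. The sketch below implements that proxy directly in numpy; it is a standard technique, not necessarily the exact statistic fastdup computes.

```python
import numpy as np

def laplacian_variance(img: np.ndarray) -> float:
    """Variance of the discrete 4-neighbour Laplacian: low values suggest blur."""
    img = img.astype(np.float64)
    lap = (-4 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return float(lap.var())

rng = np.random.default_rng(2)
sharp = rng.integers(0, 65535, size=(32, 32)).astype(np.float64)
# "Blur" the sharp image with a crude 3x3 box filter (wrap-around borders).
blurry = sum(np.roll(np.roll(sharp, i, 0), j, 1)
             for i in (-1, 0, 1) for j in (-1, 0, 1)) / 9.0
```

Sorting images by this score ascending puts the blurriest ones first.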
|