Commit ac634cb: "updates"
Parent: 76e8220

3 files changed (+18, -12 lines)

_posts/2025-12-06-data-strategy-post.md (18 additions, 12 deletions)
@@ -58,24 +58,29 @@ Kinda... but there's a couple of questions that need answers:
 - Stated differently: "If mAP50 is the test, what score must the model achieve to prove its ability?"
 - I'm trying to prove traction towards a solution, not achieve perfection on the first try. I think a `mAP50=50%` sufficiently demonstrates utility while leaving lots of room for improvement.

-The metric (mAP50) and threshold (50%) I chose both depend on validating performance on images with real potholes already identified (test data). Running these metrics against the irrelevant test data would yield useless results.
+The metric (mAP50) and threshold (50%) I chose both depend on validating performance on images with real potholes already identified (ground truth). Running these metrics against the irrelevant test data would yield useless results.

 I'll need to manually tag some images with potholes, which begs the question:

 **What is a pothole?**
 - Definition: _a depression or hollow in a road surface caused by wear or sinking_.
 - For the sake of this experiment I made some convenient assumptions:
-  - All _road_ defects are potholes. _Ground_ defects off road **are not** potholes.
+  - All _road_ defects are potholes. _Ground_ defects (off road) **are not** potholes.
   - All puddles are potholes (but not the inverse).
-  - Not all potholes are circular. They can be uniquely shaped.
+  - Potholes can be uniquely shaped (not just circular).

-I used [roboflow](https://roboflow.com/) to pull frames from the video that caused my original model to fail and manually annotated ~1500 potholes to become my ground truth dataset. I'll benchmark all the models I train against this test set to draw a clear conclusion on how well we're addressing the problem.
+I used [roboflow](https://roboflow.com/) to pull frames from the video that caused my original model to fail and manually annotated ~1500 potholes to become my ground truth dataset. I'll benchmark all the models I train against this ground truth set to draw a clear conclusion on how well we're addressing the problem.

 Now that I've got a KPI to target, we can talk data that'll get us there.

 ## Data strategy
 With a clear problem statement the data required is obvious: I need overhead images of dirt roads with potholes.

+| This | Not This |
+| -- | -- |
+| ![aerial-potholes.jpg](/assets/2025-12-06-data-strategy-post/aerial-potholes.jpg) | ![street-level-pothole.jpeg](/assets/2025-12-06-data-strategy-post/street-level-pothole.jpeg) |
+
+
 Again, this seems simple but there are nuances to manage:
 - Dirt roads look different at different times of day.
 - The landscape around the roads can be dramatically different.
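
The diff doesn't include the validation code itself, but the benchmarking it describes is straightforward to reproduce. Here's a minimal sketch of checking a trained model against the ground-truth set and the mAP50 >= 50% target using the `ultralytics` package; the dataset YAML and weights paths are hypothetical, not from the post:

```python
# Minimal sketch: score a trained YOLO model against the fixed ground-truth
# set and compare against the post's mAP50 >= 50% "traction" threshold.
# Assumes the ultralytics package; "potholes.yaml" and the weights path
# are placeholder names, not the author's actual files.
from ultralytics import YOLO

TARGET_MAP50 = 0.50  # the "prove traction, not perfection" bar

model = YOLO("runs/detect/train/weights/best.pt")

# val() runs inference on the chosen split of the dataset described in the
# YAML and computes detection metrics, including mAP at IoU 0.50.
metrics = model.val(data="potholes.yaml", split="test")
map50 = metrics.box.map50

print(f"mAP50 = {map50:.1%} (target {TARGET_MAP50:.0%})")
print("PASS" if map50 >= TARGET_MAP50 else "keep iterating on data")
```

Because every candidate model is scored against the same frozen ground-truth set, the mAP50 numbers stay comparable across iterations.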
@@ -228,15 +233,16 @@ I wanted to put it to the test and see how it compared.
 | meta/sam3 | 50% | 558 | 38.6%
 | meta/sam3 | 25% | 1142 | 79.1%

+- **Conclusion:** This was a blowout in terms of identifying the objects I was interested in. SAM 3 was much slower, but that was expected. It got even better with confidence tuning.
+
 | 50% Conf | 25% Conf |
 | -- | -- |
 | ![sam3-50pct.gif](/assets/2025-12-06-data-strategy-post/SAM3-50pct-Conf.gif) | ![sam3-25pct.gif](/assets/2025-12-06-data-strategy-post/SAM3-25pct-Conf.gif) |

-- **Conclusion:** This was a blowout in terms of identifying the objects I was interested in. SAM 3 was much slower, but that was expected. It got even better with confidence tuning.

-From a model to model perspective its kind of apples:oranges comparison. SAM 3 was much slower than the YOLO models, but that was expected.
+From a model-to-model perspective it's kind of an apples-to-oranges comparison. SAM 3 was much slower than the YOLO models, but that was expected. But in terms of identifying the objects I wanted, SAM 3 was a big winner.

-But overall, SAM 3 got insanely close to the ground truth objects that were _manually_ labeled. Color me impressed.
+Overall, SAM 3 got insanely close to the ground truth objects that were _manually_ labeled. Color me impressed.

 | Ground Truth | SAM 3 @ 25% Conf |
 | -- | -- |
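
The confidence tuning behind the 50% vs 25% rows is just re-thresholding the same raw predictions. The post doesn't show SAM 3's API, so here's a model-agnostic sketch of that sweep, assuming detections arrive as (box, score) pairs; the function and sample data are illustrative only:

```python
# Model-agnostic sketch of the confidence sweep in the table above.
# Detections are assumed to be (box, score) pairs already produced by a
# model (SAM 3, YOLO, etc.); nothing here is tied to a specific API.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels

def filter_by_confidence(
    detections: List[Tuple[Box, float]], threshold: float
) -> List[Tuple[Box, float]]:
    """Keep only detections at or above the confidence threshold."""
    return [(box, score) for box, score in detections if score >= threshold]

# Example: one set of raw predictions evaluated at both operating points
# from the table (the counts/recall there come from the author's runs).
raw = [
    ((10, 10, 50, 50), 0.62),
    ((80, 20, 120, 70), 0.31),
    ((5, 90, 40, 130), 0.18),
]
for threshold in (0.50, 0.25):
    kept = filter_by_confidence(raw, threshold)
    print(f"conf >= {threshold:.0%}: {len(kept)} detections")
```

Lowering the threshold trades precision for recall, which is why the 25% row nearly doubles the detection count.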
@@ -260,12 +266,12 @@ But overall, SAM 3 got insanely close to the ground truth objects that were _manually_ labeled.
 So now what? I started with a YOLO model that didn't work, and I was able to design and implement a data strategy to make it work.

 That's a fun experiment, but what conclusions do we draw:
-1. Data makes the difference. This is obvious in hindsight, but worth repeating and extends beyond cvis. No training tweaks were going to fix the original control models.
-2. The law of diminishing returns. Performance gains were huge early on but plateaued quickly. When that happens "more training" stops being a viable strategy.
-3. Controlled experiments. By changing few variables at a time the cause/effect relationship between the changes and the performance were obvious.
-4. SAM 3 is a game changer for computer vision.
+1. **Data makes the difference.** This is obvious in hindsight, but worth repeating, and it extends beyond cvis. No training tweaks were going to fix the original control models.
+2. **The law of diminishing returns.** Performance gains were huge early on but plateaued quickly. When that happens, "more training" stops being a viable strategy.
+3. **Controlled experiments.** By changing a few variables at a time, the cause/effect relationship between the changes and the performance was obvious.
+4. **SAM 3 is a game changer.** There are some big-time tradeoffs to using SAM 3 in terms of speed, so it might not be ideal for real-time. But nearly eclipsing my models right off the shelf _with zero examples_ is next level for cvis.

-I was able to take <5 min of drone footage and turn it into a working POC in just a few hours. This was an all out success that I wasn't expecting.
+I took <5 min of drone footage and turned it into a working POC in just a few hours. This was an all-out success that I wasn't expecting.
 - Earnestly thought through the type of images and conditions that would make this model successful for the use-case (and published a dataset).
 - Surpassed my target metric (50%) after only 3 model iterations, landing at `mAP50=57%` (I literally jumped with excitement!).
 - Experimented with Meta's new SAM 3 and built out a validation harness to map its inference output to YOLO labels ([opensourced here](https://github.com/nmata010/aerial-pothole-detection)).
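
The harness itself lives in the linked repo. As a rough illustration of the label mapping it describes, here's the standard conversion from pixel-space boxes to YOLO's normalized `class x_center y_center width height` label format; the helper and its names are mine, not the repo's:

```python
# Illustrative only: converting pixel-space (x1, y1, x2, y2) boxes into
# YOLO label lines ("class x_center y_center width height", all values
# normalized to [0, 1]). The actual harness is in the linked repo; this
# helper and its names are hypothetical.
from typing import Iterable, Tuple

def boxes_to_yolo_labels(
    boxes: Iterable[Tuple[float, float, float, float]],
    img_w: int,
    img_h: int,
    class_id: int = 0,  # single "pothole" class
) -> str:
    lines = []
    for x1, y1, x2, y2 in boxes:
        cx = (x1 + x2) / 2 / img_w   # normalized box center
        cy = (y1 + y2) / 2 / img_h
        w = (x2 - x1) / img_w        # normalized box size
        h = (y2 - y1) / img_h
        lines.append(f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return "\n".join(lines)

# One label file per image, e.g. labels/frame_0001.txt
print(boxes_to_yolo_labels([(120, 340, 180, 400)], img_w=1920, img_h=1080))
```

Once predictions are in this format, they can be scored against the hand-labeled ground truth with the same mAP50 tooling used for the trained models.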
Binary assets: 2 image files added (316 KB and 14.1 KB).
