Commit 40a804d
Add IoU-based accuracy checking for segmentation models in inductor tests (#171927)
Summary:
# Add IoU-based accuracy checking for segmentation models
### Summary
Introduces an IoU (Intersection over Union) metric for boolean-mask accuracy checking in inductor benchmarks. This provides a more appropriate accuracy comparison for segmentation models such as SAM that output boolean masks.
These tests are viable/strict blocking, so keeping them reliable matters.
### Problem
The `sam` model was failing accuracy checks intermittently in CI (`inductor-test / test (inductor_torchbench, *, *, linux.g5.4xlarge.nvidia.gpu)`):
```
sam FAIL: accuracy=fail_accuracy, expected=pass
```
The error logs showed:
```
Accuracy failed: uint8 tensor did not match
Accuracy failed for key name masks
```
**Root cause:** Segmentation models output boolean masks that are derived by thresholding floating-point values. Small numerical differences (e.g., 0.4999 vs 0.5001) can cause pixels to flip between `True` and `False`. The existing accuracy check requires exact boolean matching, which is too strict for this use case.
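A minimal illustration of this failure mode, written in plain Python so it runs without PyTorch. The logit values and the 0.5 threshold are hypothetical, chosen only to match the 0.4999 vs 0.5001 example; real models may threshold at a different value.

```python
# Hypothetical float outputs from two runs (e.g. eager vs. compiled) that
# differ by only ~2e-4 in a single element.
eager_logits = [0.10, 0.4999, 0.90, 0.75, 0.20, 0.60]
compiled_logits = [0.10, 0.5001, 0.90, 0.75, 0.20, 0.60]

THRESHOLD = 0.5  # hypothetical threshold, matching the 0.4999 vs 0.5001 example


def to_mask(logits, threshold=THRESHOLD):
    """Threshold float values into a boolean mask."""
    return [x > threshold for x in logits]


ref = to_mask(eager_logits)
res = to_mask(compiled_logits)

max_abs_diff = max(abs(a - b) for a, b in zip(eager_logits, compiled_logits))
flipped = sum(a != b for a, b in zip(ref, res))
exact_match = ref == res

# prints: max |diff| = 0.0002, flipped pixels = 1, exact match = False
print(f"max |diff| = {max_abs_diff:.4f}, flipped pixels = {flipped}, exact match = {exact_match}")
```

The float outputs would pass any reasonable `rtol`/`atol` check, yet the thresholded masks differ in one element, which is exactly what an exact boolean comparison rejects.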
### Solution
Instead of suppressing the failures (via `flaky_models` or `non_deterministic`), this PR implements a semantically appropriate comparison method:
- **IoU (Intersection over Union)** - A standard metric for comparing segmentation masks
- Models can be configured to use IoU ≥ 0.99 threshold for boolean tensor comparison
- This catches real accuracy problems while allowing minor pixel-level variations
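To make "minor pixel-level variations" concrete, the 0.99 threshold can be turned into a flip budget directly from the IoU definition. Let $F$ be the number of `True` pixels in the reference mask and $k$ the number of pixels that flip; the two extreme cases are all flips adding pixels (false positives) or all flips removing pixels (false negatives):

```math
\text{IoU}_{\text{FP}} = \frac{F}{F + k} \ge 0.99 \iff k \le \frac{F}{99},
\qquad
\text{IoU}_{\text{FN}} = \frac{F - k}{F} \ge 0.99 \iff k \le \frac{F}{100}.
```

So the tolerance scales with mask size: roughly 1% of the foreground pixels may flip, and a mask with fewer than 99 `True` pixels tolerates no flips at all. This follows from the metric alone; it is not a statement about any special handling of small masks in the PR.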
### Changes
1. **`benchmarks/dynamo/torchbench.yaml`**
- Added `tolerance.use_iou_for_bool_masks` config list for models that should use IoU
2. **`benchmarks/dynamo/torchbench.py`**
- Added `use_iou_for_bool_accuracy()` method to `TorchBenchmarkRunner`
3. **`benchmarks/dynamo/common.py`**
- Added base `use_iou_for_bool_accuracy()` method to `BenchmarkRunner`
- Pass new flag to `same()` function
4. **`torch/_dynamo/utils.py`**
- Added `use_iou_for_bool` parameter to `same()` function
- Implemented IoU comparison logic for boolean tensors:
```python
intersection = (ref & res).sum().float()
union = (ref | res).sum().float()
iou = intersection / union
# Pass if IoU >= 0.99 (>= 99% of pixels set in either mask are set in both)
```
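The snippet above leaves two details implicit: how the boolean path is selected, and what happens when both masks are empty (in PyTorch, `0.0 / 0.0` is `NaN`, and `NaN >= 0.99` is `False`). The sketch below fills them in under stated assumptions: `bool_mask_iou` and `compare_bool_masks` are hypothetical helpers, not the actual body of `same()`; flat Python lists stand in for bool tensors so it runs without PyTorch; and treating an empty union as a match is an assumption, not something this PR specifies.

```python
def bool_mask_iou(ref, res):
    """Intersection over Union of two boolean masks of equal length."""
    if len(ref) != len(res):
        raise ValueError("masks must have the same shape")
    intersection = sum(1 for a, b in zip(ref, res) if a and b)
    union = sum(1 for a, b in zip(ref, res) if a or b)
    if union == 0:
        # Both masks are empty. Dividing would give NaN in PyTorch, so this
        # sketch (by assumption) treats two empty masks as identical.
        return 1.0
    return intersection / union


def compare_bool_masks(ref, res, use_iou_for_bool=False, iou_threshold=0.99):
    """Hypothetical helper mirroring the use_iou_for_bool flag on same()."""
    if not use_iou_for_bool:
        # Default behavior: every element must match exactly.
        return list(ref) == list(res)
    return bool_mask_iou(ref, res) >= iou_threshold


# Usage: a 200-pixel mask with 100 foreground pixels.
ref = [True] * 100 + [False] * 100
one_flip = ref.copy()
one_flip[150] = True   # IoU = 100 / 101, about 0.990 -> passes
two_flips = one_flip.copy()
two_flips[151] = True  # IoU = 100 / 102, about 0.980 -> fails

print(compare_bool_masks(ref, one_flip))                          # False (exact)
print(compare_bool_masks(ref, one_flip, use_iou_for_bool=True))   # True
print(compare_bool_masks(ref, two_flips, use_iou_for_bool=True))  # False
```

Note that IoU is measured relative to the union, not the whole image: in the one-flip case 199 of 200 pixels (99.5%) agree, while the two-flip case still has 99% raw pixel agreement but fails. For masks with small foregrounds, IoU ≥ 0.99 is therefore considerably stricter than 99% pixel agreement.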
### Models enabled for IoU comparison
- `sam` - Segment Anything Model
- `sam_fast` - Fast variant of SAM
- `vision_maskrcnn` - Mask R-CNN (also outputs segmentation masks)
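One way the per-model opt-in could be wired up is sketched below. Only the `tolerance.use_iou_for_bool_masks` key, the model names, the class names, and the method name `use_iou_for_bool_accuracy()` come from this PR; the dict standing in for the parsed YAML, the `name` argument, and the method bodies are assumptions for illustration, not the actual `BenchmarkRunner`/`TorchBenchmarkRunner` code.

```python
# Hypothetical stand-in for the parsed torchbench.yaml (real loading omitted).
TORCHBENCH_CONFIG = {
    "tolerance": {
        "use_iou_for_bool_masks": ["sam", "sam_fast", "vision_maskrcnn"],
    },
}


class BenchmarkRunner:
    def use_iou_for_bool_accuracy(self, name):
        # Base runner default: keep exact boolean matching for every model.
        return False


class TorchBenchmarkRunner(BenchmarkRunner):
    def __init__(self, config=TORCHBENCH_CONFIG):
        tolerance = config.get("tolerance", {})
        self._iou_models = frozenset(tolerance.get("use_iou_for_bool_masks", []))

    def use_iou_for_bool_accuracy(self, name):
        # Opt in only the models listed under tolerance.use_iou_for_bool_masks.
        return name in self._iou_models


runner = TorchBenchmarkRunner()
print(runner.use_iou_for_bool_accuracy("sam"))       # True
print(runner.use_iou_for_bool_accuracy("resnet50"))  # False
```

The resulting boolean would then be forwarded to `same()` as its `use_iou_for_bool` argument, so models not on the list keep the stricter exact-match behavior.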
### Why IoU over alternatives?
| Approach | Pros | Cons |
|----------|------|------|
| `flaky_models` | Visible failures, doesn't block CI | Doesn't fix the underlying issue |
| `non_deterministic` | Simple config | Silently passes all failures, hides real problems |
| **IoU (this PR)** | Semantically correct metric, catches real bugs | Slightly more code |
### Test Plan
- Models in `use_iou_for_bool_masks` will use IoU ≥ 0.99 for boolean tensor comparison
- Real accuracy problems (IoU < 0.99) will still fail
- CI should no longer flake on `sam` model accuracy checks
- `sam_fast` can now be checked for accuracy, so regressions in it will be detected
X-link: pytorch/pytorch#171927
Approved by: https://github.com/malfet, https://github.com/yangw-dev
Reviewed By: yangw-dev
Differential Revision: D90691456
fbshipit-source-id: 8e8e4f799a666e2d65123ea4e82c3a101c8eeb301
Parent: af44597
4 files changed (+213 -101 lines):
- `userbenchmark/dynamo/dynamobench`
- `_dynamo`