Commit c51e30b
Fix GPU deadlock caused by m_next_global bottleneck in adaptive cache bypass
Root cause:
- Original implementation checked prediction_table at response stage
- When several responses for high-miss-rate PCs returned together, only one could
occupy m_next_global (a single slot); the rest stalled in response_fifo
- response_fifo then filled up, blocking the interconnect and causing the deadlock
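The backpressure described above can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyResponsePath, push_response, and cycle are hypothetical names, and only response_fifo and the single-slot m_next_global (modelled here as next_global) come from the commit message. It is not the simulator's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model (hypothetical): a bounded response FIFO drained through one slot.
struct ToyResponsePath {
    std::size_t fifo_capacity;
    std::deque<int> response_fifo;   // responses waiting for the slot
    bool next_global_full = false;   // stands in for the single m_next_global slot
    int next_global = 0;

    explicit ToyResponsePath(std::size_t capacity) : fifo_capacity(capacity) {}

    // Returns false when the FIFO is full, i.e. the interconnect is blocked.
    bool push_response(int id) {
        if (response_fifo.size() >= fifo_capacity) return false;
        response_fifo.push_back(id);
        return true;
    }

    // One cycle: optionally drain the slot, then refill it from the FIFO head.
    void cycle(bool slot_drains) {
        if (slot_drains) next_global_full = false;
        if (!next_global_full && !response_fifo.empty()) {
            next_global = response_fifo.front();
            response_fifo.pop_front();
            next_global_full = true;
        }
    }
};
```

If the slot never drains, the first response occupies it, the FIFO fills behind it, and every later push_response call is refused, which is the stall pattern the root cause describes.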
Solution:
- Move prediction-based bypass decision from response stage to request stage
- Check prediction_table in memory_cycle() before sending request to L1D
- Requests with prediction_table[pc] >= 8 now bypass L1D entirely
- Response handling uses normal global memory path with adequate buffering
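The request-stage decision described in the solution can be sketched as follows. This is a minimal illustration, not the code from this commit: BypassPredictor, Route, and route_request are hypothetical names, and the map-based table is an assumption. Only prediction_table and the threshold of 8 are taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical routing outcome for a global memory request.
enum class Route { L1D, BypassL1D };

struct BypassPredictor {
    // Threshold taken from "prediction_table[pc] >= 8" in the commit message.
    static constexpr unsigned kBypassThreshold = 8;

    // Assumed layout: per-PC miss counter; the real table may differ.
    std::unordered_map<std::uint64_t, unsigned> prediction_table;

    // Decide at request time, before the access would be handed to L1D.
    // PCs with no entry are treated as counter 0 and go through L1D.
    Route route_request(std::uint64_t pc) const {
        auto it = prediction_table.find(pc);
        unsigned counter = (it == prediction_table.end()) ? 0u : it->second;
        return counter >= kBypassThreshold ? Route::BypassL1D : Route::L1D;
    }
};
```

Because the route is chosen before the request is issued, a bypassed access never needs a special slot on the way back; its response follows the ordinary global memory path.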
Benefits:
- Eliminates m_next_global bottleneck
- Avoids unnecessary L1D accesses for predicted-miss requests
- Aligns with adaptive bypass paper's design intent
- Maintains prediction_table update logic

1 parent fdf6a54 commit c51e30b
1 file changed, +18 -7 lines changed
Diff hunks (line contents not preserved):
@@ -2277,6 +2277,18 @@ 12 lines added
@@ -2869,20 +2881,19 @@ 7 lines removed, 6 lines added