Commit d9fc6da

Pipeline the FP divider and sqrt paths in fp_div_shim
Convert fp_divider.sv and fp_sqrt.sv from sequential FSMs to fully pipelined datapaths, using generate blocks to unroll the iterative division/sqrt loops. Pipeline depths: SP=36 stages, DP=65 stages.

Rewrite fp_div_shim to directly instantiate 4 sub-units (div_s, div_d, sqrt_s, sqrt_d) with per-sub-unit tag queue shift registers, holding registers, a fixed-priority arbiter, and a depth-4 result FIFO. Credit-based back-pressure prevents FIFO overflow.

Add i_div_accepted port and wire fp_div_result_accepted in tomasulo_wrapper. Retain o_stall port (tied low) on both pipelined units for backward compatibility with the in-order fpu_div_sqrt_unit, which connects but does not read it.

Update cocotb tests with FIFO-based output protocol (drive i_div_accepted to pop results) and add pipeline-specific tests: back-to-back issue, interleaved div/sqrt, multi-op flush, and FIFO backpressure.
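The credit-based back-pressure mentioned above can be illustrated with a small behavioral model. This is only a sketch under assumptions: the commit message does not say how credits are counted, so the model uses one common scheme (take a credit at issue, return it when a result is popped). The class and method names are hypothetical and do not come from fp_div_shim.

```python
from collections import deque

FIFO_DEPTH = 4  # depth of the result FIFO, as stated in the commit message


class CreditedResultFifo:
    """Hypothetical behavioral model, not the RTL in fp_div_shim.

    Assumed scheme: one credit per FIFO slot. A credit is taken when an
    operation is issued and returned when its result is popped, so the
    number of operations in flight plus results waiting in the FIFO can
    never exceed the FIFO depth.
    """

    def __init__(self, depth=FIFO_DEPTH):
        self.depth = depth
        self.credits = depth
        self.fifo = deque()

    def can_issue(self):
        # Back-pressure: refuse new work when no slot is reserved for it.
        return self.credits > 0

    def issue(self, tag):
        if not self.can_issue():
            raise RuntimeError("no credits: issue must stall")
        self.credits -= 1
        return tag

    def complete(self, tag, value):
        # A completing operation always finds a free slot, because its
        # credit reserved one at issue time.
        assert len(self.fifo) < self.depth, "FIFO overflow"
        self.fifo.append((tag, value))

    def accept(self):
        # Models the consumer asserting i_div_accepted: pop one result
        # and hand the credit back to the issue side.
        result = self.fifo.popleft()
        self.credits += 1
        return result


model = CreditedResultFifo()
for tag in range(FIFO_DEPTH):
    model.issue(tag)
blocked = not model.can_issue()      # all credits are in use
model.complete(0, 1.5)
popped = model.accept()              # frees one credit
unblocked = model.can_issue()
```

Under this assumed scheme the overflow check in `complete` can never fire, which is the property the commit message attributes to the credit mechanism. The real design may count credits differently, for example to keep more operations in flight across the 36- and 65-stage pipelines.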
1 parent e9c0381 commit d9fc6da

8 files changed, +1990 -1175 lines changed

README.md

Lines changed: 16 additions & 16 deletions
@@ -303,16 +303,16 @@ Running `pytest tests/` exercises:
 
 | Resource | Used | Available | Util% |
 |----------|-----:|----------:|------:|
-| CLB LUTs | 30,686 | 1,029,600 | 3.0% |
-| LUT as Logic | 29,425 | 1,029,600 | 2.9% |
+| CLB LUTs | 48,040 | 1,029,600 | 4.7% |
+| LUT as Logic | 46,481 | 1,029,600 | 4.5% |
 | LUT as Distributed RAM | 1,168 |||
-| LUT as Shift Register | 93 |||
-| CLB Registers | 19,936 | 2,059,200 | 1.0% |
+| LUT as Shift Register | 391 |||
+| CLB Registers | 37,747 | 2,059,200 | 1.8% |
 | Block RAM Tile | 68.5 | 2,112 | 3.2% |
 | URAM | 0 | 352 | 0.0% |
 | DSPs | 30 | 1,320 | 2.3% |
-| CARRY8 | 765 | 128,700 | 0.6% |
-| F7 Muxes | 384 | 514,800 | 0.1% |
+| CARRY8 | 2,790 | 128,700 | 2.2% |
+| F7 Muxes | 385 | 514,800 | 0.1% |
 | F8 Muxes | 2 | 257,400 | 0.0% |
 | Bonded IOB | 4 | 364 | 1.1% |
 | MMCM | 1 | 11 | 9.1% |
@@ -322,14 +322,14 @@ Running `pytest tests/` exercises:
 
 | Resource | Used | Available | Util% |
 |----------|-----:|----------:|------:|
-| Slice LUTs | 29,843 | 203,800 | 14.6% |
-| LUT as Logic | 28,446 | 203,800 | 14.0% |
+| Slice LUTs | 46,683 | 203,800 | 22.9% |
+| LUT as Logic | 44,988 | 203,800 | 22.1% |
 | LUT as Distributed RAM | 1,308 |||
-| LUT as Shift Register | 89 |||
-| Slice Registers | 19,431 | 407,600 | 4.8% |
+| LUT as Shift Register | 387 |||
+| Slice Registers | 37,151 | 407,600 | 9.1% |
 | Block RAM Tile | 68.5 | 445 | 15.4% |
 | DSPs | 34 | 840 | 4.0% |
-| F7 Muxes | 316 | 101,900 | 0.3% |
+| F7 Muxes | 386 | 101,900 | 0.4% |
 | F8 Muxes | 2 | 50,950 | 0.0% |
 | Bonded IOB | 6 | 500 | 1.2% |
 | MMCM | 1 | 10 | 10.0% |
@@ -339,14 +339,14 @@ Running `pytest tests/` exercises:
 
 | Resource | Used | Available | Util% |
 |----------|-----:|----------:|------:|
-| Slice LUTs | 29,630 | 63,400 | 46.7% |
-| LUT as Logic | 28,233 | 63,400 | 44.5% |
+| Slice LUTs | 46,683 | 63,400 | 73.6% |
+| LUT as Logic | 44,988 | 63,400 | 71.0% |
 | LUT as Distributed RAM | 1,308 |||
-| LUT as Shift Register | 89 |||
-| Slice Registers | 19,463 | 126,800 | 15.3% |
+| LUT as Shift Register | 387 |||
+| Slice Registers | 37,151 | 126,800 | 29.3% |
 | Block RAM Tile | 68.5 | 135 | 50.7% |
 | DSPs | 34 | 240 | 14.2% |
-| F7 Muxes | 374 | 31,700 | 1.2% |
+| F7 Muxes | 384 | 31,700 | 1.2% |
 | F8 Muxes | 2 | 15,850 | 0.0% |
 | Bonded IOB | 4 | 210 | 1.9% |
 | MMCM | 1 | 6 | 16.7% |
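
The Util% column in the tables above is the used count divided by the available count, rounded to one decimal place. A minimal sketch for re-checking updated rows follows. The helper name `util_percent` is hypothetical, and the figures are copied from the new rows of the diff.

```python
def util_percent(used, available):
    """Utilization as a percentage, rounded to one decimal like the README."""
    return round(100.0 * used / available, 1)


# (used, available, Util% as listed in the README) for a few updated rows.
rows = {
    "CLB LUTs (first table)": (48_040, 1_029_600, 4.7),
    "Slice LUTs (second table)": (46_683, 203_800, 22.9),
    "Slice LUTs (third table)": (46_683, 63_400, 73.6),
    "Slice Registers (third table)": (37_151, 126_800, 29.3),
}

mismatches = [
    name
    for name, (used, available, listed) in rows.items()
    if util_percent(used, available) != listed
]
print(mismatches or "all listed percentages match")
```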
