
Conversation

@alees24 alees24 commented Jul 25, 2025

  • Migrate the TL-UL logic into a submodule to support multiple ports.
  • Separate the I and D ports for increased performance.
  • Implement read buffering on each port.
  • Coalesce writes on the D port to form burst writes.
  • Route write notifications from the D port to the I port.
  • Update the read buffers in response to write notifications.
  • Introduce a single-entry, zero-latency FIFO on each TL-UL connection to avoid a combinatorial loop that would otherwise exist between the LSU and instruction fetch ports of the Ibex when the HyperRAM is presented to both ports (see the sketch after this list).
  • Use a single SRAM model of the HyperRAM for simulation purposes (when requested) and for the Sonata XL synthesis target.
  • Make the SRAM model dual-ported on the TL-UL bus, increasing performance on the Sonata XL target too.
  • Support simulation of Sonata XL (with TARGET_XL_BOARD defined).
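
For illustration, here is a minimal sketch of the single-entry, zero-latency FIFO idea, written against a generic valid/ready stream rather than the PR's actual TL-UL channels; the module and signal names are illustrative only, not the RTL in this PR. The key property is that the upstream ready depends only on local state, so the downstream ready can no longer feed back combinatorially to the producer, whilst an empty buffer still passes data through in the same cycle (hence 'zero-latency').

module passthrough_fifo #(
  parameter int unsigned Width = 32
) (
  input  logic             clk_i,
  input  logic             rst_ni,
  // Upstream side (e.g. towards an Ibex port).
  input  logic             valid_i,
  output logic             ready_o,
  input  logic [Width-1:0] data_i,
  // Downstream side (e.g. towards the HyperRAM controller).
  output logic             valid_o,
  input  logic             ready_i,
  output logic [Width-1:0] data_o
);
  logic             full_q;
  logic [Width-1:0] data_q;

  // Upstream ready is a function of local state alone, never of
  // ready_i; this registered path is what breaks the combinatorial loop.
  assign ready_o = ~full_q;

  // When empty, the input passes straight through (zero latency);
  // when full, the stored beat is presented instead.
  assign valid_o = full_q | valid_i;
  assign data_o  = full_q ? data_q : data_i;

  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni) begin
      full_q <= 1'b0;
    end else if (full_q) begin
      // Drain the stored beat once the consumer accepts it.
      if (ready_i) full_q <= 1'b0;
    end else if (valid_i && !ready_i) begin
      // Consumer stalled: capture the in-flight beat.
      full_q <= 1'b1;
      data_q <= data_i;  // No reset needed: only consulted whilst full_q is set.
    end
  end
endmodule

Beats flow straight through whenever the buffer is empty and the consumer is ready; only on a downstream stall is a beat captured, and the buffer then drains before accepting more.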

Background:

The HyperRAM interface was previously an FPGA-only design, making it difficult to develop a more sophisticated implementation that could offer higher performance. The first implementation was simple and reliable, but its performance was comparatively poor because (i) each individual TL-UL read/write transaction was issued in isolation to the HBMC/HyperRAM device (HyperRAM is intended to offer high throughput using burst transfers, at the cost of comparatively high latency), and (ii) the HyperRAM was presented as a single slave port shared by both the I and D ports.

This body of work on improving the performance of the HyperRAM interface therefore started by bringing up the HBMC in Verilator simulation, making it possible to develop and debug a more sophisticated interface with buffering and write-coalescing logic.

Performance on FPGA

These figures are from executing the program sw/cheri/checks/hyperram_test, which has been extended with some additional directed tests exercising the modified logic and functionality:

Previous implementation (the numbers are cycle counts; lower numbers represent faster execution):

Running RND cap test...PASS!
Running RND data test...PASS!
Running RND data & address test...PASS!
Running 0101 stripe test... (7340148 cycles)...PASS!
Running 1001 stripe test... (7340122 cycles)...PASS!
Running 0000_1111 stripe test... (7340132 cycles)...PASS!
Running Execution test... (15666097 cycles)...PASS!
Running performance test with icache enabled...
    copy:   17054 - cmp:  24631 - total:  41685...    copy:   18539 - cmp:  24634 - total:  43173...PASS!
Running performance test with icache disabled...
    copy:   17069 - cmp:  24887 - total:  41956...    copy:   42925 - cmp:  24883 - total:  67808...PASS!
Running alignment tests with cleaning...
without cleaning...
PASS!
Running buffering test...PASS!
Running write tests...
  Test type 0: 1024 iteration(s)
  Test type 1: 1024 iteration(s)
  Test type 2: 1024 iteration(s)
  Test type 3: 1024 iteration(s)
  Test type 4: 1024 iteration(s)
  Test type 5: 1024 iteration(s)
  Test type 6: 1024 iteration(s)
  Test type 7: 1024 iteration(s)
  result...PASS!
Single iteration took 128378015 cycles

Modified implementation, using the RTL from this PR:

Running RND cap test...PASS!
Running RND data test...PASS!
Running RND data & address test...PASS!
Running 0101 stripe test... (4947985 cycles)...PASS!
Running 1001 stripe test... (4947982 cycles)...PASS!
Running 0000_1111 stripe test... (4947986 cycles)...PASS!
Running Execution test... (9792540 cycles)...PASS!
Running performance test with icache enabled...
    copy:    3754 - cmp:   6327 - total:  10081...    copy:    3825 - cmp:   6325 - total:  10150...PASS!
Running performance test with icache disabled...
    copy:    3754 - cmp:   6194 - total:   9948...    copy:    3819 - cmp:   6194 - total:  10013...PASS!
Running alignment tests with cleaning...
without cleaning...
PASS!
Running buffering test...PASS!
Running write tests...
  Test type 0: 1024 iteration(s)
  Test type 1: 1024 iteration(s)
  Test type 2: 1024 iteration(s)
  Test type 3: 1024 iteration(s)
  Test type 4: 1024 iteration(s)
  Test type 5: 1024 iteration(s)
  Test type 6: 1024 iteration(s)
  Test type 7: 1024 iteration(s)
  result...PASS!
Single iteration took 87542355 cycles

Linear code execution from HyperRAM

The test code (to be raised separately) is just an 8KiB sequence of cincoffset ca0, ca0, 0x1 instructions, each acting as a single-cycle 'no-op' whilst making it possible to check that every instruction has been executed as intended.

Previous implementation:

Running linear execution test with icache enabled...
100 iteration(s) took 1367932 cycles
PASS!
Running linear execution test with icache disabled...
100 iteration(s) took 1809637 cycles
PASS!

Modified implementation, using the RTL from this PR:

Running linear execution test with icache enabled...
100 iteration(s) took 440148 cycles
PASS!
Running linear execution test with icache disabled...
100 iteration(s) took 439208 cycles
PASS!

@alees24 alees24 requested a review from marnovandermaas July 25, 2025 14:56
@alees24 alees24 force-pushed the hyperram-rtl branch 3 times, most recently from 17e585f to 8933b68 on July 30, 2025 06:03

alees24 commented Jul 30, 2025

Updated with lint fixes, comment correction and connection of an omitted parameter; no functional change.

@marnovandermaas marnovandermaas left a comment

Thanks so much for putting this together. I've done a code review, and think this is working as expected. I've also checked CI and the HyperRAM tests are passing. Feel free to merge this once you're happy with it as you are in a better position to judge this than I am.

end

// Updating of validity bits.
always_ff @(posedge clk_i) begin

I assume this is fine not to have a reset because configured will be forced to zero in the always_ff above?

Yes, I've generally tried to follow the existing design principle of not introducing reset logic where it is not logically required.
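
For illustration, here is a minimal sketch of the pattern being discussed, with purely hypothetical names (configured, cfg_done_i, set_valid_i and set_mask_i are illustrative, not the PR's actual signals): only the qualifying flag carries a reset, and the validity bits need none because they are force-cleared, and in any case ignored, whilst that flag is low.

module validity_bits #(
  parameter int unsigned NumEntries = 4
) (
  input  logic                  clk_i,
  input  logic                  rst_ni,
  input  logic                  cfg_done_i,   // hypothetical: configuration complete
  input  logic                  set_valid_i,  // hypothetical: mark entries valid
  input  logic [NumEntries-1:0] set_mask_i    // hypothetical: which entries to mark
);
  logic                  configured;
  logic [NumEntries-1:0] valid_q;

  // The qualifying flag is the only state that carries a reset.
  always_ff @(posedge clk_i or negedge rst_ni) begin
    if (!rst_ni)         configured <= 1'b0;
    else if (cfg_done_i) configured <= 1'b1;
  end

  // Updating of validity bits: safe without a reset, since valid_q is
  // held at zero whilst configured is low.
  always_ff @(posedge clk_i) begin
    if (!configured)      valid_q <= '0;
    else if (set_valid_i) valid_q <= valid_q | set_mask_i;
  end
endmodule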

Comment on lines +197 to +200
// Reload the timer whilst a write is starting or being stored;
// count it down whilst a stored write is held.
if (!stalled) begin
  if (wr_start | wr_storing) wr_timer <= {TimerW{1'b1}};
  else if (wr_stored) wr_timer <= wr_timer - 'b1;
end

Nice, good to have this timer.

@alees24 alees24 merged commit 21ad093 into lowRISC:main Jul 30, 2025
3 checks passed