-
Notifications
You must be signed in to change notification settings - Fork 29
Dual-ported, buffered interface to HyperBus Memory Controller #439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
17e585f to
8933b68
Compare
|
Updated with lint fixes, comment correction and connection of an omitted parameter; no functional change. |
marnovandermaas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for putting this together. I've done a code review, and think this is working as expected. I've also checked CI and the HyperRAM tests are passing. Feel free to merge this once you're happy with it as you are in a better position to judge this than I am.
| end | ||
|
|
||
| // Updating of validity bits. | ||
| always_ff @(posedge clk_i) begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this is fine not to have a reset because configured will be forced to zero in the always_ff above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've generally tried to follow the existing design principle of not introducing reset logic where it is not logically required.
| if (!stalled) begin | ||
| if (wr_start | wr_storing) wr_timer <= {TimerW{1'b1}}; | ||
| else if (wr_stored) wr_timer <= wr_timer - 'b1; | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, good to have this timer.
- Migrate TL-UL logic into a submodule to support multiple ports. - Separated I and D ports for increased performance. - Implemented read buffering on each port. - Write coalescing on D port, to form burst writes. - Routing of write notifications from D port to I port. - Updating of read buffers in response to write notifications. - Introduce a single-entry, zero-latency FIFO on each TL-UL connection to avoid a combinatorial loop that would otherwise exist between the LSU and Instruction fetch ports of the Ibex when the HyperRAM is presented to both ports. - Use a single SRAM model of the HyperRAM for simulation purposes (when requested) and for the Sonata XL synthesis target. - SRAM model is dual-ported on the TL-UL bus, increasing performance on the Sonata XL target too. - Support simulation of Sonata XL (with TARGET_XL_BOARD defined).
Background:
The HyperRAM interface was previously a FPGA-only design, making it difficult to develop a more sophisticated implementation that could offer higher performance. The first implementation was simple and reliable, but suffered comparatively poor performance on account of (i) issuing each individual TL-UL read/write transaction in isolation to the HBMC/HyperRAM device. (HyperRAM is intended to offer high throughput using burst transfers, but at the penalty of comparatively high latency), and (ii) the HyperRAM was presented as a single slave port to both the I and D ports.
This body of work on improving the performance of the HyperRAM interface thus started by aiming to bring up the HBMC in Verilator simulation, making it possible to develop/debug a more sophisticated interface with buffering and write coalescing logic.
Performance on FPGA
These figures are from executing the program
sw/cheri/checks/hyperram_testwhich has been extended with some additional directed tests exercising the modified logic/functionality:Previous implementation (the numbers are cycle counts; lower numbers represent faster execution):
Modified implementation, using the RTL from this PR:
Linear code execution from HyperRAM
Code (to be raised separately) is just an 8KiB sequence of
cincoffset ca0, ca0, 0x1instructions, used as a single-cycle 'no-op' but with the ability to check that all instructions have been executed as intended.Previous implementation:
Modified implementation, using the RTL from this PR: