@alees24 alees24 commented Jul 23, 2025

This PR extends the existing HyperRAM test software and adds new tests specific to the recent HyperRAM development work.

The HyperRAM interface is now considerably more involved: it retains data in read buffers (on both the Instruction and Data ports) as well as in a write burst that is under construction. Additional tests are therefore introduced to catch edge cases and to check for coherency issues, both between write and read traffic and between the Instruction and Data ports.

To this end, a couple of the tests make extensive use of handwritten assembler to achieve the required bus traffic and performance. Compiled code was used initially, but it results in well-separated individual read or write transactions and thus does not stress the logic.

A note on coding style

Those tests that remain in C++ have also been carefully expressed in places to ensure that the traffic occurs in a short time interval, because otherwise - for example - the hardware will have flushed write traffic out to memory and the test would not be exercising read-write address collisions as intended. In particular note that the pseudo-random number generation is a slow process at the level of CPU cycles/bus transactions, and must thus be avoided between bus transactions that must occur close together.
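As a rough illustration of that style (not the code in this PR), the sketch below draws all pseudo-random values up front and only then issues the accesses back to back. The function name `write_then_read_collisions`, the xorshift generator `prng_next` and the constant `kBurstWords` are all hypothetical and chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical PRNG (xorshift32); it stands in for whatever generator a test uses.
static inline uint32_t prng_next(uint32_t &state) {
  state ^= state << 13;
  state ^= state >> 17;
  state ^= state << 5;
  return state;
}

constexpr size_t kBurstWords = 8;  // assumed number of accesses issued close together

// Writes pseudo-random words to pseudo-random offsets and reads each one back
// immediately. Returns the number of read-back mismatches.
size_t write_then_read_collisions(volatile uint32_t *mem, size_t mem_words,
                                  uint32_t seed) {
  assert(mem_words > 0 && seed != 0);  // xorshift32 needs a non-zero seed

  uint32_t offsets[kBurstWords];
  uint32_t values[kBurstWords];
  uint32_t state = seed;

  // Phase 1 (slow): all randomisation and the modulo happen here, away from
  // the timing-sensitive accesses.
  for (size_t i = 0; i < kBurstWords; ++i) {
    offsets[i] = prng_next(state) % mem_words;
    values[i] = prng_next(state);
  }

  // Phase 2 (time-critical): nothing but loads, stores and a compare, so each
  // read follows its write before the write data is likely to be flushed.
  size_t mismatches = 0;
  for (size_t i = 0; i < kBurstWords; ++i) {
    mem[offsets[i]] = values[i];
    if (mem[offsets[i]] != values[i]) {
      ++mismatches;
    }
  }
  return mismatches;
}
```

On target, `mem` would point into the HyperRAM address range; on a host build any word array will do for checking the logic.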

A brief overview of the modified hardware:

  • Data port has a number of 32-byte read buffers, plus one under-construction write burst. Contiguous, ascending write accesses can be coalesced into a burst write of up to 32 bytes. Contiguous descending writes can be coalesced up to a total of 8 bytes (2 words).
  • Instruction port has a number of 32-byte read buffers.
  • Writes to addresses that lie within the buffered read data will cause the data to be updated (both Data and Instruction ports exhibit this coherency).
  • After writing a block of code into the memory, a non-contiguous write is required to ensure that the write data is flushed out, if the code is to be executed immediately and the code is short. (Only 32 bytes may be retained in the construction of a burst write, and write data is flushed out after tens of clock cycles, rather than being retained indefinitely.)
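The coalescing limits listed above can be pictured with a small software model. This is only a sketch under stated assumptions, not the RTL: it handles word-sized writes only, ignores the flush timeout and any alignment boundaries the hardware may impose, and the names `WriteCoalescerModel`, `Burst`, `write_word` and `flush` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Burst {
  uint32_t base;   // lowest byte address covered by the burst
  uint32_t bytes;  // burst length in bytes
};

// Toy model: one burst under construction, flushed when the next write cannot
// be coalesced into it.
class WriteCoalescerModel {
 public:
  void write_word(uint32_t addr) {
    assert((addr & 3u) == 0);  // model assumption: word-aligned writes only
    if (open_) {
      const bool ascending = dir_ >= 0 && addr == cur_.base + cur_.bytes &&
                             cur_.bytes + 4 <= kMaxAscendingBytes;
      const bool descending = dir_ <= 0 && addr + 4 == cur_.base &&
                              cur_.bytes + 4 <= kMaxDescendingBytes;
      if (ascending) {
        cur_.bytes += 4;
        dir_ = 1;
        return;
      }
      if (descending) {
        cur_.base = addr;
        cur_.bytes += 4;
        dir_ = -1;
        return;
      }
      flush();  // non-contiguous or over-length write: flush what we have
    }
    cur_ = Burst{addr, 4};
    dir_ = 0;  // direction is undecided while the burst holds a single word
    open_ = true;
  }

  // Stands in for the hardware timeout that eventually flushes the burst.
  void flush() {
    if (open_) {
      flushed_.push_back(cur_);
      open_ = false;
    }
  }

  const std::vector<Burst> &flushed() const { return flushed_; }

 private:
  static constexpr uint32_t kMaxAscendingBytes = 32;  // from the overview above
  static constexpr uint32_t kMaxDescendingBytes = 8;  // 2 words
  Burst cur_{0, 0};
  int dir_ = 0;
  bool open_ = false;
  std::vector<Burst> flushed_;
};
```

In this model eight ascending word writes produce one 32-byte burst, whereas four descending word writes produce two 8-byte bursts, and a write to an unrelated address forces out whatever burst was under construction.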

The tests added are as follows:

  • performance test; memory-to-memory copy using bursts.
  • write tests; specialised implementations of memset-like memory initialisation code to exercise the write coalescing logic of the HyperRAM controller interface.
  • alignment tests; exercise all possible alignments of source and destination addresses, to check the behaviour of wrapping read bursts and coalesced linear write bursts.
  • buffering test; issuing pseudo-random write traffic into a destination buffer with known content.
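For a feel of what an alignment sweep involves, here is a minimal sketch of the loop structure only. It assumes a 32-byte buffer size, as in the hardware overview, and uses a plain byte copy where the real tests use handwritten assembler, so it does not itself generate burst traffic. The names `alignment_sweep`, `copy_bytes`, `kCopyBytes` and `kGuard` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kBufBytes = 32;   // buffer/burst size taken from the overview
constexpr size_t kCopyBytes = 48;  // assumed copy length, longer than one buffer
constexpr size_t kRegionBytes = kBufBytes + kCopyBytes;
constexpr uint8_t kGuard = 0xA5;   // fill value used to detect stray writes

// Stand-in for the routine under test.
static void copy_bytes(volatile uint8_t *dst, const volatile uint8_t *src,
                       size_t n) {
  for (size_t i = 0; i < n; ++i) {
    dst[i] = src[i];
  }
}

// Tries every (source, destination) byte offset within one buffer and checks
// both the copied data and the untouched bytes around it. Both regions must
// be at least kRegionBytes long. Returns the number of incorrect bytes.
size_t alignment_sweep(volatile uint8_t *src, volatile uint8_t *dst) {
  assert(src != nullptr && dst != nullptr);
  size_t failures = 0;
  for (size_t s = 0; s < kBufBytes; ++s) {
    for (size_t d = 0; d < kBufBytes; ++d) {
      // Fresh, offset-dependent source pattern and a guard-filled destination.
      for (size_t i = 0; i < kRegionBytes; ++i) {
        src[i] = static_cast<uint8_t>(i * 7 + s + d);
        dst[i] = kGuard;
      }
      copy_bytes(dst + d, src + s, kCopyBytes);
      for (size_t i = 0; i < kRegionBytes; ++i) {
        uint8_t expected = kGuard;
        if (i >= d && i < d + kCopyBytes) {
          expected = src[s + (i - d)];
        }
        if (dst[i] != expected) {
          ++failures;
        }
      }
    }
  }
  return failures;
}
```

Checking the guard bytes on either side of the destination matters as much as checking the payload, since a mis-coalesced write burst could overwrite neighbouring data.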

Sample output from the current HyperRAM implementation (FPGA):

Get hyped for hyperram!
Running RND cap test...PASS!
Running RND data test...PASS!
Running RND data & address test...PASS!
Running 0101 stripe test... (7340114 cycles)...PASS!
Running 1001 stripe test... (7340123 cycles)...PASS!
Running 0000_1111 stripe test... (7340119 cycles)...PASS!
Running Execution test... (15666057 cycles)...PASS!
Running performance test with icache enabled...
    copy:   17054 - cmp:  24632 - total:  41686...    copy:   18538 - cmp:  24633 - total:  43171...PASS!
Running performance test with icache disabled...
    copy:   17071 - cmp:  24883 - total:  41954...    copy:   42925 - cmp:  24887 - total:  67812...PASS!
Running alignment tests with cleaning...
without cleaning...
PASS!

Sample output from the modified HyperRAM implementation (FPGA), which is to be raised in another PR:

Get hyped for hyperram!
Running RND cap test...PASS!
Running RND data test...PASS!
Running RND data & address test...PASS!
Running 0101 stripe test... (4947986 cycles)...PASS!
Running 1001 stripe test... (4947987 cycles)...PASS!
Running 0000_1111 stripe test... (4947985 cycles)...PASS!
Running Execution test... (9792557 cycles)...PASS!
Running performance test with icache enabled...
    copy:    3754 - cmp:   6327 - total:  10081...    copy:    3825 - cmp:   6325 - total:  10150...PASS!
Running performance test with icache disabled...
    copy:    3754 - cmp:   6194 - total:   9948...    copy:    3819 - cmp:   6194 - total:  10013...PASS!
Running alignment tests with cleaning...
without cleaning...
PASS!

The test checks/hyperram_test has been modified to run indefinitely until one or more failures are observed within an iteration, at which point it halts. It may thus be used to soak test an FPGA build overnight, and it has run throughout a couple of nights without failure.
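The run-until-failure behaviour amounts to a loop of the following shape. This is a sketch rather than the code in checks/hyperram_test: `soak`, `SoakResult` and the `max_iterations` cap (added so the loop can be bounded when tried on a host) are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct SoakResult {
  uint64_t iterations;  // number of iterations started
  unsigned failures;    // failures reported by the last iteration
};

// Repeats `run_iteration` until it reports at least one failure.
// `max_iterations == 0` means no limit, i.e. run indefinitely.
SoakResult soak(unsigned (*run_iteration)(), uint64_t max_iterations = 0) {
  assert(run_iteration != nullptr);
  SoakResult result{0, 0};
  while (max_iterations == 0 || result.iterations < max_iterations) {
    ++result.iterations;
    result.failures = run_iteration();
    if (result.failures != 0) {
      break;  // halt on the first failing iteration
    }
  }
  return result;
}
```

Letting a failing iteration finish before halting means that every failing sub-test from that pass can be reported.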

Comment on lines +182 to +194
hyperram_memset_wd:
addi a2, a2, -8
bltz a2, memset_wd_8fix
memset_wd_8:
csw a1, -4(ca0)
csw a1, -8(ca0)
cincoffset ca0, ca0, -8
addi a2, a2, -8
bgez a2, memset_wd_8
memset_wd_8fix:
addi a2, a2, 8
bgtz a2, memset_b_desc_tail
cret
Contributor

Is there a reason why this has an inner loop of two stores rather than the eight of the ascending version?

Contributor Author

The code was just written to perform back-to-back writes until the maximum burst length is achieved; that's just two words for a descending burst, but 8 for an ascending burst. It makes little difference in practice because the write coalescing logic will not time out and flush the burst write to HyperRAM for dozens of cycles and the Ibex will complete the loop overhead in far fewer cycles than that.

}
}

// ----- Avoid the use of randomisation before this point -----
Contributor

Is there any use of randomisation after this point?

Contributor Author

Not presently, at least not on that loop iteration. The point was to demarcate - above and below - the region where it must be avoided.

@elliotb-lowrisc elliotb-lowrisc left a comment

Looks good to me. I appreciated the chance to read some assembly functions.

Comment on lines 71 to 74
copy4:
lw a5, (ca1)
cincoffset ca1, ca1, 4
sw a5, (ca0)
Contributor

Should these lw and sw instructions be written clw and csw for consistency with copy32 above?

Contributor Author

Manual editing after noting I had been using a mix of clw and lw etc. throughout. Tried to tidy for consistency and just missed those, thanks.

clw a4, (ca1)
clw a5, 4(ca1)
clw t0, 8(ca1)
clw t1, 12(ca1) // End of 16 bytes from first buffer.
Contributor

Is this comment correct? It looks like it just read 16 bytes from the "second" buffer (as defined above)

Contributor Author

Fair enough, I'll change that. It just meant the first buffer to be read, rather than how they were named in the function API.

Comment on lines 209 to 210
csrw 0x7c0, a0
cret
Contributor

Accidental indentation?

Contributor Author

Heh; what's happened there is that I've copied some code from another .S file that uses tabs instead of spaces, and the tab characters are not visible in the editor I used.

@lowRISC lowRISC deleted a comment from elliotb-lowrisc Jul 25, 2025
@lowRISC lowRISC deleted a comment from elliotb-lowrisc Jul 25, 2025
@alees24 alees24 merged commit ed86a2e into lowRISC:main Jul 25, 2025
3 checks passed