
Implementing RoCEv2 passive receiver#198

Draft
Skinbow wants to merge 9 commits into enjoy-digital:master from Skinbow:rdma_pr

Conversation


@Skinbow Skinbow commented Mar 31, 2026

This PR adds RoCEv2 (RDMA over Converged Ethernet v2, where RDMA stands for Remote Direct Memory Access) support to LiteEth.

In particular, this pull request is built to comply with the InfiniBand Architecture Specification Volume 1 (Release 1.2.1).
It implements the Transport Layer (Chapter 9) of the InfiniBand protocol as well as the CM (Communication Management) protocol (Chapter 12) for standard connection establishment.
This is done on top of UDP/IP (similarly to the LiteEthUDPIPCore) to comply with Annex 17 (RoCEv2) of the InfiniBand specification.

The changes are gathered in the LiteEthRoCEv2Core, which adds the RDMA layers on top of the IP/UDP stack. The stack remains usable for normal UDP connections through ports other than 4791, the IANA-assigned port for the RoCEv2 protocol.
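The port sharing amounts to a dispatch on the UDP destination port: datagrams arriving on port 4791 are routed to the RDMA stack, everything else stays available to user UDP ports. A behavioral Python sketch of the idea (function and sink names are hypothetical, not the actual LiteEth crossbar logic):

```python
ROCEV2_UDP_PORT = 4791  # IANA-assigned UDP port for RoCEv2

def dispatch(dst_port, payload, roce_sink, udp_sinks):
    """Route a received UDP datagram either to the RDMA stack or to a user port.

    roce_sink is a list standing in for the IBT/MAD input; udp_sinks is a
    dict of per-port lists standing in for ordinary LiteEth UDP user ports.
    """
    if dst_port == ROCEV2_UDP_PORT:
        roce_sink.append(payload)                            # handled by the RDMA layers
    else:
        udp_sinks.setdefault(dst_port, []).append(payload)   # normal UDP traffic
```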

Only the receiver side of the protocol is implemented, meaning that the server is a passive TCA (Target Channel Adapter) which can have one QP (Queue Pair) in RC (Reliable Connection) mode, as well as a Special QP (QP1) used for CM in UD (Unreliable Datagram) mode.

Thus, an HCA (Host Channel Adapter) can establish a connection and issue RDMA Read, Write, and Send operations to this module, and receive acknowledgements as specified by the protocol. This allows existing hardware such as Mellanox ConnectX cards to communicate with the FPGA through InfiniBand's Verbs API as well as other calls to the rdma-core library. Regular network cards can also communicate over RoCEv2 using Soft-RoCE, the software implementation available through the rdma-core stack.

Tests

This module has been tested over 1000BASE-T and 1000BASE-LX connections to a Mellanox ConnectX-4 Lx, as well as with a regular network card using Soft-RoCE.

It can be tested using the test.c program.

Here are some test results obtained with the Mellanox card:

Test of an RDMA_WRITE followed by an RDMA_READ (twice):

```
$ ./test -C 2 -c -a 192.170.1.50 -S 32 -V
Sent:
\xd3\x6d\x51\x9b\xb5\x59\x76\x7f\xa7\x4f\x79\x91\x4d\x1f\xff\xcf\xb3\xd4\x6f\xa4\x46\xb5\xf7\xa2\xe9\x98\x4c\xc6\x90\x5a\x06\xc6
Received:
\xd3\x6d\x51\x9b\xb5\x59\x76\x7f\xa7\x4f\x79\x91\x4d\x1f\xff\xcf\xb3\xd4\x6f\xa4\x46\xb5\xf7\xa2\xe9\x98\x4c\xc6\x90\x5a\x06\xc6

Sent:
\x49\x9b\x2e\x3d\xe2\x69\x6e\xa4\x65\xa6\x2a\xd8\xd5\xa3\xb5\x46\xc2\x7d\x3e\x5b\x36\xe6\x79\x17\x7f\xec\x05\x7a\x43\x7a\xfb\x5a
Received:
\x49\x9b\x2e\x3d\xe2\x69\x6e\xa4\x65\xa6\x2a\xd8\xd5\xa3\xb5\x46\xc2\x7d\x3e\x5b\x36\xe6\x79\x17\x7f\xec\x05\x7a\x43\x7a\xfb\x5a
```

The transmission as captured in Wireshark:
[Screenshot from 2026-04-02 15-03-45: Wireshark capture of the exchange]

Benchmarks with RDMA_WRITEs and RDMA_READs:

```
$ ./test -b -w -C 1000 -c -a 192.170.1.50 -S 1024 -V
Sent 16 1024-byte RDMA_WRITEs 1000 times in 0.164135s
This amounts to a transfer rate of MB/s: 99.820227
$ ./test -b -w -C 1000 -c -a 192.170.1.50 -S 2048 -V
Sent 16 2048-byte RDMA_WRITEs 1000 times in 0.314078s
This amounts to a transfer rate of MB/s: 104.330698
$ ./test -b -w -C 1000 -c -a 192.170.1.50 -S 32768 -V
Sent 16 32768-byte RDMA_WRITEs 1000 times in 4.564372s
This amounts to a transfer rate of MB/s: 114.865314
$
$ ./test -b -C 1000 -c -a 192.170.1.50 -S 1024 -V
Sent 16 1024-byte RDMA_READs 1000 times in 0.395153s
This amounts to a transfer rate of MB/s: 41.462421
$ ./test -b -C 1000 -c -a 192.170.1.50 -S 2048 -V
Sent 16 2048-byte RDMA_READs 1000 times in 0.537711s
This amounts to a transfer rate of MB/s: 60.939812
$ ./test -b -C 1000 -c -a 192.170.1.50 -S 32768 -V
Sent 16 32768-byte RDMA_READs 1000 times in 4.784942s
This amounts to a transfer rate of MB/s: 109.570390
```
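For reference, the reported rates follow directly from the payload volume: 16 RDMA operations of S bytes, repeated C times, divided by the elapsed time. A quick Python check against the first benchmark line:

```python
def rate_mb_s(num_ops, size_bytes, repeats, seconds):
    """Transfer rate in decimal MB/s, as reported by the test program."""
    return num_ops * size_bytes * repeats / seconds / 1e6

# First RDMA_WRITE line: 16 x 1024-byte writes, 1000 repeats, 0.164135 s
print(rate_mb_s(16, 1024, 1000, 0.164135))  # ≈ 99.82 MB/s
```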

Related changes

This pull request relies on a couple of changes elsewhere in the LiteX ecosystem, such as an additional last signal in the LiteDRAMDMAWriter of LiteDRAM's frontend/dma.py (see "Added last signal to LiteDRAMWriter #377").
It also relies on a small change to the Header class in LiteX's soc/interconnect/packet.py (see "Allow for changing starting position of header #2441").

Mikhail Iakimenko added 4 commits April 1, 2026 11:08
Introduces various headers from the RoCEv2 protocol for use in the IBT as well as the MAD layer for CM
Adds IBT and MAD-specific sink/source layout descriptions.
Adds a modified Header class, ModHeader, that allows encoding and decoding with an offset.
	This is necessary as RoCEv2 packets have variable, opcode-dependent headers.
Adds enums representing IBT and MAD opcodes for better readability.
Sets RoCEv2-related constants.
Adds several helper functions.
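To give an idea of what these layout descriptions encode, here is how the 12-byte Base Transport Header (BTH) that opens every IBT packet can be decoded in plain Python. This is a reader's sketch following the field layout in Chapter 9 of the spec, not code from this PR:

```python
import struct

def parse_bth(data: bytes) -> dict:
    """Decode the 12-byte Base Transport Header (InfiniBand spec, Chapter 9).

    Layout: opcode (8b), SE|M|PadCnt|TVer (8b), P_Key (16b),
    reserved (8b) + DestQP (24b), AckReq (1b) + reserved (7b) + PSN (24b).
    """
    opcode, flags, pkey, dqp_raw, psn_raw = struct.unpack("!BBHII", data[:12])
    return {
        "opcode":  opcode,
        "se":      bool(flags & 0x80),       # solicited event
        "migreq":  bool(flags & 0x40),       # migration request
        "pad_cnt": (flags >> 4) & 0x3,       # bytes of padding after the payload
        "tver":    flags & 0xF,              # transport header version
        "pkey":    pkey,
        "dest_qp": dqp_raw & 0xFFFFFF,       # top byte is reserved (resv8a)
        "ack_req": bool(psn_raw & 0x80000000),
        "psn":     psn_raw & 0xFFFFFF,       # packet sequence number
    }
```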
RoCEv2 requires the don't-fragment (DF) flag to be set to 1 in the IP header
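As a small illustration (not LiteEth code), the DF bit sits at bit 14 of the IPv4 flags/fragment-offset word:

```python
import struct

# IPv4 flags/fragment-offset word: bits 15..13 are (reserved, DF, MF),
# bits 12..0 are the fragment offset.  RoCEv2 mandates DF = 1.
IP_DF = 0x4000
IP_MF = 0x2000

def flags_fragment_word(fragment_offset: int = 0, df: bool = True, mf: bool = False) -> bytes:
    """Build the 16-bit IPv4 flags + fragment-offset field (network byte order)."""
    word = fragment_offset & 0x1FFF
    if df:
        word |= IP_DF
    if mf:
        word |= IP_MF
    return struct.pack("!H", word)
```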
Introduces numerous submodules useful for the IBT and the MAD CM
Adds icrc.py:
	Handles the Invariant Cyclic Redundancy Check (ICRC) required by
	the IBT. The submodule needs to be placed between the UDP and IBT
	output and listens to IP's output to calculate the CRC.
Adds pad.py:
	The IBT requires all packets to be aligned to a size of 4 bytes,
	and since all IBT headers have a size that is a multiple of 4,
	the alignment needs to be ensured using the payload's size.
Adds qp.py:
	RDMA uses QPs to establish communication. The current
	implementation only allows one generic (RC) QP and one special
	(QP1, UD) QP to exist.
Adds mr.py:
	Memory regions to be handled by QPs.
	!! To function properly, LiteDRAMDMAWriter must be patched and
	!! have a last signal added to it
These submodules are essential for the RDMA implementation as they add
specialized FIFOs (WaitPipe) and a modified packetizer and depacketizer.

The packetizer and depacketizer needed changing to accommodate the
variable, opcode-dependent headers of the IBT and MAD layers.

The specialized FIFO is needed to pipe entire packets and instantly
dump them in case of an error.
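The commit-or-discard behavior can be modeled in a few lines of Python (a behavioral model for illustration; the real WaitPipe is a Migen module operating on stream endpoints):

```python
class WaitPipeModel:
    """Behavioral model of a commit-or-discard packet FIFO.

    Words of a packet are staged until the packet ends, then either
    released downstream (e.g. the ICRC checked out) or dropped wholesale.
    """

    def __init__(self):
        self._staged = []   # words of the packet currently being received
        self.output = []    # words released downstream

    def push(self, word):
        self._staged.append(word)

    def commit(self):
        """Packet ended and passed its checks: release it downstream."""
        self.output.extend(self._staged)
        self._staged = []

    def dump(self):
        """Packet ended with an error: drop it instantly."""
        self._staged = []
```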
Mikhail Iakimenko added 5 commits April 1, 2026 15:00
Adds the MAD CM layer logic for communication establishment and QP
configuration.
This simplifies the client-side API calls through the use of the
librdmacm library (part of rdma-core).
Adds the IBT layer of RoCEv2 and connects it to MAD CM.
Adds the entire stack connected together, similar to the UDPIPCore, but
with the IBT and MAD CM layers on top.
For ICRC to function properly on send, the IPTX cannot be buffered
Adds tests for various submodules, as well as the entire RoCEv2 core.
