
Interconnecting with the UBShmTransport Based on the LD/ST Shared Memory Semantics. #3290

Open
zchuango wants to merge 25 commits into apache:master from zchuango:ubshm_transport_dev


@zchuango
Contributor

@zchuango zchuango commented May 9, 2026

What problem does this PR solve?

Issue Number: #3226 #3167 #3217

Problem Summary:
After recent efforts, the UB-Ring framework has been successfully integrated with the BRPC transport framework. High-performance, low-latency communication based on load/store (LD/ST) semantics is now supported. I'm happy to be able to contribute this to the community and look forward to your feedback and reviews. @wwbmmm @chenBright

What is changed and the side effects?

Changed:

  1. The ubring framework is added. This framework implements low-latency data communication based on the shared memory LD/ST semantics.
  2. Currently, the ubring framework supports two modes: POSIX IPC shared memory and ubs-mem remote shared memory.
  3. The ub_shm_type parameter is used to control whether to use the IPC or ubs-mem capability. Currently, ubs-mem can run on the Kunpeng 950 supernode that supports the ub protocol.
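A minimal sketch of how a `ub_shm_type` value could select a backend. It assumes the numbering visible elsewhere in this thread (the logs print `shm type=1` when the IPC backend initializes, and the docs use `ub_shm_type = 2` for UBS-Mem); the class and function names are hypothetical and are not the PR's actual `shm_mgr` API.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical backend interface; the PR's real code lives in shm_mgr.{h,cpp}
// and its names and signatures may differ.
struct ShmBackend {
    virtual ~ShmBackend() = default;
    virtual std::string Name() const = 0;
};

// POSIX IPC shared memory (processes on the same host).
struct IpcShmBackend : ShmBackend {
    std::string Name() const override { return "ipc"; }
};

// ubs-mem remote shared memory (nodes connected over UB).
struct UbsShmBackend : ShmBackend {
    std::string Name() const override { return "ubs-mem"; }
};

// Values as they appear in this thread: 1 = IPC, 2 = UBS-Mem (assumption).
enum UbShmType { kShmTypeIpc = 1, kShmTypeUbs = 2 };

// Returns nullptr for unknown values instead of aborting.
std::unique_ptr<ShmBackend> CreateShmBackend(int ub_shm_type) {
    switch (ub_shm_type) {
        case kShmTypeIpc: return std::make_unique<IpcShmBackend>();
        case kShmTypeUbs: return std::make_unique<UbsShmBackend>();
        default:          return nullptr;
    }
}
```

Returning `nullptr` for an unknown value leaves the caller free to fall back to TCP, in line with the TCP fallback path the transport is described as having.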
Side effects:
  • Performance effects: N/A

  • Breaking backward compatibility:


Check List:

Comment thread src/brpc/ubshm_transport.h Outdated
Contributor

Copilot AI left a comment


Pull request overview

Adds a new UBRing-based shared-memory transport mode to brpc (IPC + optional ubs-mem backend) and wires it into the Socket/Transport framework, along with docs and a performance example.

Changes:

  • Introduce UBRing transport (SOCKET_MODE_UBRING) with endpoint handshake, polling, and ring manager infrastructure.
  • Add shared-memory backend abstraction (POSIX IPC + ubs-mem via dlopen’d SDK stubs/headers) plus timer utilities.
  • Update build/docs/examples to expose the feature and provide a basic performance harness.

Reviewed changes

Copilot reviewed 43 out of 43 changed files in this pull request and generated 15 comments.

File Description
src/brpc/ubshm/ubs_mem/ubshmem_stub.cpp Adds stub implementations of ubs-mem APIs for non-ubs environments/UT.
src/brpc/ubshm/ubs_mem/ubs_mem.h Introduces ubs-mem C API header used by the UBS backend integration.
src/brpc/ubshm/ubs_mem/ubs_mem_def.h Defines ubs-mem types/constants used by the UBS backend integration.
src/brpc/ubshm/ubs_mem/declare_shm_ubs.h Declares the dynamically loaded ubs-mem function pointer table.
src/brpc/ubshm/ubr_trx.h Defines core UBR transaction structures and states.
src/brpc/ubshm/ubr_msg.h Defines UBR message chunk format used by the ring transport.
src/brpc/ubshm/ub_ring.h Declares UBRing read/write and lifecycle APIs used by the endpoint.
src/brpc/ubshm/ub_ring_manager.h Declares global manager for UBR transactions and link bookkeeping.
src/brpc/ubshm/ub_ring_manager.cpp Implements UBR transaction manager and UB event callback plumbing.
src/brpc/ubshm/ub_helper.h Declares UBRing global init/availability helpers.
src/brpc/ubshm/ub_helper.cpp Implements global init/fini, availability flags, and polling init.
src/brpc/ubshm/ub_endpoint.h Declares UB shared-memory endpoint and polling infrastructure.
src/brpc/ubshm/ub_endpoint.cpp Implements handshake, polling loop, and I/O integration with Socket/InputMessenger.
src/brpc/ubshm/timer/timer_mgr.h Declares timer module used by UBS cleanup/recovery flows.
src/brpc/ubshm/timer/timer_mgr.cpp Implements epoll/kqueue-based timer dispatch for UBRing subsystems.
src/brpc/ubshm/shm/shm_ubs.h Declares UBS backend shared-memory operations.
src/brpc/ubshm/shm/shm_ubs.cpp Implements UBS backend via dynamically loaded ubs-mem SDK.
src/brpc/ubshm/shm/shm_mgr.h Declares backend-agnostic SHM manager interface.
src/brpc/ubshm/shm/shm_mgr.cpp Implements SHM manager selecting IPC vs UBS backend via flag.
src/brpc/ubshm/shm/shm_ipc.h Declares POSIX IPC SHM backend operations.
src/brpc/ubshm/shm/shm_ipc.cpp Implements POSIX IPC SHM backend operations.
src/brpc/ubshm/shm/shm_def.h Adds SHM structs/constants used across SHM backends and UBRing.
src/brpc/ubshm/common/thread_lock.h Adds RAII-style mutex/spin/rwlock/semaphore guard macros.
src/brpc/ubshm/common/common.h Adds common macros/types/constants used throughout UBRing code.
src/brpc/ubshm_transport.h Declares UBShmTransport implementing the Transport interface.
src/brpc/ubshm_transport.cpp Implements transport selection between UBRing and TCP fallback paths.
src/brpc/transport_factory.cpp Wires SOCKET_MODE_UBRING into transport creation/context init.
src/brpc/socket.h Adds UB endpoint/connect friend declarations for Socket integration.
src/brpc/socket_mode.h Adds SOCKET_MODE_UBRING enum value.
src/brpc/rdma_transport.cpp Adjusts RDMA transport’s TCP fallback member initialization (currently broken).
src/brpc/input_messenger.h Adds UB endpoint friend declaration to support message processing hooks.
src/brpc/input_messenger.cpp Extends RDMA-special message queuing behavior to UBRing sockets.
src/brpc/controller.h Guards latency_us() against unset begin time.
README.md Adds docs link for UBRing.
README_cn.md Adds docs link for UBRing (CN).
example/ubring_performance/test.proto Adds proto for UBRing performance test example.
example/ubring_performance/server.cpp Adds UBRing-capable perf test server example.
example/ubring_performance/client.cpp Adds UBRing-capable perf test client example.
example/ubring_performance/CMakeLists.txt Adds standalone CMake build for the performance example.
docs/en/ubring.md Documents build/run/configuration and backend selection for UBRing.
docs/cn/ubring.md Chinese documentation for UBRing build/run/configuration.
CMakeLists.txt Adds WITH_UBRING option and compile definition wiring.


Comment thread src/brpc/rdma_transport.cpp Outdated
Comment thread src/brpc/ubshm_transport.cpp
Comment thread src/brpc/ubshm/ub_endpoint.cpp
Comment thread src/brpc/ubshm/ub_endpoint.cpp Outdated
Comment thread src/brpc/ubshm/ub_endpoint.cpp Outdated
Comment thread src/brpc/ubshm/ub_ring_manager.cpp
Comment thread src/brpc/ubshm/shm/shm_ubs.cpp Outdated
Comment thread src/brpc/ubshm/shm/shm_mgr.cpp
Comment thread CMakeLists.txt Outdated
Comment thread src/brpc/ubshm_transport.cpp
Comment thread docs/cn/ubring.md
g_last_time.store(0, butil::memory_order_relaxed);

brpc::ServerOptions options;
options.socket_mode = FLAGS_use_ubring? brpc::SOCKET_MODE_UBRING : brpc::SOCKET_MODE_TCP;
Contributor


It would be better for the example to leave brpc::ServerOptions socket_mode at its default (TCP mode).

Contributor Author


It follows the code style of example/rdma_performance; switching to the default TCP mode also works fine.

return -1;
}
ubring::GlobalUBInitializeOrDie();
if (!ubring::InitPollingModeWithTag(bthread_self_tag())) {
Contributor


Does ubring only support polling mode?

Contributor Author


Yes. LD/ST shared memory has this limitation, so currently only polling mode is supported. A waiting (event-notification) mode would require support from the OS kernel or the hardware.
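To illustrate why LD/ST shared memory implies polling: a plain store into shared memory raises no interrupt or completion, so the receiver can only detect new data by repeatedly loading a word that the sender updates. This is a hypothetical illustration, not the PR's poller. `std::atomic` stands in for a word in the shared segment, which is only valid across processes if the atomic type is lock-free.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Stand-in for a word in a shared-memory segment that the peer updates with a
// release store. Nothing notifies the reader; it has to keep loading it.
struct Doorbell {
    std::atomic<uint64_t> seq{0};
};

// Polls until doorbell->seq differs from last_seen, spinning for a bounded
// number of iterations before yielding the CPU. Returns the new sequence.
uint64_t PollForUpdate(const Doorbell* doorbell, uint64_t last_seen,
                       int spin_budget = 1024) {
    int spins = 0;
    for (;;) {
        // Acquire pairs with the writer's release store, so data written
        // before the doorbell update is visible once the new value is seen.
        uint64_t cur = doorbell->seq.load(std::memory_order_acquire);
        if (cur != last_seen) {
            return cur;
        }
        if (++spins >= spin_budget) {
            std::this_thread::yield();  // let other threads run, then resume polling
            spins = 0;
        }
    }
}
```

The spin budget is the usual trade-off in polling designs: more spinning lowers latency but burns a core, which matches the high client CPU utilization seen in the performance logs later in this thread.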

Comment thread docs/cn/ubring.md

### 2. UBS-Mem Remote Shared Memory (ub\_shm\_type = 2)

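This mode uses ubs-mem (Unified Block Storage Memory), an open-source remote shared memory framework from openEuler. It supports shared-memory communication between nodes within a rack, similar to RDMA but with simpler deployment requirements.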
Contributor


Can you list the libraries that need to be used?

Contributor Author


Okay, I'll list the dependent libraries later.

@zchuango
Contributor Author

The code addressing the review comments has been updated. Please review it again. @wwbmmm @chenBright @yanglimingcn

while (curIov < iovcnt && pktRemainN > 0) {
iovRemain = (iov[curIov].iov_len - curIovPos);
fulled = iovRemain > pktRemainN ? pktRemainN : iovRemain;
memcpy((msg->payload.inner + (curPktLen - (uint8_t)pktRemainN)),
Contributor


Does ub simply copy the memory from iobuf to complete the transfer?

Contributor


So, is it correct to understand that there is a memory copy involved?

Contributor Author


Does ub simply copy the memory from iobuf to complete the transfer?

This memcpy fills the UbrMsgFormat to build a 64-byte remote packet (4B header + 60B body). Then Copy64Byte(...) flushes it to the remote node's dataMsg buffer (see this line: Copy64Byte((int8_t *)&dataMsg[_trx->ubrTx.writePos], (int8_t *)msg);). The 64B limit comes from the UB transport's atomic semantics: remote writes must be indivisible.


Yes, that's correct. Memory copies are involved at two distinct stages:

  • Local assembly: The memcpy composes the payload into the 64-byte UbrMsgFormat message (header + body) in local memory.
  • Remote transfer: Copy64Byte then writes that assembled 64-byte block to the remote buffer.
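The two copy stages can be sketched as follows. This is an illustrative reconstruction, not the PR's code: the `Msg64` layout (a 4-byte header holding the payload length, plus a 60-byte body) is an assumption based on the description above, and `CopyMsg64` is a plain `memcpy` standing in for `Copy64Byte`, so it does not provide the indivisible 64-byte store that UB hardware is said to require.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/uio.h>  // struct iovec

// Hypothetical 64-byte message: 4-byte header + 60-byte body.
// The real UbrMsgFormat fields in ubr_msg.h may differ.
struct alignas(64) Msg64 {
    uint32_t header;   // here: number of valid bytes in body (assumption)
    uint8_t body[60];
};
static_assert(sizeof(Msg64) == 64, "message must be exactly 64 bytes");

// Stand-in for Copy64Byte (stage 2): one 64-byte copy into a ring slot.
// Plain memcpy gives no single-transaction guarantee.
inline void CopyMsg64(Msg64* dst, const Msg64* src) {
    std::memcpy(dst, src, sizeof(Msg64));
}

// Gathers bytes from an iovec array into consecutive 64-byte messages.
// Stage 1 assembles `msg` locally; stage 2 writes it to the next slot.
// Returns the number of slots written; stops early when max_slots is reached.
size_t PackIovToSlots(const iovec* iov, int iovcnt, Msg64* slots,
                      size_t max_slots) {
    size_t used = 0;
    int cur = 0;     // current iovec
    size_t pos = 0;  // offset inside iov[cur]
    while (cur < iovcnt && used < max_slots) {
        Msg64 msg{};
        size_t filled = 0;
        while (cur < iovcnt && filled < sizeof(msg.body)) {
            size_t remain = iov[cur].iov_len - pos;
            size_t room = sizeof(msg.body) - filled;
            size_t n = remain < room ? remain : room;
            std::memcpy(msg.body + filled,
                        static_cast<const uint8_t*>(iov[cur].iov_base) + pos, n);
            filled += n;
            pos += n;
            if (pos == iov[cur].iov_len) { ++cur; pos = 0; }
        }
        if (filled == 0) break;  // only empty iovecs were left
        msg.header = static_cast<uint32_t>(filled);
        CopyMsg64(&slots[used++], &msg);
    }
    return used;
}
```

Both stages go through the CPU; only the destination differs (local stack memory versus the mapped remote buffer), which is the distinction drawn in the two bullets above.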

Contributor


Can I then assume that UB is not zero-copy, compared to RdmaEndpoint?

Contributor Author


Not quite. Both are DMA implementations at the hardware level; the difference is in access semantics and usage model.

  • UB mem uses memory-semantic DMA: remote memory is directly addressable via standard load/store instructions, just like local RAM. There is no kernel involvement and there are no explicit verbs; data movement is implicit and transparent.

  • RdmaEndpoint uses message-semantic DMA: while the NIC's DMA engine also bypasses the CPU, the application must explicitly trigger transfers via verbs (send/recv/read/write) with memory registration and QP management.

Core idea: Both achieve zero-copy through DMA, but UB mem is implicit memory access (ld/st), RDMA is explicit message passing (verbs).
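A minimal sketch of what "implicit memory access (ld/st)" means in practice: once a page is mapped into two processes, a plain store in one is visible to a plain load in the other, with no send/recv call on the data path. It assumes an anonymous `MAP_SHARED` mapping inherited across `fork()` as a stand-in for the POSIX IPC segment or the imported ubs-mem address; it says nothing about how UB hardware actually moves the data.

```cpp
#include <cassert>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// The child stores `value` into a shared page with a plain store; the parent
// reads it back with a plain load. Returns what the parent saw, or -1 on error.
long StoreInChildLoadInParent(long value) {
    void* p = mmap(nullptr, sizeof(long), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    volatile long* slot = static_cast<volatile long*>(p);
    *slot = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, sizeof(long));
        return -1;
    }
    if (pid == 0) {
        *slot = value;  // plain store: no syscall, no message
        _exit(0);
    }
    int status = 0;
    waitpid(pid, &status, 0);  // child has exited, so its store is complete
    long seen = *slot;         // plain load from the same physical page
    munmap(p, sizeof(long));
    return seen;
}
```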

Contributor


@zchuango Is msg->payload.inner the host memory? It seems that memcpy here requires cpu participation. What is the bandwidth of memcpy?

Contributor Author


@zchuango Is msg->payload.inner the host memory? It seems that memcpy here requires cpu participation. What is the bandwidth of memcpy?

Yes, msg->payload.inner resides in host memory (local address space). The memcpy here is used to assemble a 64-byte message packet on the local side.

The actual remote write happens in the next line: Copy64Byte((int8_t *)&dataMsg[_trx->ubrTx.writePos], (int8_t *)msg);, where dataMsg is the virtual address imported from the remote node.

  • The memcpy to msg->payload.inner is bound by the host memory bandwidth.
  • The Copy64Byte to the remote node is bound by the UB hardware bandwidth.
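For a rough answer to the memcpy bandwidth question on a given host, a single-threaded micro-benchmark like the following can be used. It is a sketch: results depend on buffer size relative to the caches, NUMA placement, and CPU frequency, and the volatile read exists only to stop the compiler from discarding the copies.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Rough single-thread memcpy throughput in GB/s (1 GB = 1e9 bytes).
double MeasureMemcpyGBps(size_t bytes, int iterations) {
    assert(bytes > 0 && iterations > 0);
    std::vector<char> src(bytes, 1), dst(bytes, 0);
    const volatile char* vdst = dst.data();
    long checksum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::memcpy(dst.data(), src.data(), bytes);
        // Volatile read of dst so the copy cannot be treated as dead.
        checksum += vdst[static_cast<size_t>(i) % bytes];
    }
    auto end = std::chrono::steady_clock::now();
    volatile long sink = checksum;  // keep the checksum observable
    (void)sink;
    double secs = std::chrono::duration<double>(end - start).count();
    return secs > 0 ? static_cast<double>(bytes) * iterations / secs / 1e9 : 0.0;
}
```

Using a buffer larger than the last-level cache (hundreds of MiB on the Xeon host described later in this thread) measures DRAM bandwidth; a 64-byte copy into a hot cache line, as in the assembly step, is typically far cheaper per byte.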

Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

return _tcp_transport->CutFromIOBufList(buf, ndata);
}

int UBShmTransport::WaitEpollOut(butil::atomic<int> *_epollout_butex,
Contributor


Can polling mode avoid frequent event registration? Registering one event per link seems reasonable, but registering an event on every UBShmTransport::WaitEpollOut call would be too frequent, wouldn't it?

Contributor Author


Yes, this is borrowed from the TCP transport's WaitEpollOut pattern, which uses _io_event for notification. The fundamental issue is that UB shared memory lacks hardware-generated events. Unlike RDMA, where the NIC deposits completions into a CQ for asynchronous notification, UB SHM has no hardware signaling path. Instead, we have to poll the URing buffer state (via isWritable and similar checks) to detect message readiness, which makes the current approach functional but admittedly not the most elegant.
I would like to brainstorm a cleaner design with you; feel free to ping me on WeChat if you're open to a deeper dive. @yanglimingcn
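The polling model described above can be sketched with a single-producer/single-consumer ring whose head and tail indices live next to the slots, as they could in a shared-memory segment. The names (`SpscRing`, `IsWritable`, `TryPush`, `TryPop`) are hypothetical and do not reflect the PR's UBRing structures. The point is that both sides learn about free space or new data only by loading the indices, never from an event.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal SPSC ring of fixed-size slots. Illustrative only.
template <size_t kSlots>
struct SpscRing {
    static_assert((kSlots & (kSlots - 1)) == 0, "kSlots must be a power of two");
    std::atomic<uint64_t> head{0};  // next slot to read, advanced by consumer
    std::atomic<uint64_t> tail{0};  // next slot to write, advanced by producer
    uint64_t slots[kSlots];

    // Producer: is there a free slot? Acquire on head ensures the consumer
    // has finished reading a slot before the producer overwrites it.
    bool IsWritable() const {
        return tail.load(std::memory_order_relaxed) -
                   head.load(std::memory_order_acquire) < kSlots;
    }
    // Consumer: is there unread data? Acquire on tail pairs with TryPush's release.
    bool IsReadable() const {
        return head.load(std::memory_order_relaxed) !=
               tail.load(std::memory_order_acquire);
    }
    // Returns false instead of blocking when full; the caller decides
    // whether to spin, yield, or wait.
    bool TryPush(uint64_t v) {
        if (!IsWritable()) return false;
        uint64_t t = tail.load(std::memory_order_relaxed);
        slots[t & (kSlots - 1)] = v;
        tail.store(t + 1, std::memory_order_release);  // publish the slot
        return true;
    }
    // Returns false when there is nothing to read yet.
    bool TryPop(uint64_t* out) {
        if (!IsReadable()) return false;
        uint64_t h = head.load(std::memory_order_relaxed);
        *out = slots[h & (kSlots - 1)];
        head.store(h + 1, std::memory_order_release);  // free the slot
        return true;
    }
};
```

Keeping `TryPush` non-blocking leaves the waiting policy (spin, yield, or park the caller) outside the ring, which is exactly where the question about how often to register a wait event comes in.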

Contributor


My WeChat ID is "yanglm_28".

@wwbmmm
Contributor

wwbmmm commented May 18, 2026

LGTM

@yanglimingcn
Contributor

The issue has been discussed with the author, and follow-up work will proceed in stages in subsequent PRs.
LGTM

@chenBright
Contributor

There is a compilation error on macOS:

In file included from brpc/src/brpc/ubshm/ub_endpoint.cpp:31:
brpc/src/brpc/ubshm/ub_endpoint.h:87:27: error: use of undeclared identifier 'EPOLLOUT'
   87 |         uint32_t events = EPOLLOUT | EPOLLET;
      |                           ^
brpc/src/brpc/ubshm/ub_endpoint.h:89:56: error: use of undeclared identifier 'EPOLLIN'
   89 |             PollerRegisterEvent(CqSidOp::MOD, events | EPOLLIN);
      |                                                        ^
brpc/src/brpc/ubshm/ub_endpoint.h:96:27: error: use of undeclared identifier 'EPOLLIN'
   96 |         uint32_t events = EPOLLIN | EPOLLET;
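One common way to keep such a header compiling on macOS is to reference `EPOLL*` only under a Linux guard and use neutral flag names elsewhere. This sketch is an assumption about the shape of a fix, not the change the author actually made. `kUbPollIn`, `kUbPollOut`, and `kUbPollEdge` are hypothetical names, and the non-Linux values are placeholders that a kqueue-based poller would have to translate into its own filters.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__linux__)
#include <sys/epoll.h>
// On Linux, reuse the real epoll bits.
constexpr uint32_t kUbPollIn = EPOLLIN;
constexpr uint32_t kUbPollOut = EPOLLOUT;
constexpr uint32_t kUbPollEdge = EPOLLET;
#else
// Placeholder bits for platforms without epoll (e.g. macOS with kqueue);
// a kqueue poller would map these to EVFILT_READ / EVFILT_WRITE / EV_CLEAR.
constexpr uint32_t kUbPollIn = 1u << 0;
constexpr uint32_t kUbPollOut = 1u << 2;
constexpr uint32_t kUbPollEdge = 1u << 31;
#endif

// The two expressions from the error log, written with the portable names.
inline uint32_t WriteEvents() { return kUbPollOut | kUbPollEdge; }
inline uint32_t ReadEvents() { return kUbPollIn | kUbPollEdge; }
```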

@chenBright
Contributor

Please update cmake ci to compile UBShmTransport:

- name: clang with all options
  run: |
    export CC=clang && export CXX=clang++
    mkdir clang_build_all && cd clang_build_all
    cmake -DWITH_MESALINK=OFF -DWITH_GLOG=ON -DWITH_THRIFT=ON -DWITH_RDMA=ON -DWITH_DEBUG_BTHREAD_SCHE_SAFETY=ON -DWITH_DEBUG_LOCK=ON -DWITH_BTHREAD_TRACER=ON -DWITH_ASAN=ON -DCMAKE_POLICY_VERSION_MINIMUM=3.5 ..
    make -j ${{env.proc_num}} && make clean

- name: compile with cmake
  run: |
    echo "CMAKE_PREFIX_PATH=$(brew --prefix protobuf@21)"
    mkdir build && cd build && cmake -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_PREFIX_PATH=$(brew --prefix protobuf@21) ..
    make -j ${{env.proc_num}} && make clean

- name: compile with cmake
  run: |
    mkdir build && cd build && cmake -DCMAKE_POLICY_VERSION_MINIMUM=3.5 -DCMAKE_PREFIX_PATH=$(brew --prefix protobuf@29) ..
    make -j ${{env.proc_num}} && make clean

@zchuango
Contributor Author

zchuango commented May 21, 2026

Please update cmake ci to compile UBShmTransport:

Okay, good suggestion! I will check the CI/testing pipeline later.

@zchuango
Contributor Author

@chenBright I have resolved the macOS compilation error and updated the CMake CI to compile UBShmTransport; please recheck. The current CI test failure seems to be intermittent; the CI tests pass in my own repository.

@chenBright
Contributor

chenBright commented May 26, 2026

@zchuango When I run the ubring demo on Ubuntu, the server crashes.

./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=1048576

./ubring_performance_server -use_ubring=true
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./ubring_performance_server -use_ubring=true'.
Program terminated with signal SIGBUS, Bus error.
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228

warning: 228	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory
[Current thread is 1 (Thread 0x7f69dcff96c0 (LWP 19284))]
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228
#1  0x0000558a87ac463b in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2  brpc::ubring::ShmLocalCalloc (shm=shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:117
#3  0x0000558a879e925c in brpc::ubring::UBRing::UbrAllocateServerShm (this=0x7f69ac058900, remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40,
    local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/ub_ring.cpp:796
#4  0x0000558a879e32e5 in brpc::ubring::UBShmEndpoint::AllocateServerResources (this=this@entry=0x7f66c4023a40,
    remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40, local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90)
    at /brpc/src/brpc/ubshm/ub_endpoint.cpp:712
#5  0x0000558a879e3e6b in brpc::ubring::UBShmEndpoint::ProcessHandshakeAtServer (arg=0x7f66c4023a40)
    at /brpc/src/brpc/ubshm/ub_endpoint.cpp:548
#6  0x0000558a877d11b7 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at /brpc/src/bthread/task_group.cpp:388
#7  0x0000558a8786d6c1 in bthread_make_fcontext ()
#8  0x0000000000000000 in ?? ()

@zchuango
Contributor Author

zchuango commented May 26, 2026

@zchuango When I run the ubring demo on Ubuntu, the server crashes.

./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=6291456

./ubring_performance_server -use_ubring=true
Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./ubring_performance_server -use_ubring=true'.
Program terminated with signal SIGBUS, Bus error.
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228

warning: 228	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory
[Current thread is 1 (Thread 0x7f69dcff96c0 (LWP 19284))]
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228
#1  0x0000558a87ac463b in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2  brpc::ubring::ShmLocalCalloc (shm=shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:117
#3  0x0000558a879e925c in brpc::ubring::UBRing::UbrAllocateServerShm (this=0x7f69ac058900, remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40,
    local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/ub_ring.cpp:796
#4  0x0000558a879e32e5 in brpc::ubring::UBShmEndpoint::AllocateServerResources (this=this@entry=0x7f66c4023a40,
    remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40, local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90)
    at /brpc/src/brpc/ubshm/ub_endpoint.cpp:712
#5  0x0000558a879e3e6b in brpc::ubring::UBShmEndpoint::ProcessHandshakeAtServer (arg=0x7f66c4023a40)
    at /brpc/src/brpc/ubshm/ub_endpoint.cpp:548
#6  0x0000558a877d11b7 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at /brpc/src/bthread/task_group.cpp:388
#7  0x0000558a8786d6c1 in bthread_make_fcontext ()
#8  0x0000000000000000 in ?? ()

@zchuango zchuango closed this May 26, 2026
@zchuango zchuango reopened this May 26, 2026
@zchuango
Contributor Author

zchuango commented May 26, 2026

@zchuango When I run the ubring demo on Ubuntu, the server crashes.

./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=1048576

./ubring_performance_server -use_ubring=true

@chenBright Is the error occurring during startup or a runtime error? Could you please provide relevant environment information, including OS and CPU details, so I can try to reproduce the problem?

@chenBright
Contributor

chenBright commented May 26, 2026

Is the error occurring during startup or a runtime error?

The error occurred at runtime.

Could you please provide relevant environment information, including OS and CPU details

Some environment information:

uname -r

5.10.134-16.3.al8.x86_64

lsb_release -a

No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.1 LTS
Release:	24.04
Codename:	noble
lscpu

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         52 bits physical, 57 bits virtual
  Byte Order:            Little Endian
CPU(s):                  192
  On-line CPU(s) list:   0-191
Vendor ID:               GenuineIntel
  BIOS Vendor ID:        Intel(R) Corporation
  Model name:            Intel(R) Xeon(R) Platinum 8469C
    BIOS Model name:     Intel(R) Xeon(R) Platinum 8469C  CPU @ 2.6GHz
    BIOS CPU family:     179
    CPU family:          6
    Model:               143
    Thread(s) per core:  2
    Core(s) per socket:  48
    Socket(s):           2
    Stepping:            8
    CPU(s) scaling MHz:  82%
    CPU max MHz:         3800.0000
    CPU min MHz:         800.0000
    BogoMIPS:            5200.00
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp
                         lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor
                         ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand
                         lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 invpcid_single intel_ppin cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced
                         tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a avx512f avx512dq rdseed
                         adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total
                         cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req
                         hfi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid bus_lock_detect
                         cldemote movdiri movdir64b enqcmd fsrm uintr md_clear serialize tsxldtrk pconfig arch_lbr amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d
                         arch_capabilities
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   4.5 MiB (96 instances)
  L1i:                   3 MiB (96 instances)
  L2:                    192 MiB (96 instances)
  L3:                    195 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-47,96-143
  NUMA node1 CPU(s):     48-95,144-191
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
  Srbds:                 Not affected
  Tsx async abort:       Not affected

Complete runtime log:

./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=6291456
I0526 23:04:09.249178 98087     0 /workspace/cgm/brpc/src/brpc/server.cpp:1232 StartInternal] Server[DummyServerOf(./ubring_performance_client)] is serving on port=8001.
I0526 23:04:09.249319 98087     0 /workspace/cgm/brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8001 in web browser.
[Threads: 1, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
I0526 23:04:09.257395 98087     0 /workspace/cgm/brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:04:09.267279 98099     0 /workspace/cgm/brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
Avg-Latency: 0, 90th-Latency: 0, 99th-Latency: 0, 99.9th-Latency: 0, Throughput: 29.9741MB/s, QPS: 0k, Server CPU-utilization: 0%, Client CPU-utilization: 101%
[Threads: 2, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
I0526 23:04:30.303327 98099     0 /workspace/cgm/brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
Avg-Latency: 0, 90th-Latency: 0, 99th-Latency: 0, 99.9th-Latency: 0, Throughput: 0.299211MB/s, QPS: 0k, Server CPU-utilization: 0%, Client CPU-utilization: 102%
[Threads: 4, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
W0526 23:04:50.475469 98092 4294969093 /workspace/cgm/brpc/src/brpc/ubshm/ub_endpoint.cpp:385 ProcessHandshakeAtClient] Fail to get hello message from server:brpc::Socket{id=5 fd=14 addr=0.0.0.0:8002:57824} (0x564645f47910): Got EOF
W0526 23:04:50.475563 98087     0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:131 Init] RPC call failed, retrying... (3 left): [E1014]Fail to complete ubring handshake from brpc::Socket{id=5 fd=14 addr=0.0.0.0:8002:57824} (0x564645f47910): Got EOF
W0526 23:04:51.475721 98087     0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:131 Init] RPC call failed, retrying... (2 left): [E112]Not connected to 0.0.0.0:8002 yet, server_id=5
W0526 23:04:52.475883 98087     0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:131 Init] RPC call failed, retrying... (1 left): [E112]Not connected to 0.0.0.0:8002 yet, server_id=5
E0526 23:04:53.476011 98087     0 /workspace/cgm/brpc/example/ubring_performance/client.cpp:135 Init] RPC call failed after multiple retries
./ubring_performance_server -use_ubring=true
I0526 23:00:15.982886 97452     0 /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:00:15.997779 97452     0 /brpc/src/brpc/server.cpp:1232 StartInternal] Server[test::PerfTestServiceImpl] is serving on port=8002.
I0526 23:00:15.998154 97452     0 /brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8002 in web browser.
I0526 23:00:46.670268 97457 4294969601 /brpc/src/brpc/ubshm/ub_ring.cpp:1021 UbrTrxCloseCheck] Trx close skipped, already closing, trx local name=UBRING_127.0.0.1:35304_S
I0526 23:00:46.670297 97457 4294969601 /brpc/src/brpc/ubshm/ub_ring.cpp:62 UbrTrxClose] Trx close skipped, already closing, local name=UBRING_127.0.0.1:35304_S
I0526 23:00:56.666588 97464     0 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:35304_C success.
I0526 23:00:56.666952 97464     0 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:78 IpcShmMunmap] IPC unmap shm=UBRING_127.0.0.1:35304_S length=4194304 success.
I0526 23:00:56.667327 97464     0 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:35304_C success.
E0526 23:02:17.708842 97484 4294969601 /brpc/src/brpc/ubshm/common/common.h:173 HasTimedOut] task time out 5 seconds.
W0526 23:02:17.708876 97484 4294969601 /brpc/src/brpc/ubshm/ub_ring.cpp:85 UbrTrxClose] Local shm UBRING_127.0.0.1:41854_S wait for the peer to close timed out, force cleanup.
I0526 23:02:17.709291 97484 4294969601 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:41854_C success.
I0526 23:02:17.709631 97484 4294969601 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:78 IpcShmMunmap] IPC unmap shm=UBRING_127.0.0.1:41854_S length=4194304 success.
I0526 23:02:17.709974 97484 4294969601 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:41854_C success.
[1]    97452 bus error (core dumped)  ./ubring_performance_server -use_ubring=true

coredump:

Using host libthread_db library "/usr/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./ubring_performance_server -use_ubring=true'.
Program terminated with signal SIGBUS, Bus error.
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228

warning: 228	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory
[Current thread is 1 (Thread 0x7f69dcff96c0 (LWP 19284))]
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228
#1  0x0000558a87ac463b in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2  brpc::ubring::ShmLocalCalloc (shm=shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:117
#3  0x0000558a879e925c in brpc::ubring::UBRing::UbrAllocateServerShm (this=0x7f69ac058900, remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40,
    local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90) at /brpc/src/brpc/ubshm/ub_ring.cpp:796
#4  0x0000558a879e32e5 in brpc::ubring::UBShmEndpoint::AllocateServerResources (this=this@entry=0x7f66c4023a40,
    remote_trx_shm=remote_trx_shm@entry=0x7f68f06f7e40, local_trx_shm=local_trx_shm@entry=0x7f68f06f7e90)
    at /brpc/src/brpc/ubshm/ub_endpoint.cpp:712
#5  0x0000558a879e3e6b in brpc::ubring::UBShmEndpoint::ProcessHandshakeAtServer (arg=0x7f66c4023a40)
    at /brpc/src/brpc/ubshm/ub_endpoint.cpp:548
#6  0x0000558a877d11b7 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>) at /brpc/src/bthread/task_group.cpp:388
#7  0x0000558a8786d6c1 in bthread_make_fcontext ()
#8  0x0000000000000000 in ?? ()

@chenBright
Contributor

Another crash:

./ubring_performance_client -use_ubring=true -echo_attachment=true -attachment_size=6291456
I0526 23:17:51.313918 98707     0 /brpc/src/brpc/server.cpp:1232 StartInternal] Server[DummyServerOf(./ubring_performance_client)] is serving on port=8001.
I0526 23:17:51.314074 98707     0 /brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8001 in web browser.
[Threads: 1, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
I0526 23:17:51.321939 98707     0 /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:17:51.332043 98719     0 /brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
Avg-Latency: 0, 90th-Latency: 0, 99th-Latency: 0, 99.9th-Latency: 0, Throughput: 64.0254MB/s, QPS: 0k, Server CPU-utilization: 0%, Client CPU-utilization: 102%
[Threads: 2, Depth: 1, Attachment: 6291456B, UBRING: yes, Echo: yes]
[1]    98707 bus error (core dumped)  ./ubring_performance_client -use_ubring=true -echo_attachment=true
./ubring_performance_server -use_ubring=true
I0526 23:17:49.302722 98508     0 /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:72 ShmMgrInit] shm mgr init success, shm type=1
I0526 23:17:49.318155 98508     0 /brpc/src/brpc/server.cpp:1232 StartInternal] Server[test::PerfTestServiceImpl] is serving on port=8002.
I0526 23:17:49.318282 98508     0 /brpc/src/brpc/server.cpp:1235 StartInternal] Check out http://k8s-al-sh-gpu-rdma-h20-0032:8002 in web browser.
W0526 23:18:15.647572 98668 8589934810 /brpc/src/brpc/ubshm/ub_endpoint.cpp:480 ProcessHandshakeAtServer] Fail to read Hello Message from client:brpc::Socket{id=234 fd=11 addr=127.0.0.1:51360:8002} (0x7f2c24025030) 127.0.0.1:51360: Got EOF
I0526 23:18:16.331656 98520     0 /brpc/src/brpc/ubshm/ub_ring.cpp:269 UbrTrxHBCallback] Heartbeat cannot be started, wait connected state.
E0526 23:18:20.648439 98660 8589934772 /brpc/src/brpc/ubshm/common/common.h:173 HasTimedOut] task time out 5 seconds.
W0526 23:18:20.648472 98660 8589934772 /brpc/src/brpc/ubshm/ub_ring.cpp:85 UbrTrxClose] Local shm UBRING_127.0.0.1:36514_S wait for the peer to close timed out, force cleanup.
I0526 23:18:21.332054 98660 8589934772 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:36514_C success.
I0526 23:18:21.332468 98660 8589934772 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:78 IpcShmMunmap] IPC unmap shm=UBRING_127.0.0.1:36514_S length=4194304 success.
I0526 23:18:21.332848 98660 8589934772 /brpc/src/brpc/ubshm/shm/shm_ipc.cpp:185 IpcShmRemoteFree] IPC free remote shm=UBRING_127.0.0.1:36514_C success.
warning: 228	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory
[Current thread is 1 (Thread 0x7f7db0ff96c0 (LWP 98717))]
(gdb)
(gdb) bt
#0  __memset_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:228
#1  0x000055df0fec824b in memset (__len=<optimized out>, __ch=0, __dest=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/string_fortified.h:59
#2  brpc::ubring::ShmLocalCalloc (shm=shm@entry=0x7f7d91af4e00) at /brpc/src/brpc/ubshm/shm/shm_mgr.cpp:117
#3  0x000055df0fea20de in brpc::ubring::UBRing::ApplyAndMapLocalShm (this=this@entry=0x7f7d8c031a00,
    localTrxShm=localTrxShm@entry=0x7f7d91af4e00, localName=localName@entry=0x7f7d91af4e50 "127.0.0.1:51360")
    at /brpc/src/brpc/ubshm/ub_ring.cpp:911
#4  0x000055df0fea25a2 in brpc::ubring::UBRing::UbrAllocateLocalShm (this=0x7f7d8c031a00,
    local_trx_shm=local_trx_shm@entry=0x7f7d91af4e00, shm_name=shm_name@entry=0x7f7d91af4e50 "127.0.0.1:51360")
    at /brpc/src/brpc/ubshm/ub_ring.cpp:827
#5  0x000055df0fe9aa35 in brpc::ubring::UBShmEndpoint::AllocateClientResources (this=this@entry=0x55df11984ee0,
    local_trx_shm=local_trx_shm@entry=0x7f7d91af4e00, shm_name=shm_name@entry=0x7f7d91af4e50 "127.0.0.1:51360")
    at /brpc/src/brpc/ubshm/ub_endpoint.cpp:687
#6  0x000055df0fe9ae6a in brpc::ubring::UBShmEndpoint::ProcessHandshakeAtClient (arg=0x55df11984ee0)
    at /brpc/src/brpc/ubshm/ub_endpoint.cpp:356
#7  0x000055df0fc09c97 in bthread::TaskGroup::task_runner (skip_remained=<optimized out>)
    at /brpc/src/bthread/task_group.cpp:388
#8  0x000055df0fde1571 in bthread_make_fcontext ()
#9  0x0000000000000000 in ?? ()
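Both traces fault inside `memset` in `ShmLocalCalloc` while zeroing a freshly mapped segment, and the thread does not identify the root cause. One possibility worth ruling out (an assumption, not confirmed here) is that the backing storage was never actually allocated. After `ftruncate` alone, a tmpfs file such as one under `/dev/shm` is sparse, so if tmpfs runs out of space, the first touch of a page raises SIGBUS. Touching pages beyond the file's current size, for example after the file has been truncated, also raises SIGBUS. This sketch reserves storage with `posix_fallocate` (POSIX, available on Linux but not on macOS) before mapping, so a shortage surfaces as an error code instead of a signal. A `mkstemp` file stands in for the `shm_open` descriptor, and `MapZeroedSegment` is a hypothetical helper, not the PR's `ShmLocalCalloc`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Maps `len` bytes of the file behind `fd` and zeroes them, calloc-style.
// Returns the mapping, or nullptr with an errno-style code in *err.
void* MapZeroedSegment(int fd, size_t len, int* err) {
    // posix_fallocate returns the error number directly (it does not set errno).
    int rc = posix_fallocate(fd, 0, static_cast<off_t>(len));
    if (rc != 0) {
        *err = rc;  // e.g. ENOSPC when the filesystem is full
        return nullptr;
    }
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        *err = errno;
        return nullptr;
    }
    std::memset(p, 0, len);  // every page now has backing storage
    *err = 0;
    return p;
}
```

Checking free space on `/dev/shm` (for example with `df -h /dev/shm`) while reproducing would be a quick way to confirm or rule out this hypothesis.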


6 participants