lla-dane
Contributor

Description

This is a work-in-progress pull request for establishing interoperability between py-libp2p and rust-libp2p.

Changes

The directory structure is similar to ./interop-tests in rust-libp2p.

  • interop/exec/native_ping.py
  • interop/exec/config: env variables for ping test
  • interop/arch.py: RedisClient and SwarmBuilder utilities
  • interop/lib.py: handle_ping, send_ping and run_test(SwarmRunner) utilities
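For reference, here is a minimal standalone sketch of the /ipfs/ping/1.0.0 exchange that handle_ping and send_ping implement: the initiator writes 32 random bytes and the responder echoes them back unchanged. This is not the code in interop/lib.py. `MemoryPipe`, `demo` and these function signatures are hypothetical, and an in-memory pipe stands in for a real libp2p stream.

```python
from __future__ import annotations

import asyncio
import os
import time

PING_LENGTH = 32  # /ipfs/ping/1.0.0 exchanges 32-byte payloads


class MemoryPipe:
    """Hypothetical one-directional in-memory byte pipe standing in for a stream."""

    def __init__(self) -> None:
        self._buf = bytearray()
        self._data_ready = asyncio.Event()

    def write(self, data: bytes) -> None:
        self._buf.extend(data)
        self._data_ready.set()

    async def read_exactly(self, n: int) -> bytes:
        # Wait until at least n bytes are buffered, then consume exactly n.
        while len(self._buf) < n:
            self._data_ready.clear()
            await self._data_ready.wait()
        chunk = bytes(self._buf[:n])
        del self._buf[:n]
        return chunk


async def handle_ping(inbound: MemoryPipe, outbound: MemoryPipe, rounds: int) -> None:
    # Responder: echo each 32-byte payload back unchanged.
    for _ in range(rounds):
        payload = await inbound.read_exactly(PING_LENGTH)
        outbound.write(payload)


async def send_ping(outbound: MemoryPipe, inbound: MemoryPipe) -> float:
    # Initiator: send random bytes, expect the identical bytes back, return RTT in ms.
    payload = os.urandom(PING_LENGTH)
    start = time.perf_counter()
    outbound.write(payload)
    echoed = await inbound.read_exactly(PING_LENGTH)
    if echoed != payload:
        raise ValueError("pong payload does not match ping payload")
    return (time.perf_counter() - start) * 1000


async def demo(rounds: int = 3) -> list[float]:
    to_listener, to_dialer = MemoryPipe(), MemoryPipe()
    responder = asyncio.create_task(handle_ping(to_listener, to_dialer, rounds))
    rtts = [await send_ping(to_listener, to_dialer) for _ in range(rounds)]
    await responder
    return rtts


if __name__ == "__main__":
    print(asyncio.run(demo()))
```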

Current Status

The ping test succeeds between two py-libp2p nodes using insecure or secio security with mplex.

Logs

  • Listener

❯ transport=tcp ip=0.0.0.0 is_dialer=false port=8000 redis_addr=6379 test_timeout_seconds=180 security=insecure muxer=mplex python3 native_ping.py

2025-05-18 22:29:13.035 | INFO     | interop.lib:run_test:69 - Starting run_test
2025-05-18 22:29:13.161 | INFO     | interop.lib:run_test:75 - Running ping test local_peer=Qmd5TtD5cr5sWfJsgexK6ESZr2FhvuzgLwPMVFyikKpWCB
2025-05-18 22:29:13.164 | INFO     | interop.lib:run_test:83 - Test instance, listening: /ip4/0.0.0.0/tcp/8000/p2p/Qmd5TtD5cr5sWfJsgexK6ESZr2FhvuzgLwPMVFyikKpWCB
received ping from QmcwusXZydtSbD9evrcHyaXEwMN8GjnTXHBsT67iYcm9yN
responded with pong to QmcwusXZydtSbD9evrcHyaXEwMN8GjnTXHBsT67iYcm9yN
  • Dialer

❯ transport=tcp ip=0.0.0.0 is_dialer=true port=8001 redis_addr=6379 test_timeout_seconds=180 security=insecure muxer=mplex python3 native_ping.py

2025-05-18 22:29:16.488 | INFO     | interop.lib:run_test:69 - Starting run_test
2025-05-18 22:29:16.572 | INFO     | interop.lib:run_test:75 - Running ping test local_peer=QmcwusXZydtSbD9evrcHyaXEwMN8GjnTXHBsT67iYcm9yN
2025-05-18 22:29:16.585 | INFO     | interop.lib:run_test:95 - Remote conection established
2025-05-18 22:29:16.585 | INFO     | interop.lib:run_test:100 - handshake time: 9.33ms
sending ping to Qmd5TtD5cr5sWfJsgexK6ESZr2FhvuzgLwPMVFyikKpWCB
received pong from Qmd5TtD5cr5sWfJsgexK6ESZr2FhvuzgLwPMVFyikKpWCB
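A minimal sketch of how native_ping.py could read these parameters, assuming they are plain environment variables with the names used in the commands above. `PingTestConfig` and `load_config` are hypothetical helpers, and the defaults for port and test_timeout_seconds are assumptions, not taken from interop/exec/config.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PingTestConfig:
    transport: str
    ip: str
    is_dialer: bool
    port: int
    redis_addr: str
    test_timeout_seconds: int
    security: str
    muxer: str


def _parse_bool(value: str) -> bool:
    normalized = value.strip().lower()
    if normalized in ("true", "1"):
        return True
    if normalized in ("false", "0"):
        return False
    raise ValueError(f"expected true/false, got {value!r}")


def load_config(env: Mapping[str, str] = os.environ) -> PingTestConfig:
    """Read the ping-test parameters from environment variables."""
    try:
        return PingTestConfig(
            transport=env["transport"],
            ip=env["ip"],
            is_dialer=_parse_bool(env["is_dialer"]),
            port=int(env.get("port", "0")),  # assumption: 0 lets the OS pick a port
            redis_addr=env["redis_addr"],
            test_timeout_seconds=int(env.get("test_timeout_seconds", "180")),  # assumed default
            security=env["security"],
            muxer=env["muxer"],
        )
    except KeyError as missing:
        raise ValueError(f"missing required environment variable: {missing}") from None
```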

Next Steps

Extend support to Noise and Yamux to test interoperability with rust-libp2p.

@seetadev
Contributor

@lla-dane : Hi Abhinav. Great progress.

Wanted to share that yamux was merged today. Looking forward to an early resolution of the CI/CD fixes.

@seetadev
Contributor

@lla-dane : Great progress. Please use yamux for interop tests now.

This would resolve most of the CI/CD issues. rust-libp2p has deprecated mplex and uses yamux.

Yamux PR got merged yesterday by @pacrob. Please keep @acul71 and me in the loop as you share code commits using yamux as the multiplexer.

@lla-dane
Contributor Author

Yes, sure. Stuck with a few errors right now, but will post an update soon!!

@lla-dane
Contributor Author

lla-dane commented May 24, 2025

Currently, after the security and muxer negotiation succeeds, the protocol negotiation fails. I'll explain the case of a py-dialer and a rust-listener.

I start the py-dialer and rust-listener with the following commands:

# ./interop/exec: py-dialer
transport=tcp ip=127.0.0.1 is_dialer=true redis_addr=6379 port=8001 test_timeout_seconds=180 security=noise  muxer=yamux  python3 native_ping.py 

# ./rust-libp2p: rust-listener
RUST_LOG=debug redis_addr=localhost:6379 ip="0.0.0.0" transport=tcp security=noise muxer=yamux is_dialer="false" cargo run --bin native_ping

py-dialer logs

image

Logs
2025-05-24 22:59:05.534 | INFO     | interop.lib:run_test:69 - Starting run_test
2025-05-24 22:59:05.736 | INFO     | interop.lib:run_test:75 - Running ping test local_peer=QmYG1F2byD3M4eRs2E46k32NseJffaQxj85GAnFkHRAvrf
2025-05-24 22:59:05.743 | INFO     | interop.lib:run_test:126 - GETTING READY FOR CONNECTION
2025-05-24 22:59:05.743 | DEBUG    | libp2p.network.swarm:dial_peer:139 - attempting to dial peer %s
2025-05-24 22:59:05.743 | INFO     | libp2p.network.swarm:dial_peer:155 - HANDSHAKE GOING TO HAPPEN
2025-05-24 22:59:05.744 | DEBUG    | libp2p.network.swarm:dial_addr:191 - dialed peer %s over base transport
2025-05-24 22:59:05.744 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:70 - TRYING TO GET THE HANDSHAKE HAPPENED
2025-05-24 22:59:05.744 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:46 - WROTE SUC, /multistream/1.0.0
2025-05-24 22:59:05.747 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:49 - READ SUC, /multistream/1.0.0
2025-05-24 22:59:05.748 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:72 - HANDSHAKE HAPPENED
2025-05-24 22:59:05.748 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:75 - /noise
2025-05-24 22:59:05.748 | INFO     | libp2p.protocol_muxer.multiselect_client:try_select:101 - /noise
2025-05-24 22:59:05.748 | INFO     | libp2p.protocol_muxer.multiselect_client:try_select:107 - Response: 
2025-05-24 22:59:05.757 | DEBUG    | libp2p.network.swarm:dial_addr:204 - upgraded security for peer %s
2025-05-24 22:59:05.757 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:70 - TRYING TO GET THE HANDSHAKE HAPPENED
2025-05-24 22:59:05.757 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:46 - WROTE SUC, /multistream/1.0.0
2025-05-24 22:59:05.759 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:49 - READ SUC, /multistream/1.0.0
2025-05-24 22:59:05.759 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:72 - HANDSHAKE HAPPENED
2025-05-24 22:59:05.759 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:75 - /yamux/1.0.0
2025-05-24 22:59:05.759 | INFO     | libp2p.protocol_muxer.multiselect_client:try_select:101 - /yamux/1.0.0
2025-05-24 22:59:05.759 | INFO     | libp2p.protocol_muxer.multiselect_client:try_select:107 - Response: 
2025-05-24 22:59:05.759 | DEBUG    | libp2p.network.swarm:dial_addr:213 - upgraded mux for peer %s
2025-05-24 22:59:05.760 | DEBUG    | libp2p.network.swarm:dial_addr:217 - successfully dialed peer %s
2025-05-24 22:59:05.760 | INFO     | interop.lib:run_test:128 - HOST CONNECTED
2025-05-24 22:59:05.760 | DEBUG    | libp2p.network.swarm:new_stream:227 - attempting to open a stream to peer %s
2025-05-24 22:59:05.760 | INFO     | libp2p.network.swarm:dial_peer:134 - WE ARE RETURNING, PEER ALREADAY EXISTS
2025-05-24 22:59:05.760 | INFO     | libp2p.network.swarm:new_stream:230 - INETCONN CREATED
2025-05-24 22:59:05.760 | INFO     | libp2p.network.swarm:new_stream:233 - INETSTREAM CREATED
2025-05-24 22:59:05.760 | DEBUG    | libp2p.network.swarm:new_stream:235 - successfully opened a stream to peer %s
2025-05-24 22:59:05.760 | INFO     | libp2p.host.basic_host:new_stream:186 - INETSTREAM CHECKING IN
2025-05-24 22:59:05.760 | INFO     | libp2p.host.basic_host:new_stream:187 - ['/ipfs/ping/1.0.0']
2025-05-24 22:59:05.760 | DEBUG    | libp2p.host.basic_host:new_stream:190 - PROTOCOLS TRYING TO GET SENT
2025-05-24 22:59:05.760 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:70 - TRYING TO GET THE HANDSHAKE HAPPENED
2025-05-24 22:59:05.760 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:46 - WROTE SUC, /multistream/1.0.0
2025-05-24 22:59:05.762 | DEBUG    | libp2p.host.basic_host:_swarm_stream_handler:237 - failed to accept a stream from peer %s, error=%s
2025-05-24 22:59:05.762 | ERROR    | libp2p.protocol_muxer.multiselect_client:handshake:51 - READ FAIL, fail to read from multiselect communicator
2025-05-24 22:59:05.762 | DEBUG    | libp2p.host.basic_host:new_stream:196 - fail to open a stream to peer %s, error=%s

rust-listener logs

image

Logs
2025-05-24T17:29:03.063081Z  INFO libp2p_swarm: local_peer_id=12D3KooWBjkD7G113HB1RgaU36kRYWsfqQKXKuKT23cBas1W3DYZ
2025-05-24T17:29:03.063232Z  INFO interop_tests: Running ping test local_peer=12D3KooWBjkD7G113HB1RgaU36kRYWsfqQKXKuKT23cBas1W3DYZ
2025-05-24T17:29:03.063387Z DEBUG libp2p_tcp: listening on 0.0.0.0:0
2025-05-24T17:29:03.065175Z  INFO interop_tests: Test instance, listening for incoming connections on address address=/ip4/0.0.0.0/tcp/0
2025-05-24T17:29:03.076315Z DEBUG Swarm::poll: libp2p_tcp: New listen address address=/ip4/127.0.0.1/tcp/40449
2025-05-24T17:29:03.076491Z DEBUG Swarm::poll: libp2p_swarm: New listener address listener=ListenerId(1) address=/ip4/127.0.0.1/tcp/40449
2025-05-24T17:29:03.076796Z DEBUG Swarm::poll: libp2p_tcp: New listen address address=/ip4/192.168.31.130/tcp/40449
2025-05-24T17:29:03.076880Z DEBUG Swarm::poll: libp2p_swarm: New listener address listener=ListenerId(1) address=/ip4/192.168.31.130/tcp/40449
2025-05-24T17:29:03.111888Z DEBUG Swarm::poll: libp2p_tcp: New listen address address=/ip4/172.19.0.1/tcp/40449
2025-05-24T17:29:03.111976Z DEBUG Swarm::poll: libp2p_swarm: New listener address listener=ListenerId(1) address=/ip4/172.19.0.1/tcp/40449
2025-05-24T17:29:03.112045Z DEBUG interop_tests: NewListenAddr { listener_id: ListenerId(1), address: /ip4/172.19.0.1/tcp/40449 }
2025-05-24T17:29:03.112124Z DEBUG Swarm::poll: libp2p_tcp: New listen address address=/ip4/172.17.0.1/tcp/40449
2025-05-24T17:29:03.112171Z DEBUG Swarm::poll: libp2p_swarm: New listener address listener=ListenerId(1) address=/ip4/172.17.0.1/tcp/40449
2025-05-24T17:29:03.112218Z DEBUG interop_tests: NewListenAddr { listener_id: ListenerId(1), address: /ip4/172.17.0.1/tcp/40449 }
2025-05-24T17:29:03.112303Z DEBUG Swarm::poll: libp2p_tcp: New listen address address=/ip4/172.18.0.1/tcp/40449
2025-05-24T17:29:03.112351Z DEBUG Swarm::poll: libp2p_swarm: New listener address listener=ListenerId(1) address=/ip4/172.18.0.1/tcp/40449
2025-05-24T17:29:03.112401Z DEBUG interop_tests: NewListenAddr { listener_id: ListenerId(1), address: /ip4/172.18.0.1/tcp/40449 }
2025-05-24T17:29:05.744517Z DEBUG Swarm::poll: libp2p_tcp: Incoming connection from remote at local remote_address=/ip4/192.168.31.130/tcp/33746 local_address=/ip4/192.168.31.130/tcp/40449
2025-05-24T17:29:05.745833Z DEBUG interop_tests: IncomingConnection { connection_id: ConnectionId(1), local_addr: /ip4/192.168.31.130/tcp/40449, send_back_addr: /ip4/192.168.31.130/tcp/33746 }
2025-05-24T17:29:05.748791Z DEBUG new_incoming_connection{remote_addr=/ip4/192.168.31.130/tcp/33746 id=1}: multistream_select::listener_select: Listener: confirming protocol protocol=/noise
2025-05-24T17:29:05.748877Z DEBUG new_incoming_connection{remote_addr=/ip4/192.168.31.130/tcp/33746 id=1}: multistream_select::listener_select: Listener: sent confirmed protocol protocol=/noise
2025-05-24T17:29:05.759486Z DEBUG new_incoming_connection{remote_addr=/ip4/192.168.31.130/tcp/33746 id=1}: multistream_select::listener_select: Listener: confirming protocol protocol=/yamux/1.0.0
2025-05-24T17:29:05.759564Z DEBUG new_incoming_connection{remote_addr=/ip4/192.168.31.130/tcp/33746 id=1}: multistream_select::listener_select: Listener: sent confirmed protocol protocol=/yamux/1.0.0
2025-05-24T17:29:05.759922Z DEBUG new_incoming_connection{remote_addr=/ip4/192.168.31.130/tcp/33746 id=1}: yamux::connection: new connection: bdb5a808 (Server)    
2025-05-24T17:29:05.760482Z DEBUG Swarm::poll: libp2p_swarm: Connection established peer=QmYG1F2byD3M4eRs2E46k32NseJffaQxj85GAnFkHRAvrf endpoint=Listener { local_addr: /ip4/192.168.31.130/tcp/40449, send_back_addr: /ip4/192.168.31.130/tcp/33746 } total_peers=1
2025-05-24T17:29:05.760548Z DEBUG interop_tests: ConnectionEstablished { peer_id: PeerId("QmYG1F2byD3M4eRs2E46k32NseJffaQxj85GAnFkHRAvrf"), connection_id: ConnectionId(1), endpoint: Listener { local_addr: /ip4/192.168.31.130/tcp/40449, send_back_addr: /ip4/192.168.31.130/tcp/33746 }, num_established: 1, concurrent_dial_errors: None, established_in: 14.499266ms }
2025-05-24T17:29:05.760591Z DEBUG new_established_connection{remote_addr=/ip4/192.168.31.130/tcp/33746 id=1 peer=QmYG1F2byD3M4eRs2E46k32NseJffaQxj85GAnFkHRAvrf}:Connection::poll: yamux::connection::rtt: sending ping 3065193300    
2025-05-24T17:29:05.761599Z DEBUG new_established_connection{remote_addr=/ip4/192.168.31.130/tcp/33746 id=1 peer=QmYG1F2byD3M4eRs2E46k32NseJffaQxj85GAnFkHRAvrf}:Connection::poll: yamux::connection: bdb5a808: new outbound (Stream bdb5a808/2) of (Connection bdb5a808 Server (streams 1))    
2025-05-24T17:29:05.761880Z DEBUG new_established_connection{remote_addr=/ip4/192.168.31.130/tcp/33746 id=1 peer=QmYG1F2byD3M4eRs2E46k32NseJffaQxj85GAnFkHRAvrf}:Connection::poll: multistream_select::dialer_select: Dialer: Proposed protocol protocol=/ipfs/ping/1.0.0
2025-05-24T17:29:05.762018Z DEBUG new_established_connection{remote_addr=/ip4/192.168.31.130/tcp/33746 id=1 peer=QmYG1F2byD3M4eRs2E46k32NseJffaQxj85GAnFkHRAvrf}:Connection::poll: yamux::connection::rtt: received pong 3065193300, estimated round-trip-time 1.428048ms    
2025-05-24T17:29:05.762630Z DEBUG Swarm::poll: libp2p_swarm: Connection closed with error IO(Custom { kind: Other, error: Error(Right(Decode(Io(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" })))) }): Connected { endpoint: Listener { local_addr: /ip4/192.168.31.130/tcp/40449, send_back_addr: /ip4/192.168.31.130/tcp/33746 }, peer_id: PeerId("QmYG1F2byD3M4eRs2E46k32NseJffaQxj85GAnFkHRAvrf") } total_peers=0
2025-05-24T17:29:05.762710Z DEBUG interop_tests: ConnectionClosed { peer_id: PeerId("QmYG1F2byD3M4eRs2E46k32NseJffaQxj85GAnFkHRAvrf"), connection_id: ConnectionId(1), endpoint: Listener { local_addr: /ip4/192.168.31.130/tcp/40449, send_back_addr: /ip4/192.168.31.130/tcp/33746 }, num_established: 0, cause: Some(IO(Custom { kind: Other, error: Error(Right(Decode(Io(Os { code: 104, kind: ConnectionReset, message: "Connection reset by peer" })))) })) }
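Note that the "sending ping 3065193300" and "received pong ... estimated round-trip-time" lines come from yamux::connection::rtt, so they appear to be yamux-level Ping frames on the session, not the /ipfs/ping/1.0.0 protocol that is failing. Below is a sketch of the 12-byte yamux frame header as described in the yamux spec; reading the logged number as the opaque value carried in the length field is my assumption, and the helper names are hypothetical.

```python
import struct

# Yamux frame header: version (u8), type (u8), flags (u16),
# stream ID (u32), length (u32), all big-endian: 12 bytes total.
HEADER = struct.Struct(">BBHII")

TYPE_DATA, TYPE_WINDOW_UPDATE, TYPE_PING, TYPE_GO_AWAY = 0, 1, 2, 3
FLAG_SYN, FLAG_ACK, FLAG_FIN, FLAG_RST = 0x1, 0x2, 0x4, 0x8


def encode_ping(opaque: int, ack: bool = False) -> bytes:
    # Ping frames use stream ID 0; the length field carries an opaque value
    # that the responder echoes back with the ACK flag set.
    flags = FLAG_ACK if ack else FLAG_SYN
    return HEADER.pack(0, TYPE_PING, flags, 0, opaque)


def decode_header(frame: bytes) -> dict:
    version, frame_type, flags, stream_id, length = HEADER.unpack(frame[: HEADER.size])
    return {
        "version": version,
        "type": frame_type,
        "flags": flags,
        "stream_id": stream_id,
        "length": length,
    }
```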

So what's happening is:

  1. The dialer successfully negotiates Noise and Yamux; the Rust logs reach ConnectionEstablished.
  2. After that, when the /ipfs/ping/1.0.0 protocol negotiation starts, the dialer writes /multistream/1.0.0 to the stream for the handshake but cannot read a response, as visible in this line of the py logs:
2025-05-24 22:59:05.760 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:46 - WROTE SUC, /multistream/1.0.0
  3. I think the stream somehow gets reset during protocol negotiation.
  4. This issue does not happen when both nodes use the same implementation.

I can't figure out why this is happening or what I am missing. @acul71 @paschal533
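One more detail from the logs: the Rust listener also opens its own outbound stream ("new outbound (Stream bdb5a808/2)") and proposes /ipfs/ping/1.0.0 on it, while the py log reports "failed to accept a stream from peer" just before the read fails. That suggests the py side also has to act as the multistream-select listener. Here is a simplified, hypothetical sketch of that listener role over raw bytes, with uvarint length-prefixed, newline-terminated messages; real implementations also send their own header proactively and handle "ls" and partial reads.

```python
from __future__ import annotations

MULTISTREAM = "/multistream/1.0.0"


def encode_uvarint(n: int) -> bytes:
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def decode_uvarint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    # Returns (value, offset just past the varint).
    value = shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def encode_msg(text: str) -> bytes:
    payload = (text + "\n").encode()
    return encode_uvarint(len(payload)) + payload


def listener_respond(incoming: bytes, supported: set[str]) -> bytes:
    """Answer a dialer's header and proposals: echo accepted ones, 'na' otherwise."""
    replies, offset = [], 0
    while offset < len(incoming):
        length, offset = decode_uvarint(incoming, offset)
        message = incoming[offset : offset + length].decode().rstrip("\n")
        offset += length
        accepted = message == MULTISTREAM or message in supported
        replies.append(encode_msg(message if accepted else "na"))
    return b"".join(replies)
```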

@paschal533
Contributor

Hey @lla-dane, thanks for the detailed breakdown and logs. It looks like the Noise and Yamux negotiations are working fine, but the protocol negotiation for /ipfs/ping/1.0.0 is failing because the Python dialer can’t read the response after writing /multistream/1.0.0. I suspect this is either a multistream-select mismatch or a ping protocol incompatibility between py-libp2p and rust-libp2p. Here’s what I think we should try next:

Debug Multistream-Select:
Let’s add more logging to the Rust listener to see how it handles the /multistream/1.0.0 handshake. Run the listener with:

RUST_LOG=debug,multistream_select=trace redis_addr=localhost:6379 ip="127.0.0.1" transport=tcp security=noise muxer=yamux is_dialer="false" cargo run --bin native_ping

This should show if the Rust side is responding correctly or resetting the stream. Also, can you check the libp2p.protocol_muxer.multiselect_client.handshake function in Python to see if it’s reading the response correctly? Maybe add some debug prints like:

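# multistream-select frames each message as <uvarint length><message>\n,
# so a raw write needs the length prefix (19 bytes fits in a one-byte varint).
msg = b"/multistream/1.0.0\n"
print("DEBUG: Writing /multistream/1.0.0 to stream")
await stream.write(bytes([len(msg)]) + msg)
print("DEBUG: Reading response from stream")
response = await stream.read(1024)
print(f"DEBUG: Received response: {response!r}")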

Could you share the updated Rust and Python logs after trying these? Also, let’s keep @acul71 and @seetadev in the loop for any new commits. If you’re still stuck, I can help test this locally with the same setup. Thanks for driving this forward!
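If the raw read returns something that doesn't look like "/multistream/1.0.0", it may help to decode it message by message. A small, hypothetical stdlib-only helper that splits a raw buffer into uvarint length-prefixed messages and reports any incomplete trailing bytes; it assumes the bytes are already decrypted and demultiplexed (which is what a yamux stream read should return).

```python
from __future__ import annotations


def split_messages(raw: bytes) -> tuple[list[str], bytes]:
    """Split raw multistream bytes into messages; return (messages, leftover)."""
    messages: list[str] = []
    offset = 0
    while offset < len(raw):
        start = offset
        length = shift = 0
        # Decode the uvarint length prefix.
        while True:
            if offset >= len(raw):
                return messages, raw[start:]  # truncated length prefix
            byte = raw[offset]
            offset += 1
            length |= (byte & 0x7F) << shift
            if not byte & 0x80:
                break
            shift += 7
        if offset + length > len(raw):
            return messages, raw[start:]  # truncated message body
        body = raw[offset : offset + length]
        messages.append(body.decode("utf-8", errors="replace").rstrip("\n"))
        offset += length
    return messages, b""
```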

@paschal533
Contributor

Hi @lla-dane , @acul71, @seetadev ,

Just a quick update on the ping example in py-libp2p/examples/ping/ping.py. I’ve confirmed that the current version uses Plaintext as the security protocol since it calls new_host() without a custom sec_opt configuration. This means it’s not yet aligned with rust-libp2p’s default Noise protocol for our interop tests.

I’m actively working on updating ping.py to include Noise support by adding NoiseTransport and X25519 keypair configurations, similar to what we discussed earlier. However, I’m still hitting the ModuleNotFoundError: No module named 'libp2p.crypto.x25519' despite x25519.py being in libp2p/crypto/ and updating libp2p/crypto/__init__.py with from .x25519 import create_new_key_pair as create_new_x25519_key_pair. The tests in tests/crypto/test_x25519.py pass, so the module is functional, but ping.py can’t resolve it.

Next Steps:

  • I’m debugging the x25519 module resolution issue. @lla-dane, could you share the exact py-libp2p commit/branch you’re using or confirm how you’re running native_ping.py to avoid this error?
  • Once the module issue is resolved, I’ll test the updated ping.py with Noise against the Rust listener using the custom ping behavior (32-byte 0x01 payload) to fix the protocol negotiation issue as mentioned.
  • I’ll also update native_ping.py to include Noise for consistency with our interop tests.

I’ll post another update with logs once I get Noise working. Please let me know if you have any tips on the x25519 import.
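If it helps: one common cause of a ModuleNotFoundError when the file exists and the tests pass is that the interpreter running ping.py imports a different copy of libp2p, for example an installed package instead of the source checkout. That is an assumption, not something confirmed here. A small stdlib-only sketch that prints where each module would be loaded from; `locate` is a hypothetical helper.

```python
from __future__ import annotations

import importlib.util


def locate(module_name: str) -> str | None:
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. "libp2p") is itself missing.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    for name in ("libp2p", "libp2p.crypto", "libp2p.crypto.x25519"):
        print(f"{name:25} -> {locate(name)}")
```

If "libp2p" resolves to a site-packages path rather than the checkout, reinstalling the branch in editable mode into the same virtualenv would be the first thing to try.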

@lla-dane
Contributor Author

I’m debugging the x25519 module resolution issue. @lla-dane, could you share the exact py-libp2p commit/branch you’re using or confirm how you’re running native_ping.py to avoid this error?

@paschal533: This error occurs on this branch: https://github.com/lla-dane/py-libp2p/tree/interop/py-rust. I run the py-dialer with the command below from the ./interop/exec directory:

transport=tcp ip=127.0.0.1 is_dialer=true redis_addr=6379 port=8001 test_timeout_seconds=180 security=noise  muxer=yamux  python3 native_ping.py 

interop/arch.py Outdated
key_pair, noise_privkey=noise_key_pair.private_key
)
},
)
Contributor Author

@lla-dane lla-dane May 25, 2025

I’ve confirmed that the current version uses Plaintext as the security protocol since it calls new_host() without a custom sec_opt configuration. This means it’s not yet aligned with rust-libp2p’s default Noise protocol for our interop tests.

@paschal533: As you can see here, in ./interop/arch.py the new_host function builds the host according to the options given in the launch command, so it does not default to plaintext as the security protocol.
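For illustration, a hypothetical sketch of how such a builder might translate the security and muxer options into multistream protocol IDs. "/noise" and "/yamux/1.0.0" are the IDs both logs show being confirmed; "/secio/1.0.0" and "/mplex/6.7.0" are the standard libp2p IDs. The insecure/plaintext ID is left out because I have not checked which one py-libp2p uses. These dictionaries are not taken from arch.py.

```python
from __future__ import annotations

SECURITY_PROTOCOL_IDS = {
    "noise": "/noise",
    "secio": "/secio/1.0.0",
    # "insecure" omitted: check which plaintext ID py-libp2p actually uses.
}
MUXER_PROTOCOL_IDS = {
    "yamux": "/yamux/1.0.0",
    "mplex": "/mplex/6.7.0",
}


def protocol_ids(security: str, muxer: str) -> tuple[str, str]:
    """Map the `security` and `muxer` options to multistream protocol IDs."""
    try:
        return SECURITY_PROTOCOL_IDS[security], MUXER_PROTOCOL_IDS[muxer]
    except KeyError as unknown:
        raise ValueError(f"unsupported option: {unknown}") from None
```

Both sides must propose and accept the same strings, so a typo here would show up as an "na" response during negotiation.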

Contributor

Nope, I wasn't referring to this PR. I was referring to py-libp2p/examples/ping/ping.py, which runs on plaintext as the security protocol.

@lla-dane
Contributor Author

image

These are the logs of a successful handshake between two py-nodes.

Logs
(venv) shelby@soiarch ~/Desktop/libp2p/py-libp2p/interop/exec ❯ transport=tcp ip=127.0.0.1 is_dialer=true redis_addr=6379 port=8001 test_timeout_seconds=180 security=noise  muxer=yamux  python3 native_ping.py 

2025-05-25 17:42:49.665 | INFO     | interop.lib:run_test:69 - Starting run_test
2025-05-25 17:42:50.712 | INFO     | interop.lib:run_test:75 - Running ping test local_peer=QmbzWn9Xp4Xzwa4kpQWghc1SMJfUoYRfc7dB2v79anX8Ta
2025-05-25 17:42:50.725 | INFO     | interop.lib:run_test:126 - GETTING READY FOR CONNECTION
2025-05-25 17:42:50.725 | DEBUG    | libp2p.network.swarm:dial_peer:139 - attempting to dial peer %s
2025-05-25 17:42:50.725 | INFO     | libp2p.network.swarm:dial_peer:155 - HANDSHAKE GOING TO HAPPEN
2025-05-25 17:42:50.726 | DEBUG    | libp2p.network.swarm:dial_addr:191 - dialed peer %s over base transport
2025-05-25 17:42:50.726 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:70 - TRYING TO GET THE HANDSHAKE HAPPENED
2025-05-25 17:42:50.726 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:46 - WROTE SUC, /multistream/1.0.0
2025-05-25 17:42:50.726 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:49 - READ SUC, /multistream/1.0.0
2025-05-25 17:42:50.726 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:72 - HANDSHAKE HAPPENED
2025-05-25 17:42:50.726 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:75 - /noise
2025-05-25 17:42:50.726 | INFO     | libp2p.protocol_muxer.multiselect_client:try_select:101 - /noise
2025-05-25 17:42:50.727 | INFO     | libp2p.protocol_muxer.multiselect_client:try_select:107 - Response: 
2025-05-25 17:42:50.771 | DEBUG    | libp2p.network.swarm:dial_addr:204 - upgraded security for peer %s
2025-05-25 17:42:50.771 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:70 - TRYING TO GET THE HANDSHAKE HAPPENED
2025-05-25 17:42:50.771 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:46 - WROTE SUC, /multistream/1.0.0
2025-05-25 17:42:50.797 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:49 - READ SUC, /multistream/1.0.0
2025-05-25 17:42:50.797 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:72 - HANDSHAKE HAPPENED
2025-05-25 17:42:50.797 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:75 - /yamux/1.0.0
2025-05-25 17:42:50.798 | INFO     | libp2p.protocol_muxer.multiselect_client:try_select:101 - /yamux/1.0.0
2025-05-25 17:42:50.798 | INFO     | libp2p.protocol_muxer.multiselect_client:try_select:107 - Response: 
2025-05-25 17:42:50.799 | DEBUG    | libp2p.network.swarm:dial_addr:213 - upgraded mux for peer %s
2025-05-25 17:42:50.799 | DEBUG    | libp2p.network.swarm:dial_addr:217 - successfully dialed peer %s
2025-05-25 17:42:50.800 | INFO     | interop.lib:run_test:128 - HOST CONNECTED
2025-05-25 17:42:50.800 | DEBUG    | libp2p.network.swarm:new_stream:227 - attempting to open a stream to peer %s
2025-05-25 17:42:50.800 | INFO     | libp2p.network.swarm:dial_peer:134 - WE ARE RETURNING, PEER ALREADAY EXISTS
2025-05-25 17:42:50.800 | INFO     | libp2p.network.swarm:new_stream:230 - INETCONN CREATED
2025-05-25 17:42:50.800 | INFO     | libp2p.network.swarm:new_stream:233 - INETSTREAM CREATED
2025-05-25 17:42:50.800 | DEBUG    | libp2p.network.swarm:new_stream:235 - successfully opened a stream to peer %s
2025-05-25 17:42:50.800 | INFO     | libp2p.host.basic_host:new_stream:186 - INETSTREAM CHECKING IN
2025-05-25 17:42:50.800 | INFO     | libp2p.host.basic_host:new_stream:187 - ['/ipfs/ping/1.0.0']
2025-05-25 17:42:50.801 | DEBUG    | libp2p.host.basic_host:new_stream:190 - PROTOCOLS TRYING TO GET SENT
2025-05-25 17:42:50.801 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:70 - TRYING TO GET THE HANDSHAKE HAPPENED
2025-05-25 17:42:50.801 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:46 - WROTE SUC, /multistream/1.0.0
2025-05-25 17:42:50.802 | INFO     | libp2p.protocol_muxer.multiselect_client:handshake:49 - READ SUC, /multistream/1.0.0
2025-05-25 17:42:50.802 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:72 - HANDSHAKE HAPPENED
2025-05-25 17:42:50.802 | INFO     | libp2p.protocol_muxer.multiselect_client:select_one_of:75 - /ipfs/ping/1.0.0
2025-05-25 17:42:50.802 | INFO     | libp2p.protocol_muxer.multiselect_client:try_select:101 - /ipfs/ping/1.0.0
2025-05-25 17:42:50.804 | INFO     | libp2p.protocol_muxer.multiselect_client:try_select:107 - Response: 
2025-05-25 17:42:50.804 | INFO     | libp2p.host.basic_host:new_stream:194 - PROTOCOLS GOT SENT
2025-05-25 17:42:50.804 | INFO     | interop.lib:run_test:133 - CREATED NEW STREAM
2025-05-25 17:42:50.804 | INFO     | interop.lib:run_test:136 - Remote conection established
2025-05-25 17:42:50.804 | INFO     | interop.lib:run_test:142 - handshake time: 79.69ms
sending ping to QmUVzokJA5siraZSfFD8YXHUaR8GjzSitGcT1eP3snkshb
received pong from QmUVzokJA5siraZSfFD8YXHUaR8GjzSitGcT1eP3snkshb

Between rust and py, it's failing around this part:

image

@seetadev
Contributor

@lla-dane and @paschal533: Appreciate your contribution. Do we need help from the rust-libp2p maintainers here (Elena and Joao)? We have a community call tomorrow and can discuss this specific pull request.

Also, touch base with @acul71 on this effort. He is about to start the js-libp2p interop effort in the coming days.

@paschal533
Contributor

image

These are the logs of a successful handshake between two py-nodes.

Logs
Between rust and py, it's failing around this part:

image

Noted. Thank you.
I'm currently working on this. I will give my update by evening.

@paschal533
Contributor

paschal533 commented May 29, 2025

Hi @lla-dane

I've been digging into the py-libp2p and rust-libp2p interoperability issues, and I think I've found the root cause. The problem seems to be py-libp2p's incomplete Noise implementation, which prevents successful connections between the two implementations. I discovered that py-libp2p's Noise currently uses X25519 keys, while rust-libp2p expects Ed25519 keys for the identity verification part of the handshake. When I modified py-libp2p to use Ed25519 keys, it got further in the process but then failed because of the incomplete implementation.

What's happening:

  • py-libp2p generates the Ed25519 key pair fine: public=95bbc18d2e79774320ae4f996dc3fd1ee9f1be498f6979f6b17b3fc607b8950c
  • But rust-libp2p can't parse it: InvalidKey(DecodingError { msg: "failed to parse Ed25519 public key" })

This suggests there's a mismatch in how the keys are being encoded/serialized during the Noise handshake.

I'd like to discuss this with @seetadev and @acul71; I'm thinking of taking on the task of completing the Noise implementation in py-libp2p.

@lla-dane
Copy link
Contributor Author

@paschal533: This is great. Thanks for finding this out.

@paschal533
Contributor

Hi @acul71, after further investigation, I found that both implementations should use X25519 for the Noise Diffie-Hellman key exchange, and that part is actually correct. The issue appears to be key encoding/serialization compatibility during the handshake.

  • py-libp2p generates keys that should be compatible, but there's an encoding mismatch.
  • rust-libp2p fails to parse what should be a valid key: InvalidKey(DecodingError { msg: "failed to parse Ed25519 public key" })

The error mentions Ed25519, but this likely refers to libp2p identity keys rather than the Noise DH keys themselves.

The real issue is a mismatch in how the keys (possibly identity keys used alongside the Noise handshake) are being encoded/serialized between the two implementations. The py-libp2p Noise implementation may be incomplete in how it handles key formatting or the integration between Noise protocol keys and libp2p identity verification.
This suggests the fix should focus on key serialization compatibility rather than changing the underlying cryptographic primitives. I will keep you posted with my updates.
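For reference, the libp2p Noise spec puts the identity key inside the handshake payload: identity_key is the protobuf-encoded libp2p PublicKey (for Ed25519, KeyType 1 plus the 32 raw key bytes), and identity_sig is the identity key's signature over "noise-libp2p-static-key:" followed by the X25519 static public key. Below is a hand-rolled, stdlib-only sketch of that encoding with the signing step omitted; the function names are hypothetical. A "failed to parse Ed25519 public key" error would be consistent with the Data field not being exactly those 32 raw bytes, though that is a guess.

```python
ED25519_KEY_TYPE = 1  # KeyType.Ed25519 in the libp2p crypto protobuf
STATIC_KEY_PREFIX = b"noise-libp2p-static-key:"


def encode_uvarint(n: int) -> bytes:
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def encode_bytes_field(field_number: int, value: bytes) -> bytes:
    # Protobuf length-delimited field: tag (field_number << 3 | wire type 2), length, bytes.
    return encode_uvarint((field_number << 3) | 2) + encode_uvarint(len(value)) + value


def encode_identity_key(raw_ed25519_pubkey: bytes) -> bytes:
    # libp2p PublicKey message: field 1 = KeyType (varint), field 2 = Data (bytes).
    if len(raw_ed25519_pubkey) != 32:
        raise ValueError("an Ed25519 public key is exactly 32 raw bytes")
    key_type_field = encode_uvarint((1 << 3) | 0) + encode_uvarint(ED25519_KEY_TYPE)
    return key_type_field + encode_bytes_field(2, raw_ed25519_pubkey)


def message_to_sign(noise_static_pubkey: bytes) -> bytes:
    # The identity key signs this prefix plus the X25519 static public key.
    return STATIC_KEY_PREFIX + noise_static_pubkey


def encode_handshake_payload(identity_key: bytes, identity_sig: bytes) -> bytes:
    # NoiseHandshakePayload: field 1 = identity_key, field 2 = identity_sig.
    return encode_bytes_field(1, identity_key) + encode_bytes_field(2, identity_sig)
```

Comparing the bytes py-libp2p actually puts on the wire against encode_identity_key's output would be one way to confirm or rule out an encoding mismatch.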

@paschal533
Contributor

Hi @lla-dane, @seetadev, @acul71: I made some modifications to py-libp2p/examples/ping/ping.py and found out what's actually happening. The good news is that we're closer than we thought.

The real issue isn't X25519 vs Ed25519 keys; both implementations use X25519 correctly for the Noise protocol. The problem is much more specific: py-libp2p's Noise code tries to call .get_public_key() on the Noise private key object, but our current implementation doesn't have that method. It only has .public_key(). So it crashes with: 'NoisePrivateKey' object has no attribute 'get_public_key'

What this means:

  • The connection setup now works fine (rust-libp2p even confirms the /noise protocol)
  • Protocol negotiation is successful
  • It only fails when trying to build the handshake payload because of this missing method

On the rust side:

  • It just sees the connection die unexpectedly ("eof" error)
  • Which makes sense since our Python side crashes before completing the handshake

This is actually encouraging because it means py-libp2p's Noise implementation is mostly there. It just needs the proper API methods implemented. It's more of a "finish the implementation" problem rather than a "rewrite everything" problem.

I think this should be pretty straightforward to fix. I'm currently working on it.
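A minimal sketch of the kind of fix described, assuming the handshake code only needs get_public_key() to return the same value as the existing public_key(). The class below is a hypothetical stand-in that borrows the names from the error message; it is not py-libp2p's real key class, whose public_key() may well return a key object rather than bytes.

```python
class NoisePrivateKey:
    """Hypothetical stand-in for py-libp2p's Noise private key class."""

    def __init__(self, private_bytes: bytes, public_bytes: bytes) -> None:
        self._private = private_bytes
        self._public = public_bytes

    def public_key(self) -> bytes:
        # The accessor the class already provides, per the thread.
        return self._public

    def get_public_key(self) -> bytes:
        # The accessor the handshake code calls; delegating keeps both in sync.
        return self.public_key()
```

Changing the call site to use .public_key() would work equally well; the alias just keeps both spellings working.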

@acul71
Contributor

acul71 commented May 31, 2025

This is actually encouraging because it means py-libp2p's Noise implementation is mostly there. It just needs the proper API methods implemented. It's more of a "finish the implementation" problem rather than a "rewrite everything" problem.

I think this should be pretty straightforward to fix. I'm currently working on it.

@paschal533
Great news! You are a good "detective", Paschal. :-)
I hope that once this rust-libp2p/py-libp2p issue is solved, py-libp2p and js-libp2p ping will work too.
Share your last finding here and ping me if you need something.
Ciao Luca

@varun-r-mallya
Contributor

image
I get an error like this on a fresh install of this branch. I'll try to get this fixed.

setup.py Outdated
@@ -37,10 +37,14 @@
"pytest-trio>=0.5.2",
"factory-boy>=2.12.0,<3.0.0",
],
"interop": ["redis==6.1.0", "logging==0.4.9.6" "loguru==0.7.3"],
Contributor

The missing comma here breaks the build.
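For context, Python joins adjacent string literals at compile time, so without the comma the "interop" extras list ends up with a single merged string, "logging==0.4.9.6loguru==0.7.3", which is not a valid requirement specifier. A quick illustration:

```python
broken = ["redis==6.1.0", "logging==0.4.9.6" "loguru==0.7.3"]
fixed = ["redis==6.1.0", "logging==0.4.9.6", "loguru==0.7.3"]

# The adjacent literals collapse into one invalid requirement string.
print(broken)  # ['redis==6.1.0', 'logging==0.4.9.6loguru==0.7.3']
print(len(broken), len(fixed))  # 2 3
```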

@varun-r-mallya
Contributor

varun-r-mallya#2
As you can see, the failing tests are due to very simple issues. I'll try to get these fixed.

@varun-r-mallya
Contributor

image

These are the logs of a successful handshake between two py-nodes.

Logs
Between rust and py, it's failing around this part:

image

image
Why does it fail here when I run the same code?

@lla-dane
Contributor Author

lla-dane commented Jun 6, 2025

Why does it fail here when I run the same code?

Hey @varun-r-mallya, thanks for taking an interest in this PR. There was a Noise encryption incompatibility with the rust-libp2p ping interop example. As you can see from the conversation above, the rust node was rejecting the stream, so @paschal533 worked on this and created a test plan in #532 for the ping example with proper Noise and Yamux upgrades. He has been working on an appropriate test plan for the ping interop that new contributors can pick up.

@varun-r-mallya
Contributor

varun-r-mallya commented Jun 6, 2025

Why does it fail here when I run the same code?

Hey @varun-r-mallya, thanks for taking an interest in this PR. There was a Noise encryption incompatibility with the rust-libp2p ping interop example. As you can see from the conversation above, the rust node was rejecting the stream, so @paschal533 worked on this and created a test plan in #532 for the ping example with proper Noise and Yamux upgrades. He has been working on an appropriate test plan for the ping interop that new contributors can pick up.

Could you share the test plan and tell me exactly what to work on next, so I can contribute to this effort?

@lla-dane
Contributor Author

lla-dane commented Jun 6, 2025

Could you share the test plan and tell me exactly what to work on next, so I can contribute to this effort?

First, we have to test the ping example in #532 against a rust node locally. It was throwing an error on my end, so @paschal533 said he would document how he ran the py and rust nodes, since some tweaks are needed on the rust side as well. Once that works locally, we just have to integrate the Noise and Yamux updates into this PR's arch.py file and run the ping interop successfully against the rust node.

@seetadev
Contributor

seetadev commented Jun 6, 2025

@varun-r-mallya and @lla-dane: Great progress. Please study this in detail: https://github.com/libp2p/test-plans

Ping is part of the interop suite. We will follow it up with interop focused on hole punching, transport interop (WebRTC, QUIC, WebSockets), and then gossipsub interop :) @mystical-prog and @paschal533 will help you with this specific project on interop with rust-libp2p, focused on the ping protocol.

@seetadev
Contributor

seetadev commented Jun 9, 2025

@paschal533: I wish to share that we discussed this internally with the rust-libp2p team.

Kindly don't close interop pull requests before the interop scripts land in the test plans repository.

@seetadev
Contributor

seetadev commented Jun 9, 2025

@lla-dane and @varun-r-mallya: CI/CD tests are failing. Kindly resolve the issues. Please keep in touch with Elena from the rust-libp2p team to help this land in the test plans repository.

@paschal533: Please add your notes/feedback in this pull request. It has been shared with the rust-libp2p team.

@lla-dane
Contributor Author

lla-dane commented Jun 9, 2025

@paschal533: I ran the ping examples from #664. The py-listener logs look like this:

[Screenshot: py-listener logs]

Is this as far as we can get at present?

lla-dane force-pushed the interop/py-rust branch 2 times, most recently from 3455690 to b5b1375 on July 26, 2025 07:52
@lla-dane
Contributor Author

@seetadev: py<->rust ping interop is successful in this PR after incorporating rahul's changes to the yamux module.

[Screenshot: successful py<->rust ping interop run]

@lla-dane
Contributor Author

lla-dane commented Jul 26, 2025

One last issue remains: some tests hang indefinitely in the CI/CD checks, and this started after the yamux changes.
The hanging tests differ from run to run, and when we run them separately locally, they pass every time.
Any ideas what might be causing this? @seetadev

@seetadev
Contributor

@lla-dane : Thanks so much for digging into this and flagging the indefinite hanging issue in the tests.

You're right: it's especially tricky when the failures are nondeterministic, passing consistently in local runs but hanging intermittently in CI. That kind of behavior often points to subtle race conditions or resource-cleanup issues, especially around connection handling or the stream lifecycle.

I'd also recommend studying the changes introduced in the Yamux layer. There have been some important updates there, particularly from @acul71 and @sukhman-sukh, which aim to improve stream multiplexing behavior and should help reduce edge cases like these hangs. CCing them here so we can get their thoughts as well.

It might also be helpful to:

  • Add timeouts or logging around stream open/close events to isolate which step is stalling.
  • Temporarily enable verbose Yamux logs in CI to catch any dropped streams or incomplete teardown.
  • Run tests with --forked (from the pytest-forked plugin) or similar isolation to check for resource leaks.
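As a rough illustration of the first two suggestions, the helper below wraps one step (for example opening or closing a stream) in a timeout and logs when it starts and finishes, so a hang in CI surfaces as a named failure instead of a silent stall. It is a sketch written with the standard-library asyncio so it runs standalone; in py-libp2p's trio-based code, trio.fail_after would play the role of asyncio.wait_for. The names timed_step, StepTimeout and the "interop.debug" logger are hypothetical.

```python
import asyncio
import logging
from typing import Awaitable, TypeVar

T = TypeVar("T")

# Hypothetical logger name; use whatever logger the test module already has.
logger = logging.getLogger("interop.debug")


class StepTimeout(Exception):
    """Raised when a named step does not finish within its time budget."""


async def timed_step(name: str, awaitable: Awaitable[T], timeout: float) -> T:
    """Run one step under a timeout, logging start, finish, and stalls."""
    logger.debug("step %s: start", name)
    try:
        result = await asyncio.wait_for(awaitable, timeout)
    except asyncio.TimeoutError:
        # Report which step stalled instead of hanging the whole test run.
        logger.error("step %s: no progress after %.2fs", name, timeout)
        raise StepTimeout(name) from None
    logger.debug("step %s: done", name)
    return result
```

With DEBUG logging enabled in CI (for example via logging.basicConfig(level=logging.DEBUG)), the last "start" line without a matching "done" points at the step that hung.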

Appreciate your persistence on this; let's collaborate to nail it down. Once this is stabilized, it will unlock a lot of downstream improvements in py-libp2p reliability.

@acul71
Contributor

acul71 commented Aug 6, 2025

@lla-dane @seetadev

Yamux Interleaving Test Fixes

Problem Summary

The Yamux interleaving tests were hanging indefinitely due to two critical issues:

  1. Timeout mechanism not working in TrioStreamAdapter.read()
  2. Connection lifecycle management issues in the yamux_pair fixture

Files Fixed

  • tests/core/stream_muxer/test_yamux_interleaving.py
  • tests/core/stream_muxer/test_yamux_interleaving_EOF.py

Issue 1: Broken Timeout Mechanism

Problem

The TrioStreamAdapter.read() method had a critical bug where the timeout wasn't working properly:

# ❌ BROKEN VERSION
async def read(self, n: int | None = None) -> bytes:
    with trio.move_on_after(2):
        data = await self.receive_stream.receive_some(n)
        return data  # No fallback after timeout!

Fix

Added proper timeout handling with fallback behavior:

# ✅ FIXED VERSION
async def read(self, n: int | None = None) -> bytes:
    with trio.move_on_after(2):
        data = await self.receive_stream.receive_some(n)
        return data
    # ✅ Fallback code after timeout!
    logger.debug("Read timed out after 2 seconds, raising IncompleteReadError")
    from libp2p.io.exceptions import IncompleteReadError
    raise IncompleteReadError({"requested_count": n, "received_count": 0})

Why This Fixed It

  • Before: when receive_some(n) hung, the timeout fired but there was no fallback code, so read() fell off the end of the function and silently returned None instead of bytes
  • After: when receive_some(n) times out, read() raises an IncompleteReadError, which the calling code can handle
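To see the fallback behaviour in isolation, here is a minimal sketch of the same idea. It swaps trio for the standard-library asyncio so it runs without extra dependencies, and QueueStream and the local IncompleteReadError are hypothetical stand-ins, not py-libp2p classes. The point is only that a timed-out read must raise rather than fall through and return None.

```python
from __future__ import annotations

import asyncio


class IncompleteReadError(Exception):
    """Hypothetical stand-in for libp2p.io.exceptions.IncompleteReadError."""


class QueueStream:
    """Hypothetical in-memory stream whose read() returns queued chunks."""

    def __init__(self, timeout: float = 2.0) -> None:
        self._chunks: asyncio.Queue[bytes] = asyncio.Queue()
        self._timeout = timeout

    def feed(self, data: bytes) -> None:
        self._chunks.put_nowait(data)

    async def read(self, n: int | None = None) -> bytes:
        # Simplification: returns the next whole chunk; n is only reported
        # in the error and does not limit the size of the returned data.
        try:
            # Bound the wait, playing the role of trio.move_on_after(2).
            return await asyncio.wait_for(self._chunks.get(), self._timeout)
        except asyncio.TimeoutError:
            # Fallback path: raise instead of falling through and returning None.
            raise IncompleteReadError(
                {"requested_count": n, "received_count": 0}
            ) from None
```

Because the timeout path ends in a raise, read() can no longer return None by accident, which is the property the fixed TrioStreamAdapter.read() gains.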

Issue 2: Connection Lifecycle Management

Problem

The yamux_pair fixture was using an incorrect pattern that caused connections to close prematurely:

# ❌ BROKEN VERSION
async def yamux_pair(secure_conn_pair, peer_id):
    async with trio.open_nursery() as nursery:
        nursery.start_soon(client_yamux.start)
        nursery.start_soon(server_yamux.start)
        # ... complex handshake logic
    yield client_yamux, server_yamux  # Connections already closed!

Fix

Changed to match the working pattern from test_yamux.py:

# ✅ FIXED VERSION
async def yamux_pair(secure_conn_pair, peer_id):
    async with trio.open_nursery() as nursery:
        with trio.move_on_after(5):
            nursery.start_soon(client_yamux.start)
            nursery.start_soon(server_yamux.start)
            await trio.sleep(0.1)
        yield client_yamux, server_yamux  # Connections stay alive

Why This Fixed It

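  • Before: the yield came only after the nursery block had already exited, so the muxers' background tasks were no longer running by the time the test used the connections
  • After: the fixture yields from inside the nursery, so both muxers keep running and the connections stay open for the duration of the test; the move_on_after(5) scope only bounds the startup step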
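The lifecycle rule behind this fix is that the background tasks must stay alive for as long as the caller holds the muxers, which means yielding from inside the scope that owns those tasks. Below is a minimal sketch of that rule. It uses asyncio instead of trio so it runs standalone, and background_pair and ticker are hypothetical stand-ins for the yamux fixture and Yamux.start.

```python
from __future__ import annotations

import asyncio
from contextlib import asynccontextmanager


async def ticker(counter: list[int]) -> None:
    """Hypothetical stand-in for a muxer's long-running start() loop."""
    while True:
        counter[0] += 1
        await asyncio.sleep(0.01)


@asynccontextmanager
async def background_pair():
    client_ticks, server_ticks = [0], [0]
    tasks = [
        asyncio.create_task(ticker(client_ticks)),
        asyncio.create_task(ticker(server_ticks)),
    ]
    try:
        # Give the tasks a moment to start, like the short sleep in the fixture.
        await asyncio.sleep(0.05)
        # Yield while the tasks are still alive: the broken fixture instead
        # yielded after the scope owning its tasks had already closed.
        yield client_ticks, server_ticks, tasks
    finally:
        # Teardown: trio's nursery does this implicitly; asyncio needs it spelled out.
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)
```

In trio the nursery provides the cancel-on-exit teardown for free, which is why the fixed fixture only needed to move the yield inside the async with block.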

Additional Improvements

Proper Logging Setup

Added proper logger configuration to match other libp2p modules:

import logging

# Configure logger for this test module
logger = logging.getLogger(__name__)

Test Results

After the fixes:

  • test_yamux_race_condition_without_locks in test_yamux_interleaving.py - PASSING
  • test_yamux_race_condition_without_locks in test_yamux_interleaving_EOF.py - PASSING

Root Cause Analysis

The original hanging was caused by:

  1. Test infrastructure issues (timeout mechanism, connection lifecycle)
  2. Timing issues in the secure connection handshake

The tests are designed to stress-test concurrent read/write operations and expose race conditions, but the hanging came from infrastructure problems rather than actual race conditions in the Yamux implementation.

@acul71
Contributor

acul71 commented Aug 17, 2025

@lla-dane @seetadev
Is this ready to be merged?

@seetadev
Contributor

@acul71 : We do have merge conflicts coming up.

@lla-dane: Could you resolve the merge conflicts and share the updated changes? I will re-run the CI/CD pipeline.

@lla-dane
Contributor Author

lla-dane commented Aug 30, 2025

Fixed the merge conflicts and rebased the branch. The indefinite hanging in the tests has also been fixed by @acul71, and the ping interop now works with both the rust-libp2p ping example and its interop directory.
The commands for checking the interop have been added to the interop/README.md file; it can also be checked directly via the ping example with rust-libp2p. I will write a proper, detailed README for the interop dir shortly. In the meantime, @pacrob @seetadev, please suggest any code improvements!

lla-dane closed this Sep 1, 2025
lla-dane deleted the interop/py-rust branch September 1, 2025 11:41
lla-dane restored the interop/py-rust branch September 1, 2025 11:45
@lla-dane
Contributor Author

lla-dane commented Sep 1, 2025

This PR is continued at #888
