
Bug/694 circuit relay stream hangs indefinite #767

Draft: wants to merge 18 commits into main

Conversation

Winter-Soren
Contributor

What was wrong?

The execution of the circuit-relay example hangs indefinitely when attempting to create a new stream on a successfully established relay connection. The code connects to the relay node and establishes a connection to the destination through the relay, but then freezes when trying to open a stream on that connection without any error messages or exceptions being thrown.

Issue #694

How was it fixed?

Loggers were added to aid debugging. The indefinite hanging issue has been resolved, but now the execution halts when the CONNECT message is sent.

Summary of approach.

To-Do

  • Clean up commit history
  • Add or update documentation related to these changes
  • Add entry to the release notes


@seetadev
Contributor

@Winter-Soren : Hi Soham. Thank you for submitting the PR. Appreciate it.

It would be great if you could fix the CI/CD issues. Circuit relay is urgently needed for the universal connectivity dapp for py-libp2p and also for the workshop branch.

@seetadev
Contributor

seetadev commented Jul 19, 2025

@guha-rahul, @sukhman-sukh, @lla-dane: It would be great if you could test circuit relay on your devices in parallel.

@seetadev
Contributor

@Winter-Soren: Great, thanks Soham. Please add test cases covering specific NAT traversal scenarios and add a newsfragment. We will do a final review and merge after collective testing. Appreciate your efforts.

@Winter-Soren
Contributor Author

Hi @lla-dane and @guha-rahul,

I’d love to get your review on the example.
Regarding the bug, I’ve removed the indefinite hanging in the dial_peer_info method. When the source peer sends the CONNECT message, the relay node still fails to accept it. I tried adding additional protocol handlers, but the issue persists.

Would appreciate it if you could take a fresh look and suggest any improvements.
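
For anyone reproducing the hang: one generic way to make a stalled stream open fail with an error instead of blocking forever is to wrap the call in a deadline. The sketch below is not the change made in this PR. It uses the standard library's `asyncio` so it runs on its own, whereas py-libp2p is built on trio, where `trio.fail_after` plays the analogous role. `open_stream_never_returns` and `open_with_deadline` are hypothetical names used only for illustration.

```python
import asyncio


async def open_stream_never_returns() -> str:
    """Hypothetical stand-in for a stream-open call that never completes."""
    await asyncio.Event().wait()  # nobody sets this event, so it waits forever
    return "stream"


async def open_with_deadline(open_call, timeout: float):
    """Await open_call(), raising ConnectionError if it exceeds the deadline."""
    try:
        return await asyncio.wait_for(open_call(), timeout)
    except asyncio.TimeoutError as exc:
        # Turn the silent hang into an explicit, loggable failure.
        raise ConnectionError(
            f"stream open did not complete within {timeout}s"
        ) from exc


if __name__ == "__main__":
    try:
        asyncio.run(open_with_deadline(open_stream_never_returns, 0.05))
    except ConnectionError as err:
        print(f"Failed to open stream: {err}")
```

With a deadline in place, a stalled call shows up in the logs as a `ConnectionError` rather than as a process that never returns.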

@sukhman-sukh
Contributor

Hey @Winter-Soren, Luca made some more fixes on yamux. Can you rebase and test again? Maybe that could fix the issue.

@lla-dane
Contributor

lla-dane commented Jul 22, 2025

> @guha-rahul, @sukhman-sukh, @lla-dane: It would be great if you could test circuit relay on your devices in parallel.

At my end, it is failing when the source peer is attempting to dial destination:
Source peer logs:

2025-07-22 11:37:27,592 | circuit-relay-example | INFO | Attempting to dial destination 16Uiu2HAkvHVvWuyrMgYYq6uDirGSrCvM1HfhmrD1ajjZZ8fkArGh through relay 16Uiu2HAmRJmmU71BgGafq2Ee9ZH1y4v6zH8Es5VAmvfWNBmcXWDV
Error making reservation: 
Failed to make reservation with relay 16Uiu2HAmRJmmU71BgGafq2Ee9ZH1y4v6zH8Es5VAmvfWNBmcXWDV
2025-07-22 11:37:57,627 | circuit-relay-example | ERROR | Failed to dial through relay: Failed to establish relay connection: 
2025-07-22 11:37:57,627 | circuit-relay-example | ERROR | Exception type: ConnectionError
2025-07-22 11:37:57,627 | circuit-relay-example | ERROR | Error: Failed to establish relay connection: 

Source operation completed

@guha-rahul
Contributor

> @guha-rahul, @sukhman-sukh, @lla-dane: It would be great if you could test circuit relay on your devices in parallel.

I am getting the same error as @lla-dane, but a quick question: should we be sending out raw protobuf messages, since in other places I saw we are using varints?

@seetadev
Contributor

seetadev commented Jul 22, 2025

> I am getting the same error as @lla-dane, but a quick question: should we be sending out raw protobuf messages, since in other places I saw we are using varints?

@guha-rahul : Thanks for raising this—and great question!

You're right to observe the difference here. In some parts of the codebase, especially when interfacing with protocols like Identify or PeerRecord, we use varint length prefixes to ensure compatibility with libp2p's framing expectations. However, in other cases, especially when we're doing low-level testing or working within a tightly scoped context, we send raw protobuf messages directly over the stream.

In the specific scenario you’re encountering (same as @lla-dane’s), it's expected to send raw protobuf messages unless the protocol explicitly requires a varint length prefix for parsing. If you're getting an error, it might be related to how the message is being framed or interpreted on the receiving end. Feel free to share a snippet or the stack trace; happy to ask @Winter-Soren, @sukhman-sukh and @lla-dane to help debug further.

We’re working towards clearer abstractions for these different framing cases to avoid this confusion going forward. Appreciate your attention to detail.
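
To make the two framing cases concrete, here is a minimal standard-library sketch of raw versus varint length-prefixed messages. It is an illustration only, not py-libp2p's actual helpers: `encode_uvarint`, `read_uvarint`, `frame` and `read_frame` are hypothetical names, and the two-byte payload merely stands in for serialized protobuf bytes.

```python
from io import BytesIO


def encode_uvarint(value: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_uvarint(stream: BytesIO) -> int:
    """Decode one unsigned varint from the stream."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        result |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return result
        shift += 7


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length as an unsigned varint."""
    return encode_uvarint(len(payload)) + payload


def read_frame(stream: BytesIO) -> bytes:
    """Read one length-prefixed message from the stream."""
    length = read_uvarint(stream)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a message body")
    return payload


if __name__ == "__main__":
    raw = b"\x08\x01"  # placeholder for serialized protobuf bytes
    print(frame(raw))                       # length prefix 0x02, then the payload
    print(read_frame(BytesIO(frame(raw))))  # round-trips to the raw payload
    try:
        # A reader expecting a prefix treats the first raw byte (0x08) as a length.
        read_frame(BytesIO(raw))
    except EOFError as err:
        print(f"mismatched framing: {err}")
```

If one side writes a raw message and the other expects a prefix, the reader takes the first payload byte as a length and then waits for bytes that were never sent. On an in-memory buffer that surfaces as `EOFError`; on a live stream it could instead look like a stall. Whether that is what is happening with the CONNECT message here has not been established.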
