Skip to content

Fix connection reset#134

Open
yinheli wants to merge 1 commit intodevelopfrom
bugfix/fix-connection-reset
Open

Fix connection reset#134
yinheli wants to merge 1 commit intodevelopfrom
bugfix/fix-connection-reset

Conversation

@yinheli
Copy link
Owner

@yinheli yinheli commented Mar 11, 2026

Refactored idle timeout mechanism in TcpRelay to avoid prematurely closing Keep-Alive connections (e.g., in Chrome). Swapped out the custom active polling with tokio::select! to correctly close idle sockets without triggering TCP RST packets.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors the TCP relay idle-timeout handling to avoid closing Keep-Alive connections in a way that can trigger TCP RSTs, by switching from per-I/O timeout checks to a tokio::select! race between copying and an idle monitor.

Changes:

  • Replaced IdleTimeoutStream with ActiveStream that tracks activity and byte counts.
  • Added an async idle monitor and used tokio::select! to stop relaying when the connection is idle.
  • Moved metrics reporting to run after the relay finishes, using the tracked byte counters.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/gateway/nat.rs:120

  • Nat::create selects nat_port under a read-lock and only inserts into mapping after an .await on cache.insert. This creates a race where concurrent create calls can observe the same mapping state and end up assigning the same nat_port (and the OS may also re-issue recently freed ephemeral ports). Consider making port allocation + uniqueness check + reservation/insertion into mapping atomic (e.g., loop with a write-lock and re-check before inserting), or otherwise reserving the chosen port before the async boundary.
        let nat_port = {
            let mapping = self.mapping.read().await;

            if let Some(&nat_port) = mapping.get_by_left(&addr_key) {
                return Ok(Session {
                    src_addr,
                    dst_addr,
                    src_port,
                    dst_port,
                    nat_port,
                });
            }

            let mut assigned_port = 0;
            for _ in 0..10 {
                let port = self.get_available_port()?;
                if !mapping.contains_right(&port) {
                    assigned_port = port;
                    break;
                }
            }

            if assigned_port == 0 {
                return Err(Error::new(
                    ErrorKind::AddrInUse,
                    "No available NAT port via OS allocation",
                ));
            }

            assigned_port
        };

        let session = Arc::new(Session {
            src_addr,
            dst_addr,
            src_port,
            dst_port,
            nat_port,
        });

        self.cache.insert(addr_key, session.clone()).await;

        {
            let mut mapping = self.mapping.write().await;
            mapping.insert(addr_key, nat_port);
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +64 to +69
let addr_key = SessionKey {
src_addr,
dst_addr,
src_port,
dst_port,
};
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create() now keys sessions by the full 4-tuple (SessionKey), but the unit test only exercises a single tuple and doesn’t assert that different destinations (or ports) produce distinct sessions/ports. Consider extending the tests to create at least two different SessionKey values and assert they don’t alias (and that find() resolves each nat_port correctly).

Copilot uses AI. Check for mistakes.
Comment on lines +182 to +183
self.last_activity_micros
.fetch_max(elapsed, Ordering::Relaxed);
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SharedIdleTracker::update_activity stores elapsed time in microseconds and monitor_task treats last_activity_instant() not advancing as idle. If activity occurs within the same microsecond tick (e.g., very soon after tracker creation), fetch_max(elapsed) may not advance the value, and the connection can be incorrectly timed out after IDLE_TIMEOUT. Consider storing higher-resolution time (e.g., nanoseconds) and/or ensuring each activity update monotonically increments the stored value even when the timestamp quantizes to the same unit.

Suggested change
self.last_activity_micros
.fetch_max(elapsed, Ordering::Relaxed);
let mut current = self.last_activity_micros.load(Ordering::Relaxed);
loop {
let new_value = if elapsed > current {
elapsed
} else {
current.wrapping_add(1)
};
match self.last_activity_micros.compare_exchange(
current,
new_value,
Ordering::Relaxed,
Ordering::Relaxed,
) {
Ok(_) => break,
Err(actual) => {
current = actual;
}
}
}

Copilot uses AI. Check for mistakes.
@yinheli yinheli force-pushed the bugfix/fix-connection-reset branch from e5bc870 to a2f9f44 Compare March 11, 2026 18:37
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +88 to +95
let mut assigned_port = 0;
for _ in 0..10 {
let port = self.get_available_port()?;
if !mapping.contains_right(&port) {
assigned_port = port;
break;
}
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create() chooses a NAT port while holding only a read-lock on mapping, then later inserts under a separate write-lock. Two concurrent create() calls can still pick the same port (OS can reuse ephemeral ports quickly), and BiMap::insert() will typically evict/replace an existing entry when the right-side value already exists, potentially breaking an existing session mapping. Consider selecting+reserving the port under a write-lock (or re-checking and retrying under the write-lock right before insert) so port assignment is race-free.

Copilot uses AI. Check for mistakes.
Comment on lines 125 to +137
pub async fn find(&self, nat_port: u16) -> Option<Session> {
if let Some(addr_key) = self.get_addr_key_by_port_fast(&nat_port).await
&& let Some(session) = self.cache.get(&addr_key).await
{
return Some(*session);
if let Some(addr_key) = self.get_addr_key_by_port_fast(&nat_port).await {
if let Some(session) = self.cache.get(&addr_key).await {
return Some(*session);
}

return Some(Session {
src_addr: addr_key.src_addr,
dst_addr: addr_key.dst_addr,
src_port: addr_key.src_port,
dst_port: addr_key.dst_port,
nat_port,
});
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find() now reconstructs a Session from SessionKey when the cache misses but the mapping still contains the NAT port. This new behavior isn’t exercised by the current tests (they only hit the cache-present path). Consider adding a unit test that forces a cache miss (e.g., invalidate the entry) while keeping the mapping present, and asserts find() still returns the expected session.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants