Fix connection reset by yinheli · Pull Request #134 · yinheli/kungfu

yinheli · 2026-03-11T15:32:53Z

Refactored idle timeout mechanism in TcpRelay to avoid prematurely closing Keep-Alive connections (e.g., in Chrome). Swapped out the custom active polling with tokio::select! to correctly close idle sockets without triggering TCP RST packets.

Copilot

Pull request overview

Refactors the TCP relay idle-timeout handling to avoid closing Keep-Alive connections in a way that can trigger TCP RSTs, by switching from per-I/O timeout checks to a tokio::select! race between copying and an idle monitor.

Changes:

Replaced IdleTimeoutStream with ActiveStream that tracks activity and byte counts.
Added an async idle monitor and used tokio::select! to stop relaying when the connection is idle.
Moved metrics reporting to run after the relay finishes, using the tracked byte counters.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/gateway/relay_tcp.rs

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/gateway/relay_tcp.rs

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/gateway/relay_tcp.rs

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/gateway/relay_tcp.rs

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/gateway/relay_tcp.rs

src/gateway/nat.rs

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

src/gateway/nat.rs:120

Nat::create selects nat_port under a read-lock and only inserts into mapping after an .await on cache.insert. This creates a race where concurrent create calls can observe the same mapping state and end up assigning the same nat_port (and the OS may also re-issue recently freed ephemeral ports). Consider making port allocation + uniqueness check + reservation/insertion into mapping atomic (e.g., loop with a write-lock and re-check before inserting), or otherwise reserving the chosen port before the async boundary.

        let nat_port = {
            let mapping = self.mapping.read().await;

            if let Some(&nat_port) = mapping.get_by_left(&addr_key) {
                return Ok(Session {
                    src_addr,
                    dst_addr,
                    src_port,
                    dst_port,
                    nat_port,
                });
            }

            let mut assigned_port = 0;
            for _ in 0..10 {
                let port = self.get_available_port()?;
                if !mapping.contains_right(&port) {
                    assigned_port = port;
                    break;
                }
            }

            if assigned_port == 0 {
                return Err(Error::new(
                    ErrorKind::AddrInUse,
                    "No available NAT port via OS allocation",
                ));
            }

            assigned_port
        };

        let session = Arc::new(Session {
            src_addr,
            dst_addr,
            src_port,
            dst_port,
            nat_port,
        });

        self.cache.insert(addr_key, session.clone()).await;

        {
            let mut mapping = self.mapping.write().await;
            mapping.insert(addr_key, nat_port);
        }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/gateway/nat.rs

Copilot · 2026-03-11T18:24:59Z

src/gateway/nat.rs

+        let addr_key = SessionKey {
+            src_addr,
+            dst_addr,
+            src_port,
+            dst_port,
+        };


create() now keys sessions by the full 4-tuple (SessionKey), but the unit test only exercises a single tuple and doesn’t assert that different destinations (or ports) produce distinct sessions/ports. Consider extending the tests to create at least two different SessionKey values and assert they don’t alias (and that find() resolves each nat_port correctly).

Copilot · 2026-03-11T18:25:00Z

src/gateway/relay_tcp.rs

+        self.last_activity_micros
+            .fetch_max(elapsed, Ordering::Relaxed);


SharedIdleTracker::update_activity stores elapsed time in microseconds and monitor_task treats last_activity_instant() not advancing as idle. If activity occurs within the same microsecond tick (e.g., very soon after tracker creation), fetch_max(elapsed) may not advance the value, and the connection can be incorrectly timed out after IDLE_TIMEOUT. Consider storing higher-resolution time (e.g., nanoseconds) and/or ensuring each activity update monotonically increments the stored value even when the timestamp quantizes to the same unit.

Suggested change

self.last_activity_micros

.fetch_max(elapsed, Ordering::Relaxed);

let mut current = self.last_activity_micros.load(Ordering::Relaxed);

loop {

let new_value = if elapsed > current {

elapsed

} else {

current.wrapping_add(1)

};

match self.last_activity_micros.compare_exchange(

current,

new_value,

Ordering::Relaxed,

Ordering::Relaxed,

) {

Ok(_) => break,

Err(actual) => {

current = actual;

}

}

}

src/gateway/nat.rs

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-11T18:38:51Z

src/gateway/nat.rs

+            let mut assigned_port = 0;
+            for _ in 0..10 {
+                let port = self.get_available_port()?;
+                if !mapping.contains_right(&port) {
+                    assigned_port = port;
+                    break;
+                }
+            }


create() chooses a NAT port while holding only a read-lock on mapping, then later inserts under a separate write-lock. Two concurrent create() calls can still pick the same port (OS can reuse ephemeral ports quickly), and BiMap::insert() will typically evict/replace an existing entry when the right-side value already exists, potentially breaking an existing session mapping. Consider selecting+reserving the port under a write-lock (or re-checking and retrying under the write-lock right before insert) so port assignment is race-free.

Copilot · 2026-03-11T18:38:52Z

src/gateway/nat.rs

    pub async fn find(&self, nat_port: u16) -> Option<Session> {
-        if let Some(addr_key) = self.get_addr_key_by_port_fast(&nat_port).await
-            && let Some(session) = self.cache.get(&addr_key).await
-        {
-            return Some(*session);
+        if let Some(addr_key) = self.get_addr_key_by_port_fast(&nat_port).await {
+            if let Some(session) = self.cache.get(&addr_key).await {
+                return Some(*session);
+            }
+
+            return Some(Session {
+                src_addr: addr_key.src_addr,
+                dst_addr: addr_key.dst_addr,
+                src_port: addr_key.src_port,
+                dst_port: addr_key.dst_port,
+                nat_port,
+            });


find() now reconstructs a Session from SessionKey when the cache misses but the mapping still contains the NAT port. This new behavior isn’t exercised by the current tests (they only hit the cache-present path). Consider adding a unit test that forces a cache miss (e.g., invalidate the entry) while keeping the mapping present, and asserts find() still returns the expected session.

yinheli requested a review from Copilot March 11, 2026 15:33

Copilot started reviewing on behalf of yinheli March 11, 2026 15:36 View session