Conversation
There was a problem hiding this comment.
Pull request overview
Refactors the TCP relay idle-timeout handling to avoid closing Keep-Alive connections in a way that can trigger TCP RSTs, by switching from per-I/O timeout checks to a tokio::select! race between copying and an idle monitor.
Changes:
- Replaced
IdleTimeoutStreamwithActiveStreamthat tracks activity and byte counts. - Added an async idle monitor and used
tokio::select!to stop relaying when the connection is idle. - Moved metrics reporting to run after the relay finishes, using the tracked byte counters.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
db97c79 to
52b9cdd
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
52b9cdd to
76014b6
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
76014b6 to
c1e95b3
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
63ec79d to
6120a1b
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
4c2950c to
769c73b
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/gateway/nat.rs:120
Nat::createselectsnat_portunder a read-lock and only inserts intomappingafter an.awaitoncache.insert. This creates a race where concurrentcreatecalls can observe the samemappingstate and end up assigning the samenat_port(and the OS may also re-issue recently freed ephemeral ports). Consider making port allocation + uniqueness check + reservation/insertion intomappingatomic (e.g., loop with a write-lock and re-check before inserting), or otherwise reserving the chosen port before the async boundary.
let nat_port = {
let mapping = self.mapping.read().await;
if let Some(&nat_port) = mapping.get_by_left(&addr_key) {
return Ok(Session {
src_addr,
dst_addr,
src_port,
dst_port,
nat_port,
});
}
let mut assigned_port = 0;
for _ in 0..10 {
let port = self.get_available_port()?;
if !mapping.contains_right(&port) {
assigned_port = port;
break;
}
}
if assigned_port == 0 {
return Err(Error::new(
ErrorKind::AddrInUse,
"No available NAT port via OS allocation",
));
}
assigned_port
};
let session = Arc::new(Session {
src_addr,
dst_addr,
src_port,
dst_port,
nat_port,
});
self.cache.insert(addr_key, session.clone()).await;
{
let mut mapping = self.mapping.write().await;
mapping.insert(addr_key, nat_port);
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let addr_key = SessionKey { | ||
| src_addr, | ||
| dst_addr, | ||
| src_port, | ||
| dst_port, | ||
| }; |
There was a problem hiding this comment.
create() now keys sessions by the full 4-tuple (SessionKey), but the unit test only exercises a single tuple and doesn’t assert that different destinations (or ports) produce distinct sessions/ports. Consider extending the tests to create at least two different SessionKey values and assert they don’t alias (and that find() resolves each nat_port correctly).
| self.last_activity_micros | ||
| .fetch_max(elapsed, Ordering::Relaxed); |
There was a problem hiding this comment.
SharedIdleTracker::update_activity stores elapsed time in microseconds and monitor_task treats last_activity_instant() not advancing as idle. If activity occurs within the same microsecond tick (e.g., very soon after tracker creation), fetch_max(elapsed) may not advance the value, and the connection can be incorrectly timed out after IDLE_TIMEOUT. Consider storing higher-resolution time (e.g., nanoseconds) and/or ensuring each activity update monotonically increments the stored value even when the timestamp quantizes to the same unit.
| self.last_activity_micros | |
| .fetch_max(elapsed, Ordering::Relaxed); | |
| let mut current = self.last_activity_micros.load(Ordering::Relaxed); | |
| loop { | |
| let new_value = if elapsed > current { | |
| elapsed | |
| } else { | |
| current.wrapping_add(1) | |
| }; | |
| match self.last_activity_micros.compare_exchange( | |
| current, | |
| new_value, | |
| Ordering::Relaxed, | |
| Ordering::Relaxed, | |
| ) { | |
| Ok(_) => break, | |
| Err(actual) => { | |
| current = actual; | |
| } | |
| } | |
| } |
61b14c5 to
e5bc870
Compare
e5bc870 to
a2f9f44
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let mut assigned_port = 0; | ||
| for _ in 0..10 { | ||
| let port = self.get_available_port()?; | ||
| if !mapping.contains_right(&port) { | ||
| assigned_port = port; | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
create() chooses a NAT port while holding only a read-lock on mapping, then later inserts under a separate write-lock. Two concurrent create() calls can still pick the same port (OS can reuse ephemeral ports quickly), and BiMap::insert() will typically evict/replace an existing entry when the right-side value already exists, potentially breaking an existing session mapping. Consider selecting+reserving the port under a write-lock (or re-checking and retrying under the write-lock right before insert) so port assignment is race-free.
| pub async fn find(&self, nat_port: u16) -> Option<Session> { | ||
| if let Some(addr_key) = self.get_addr_key_by_port_fast(&nat_port).await | ||
| && let Some(session) = self.cache.get(&addr_key).await | ||
| { | ||
| return Some(*session); | ||
| if let Some(addr_key) = self.get_addr_key_by_port_fast(&nat_port).await { | ||
| if let Some(session) = self.cache.get(&addr_key).await { | ||
| return Some(*session); | ||
| } | ||
|
|
||
| return Some(Session { | ||
| src_addr: addr_key.src_addr, | ||
| dst_addr: addr_key.dst_addr, | ||
| src_port: addr_key.src_port, | ||
| dst_port: addr_key.dst_port, | ||
| nat_port, | ||
| }); |
There was a problem hiding this comment.
find() now reconstructs a Session from SessionKey when the cache misses but the mapping still contains the NAT port. This new behavior isn’t exercised by the current tests (they only hit the cache-present path). Consider adding a unit test that forces a cache miss (e.g., invalidate the entry) while keeping the mapping present, and asserts find() still returns the expected session.
Refactored idle timeout mechanism in TcpRelay to avoid prematurely closing Keep-Alive connections (e.g., in Chrome). Swapped out the custom active polling with tokio::select! to correctly close idle sockets without triggering TCP RST packets.