Skip to content

Commit 17ff22b

Browse files
committed
OF-3176: Fix stanza loss during outgoing S2S session teardown
When an outgoing server-to-server (S2S) connection closes, the connection is marked closed immediately, but route removal from the routing table happens asynchronously during teardown. This creates a short race window where routing can still send stanzas to a closing session, which then sees a closed connection and drops the stanza. This change fixes that behavior by waiting for teardown to fully complete before re-routing a stanza that reached a closed outgoing session. Waiting for close completion ensures stale route cleanup has finished, after which normal routing can use an existing replacement session or create a new one. A new system property, xmpp.server.session.redelivery-timeout (default: 30 seconds), controls how long Openfire waits for teardown completion before returning an error to the sender.
1 parent 35901a4 commit 17ff22b

File tree

3 files changed

+50
-3
lines changed

3 files changed

+50
-3
lines changed

i18n/src/main/resources/openfire_i18n.properties

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1325,6 +1325,7 @@ system_property.xmpp.server.outgoing.threads-timeout=Amount of time after which
13251325
system_property.xmpp.server.outgoing.queue=Maximum amount of stanzas that are queued while waiting for an outbound server-to-server connection to be established.
13261326
system_property.xmpp.server.outgoing.close-wait-timeout=The maximum amount of time to wait for a previous outgoing server-to-server session to fully complete its teardown before establishing a new one. If this timeout expires, the new session will be established regardless. Increasing this value reduces the risk of the new session colliding with stale state left by the previous one, at the cost of delaying recovery from dropped connections.
13271327
system_property.xmpp.server.session.initialise-timeout=Maximum amount of time in seconds for an outbound S2S session to be initialised
1328+
system_property.xmpp.server.session.redelivery-timeout=Maximum time to wait for an outgoing server session's teardown to complete before abandoning a stanza re-delivery attempt.
13281329
system_property.xmpp.server.session.idle=How long, in milliseconds, before idle inbound server sessions are dropped. Set to -1 to never drop idle sessions.
13291330
system_property.xmpp.server.limits.advertisement.disabled=Disables the advertisement of server limits in the XMPP stream features (XEP-0478) on server connections.
13301331
system_property.xmpp.server.permission-apply-recursive=Defines if the permission setting for server-to-server connection applies recursively for subdomains.

i18n/src/main/resources/openfire_i18n_nl.properties

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1233,6 +1233,7 @@ system_property.xmpp.server.outgoing.threads-timeout=Tijd waarna inactieve en ov
12331233
system_property.xmpp.server.outgoing.queue=Maximaal aantal stanzas dat in een wachtrij wordt gezet terwijl een nieuwe server-naar-server connectie wordt opgetuigd.
12341234
system_property.xmpp.server.outgoing.close-wait-timeout=De maximale tijd die gewacht wordt totdat een vorige uitgaande server-naar-server verbinding volledig is afgebroken, voordat een nieuwe verbinding wordt opgezet. Als deze tijd verstrijkt, wordt de nieuwe verbinding alsnog opgezet. Een hogere waarde verkleint het risico dat de nieuwe verbinding in conflict komt met verouderde toestand van de vorige verbinding, maar vertraagt wel het herstel na weggevallen verbindingen.
12351235
system_property.xmpp.server.session.initialise-timeout=Maximumtijd in seconden om een uitgaande S2S-sessie te initialiseren
1236+
system_property.xmpp.server.session.redelivery-timeout=De maximale tijd die gewacht wordt totdat het afbreken van een uitgaande serversessie volledig is afgerond, voordat een poging tot herlevering van een stanza wordt opgegeven.
12361237
system_property.xmpp.server.session.idle=Hoe lang, in milliseconden, voordat inactieve server sessies worden afgebroken. Stel in op -1 om inactieve sessies nooit te laten vallen.
12371238
system_property.xmpp.server.limits.advertisement.disabled=Schakelt het adverteren van server-limieten in XMPP stream features (XEP-0478) uit voor server-connecties.
12381239
system_property.xmpp.server.permission-apply-recursive=Bepaalt of de permissie-configuratie voor server-naar-server verbindingen recursief wordt toegepast voor subdomeinen.

xmppserver/src/main/java/org/jivesoftware/openfire/session/LocalOutgoingServerSession.java

Lines changed: 48 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -85,6 +85,16 @@ public class LocalOutgoingServerSession extends LocalServerSession implements Ou
8585
.setDynamic(true)
8686
.build();
8787

88+
/**
89+
* Maximum time to wait for an outgoing server session's teardown to complete before abandoning a stanza re-delivery attempt.
90+
*/
91+
public static final SystemProperty<Duration> REDELIVERY_TIMEOUT_SECONDS = SystemProperty.Builder.ofType(Duration.class)
92+
.setKey("xmpp.server.session.redelivery-timeout")
93+
.setDefaultValue(Duration.ofSeconds(30))
94+
.setChronoUnit(ChronoUnit.SECONDS)
95+
.setDynamic(true)
96+
.build();
97+
8898
private OutgoingServerSocketReader socketReader;
8999
private final Collection<DomainPair> outgoingDomainPairs = new HashSet<>();
90100

@@ -326,10 +336,45 @@ boolean canDeliver(@Nonnull final Packet stanza) {
326336
void deliver(Packet packet) throws UnauthorizedException {
327337
if (!conn.isClosed()) {
328338
conn.deliver(packet);
329-
} else {
330-
Log.warn("Dropping stanza to {} because the connection is closed. Returning error to sender.", packet.getTo());
331-
returnErrorToSenderAsync(packet);
339+
return;
340+
}
341+
342+
// Guard against a double bounce: if re-routing this error stanza fails (i.e. the remote server is still
343+
// unreachable after teardown), OutgoingSessionPromise would attempt to return an error to the original sender,
344+
// producing an error in response to an error.
345+
if (packet.getError() != null) {
346+
Log.debug("Dropping error stanza to {} on closed connection to avoid a double-bounce: {}", packet.getTo(), packet);
347+
return;
332348
}
349+
350+
// The connection is closed but teardown may still be in progress - the routing table entry for this session
351+
// might not yet be removed. Waiting for closeFuture guarantees that all ConnectionCloseListeners (including
352+
// route removal) have finished before we re-route, so the packet router will not hand the stanza straight back
353+
// to this dying session. OF-3176
354+
Log.info("Connection to {} is closed; will attempt re-delivery once teardown completes.", packet.getTo());
355+
356+
// Capture the configured timeout now: the property is dynamic and could change before the callback fires.
357+
final long timeoutSeconds = REDELIVERY_TIMEOUT_SECONDS.getValue().toSeconds();
358+
conn.getCloseFuture()
359+
.toCompletableFuture()
360+
.orTimeout(timeoutSeconds, TimeUnit.SECONDS)
361+
.whenCompleteAsync((v, ex) -> {
362+
if (ex instanceof TimeoutException) {
363+
// The teardown did not complete within the configured window. We cannot safely re-route.
364+
Log.warn("Timed out after {}s waiting for connection teardown to complete for {}; dropping stanza and returning error to sender. Packet: {}", timeoutSeconds, packet.getTo(), packet);
365+
returnErrorToSenderAsync(packet);
366+
} else if (ex != null) {
367+
// Unexpected exception (e.g. CancellationException). Log with stack trace and return an error.
368+
Log.warn("Unexpected exception while waiting for connection teardown to complete for {}; dropping stanza and returning error to sender. Packet: {}", packet.getTo(), packet, ex);
369+
returnErrorToSenderAsync(packet);
370+
} else {
371+
// Teardown is complete. The route for this session has been removed. Delegate back to the packet
372+
// router, which will either find a concurrently-established session or trigger authenticateDomain()
373+
// to create a new one.
374+
Log.debug("Connection teardown complete; re-routing stanza to {}.", packet.getTo());
375+
XMPPServer.getInstance().getPacketRouter().route(packet);
376+
}
377+
});
333378
}
334379

335380
@Override

0 commit comments

Comments
 (0)