
Server Netty. Set proper call.coroutineContext to fix HttpRequestLifecycle#5421

Open
zibet27 wants to merge 6 commits into release/3.x from zibet27/http-request-lifecycle-breaks-call-logging

Conversation

Collaborator

@zibet27 zibet27 commented Mar 4, 2026

Subsystem
Netty Server

Motivation
KTOR-9343: the HttpRequestLifecycle plugin with cancelCallOnClose enabled cancels subsequent requests when the CallLogging plugin with callIdMdc is installed.
The problem was that cancelling call.coroutineContext also cancelled userAppContext (application.coroutineContext + userContext), because the two contexts were identical.

Solution

  • Refactored Netty HTTP/1 and HTTP/2 handlers to inline call execution instead of using a separate NettyApplicationCallHandler channel handler
  • Use a separate Job inside call.coroutineContext
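The idea behind the separate Job can be illustrated with a minimal kotlinx.coroutines sketch; the names applicationJob and newCallContext are illustrative stand-ins, not the actual Ktor internals:

```kotlin
import kotlinx.coroutines.*
import kotlin.coroutines.CoroutineContext

fun main() = runBlocking {
    // Stand-ins for the shared application context; names are illustrative.
    val applicationJob = SupervisorJob()
    val applicationContext: CoroutineContext = applicationJob + CoroutineName("application")

    // Per-call context: a fresh child Job instead of reusing applicationJob.
    fun newCallContext(): CoroutineContext = applicationContext + Job(applicationJob)

    val call1 = newCallContext()
    val call2 = newCallContext()

    // Simulate HttpRequestLifecycle cancelling call 1 on client disconnect.
    call1[Job]!!.cancel()

    println(call1[Job]!!.isCancelled) // true
    println(call2[Job]!!.isActive)    // true: subsequent calls are unaffected
    println(applicationJob.isActive)  // true: the application context survives
}
```

Because each call now owns a child Job, cancelling one call's context no longer tears down the context shared by every other call.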

@zibet27 zibet27 requested review from bjhham, e5l and osipxd March 4, 2026 10:18
@zibet27 zibet27 self-assigned this Mar 4, 2026
Contributor

coderabbitai bot commented Mar 4, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Replaces the monolithic NettyApplicationCallHandler with an internal object and HTTP/1 helpers; adds per-call coroutineContext/job wiring for HTTP/1 and HTTP/2 handlers, moves request validation and 400/408 response logic to utilities, removes Netty dispatcher, and adds/updates tests for call-logging and disconnection.

Changes

Cohort / File(s) Summary
Netty application call handler & HTTP/1 helpers
ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/NettyApplicationCallHandler.kt
Removed previous class-based ChannelInbound handler; added internal NettyApplicationCallHandler object with CallHandlerCoroutineName, HTTP/1 request validation, 408/400 response helpers, transfer-encoding validation, and logging utilities.
HTTP/1.1 call & handler
ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1ApplicationCall.kt, ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt
Promoted httpRequest to a public property; changed per-call coroutineContext/job wiring and launched per-call coroutines; removed CoroutineScope from handler; added per-call cancellation, timeout (408) handling, and connection-close logic.
HTTP/2 call & handler
ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2ApplicationCall.kt, ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2Handler.kt
Replaced userContext with constructor coroutineContext; removed handler CoroutineScope; added per-call job/context orchestration, on-stream-close handling, async engine pipeline execution, and a cancel() method.
Response dispatch / dispatcher removal
ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/CIO.kt, ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1ApplicationResponse.kt
Removed Netty-specific NettyDispatcher and CurrentContext machinery; simplified response upgrade to pass original user context directly.
Tests: lifecycle & disconnection
ktor-server/ktor-server-test-suites/jvm/src/io/ktor/server/testing/suites/HttpRequestLifecycleTest.kt, ktor-server/ktor-server-cio/jvm/test/io/ktor/tests/server/cio/CIOEngineTestJvm.kt
Added testDisconnection helper, new tests for CallLogging and disconnection (including pipelined requests cancellation), and an @Ignore override for the CIO pipelined test.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

👍 ship!

Suggested reviewers

  • bjhham
  • osipxd
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning. Docstring coverage is 3.13%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed. The title clearly summarizes the main change: setting proper call.coroutineContext in Netty Server to fix HttpRequestLifecycle plugin issues.
  • Description check — ✅ Passed. The pull request description provides all required sections with clear information about the subsystem, motivation (referencing KTOR-9343), and solution with technical details about the refactoring approach.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/NettyApplicationCallHandler.kt (1)

45-60: Consider limiting error detail exposure to clients.

The causeMessage from decoderResult().cause() is sent directly in the response body. While useful for debugging, this could expose internal parsing details (e.g., buffer states, parsing internals) to clients.

Consider truncating or sanitizing the message, or only including it when a debug flag is enabled:

🛡️ Suggested approach
 internal suspend fun NettyHttp1ApplicationCall.respondError400BadRequest() {
     logCause()

-    val causeMessage = failureCause?.message?.toByteArray(charset = Charsets.UTF_8)
+    val causeMessage = if (application.developmentMode) {
+        failureCause?.message?.toByteArray(charset = Charsets.UTF_8)
+    } else {
+        null
+    }
     val content = if (causeMessage != null) ByteReadChannel(causeMessage) else ByteReadChannel.Empty
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/NettyApplicationCallHandler.kt`
around lines 45 - 60, The respondError400BadRequest handler currently sends
failureCause message directly to clients; change it to avoid leaking internals
by sanitizing/truncating or gating the detail behind a debug flag: in
NettyHttp1ApplicationCall.respondError400BadRequest use failureCause only when a
configurable debug/verbose flag (e.g., applicationConfig.isDebug or
call.environment.config) is true, otherwise replace with a generic message like
"Bad request" and/or trim the cause text to a short safe length (e.g., first N
characters) and strip sensitive tokens before creating the ByteReadChannel; keep
the logCause() call for server logs but ensure response.headers and
response.sendResponse still use the sanitized/truncated content and
contentLength.
ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt (1)

114-119: Minor: Consider handling ReadTimeoutException when no active request.

If currentJob is null when ReadTimeoutException occurs (edge case between requests), no response is sent and the channel isn't closed. This could leave the connection in an undefined state.

💡 Optional improvement
             is ReadTimeoutException -> {
-                currentJob?.let {
+                val job = currentJob
+                if (job != null) {
                     context.respond408RequestTimeoutHttp1()
-                    it.cancel(CancellationException(cause))
+                    job.cancel(CancellationException(cause))
+                } else {
+                    // No active request - just close the idle connection
+                    context.close()
                 }
             }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt`
around lines 114 - 119, ReadTimeoutException handling currently only acts when
currentJob is non-null, leaving the connection open if the timeout occurs
between requests; modify the ReadTimeoutException branch (in
NettyHttp1Handler.kt where ReadTimeoutException is matched and currentJob is
referenced) to also handle the null-case by sending a 408 or closing the
channel: if currentJob is null, call context.respond408RequestTimeoutHttp1() or
ensure the Netty channel is closed (e.g., channel.close()) so the connection is
not left in an undefined state while preserving the existing behavior of
canceling the job when present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt`:
- Around line 217-219: NettyHttp1Handler.cancel() is never registered for plain
HTTP/1.1 connections, causing handlerJob/coroutine resources to leak; in
NettyChannelInitializer (HTTP/1.1 branch where the NettyHttp1Handler instance is
created), add a closeFuture listener on pipeline.channel() that calls
handler.cancel() (same pattern used for the HTTP/2 and H2C upgrade branches
using pipeline.channel().closeFuture().addListener { handler.cancel() }) so the
handler's job is cancelled when the channel closes.
- Around line 42-43: In NettyHttp1Handler mark the shared mutable fields
currentJob and currentCall with the `@Volatile` annotation to ensure visibility
across threads; locate the declarations "private var currentJob: Job? = null"
and "private var currentCall: NettyHttp1ApplicationCall? = null" and add
`@Volatile` above each so writes inside callEventGroup.execute() are visible to
reads on the channel I/O thread (channelInactive/onConnectionClose), mirroring
the pattern used in NettyApplicationResponse.

In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2Handler.kt`:
- Around line 121-124: The launched call coroutines (created via userScope =
CoroutineScope(userCoroutineContext) and launched in the callEventGroup.execute
block) are not children of the handlerJob SupervisorJob, so cancelling
handlerJob on connection close doesn't cancel in-flight
enginePipeline.execute(call) jobs; fix by making the launched call jobs descend
from handlerJob—e.g., create the scope or launch with a context that includes
handlerJob (combine handlerJob with userCoroutineContext or pass handlerJob into
the CoroutineScope/launch) so that the call coroutine (callJob) is cancelled
when handlerJob is cancelled; adjust the code around callEventGroup.execute,
userScope, and the launch invocation to include handlerJob in the coroutine
context.
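The structured-concurrency point in the HTTP/2 comment above reduces to a small kotlinx.coroutines sketch (illustrative names, not the actual handler code): only jobs launched with handlerJob in their context are cancelled when the handler shuts down.

```kotlin
import kotlinx.coroutines.*

fun main() = runBlocking {
    val handlerJob = SupervisorJob()

    // Detached: the scope does not contain handlerJob, so cancelling
    // handlerJob has no effect on this coroutine.
    val detached = CoroutineScope(Dispatchers.Default).launch { delay(10_000) }

    // Child: handlerJob is part of the context, so the launched job is
    // cancelled together with it on connection close.
    val child = CoroutineScope(Dispatchers.Default + handlerJob).launch { delay(10_000) }

    handlerJob.cancel() // simulate connection close
    child.join()

    println(detached.isCancelled) // false: leaked past the handler's lifetime
    println(child.isCancelled)    // true
    detached.cancel()             // clean up for the demo
}
```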


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: daba7079-02e1-4f07-bae4-cc6a5a4e94c3

📥 Commits

Reviewing files that changed from the base of the PR and between 975ea8d and 32fb307.

📒 Files selected for processing (6)
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/NettyApplicationCallHandler.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1ApplicationCall.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2ApplicationCall.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2Handler.kt
  • ktor-server/ktor-server-test-suites/jvm/src/io/ktor/server/testing/suites/HttpRequestLifecycleTest.kt

@zibet27 zibet27 force-pushed the zibet27/http-request-lifecycle-breaks-call-logging branch from 32fb307 to a2d6271 on March 4, 2026 10:49

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (3)
ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt (1)

117-121: Verify cancellation effectiveness during undispatched coroutine execution.

With CoroutineStart.UNDISPATCHED, the coroutine executes synchronously until its first suspension point before launch returns and currentJob is assigned. If a ReadTimeoutException occurs on the I/O thread during this undispatched phase, currentJob would still be null and cancellation would silently fail.

This is a narrow race window that likely won't occur in practice, but consider assigning the job reference earlier or using CoroutineStart.DEFAULT if this edge case matters.
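The window can be observed directly in a standalone kotlinx.coroutines sketch, unrelated to the actual handler code:

```kotlin
import kotlinx.coroutines.*

fun main() = runBlocking {
    var currentJob: Job? = null

    // With UNDISPATCHED the body runs synchronously up to the first
    // suspension point *before* `launch` returns and the assignment happens.
    currentJob = launch(start = CoroutineStart.UNDISPATCHED) {
        println(currentJob == null) // true: still unassigned at this point
        yield()
        println(currentJob == null) // false: assigned after the suspension
    }
    currentJob!!.join()
}
```

Any exception handler running on the I/O thread during that synchronous phase would see currentJob == null and skip cancellation.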

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt`
around lines 117 - 121, The ReadTimeoutException handler may miss cancelling the
coroutine when launch is started with CoroutineStart.UNDISPATCHED because
currentJob can be null during undispatched synchronous execution; to fix, ensure
the job reference is captured before starting work (e.g., assign currentJob =
launch(...) immediately or obtain the Job from the coroutine builder before
allowing undispatched execution) or switch the launch to CoroutineStart.DEFAULT
so the coroutine is dispatched and currentJob is set prior to running; update
the code paths around currentJob, the coroutine launch call, and the
ReadTimeoutException branch that calls context.respond408RequestTimeoutHttp1()
and cancels with CancellationException(cause) to use the earlier-captured Job
reference.
ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/NettyApplicationCallHandler.kt (1)

45-70: Consider sanitizing or limiting the error message length sent to clients.

The failureCause?.message is sent directly in the 400 response body. While decoder error messages are typically controlled, this could potentially expose internal details if a custom decoder produces verbose messages. Consider either:

  1. Using a generic message like "Bad Request"
  2. Truncating the message to a reasonable length

That said, this follows existing Ktor error response patterns, so this is a minor consideration.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/NettyApplicationCallHandler.kt`
around lines 45 - 70, The 400 handler
(NettyHttp1ApplicationCall.respondError400BadRequest) writes
failureCause?.message directly to the response body; change it to avoid leaking
long/internal messages by either replacing with a fixed generic string (e.g.,
"Bad Request") or truncating the message to a safe max length before converting
to bytes, then use that sanitized text for ByteReadChannel and Content-Length;
keep existing header and connection handling and still call logCause() so full
details remain logged server-side.
ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2Handler.kt (1)

243-245: Mark cancel() as internal.

The method is only called from within the same module (NettyChannelInitializer). Per coding guidelines, prefer internal visibility by default.

♻️ Suggested fix
-    fun cancel() {
+    internal fun cancel() {
         handlerJob.cancel()
     }

As per coding guidelines: "Prefer internal visibility by default; keep public surface intentional."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2Handler.kt`
around lines 243 - 245, The public cancel() function in NettyHttp2Handler should
be made internal since it's only used inside the module; update the visibility
of fun cancel() to internal in the NettyHttp2Handler class so callers like
NettyChannelInitializer still work while following the "prefer internal by
default" guideline.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8f597f9c-da19-43ee-a29e-aa0e7d6694ce

📥 Commits

Reviewing files that changed from the base of the PR and between 32fb307 and a2d6271.

📒 Files selected for processing (6)
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/NettyApplicationCallHandler.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1ApplicationCall.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2ApplicationCall.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2Handler.kt
  • ktor-server/ktor-server-test-suites/jvm/src/io/ktor/server/testing/suites/HttpRequestLifecycleTest.kt
🚧 Files skipped from review as they are similar to previous changes (2)
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2ApplicationCall.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1ApplicationCall.kt

Member

@e5l e5l left a comment


Code Review — Request Changes

1. Race condition: call.coroutineContext set after responseWriter.processResponse(call) (Critical)

In NettyHttp1Handler.handleRequest(), the sequence is:

  1. responseWriter.processResponse(call) — called on the I/O thread
  2. callEventGroup.execute { ... } — scheduled, runs later
  3. Inside the coroutine: call.coroutineContext = this@callJob.coroutineContext

There is a window between step 1 and step 3 where call.coroutineContext is still the initial userContext. If onConnectionClose fires in this window (channel goes inactive before the coroutine starts — possible under load), it calls HttpRequestCloseHandlerKey, which invokes call.coroutineContext.cancel(...). At that point coroutineContext is the parent userContext — cancelling it would cancel all in-flight calls sharing that context, not just this one.

The old code didn't have this problem because NettyApplicationCallHandler set the context before the call was visible. The PR narrows the window but doesn't close it.

Suggestion: Create the call's Job before constructing the call object, pass it into the constructor, and set it into the context at construction time rather than deferring it to the coroutine body.
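A sketch of that suggestion in plain Kotlin — the Call class and its constructor are hypothetical stand-ins for NettyHttp1ApplicationCall, not the real signatures:

```kotlin
import kotlinx.coroutines.*
import kotlin.coroutines.CoroutineContext

// Hypothetical stand-in for NettyHttp1ApplicationCall; the real constructor differs.
class Call(parentContext: CoroutineContext, callJob: Job) {
    // The per-call Job is in the context from the moment of construction,
    // so a close handler can never observe the shared parent context.
    val coroutineContext: CoroutineContext = parentContext + callJob
}

fun main() {
    val userContext: CoroutineContext = SupervisorJob() + CoroutineName("user")
    val callJob = Job(userContext[Job]) // created *before* the call object
    val call = Call(userContext, callJob)

    // Even an immediate close cancels only this call's Job.
    call.coroutineContext.cancel(CancellationException("client disconnected"))
    println(callJob.isCancelled)         // true
    println(userContext[Job]!!.isActive) // true: parent context untouched
}
```

With this shape there is no window in which call.coroutineContext still points at the shared userContext.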

2. currentJob may be null when ReadTimeoutException fires (Critical)

In exceptionCaught:

is ReadTimeoutException -> {
    currentJob?.let {
        context.respond408RequestTimeoutHttp1()
        it.cancel(CancellationException(cause))
    }
}

currentJob is set inside callEventGroup.execute { ... }, which is queued after handleRequest returns. ReadTimeoutException fires on the I/O thread from Netty's ReadTimeoutHandler. If the timeout fires before callEventGroup.execute has run (plausible under load or very short timeouts), currentJob is null, the let block is skipped — no 408 response, no cancellation, silent drop.

The old code also fell back to ctx.fireExceptionCaught(cause) when no job was active. That fallback is now absent.
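One way to restore that behavior, sketched against the snippet above (a fragment, not a drop-in patch):

```kotlin
is ReadTimeoutException -> {
    val job = currentJob
    if (job != null) {
        context.respond408RequestTimeoutHttp1()
        job.cancel(CancellationException(cause))
    } else {
        // No active call yet: restore the old fallback so the
        // timeout is not silently dropped.
        context.fireExceptionCaught(cause)
    }
}
```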

3. handlerJob never cancelled on normal channelInactive

In exceptionCaught, IOException cancels handlerJob:

is IOException -> {
    handlerJob.cancel()
}

But in channelInactive, only onConnectionClose is called — handlerJob is not cancelled. Since NettyHttpResponsePipeline uses handlerJob as its coroutine context parent and launches body-writing coroutines from it, these coroutines have no cancellation signal when the channel closes normally (without an IOException).

4. HTTP/2: onStreamClose called on both Http2ResetFrame and channelInactive

The pre-PR code only handled Http2ResetFrame by closing contentActor and never called a close handler. The new code adds onStreamClose to Http2ResetFrame handling. A reset frame does not necessarily mean the channel is inactive. If the channel later goes inactive naturally (e.g., graceful shutdown), channelInactive fires onStreamClose again. The null-check on applicationCall prevents double-invocation, but adding onStreamClose to reset frame handling could prematurely cancel a call that has already sent response headers but not the body.

5. Minor: cancel() on NettyHttp2Handler missing internal modifier

fun cancel() {
    handlerJob.cancel()
}

Per project guidelines ("Prefer internal by default"), this should be internal fun cancel(). The class itself is internal so this isn't a public API leak, but it should be explicit.


The core idea of fixing call.coroutineContext is correct and the CallLogging integration tests are valuable. However, the race condition in issue #1 is the same class of bug this PR aims to fix — the context is wrong for a window of time — and the timeout handling gap in issue #2 is a regression from the old behavior. These should be addressed before merging.

@osipxd osipxd removed their request for review March 12, 2026 14:48

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2Handler.kt`:
- Around line 85-90: onStreamClose currently clears context.applicationCall and
invokes the HttpRequestCloseHandlerKey only if already registered, which lets an
early disconnect bypass HttpRequestLifecycle cancellation; modify the flow so
that onStreamClose (and the async handoff in CallSetup) atomically record a
"stream-closed" state on the Channel/ChannelHandlerContext (e.g., a
Channel/Context attribute) before clearing applicationCall, and when CallSetup
registers HttpRequestCloseHandlerKey it must check that attribute and
immediately invoke the handler (or trigger cancellation) if the channel was
closed earlier; alternatively ensure the channel-close is observable to the
async dispatch by deferring clearing context.applicationCall until after the
close handler is invoked/replayed. Ensure changes reference onStreamClose,
HttpRequestCloseHandlerKey and the CallSetup/async handoff path so the
replay/notification always occurs even if the child channel went inactive before
handler registration.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 68184899-1675-4ada-937d-3ab8b46f19c0

📥 Commits

Reviewing files that changed from the base of the PR and between a2d6271 and fef7757.

📒 Files selected for processing (6)
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/CIO.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1ApplicationCall.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1ApplicationResponse.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2ApplicationCall.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http2/NettyHttp2Handler.kt
💤 Files with no reviewable changes (1)
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/CIO.kt

@zibet27 zibet27 force-pushed the zibet27/http-request-lifecycle-breaks-call-logging branch from 599433e to b0052bb on March 13, 2026 10:19

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
ktor-server/ktor-server-test-suites/jvm/src/io/ktor/server/testing/suites/HttpRequestLifecycleTest.kt (1)

167-211: Consider extracting shared CallLogging server setup.

Lines 169-181 and Lines 198-208 repeat nearly identical server/plugin wiring; extracting a small helper would keep these tests easier to evolve.

♻️ Suggested refactor
+    private suspend fun startLifecycleServerWithCallLogging(
+        route: suspend RoutingContext.() -> Unit
+    ) {
+        val server = createServer {
+            install(HttpRequestLifecycle) {
+                cancelCallOnClose = true
+            }
+            install(CallLogging) {
+                mdc("something") { "something else" }
+            }
+            routing { get("/hello") { route() } }
+        }
+        startServer(server)
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ktor-server/ktor-server-test-suites/jvm/src/io/ktor/server/testing/suites/HttpRequestLifecycleTest.kt`
around lines 167 - 211, The tests testHttpRequestLifecycleWithCallLogging and
testHttpRequestLifecycleCancelWithCallLogging duplicate server setup (installing
HttpRequestLifecycle with cancelCallOnClose = true and CallLogging with
mdc("something") { "something else" }) — extract that repeated wiring into a
small helper factory (e.g., createCallLoggingServer or
configureCallLoggingServer) and use it from both tests; move the common
install(...) and routing { ... } pieces into that helper and have the tests call
createServer or configureRoute wrapper so they only supply the differing bits
(like the specific routing body or get handler), keeping references to
createServer, startServer, HttpRequestLifecycle, and CallLogging consistent.
ktor-server/ktor-server-cio/jvm/test/io/ktor/tests/server/cio/CIOEngineTestJvm.kt (1)

99-100: Add a reason to @Ignore for traceability.

Please include a short reason (or issue id) so it’s clear why this override is skipped and when it can be re-enabled.

📝 Suggested tweak
-    `@Ignore`
+    `@Ignore`("Temporarily disabled for CIO: pipelined disconnect cancellation behavior differs from suite expectation")
     override fun testPipelinedRequestsCancelledOnDisconnect() {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ktor-server/ktor-server-cio/jvm/test/io/ktor/tests/server/cio/CIOEngineTestJvm.kt`
around lines 99 - 100, The `@Ignore` on the
testPipelinedRequestsCancelledOnDisconnect override lacks a reason; update the
annotation on the testPipelinedRequestsCancelledOnDisconnect method to include a
concise reason or issue id (e.g. "flaky on CI - issue `#1234`" or "blocked by bug
XYZ") so future readers can trace why it’s skipped and when to re-enable it.
ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt (1)

89-103: Consider cancelling handlerJob when channel becomes inactive.

When the channel closes, handlerJob (which is the coroutineContext for responseWriter) is not cancelled. This could leave the response pipeline in an inconsistent state. Other exception paths (IOException at line 110, generic errors at line 123) cancel or complete handlerJob, but channelInactive does not.

🔧 Suggested fix
 override fun channelInactive(context: ChannelHandlerContext) {
     onConnectionClose(context)
+    handlerJob.cancel()
     context.fireChannelInactive()
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt`
around lines 89 - 103, When channelInactive is invoked, cancel the response
writer's coroutine job so the response pipeline doesn't hang: in
NettyHttp1Handler.channelInactive (before or after calling onConnectionClose),
check the handlerJob (the coroutineContext job used by responseWriter) and call
handlerJob.cancel(CancellationException("channel inactive")) if it's active (or
handlerJob.cancel() if you prefer). This ensures any pending coroutines tied to
responseWriter are cancelled; keep the existing context.fireChannelInactive()
call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt`:
- Around line 158-159: The try/catch in NettyHttp1Handler calls a non-existent
function handleFailure(call, error); implement a proper handler or replace the
call: either add a function handleFailure(call: NettyHttp1ApplicationCall,
error: Throwable) that performs consistent error handling (logging and sending a
500 response) or directly invoke the existing error response path on the call
(e.g., call.respond with an InternalServerError or call.respondError500()) and
ensure errors are logged; update NettyHttp1Handler to reference the implemented
function or the correct response method so compilation succeeds.


📥 Commits

Reviewing files that changed from the base of the PR and between fef7757 and 599433e.

📒 Files selected for processing (3)
  • ktor-server/ktor-server-cio/jvm/test/io/ktor/tests/server/cio/CIOEngineTestJvm.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt
  • ktor-server/ktor-server-test-suites/jvm/src/io/ktor/server/testing/suites/HttpRequestLifecycleTest.kt

Contributor

@coderabbitai coderabbitai bot left a comment
Actionable comments posted: 2

🧹 Nitpick comments (1)
ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt (1)

11-12: Align io.ktor.* import style with repository policy.

The explicit import of io.ktor.server.netty.NettyApplicationCallHandler.CallHandlerCoroutineName appears to conflict with the configured io.ktor.* star-import policy.

As per coding guidelines "Use star imports for io.ktor.* packages as configured in .editorconfig".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt`
around lines 11 - 12, Replace the explicit import of CallHandlerCoroutineName
with a star import for the io.ktor.server.netty package so it follows the repo
policy; specifically, remove the line importing
io.ktor.server.netty.NettyApplicationCallHandler.CallHandlerCoroutineName and
instead use io.ktor.server.netty.* (keeping the existing
io.ktor.server.netty.cio.* import) so the CallHandlerCoroutineName symbol is
resolved via the star import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@ktor-server/ktor-server-cio/jvm/test/io/ktor/tests/server/cio/CIOEngineTestJvm.kt`:
- Around line 99-100: The empty ignored override
testPipelinedRequestsCancelledOnDisconnect currently triggers
detekt.empty-blocks and lacks context; change the override to use an expression
body (override fun testPipelinedRequestsCancelledOnDisconnect() = Unit) and add
an `@Ignore` with a short reason string (e.g., `@Ignore`("Not applicable on CIO
engine") ) so the skip is documented and lint passes; update the annotation and
the function signature in CIOEngineTestJvm.kt accordingly.

In
`@ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt`:
- Around line 152-168: callEventGroup.execute submission can throw (executor
shutdown) which would leave the call tracked but never cleaned up; wrap the
callEventGroup.execute(...) invocation in a try/catch for
RejectedExecutionException (and Throwable as fallback) and in the catch ensure
you run the same cleanup actions you rely on in the launched coroutine: remove
the call from activeCalls (activeCalls.remove(call)), complete the callJob
(callJob.complete()), and also send an appropriate error response or fail fast
(e.g. respond 503) when you cannot schedule enginePipeline.execute(call) or
responseWriter.processResponse(call), so no call remains stranded if execute
rejects.


📥 Commits

Reviewing files that changed from the base of the PR and between 599433e and b0052bb.

📒 Files selected for processing (3)
  • ktor-server/ktor-server-cio/jvm/test/io/ktor/tests/server/cio/CIOEngineTestJvm.kt
  • ktor-server/ktor-server-netty/jvm/src/io/ktor/server/netty/http1/NettyHttp1Handler.kt
  • ktor-server/ktor-server-test-suites/jvm/src/io/ktor/server/testing/suites/HttpRequestLifecycleTest.kt

@zibet27 zibet27 force-pushed the zibet27/http-request-lifecycle-breaks-call-logging branch from b0052bb to baa3ba9 Compare March 16, 2026 14:04
@zibet27 zibet27 changed the base branch from main to release/3.x March 16, 2026 14:06
@zibet27 zibet27 requested a review from e5l March 16, 2026 14:12
@zibet27
Copy link
Collaborator Author

zibet27 commented Mar 16, 2026

PS: call.coroutineContext contains the Netty dispatcher, which is single-threaded; that's why we avoid callEventGroup, so threads don't switch in the middle of a call. https://github.com/ktorio/ktor/pull/5421/changes#diff-2fe7824f7116f099d8d4afcd45c523ae25c386423f14ff231c775a3775e5d4ebR150

}

@Ignore
override fun testPipelinedRequestsCancelledOnDisconnect() {}
Contributor
Maybe you could log a new issue to handle this behaviour in CIO if it falls outside the scope here.

Collaborator Author
Afaik, CIO doesn't support pipelining, does it?

val userAppContext = applicationProvider().coroutineContext + userContext
val callJob = Job(parent = userAppContext[Job])

val callContext = userAppContext + NettyDispatcher.CurrentContext(context) + callJob + CallHandlerCoroutineName
Contributor
The netty dispatcher ought to be present in userContext here already so I think NettyDispatcher.CurrentContext(context) can be removed.

Collaborator Author
NettyDispatcher.CurrentContext is created only in handlers
Or my IDE can't find all usages (which happens)

Contributor
Oh yeah, look at that... huh. Ok 👍
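The property the callJob construction above depends on, that cancelling a per-call Job created with Job(parent = ...) does not cancel its parent, can be sketched in isolation (requires kotlinx-coroutines-core; `appJob`/`callJob` are illustrative names, not the actual fields):

```kotlin
import kotlinx.coroutines.*

fun main() {
    val appJob = Job()                 // stands in for userAppContext[Job]
    val callJob = Job(parent = appJob) // per-call job, as in the fix

    callJob.cancel() // cancelling one call...

    check(callJob.isCancelled)
    check(appJob.isActive) // ...leaves the application-level job running
}
```

This is exactly the isolation the original bug lacked: when call.coroutineContext shared the application's Job, cancelling one call cancelled them all.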

Comment on lines +150 to +166
context.executor().execute {
    val callScope = CoroutineScope(context = callContext)
    callScope.launch(start = CoroutineStart.UNDISPATCHED) {
        try {
            if (!call.request.isValid()) {
                call.respondError400BadRequest()
                return@launch
            }
            enginePipeline.execute(call)
        } catch (error: Throwable) {
            handleFailure(call, error)
        } finally {
            activeCalls.remove(call)
            callJob.complete()
        }
    }
}
Contributor
Since the user context is already on the netty dispatcher, couldn't we just have:

CoroutineScope(callContext).launch {
    try {
        if (!call.request.isValid()) {
            call.respondError400BadRequest()
            return@launch
        }
        enginePipeline.execute(call)
    } catch (error: Throwable) {
        handleFailure(call, error)
    } finally {
        activeCalls.remove(call)
        callJob.complete()
    }
}

Collaborator Author
Yeah, we could do that. The idea was that context.executor().execute {} makes the call more granular; for instance, some other event could be dispatched in between. Not sure it makes much sense here, though.

Contributor
Hmm I don't think you'll get other events for this request at this point, and using the executor with a new runnable is going to introduce a little overhead. Maybe you could play around with it to investigate if it provides material benefits. My main concern is introducing unneeded complexity and adding more deep references to the deeper Netty API.

Collaborator Author
Hmm, I tried removing it, but some tests were failing. For example, early flushing of headers broke in HTTP/1, and I got EOF exceptions in HTTP/2.

So I've added comments explaining why the extra dispatching is needed.

Contributor
Ok, good to know, thanks for checking 👍
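For context on the CoroutineStart.UNDISPATCHED used in the quoted handler code: it executes the coroutine body immediately on the current thread up to the first suspension point, which is what keeps the call on the Netty event-loop thread until it actually suspends. A minimal sketch (kotlinx-coroutines-core assumed):

```kotlin
import kotlinx.coroutines.*

fun main() = runBlocking {
    var ranEagerly = false
    val job = launch(start = CoroutineStart.UNDISPATCHED) {
        ranEagerly = true // runs before launch() returns
        delay(1)          // first suspension point; only now does it yield
    }
    check(ranEagerly) // already true, without waiting for dispatch
    job.join()
}
```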

Contributor

@bjhham bjhham left a comment
I'll leave the context executor call up to you, though I'd advise adding a comment to explain what's going on there. Other than that, looks like a nice fix 👍
