@arjan-bal arjan-bal commented Aug 18, 2025

Fixes: #8514

The mutex that guards the trailers should be held while copying the trailers. We lock the mutex in the regular gRPC server transport, but missed it in the std lib http/2 transport. The only place in writeStatus() where a write to the trailers happens is when the status contains a proto with details:

if p := st.Proto(); p != nil && len(p.Details) > 0 {
	delete(s.trailer, grpcStatusDetailsBinHeader)
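
As a rough illustration of the fix, here is a minimal sketch of copying a trailer map while holding the mutex that also guards the delete. The `stream` type, its methods, and the key names are hypothetical stand-ins, not the actual grpc-go types; only the locking pattern is the point.

```go
package main

import (
	"fmt"
	"sync"
)

// stream is a hypothetical stand-in for a server stream whose trailer
// map is guarded by hdrMu.
type stream struct {
	hdrMu   sync.Mutex
	trailer map[string][]string
}

// copyTrailer returns a deep copy of the trailer map, taken under hdrMu
// so that it cannot overlap with a concurrent delete.
func (s *stream) copyTrailer() map[string][]string {
	s.hdrMu.Lock()
	defer s.hdrMu.Unlock()
	out := make(map[string][]string, len(s.trailer))
	for k, v := range s.trailer {
		out[k] = append([]string(nil), v...)
	}
	return out
}

// deleteTrailer removes a key under the same mutex, playing the role of
// the delete shown in the snippet above.
func (s *stream) deleteTrailer(key string) {
	s.hdrMu.Lock()
	defer s.hdrMu.Unlock()
	delete(s.trailer, key)
}

func main() {
	s := &stream{trailer: map[string][]string{
		"details-bin":    {"details"}, // illustrative key name
		"custom-trailer": {"Custom trailer value"},
	}}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); s.deleteTrailer("details-bin") }()
	go func() { defer wg.Done(); _ = s.copyTrailer() }()
	wg.Wait()

	fmt.Println(len(s.copyTrailer())) // prints 1
}
```

Running a program like this with `go run -race` should stay quiet, whereas reading the map without the lock would be expected to trip the race detector.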

RELEASE NOTES:

  • transport: Fix a data race while copying headers for stats handlers in the std lib http2 server transport.

@arjan-bal arjan-bal added this to the 1.75 Release milestone Aug 18, 2025

codecov bot commented Aug 18, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.00%. Comparing base (0ebea3e) to head (31a6f26).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8519      +/-   ##
==========================================
+ Coverage   81.87%   82.00%   +0.12%     
==========================================
  Files         413      413              
  Lines       40518    40520       +2     
==========================================
+ Hits        33176    33229      +53     
+ Misses       5967     5908      -59     
- Partials     1375     1383       +8     
Files with missing lines Coverage Δ
internal/transport/handler_server.go 90.81% <100.00%> (+4.35%) ⬆️

... and 22 files with indirect coverage changes

@@ -277,11 +277,13 @@ func (ht *serverHandlerTransport) writeStatus(s *ServerStream, st *status.Status
	if err == nil { // transport has not been closed
		// Note: The trailer fields are compressed with hpack after this call returns.
		// No WireLength field is set here.
		s.hdrMu.Lock()
Contributor Author

@arjan-bal arjan-bal Aug 18, 2025

We're holding two locks here: s.hdrMu and ht.writeStatusMu (acquired at line 229). ht.writeStatusMu is only referenced in this method, so there shouldn't be a chance of deadlocks.

Member

Note that hdrMu is also already taken on line 249, although I'm not sure whether that closure runs in the current goroutine or another one.

Contributor Author

The callback is executed in an event loop in a separate goroutine:

func (ht *serverHandlerTransport) runStream() {
	for {
		select {
		case fn := <-ht.writes:
			fn()
		case <-ht.closedCh:
			return
		}
	}
}
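
To make the goroutine question concrete, here is a small sketch of the same pattern: closures are queued on a channel and executed by a loop goroutine, so a mutex taken inside the closure and the same mutex taken by the caller contend normally rather than re-entering. The `loop` type and the `schedule` helper are assumptions made for this example, not the transport's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// loop is a hypothetical, reduced version of a write loop that runs
// queued closures on its own goroutine.
type loop struct {
	writes   chan func()
	closedCh chan struct{}
}

func newLoop() *loop {
	return &loop{writes: make(chan func()), closedCh: make(chan struct{})}
}

// run executes queued closures one at a time until closedCh is closed.
func (l *loop) run() {
	for {
		select {
		case fn := <-l.writes:
			fn()
		case <-l.closedCh:
			return
		}
	}
}

// schedule hands fn to the loop goroutine without waiting for it to run.
// The returned channel is closed once fn has finished.
func (l *loop) schedule(fn func()) <-chan struct{} {
	done := make(chan struct{})
	l.writes <- func() {
		defer close(done)
		fn()
	}
	return done
}

func main() {
	l := newLoop()
	go l.run()

	var mu sync.Mutex
	trailer := map[string]string{"details": "x", "custom-trailer": "v"}

	// The closure locks mu on the loop goroutine.
	done := l.schedule(func() {
		mu.Lock()
		defer mu.Unlock()
		delete(trailer, "details")
	})

	// The caller locks the same mutex on its own goroutine. This may
	// happen before or after the closure runs, so n is 1 or 2.
	mu.Lock()
	n := len(trailer)
	mu.Unlock()
	_ = n

	<-done
	close(l.closedCh)
	fmt.Println(len(trailer)) // prints 1
}
```

Because the closure and the caller run on different goroutines, taking the mutex in both places does not self-deadlock, but it also means the two accesses can overlap unless both hold the lock.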

Contributor Author

I thought about calling the stats handlers in the callback above, but I noticed that the http2_server transport also schedules the network write in the background and then calls the stats handlers:

t.finishStream(s, rst, http2.ErrCodeNo, trailingHeader, true)
for _, sh := range t.stats {
	// Note: The trailer fields are compressed with hpack after this call returns.
	// No WireLength field is set here.
	sh.HandleRPC(s.Context(), &stats.OutTrailer{
		Trailer: s.trailer.Copy(),
	})
}

@arjan-bal arjan-bal changed the title from "transport: acquire header mutex while copying trailers in handler_server" to "transport: ensure header mutex is held while copying trailers in handler_server" Aug 18, 2025

Comment on lines +513 to +515
if err := s.SetTrailer(metadata.Pairs("custom-trailer", "Custom trailer value")); err != nil {
	t.Error(err)
}
Member

How hard would it be to test this in a new test instead of in an existing test that's intended for testing error details?

Contributor Author

Changed to use a new test. My thought process was that modifying existing tests would give us better coverage for interactions between different features.

Comment on lines 280 to 285
// Add mock stats handlers to exercise the stats handler code path.
statsHandlers := make([]stats.Handler, 0, 5)
for range 5 {
	statsHandlers = append(statsHandlers, &mockStatsHandler{})
}
ht, err := NewServerHandlerTransport(rw, req, statsHandlers, mem.DefaultBufferPool())
Member

Can we parameterize this, or find some other way to cause this configuration, instead of doing it for all existing tests?

Contributor Author

Done.
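
For readers following along, one way such a parameterization could look is sketched below: the number of mock handlers becomes an argument, so only the tests that need stats handlers ask for them. The `event`, `handler`, and `countingHandler` types and the `newHandlers` and `deliver` helpers are hypothetical and use local stand-ins rather than the real stats package; this is not the code that was merged.

```go
package main

import (
	"fmt"
	"sync"
)

// event and handler are hypothetical stand-ins for the stats types.
type event struct{ trailer map[string][]string }

type handler interface{ HandleRPC(e event) }

// countingHandler is a mock that records how many events it received.
type countingHandler struct {
	mu    sync.Mutex
	calls int
}

func (h *countingHandler) HandleRPC(event) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.calls++
}

// Calls reports how many events the mock has seen so far.
func (h *countingHandler) Calls() int {
	h.mu.Lock()
	defer h.mu.Unlock()
	return h.calls
}

// newHandlers builds n mocks, so each test case chooses its own count
// instead of every test paying for a fixed number of handlers.
func newHandlers(n int) []*countingHandler {
	hs := make([]*countingHandler, 0, n)
	for i := 0; i < n; i++ {
		hs = append(hs, &countingHandler{})
	}
	return hs
}

// deliver sends one event to every handler through the interface and
// returns the total number of recorded calls.
func deliver(hs []*countingHandler, e event) int {
	total := 0
	for _, h := range hs {
		var sh handler = h
		sh.HandleRPC(e)
		total += h.Calls()
	}
	return total
}

func main() {
	for _, n := range []int{0, 5} {
		fmt.Println(n, deliver(newHandlers(n), event{}))
	}
	// prints:
	// 0 0
	// 5 5
}
```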

@dfawley dfawley assigned arjan-bal and unassigned dfawley Aug 18, 2025
@arjan-bal arjan-bal assigned dfawley and unassigned arjan-bal Aug 19, 2025
@dfawley dfawley assigned arjan-bal and unassigned dfawley Aug 20, 2025
@arjan-bal arjan-bal merged commit 5ed7cf6 into grpc:master Aug 21, 2025
29 checks passed
@arjan-bal arjan-bal deleted the fix-trailer-race branch August 21, 2025 06:50
arjan-bal added a commit to arjan-bal/grpc-go that referenced this pull request Aug 21, 2025
…ler_server (grpc#8519)

Fixes: grpc#8514


The mutex that guards the trailers should be held while copying the
trailers. We do lock the mutex in [the regular gRPC server
transport](https://github.com/grpc/grpc-go/blob/9ac0ec87ca2ecc66b3c0c084708aef768637aef6/internal/transport/http2_server.go#L1140-L1142),
but have missed it in the std lib http/2 transport. The only place in
`writeStatus()` where a write happens is when the status contains a proto.


https://github.com/grpc/grpc-go/blob/4375c784450aa7e43ff15b8b2879c896d0917130/internal/transport/handler_server.go#L251-L252

RELEASE NOTES:
* transport: Fix a data race while copying headers for stats handlers in
the std lib http2 server transport.
arjan-bal added a commit that referenced this pull request Aug 21, 2025
Original PR: #8519
Related issue: #8514

RELEASE NOTES:
* transport: Fix a data race while copying headers for stats handlers in
the std lib http2 server transport.
dimpavloff pushed a commit to dimpavloff/grpc-go that referenced this pull request Aug 22, 2025
eshitachandwani pushed a commit to eshitachandwani/grpc-go that referenced this pull request Aug 29, 2025
Development

Successfully merging this pull request may close these issues.

Race condition involving trailer metadata