Race condition involving trailer metadata #8514

@akehlenbeck

What version of gRPC are you using?

v1.72.0

What version of Go are you using (go version)?

go version go1.24.2 linux/amd64

What operating system (Linux, Windows, …) and version?

$ uname -a
Linux 6.8.0-1030-gcp #32~22.04.1-Ubuntu SMP Tue Apr 29 23:17:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

What did you do?

We observed the following crash (a Go runtime fatal error) in gRPC code:

fatal error: concurrent map iteration and map write

goroutine 13860950 [running]:
internal/runtime/maps.fatal({0x28de412?, 0x45df57?})
        /usr/local/go/src/runtime/panic.go:1058 +0x18
internal/runtime/maps.(*Iter).Next(0x224e100?)
        /usr/local/go/src/internal/runtime/maps/table.go:683 +0x86
google.golang.org/grpc/metadata.MD.Copy(...)
        /go/src/github.com/lightstep/vendor/google.golang.org/grpc/metadata/metadata.go:101
google.golang.org/grpc/internal/transport.(*serverHandlerTransport).writeStatus(0xc00a544ee0, 0xc0274c3320, 0xc016ef6388)
        /go/src/github.com/lightstep/vendor/google.golang.org/grpc/internal/transport/handler_server.go:282 +0x3c5
google.golang.org/grpc/internal/transport.(*ServerStream).WriteStatus(...)
        /go/src/github.com/lightstep/vendor/google.golang.org/grpc/internal/transport/server_stream.go:76
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000a63200, {0x2c782f0, 0xc02152a6f0}, 0xc0274c3320, 0xc001ac2cc0, 0x43e0550, 0x0)
        /go/src/github.com/lightstep/vendor/google.golang.org/grpc/server.go:1418 +0x1562
google.golang.org/grpc.(*Server).handleStream(0xc000a63200, {0x2c794a8, 0xc00a544ee0}, 0xc0274c3320)
        /go/src/github.com/lightstep/vendor/google.golang.org/grpc/server.go:1815 +0xb88
google.golang.org/grpc.(*Server).serveStreams.func2.1()
        /go/src/github.com/lightstep/vendor/google.golang.org/grpc/server.go:1035 +0x7f
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 13860947
        /go/src/github.com/lightstep/vendor/google.golang.org/grpc/server.go:1046 +0x11d

I suspect the problem may be a race internal to grpc code, because this mutex [1] does not appear to be held at the callsite [2] where the panic occurred, despite a comment suggesting that it should be ("hdrMu protects outgoing header and trailer metadata").

(It's also possible that our application code touches the trailer metadata in a way that races with the call stack above, but I've been unable to find such a race, and I stopped looking closely once I found this potential issue with the unheld mutex.)

[1]

[2]

Trailer: s.trailer.Copy(),
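
To illustrate the suspected pattern, here is a minimal, self-contained sketch. It is not gRPC's code: `MD` is a local stand-in with the same shape as `metadata.MD`, and `stream`, `SetTrailer`, and `trailerSnapshot` are hypothetical names chosen for the example. It assumes the intent of the "hdrMu protects outgoing header and trailer metadata" comment is that every read and write of the trailer map happens under the mutex, including the copy.

```go
package main

import (
	"fmt"
	"sync"
)

// MD is a local stand-in with the same shape as grpc's metadata.MD.
type MD map[string][]string

// Copy iterates over the map, so it must not run concurrently with a write.
func (md MD) Copy() MD {
	out := make(MD, len(md))
	for k, v := range md {
		out[k] = append([]string(nil), v...)
	}
	return out
}

// stream is a hypothetical type for illustration only.
type stream struct {
	hdrMu   sync.Mutex // guards trailer
	trailer MD
}

// SetTrailer merges md into the trailer while holding hdrMu.
func (s *stream) SetTrailer(md MD) {
	s.hdrMu.Lock()
	defer s.hdrMu.Unlock()
	for k, v := range md {
		s.trailer[k] = append(s.trailer[k], v...)
	}
}

// trailerSnapshot copies the trailer while holding hdrMu. Calling
// s.trailer.Copy() without the lock is the pattern suspected above.
func (s *stream) trailerSnapshot() MD {
	s.hdrMu.Lock()
	defer s.hdrMu.Unlock()
	return s.trailer.Copy()
}

func main() {
	s := &stream{trailer: MD{}}
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // concurrent writer
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			s.SetTrailer(MD{fmt.Sprintf("k%d", i): {"v"}})
		}
	}()
	for i := 0; i < 1000; i++ { // concurrent reader
		_ = s.trailerSnapshot()
	}
	wg.Wait()
	fmt.Println(len(s.trailerSnapshot()))
}
```

If the lock in `trailerSnapshot` is removed, running this with `go run -race` should report a data race, and the runtime may abort with the same "concurrent map iteration and map write" error shown in the trace.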

Labels

Area: Transport (includes HTTP/2 client/server and HTTP server handler transports and advanced transport features)
Type: Bug
