Description
Go version
go version go1.24.6 linux/amd64 (go1.24 as contained in golang:1.24 docker image)
Output of go env in your module/workspace:
AR='ar'
CC='gcc'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_ENABLED='1'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
CXX='g++'
GCCGO='gccgo'
GO111MODULE=''
GOAMD64='v1'
GOARCH='amd64'
GOAUTH='netrc'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOCACHEPROG=''
GODEBUG=''
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFIPS140='off'
GOFLAGS=''
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1479637807=/tmp/go-build -gno-record-gcc-switches'
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMOD='/dev/null'
GOMODCACHE='/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.24.6'
GOWORK=''
PKG_CONFIG='pkg-config'
What did you do?
I was using a sync.Map to store *time.Location values (caching time.LoadLocation outputs). The caching function was called at the start of each HTTP request in a gin HTTP server that was handling 10-50 concurrent requests.
Here is the function:
var locationCache sync.Map

// Loads information about the given location, using a thread-safe
// cache for speed and to reduce memory pressure.
func LoadTimeLocationCached(name string) (*time.Location, error) {
	cachedValue, ok := locationCache.Load(name)
	if ok {
		switch v := cachedValue.(type) {
		case *time.Location:
			return v, nil
		}
	}
	loc, err := time.LoadLocation(name)
	if err != nil {
		return loc, err
	}
	locationCache.Store(name, loc)
	return loc, nil
}
The code was running in a long-running container in Google Cloud Run. Request concurrency for the Cloud Run service was enabled with a high limit, so the container was receiving concurrent requests. There is a fair amount of memory pressure in this container (GOGC=2000 and GOMEMLIMIT=2672MiB): the long-running HTTP requests allocate a lot of temporary data while parsing and manipulating JSON.
What did you see happen?
One day in production (without any deployment or code change), about 1 in every 5000 requests started to panic on the line locationCache.Store(name, loc). Restarting the container didn't help. The panic message was "internal/sync.HashTrieMap: ran out of hash bits while inserting".
The input to LoadTimeLocationCached came from an HTTP request header, so a new combination of (valid) time location names is likely what started to trigger the issue. Some of these location names are quite similar, which may be a hint.
LoadTimeLocationCached was being called with a likely low cardinality of inputs (only valid time.LoadLocation names would ever reach the locationCache.Store line).
Note that this is the entirety of the code that manipulated locationCache. Nothing ever deleted values from it, and there shouldn't have been more than a few dozen values in it (or a few hundred at most).
What did you expect to see?
The program should run without panicking. Changing to a sync.Mutex with a regular map[string]*time.Location fixed the issue.
Unfortunately, I wasn't able to reproduce the issue locally, even when trying in a variety of ways.
Thanks to @mknyszek for encouraging me to submit an issue and helpfully pointing out the related issue #73427.