Skip to content

lock allocstate before adding task event#31

Merged
tgross merged 1 commit intomainfrom
b-panic-in-runner
Dec 2, 2025
Merged

lock allocstate before adding task event#31
tgross merged 1 commit intomainfrom
b-panic-in-runner

Conversation

@tgross
Copy link
Collaborator

@tgross tgross commented Dec 1, 2025

We don't lock the alloc state before adding a task event. This can result in a panic when we emit allocation metrics if it happens concurrently, because the length of the slice isn't atomically updated with the contents so we end up indexing an event that's been added before the slice is valid.


Example panic:

panic: runtime error: index out of range [5] with length 5

goroutine 1530 [running]:
github.com/hashicorp/nomad/nomad/structs.(*TaskState).Copy(...)
        /home/ubuntu/go/pkg/mod/github.com/hashicorp/nomad@v1.11.0/nomad/structs/structs.go:9116
github.com/hashicorp/nomad/client/allocrunner/state.(*State).Copy(0xc06f6ebf40)
        /home/ubuntu/go/pkg/mod/github.com/hashicorp/nomad@v1.11.0/client/allocrunner/state/state.go:60 +0x42d
github.com/hashicorp-forge/nomad-nodesim/allocrunnersim.(*simulatedAllocRunner).AllocState(0xc000f30008?)
        /home/ubuntu/src/nomad-nodesim/allocrunnersim/allocrunnersim.go:418 +0x7c
github.com/hashicorp/nomad/client.(*Client).emitClientMetrics(0xc000f30008)
        /home/ubuntu/go/pkg/mod/github.com/hashicorp/nomad@v1.11.0/client/client.go:3396 +0x18c
github.com/hashicorp/nomad/client.(*Client).emitStats(0xc000f30008)
        /home/ubuntu/go/pkg/mod/github.com/hashicorp/nomad@v1.11.0/client/client.go:3259 +0x78
github.com/hashicorp/nomad/helper/group.(*Group).Go.func1()
        /home/ubuntu/go/pkg/mod/github.com/hashicorp/nomad@v1.11.0/helper/group/group.go:22 +0x4c
created by github.com/hashicorp/nomad/helper/group.(*Group).Go in goroutine 1
        /home/ubuntu/go/pkg/mod/github.com/hashicorp/nomad@v1.11.0/helper/group/group.go:20 +0x73

We don't lock the alloc state before adding a task event. This can result in a
panic when we emit allocation metrics if it happens concurrently, because the
length of the slice isn't atomically updated with the contents so we end up
indexing an event that's been added before the slice is valid.
Copy link
Collaborator

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tgross tgross merged commit 867e796 into main Dec 2, 2025
2 checks passed
@tgross tgross deleted the b-panic-in-runner branch December 2, 2025 13:53
tgross added a commit that referenced this pull request Dec 3, 2025
In #31 we fixed a panic when adding a task event, but the fix was incomplete and
just panicked again later under heavy concurrent use. This PR fixes the
follow-up panic but also fixes a data race that popped up when testing the fix
with a build that used `-race`.
tgross added a commit that referenced this pull request Dec 3, 2025
In #31 we fixed a panic when adding a task event, but the fix was incomplete and
just panicked again later under heavy concurrent use. This PR fixes the
follow-up panic but also fixes a data race that popped up when testing the fix
with a build that used `-race`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants