
Commit 1556419

Add a random UUID to worker identities, to prevent collisions (#1135)
The fallback worker identity of {pid}@{host}@{tasklist} is insufficient in practice. For example, within Uber, our Docker setup fairly regularly leads to multiple instances of a worker ending up on the same physical host (which Docker does not change) and with the same PID (all the Docker containers start up the same way, so they all get the same PIDs). Other causes are possible too: if a worker crashes and is restarted, it could share the same host and PID even though its caches (and anything else we actually care about for the identity) are lost.

At the least problematic (but confusing) level, this can lead to a misleadingly short list of workers for a tasklist in the web UI, because duplicates are collapsed. At the most problematic level, this can lead to uneven load and sub-par request routing, because only one of the colliding workers' polls (the first, the last, or some other) will receive data intended for a specific worker.

Ideally that wouldn't be an issue; e.g. the server could keep better track of polls and route more precisely. But 1) it's difficult to reliably identify polls, since the server only has the poller-provided data (which mis-identifies itself), and 2) even if it can be fixed on the server, improving identity uniqueness both fixes this now and reduces or eliminates the impact of any future identity confusion.

---

Alternatively, we could add a process-global random UUID that auto-initializes, which would give us IDs that can be correlated across tasklists. While I think this is acceptable as well, I'm loath to add another global variable, and the benefit of correlating identities across tasklists is minor or nonexistent. If someone actually cares about that, they can pass an explicit ID that they generate however they like.
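The collision described above, and the two ways of fixing it, can be illustrated with a small standalone sketch. It assumes two containers that report the same PID and hostname (the values `1` and `"host-a"` are made up), and it uses a `crypto/rand`-based `newUUID` helper as a stand-in for `uuid.New()` from `github.com/pborman/uuid` so that it needs only the standard library. `oldIdentity`, `newIdentity` and `processIdentity` are hypothetical names, not functions from the Cadence client.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUID returns a random version-4 UUID string. It stands in for
// uuid.New() from github.com/pborman/uuid (an assumption for this sketch).
func newUUID() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err) // crypto/rand failing is unrecoverable here
	}
	b[6] = (b[6] & 0x0f) | 0x40 // version 4
	b[8] = (b[8] & 0x3f) | 0x80 // RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16])
}

// oldIdentity mirrors the pre-commit format {pid}@{host}@{tasklist}.
func oldIdentity(pid int, host, tasklist string) string {
	return fmt.Sprintf("%d@%s@%s", pid, host, tasklist)
}

// newIdentity mirrors the post-commit format, which appends a fresh UUID on every call.
func newIdentity(pid int, host, tasklist string) string {
	return fmt.Sprintf("%d@%s@%s@%s", pid, host, tasklist, newUUID())
}

// processUUID illustrates the rejected alternative: one random UUID per
// process, initialized once when the package is loaded.
var processUUID = newUUID()

// processIdentity uses the process-global UUID, so identities for different
// tasklists in the same process share a suffix that can be correlated.
func processIdentity(pid int, host, tasklist string) string {
	return fmt.Sprintf("%d@%s@%s@%s", pid, host, tasklist, processUUID)
}

func main() {
	// Two containers started the same way: same (unchanged) host, same PID.
	const pid, host, tasklist = 1, "host-a", "my-tasklist"

	fmt.Println(oldIdentity(pid, host, tasklist) == oldIdentity(pid, host, tasklist)) // true: collision
	fmt.Println(newIdentity(pid, host, tasklist) == newIdentity(pid, host, tasklist)) // false: random suffix differs

	// The alternative: within one process, the suffix is shared across tasklists.
	fmt.Println(processIdentity(pid, host, "tasklist-a"))
	fmt.Println(processIdentity(pid, host, "tasklist-b"))
}
```

The per-call UUID is what the commit adopts. The process-global variant trades an extra package-level variable for cross-tasklist correlation, which the message judges not worth it.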
1 parent 9fdc4bc commit 1556419

1 file changed (+6, -1 lines)

internal/internal_utils.go

Lines changed: 6 additions & 1 deletion
```diff
@@ -32,6 +32,7 @@ import (
 	"syscall"
 	"time"
 
+	"github.com/pborman/uuid"
 	"github.com/uber-go/tally"
 	s "go.uber.org/cadence/.gen/go/shared"
 	"go.uber.org/cadence/internal/common"
@@ -196,8 +197,12 @@ func newChannelContextHelper(
 }
 
 // GetWorkerIdentity gets a default identity for the worker.
+//
+// This contains a random UUID, generated each time it is called, to prevent identity collisions when workers share
+// other host/pid/etc information. These alone are not guaranteed to be unique, especially when Docker is involved.
+// Take care to retrieve this only once per worker.
 func getWorkerIdentity(tasklistName string) string {
-	return fmt.Sprintf("%d@%s@%s", os.Getpid(), getHostName(), tasklistName)
+	return fmt.Sprintf("%d@%s@%s@%s", os.Getpid(), getHostName(), tasklistName, uuid.New())
 }
 
 func getHostName() string {
```
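The new doc comment warns to retrieve the identity only once per worker, because every call now yields a different string. A minimal sketch of that pattern, assuming a hypothetical `worker` type that resolves its identity at construction and reuses it for every poll, might look like this. An explicit identity, which the commit message suggests for anyone who needs their own scheme, takes precedence. `defaultIdentity`, `randomSuffix`, `newWorker` and `pollRequestIdentity` are illustrative names rather than Cadence client APIs, and `randomSuffix` stands in for `uuid.New()`.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"os"
)

// randomSuffix stands in for uuid.New() from github.com/pborman/uuid.
func randomSuffix() string {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		panic(err)
	}
	return fmt.Sprintf("%x", b[:])
}

// defaultIdentity follows the shape of getWorkerIdentity after this commit;
// the hostname lookup is simplified to os.Hostname with a fallback.
func defaultIdentity(tasklist string) string {
	host, err := os.Hostname()
	if err != nil {
		host = "unknown"
	}
	return fmt.Sprintf("%d@%s@%s@%s", os.Getpid(), host, tasklist, randomSuffix())
}

// worker is a hypothetical stand-in for a worker that polls one tasklist.
type worker struct {
	tasklist string
	identity string
}

// newWorker resolves the identity exactly once: an explicit identity wins,
// otherwise the random default is generated here and stored.
func newWorker(tasklist, explicitIdentity string) *worker {
	id := explicitIdentity
	if id == "" {
		id = defaultIdentity(tasklist)
	}
	return &worker{tasklist: tasklist, identity: id}
}

// pollRequestIdentity is what each poll would send; it reuses the stored
// value instead of calling defaultIdentity again.
func (w *worker) pollRequestIdentity() string {
	return w.identity
}

func main() {
	w := newWorker("my-tasklist", "")
	fmt.Println(w.pollRequestIdentity() == w.pollRequestIdentity())               // true: stable across polls
	fmt.Println(defaultIdentity("my-tasklist") == defaultIdentity("my-tasklist")) // false: fresh suffix each call
}
```

Calling `defaultIdentity` on every poll would reintroduce the routing problem from the other direction: the server would see a new "worker" on each request.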
