Consistent Indices for Addressable Shards of Multi-Host Arrays? #19319
Unanswered · andyehrenberg asked this question in Q&A
Suppose I have three arrays `a`, `b`, and `c`, where `b` is an input to some compiled function `f`, and `c = f(b)`. They all have the same leading dimension (consistent batch size). `f` is a jitted function with an output sharding `NamedSharding(mesh, PartitionSpec("data"))`, and `a` and `b` are also multi-host arrays sharded along `"data"`.

For a given host, do the `addressable_shards` of `a`, `b`, and `c` correspond to the same indices? For example, if `a` is a batch of target sequences and `c` is predictions, are we guaranteed that computing a word error rate between the detokenizations of the addressable shards of `a` and `c` will actually compare the correct targets and predictions? Or are we only guaranteed to have the correct order after all-gathering?

Code like https://github.com/apple/axlearn/blob/2c7e9f8fcd6fa2ee4e9596d4d18f870e4b653289/axlearn/common/utils.py#L585 seems to imply that the indices addressable by a host remain consistent, but that their order may still change.