The implications of check_vma in the shard_map API #31449
-
I think the documentation of check_vma could be clearer. Say I have an MLP with two linear layers. Here is the config:

import numpy as np
from functools import partial

import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec as P

# num_devices = 8
devices = np.array(jax.devices())
mesh_axis_names = ("feats",)
mesh = Mesh(devices, axis_names=mesh_axis_names)
# input shape: [batch_size, features], labels shape: [batch_size,]
# params = MLP(
#     fc1=Linear(
#         in_features=784,
#         out_features=128,
#         weight=float32[784,128],
#         bias=float32[128],
#         use_bias=True,
#     ),
#     fc2=Linear(
#         in_features=128,
#         out_features=10,
#         weight=float32[128,10],
#         bias=None,
#         use_bias=False,
#     )
# )
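# NOTE (assumption, not shown in the question): param_specs is referenced in
# in_specs below but never defined here. For this TP layout it would presumably
# be a pytree of PartitionSpecs mirroring params, e.g.:
#   fc1.weight -> P('feats', None)   # shard in_features: (784, 128) becomes (98, 128) per device
#   fc1.bias   -> P('feats')         # shard features:    (128,)     becomes (16,)   per device
#   fc2.weight -> P(None, None)      # replicated on every device
#   fc2.bias   -> None               # no bias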
# forward function
@partial(jax.shard_map, mesh=mesh,
         in_specs=(P(None, 'feats'), param_specs),
         out_specs=P(),
         check_vma=False)
def forward_tp(inputs, params):
    # per-device inputs shard: (256, 98), i.e. the global (256, 784) batch split 784 / 8 along 'feats'
    out = jnp.dot(inputs, params.fc1.weight)
    # our first layer is sharded for TP
    out = jax.lax.psum_scatter(out, 'feats', scatter_dimension=1, tiled=True)
    if params.fc1.bias is not None:
        out = out + params.fc1.bias
    # second layer is replicated, so we need to gather all features
    out = jax.lax.all_gather(out, 'feats', axis=1, tiled=True)
    out = jnp.dot(out, params.fc2.weight)
    return out

Without check_vma=False, this raises an error. What are the implications of disabling check_vma here?
-
What is the error? If it is what I am guessing, you will need to use jax.lax.all_gather_invariant for out_specs to be P(). But usually, no, you shouldn't need to disable check_vma.
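A minimal sketch of that suggestion, reusing the mesh, the partial import, and the (assumed) param_specs from the question above, and assuming all_gather_invariant accepts the same axis/tiled arguments as all_gather:

@partial(jax.shard_map, mesh=mesh,
         in_specs=(P(None, 'feats'), param_specs),
         out_specs=P())  # check_vma left at its default (True)
def forward_tp(inputs, params):
    out = jnp.dot(inputs, params.fc1.weight)
    out = jax.lax.psum_scatter(out, 'feats', scatter_dimension=1, tiled=True)
    if params.fc1.bias is not None:
        out = out + params.fc1.bias
    # all_gather_invariant marks the gathered value as invariant across 'feats',
    # which is what out_specs=P() requires when check_vma is on
    out = jax.lax.all_gather_invariant(out, 'feats', axis=1, tiled=True)
    out = jnp.dot(out, params.fc2.weight)
    return out

Here the per-device output is (256, 10), and out_specs=P() treats it as a single replicated global (256, 10) array.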
-
Use jax.lax.all_gather_invariant if you want out_specs to be P().
If you want to keep using all_gather, then your out_specs should be P(None, 'feats').
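For the second option, only the out_specs changes relative to the sketch above (again reusing the question's mesh and assumed param_specs; the name forward_tp_gathered is just for illustration):

@partial(jax.shard_map, mesh=mesh,
         in_specs=(P(None, 'feats'), param_specs),
         out_specs=P(None, 'feats'))  # must mention 'feats': the all_gather result is still typed as varying over it
def forward_tp_gathered(inputs, params):
    out = jnp.dot(inputs, params.fc1.weight)
    out = jax.lax.psum_scatter(out, 'feats', scatter_dimension=1, tiled=True)
    if params.fc1.bias is not None:
        out = out + params.fc1.bias
    out = jax.lax.all_gather(out, 'feats', axis=1, tiled=True)
    out = jnp.dot(out, params.fc2.weight)
    return out

Note that with out_specs=P(None, 'feats') the eight per-device (256, 10) outputs are concatenated along axis 1, so the global result should be a (256, 80) array made of identical copies; the all_gather_invariant version with out_specs=P() is usually the cleaner choice for this layout.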