Commit f7cf5b7

JKSenthil authored and facebook-github-bot committed
support non-sharded llm inference (#1007)
Summary:
Pull Request resolved: #1007

# Context

Small LLMs (~1B parameters) don't need to be sharded for inference, but our recipes currently assume sharding is required.

# This Diff

1) Enables materialization of params at the TNT level when moving a model from the meta device to CUDA
2) Makes the sharding / global mesh coordinator optional
3) Adds llama3pt2_1b_inference.yaml, which loads the full model on each rank

Reviewed By: rshakoor

Differential Revision: D75823535

fbshipit-source-id: 9d8bd09ca50d032d9144eb2c8ea2f486dfc50dfe
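For context, here is a minimal sketch of how meta-device parameters can be materialized before the final device move. This is an illustrative approximation only, not TorchTNT's actual `materialize_meta_params` implementation; the `to_empty`/`reset_parameters` pattern is an assumption based on common PyTorch practice.

```python
import torch

def materialize_meta_params(module: torch.nn.Module, device: torch.device) -> None:
    """Illustrative sketch: replace meta tensors with real storage on `device`.

    Not the actual TorchTNT implementation; shown only to clarify the idea.
    """
    for submodule in module.modules():
        # Only touch submodules that still hold meta-device parameters.
        if any(p.is_meta for p in submodule.parameters(recurse=False)):
            # Allocate uninitialized storage for this submodule on the target device.
            submodule.to_empty(device=device, recurse=False)
            # Re-run the layer's own initializer where one is defined.
            if hasattr(submodule, "reset_parameters"):
                submodule.reset_parameters()
```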
1 parent f5bf0c1 commit f7cf5b7

File tree

1 file changed (+3, -0)


torchtnt/utils/prepare_module.py

Lines changed: 3 additions & 0 deletions
@@ -728,6 +728,9 @@ def _prepare_module_1d(
     elif isinstance(strategy, FSDP2Strategy):
         module = prepare_fsdp2(module, device, strategy, global_mesh=global_mesh)
     else:
+        # materialize any meta device params
+        materialize_meta_params(module=module, device=device)
+        # then move entire module to device
         module = module.to(device)
 
     if activation_checkpoint_params:
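To show why the materialization step is needed on the unsharded path, here is a hedged end-to-end usage sketch. It uses plain PyTorch plus the illustrative `materialize_meta_params` above (not the real TorchTNT helper), and the toy model is hypothetical:

```python
import torch

# Build the model on the meta device: no real memory is allocated, which is
# how large checkpoints are typically staged before placement.
with torch.device("meta"):
    model = torch.nn.Sequential(
        torch.nn.Linear(4096, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 4096),
    )

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Calling `model.to(device)` directly on meta tensors raises
# ("Cannot copy out of meta tensor; no data!"), so materialize first,
# mirroring the order of operations in the diff above.
materialize_meta_params(module=model, device=device)
model = model.to(device)
```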
