`device_map` in `load_model_dict_into_meta` #10851

hlky · 2025-02-21T06:14:36Z

What does this PR do?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2025-02-21T06:20:54Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul

How can we test this better?

hlky · 2025-02-21T06:34:18Z

@sayakpaul It was picked up in nightly tests, not sure why it wasn't picked up in the model refactor PR's fast tests, there are fast tests for IPAdapter afaik.

sayakpaul · 2025-02-21T06:42:09Z

> not sure why it wasn't picked up in the model refactor PR's fast tests, there are fast tests for IPAdapter afaik.

Because these are not fast tests:

diffusers/tests/pipelines/ip_adapters/test_ip_adapter_stable_diffusion.py

Line 393 in 6cef7d2

@slow

They won't be triggered even after #10310, since that PR only runs fast tests on GPU.

Additionally, do the IP Adapter fast tests exercise the codepath under consideration, i.e., load_model_dict_into_meta()? It would be nice to investigate that.
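One way to check whether a test reaches a given function is to wrap that function in a spy and count the calls. The following is a minimal sketch, not the diffusers test code: `Loader`, `load_into_meta`, and `load_weights` are hypothetical stand-ins for the real loading helpers.

```python
from unittest import mock


class Loader:
    """Hypothetical stand-in for a module that loads model weights."""

    def load_into_meta(self, state_dict):
        # Stand-in for the low-memory loading helper.
        return dict(state_dict)

    def load_weights(self, state_dict, low_cpu_mem_usage=False):
        # Only the low-memory branch reaches the helper.
        if low_cpu_mem_usage:
            return self.load_into_meta(state_dict)
        return state_dict


def helper_call_count(low_cpu_mem_usage):
    loader = Loader()
    # wraps= keeps the real behaviour while recording every call.
    with mock.patch.object(loader, "load_into_meta", wraps=loader.load_into_meta) as spy:
        loader.load_weights({"w": 1}, low_cpu_mem_usage=low_cpu_mem_usage)
    return spy.call_count
```

With the flag left at `False`, the spy records no calls, which is the kind of silent coverage gap being asked about here.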

hlky · 2025-02-21T06:55:59Z

There are also mixins for the fast tests

diffusers/tests/pipelines/test_pipelines_common.py

Lines 245 to 491 in 6cef7d2

    
    class IPAdapterTesterMixin:
        """
        This mixin is designed to be used with PipelineTesterMixin and unittest.TestCase classes.
        It provides a set of common tests for pipelines that support IP Adapters.
        """

        def test_pipeline_signature(self):
            parameters = inspect.signature(self.pipeline_class.__call__).parameters

            assert issubclass(self.pipeline_class, IPAdapterMixin)
            self.assertIn(
                "ip_adapter_image",
                parameters,
                "`ip_adapter_image` argument must be supported by the `__call__` method",
            )
            self.assertIn(
                "ip_adapter_image_embeds",
                parameters,
                "`ip_adapter_image_embeds` argument must be supported by the `__call__` method",
            )

        def _get_dummy_image_embeds(self, cross_attention_dim: int = 32):
            return torch.randn((2, 1, cross_attention_dim), device=torch_device)

        def _get_dummy_faceid_image_embeds(self, cross_attention_dim: int = 32):
            return torch.randn((2, 1, 1, cross_attention_dim), device=torch_device)

        def _get_dummy_masks(self, input_size: int = 64):
            _masks = torch.zeros((1, 1, input_size, input_size), device=torch_device)
            _masks[0, :, :, : int(input_size / 2)] = 1
            return _masks

        def _modify_inputs_for_ip_adapter_test(self, inputs: Dict[str, Any]):
            parameters = inspect.signature(self.pipeline_class.__call__).parameters
            if "image" in parameters.keys() and "strength" in parameters.keys():
                inputs["num_inference_steps"] = 4

            inputs["output_type"] = "np"
            inputs["return_dict"] = False
            return inputs

        def test_ip_adapter(self, expected_max_diff: float = 1e-4, expected_pipe_slice=None):
            r"""Tests for IP-Adapter.

            The following scenarios are tested:
              - Single IP-Adapter with scale=0 should produce same output as no IP-Adapter.
              - Multi IP-Adapter with scale=0 should produce same output as no IP-Adapter.
              - Single IP-Adapter with scale!=0 should produce different output compared to no IP-Adapter.
              - Multi IP-Adapter with scale!=0 should produce different output compared to no IP-Adapter.
            """
            # Raising the tolerance for this test when it's run on a CPU because we
            # compare against static slices and that can be shaky (with a VVVV low probability).
            expected_max_diff = 9e-4 if torch_device == "cpu" else expected_max_diff

            components = self.get_dummy_components()
            pipe = self.pipeline_class(**components).to(torch_device)
            pipe.set_progress_bar_config(disable=None)
            cross_attention_dim = pipe.unet.config.get("cross_attention_dim", 32)

            # forward pass without ip adapter
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            if expected_pipe_slice is None:
                output_without_adapter = pipe(**inputs)[0]
            else:
                output_without_adapter = expected_pipe_slice

            # 1. Single IP-Adapter test cases
            adapter_state_dict = create_ip_adapter_state_dict(pipe.unet)
            pipe.unet._load_ip_adapter_weights(adapter_state_dict)

            # forward pass with single ip adapter, but scale=0 which should have no effect
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)]
            pipe.set_ip_adapter_scale(0.0)
            output_without_adapter_scale = pipe(**inputs)[0]
            if expected_pipe_slice is not None:
                output_without_adapter_scale = output_without_adapter_scale[0, -3:, -3:, -1].flatten()

            # forward pass with single ip adapter, but with scale of adapter weights
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)]
            pipe.set_ip_adapter_scale(42.0)
            output_with_adapter_scale = pipe(**inputs)[0]
            if expected_pipe_slice is not None:
                output_with_adapter_scale = output_with_adapter_scale[0, -3:, -3:, -1].flatten()

            max_diff_without_adapter_scale = np.abs(output_without_adapter_scale - output_without_adapter).max()
            max_diff_with_adapter_scale = np.abs(output_with_adapter_scale - output_without_adapter).max()

            self.assertLess(
                max_diff_without_adapter_scale,
                expected_max_diff,
                "Output without ip-adapter must be same as normal inference",
            )
            self.assertGreater(
                max_diff_with_adapter_scale, 1e-2, "Output with ip-adapter must be different from normal inference"
            )

            # 2. Multi IP-Adapter test cases
            adapter_state_dict_1 = create_ip_adapter_state_dict(pipe.unet)
            adapter_state_dict_2 = create_ip_adapter_state_dict(pipe.unet)
            pipe.unet._load_ip_adapter_weights([adapter_state_dict_1, adapter_state_dict_2])

            # forward pass with multi ip adapter, but scale=0 which should have no effect
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)] * 2
            pipe.set_ip_adapter_scale([0.0, 0.0])
            output_without_multi_adapter_scale = pipe(**inputs)[0]
            if expected_pipe_slice is not None:
                output_without_multi_adapter_scale = output_without_multi_adapter_scale[0, -3:, -3:, -1].flatten()

            # forward pass with multi ip adapter, but with scale of adapter weights
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)] * 2
            pipe.set_ip_adapter_scale([42.0, 42.0])
            output_with_multi_adapter_scale = pipe(**inputs)[0]
            if expected_pipe_slice is not None:
                output_with_multi_adapter_scale = output_with_multi_adapter_scale[0, -3:, -3:, -1].flatten()

            max_diff_without_multi_adapter_scale = np.abs(
                output_without_multi_adapter_scale - output_without_adapter
            ).max()
            max_diff_with_multi_adapter_scale = np.abs(output_with_multi_adapter_scale - output_without_adapter).max()
            self.assertLess(
                max_diff_without_multi_adapter_scale,
                expected_max_diff,
                "Output without multi-ip-adapter must be same as normal inference",
            )
            self.assertGreater(
                max_diff_with_multi_adapter_scale,
                1e-2,
                "Output with multi-ip-adapter scale must be different from normal inference",
            )

        def test_ip_adapter_cfg(self, expected_max_diff: float = 1e-4):
            parameters = inspect.signature(self.pipeline_class.__call__).parameters

            if "guidance_scale" not in parameters:
                return

            components = self.get_dummy_components()
            pipe = self.pipeline_class(**components).to(torch_device)
            pipe.set_progress_bar_config(disable=None)
            cross_attention_dim = pipe.unet.config.get("cross_attention_dim", 32)

            adapter_state_dict = create_ip_adapter_state_dict(pipe.unet)
            pipe.unet._load_ip_adapter_weights(adapter_state_dict)
            pipe.set_ip_adapter_scale(1.0)

            # forward pass with CFG not applied
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)[0].unsqueeze(0)]
            inputs["guidance_scale"] = 1.0
            out_no_cfg = pipe(**inputs)[0]

            # forward pass with CFG applied
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)]
            inputs["guidance_scale"] = 7.5
            out_cfg = pipe(**inputs)[0]

            assert out_cfg.shape == out_no_cfg.shape

        def test_ip_adapter_masks(self, expected_max_diff: float = 1e-4):
            components = self.get_dummy_components()
            pipe = self.pipeline_class(**components).to(torch_device)
            pipe.set_progress_bar_config(disable=None)
            cross_attention_dim = pipe.unet.config.get("cross_attention_dim", 32)
            sample_size = pipe.unet.config.get("sample_size", 32)
            block_out_channels = pipe.vae.config.get("block_out_channels", [128, 256, 512, 512])
            input_size = sample_size * (2 ** (len(block_out_channels) - 1))

            # forward pass without ip adapter
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            output_without_adapter = pipe(**inputs)[0]
            output_without_adapter = output_without_adapter[0, -3:, -3:, -1].flatten()

            adapter_state_dict = create_ip_adapter_state_dict(pipe.unet)
            pipe.unet._load_ip_adapter_weights(adapter_state_dict)

            # forward pass with single ip adapter and masks, but scale=0 which should have no effect
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)]
            inputs["cross_attention_kwargs"] = {"ip_adapter_masks": [self._get_dummy_masks(input_size)]}
            pipe.set_ip_adapter_scale(0.0)
            output_without_adapter_scale = pipe(**inputs)[0]
            output_without_adapter_scale = output_without_adapter_scale[0, -3:, -3:, -1].flatten()

            # forward pass with single ip adapter and masks, but with scale of adapter weights
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(cross_attention_dim)]
            inputs["cross_attention_kwargs"] = {"ip_adapter_masks": [self._get_dummy_masks(input_size)]}
            pipe.set_ip_adapter_scale(42.0)
            output_with_adapter_scale = pipe(**inputs)[0]
            output_with_adapter_scale = output_with_adapter_scale[0, -3:, -3:, -1].flatten()

            max_diff_without_adapter_scale = np.abs(output_without_adapter_scale - output_without_adapter).max()
            max_diff_with_adapter_scale = np.abs(output_with_adapter_scale - output_without_adapter).max()

            self.assertLess(
                max_diff_without_adapter_scale,
                expected_max_diff,
                "Output without ip-adapter must be same as normal inference",
            )
            self.assertGreater(
                max_diff_with_adapter_scale, 1e-3, "Output with ip-adapter must be different from normal inference"
            )

        def test_ip_adapter_faceid(self, expected_max_diff: float = 1e-4):
            components = self.get_dummy_components()
            pipe = self.pipeline_class(**components).to(torch_device)
            pipe.set_progress_bar_config(disable=None)
            cross_attention_dim = pipe.unet.config.get("cross_attention_dim", 32)

            # forward pass without ip adapter
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            output_without_adapter = pipe(**inputs)[0]
            output_without_adapter = output_without_adapter[0, -3:, -3:, -1].flatten()

            adapter_state_dict = create_ip_adapter_faceid_state_dict(pipe.unet)
            pipe.unet._load_ip_adapter_weights(adapter_state_dict)

            # forward pass with single ip adapter, but scale=0 which should have no effect
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_faceid_image_embeds(cross_attention_dim)]
            pipe.set_ip_adapter_scale(0.0)
            output_without_adapter_scale = pipe(**inputs)[0]
            output_without_adapter_scale = output_without_adapter_scale[0, -3:, -3:, -1].flatten()

            # forward pass with single ip adapter, but with scale of adapter weights
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_faceid_image_embeds(cross_attention_dim)]
            pipe.set_ip_adapter_scale(42.0)
            output_with_adapter_scale = pipe(**inputs)[0]
            output_with_adapter_scale = output_with_adapter_scale[0, -3:, -3:, -1].flatten()

            max_diff_without_adapter_scale = np.abs(output_without_adapter_scale - output_without_adapter).max()
            max_diff_with_adapter_scale = np.abs(output_with_adapter_scale - output_without_adapter).max()

            self.assertLess(
                max_diff_without_adapter_scale,
                expected_max_diff,
                "Output without ip-adapter must be same as normal inference",
            )
            self.assertGreater(
                max_diff_with_adapter_scale, 1e-3, "Output with ip-adapter must be different from normal inference"
            )

https://github.com/search?q=repo%3Ahuggingface%2Fdiffusers%20IPAdapterTesterMixin&type=code

diffusers/tests/pipelines/test_pipelines_common.py

Lines 494 to 579 in 6cef7d2

    
    class FluxIPAdapterTesterMixin:
        """
        This mixin is designed to be used with PipelineTesterMixin and unittest.TestCase classes.
        It provides a set of common tests for pipelines that support IP Adapters.
        """

        def test_pipeline_signature(self):
            parameters = inspect.signature(self.pipeline_class.__call__).parameters

            assert issubclass(self.pipeline_class, FluxIPAdapterMixin)
            self.assertIn(
                "ip_adapter_image",
                parameters,
                "`ip_adapter_image` argument must be supported by the `__call__` method",
            )
            self.assertIn(
                "ip_adapter_image_embeds",
                parameters,
                "`ip_adapter_image_embeds` argument must be supported by the `__call__` method",
            )

        def _get_dummy_image_embeds(self, image_embed_dim: int = 768):
            return torch.randn((1, 1, image_embed_dim), device=torch_device)

        def _modify_inputs_for_ip_adapter_test(self, inputs: Dict[str, Any]):
            inputs["negative_prompt"] = ""
            inputs["true_cfg_scale"] = 4.0
            inputs["output_type"] = "np"
            inputs["return_dict"] = False
            return inputs

        def test_ip_adapter(self, expected_max_diff: float = 1e-4, expected_pipe_slice=None):
            r"""Tests for IP-Adapter.

            The following scenarios are tested:
              - Single IP-Adapter with scale=0 should produce same output as no IP-Adapter.
              - Single IP-Adapter with scale!=0 should produce different output compared to no IP-Adapter.
            """
            # Raising the tolerance for this test when it's run on a CPU because we
            # compare against static slices and that can be shaky (with a VVVV low probability).
            expected_max_diff = 9e-4 if torch_device == "cpu" else expected_max_diff

            components = self.get_dummy_components()
            pipe = self.pipeline_class(**components).to(torch_device)
            pipe.set_progress_bar_config(disable=None)
            image_embed_dim = pipe.transformer.config.pooled_projection_dim

            # forward pass without ip adapter
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            if expected_pipe_slice is None:
                output_without_adapter = pipe(**inputs)[0]
            else:
                output_without_adapter = expected_pipe_slice

            adapter_state_dict = create_flux_ip_adapter_state_dict(pipe.transformer)
            pipe.transformer._load_ip_adapter_weights(adapter_state_dict)

            # forward pass with single ip adapter, but scale=0 which should have no effect
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(image_embed_dim)]
            inputs["negative_ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(image_embed_dim)]
            pipe.set_ip_adapter_scale(0.0)
            output_without_adapter_scale = pipe(**inputs)[0]
            if expected_pipe_slice is not None:
                output_without_adapter_scale = output_without_adapter_scale[0, -3:, -3:, -1].flatten()

            # forward pass with single ip adapter, but with scale of adapter weights
            inputs = self._modify_inputs_for_ip_adapter_test(self.get_dummy_inputs(torch_device))
            inputs["ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(image_embed_dim)]
            inputs["negative_ip_adapter_image_embeds"] = [self._get_dummy_image_embeds(image_embed_dim)]
            pipe.set_ip_adapter_scale(42.0)
            output_with_adapter_scale = pipe(**inputs)[0]
            if expected_pipe_slice is not None:
                output_with_adapter_scale = output_with_adapter_scale[0, -3:, -3:, -1].flatten()

            max_diff_without_adapter_scale = np.abs(output_without_adapter_scale - output_without_adapter).max()
            max_diff_with_adapter_scale = np.abs(output_with_adapter_scale - output_without_adapter).max()

            self.assertLess(
                max_diff_without_adapter_scale,
                expected_max_diff,
                "Output without ip-adapter must be same as normal inference",
            )
            self.assertGreater(
                max_diff_with_adapter_scale, 1e-2, "Output with ip-adapter must be different from normal inference"
            )

diffusers/tests/pipelines/flux/test_pipeline_flux.py

Lines 28 to 30 in 6cef7d2

    
    class FluxPipelineFastTests(
        unittest.TestCase, PipelineTesterMixin, FluxIPAdapterTesterMixin, PyramidAttentionBroadcastTesterMixin
    ):

sayakpaul · 2025-02-21T07:20:10Z

Maybe they are not invoking load_model_dict_into_meta() in the first place? It's only supposed to be invoked when we call from_pretrained(), no?

hlky · 2025-02-21T07:41:00Z

The fast tests were not hitting it. They call _load_ip_adapter_weights directly, which had a default of low_cpu_mem_usage=False; normally the correct low_cpu_mem_usage (depending on torch version) is passed in by load_ip_adapter. 040d657 sets the default for the private methods.

hlky · 2025-02-21T07:49:32Z

The issue fixed in 7d96d88 was previously uncaught because `and is_peft_version("<=", "0.13.0")` was never evaluated: with low_cpu_mem_usage defaulting to False, the condition short-circuited.

diffusers/src/diffusers/loaders/unet.py

Line 149 in 6cef7d2

if low_cpu_mem_usage and is_peft_version("<=", "0.13.0"):
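The short-circuit behaviour that hid the bug can be reproduced in isolation. In this sketch, `broken_version_check` is a hypothetical stand-in for a helper with a latent error; it is not the real `is_peft_version`.

```python
def broken_version_check(operation, version):
    # Stand-in for a helper with a latent bug: it raises whenever it runs.
    raise NameError("version helper is broken")


def needs_workaround(low_cpu_mem_usage):
    # `and` short-circuits: the right operand runs only if the left is truthy.
    return low_cpu_mem_usage and broken_version_check("<=", "0.13.0")
```

Calling `needs_workaround(False)` returns `False` without touching the helper, while `needs_workaround(True)` raises, so a test suite that only ever passes `False` cannot surface the error.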

Similar issue in is_bitsandbytes_version (_bitsandbytes_version is unbound unless _bitsandbytes_available is True).
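A common way to avoid this kind of unbound name is to assign the version a placeholder before the availability check and to guard on availability before comparing. The sketch below shows that pattern with the standard library only; `probe_package` and `is_package_version` are hypothetical names, and this is not necessarily how the fix in this PR is written.

```python
import importlib.metadata
import importlib.util


def probe_package(name):
    """Return (available, version) for a package, with version always bound."""
    available = importlib.util.find_spec(name) is not None
    version = "N/A"  # bound up front so later reads cannot raise NameError
    if available:
        try:
            version = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            available = False
    return available, version


def is_package_version(name, predicate):
    available, version = probe_package(name)
    # Guard on availability before touching the version string.
    if not available:
        return False
    return predicate(version)
```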

sayakpaul · 2025-02-21T07:56:20Z

Thanks for fixing those!

guiyrt · 2025-02-21T11:48:13Z

Thanks for the fix @hlky !

device_map in load_model_dict_into_meta

18454cf

hlky requested review from SunMarc and sayakpaul February 21, 2025 06:14

sayakpaul reviewed Feb 21, 2025

hlky added 2 commits February 21, 2025 07:37

_LOW_CPU_MEM_USAGE_DEFAULT

040d657

Merge branch 'main' into model-loading-refactor-fixes

9e2e165

fix is_peft_version is_bitsandbytes_version

7d96d88

DN6 approved these changes Feb 21, 2025

sayakpaul approved these changes Feb 21, 2025

hlky merged commit d75ea3c into huggingface:main Feb 21, 2025
11 of 12 checks passed
