Hi team,
Sharing some observations about potential bugs below:
- The class definition below seems to be duplicated. The first class definition should be changed to ResBlock.
  neuronx-distributed/examples/inference/llama2/neuron_modeling_llama.py
  Lines 283 to 312 in 4f95471

      class NeuronLlamaModel(NeuronBaseModel, LlamaPreTrainedModel):
          """
          The neuron version of the LlamaModel
          """

          def setup_attr_for_model(self, config: NeuronLlamaConfig):
              # Needed for init_inference_optimization()
              self.on_device_sampling = config.on_device_sampling
              self.tp_degree = config.tp_degree
              self.hidden_size = config.hidden_size
              self.num_attention_heads = config.num_attention_heads
              self.num_key_value_heads = config.num_key_value_heads
              self.max_batch_size = config.max_batch_size
              self.buckets = config.buckets

          def init_model(self, config: NeuronLlamaConfig):

          def forward(self, x):
              """
              Forward pass of the ResBlock.

              Args:
                  x (torch.Tensor): Input tensor.

              Returns:
                  torch.Tensor: Output after the residual connection and activation.
              """
              return x + self.act(self.linear(x))


      class NeuronLlamaModel(NeuronBaseModel, LlamaPreTrainedModel):

- https://github.com/aws-neuron/neuronx-distributed/blob/main/examples/inference/modules/model_base.py#L453-L486
  These lines do not seem to be relevant. They should be deleted and replaced with:

      return [res] + updated_kv_cache
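To show why the duplicated class name in the first point matters, here is a minimal sketch in plain Python. The class name `Model` and both method bodies are hypothetical stand-ins, not the repository code; the only thing the sketch relies on is that a second `class` statement with the same name rebinds that name, so the first definition can no longer be reached through it.

```python
# Hypothetical stand-in for the block that should have been named ResBlock.
class Model:
    def forward(self, x):
        return x + 1


# Keep a handle to the first definition only so we can compare it later.
first_definition = Model


# Hypothetical stand-in for the real model class, reusing the same name.
class Model:
    def forward(self, x):
        return x * 2


# The name now refers to the second class; the first one is shadowed.
shadowed = Model is not first_definition
result = Model().forward(3)  # uses the second definition's forward()
```

With the first class renamed (to ResBlock, as suggested above), both definitions would stay reachable under their own names.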
Thank you!