
Potential bugs for the llama inference example #30

@yinsong1986

Description

Hi team,

Sharing some observations about potential bugs below:

  • In the llama inference example:

        class NeuronLlamaModel(NeuronBaseModel, LlamaPreTrainedModel):
            """
            The neuron version of the LlamaModel
            """
            def setup_attr_for_model(self, config: NeuronLlamaConfig):
                # Needed for init_inference_optimization()
                self.on_device_sampling = config.on_device_sampling
                self.tp_degree = config.tp_degree
                self.hidden_size = config.hidden_size
                self.num_attention_heads = config.num_attention_heads
                self.num_key_value_heads = config.num_key_value_heads
                self.max_batch_size = config.max_batch_size
                self.buckets = config.buckets

            def init_model(self, config: NeuronLlamaConfig):

            def forward(self, x):
                """
                Forward pass of the ResBlock.

                Args:
                    x (torch.Tensor): Input tensor.

                Returns:
                    torch.Tensor: Output after the residual connection and activation.
                """
                return x + self.act(self.linear(x))

        class NeuronLlamaModel(NeuronBaseModel, LlamaPreTrainedModel):

    The class NeuronLlamaModel is defined twice; this seems to be a duplication, and the first class definition should presumably be renamed to ResBlock (see the sketch after this list).
  • In https://github.com/aws-neuron/neuronx-distributed/blob/main/examples/inference/modules/model_base.py#L453-L486, these lines seem to be irrelevant; they should be deleted and replaced with return [res] + updated_kv_cache.
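
For the first point, a minimal sketch of what the renamed class could look like is below. Only the forward pass is taken from the snippet quoted above; the constructor (the layer shape and the activation) is an assumption for illustration, not code from the example.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        """
        Sketch of the suggested rename: a small residual block whose forward
        matches the one quoted above. The __init__ details are assumptions.
        """
        def __init__(self, hidden_size: int):
            super().__init__()
            # Assumed: a square linear projection followed by a SiLU activation;
            # the actual example may use different sizes or a different activation.
            self.linear = nn.Linear(hidden_size, hidden_size)
            self.act = nn.SiLU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Residual connection around the activated linear projection,
            # exactly as in the quoted forward().
            return x + self.act(self.linear(x))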

Thank you!
