
Potential bugs for the llama inference example #30

@yinsong1986

Description

Hi team,

Sharing some observations about potential bugs below:

  • In the llama inference example:

        class NeuronLlamaModel(NeuronBaseModel, LlamaPreTrainedModel):
            """
            The neuron version of the LlamaModel
            """
            def setup_attr_for_model(self, config: NeuronLlamaConfig):
                # Needed for init_inference_optimization()
                self.on_device_sampling = config.on_device_sampling
                self.tp_degree = config.tp_degree
                self.hidden_size = config.hidden_size
                self.num_attention_heads = config.num_attention_heads
                self.num_key_value_heads = config.num_key_value_heads
                self.max_batch_size = config.max_batch_size
                self.buckets = config.buckets

            def init_model(self, config: NeuronLlamaConfig):

            def forward(self, x):
                """
                Forward pass of the ResBlock.

                Args:
                    x (torch.Tensor): Input tensor.

                Returns:
                    torch.Tensor: Output after the residual connection and activation.
                """
                return x + self.act(self.linear(x))

        class NeuronLlamaModel(NeuronBaseModel, LlamaPreTrainedModel):

    The class NeuronLlamaModel is defined twice; this seems to be a duplication, and the first class definition should presumably be renamed to ResBlock (see the sketch after this list).
  • In https://github.com/aws-neuron/neuronx-distributed/blob/main/examples/inference/modules/model_base.py#L453-L486, these lines seem to be irrelevant; they should be deleted and replaced with return [res] + updated_kv_cache.
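
For the first point, a minimal sketch of what the renamed class could look like is below. Only the forward pass is taken from the snippet quoted above; the constructor (the layer shape and the activation) is an assumption for illustration, not code from the example.

    import torch
    import torch.nn as nn

    class ResBlock(nn.Module):
        """
        Sketch of the suggested rename: a small residual block whose forward
        matches the one quoted above. The __init__ details are assumptions.
        """
        def __init__(self, hidden_size: int):
            super().__init__()
            # Assumed: a square linear projection followed by a SiLU activation;
            # the actual example may use different sizes or a different activation.
            self.linear = nn.Linear(hidden_size, hidden_size)
            self.act = nn.SiLU()

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Residual connection around the activated linear projection,
            # exactly as in the quoted forward().
            return x + self.act(self.linear(x))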

Thank you!
