Description
When trying to use kernl with the default Llama 7B configuration on an A100 GPU, I get the error below.
Steps to reproduce
import torch
from transformers import LlamaModel, LlamaConfig, LlamaTokenizer, LlamaForCausalLM
from kernl.model_optimization import optimize_model
config = LlamaConfig()
model = LlamaForCausalLM(config).cuda()
optimize_model(model)
length = 5
input_ids = torch.randint(low=0, high=model.config.vocab_size, size=(1, length)).cuda()
with torch.inference_mode(), torch.cuda.amp.autocast():
    outputs = model.generate(input_ids=input_ids)
print(outputs.shape)
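For comparison, the same generation pattern runs without kernl's optimize_model. This is a sanity-check sketch, not taken from the report: it uses a tiny hypothetical LlamaConfig (small hidden size and layer count, chosen only so the model fits anywhere) and runs on CPU, so it isolates whether the failure comes from the optimization step rather than from generate itself.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical tiny config for the sanity check; the real report uses the 7B defaults.
config = LlamaConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
)
model = LlamaForCausalLM(config).eval()

length = 5
input_ids = torch.randint(low=0, high=config.vocab_size, size=(1, length))
with torch.inference_mode():
    # Cap new tokens so the baseline terminates quickly.
    outputs = model.generate(input_ids=input_ids, max_new_tokens=3)
print(outputs.shape)
```

If this baseline succeeds while the optimized model fails, the problem is scoped to optimize_model rather than to the Llama generation path.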
Expected Behavior
The optimized model generates tokens normally and prints the output shape.
Actual Behavior
The generation call fails with the following error message:

Your environment