Description
When trying to use kernl with the default Llama 7B configuration on an A100 GPU, I get the error below.
Steps to reproduce
import torch
from transformers import LlamaModel, LlamaConfig, LlamaTokenizer, LlamaForCausalLM
from kernl.model_optimization import optimize_model
config = LlamaConfig()
model = LlamaForCausalLM(config).cuda()
optimize_model(model)
length = 5
input_ids = torch.randint(low=0, high=model.config.vocab_size, size=(1, length)).cuda()
with torch.inference_mode(), torch.cuda.amp.autocast():
    outputs = model.generate(input_ids=input_ids)
print(outputs.shape)
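For comparison, the same generation pattern runs without kernl's optimize_model. This is a sanity-check sketch, not taken from the report: it uses a tiny hypothetical LlamaConfig (small hidden size and layer count, chosen only so the model fits anywhere) and runs on CPU, so it isolates whether the failure comes from the optimization step rather than from generate itself.

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

# Hypothetical tiny config for the sanity check; the real report uses the 7B defaults.
config = LlamaConfig(
    vocab_size=1000,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
)
model = LlamaForCausalLM(config).eval()

length = 5
input_ids = torch.randint(low=0, high=config.vocab_size, size=(1, length))
with torch.inference_mode():
    # Cap new tokens so the baseline terminates quickly.
    outputs = model.generate(input_ids=input_ids, max_new_tokens=3)
print(outputs.shape)
```

If this baseline succeeds while the optimized model fails, the problem is scoped to optimize_model rather than to the Llama generation path.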
Expected Behavior
The optimized model generates tokens normally and prints the output shape.
Actual Behavior
The generation call fails with the following error message:

Your environment