Error when running 4-bit quantization on Jetson AGX Orin #1290
Unanswered
FanZhang91 asked this question in Q&A
Replies: 2 comments
-
Load it this way:

```python
from peft import AutoPeftModelForCausalLM
from transformers import BitsAndBytesConfig

model = AutoPeftModelForCausalLM.from_pretrained(
    model_dir, trust_remote_code=trust_remote_code, device_map='auto',
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
```
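For reference, a fuller 4-bit configuration is sometimes useful; the NF4 and compute-dtype settings below are illustrative choices, not something the reply specifies:

```python
import torch
from transformers import BitsAndBytesConfig

# Illustrative settings (assumptions, not from the reply): NF4 quantization,
# fp16 compute for the dequantized matmuls, and nested (double) quantization
# to shave off a bit more memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
```

This can be passed as `quantization_config` in the `from_pretrained` call above.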
-
On a 4090 server, quantization runs fine with either quantize() or BitsAndBytesConfig; I only noticed that int4 and int8 inference is actually slower than half precision. Is that normal? On an Orin-class device, however, int4 quantization raises an error. How should I handle that?
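The "no kernel image available for execution on the device" error usually means the installed CUDA binaries were not compiled for the GPU's compute capability; Jetson AGX Orin reports sm_87. A minimal check, assuming a CUDA-enabled PyTorch build:

```python
import torch

# The device's capability must appear in the arch list that the PyTorch build
# (and any extension shipping CUDA kernels) was compiled for. Jetson AGX Orin
# is (8, 7), i.e. sm_87; if it is missing from the list, kernels fail with
# "no kernel image available for execution on the device".
print(torch.cuda.get_device_capability(0))  # e.g. (8, 7) on AGX Orin
print(torch.cuda.get_arch_list())           # e.g. ['sm_70', 'sm_80', ...]
```

If sm_87 is absent, the quantization kernels (bitsandbytes, or the model's own .quantize() kernels) likely need to be rebuilt from source for that architecture; prebuilt x86 wheels generally do not cover Jetson's aarch64 platform.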
-
Original question (FanZhang91):

I load the quantized model in a way similar to one of the following, but it fails with RuntimeError: no kernel image available for execution on the device.

```python
from transformers import AutoModel

model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).quantize(bits=4, device="cuda").cuda().eval()
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).cuda().quantize(bits=4, device="cuda").eval()
```

How can I fix this?
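Tying the first reply's suggestion to the code above: a minimal sketch that lets transformers/bitsandbytes quantize at load time instead of calling the model repo's own .quantize() kernels. MODEL_PATH is taken from the question; this assumes a bitsandbytes build that supports the Orin's sm_87 architecture:

```python
from transformers import AutoModel, BitsAndBytesConfig

# 4-bit quantization handled by transformers/bitsandbytes at load time,
# rather than by the model repo's custom .quantize() CUDA kernels.
model = AutoModel.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
).eval()
```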