Commit 23e8f8b

Change CLIP dtype management in llama.py
It is probably safer to keep CLIP at its original precision (e.g., fp16) regardless of the autocast setting: some casts (e.g., fp16 to bf16) are lossy and can potentially harm the pre-trained model. The change is limited to llama.py for now, since a lot of copy-pasted code may be refactored in the future (#3).
1 parent b93d0b6 commit 23e8f8b


accessory/model/LLM/llama.py

Lines changed: 4 additions & 2 deletions
@@ -364,8 +364,10 @@ def clip_encode_image(self, x):


     def encode_image(self, image):
-        # return self.patch_embed(image)
-        image_tokens = self.clip_encode_image(image)
+        with torch.cuda.amp.autocast(enabled=False):
+            image = image.half()
+            image_tokens = self.clip_encode_image(image)
+            image = image.to(self.clip_proj.weight.dtype)
         image_tokens = self.clip_proj_norm(self.clip_proj(image_tokens))
         return image_tokens

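For reference, below is a minimal, self-contained sketch of the dtype pattern the diff applies; it is not the repository's code. Autocast is disabled around a pre-trained fp16 sub-module so its weights keep their native precision (fp16 carries 10 explicit mantissa bits versus bf16's 7, so an fp16-to-bf16 cast rounds information away), and only the sub-module's output is cast to the dtype of the next layer. The names clip_gain, proj, and encode are hypothetical stand-ins for the CLIP encoder, self.clip_proj, and encode_image.

import torch

# Hypothetical stand-ins: fp16 "CLIP" weights and a bf16 projection layer in
# the role of self.clip_proj. The fp16 part uses an element-wise op so the
# sketch also runs on CPU-only PyTorch builds.
clip_gain = torch.randn(8, dtype=torch.float16)
proj = torch.nn.Linear(8, 4, dtype=torch.bfloat16)

def encode(image: torch.Tensor) -> torch.Tensor:
    # Keep autocast from re-casting this region: the fp16 "CLIP" weights are
    # used at their original precision.
    with torch.cuda.amp.autocast(enabled=False):
        tokens = image.half() * clip_gain
    # Cast only the output to whatever dtype the next layer was built with.
    tokens = tokens.to(proj.weight.dtype)
    return proj(tokens)

print(encode(torch.randn(8)).dtype)  # torch.bfloat16

Casting the output rather than the CLIP module itself means the pre-trained fp16 weights are never converted to bf16, which is the point of the change.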