Conversation

@blap commented Sep 18, 2024

GGUF conversion for HF1BitLLM/Llama3-8B-1.58-100B-tokens: https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens/discussions/3

github-actions bot added the python (python script changes) label Sep 18, 2024
@compilade (Collaborator)

I appreciate the initiative, but this won't really be mergeable as-is. As I've noted in https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens/discussions/3, this is a "very ad-hoc patch, [and it] probably only works for this model".

This is because it only handles safetensors files, and it only special-cases the particular packed format used by this model when lazy conversion is enabled (so it won't work with --no-lazy).
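For context, that special case essentially amounts to unpacking the model's packed ternary weights before writing them out. Here is a minimal sketch of that kind of unpacking, assuming 2-bit fields packed four per byte (least-significant field first) with an offset of 1; the actual layout used by HF1BitLLM/Llama3-8B-1.58-100B-tokens may differ:

```python
# Hedged sketch: unpack ternary (1.58-bit) weights stored four per byte.
# Assumes 2 bits per value, least-significant field first, with stored
# values {0, 1, 2} mapping to {-1, 0, +1}; the model's real packing may differ.
import numpy as np

def unpack_ternary(packed: np.ndarray) -> np.ndarray:
    """Expand a uint8 array into 4x as many int8 values in {-1, 0, +1}."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    fields = (packed[..., None] >> shifts) & 0b11  # four 2-bit fields per byte
    return fields.reshape(*packed.shape[:-1], -1).astype(np.int8) - 1
```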

I think the way forward for this kind of thing would be to explicitly support the quantization_config field in config.json to load quantized models. I'm not sure whether transformers code can be called in a way that's compatible with lazy conversion (i.e. streaming dequantization instead of dequantizing the whole model at once in memory), or whether each quantization type will need to be implemented manually.
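As a rough illustration of that direction, a converter could read quantization_config from config.json and stream tensors one at a time, dequantizing each before it is written. This is only a sketch under those assumptions; the dequantize dispatcher below is a hypothetical placeholder, not an existing llama.cpp or transformers API:

```python
# Minimal sketch: dispatch on quantization_config and dequantize tensors one
# at a time (streaming), instead of materializing the whole model in memory.
import json
from pathlib import Path
from typing import Iterator

import numpy as np
from safetensors import safe_open

def dequantize(tensor: np.ndarray, name: str, quant_cfg: dict) -> np.ndarray:
    # Hypothetical per-method dispatch; each quant_method ("bitnet", "gptq",
    # "awq", ...) would need its own streaming-friendly implementation.
    raise NotImplementedError(f"quant_method {quant_cfg.get('quant_method')!r}")

def iter_dequantized(model_dir: str) -> Iterator[tuple[str, np.ndarray]]:
    config = json.loads((Path(model_dir) / "config.json").read_text())
    quant_cfg = config.get("quantization_config")  # absent on unquantized models
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        with safe_open(shard, framework="np") as f:
            for tensor_name in f.keys():
                tensor = f.get_tensor(tensor_name)  # one tensor at a time
                if quant_cfg is not None:
                    tensor = dequantize(tensor, tensor_name, quant_cfg)
                yield tensor_name, tensor
```

Streaming one tensor at a time keeps peak memory bounded by the largest tensor, which is what makes it compatible with lazy conversion.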

blap closed this by deleting the head repository Sep 19, 2024
