
Conversation


@francoishernandez francoishernandez commented May 7, 2025

We'll probably never have a perfect solution to handle every HF case, but it doesn't hurt to keep rationalizing a few things.

Addressed topics

  • centralize mappings and configs in a separate file
  • clarify encoder/decoder key mappings (previously, decoder keys lived in the root mapping, whereas encoder keys lived under a dedicated key)
  • first-shard params are transparently grabbed from the mapping root, instead of relying on a fixed set that is a hassle to maintain
  • move model-specific config flags to the "config" key of the main mapping (see the mapping sketch after this list)
  • simplify the shard-building loop (ongoing -- shall we loop over params/mapping entries instead of checkpoints?)
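
For illustration, here is a minimal sketch of the kind of centralized mapping layout described above. The key names and nesting are hypothetical, not the converter's actual schema; only the HF-side names follow the usual Llama checkpoint layout.

```python
# Hypothetical sketch of a centralized HF-to-eole key mapping.
# Structure and eole-side names are illustrative, not the converter's real schema.
MODEL_MAPPINGS = {
    "LlamaForCausalLM": {
        # root-level entries: first-shard params picked up transparently from here
        "tok_embeddings.weight": "model.embed_tokens.weight",
        "generator.weight": "lm_head.weight",
        # decoder-specific key templates (expanded per layer at conversion time)
        "decoder": {
            "layer_prefix": "model.layers.",
            ".self_attn.linear_query.": ".self_attn.q_proj.",
            ".self_attn.linear_keys.": ".self_attn.k_proj.",
            ".self_attn.linear_values.": ".self_attn.v_proj.",
        },
        # architecture-specific config flags grouped under a single "config" key
        "config": {
            "post_attention_layernorm": True,
            "parallel_residual": False,
        },
    },
}
```

With a layout like this, encoder-only or encoder-decoder architectures get their own "encoder" section instead of mixing encoder and decoder keys at the root, and per-architecture quirks stay confined to the "config" block.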

Some notes:

  1. While testing this, I checked Mixtral quickly, and it appears to have been broken for a while (even before the previous refactoring); not sure if we'll fix this here or later. EDIT: MoE (Mixtral/Qwen3) seems fine after a few patches, but AWQ is not -- though it's deprecated, so not sure we want to dive back into it (might be better off investigating llm-compressor, which replaces it)
  2. Did not test all architectures yet (e.g. gpt2/nllb/xlmroberta). EDIT: only XLM-RoBERTa not fully tested
  3. The transformer decoder refactoring a while ago introduced post_attention_layernorm, which should probably be made optional (e.g. for phi-2). EDIT: introduced a post_attention_layernorm flag (default True); see the layer sketch below
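
On note 3, a rough sketch of how an optional post_attention_layernorm flag could be wired into a decoder layer. This is illustrative only; the class and attribute names are hypothetical, not the actual eole layer.

```python
import torch.nn as nn


class DecoderLayerSketch(nn.Module):
    """Illustrative only: shows the optional post_attention_layernorm flag, not the real layer."""

    def __init__(self, hidden_size, num_heads=8, post_attention_layernorm=True, eps=1e-6):
        super().__init__()
        self.input_layernorm = nn.LayerNorm(hidden_size, eps=eps)
        # When the flag is False (e.g. phi-2-style layers), skip the extra norm entirely.
        self.post_attention_layernorm = (
            nn.LayerNorm(hidden_size, eps=eps) if post_attention_layernorm else None
        )
        self.self_attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(hidden_size, 4 * hidden_size),
            nn.GELU(),
            nn.Linear(4 * hidden_size, hidden_size),
        )

    def forward(self, x, attn_mask=None):
        h = self.input_layernorm(x)
        attn_out, _ = self.self_attn(h, h, h, attn_mask=attn_mask)
        x = x + attn_out
        # Apply the post-attention norm only if it was built.
        h = self.post_attention_layernorm(x) if self.post_attention_layernorm is not None else x
        return x + self.mlp(h)
```

For example, `DecoderLayerSketch(512, post_attention_layernorm=False)` would build a layer without the extra norm, while the default keeps the current behavior.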

@francoishernandez francoishernandez changed the title New iteration on convert HF New iteration on convert HF, fix some models, support Qwen3/Qwen3MoE May 15, 2025