Release version v0.5.0 + add --hf-token, use hf-mem as lib, etc. #39

Draft
alvarobartt wants to merge 10 commits into main from clean-and-opinionate

@alvarobartt alvarobartt commented Mar 5, 2026

Description

This PR sets the version to v0.5.0 by running uv version 0.5.0 prior to the release. It also includes hf-mem as a lib rather than only as a CLI, adds the --hf-token argument (closes #37, or at least makes it more explicit), and, last but not least, uses Claude Sonnet 4.6 via pi to refactor the codebase in an opinionated way (+ some manual intervention) due to the recent community contributions.

Additionally, this PR moves out the KV cache estimation and applies the feedback mentioned in https://huggingface.co/Qwen/Qwen3.5-397B-A17B/discussions/20#69a5bf82a2b3b0f27e8eacef to properly handle full and sliding attention rather than always assuming full attention, and to use head_dim if specified instead of calculating it. All the kudos here go to https://huggingface.co/YouJiacheng.
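
For reviewers, here is a minimal sketch of the idea behind the KV cache change. It is not the code in this PR. The function name `estimate_kv_cache_bytes` is hypothetical. The config keys (`layer_types`, `sliding_window`, `head_dim`, `num_key_value_heads`) follow common Hugging Face `config.json` conventions and are an assumption about the model being inspected. Treating any layer type other than `sliding_attention` as full attention is also an assumption, chosen to give a conservative upper bound.

```python
from typing import Any


def estimate_kv_cache_bytes(
    config: dict[str, Any],
    max_model_len: int,
    batch_size: int = 1,
    bytes_per_element: int = 2,  # e.g. 2 bytes for fp16 / bf16
) -> int:
    """Rough KV cache size in bytes (hypothetical helper, not the hf-mem API)."""
    num_layers = config["num_hidden_layers"]
    num_kv_heads = config.get("num_key_value_heads") or config["num_attention_heads"]

    # Prefer the explicit head_dim; only derive it when the config omits it.
    head_dim = config.get("head_dim") or (
        config["hidden_size"] // config["num_attention_heads"]
    )

    # Without per-layer information, fall back to full attention everywhere.
    layer_types = config.get("layer_types") or ["full_attention"] * num_layers
    sliding_window = config.get("sliding_window")

    cached_tokens = 0
    for layer_type in layer_types:
        if layer_type == "sliding_attention" and sliding_window:
            # A sliding layer never keeps more than `sliding_window` tokens.
            cached_tokens += min(max_model_len, sliding_window)
        else:
            # Assumption: anything else is counted as full attention.
            cached_tokens += max_model_len

    # Factor 2 accounts for storing both keys and values.
    return (
        2 * cached_tokens * num_kv_heads * head_dim * bytes_per_element * batch_size
    )
```

With a mix of full and sliding layers, the estimate drops relative to the all-full-attention assumption, and an explicit `head_dim` takes precedence over `hidden_size // num_attention_heads`, which can differ for some architectures.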


  • I have read and followed the guidelines in CONTRIBUTING.md.
  • This has been discussed over an issue or discussion.

Development

Successfully merging this pull request may close these issues.

401 Unauthorized, for models with ToS agreements on HF

1 participant