
Feat: Support dynamic ONNX model loading from HF (maintaining zero-dependency) #46

Open
Kagandi wants to merge 7 commits into PrithivirajDamodaran:main from Kagandi:main

@Kagandi commented Dec 11, 2025

Description:

I’ve read your comments on Flashrank's design philosophy, specifically the goal of keeping the library curated, lightweight, and focused on "tiny and performant" models. I fully agree that Flashrank should not become a heavy wrapper around massive models.

However, I believe this PR strengthens that mission while improving maintainability:

Decoupling Code from Models: Currently, adding a new "tiny/performant" model requires a code change and a release by you. This PR lets the community try new lightweight ONNX models immediately, without waiting for a release.

Zero Dependencies: This uses the existing architecture and does not add new dependencies.

Strictly Lightweight (Proposed Safeguard): To ensure this feature doesn't violate the "Flashrank" ethos, I can implement a hard file-size limit (e.g., under 200 MB) for custom-loaded models. This would prevent users from loading massive, slow models, keeping the library true to its name while still offering flexibility.
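A minimal sketch of how the proposed size cap could be enforced after a custom model is downloaded. The function name `check_model_size` and the constant `MAX_CUSTOM_MODEL_BYTES` are hypothetical, and the 200 MiB value simply mirrors the example figure above; none of this is confirmed to be in the PR.

```python
import os

# Hypothetical cap mirroring the "e.g., under 200 MB" proposal (200 MiB here).
MAX_CUSTOM_MODEL_BYTES = 200 * 2**20


def check_model_size(model_path: str, max_bytes: int = MAX_CUSTOM_MODEL_BYTES) -> int:
    """Raise ValueError if the file at model_path is larger than max_bytes.

    Returns the file size in bytes so the caller can log it.
    """
    size = os.path.getsize(model_path)
    if size > max_bytes:
        raise ValueError(
            f"{model_path} is {size / 2**20:.1f} MiB; custom models must be "
            f"at most {max_bytes / 2**20:.0f} MiB to stay lightweight"
        )
    return size
```

Checking after download still spends the bandwidth. Rejecting oversized models before downloading would require reading the HTTP `Content-Length` header first, and that header is not guaranteed to be present.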

This change allows the library to remain "light and fast" while offloading the burden of constant model updates from the maintainers.

Changes

refactor

  • Extracted file download logic into a dedicated download_file() function to eliminate code duplication
  • Updated model preparation workflow to use the new reusable download function
  • Enhanced support for downloading both model archives and individual files from the HuggingFace Hub
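A reusable `download_file()` can handle both archives and single files because it only streams bytes to a path. The sketch below is stdlib-only, to match the zero-dependency goal. Its signature and its write-to-temp-then-rename behavior are assumptions for illustration, not necessarily what the PR's `download_file()` does.

```python
import os
import shutil
import tempfile
import urllib.request


def download_file(url: str, dest_path: str, chunk_size: int = 1 << 20) -> str:
    """Stream url to dest_path and return dest_path.

    Data is written to a temporary ".part" file in the destination directory
    and renamed only on success, so an interrupted download never leaves a
    truncated file that a later run could mistake for a cached model.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        # Open the temp file first so its descriptor is closed even if the request fails.
        with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, out, chunk_size)
        os.replace(tmp_path, dest_path)  # atomic rename on the same filesystem
    except BaseException:
        os.remove(tmp_path)
        raise
    return dest_path
```

Both the archive path and the per-file Hub path can then call the same function with different URLs. Extracting an archive stays a separate step.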

feat

  • Added support for downloading models and required tokenizer files from the HuggingFace Hub
  • Implemented a fallback mechanism to download from HuggingFace when models are not found in the local model map
  • Added a helper function, _download_hf_model_files(), to fetch model files via HuggingFace URLs
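The fallback can be sketched as: use the curated path for known names, and otherwise treat the name as a Hub repo id and build per-file URLs with the Hub's `/resolve/<revision>/<file>` endpoint. The helper names, the `onnx/model.onnx` location, and the tokenizer file list are assumptions. Many onnx-community repos use that layout, but not all do, and repos that store weights as external data need additional files.

```python
from typing import Dict, Iterable

HF_BASE_URL = "https://huggingface.co"

# Assumed repo layout; not guaranteed for every ONNX repo on the Hub.
DEFAULT_ONNX_FILE = "onnx/model.onnx"
DEFAULT_TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "config.json",
)


def hf_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL via the Hub's /resolve/ endpoint."""
    return f"{HF_BASE_URL}/{repo_id}/resolve/{revision}/{filename}"


def plan_model_download(
    model_name: str,
    curated_models: Iterable[str],
    onnx_file: str = DEFAULT_ONNX_FILE,
    tokenizer_files: Iterable[str] = DEFAULT_TOKENIZER_FILES,
) -> Dict[str, str]:
    """Return {local_filename: url} for a Hub repo, or {} for curated models.

    Curated names keep the existing archive-based download; any other name
    must look like a Hub repo id ("owner/name") and is fetched file by file.
    """
    if model_name in set(curated_models):
        return {}  # handled by the existing curated-archive path
    if model_name.count("/") != 1:
        raise ValueError(f"{model_name!r} is neither a curated model nor a Hub repo id")
    files = [onnx_file, *tokenizer_files]
    # Hypothetical convention: store "onnx/model.onnx" locally as "model.onnx".
    return {f.rsplit("/", 1)[-1]: hf_file_url(model_name, f) for f in files}
```

Each returned URL can then go through the shared download routine into the model's cache directory.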

fix

  • Added a check that token_type_ids is one of the ONNX model's declared inputs before passing it
  • Improved robustness of model loading for different ONNX model architectures
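A sketch of the `token_type_ids` check. BERT-style exports usually declare that input, while XLM-RoBERTa-style exports often do not, and onnxruntime rejects feed names the graph does not declare. The function `build_onnx_feed` is hypothetical. With a real session, the names would come from `[i.name for i in session.get_inputs()]`.

```python
from typing import Any, Dict, Iterable, Optional


def build_onnx_feed(
    input_names: Iterable[str],
    input_ids: Any,
    attention_mask: Any,
    token_type_ids: Optional[Any] = None,
) -> Dict[str, Any]:
    """Build the input dict for InferenceSession.run(None, feed).

    token_type_ids is included only when the graph declares it, so the same
    code path works for models exported with or without that input.
    """
    names = set(input_names)  # e.g. [i.name for i in session.get_inputs()]
    feed = {"input_ids": input_ids, "attention_mask": attention_mask}
    if "token_type_ids" in names:
        if token_type_ids is None:
            raise ValueError("model declares token_type_ids but none were provided")
        feed["token_type_ids"] = token_type_ids
    return feed
```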

chore

  • Bumped version to 0.3 in setup.py
  • Added hf_model_url for HuggingFace models

@raphaeleduardo42

Worked as intended with onnx-community/bge-reranker-v2-m3-ONNX
