[WIP] Add PLDR-LLM #40108
Conversation
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, pldrllm
Hi @burcgokden, I think this makes more sense as a remote code model! https://huggingface.co/docs/transformers/main/en/custom_models Generally we only add models to the main library when there's a significant pre-trained checkpoint with a lot of expected users, because at that point the Transformers team takes responsibility for maintenance. Many large models (e.g. Phi-3.5) are custom code models, and users can download them just like library models, with `trust_remote_code=True`.
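For context, loading a custom code model from the Hub looks like this (a minimal sketch; the repo id below is hypothetical):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers execute the modeling code shipped
# inside the Hub repo instead of a class from the main library.
model_id = "burcgokden/PLDR-LLM"  # hypothetical repo id for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```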
Hi @Rocketknight1, thank you for your note and the link provided. A custom model approach with `trust_remote_code=True` […]
fix mllama vision encoder
Signed-off-by: Isotr0py <[email protected]>
… (huggingface#40100)
* switch order for BC and future logic
* in generate as well
* fix qwen3moe gguf architecture
* Fix Qwen3Moe GGUF loading
---------
Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Jinuk Kim <[email protected]>
Currently model_debugging_utils.py has an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before importing its tensor module.
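A minimal sketch of such a guard (illustrative, not necessarily the exact code in the commit):

```python
import torch

# torch.distributed.tensor is only importable when torch was built with
# distributed support, so check availability before importing from it.
if torch.distributed.is_available():
    from torch.distributed.tensor import DTensor
else:
    DTensor = None  # callers must handle the missing-distributed case
```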
* default to dq if cpu
* another check
* style
* revert some changes
* fix flash attention
* i got a stroke reading that comment
* change dropout kwarg back to before
* rename _fa3... as it's used for multiple variants and should work as fallback instead
* simplify imports and support kwargs for fa
* style
* fix comments order
* small fix
* skip kernels test (causes cuda illegal memories w/o cleanup), fix fa test in general esp for models like bart
* style
* allow fullgraph by preloading on init
* make globals "private"
* ci pls be happy
* change skip conditions based on backend flag (indicating missing mask interface)
* move globals support to a function to prepare kwargs
* style
* generalize supported kwargs
* small change to doc
* fix
* add comments
* style
* revert prep during generate
* style
* revert weird style changes
* add fa kwarg prep during generate with fixes back
* how did this even happen
* how
* add comment
…enizer at train time (huggingface#38441)
* tmp commit
* add test
* make fixup
* reset warns/info in test
…tention (huggingface#39707)
Fix the is_causal logic to enable bidirectional attention
Co-authored-by: Arthur <[email protected]>
… (huggingface#39894)
decoding -> generation; add collections
* Add model card for MobileViT
* Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <[email protected]>
* Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <[email protected]>
* Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <[email protected]>
* Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <[email protected]>
* Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <[email protected]>
* Update mobilevit.md
* Update mobilevit.md
* Update mobilevit.md
* Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <[email protected]>
* Update docs/source/en/model_doc/mobilevit.md Co-authored-by: Steven Liu <[email protected]>
* Update mobilevit.md
* Update mobilevit.md
* Update mobilevit.md
* Update mobilevit.md
---------
Co-authored-by: Steven Liu <[email protected]>
* docs: ko: tiny_agents.md
* feat: nmt draft
* fix: manual edits
* fix: manual edits
… (huggingface#39975)
* [bugfix] ensure correct tensor device in Idefics2, Idefics3, and SmolVLM models
* to cuda
* changed xLSTMRMS.. to RMS...
* fix linter error
---------
Co-authored-by: Nikita <[email protected]>
* fix quantoquantized
fix bug; add tests
* factor out expand inputs
* callable arg
* improve docs, add test
* Update docs/source/en/generation_strategies.md Co-authored-by: Joao Gante <[email protected]>
---------
Co-authored-by: Joao Gante <[email protected]>
* Add initial collated reports script and job definition
* provide commit hash for this run. Also use hash in generated artifact name. Json formatting
* tidy
* Add option to upload collated reports to hf hub
* Add glob pattern for test report folders
* Fix glob
* Use machine_type as path filter instead of glob. Include machine_type in collated report
… (huggingface#40127)
* handle case where EOS token is None in gen config
* update eli5 dataset
* use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT
* fix min torchvision version
* use InterpolationMode directly
* remove unused is_torchvision_greater_or_equal
* nit
… (huggingface#39519)
* docs: ko: processors.md
* feat: nmt draft
* fix: manual edits
* Update docs/source/ko/main_classes/processors.md Co-authored-by: Ahnjj_DEV <[email protected]>
* Update docs/source/ko/main_classes/processors.md Co-authored-by: Ahnjj_DEV <[email protected]>
---------
Co-authored-by: TaskerJang <[email protected]>
Co-authored-by: Ahnjj_DEV <[email protected]>
* docs: ko: jamba.md
* feat: nmt draft
* fix: manual edits
* fix: resolve suggestion Co-authored-by: Minseo Kim <[email protected]>
---------
Co-authored-by: Minseo Kim <[email protected]>
… (huggingface#39713)
* docs: ko: main_classes/optimizer_schedules
* feat: nmt draft
* fix: improve TOC anchors and expressions in optimizer_schedules - Add TOC anchors to all section headers - Fix terminology and improve Korean expressions
* fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된'. Changed '가중치 감쇠가 수정된' to '가중치 감쇠가 적용된' for a more accurate translation of 'weight decay fixed' in the context of optimization.
* fix: Use a more natural Korean inheritance expression. Changed '에서 상속받는' to '을 상속받는' to follow natural Korean grammar patterns for inheritance terminology.
* fix: Use consistent '미세 조정' translation for 'finetuned models'. Changed '파인튜닝된' to '미세 조정된 모델' to follow the established translation glossary for 'finetuned models' terminology.
… (huggingface#40135)
* working?
* fix tests
* make visualizer rely on create causal mask
* format
* fixup
* fixup
* read token
* read token, duh
* what is up with that token
* small tests?
* adjust
* try with flush
* normalize for ANSI
* buffer shenanigans
…ForSequenceClassification (huggingface#35991)
* [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification
* fix the modular conversion
* Clean up xcodec addition.
* Clean up config.
* Switch to fixtures test.
* Small stuff.
* Polish XCodec and standardize across codecs.
* Update src/transformers/models/xcodec/modeling_xcodec.py Co-authored-by: Anton Vlasjuk <[email protected]>
* Format and fix test.
* Update tol.
---------
Co-authored-by: Anton Vlasjuk <[email protected]>
* add cors warnings
* Update src/transformers/commands/serving.py Co-authored-by: Quentin Gallouédec <[email protected]>
* Update src/transformers/commands/serving.py Co-authored-by: Arthur <[email protected]>
* Apply suggestions from code review
* make fixup
---------
Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Arthur <[email protected]>
…nal embeddings (huggingface#40300)
fix: use consistent dtype for sine positional embeddings
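To illustrate the dtype point, here is a generic sketch of sinusoidal positional embeddings computed in one explicit dtype throughout (illustrative only, not the code from this commit; assumes an even `dim`):

```python
import math

import torch

def sine_positional_embeddings(num_positions: int, dim: int, dtype=torch.float32):
    # Keep every intermediate tensor in the same dtype; mixing float32 and
    # float16 intermediates is the kind of inconsistency such fixes target.
    position = torch.arange(num_positions, dtype=dtype).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, dim, 2, dtype=dtype) * (-math.log(10000.0) / dim)
    )
    pe = torch.zeros(num_positions, dim, dtype=dtype)
    pe[:, 0::2] = torch.sin(position * div_term)  # even indices
    pe[:, 1::2] = torch.cos(position * div_term)  # odd indices
    return pe
```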
Signed-off-by: cyy <[email protected]>
* fix
* cleanup, revert aimv2 fa changes
* fix aria
* i searched a long time but the cross dependency is for the recent models so...
* this was something... evolla
* fix modernbert decoder + make fa test more robust
* nit
…m dec layers (huggingface#40277)
* handle support for cache classes when num enc layers != num dec layers
* handle overwrites
* one more corner case
* Update src/transformers/generation/utils.py
* Update src/transformers/generation/utils.py
* Apply suggestions from code review
* handle corner case :o
* more docs to device agnostic Signed-off-by: YAO Matrix <[email protected]>
* more Signed-off-by: YAO Matrix <[email protected]>
* 1 Signed-off-by: YAO Matrix <[email protected]>
* 2 Signed-off-by: YAO Matrix <[email protected]>
* Update vitpose.md
* Update camembert.md
* Update camembert.md
---------
Signed-off-by: YAO Matrix <[email protected]>
…iningArguments (huggingface#40353)
* Update trainer.md
* Update trainer.md Removed the detail about label_names argument usage from the tip/warning section
* Update training_args.py Added the label_names usage clarification in the docstring
* Update trainer.md
---------
Co-authored-by: Steven Liu <[email protected]>
* merge opensource_hunyuan
* add head_dim
* fix assertion error
* fix seen_tokens
* ready_for_upstream (merge request !17) Squash merge branch 'ready_for_upstream' into 'main'
* fix configuration type&docstring
* fix style
* ready_for_upstream (merge request !18) Squash merge branch 'ready_for_upstream' into 'main'
* add doc
* fix testcode
* fix configuration type&docstring
* rename base model
* remove assert
* update
* remove tiktoken
* update
* fix moe and code style (huggingface#3)
* update
* fix format
* update
* revert makefile
* fix moe config
* fix numel()
* remove prepare_inputs_for_generation
* fix kv_seq_len
* add docs/toctree
* remove unused paramter&add licence
* add licence
* remove unused paramter
* fix code
* dense modular update: import fix, fix use mistralmodel, fix qknorm, add sliding_window, make style, fix dense done, hunyuan moe, fix import, fix modular, fixup, fixup
* update model path
* fix mlp_bias
* fix modular
* Fix modeling (huggingface#5)
* fix attention
* use llamamodel
* fix code
* Fix qk (huggingface#6)
* fix qk_norm
* fix
* fix modual
* Fix moe (huggingface#7)
* fix some moe code
* fix einsum
* try top1
* use top1
* Fix rotary (huggingface#8)
* fix rotary
* fix modeling
* fix modular
* fix testcode
* remove A13B unit test
* Fix moe v1 (huggingface#9) fix moe & gate
* Fix gate norm (huggingface#10)
* add norm_topk_prob
* Fix testcase (huggingface#11)
* fix&skip test
* Fix testcase (huggingface#12)
* skip testcase
* Fix norm topk (huggingface#13)
* hardcode norm_topk_prob
* fix testcase
---------
Co-authored-by: pridejcyang <[email protected]>
Co-authored-by: Mingji Han <[email protected]>
fix idefics3 vision embeddings
Signed-off-by: Isotr0py <[email protected]>
* Changed datasets to avoid a datasets error
* Changed back split to test
change multimodal data links to HF hub
…eration pipelines (huggingface#40356)
* add support to skip_special_tokens in pipelines
* add test
* rm redundant
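As a usage illustration (per the commit description; the model choice is arbitrary and the kwarg placement is an assumption):

```python
from transformers import pipeline

# With the support added here, generation pipelines can strip special tokens
# from the decoded output when skip_special_tokens=True is passed.
generator = pipeline("text-generation", model="gpt2")
print(generator("Hello, world", skip_special_tokens=True))
```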
…)
* update everywhere
* style
* pipelines
* switch it everywhere in tests
* switch it everywhere in docs
* switch in converters everywhere
* update in examples
* update in model docstrings
* style
* warnings
* style
* Update configuration_utils.py
* fix
* Update configuration_utils.py
* fixes and add first test
* add pipeline tests
* Update test_pipelines_common.py
* add config test
* Update test_modeling_common.py
* add new ones
* post rebase
* add new
* post rebase adds
* move commonalities to mixin
* revert - unrelated
* fix copies
* fix style
* comments
… (huggingface#40241)
allow to overwrite kwargs from subconfigs
* Add GptOssForTokenClassification for GPT-OSS models
* After run make fixup
… (huggingface#40352)
* bug fix - return_lse dynamically set
* addressed compatibility with return type - flex_attention_forward
* rename variables
* revert changes to commits
* draft commit
* draft commit
* Fixup chat_extras too
* Update conversations.md
* Update the toctree and titles
* Update the writing guide!
* Use @zucchini-nlp's suggestion
* Update docs/source/en/conversations.md Co-authored-by: Steven Liu <[email protected]>
* Update docs/source/en/conversations.md Co-authored-by: Steven Liu <[email protected]>
* Update docs/source/en/conversations.md Co-authored-by: Steven Liu <[email protected]>
* Apply suggestions from code review Co-authored-by: Steven Liu <[email protected]>
* Apply suggestions from code review Co-authored-by: Steven Liu <[email protected]>
* Apply suggestions from code review Co-authored-by: Steven Liu <[email protected]>
* Apply suggestions from code review Co-authored-by: Steven Liu <[email protected]>
* Apply suggestions from code review Co-authored-by: Steven Liu <[email protected]>
---------
Co-authored-by: Steven Liu <[email protected]>
* start working the doc
* remove gemma2
* review
Fix a typo.
* HF papers
* clean
* Update src/transformers/models/gemma3n/configuration_gemma3n.py Co-authored-by: Steven Liu <[email protected]>
* style
---------
Co-authored-by: Steven Liu <[email protected]>
Hi @Rocketknight1, a working initial commit for the PLDR-LLM model was pushed, together with documentation and tests, for reference and to acknowledge the availability of this model. The custom model approach along with `trust_remote_code=True` […]
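For reference, the registration flow from the linked custom-model docs looks roughly like this (a sketch under stated assumptions; the PLDR-LLM class names below are hypothetical, not this PR's actual code):

```python
from transformers import PretrainedConfig, PreTrainedModel

# Hypothetical class names for illustration only.
class PLDRLLMConfig(PretrainedConfig):
    model_type = "pldrllm"

class PLDRLLMForCausalLM(PreTrainedModel):
    config_class = PLDRLLMConfig

# Tag the classes so that, once pushed to the Hub alongside the weights,
# the Auto* classes resolve to this code under trust_remote_code=True.
PLDRLLMConfig.register_for_auto_class()
PLDRLLMForCausalLM.register_for_auto_class("AutoModelForCausalLM")
```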
What does this PR do?
Fixes # (issue)
This PR adds a new model: PLDR-LLM (Large Language Model from Power Law Decoder Representations). See the related issue: #40101
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case: [WIP] Add PLDR-LLM #40101
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.