
Conversation

burcgokden

@burcgokden burcgokden commented Aug 12, 2025

What does this PR do?

Fixes #40101
This PR adds a new model: PLDR-LLM (Large Language Model from Power Law Decoder Representation).

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, pldrllm

@Rocketknight1
Member

Hi @burcgokden, I think this makes more sense as a remote code model! https://huggingface.co/docs/transformers/main/en/custom_models

Generally we only add models to the main library when there's a significant pre-trained checkpoint with a lot of expected users, because at that point the Transformers team takes responsibility for maintenance. Many large models (e.g. Phi-3.5) are custom code models, and users can download them just like library models, with trust_remote_code=True.
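For context, a custom code model is loaded the same way as a library model once trust_remote_code=True is passed. A minimal sketch, using a placeholder repo id rather than an existing checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id for illustration; any Hub repo that ships its own
# modeling/configuration code can be loaded the same way.
model_id = "some-org/custom-code-model"

# trust_remote_code=True lets the modeling code stored in the repo be
# downloaded and executed locally, so the checkpoint loads like a library model.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```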

@burcgokden
Author

Hi @Rocketknight1, thank you for your note and the link. The custom model approach with trust_remote_code=True should be fine. I'll try it once the pretrained models on the Hub are updated to be compatible with the transformers library. I'm working out a few more kinks right now; working code with PLDR-LLM support will be available soon.

qgallouedec and others added 26 commits August 23, 2025 01:18
fix mllama vision encoder

Signed-off-by: Isotr0py <[email protected]>
…ggingface#40100)

* switch order for BC and future logic

* in generate as well
* fix qwen3moe gguf architecture

* Fix Qwen3Moe GGUF loading

---------

Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Jinuk Kim <[email protected]>
Currently model_debugging_utils.py would have an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before importing its tensor module.
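A minimal sketch of the guarded import described above, assuming the check lives in model_debugging_utils.py (the exact condition used there may differ):

```python
import torch.distributed as dist

# Import the tensor submodule only when the distributed package is available,
# so builds of torch without distributed support don't fail at import time.
if dist.is_available():
    import torch.distributed.tensor
```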
* default to dq if cpu

* an other check

* style

* revert some changes
* fix flash attention

* i got a stroke reading that comment

* change dropout kwarg back to before

* rename _fa3... as it's used for multiple variants and should work as fallback instead

* simplify imports and support kwargs for fa

* style

* fix comments order

* small fix

* skip kernels test (causes cuda illegal memory accesses w/o cleanup), fix fa test in general esp for models like bart

* style

* allow fullgraph by preloading on init

* make globals "private"

* ci pls be happy

* change skip conditions based on backend flag (indicating missing mask interface)

* move globals support to a function to prepare kwargs

* style

* generalize supported kwargs

* small change to doc

* fix

* add comments

* style

* revert prep during generate

* style

* revert weird style changes

* add fa kwarg prep during generate with fixes back

* how did this even happen

* how

* add comment
…enizer at train time (huggingface#38441)

* tmp commit

* add test

* make fixup

* reset warns/info in test
…tention (huggingface#39707)

Fix the is_causal logic to enable bidirectional attention

Co-authored-by: Arthur <[email protected]>
* Add model card for MobileViT

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

---------

Co-authored-by: Steven Liu <[email protected]>
* docs: ko: tiny_agents.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits
…gface#39975)

* [bugfix] ensure correct tensor device in Idefics2, Idefics3, and SmolVLM models

* to cuda
* changed xLSTMRMS.. to RMS...

* fix linter error

---------

Co-authored-by: Nikita <[email protected]>
* factor out expand inputs

* callable arg

* improve docs, add test

* Update docs/source/en/generation_strategies.md

Co-authored-by: Joao Gante <[email protected]>

---------

Co-authored-by: Joao Gante <[email protected]>
* Add initial collated reports script and job definition

* provide commit hash for this run. Also use hash in generated artifact name. Json formatting

* tidy

* Add option to upload collated reports to hf hub

* Add glob pattern for test report folders

* Fix glob

* Use machine_type as path filter instead of glob. Include machine_type in collated report
…uggingface#40127)

* handle case where EOS token is None in gen config

* update eli5 dataset
* use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT

* fix min torchvision version

* use InterpolationMode directly

* remove unused is_torchvision_greater_or_equal,

* nit
…gface#39519)

* docs: ko: processors.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <[email protected]>

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <[email protected]>

---------

Co-authored-by: TaskerJang <[email protected]>
Co-authored-by: Ahnjj_DEV <[email protected]>
* docs: ko: jamba.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestion

Co-authored-by: Minseo Kim <[email protected]>

---------

Co-authored-by: Minseo Kim <[email protected]>
huggingface#39713)

* docs: ko: main_classes/optimizer_schedules

* feat: nmt draft

* fix: improve TOC anchors and expressions in optimizer_schedules

- Add TOC anchors to all section headers
- Fix terminology and improve Korean expressions

* fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된'

Changed '가중치 감쇠가 수정된' to '가중치 감쇠가 적용된' for more accurate translation of 'weight decay fixed' in the context of optimization.

* fix: Use more natural Korean inheritance expression

Changed '에서 상속받는' to '을 상속받는' to follow natural Korean grammar patterns for inheritance terminology.

* fix: Use consistent '미세 조정' translation for 'finetuned models'

Changed '파인튜닝된' to '미세 조정된 모델' to follow the established translation glossary for 'finetuned models' terminology.
molbap and others added 28 commits August 23, 2025 01:18
* make visualizer rely on create causal mask

* format

* fixup

* fixup

* read token

* read token, duh

* what is up with that token

* small tests?

* adjust

* try with flush

* normalize for ANSI

* buffer shenanigans
…ForSequenceClassification (huggingface#35991)

* [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification

* fix the modular conversion
* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.

* Polish XCodec and standardize across codecs.

* Update src/transformers/models/xcodec/modeling_xcodec.py

Co-authored-by: Anton Vlasjuk <[email protected]>

* Format and fix test.

* Update tol.

---------

Co-authored-by: Anton Vlasjuk <[email protected]>
* add cors warnings

* Update src/transformers/commands/serving.py

Co-authored-by: Quentin Gallouédec <[email protected]>

* Update src/transformers/commands/serving.py

Co-authored-by: Arthur <[email protected]>

* Apply suggestions from code review

* make fixup

---------

Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Arthur <[email protected]>
…nal embeddings (huggingface#40300)

fix: use consistent dtype for sine positional embeddings
* fix

* cleanup, revert aimv2 fa changes

* fix aria

* i searched a long time but the cross dependency is for the recent models so...

* this was something... evolla

* fix modernbert decoder + make fa test more robust

* nit
…m dec layers (huggingface#40277)

* handle support for cache classes when num enc layers != num dec layers

* handle overwrites

* one more corner case

* Update src/transformers/generation/utils.py

* Update src/transformers/generation/utils.py

* Apply suggestions from code review

* handle corner case :o
* more docs to device agnostic

Signed-off-by: YAO Matrix <[email protected]>

* more

Signed-off-by: YAO Matrix <[email protected]>

* 1

Signed-off-by: YAO Matrix <[email protected]>

* 2

Signed-off-by: YAO Matrix <[email protected]>

* Update vitpose.md

* Update camembert.md

* Update camembert.md

---------

Signed-off-by: YAO Matrix <[email protected]>
…iningArguments (huggingface#40353)

* Update trainer.md

* Update trainer.md

Removed the detail about label_names argument usage from the tip/warning section

* Update training_args.py

Added the label_names usage clarification in the docstring

* Update trainer.md

---------

Co-authored-by: Steven Liu <[email protected]>
* merge opensource_hunyuan

* add head_dim

* fix assertion error

* fix seen_tokens

* ready_for_upstream (merge request !17)

Squash merge branch 'ready_for_upstream' into 'main'

* fix configuration type&docstring
* fix style

* ready_for_upstream (merge request !18)

Squash merge branch 'ready_for_upstream' into 'main'
* add doc
* fix testcode
* fix configuration type&docstring

* rename base model

* remove assert

* update

* remove tiktoken

* update

* fix moe and code style (huggingface#3)

* update

* fix format

* update

* revert makefile

* fix moe config

* fix numel()

* remove prepare_inputs_for_generation

* fix kv_seq_len

* add docs/toctree

* remove unused parameter & add licence

* add licence

* remove unused parameter

* fix code

* dense modular

update import

fix

fix

use mistralmodel

fix qknorm

add sliding_window

make style

fix

dense done

hunyuan moe

fix import

fix modular

fixup

fixup

* update model path

* fix mlp_bias

* fix modular

* Fix modeling (huggingface#5)

* fix attention

* use llamamodel

* fix code

* Fix qk (huggingface#6)

* fix qk_norm

* fix

* fix modular

* Fix moe (huggingface#7)

* fix some moe code

* fix einsum

* try top1

* use top1

* Fix rotary (huggingface#8)

* fix rotary

* fix modeling

* fix modular

* fix testcode

* remove A13B unit test

* Fix moe v1 (huggingface#9)

fix moe & gate

* Fix gate norm (huggingface#10)

* add norm_topk_prob

* Fix testcase (huggingface#11)

* fix&skip test

* Fix testcase (huggingface#12)


* skip testcase

* Fix norm topk (huggingface#13)

* hardcode norm_topk_prob

* fix testcase

---------

Co-authored-by: pridejcyang <[email protected]>
Co-authored-by: Mingji Han <[email protected]>
fix idefics3 vision embeddings

Signed-off-by: Isotr0py <[email protected]>
* Changed datasets to avoid a datasets error

* Changed back split to test
…eration pipelines (huggingface#40356)

* add support to skip_special_tokens in pipelines

* add test

* rm redundant
)

* update everywhere

* style

* pipelines

* switch it everywhere in tests

* switch it everywhere in docs

* switch in converters everywhere

* update in examples

* update in model docstrings

* style

* warnings

* style

* Update configuration_utils.py

* fix

* Update configuration_utils.py

* fixes and add first test

* add pipeline tests

* Update test_pipelines_common.py

* add config test

* Update test_modeling_common.py

* add new ones

* post rebase

* add new

* post rebase adds
* move commonalities to mixin

* revert - unrelated

* fix copies

* fix style

* comments
* Add GptOssForTokenClassification for GPT-OSS models

* After running make fixup
…e#40352)

* bug fix - return_lse dynamically set

* addressed compatibility with return type - flex_attention_forward

* rename variables

* revert changes to commits
* draft commit

* draft commit

* Fixup chat_extras too

* Update conversations.md

* Update the toctree and titles

* Update the writing guide!

* Use @zucchini-nlp's suggestion

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

---------

Co-authored-by: Steven Liu <[email protected]>
* start working the doc

* remove gemma2

* review
* HF papers

* clean

* Update src/transformers/models/gemma3n/configuration_gemma3n.py

Co-authored-by: Steven Liu <[email protected]>

* style

---------

Co-authored-by: Steven Liu <[email protected]>
@burcgokden
Author

Hi @Rocketknight1, a working initial commit for the PLDR-LLM model has been pushed, together with documentation and tests, for reference and to acknowledge that this model is available. The custom model approach with trust_remote_code=True works reasonably well at the moment. We intend to keep developing this branch as needed to stay in step with updates to the transformers library. I'll close this pull request in a couple of days. Thank you.

@burcgokden burcgokden closed this Sep 1, 2025