
Conversation

burcgokden

@burcgokden burcgokden commented Aug 12, 2025

What does this PR do?

Fixes #40101
This PR adds a new model: PLDR-LLM (Large Language Model from Power Law Decoder Representation).

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, pldrllm

@Rocketknight1
Member

Hi @burcgokden, I think this makes more sense as a remote code model! https://huggingface.co/docs/transformers/main/en/custom_models

Generally we only add models to the main library when there's a significant pre-trained checkpoint with a lot of expected users, because at that point the Transformers team takes responsibility for maintenance. Many large models (e.g. Phi-3.5) are custom code models, and users can download them just like library models, with trust_remote_code=True.
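For context, a custom code model is loaded the same way as a library model once trust_remote_code=True is passed. A minimal sketch, using a placeholder repo id rather than an existing checkpoint:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id for illustration; any Hub repo that ships its own
# modeling/configuration code can be loaded the same way.
model_id = "some-org/custom-code-model"

# trust_remote_code=True lets the modeling code stored in the repo be
# downloaded and executed locally, so the checkpoint loads like a library model.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
```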

@burcgokden
Author

Hi @Rocketknight1, thank you for your note and the link. The custom model approach with trust_remote_code=True should be fine. I'll try it once the pretrained models on the Hub are updated to be compatible with the transformers library. I'm working out a few more kinks right now; working code with PLDR-LLM support will be available soon.

qgallouedec and others added 26 commits August 23, 2025 01:18
fix mllama vision encoder

Signed-off-by: Isotr0py <[email protected]>
…ggingface#40100)

* switch order for BC and future logic

* in generate as well
* fix qwen3moe gguf architecture

* Fix Qwen3Moe GGUF loading

---------

Co-authored-by: Mohamed Mekkouri <[email protected]>
Co-authored-by: Jinuk Kim <[email protected]>
Currently model_debugging_utils.py would have an unguarded `import torch.distributed.tensor`. This PR ensures that the distributed module is available before importing its tensor module.
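A minimal sketch of the guarded import described above, assuming the check lives in model_debugging_utils.py (the exact condition used there may differ):

```python
import torch.distributed as dist

# Import the tensor submodule only when the distributed package is available,
# so builds of torch without distributed support don't fail at import time.
if dist.is_available():
    import torch.distributed.tensor
```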
* default to dq if cpu

* an other check

* style

* revert some changes
* fix flash attention

* i got a stroke reading that comment

* change dropout kwarg back to before

* rename _fa3... as it's used for multiple variants and should work as fallback instead

* simplify imports and support kwargs for fa

* style

* fix comments order

* small fix

* skip kernels test (causes cuda illegal memory accesses w/o cleanup), fix fa test in general esp for models like bart

* style

* allow fullgraph by preloading on init

* make globals "private"

* ci pls be happy

* change skip conditions based on backend flag (indicating missing mask interface)

* move globals support to a function to prepare kwargs

* style

* generalize supported kwargs

* small change to doc

* fix

* add comments

* style

* revert prep during generate

* style

* revert weird style changes

* add fa kwarg prep during generate with fixes back

* how did this even happen

* how

* add comment
…enizer at train time (huggingface#38441)

* tmp commit

* add test

* make fixup

* reset warns/info in test
…tention (huggingface#39707)

Fix the is_causal logic to enable bidirectional attention

Co-authored-by: Arthur <[email protected]>
* Add model card for MobileViT

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/model_doc/mobilevit.md

Co-authored-by: Steven Liu <[email protected]>

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

* Update mobilevit.md

---------

Co-authored-by: Steven Liu <[email protected]>
* docs: ko: tiny_agents.md

* feat: nmt draft

* fix: manual edits

* fix: manual edits
…gface#39975)

* [bugfix] ensure correct tensor device in Idefics2, Idefics3, and SmolVLM models

* to cuda
* changed xLSTMRMS.. to RMS...

* fix linter error

---------

Co-authored-by: Nikita <[email protected]>
* factor out expand inputs

* callable arg

* improve docs, add test

* Update docs/source/en/generation_strategies.md

Co-authored-by: Joao Gante <[email protected]>

---------

Co-authored-by: Joao Gante <[email protected]>
* Add initial collated reports script and job definition

* provide commit hash for this run. Also use hash in generated artifact name. Json formatting

* tidy

* Add option to upload collated reports to hf hub

* Add glob pattern for test report folders

* Fix glob

* Use machine_type as path filter instead of glob. Include machine_type in collated report
…uggingface#40127)

* handle case where EOS token is None in gen config

* update eli5 dataset
* use pil_torch_interpolation_mapping for NEAREST/NEAREST_EXACT

* fix min torchvision version

* use InterpolationMode directly

* remove unused is_torchvision_greater_or_equal,

* nit
…gface#39519)

* docs: ko: processors.md

* feat: nmt draft

* fix: manual edits

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <[email protected]>

* Update docs/source/ko/main_classes/processors.md

Co-authored-by: Ahnjj_DEV <[email protected]>

---------

Co-authored-by: TaskerJang <[email protected]>
Co-authored-by: Ahnjj_DEV <[email protected]>
* docs: ko: jamba.md

* feat: nmt draft

* fix: manual edits

* fix: resolve suggestion

Co-authored-by: Minseo Kim <[email protected]>

---------

Co-authored-by: Minseo Kim <[email protected]>
huggingface#39713)

* docs: ko: main_classes/optimizer_schedules

* feat: nmt draft

* fix: improve TOC anchors and expressions in optimizer_schedules

- Add TOC anchors to all section headers
- Fix terminology and improve Korean expressions

* fix: Correct translation of 'weight decay fixed' to '가중치 감쇠가 적용된'

Changed '가중치 감쇠가 수정된' to '가중치 감쇠가 적용된' for more accurate translation of 'weight decay fixed' in the context of optimization.

* fix: Use more natural Korean inheritance expression

Changed '에서 상속받는' to '을 상속받는' to follow natural Korean grammar patterns for inheritance terminology.

* fix: Use consistent '미세 조정' translation for 'finetuned models'

Changed '파인튜닝된' to '미세 조정된 모델' to follow the established translation glossary for 'finetuned models' terminology.
molbap and others added 28 commits August 23, 2025 01:18
* make visualizer rely on create causal mask

* format

* fixup

* fixup

* read token

* read token, duh

* what is up with that token

* small tests?

* adjust

* try with flush

* normalize for ANSI

* buffer shenanigans
…ForSequenceClassification (huggingface#35991)

* [ModernBert] Prevent the attention mask from being None in ModernBertForSequenceClassification

* fix the modular conversion
* Clean up xcodec addition.

* Clean up config.

* Switch to fixtures test.

* Small stuff.

* Polish XCodec and standardize across codecs.

* Update src/transformers/models/xcodec/modeling_xcodec.py

Co-authored-by: Anton Vlasjuk <[email protected]>

* Format and fix test.

* Update tol.

---------

Co-authored-by: Anton Vlasjuk <[email protected]>
* add cors warnings

* Update src/transformers/commands/serving.py

Co-authored-by: Quentin Gallouédec <[email protected]>

* Update src/transformers/commands/serving.py

Co-authored-by: Arthur <[email protected]>

* Apply suggestions from code review

* make fixup

---------

Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Arthur <[email protected]>
…nal embeddings (huggingface#40300)

fix: use consistent dtype for sine positional embeddings
* fix

* cleanup, revert aimv2 fa changes

* fix aria

* i searched a long time but the cross dependency is for the recent models so...

* this was something... evolla

* fix modernbert decoder + make fa test more robust

* nit
…m dec layers (huggingface#40277)

* handle support for cache classes when num enc layers != num dec layers

* handle overwrites

* one more corner case

* Update src/transformers/generation/utils.py

* Update src/transformers/generation/utils.py

* Apply suggestions from code review

* handle corner case :o
* more docs to device agnostic

Signed-off-by: YAO Matrix <[email protected]>

* more

Signed-off-by: YAO Matrix <[email protected]>

* 1

Signed-off-by: YAO Matrix <[email protected]>

* 2

Signed-off-by: YAO Matrix <[email protected]>

* Update vitpose.md

* Update camembert.md

* Update camembert.md

---------

Signed-off-by: YAO Matrix <[email protected]>
…iningArguments (huggingface#40353)

* Update trainer.md

* Update trainer.md

Removed the detail about label_names argument usage from the tip/warning section

* Update training_args.py

Added the label_names usage clarification in the docstring

* Update trainer.md

---------

Co-authored-by: Steven Liu <[email protected]>
* merge opensource_hunyuan

* add head_dim

* fix assertion error

* fix seen_tokens

* ready_for_upstream (merge request !17)

Squash merge branch 'ready_for_upstream' into 'main'

* fix configuration type&docstring
* fix style

* ready_for_upstream (merge request !18)

Squash merge branch 'ready_for_upstream' into 'main'
* add doc
* fix testcode
* fix configuration type&docstring

* rename base model

* remove assert

* update

* remove tiktoken

* update

* fix moe and code style (huggingface#3)

* update

* fix format

* update

* revert makefile

* fix moe config

* fix numel()

* remove prepare_inputs_for_generation

* fix kv_seq_len

* add docs/toctree

* remove unused parameter & add licence

* add licence

* remove unused parameter

* fix code

* dense modular

update import

fix

fix

use mistralmodel

fix qknorm

add sliding_window

make style

fix

dense done

hunyuan moe

fix import

fix modular

fixup

fixup

* update model path

* fix mlp_bias

* fix modular

* Fix modeling (huggingface#5)

* fix attention

* use llamamodel

* fix code

* Fix qk (huggingface#6)

* fix qk_norm

* fix

* fix modular

* Fix moe (huggingface#7)

* fix some moe code

* fix einsum

* try top1

* use top1

* Fix rotary (huggingface#8)

* fix rotary

* fix modeling

* fix modular

* fix testcode

* remove A13B unit test

* Fix moe v1 (huggingface#9)

fix moe & gate

* Fix gate norm (huggingface#10)

* add norm_topk_prob

* Fix testcase (huggingface#11)

* fix&skip test

* Fix testcase (huggingface#12)


* skip testcase

* Fix norm topk (huggingface#13)

* hardcode norm_topk_prob

* fix testcase

---------

Co-authored-by: pridejcyang <[email protected]>
Co-authored-by: Mingji Han <[email protected]>
fix idefics3 vision embeddings

Signed-off-by: Isotr0py <[email protected]>
* Changed datasets to avoid a datasets error

* Changed back split to test
…eration pipelines (huggingface#40356)

* add support to skip_special_tokens in pipelines

* add test

* rm redundant
)

* update everywhere

* style

* pipelines

* switch it everywhere in tests

* switch it everywhere in docs

* switch in converters everywhere

* update in examples

* update in model docstrings

* style

* warnings

* style

* Update configuration_utils.py

* fix

* Update configuration_utils.py

* fixes and add first test

* add pipeline tests

* Update test_pipelines_common.py

* add config test

* Update test_modeling_common.py

* add new ones

* post rebase

* add new

* post rebase adds
* move commonalities to mixin

* revert - unrelated

* fix copies

* fix style

* comments
* Add GptOssForTokenClassification for GPT-OSS models

* After running make fixup
…e#40352)

* bug fix - return_lse dynamically set

* addressed compatibility with return type - flex_attention_forward

* rename variables

* revert changes to commits
* draft commit

* draft commit

* Fixup chat_extras too

* Update conversations.md

* Update the toctree and titles

* Update the writing guide!

* Use @zucchini-nlp's suggestion

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <[email protected]>

* Update docs/source/en/conversations.md

Co-authored-by: Steven Liu <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

* Apply suggestions from code review

Co-authored-by: Steven Liu <[email protected]>

---------

Co-authored-by: Steven Liu <[email protected]>
* start working the doc

* remove gemma2

* review
* HF papers

* clean

* Update src/transformers/models/gemma3n/configuration_gemma3n.py

Co-authored-by: Steven Liu <[email protected]>

* style

---------

Co-authored-by: Steven Liu <[email protected]>
@burcgokden
Author

Hi @Rocketknight1, a working initial commit for the PLDR-LLM model has been pushed, together with documentation and tests, for reference and to acknowledge that this model is available. The custom model approach with trust_remote_code=True works reasonably well at the moment. We intend to keep developing this branch as needed to stay in step with updates to the transformers library. I'll close this pull request in a couple of days. Thank you.

@burcgokden burcgokden closed this Sep 1, 2025