
Commit 94d8a0f

Merge branch 'main' into 40001
2 parents: 7e230e5 + 849c377

144 files changed: +2067 additions, -1122 deletions

docs/source/en/_toctree.yml

Lines changed: 1 addition & 1 deletion
@@ -1048,7 +1048,7 @@
       - local: model_doc/llama4
         title: Llama4
       - local: model_doc/llava
-        title: Llava
+        title: LLaVA
       - local: model_doc/llava_next
         title: LLaVA-NeXT
       - local: model_doc/llava_next_video

docs/source/en/generation_strategies.md

Lines changed: 26 additions & 21 deletions
@@ -315,37 +315,37 @@ tokenizer.batch_decode(outputs, skip_special_tokens=True)
 ```


-## Custom decoding methods
+## Custom generation methods

-Custom decoding methods enable specialized generation behavior such as the following:
+Custom generation methods enable specialized behavior such as:
 - have the model continue thinking if it is uncertain;
 - roll back generation if the model gets stuck;
 - handle special tokens with custom logic;
-- enhanced input preparation for advanced models;
+- use specialized KV caches;

-We enable custom decoding methods through model repositories, assuming a specific model tag and file structure (see subsection below). This feature is an extension of [custom modeling code](./models.md#custom-models) and, like such, requires setting `trust_remote_code=True`.
+We enable custom generation methods through model repositories, assuming a specific model tag and file structure (see subsection below). This feature is an extension of [custom modeling code](./models.md#custom-models) and, as such, requires setting `trust_remote_code=True`.

-If a model repository holds a custom decoding method, the easiest way to try it out is to load the model and generate with it:
+If a model repository holds a custom generation method, the easiest way to try it out is to load the model and generate with it:

 ```py
 from transformers import AutoModelForCausalLM, AutoTokenizer

 # `transformers-community/custom_generate_example` holds a copy of `Qwen/Qwen2.5-0.5B-Instruct`, but
-# with custom generation code -> calling `generate` uses the custom decoding method!
+# with custom generation code -> calling `generate` uses the custom generation method!
 tokenizer = AutoTokenizer.from_pretrained("transformers-community/custom_generate_example")
 model = AutoModelForCausalLM.from_pretrained(
     "transformers-community/custom_generate_example", device_map="auto", trust_remote_code=True
 )

 inputs = tokenizer(["The quick brown"], return_tensors="pt").to(model.device)
-# The custom decoding method is a minimal greedy decoding implementation. It also prints a custom message at run time.
+# The custom generation method is a minimal greedy decoding implementation. It also prints a custom message at run time.
 gen_out = model.generate(**inputs)
 # you should now see its custom message, "✨ using a custom generation method ✨"
 print(tokenizer.batch_decode(gen_out, skip_special_tokens=True))
 'The quick brown fox jumps over a lazy dog, and the dog is a type of animal. Is'
 ```

-Model repositories with custom decoding methods have a special property: their decoding method can be loaded from **any** model through [`~GenerationMixin.generate`]'s `custom_generate` argument. This means anyone can create and share their custom generation method to potentially work with any Transformers model, without requiring users to install additional Python packages.
+Model repositories with custom generation methods have a special property: their generation method can be loaded from **any** model through [`~GenerationMixin.generate`]'s `custom_generate` argument. This means anyone can create and share their custom generation method to potentially work with any Transformers model, without requiring users to install additional Python packages.

 ```py
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -354,7 +354,7 @@ tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
 model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", device_map="auto")

 inputs = tokenizer(["The quick brown"], return_tensors="pt").to(model.device)
-# `custom_generate` replaces the original `generate` by the custom decoding method defined in
+# `custom_generate` replaces the original `generate` with the custom generation method defined in
 # `transformers-community/custom_generate_example`
 gen_out = model.generate(**inputs, custom_generate="transformers-community/custom_generate_example", trust_remote_code=True)
 print(tokenizer.batch_decode(gen_out, skip_special_tokens=True)[0])
@@ -364,7 +364,7 @@ print(tokenizer.batch_decode(gen_out, skip_special_tokens=True)[0])
 You should read the `README.md` file of the repository containing the custom generation strategy to see what the new arguments and output type differences are, if they exist. Otherwise, you can assume it works like the base [`~GenerationMixin.generate`] method.

 > [!TIP]
-> You can find all custom decoding methods by [searching for their custom tag.](https://huggingface.co/models?other=custom_generate), `custom_generate`
+> You can find all custom generation methods by [searching for their custom tag](https://huggingface.co/models?other=custom_generate), `custom_generate`.

 Consider the Hub repository [transformers-community/custom_generate_example](https://huggingface.co/transformers-community/custom_generate_example) as an example. The `README.md` states that it has an additional input argument, `left_padding`, which adds a number of padding tokens before the prompt.
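
To make the `left_padding` example above concrete, the following is a hedged sketch of how such a repo-specific argument could be forwarded to the custom method; the value `5` is purely illustrative and is not part of the commit.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct", device_map="auto")

inputs = tokenizer(["The quick brown"], return_tensors="pt").to(model.device)
# Extra keyword arguments are forwarded to the custom `generate`; `left_padding=5` is an
# illustrative value for the argument documented in the example repository's README.
gen_out = model.generate(
    **inputs,
    custom_generate="transformers-community/custom_generate_example",
    trust_remote_code=True,
    left_padding=5,
)
print(tokenizer.batch_decode(gen_out, skip_special_tokens=True)[0])
```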

@@ -387,11 +387,11 @@ torch>=99.0 (installed: 2.6.0)

 Updating your Python requirements accordingly will remove this error message.

-### Creating a custom decoding method
+### Creating a custom generation method

-To create a new decoding method, you need to create a new [**Model**](https://huggingface.co/new) repository and push a few files into it.
-1. The model you've designed your decoding method with.
-2. `custom_generate/generate.py`, which contains all the logic for your custom decoding method.
+To create a new generation method, you need to create a new [**Model**](https://huggingface.co/new) repository and push a few files into it.
+1. The model you've designed your generation method with.
+2. `custom_generate/generate.py`, which contains all the logic for your custom generation method.
 3. `custom_generate/requirements.txt`, used to optionally add new Python requirements and/or lock specific versions to correctly use your method.
 4. `README.md`, where you should add the `custom_generate` tag and document any new arguments or output type differences of your custom method here.

@@ -409,7 +409,7 @@ your_repo/

 #### Adding the base model

-The starting point for your custom decoding method is a model repository just like any other. The model to add to this repository should be the model you've designed your method with, and it is meant to be part of a working self-contained model-generate pair. When the model in this repository is loaded, your custom decoding method will override `generate`. Don't worry -- your decoding method can still be loaded with any other Transformers model, as explained in the section above.
+The starting point for your custom generation method is a model repository just like any other. The model to add to this repository should be the model you've designed your method with, and it is meant to be part of a working self-contained model-generate pair. When the model in this repository is loaded, your custom generation method will override `generate`. Don't worry -- your generation method can still be loaded with any other Transformers model, as explained in the section above.

 If you simply want to copy an existing model, you can do

@@ -418,13 +418,13 @@ from transformers import AutoModelForCausalLM, AutoTokenizer

 tokenizer = AutoTokenizer.from_pretrained("source/model_repo")
 model = AutoModelForCausalLM.from_pretrained("source/model_repo")
-tokenizer.save_pretrained("your/decoding_method", push_to_hub=True)
-model.save_pretrained("your/decoding_method", push_to_hub=True)
+tokenizer.save_pretrained("your/generation_method", push_to_hub=True)
+model.save_pretrained("your/generation_method", push_to_hub=True)
 ```

 #### generate.py

-This is the core of your decoding method. It *must* contain a method named `generate`, and this method *must* contain a `model` argument as its first argument. `model` is the model instance, which means you have access to all attributes and methods in the model, including the ones defined in [`GenerationMixin`] (like the base `generate` method).
+This is the core of your generation method. It *must* contain a method named `generate`, and this method *must* take a `model` argument as its first argument. `model` is the model instance, which means you have access to all attributes and methods in the model, including the ones defined in [`GenerationMixin`] (like the base `generate` method).

 > [!WARNING]
 > `generate.py` must be placed in a folder named `custom_generate`, and not at the root level of the repository. The file paths for this feature are hardcoded.
@@ -465,7 +465,7 @@ def generate(model, input_ids, generation_config=None, left_padding=None, **kwargs):
     return input_ids
 ```

-Follow the recommended practices below to ensure your custom decoding method works as expected.
+Follow the recommended practices below to ensure your custom generation method works as expected.
 - Feel free to reuse the logic for validation and input preparation in the original [`~GenerationMixin.generate`].
 - Pin the `transformers` version in the requirements if you use any private method/attribute in `model`.
 - Consider adding model validation, input validation, or even a separate test file to help users sanity-check your code in their environment.
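
Only the tail of the example `generate.py` appears in the hunk above. For reference, a minimal, hedged sketch of what such a file could contain -- a plain greedy loop with the same signature, not the exact file shipped in `transformers-community/custom_generate_example` -- might look like this:

```py
# custom_generate/generate.py -- illustrative sketch only
import torch


def generate(model, input_ids, generation_config=None, left_padding=None, **kwargs):
    generation_config = generation_config if generation_config is not None else model.generation_config

    # Illustrative handling of a repo-specific `left_padding` argument: prepend pad tokens
    # (assumes the generation config defines a pad token id).
    if left_padding is not None:
        padding = torch.full(
            (input_ids.shape[0], left_padding),
            generation_config.pad_token_id,
            dtype=input_ids.dtype,
            device=input_ids.device,
        )
        input_ids = torch.cat((padding, input_ids), dim=1)

    # Plain greedy decoding for a fixed number of new tokens.
    max_new_tokens = generation_config.max_new_tokens or 20
    with torch.no_grad():
        for _ in range(max_new_tokens):
            logits = model(input_ids).logits
            next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            input_ids = torch.cat((input_ids, next_token), dim=1)
    return input_ids
```

Any other logic, cache strategy, or stopping criterion can replace the greedy loop, as long as the `generate(model, ...)` entry point is kept.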
@@ -476,7 +476,7 @@ Your custom `generate` method can relative import code from the `custom_generate` folder
 from .utils import some_function
 ```

-Only relative imports from the same-level `custom_generate` folder are supported. Parent/sibling folder imports are not valid. The `custom_generate` argument also works locally with any directory that contains a `custom_generate` structure. This is the recommended workflow for developing your custom decoding method.
+Only relative imports from the same-level `custom_generate` folder are supported. Parent/sibling folder imports are not valid. The `custom_generate` argument also works locally with any directory that contains a `custom_generate` structure. This is the recommended workflow for developing your custom generation method.


 #### requirements.txt
@@ -485,7 +485,7 @@ You can optionally specify additional Python requirements in a `requirements.txt`

 #### README.md

-The root level `README.md` in the model repository usually describes the model therein. However, since the focus of the repository is the custom decoding method, we highly recommend to shift its focus towards describing the custom decoding method. In addition to a description of the method, we recommend documenting any input and/or output differences to the original [`~GenerationMixin.generate`]. This way, users can focus on what's new, and rely on Transformers docs for generic implementation details.
+The root-level `README.md` in the model repository usually describes the model therein. However, since the focus of the repository is the custom generation method, we highly recommend shifting its focus towards describing the custom generation method. In addition to a description of the method, we recommend documenting any input and/or output differences relative to the original [`~GenerationMixin.generate`]. This way, users can focus on what's new, and rely on Transformers docs for generic implementation details.

 For discoverability, we highly recommend you to add the `custom_generate` tag to your repository. To do so, the top of your `README.md` file should look like the example below. After you push the file, you should see the tag in your repository!
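
The front-matter example referenced above is not visible in this hunk. A hedged sketch of what the top of the `README.md` could look like is shown below; the `library_name` field is an assumption here, the required part is the `custom_generate` tag.

```yaml
---
library_name: transformers  # assumed field, not required for the tag to work
tags:
  - custom_generate
---
```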

@@ -504,6 +504,11 @@ Recommended practices:
 - Add self-contained examples to enable quick experimentation.
 - Describe soft-requirements such as if the method only works well with a certain family of models.

+### Finding custom generation methods
+
+You can find all custom generation methods by [searching for their custom tag](https://huggingface.co/models?other=custom_generate), `custom_generate`. In addition to the tag, we curate two collections of `custom_generate` methods:
+- [Custom generation methods - Community](https://huggingface.co/collections/transformers-community/custom-generation-methods-community-6888fb1da0efbc592d3a8ab6) -- a collection of powerful methods contributed by the community;
+- [Custom generation methods - Tutorials](https://huggingface.co/collections/transformers-community/custom-generation-methods-tutorials-6823589657a94940ea02cfec) -- a collection of reference implementations for methods that previously were part of `transformers`, as well as tutorials for `custom_generate`.

 ## Resources

docs/source/en/model_doc/glm4_moe.md

Lines changed: 15 additions & 1 deletion
@@ -18,7 +18,21 @@ rendered properly in your Markdown viewer.

 ## Overview

-This will update After model release.
+The [**GLM-4.5**](https://arxiv.org/abs/2508.06471) series models are foundation models designed for intelligent agents; the MoE variants are documented here as Glm4Moe.
+
+GLM-4.5 has **355** billion total parameters with **32** billion active parameters, while GLM-4.5-Air adopts a more compact design with **106** billion total parameters and **12** billion active parameters. GLM-4.5 models unify reasoning, coding, and intelligent agent capabilities to meet the complex demands of intelligent agent applications.
+
+Both GLM-4.5 and GLM-4.5-Air are hybrid reasoning models that provide two modes: a thinking mode for complex reasoning and tool usage, and a non-thinking mode for immediate responses.
+
+We have open-sourced the base models, hybrid reasoning models, and FP8 versions of the hybrid reasoning models for both GLM-4.5 and GLM-4.5-Air. They are released under the MIT open-source license and can be used commercially and for secondary development.
+
+As demonstrated in our comprehensive evaluation across 12 industry-standard benchmarks, GLM-4.5 achieves exceptional performance with a score of **63.2**, placing **3rd** among all proprietary and open-source models. Notably, GLM-4.5-Air delivers competitive results at **59.8** while maintaining superior efficiency.
+
+![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/bench.png)
+
+For more evaluation results, showcases, and technical details, please visit our [technical report](https://arxiv.org/abs/2508.06471) or [technical blog](https://z.ai/blog/glm-4.5).
+
+The model code, tool parser, and reasoning parser can be found in the implementations in [transformers](https://github.com/huggingface/transformers/tree/main/src/transformers/models/glm4_moe), [vLLM](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/glm4_moe_mtp.py), and [SGLang](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/glm4_moe.py).

 ## Glm4MoeConfig
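
Since the MoE variants described above load through the standard auto classes, a minimal text-generation sketch might look like the following. The checkpoint id is an assumption for illustration (check the Hub for the released repositories), and the snippet does not demonstrate the thinking/non-thinking switch.

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zai-org/GLM-4.5-Air"  # assumed checkpoint id, for illustration only

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Briefly explain mixture-of-experts routing."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```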

docs/source/en/model_doc/glm4v_moe.md

Lines changed: 11 additions & 9 deletions
@@ -25,20 +25,22 @@ rendered properly in your Markdown viewer.

 ## Overview

-The Glm4vMoe model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
-<INSERT SHORT SUMMARY HERE>
+Vision-language models (VLMs) have become a key cornerstone of intelligent systems. As real-world AI tasks grow increasingly complex, VLMs urgently need to enhance reasoning capabilities beyond basic multimodal perception — improving accuracy, comprehensiveness, and intelligence — to enable complex problem solving, long-context understanding, and multimodal agents.

-The abstract from the paper is the following:
+Through our open-source work, we aim to explore the technological frontier together with the community while empowering more developers to create exciting and innovative applications.

-*<INSERT PAPER ABSTRACT HERE>*
+[GLM-4.5V](https://github.com/zai-org/GLM-V) is based on ZhipuAI’s next-generation flagship text foundation model GLM-4.5-Air (106B parameters, 12B active). It continues the technical approach of [GLM-4.1V-Thinking](https://arxiv.org/abs/2507.01006), achieving SOTA performance among models of the same scale on 42 public vision-language benchmarks. It covers common tasks such as image, video, and document understanding, as well as GUI agent operations.

-Tips:
+![bench_45](https://raw.githubusercontent.com/zai-org/GLM-V/refs/heads/main/resources/bench_45v.jpeg)

-<INSERT TIPS ABOUT MODEL HERE>
-
-This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/<INSERT YOUR HF USERNAME HERE>).
-The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).
+Beyond benchmark performance, GLM-4.5V focuses on real-world usability. Through efficient hybrid training, it can handle diverse types of visual content, enabling full-spectrum vision reasoning, including:
+- **Image reasoning** (scene understanding, complex multi-image analysis, spatial recognition)
+- **Video understanding** (long video segmentation and event recognition)
+- **GUI tasks** (screen reading, icon recognition, desktop operation assistance)
+- **Complex chart & long document parsing** (research report analysis, information extraction)
+- **Grounding** (precise visual element localization)

+The model also introduces a **Thinking Mode** switch, allowing users to balance between quick responses and deep reasoning. This switch works the same as in the `GLM-4.5` language model.

 ## Glm4vMoeConfig

docs/source/en/model_doc/llava_onevision.md

Lines changed: 8 additions & 11 deletions
@@ -38,7 +38,7 @@ yielding new emerging capabilities. In particular, strong video understanding and
 cross-scenario capabilities are demonstrated through task transfer from images to
 videos.*

-<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/llava-ov-acrhitecture.png"
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/llava-ov-architecture.png"
 alt="drawing" width="600"/>

 <small> LLaVA-OneVision architecture. Taken from the <a href="https://huggingface.co/papers/2408.03326">original paper.</a> </small>
@@ -165,20 +165,20 @@ conversation_1 = [
         "content": [
             {"type": "image", "url": "https://www.ilankelman.org/stopsigns/australia.jpg"},
             {"type": "text", "text": "What is shown in this image?"},
-            ],
+        ],
     },
     {
         "role": "assistant",
         "content": [
             {"type": "text", "text": "There is a red stop sign in the image."},
-            ],
+        ],
     },
     {
         "role": "user",
         "content": [
             {"type": "image", "url": "http://images.cocodataset.org/val2017/000000039769.jpg"},
             {"type": "text", "text": "What about this image? How many cats do you see?"},
-            ],
+        ],
     },
 ]

@@ -188,7 +188,7 @@ conversation_2 = [
         "content": [
             {"type": "image", "url": "https://huggingface.co/microsoft/kosmos-2-patch14-224/resolve/main/snowman.jpg"},
             {"type": "text", "text": "What is shown in this image?"},
-            ],
+        ],
     },
 ]

@@ -198,13 +198,14 @@ inputs = processor.apply_chat_template(
     tokenize=True,
     return_dict=True,
     padding=True,
-    return_tensors="pt"
+    padding_side="left",
+    return_tensors="pt",
 ).to(model.device, torch.float16)

 # Generate
 generate_ids = model.generate(**inputs, max_new_tokens=30)
 processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)
-['user\n\nWhat is shown in this image?\nassistant\nThere is a red stop sign in the image.\nuser\n\nWhat about this image? How many cats do you see?\nassistant\ntwo', 'user\n\nWhat is shown in this image?\nassistant\n']
+['user\n\nWhat is shown in this image?\nassistant\nThere is a red stop sign in the image.\nuser\n\nWhat about this image? How many cats do you see?\nassistant\ntwo', 'user\n\nWhat is shown in this image?\nassistant\nThe image shows a whimsical scene of a snowman sitting by a campfire. The snowman is anthropomorphized, wearing a hat and']
 ```

 ### Video inference
@@ -312,10 +313,6 @@ model = LlavaOnevisionForConditionalGeneration.from_pretrained(

 [[autodoc]] LlavaOnevisionVideoProcessor

-## LlavaOnevisionVideoProcessor
-
-[[autodoc]] LlavaOnevisionVideoProcessor
-
 ## LlavaOnevisionModel

 [[autodoc]] LlavaOnevisionModel
