12 changes: 6 additions & 6 deletions docs/source/en/_toctree.yml
@@ -516,13 +516,15 @@
- local: model_doc/gemma2
title: Gemma2
- local: model_doc/glm
title: GLM
title: GLM-4
- local: model_doc/glm4
title: glm4
title: GLM-4-0414
- local: model_doc/glm4_moe
title: glm4_moe
title: GLM-4.5, GLM-4.6, GLM-4.7
- local: model_doc/glm4_moe_lite
title: GLM-4.7-Flash
- local: model_doc/glm_image
title: GlmImage
title: GLM-Image
- local: model_doc/openai-gpt
title: GPT
- local: model_doc/gpt_neo
@@ -743,8 +745,6 @@
title: XLNet
- local: model_doc/xlstm
title: xLSTM
- local: model_doc/glm4_moe_lite
title: y
- local: model_doc/yoso
title: YOSO
- local: model_doc/zamba
2 changes: 1 addition & 1 deletion docs/source/en/model_doc/glm.md
@@ -15,7 +15,7 @@ rendered properly in your Markdown viewer.
-->
*This model was released on 2024-06-18 and added to Hugging Face Transformers on 2024-10-18.*

# GLM
# GLM-4

<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
4 changes: 2 additions & 2 deletions docs/source/en/model_doc/glm4.md
@@ -1,4 +1,4 @@
<!--Copyright 2025 The GLM & ZhipuAI team and The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The ZhipuAI Inc. and The HuggingFace Inc. team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -15,7 +15,7 @@ rendered properly in your Markdown viewer.
-->
*This model was released on 2024-06-18 and added to Hugging Face Transformers on 2025-04-09.*

# Glm4
# GLM-4-0414

## Overview

36 changes: 29 additions & 7 deletions docs/source/en/model_doc/glm4_moe.md
@@ -15,11 +15,37 @@
-->
*This model was released on 2025-07-28 and added to Hugging Face Transformers on 2025-07-21.*

# Glm4Moe
# GLM-4.5, GLM-4.6, GLM-4.7

## Overview

Both **GLM-4.6** and **GLM-4.5** language model use this class. The implementation in transformers does not include an MTP layer.
The **GLM-4.7**, **GLM-4.6**, and **GLM-4.5** language models all use this class. The implementation in transformers does not include an MTP layer.

### GLM-4.7

**GLM-4.7**, your new coding partner, arrives with the following features:

- **Core Coding**: GLM-4.7 brings clear gains over its predecessor GLM-4.6 in multilingual agentic coding and terminal-based tasks, including (73.8%, +5.8%) on SWE-bench, (66.7%, +12.9%) on SWE-bench Multilingual, and (41%, +16.5%) on Terminal Bench 2.0. GLM-4.7 also supports thinking before acting, with significant improvements on complex tasks in mainstream agent frameworks such as Claude Code, Kilo Code, Cline, and Roo Code.
- **Vibe Coding**: GLM-4.7 takes a big step forward in UI quality. It produces cleaner, more modern webpages and generates better-looking slides with more accurate layout and sizing.
- **Tool Use**: GLM-4.7 achieves significant improvements in tool use, with markedly better performance on benchmarks such as τ^2-Bench and on web browsing via BrowseComp.
- **Complex Reasoning**: GLM-4.7 delivers a substantial boost in mathematical and reasoning capabilities, achieving (42.8%, +12.4%) on the HLE (Humanity’s Last Exam) benchmark compared to GLM-4.6.
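A quick sanity check on the deltas quoted above: assuming each figure is an absolute percentage-point gain (an assumption, since the list does not spell this out), the implied GLM-4.6 baselines can be recovered by subtraction.

```python
# Reported GLM-4.7 scores and their stated gains over GLM-4.6.
reported = {
    "SWE-bench": (73.8, 5.8),
    "SWE-bench Multilingual": (66.7, 12.9),
    "Terminal Bench 2.0": (41.0, 16.5),
    "HLE": (42.8, 12.4),
}

# Implied GLM-4.6 baseline: GLM-4.7 score minus the stated gain.
implied_glm46 = {
    name: round(score - gain, 1) for name, (score, gain) in reported.items()
}
print(implied_glm46)
```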

More generally, GLM-4.7 also brings significant improvements in many other scenarios, such as chat, creative writing, and role-play.

![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/bench_glm47.png)

**Interleaved Thinking & Preserved Thinking**

![thinking](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/thinking.png)

GLM-4.7 further enhances **Interleaved Thinking** (a feature introduced in GLM-4.5) and introduces **Preserved Thinking** and **Turn-level Thinking**. By thinking between actions and staying consistent across turns, it makes complex tasks more stable and more controllable:
- **Interleaved Thinking**: The model thinks before every response and tool call, improving instruction following and generation quality.
- **Preserved Thinking**: In coding agent scenarios, the model automatically retains all thinking blocks across multi-turn conversations, reusing the existing reasoning instead of re-deriving from scratch. This reduces information loss and inconsistencies, and is well-suited for long-horizon, complex tasks.
- **Turn-level Thinking**: The model supports per-turn control over reasoning within a session—disable thinking for lightweight requests to reduce latency/cost, enable it for complex tasks to improve accuracy and stability.

More details: https://docs.z.ai/guides/capabilities/thinking-mode

For more eval results, showcases, and technical details, please visit the [GLM-4.7 technical blog](https://z.ai/blog/glm-4.7).
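The exact request schema for turn-level control is defined in the z.ai thinking-mode guide linked above. As a rough illustration only (the field names below are assumptions, not the documented API), a per-turn toggle might be shaped like this:

```python
# Hypothetical chat payload sketch: the "thinking" field is illustrative,
# not the documented z.ai API; see the thinking-mode guide for the real schema.
def make_turn(content: str, thinking: bool) -> dict:
    return {
        "role": "user",
        "content": content,
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

turns = [
    make_turn("What time is it in UTC?", thinking=False),  # lightweight: skip reasoning
    make_turn("Refactor this module to remove the cyclic import.", thinking=True),
]
print([t["thinking"]["type"] for t in turns])
```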

### GLM-4.6

@@ -33,9 +59,7 @@ Compared with GLM-4.5, **GLM-4.6** brings several key improvements:

We evaluated GLM-4.6 across eight public benchmarks covering agents, reasoning, and coding. Results show clear gains over GLM-4.5, with GLM-4.6 also holding competitive advantages over leading domestic and international models such as **DeepSeek-V3.1-Terminus** and **Claude Sonnet 4**.

![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/bench_glm46.png)

For more eval results, show cases, and technical details, please visit our [technical blog](https://z.ai/blog/glm-4.6).
For more eval results, showcases, and technical details, please visit the [GLM-4.6 technical blog](https://z.ai/blog/glm-4.6).

### GLM-4.5

@@ -49,8 +73,6 @@ We have open-sourced the base models, hybrid reasoning models, and FP8 versions

As demonstrated in our comprehensive evaluation across 12 industry-standard benchmarks, GLM-4.5 achieves exceptional performance with a score of **63.2**, ranking **3rd** among all proprietary and open-source models. Notably, GLM-4.5-Air delivers competitive results at **59.8** while maintaining superior efficiency.

![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/bench.png)

For more eval results, showcases, and technical details, please visit our [technical report](https://huggingface.co/papers/2508.06471) or [technical blog](https://z.ai/blog/glm-4.5).

The model code, tool parser and reasoning parser can be found in the implementation of [transformers](https://github.com/huggingface/transformers/tree/main/src/transformers/models/glm4_moe), [vLLM](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/glm4_moe_mtp.py) and [SGLang](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/glm4_moe.py).
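As a minimal loading sketch (not exercised here: it needs the checkpoint weights and suitable hardware, and assumes the standard auto classes cover this architecture, which the transformers implementation linked above suggests):

```python
def load_glm45(repo_id: str = "zai-org/GLM-4.5"):
    """Sketch: load a GLM-4.5-family checkpoint via the transformers auto classes.

    Imports are deferred so the sketch stays importable without torch/transformers.
    Older transformers releases spell the dtype argument `torch_dtype`.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, dtype=torch.bfloat16, device_map="auto"
    )
    return tokenizer, model
```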
43 changes: 13 additions & 30 deletions docs/source/en/model_doc/glm4_moe_lite.md
@@ -1,45 +1,28 @@
<!--Copyright 2025 the HuggingFace Team. All rights reserved.
<!--Copyright 2025 The ZhipuAI Inc. and The HuggingFace Inc. team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-12-24.*
*This model was released on 2026-01-18 and added to Hugging Face Transformers on 2026-01-13.*


# y
# GLM-4.7-Flash

## Overview

The y model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
<INSERT SHORT SUMMARY HERE>

The abstract from the paper is the following:

<INSERT PAPER ABSTRACT HERE>

Tips:

<INSERT TIPS ABOUT MODEL HERE>

This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/<INSERT YOUR HF USERNAME HERE>).
The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).

## Usage examples
GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.

<INSERT SOME NICE EXAMPLES HERE>
![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/bench_glm47_flash.png)
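A generation sketch mirroring the prompts used in the repo's own test file (again an assumption-laden sketch, not exercised here; it requires the GLM-4.7-Flash weights):

```python
def generate_flash(
    prompt: str = "[gMASK]<sop>hello",  # prompt format taken from the test file
    repo_id: str = "zai-org/GLM-4.7-Flash",
    max_new_tokens: int = 32,
) -> str:
    """Sketch: generation with GLM-4.7-Flash via the transformers auto classes."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```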

## Glm4MoeLiteConfig

6 changes: 3 additions & 3 deletions tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py
@@ -63,7 +63,7 @@ class Glm4MoeModelTest(CausalLMModelTest, unittest.TestCase):
model_split_percents = [0.5, 0.7, 0.8]

def _check_past_key_values_for_generate(self, batch_size, past_key_values, seq_length, config):
"""Needs to be overridden as GLM-Lite has special MLA cache format (though we don't really use the MLA)"""
"""Needs to be overridden as GLM-4.7-Flash has special MLA cache format (though we don't really use the MLA)"""
self.assertIsInstance(past_key_values, Cache)

# (batch, head, seq_length, head_features)
@@ -103,9 +103,9 @@ def test_compile_static_cache(self):
]

prompts = ["[gMASK]<sop>hello", "[gMASK]<sop>tell me"]
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.5")
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.7-Flash")
model = Glm4MoeLiteForCausalLM.from_pretrained(
"zai-org/GLM-Lite", device_map=torch_device, dtype=torch.bfloat16
"zai-org/GLM-4.7-Flash", device_map=torch_device, dtype=torch.bfloat16
)
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
