12 changes: 6 additions & 6 deletions docs/source/en/_toctree.yml
@@ -516,13 +516,15 @@
- local: model_doc/gemma2
title: Gemma2
- local: model_doc/glm
title: GLM
title: GLM-4
- local: model_doc/glm4
title: glm4
title: GLM-4-0414
- local: model_doc/glm4_moe
title: glm4_moe
title: GLM-4.5, GLM-4.6, GLM-4.7
- local: model_doc/glm4_moe_lite
title: GLM-4.7-Flash
- local: model_doc/glm_image
title: GlmImage
title: GLM-Image
- local: model_doc/openai-gpt
title: GPT
- local: model_doc/gpt_neo
@@ -743,8 +745,6 @@
title: XLNet
- local: model_doc/xlstm
title: xLSTM
- local: model_doc/glm4_moe_lite
title: y
- local: model_doc/yoso
title: YOSO
- local: model_doc/zamba
2 changes: 1 addition & 1 deletion docs/source/en/model_doc/glm.md
@@ -15,7 +15,7 @@ rendered properly in your Markdown viewer.
-->
*This model was released on 2024-06-18 and added to Hugging Face Transformers on 2024-10-18.*

# GLM
# GLM-4

<div class="flex flex-wrap space-x-1">
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
4 changes: 2 additions & 2 deletions docs/source/en/model_doc/glm4.md
@@ -1,4 +1,4 @@
<!--Copyright 2025 The GLM & ZhipuAI team and The HuggingFace Team. All rights reserved.
<!--Copyright 2025 The ZhipuAI Inc. and The HuggingFace Inc. team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
@@ -15,7 +15,7 @@ rendered properly in your Markdown viewer.
-->
*This model was released on 2024-06-18 and added to Hugging Face Transformers on 2025-04-09.*

# Glm4
# GLM-4-0414

## Overview

36 changes: 29 additions & 7 deletions docs/source/en/model_doc/glm4_moe.md
@@ -15,11 +15,37 @@
-->
*This model was released on 2025-07-28 and added to Hugging Face Transformers on 2025-07-21.*

# Glm4Moe
# GLM-4.5, GLM-4.6, GLM-4.7

## Overview

Both **GLM-4.6** and **GLM-4.5** language model use this class. The implementation in transformers does not include an MTP layer.
The **GLM-4.7**, **GLM-4.6**, and **GLM-4.5** language models all use this class. The implementation in transformers does not include an MTP layer.

### GLM-4.7

**GLM-4.7**, your new coding partner, arrives with the following features:

- **Core Coding**: GLM-4.7 brings clear gains over its predecessor GLM-4.6 in multilingual agentic coding and terminal-based tasks, including (73.8%, +5.8%) on SWE-bench, (66.7%, +12.9%) on SWE-bench Multilingual, and (41%, +16.5%) on Terminal Bench 2.0. GLM-4.7 also supports thinking before acting, with significant improvements on complex tasks in mainstream agent frameworks such as Claude Code, Kilo Code, Cline, and Roo Code.
- **Vibe Coding**: GLM-4.7 takes a big step forward in UI quality. It produces cleaner, more modern webpages and generates better-looking slides with more accurate layout and sizing.
- **Tool Use**: GLM-4.7 achieves significant improvements in tool use, with markedly better performance on benchmarks such as τ^2-Bench and on web browsing via BrowseComp.
- **Complex Reasoning**: GLM-4.7 delivers a substantial boost in mathematical and reasoning capabilities, achieving (42.8%, +12.4%) on the HLE (Humanity’s Last Exam) benchmark compared to GLM-4.6.
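A quick sanity check on the deltas quoted above: assuming each figure is an absolute percentage-point gain (an assumption, since the list does not spell this out), the implied GLM-4.6 baselines can be recovered by subtraction.

```python
# Reported GLM-4.7 scores and their stated gains over GLM-4.6.
reported = {
    "SWE-bench": (73.8, 5.8),
    "SWE-bench Multilingual": (66.7, 12.9),
    "Terminal Bench 2.0": (41.0, 16.5),
    "HLE": (42.8, 12.4),
}

# Implied GLM-4.6 baseline: GLM-4.7 score minus the stated gain.
implied_glm46 = {
    name: round(score - gain, 1) for name, (score, gain) in reported.items()
}
print(implied_glm46)
```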

More generally, GLM-4.7 also brings significant improvements in many other scenarios, such as chat, creative writing, and role-play.

![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/bench_glm47.png)

**Interleaved Thinking & Preserved Thinking**

![thinking](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/thinking.png)

GLM-4.7 further enhances **Interleaved Thinking** (a feature introduced in GLM-4.5) and introduces **Preserved Thinking** and **Turn-level Thinking**. By thinking between actions and staying consistent across turns, it makes complex tasks more stable and more controllable:
- **Interleaved Thinking**: The model thinks before every response and tool call, improving instruction following and generation quality.
- **Preserved Thinking**: In coding agent scenarios, the model automatically retains all thinking blocks across multi-turn conversations, reusing the existing reasoning instead of re-deriving from scratch. This reduces information loss and inconsistencies, and is well-suited for long-horizon, complex tasks.
- **Turn-level Thinking**: The model supports per-turn control over reasoning within a session—disable thinking for lightweight requests to reduce latency/cost, enable it for complex tasks to improve accuracy and stability.

More details: https://docs.z.ai/guides/capabilities/thinking-mode

For more eval results, showcases, and technical details, please visit the [GLM-4.7 technical blog](https://z.ai/blog/glm-4.7).
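The exact request schema for turn-level control is defined in the z.ai thinking-mode guide linked above. As a rough illustration only (the field names below are assumptions, not the documented API), a per-turn toggle might be shaped like this:

```python
# Hypothetical chat payload sketch: the "thinking" field is illustrative,
# not the documented z.ai API; see the thinking-mode guide for the real schema.
def make_turn(content: str, thinking: bool) -> dict:
    return {
        "role": "user",
        "content": content,
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

turns = [
    make_turn("What time is it in UTC?", thinking=False),  # lightweight: skip reasoning
    make_turn("Refactor this module to remove the cyclic import.", thinking=True),
]
print([t["thinking"]["type"] for t in turns])
```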

### GLM-4.6

@@ -33,9 +59,7 @@ Compared with GLM-4.5, **GLM-4.6** brings several key improvements:

We evaluated GLM-4.6 across eight public benchmarks covering agents, reasoning, and coding. Results show clear gains over GLM-4.5, with GLM-4.6 also holding competitive advantages over leading domestic and international models such as **DeepSeek-V3.1-Terminus** and **Claude Sonnet 4**.

![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/bench_glm46.png)

For more eval results, show cases, and technical details, please visit our [technical blog](https://z.ai/blog/glm-4.6).
For more eval results, showcases, and technical details, please visit the [GLM-4.6 technical blog](https://z.ai/blog/glm-4.6).

### GLM-4.5

@@ -49,8 +73,6 @@ We have open-sourced the base models, hybrid reasoning models, and FP8 versions

As demonstrated in our comprehensive evaluation across 12 industry-standard benchmarks, GLM-4.5 achieves exceptional performance with a score of **63.2**, ranking **3rd** among all proprietary and open-source models. Notably, GLM-4.5-Air delivers competitive results at **59.8** while maintaining superior efficiency.

![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/bench.png)

For more eval results, showcases, and technical details, please visit our [technical report](https://huggingface.co/papers/2508.06471) or [technical blog](https://z.ai/blog/glm-4.5).

The model code, tool parser and reasoning parser can be found in the implementation of [transformers](https://github.com/huggingface/transformers/tree/main/src/transformers/models/glm4_moe), [vLLM](https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/glm4_moe_mtp.py) and [SGLang](https://github.com/sgl-project/sglang/blob/main/python/sglang/srt/models/glm4_moe.py).
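As a minimal loading sketch (not exercised here: it needs the checkpoint weights and suitable hardware, and assumes the standard auto classes cover this architecture, which the transformers implementation linked above suggests):

```python
def load_glm45(repo_id: str = "zai-org/GLM-4.5"):
    """Sketch: load a GLM-4.5-family checkpoint via the transformers auto classes.

    Imports are deferred so the sketch stays importable without torch/transformers.
    Older transformers releases spell the dtype argument `torch_dtype`.
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, dtype=torch.bfloat16, device_map="auto"
    )
    return tokenizer, model
```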
43 changes: 13 additions & 30 deletions docs/source/en/model_doc/glm4_moe_lite.md
@@ -1,45 +1,28 @@
<!--Copyright 2025 the HuggingFace Team. All rights reserved.
<!--Copyright 2025 The ZhipuAI Inc. and The HuggingFace Inc. team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2025-12-24.*
*This model was released on 2026-01-18 and added to Hugging Face Transformers on 2026-01-13.*


# y
# GLM-4.7-Flash

## Overview

The y model was proposed in [<INSERT PAPER NAME HERE>](<INSERT PAPER LINK HERE>) by <INSERT AUTHORS HERE>.
<INSERT SHORT SUMMARY HERE>

The abstract from the paper is the following:

<INSERT PAPER ABSTRACT HERE>

Tips:

<INSERT TIPS ABOUT MODEL HERE>

This model was contributed by [INSERT YOUR HF USERNAME HERE](https://huggingface.co/<INSERT YOUR HF USERNAME HERE>).
The original code can be found [here](<INSERT LINK TO GITHUB REPO HERE>).

## Usage examples
GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency.

<INSERT SOME NICE EXAMPLES HERE>
![bench](https://raw.githubusercontent.com/zai-org/GLM-4.5/refs/heads/main/resources/bench_glm47_flash.png)
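A generation sketch mirroring the prompts used in the repo's own test file (again an assumption-laden sketch, not exercised here; it requires the GLM-4.7-Flash weights):

```python
def generate_flash(
    prompt: str = "[gMASK]<sop>hello",  # prompt format taken from the test file
    repo_id: str = "zai-org/GLM-4.7-Flash",
    max_new_tokens: int = 32,
) -> str:
    """Sketch: generation with GLM-4.7-Flash via the transformers auto classes."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id, dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)
```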

## Glm4MoeLiteConfig

6 changes: 3 additions & 3 deletions tests/models/glm4_moe_lite/test_modeling_glm4_moe_lite.py
@@ -63,7 +63,7 @@ class Glm4MoeModelTest(CausalLMModelTest, unittest.TestCase):
model_split_percents = [0.5, 0.7, 0.8]

def _check_past_key_values_for_generate(self, batch_size, past_key_values, seq_length, config):
"""Needs to be overridden as GLM-Lite has special MLA cache format (though we don't really use the MLA)"""
"""Needs to be overridden as GLM-4.7-Flash has special MLA cache format (though we don't really use the MLA)"""
self.assertIsInstance(past_key_values, Cache)

# (batch, head, seq_length, head_features)
@@ -103,9 +103,9 @@ def test_compile_static_cache(self):
]

prompts = ["[gMASK]<sop>hello", "[gMASK]<sop>tell me"]
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.5")
tokenizer = AutoTokenizer.from_pretrained("zai-org/GLM-4.7-Flash")
model = Glm4MoeLiteForCausalLM.from_pretrained(
"zai-org/GLM-Lite", device_map=torch_device, dtype=torch.bfloat16
"zai-org/GLM-4.7-Flash", device_map=torch_device, dtype=torch.bfloat16
)
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
