The FP8 version of GLM-4.7-Flash (and maybe llm-compressor support) #129

@fpsandnoob

Description

System Info / 系統信息

CUDA: 12.8
Transformers: 5.0.0.dev0 (nightly) for running GLM-4.7-Flash, 4.57.3 for running llm-compressor
llm-compressor: 0.9.0
Python: 3.13
OS: Ubuntu

Who can help? / 谁可以帮助到您?

@zRzRzRzRzRzRzR

Information / 问题信息

  • The official example scripts / 官方的示例脚本
  • My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

  1. Use llm-compressor to quantize the model to FP8. GLM-4.7-Flash requires the transformers 5.0.0.dev0 nightly, while llm-compressor 0.9.0 only supports transformers < 5.0 (e.g. 4.57.3), so importing llm-compressor fails in the environment that can load the model.

The code:

import os
from transformers import AutoProcessor, AutoModelForCausalLM 

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# NOTE: Requires a minimum of transformers 4.57.0

MODEL_ID = "zai-org/GLM-4.7-Flash"

# Load model.
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, dtype="auto")
processor = AutoProcessor.from_pretrained(MODEL_ID)

# Configure the quantization algorithm and scheme.
# In this case, we:
#   * quantize the weights to fp8 with channel-wise quantization
#   * quantize the activations to fp8 with dynamic token activations
# NOTE: only datafree quantization is supported for Qwen3-VL-MoE currently
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=[
        "lm_head"
    ],
)

# Apply quantization.
oneshot(model=model, recipe=recipe)

# Save to disk in compressed-tensors format.
SAVE_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
model.save_pretrained(SAVE_DIR)
processor.save_pretrained(SAVE_DIR)

The traceback:

Traceback (most recent call last):
  File "/data/models/vllm_glm/glm_4.7_flash_fp8.py", line 4, in <module>
    from llmcompressor import oneshot
  File "/data/models/vllm_glm/.venv/lib/python3.13/site-packages/llmcompressor/__init__.py", line 23, in <module>
    from llmcompressor.core.session_functions import (
    ...<4 lines>...
    )
  File "/data/models/vllm_glm/.venv/lib/python3.13/site-packages/llmcompressor/core/__init__.py", line 10, in <module>
    from llmcompressor.core.lifecycle import CompressionLifecycle
  File "/data/models/vllm_glm/.venv/lib/python3.13/site-packages/llmcompressor/core/lifecycle.py", line 14, in <module>
    from llmcompressor.core.state import State
  File "/data/models/vllm_glm/.venv/lib/python3.13/site-packages/llmcompressor/core/state.py", line 14, in <module>
    from llmcompressor.metrics import BaseLogger, LoggerManager
  File "/data/models/vllm_glm/.venv/lib/python3.13/site-packages/llmcompressor/metrics/__init__.py", line 12, in <module>
    from .logger import *
  File "/data/models/vllm_glm/.venv/lib/python3.13/site-packages/llmcompressor/metrics/logger.py", line 24, in <module>
    from llmcompressor.utils import is_package_available
  File "/data/models/vllm_glm/.venv/lib/python3.13/site-packages/llmcompressor/utils/__init__.py", line 8, in <module>
    from .dev import *
  File "/data/models/vllm_glm/.venv/lib/python3.13/site-packages/llmcompressor/utils/dev.py", line 14, in <module>
    from transformers.modeling_utils import TORCH_INIT_FUNCTIONS
ImportError: cannot import name 'TORCH_INIT_FUNCTIONS' from 'transformers.modeling_utils' (/data/models/vllm_glm/.venv/lib/python3.13/site-packages/transformers/modeling_utils.py)
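
For context, TORCH_INIT_FUNCTIONS is (in transformers 4.x) a module-level dict in transformers.modeling_utils that maps the names of torch.nn.init functions to the functions themselves, and it is no longer importable from the 5.0.0.dev0 nightly that GLM-4.7-Flash requires, which is why llm-compressor already fails at import time. A possible stop-gap, untested and purely a sketch, is to restore that symbol before importing llm-compressor, assuming this is the only transformers 5.x incompatibility it hits:

# Untested shim: re-create the TORCH_INIT_FUNCTIONS dict that llm-compressor
# expects from transformers < 5.0. Must run before importing llm-compressor.
import torch.nn.init as init
import transformers.modeling_utils as modeling_utils

if not hasattr(modeling_utils, "TORCH_INIT_FUNCTIONS"):
    modeling_utils.TORCH_INIT_FUNCTIONS = {
        "uniform_": init.uniform_,
        "normal_": init.normal_,
        "trunc_normal_": init.trunc_normal_,
        "constant_": init.constant_,
        "xavier_uniform_": init.xavier_uniform_,
        "xavier_normal_": init.xavier_normal_,
        "kaiming_uniform_": init.kaiming_uniform_,
        "kaiming_normal_": init.kaiming_normal_,
    }

from llmcompressor import oneshot  # should now get past the ImportError above

Other parts of the oneshot pipeline may still depend on transformers < 5.0 behavior, so a proper fix on either side would be preferable.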

Expected behavior / 期待表现

  1. The GLM-4.7-Flash model code targets transformers v5, but llm-compressor only supports transformers < 5.0. Is there any workaround for using llm-compressor to quantize the model to FP8 (maybe a new model commit compatible with transformers 4.57.6)?
  2. Alternatively, is there any plan to release an official FP8 version of GLM-4.7-Flash? (A sketch of how such a checkpoint would typically be consumed follows below.)
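
For reference, whichever route becomes available, the resulting compressed-tensors checkpoint (the SAVE_DIR produced by the script above, or an official FP8 repository) would typically be consumed with vLLM roughly as below. This is only a sketch; it assumes vLLM supports FP8 compressed-tensors checkpoints for this architecture, and the model path is hypothetical.

from vllm import LLM, SamplingParams

# Hypothetical path: the SAVE_DIR from the quantization script above,
# or an official FP8 repository if one is released.
llm = LLM(model="GLM-4.7-Flash-FP8-Dynamic", trust_remote_code=True)

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["Hello, please introduce yourself."], params)
print(outputs[0].outputs[0].text)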

Thanks for the great model. 🫡
