[Transform] Serialize transforms config #412

Merged 102 commits into main on Aug 11, 2025

Conversation


@kylesayrs kylesayrs commented Aug 1, 2025

Purpose

  • Enable saving models with applied transforms
    • Transform config encodes both online and offline (fused) rotations
config.json:
```json
{
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "config_groups": {
      "group_0": {
        "input_activations": null,
        "output_activations": null,
        "targets": [
          "Linear"
        ],
        "weights": {
          "actorder": null,
          "block_structure": null,
          "dynamic": false,
          "group_size": 128,
          "num_bits": 4,
          "observer": "minmax",
          "observer_kwargs": {},
          "strategy": "group",
          "symmetric": true,
          "type": "int"
        }
      }
    },
    "global_compression_ratio": null,
    "ignore": [
      "lm_head"
    ],
    "kv_cache_scheme": null,
    "quant_method": "compressed-tensors",
    "quantization_status": "compressed",
    "sparsity_config": {},
    "transform_config": {
      "config_groups": {
        "u": {
          "apply": [
            {
              "ignore": [
                "lm_head"
              ],
              "inverse": false,
              "location": "weight_output",
              "targets": [
                "Linear"
              ]
            },
            {
              "ignore": [
                "lm_head"
              ],
              "inverse": true,
              "location": "output",
              "targets": [
                "Linear"
              ]
            }
          ],
          "head_dim": null,
          "randomize": false,
          "requires_grad": false,
          "type": "random-hadamard"
        },
        "v": {
          "apply": [
            {
              "ignore": [
                "lm_head"
              ],
              "inverse": false,
              "location": "input",
              "targets": [
                "Linear"
              ]
            },
            {
              "ignore": [
                "lm_head"
              ],
              "inverse": true,
              "location": "weight_input",
              "targets": [
                "Linear"
              ]
            }
          ],
          "head_dim": null,
          "randomize": false,
          "requires_grad": false,
          "type": "random-hadamard"
        }
      }
    },
    "version": "0.10.3.dev146+ga3cd59d"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.55.0.dev0",
  "use_cache": true,
  "vocab_size": 128256
}
```
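
For reference, the `transform_config` above could be constructed and attached in Python along the following lines. This is a minimal sketch assuming the `TransformConfig`, `TransformScheme`, and `TransformArgs` models and an `apply_transform_config` helper mirror the serialized fields; treat the exact import paths and signatures as assumptions rather than the confirmed API.

```python
# Minimal sketch: build the "u"/"v" config serialized above and attach it to a model.
# Class and function names are assumed to mirror the serialized fields; verify them
# against the compressed-tensors transform module before relying on this.
from transformers import AutoModelForCausalLM
from compressed_tensors.transform import (
    TransformArgs,
    TransformConfig,
    TransformScheme,
    apply_transform_config,
)

config = TransformConfig(
    config_groups={
        "u": TransformScheme(
            type="random-hadamard",
            apply=[
                TransformArgs(targets=["Linear"], location="weight_output", ignore=["lm_head"]),
                TransformArgs(targets=["Linear"], location="output", inverse=True, ignore=["lm_head"]),
            ],
        ),
        "v": TransformScheme(
            type="random-hadamard",
            apply=[
                TransformArgs(targets=["Linear"], location="input", ignore=["lm_head"]),
                TransformArgs(targets=["Linear"], location="weight_input", inverse=True, ignore=["lm_head"]),
            ],
        ),
    }
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct", torch_dtype="auto"
)

# Applying the config attaches it to the model so that save_pretrained can later
# serialize it under quantization_config.transform_config, as in the config.json above.
apply_transform_config(model, config)
model.save_pretrained("Llama-3.1-8B-Instruct-transformed")
```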

Prerequisites

Changes

  • Implement transform_config similarly to the sparsity config, as a subconfig of the quantization config
    • This aligns with HF's pattern of treating the "quantization config" as a general compression/optimization config
  • The transform config is passed to serialization by attaching it to the model at the time the transforms are applied
  • Refactor ModelCompressor.update_config to support writing the quantization, sparsity, and transform (q/s/t) configs; a sketch of the resulting layout follows below
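
As a rough illustration of the layout this produces (not the actual `ModelCompressor.update_config` implementation; `update_model_config` below is a hypothetical stand-in for illustration only), all three compression configs end up nested under the single HF `quantization_config` key:

```python
# Hedged sketch of the config.json layout written at save time; update_model_config
# is a hypothetical helper, not the real ModelCompressor method.
import json
import os
from typing import Optional


def update_model_config(
    save_directory: str,
    qconfig: dict,
    sconfig: Optional[dict] = None,
    tconfig: Optional[dict] = None,
) -> None:
    config_path = os.path.join(save_directory, "config.json")
    with open(config_path) as file:
        config = json.load(file)

    # quantization, sparsity, and transform configs all live under "quantization_config"
    quantization_config = config.setdefault("quantization_config", {})
    quantization_config.update(qconfig)
    if sconfig is not None:
        quantization_config["sparsity_config"] = sconfig
    if tconfig is not None:
        quantization_config["transform_config"] = tconfig

    with open(config_path, "w") as file:
        json.dump(config, file, indent=2, sort_keys=True)
```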

Follow ups

  • Some work will need to be done if we want to support users passing a CompressedTensorsConfig
  • Right now there are three ways we pass configs; some work could be done to consolidate these methods ([WIP] Refactor serialization of qconfig #410)
    • qconfig is reconstructed from the attached schemes
    • sconfig is inferred from the model in LC and passed as an argument
    • tconfig is attached to the model directly

Testing

kylesayrs added 30 commits May 30, 2025 13:40
kylesayrs and others added 6 commits July 10, 2025 10:38
… Compression Params (#407)

* add compression param; update qdq for batch greater than 1

* make generic

* fix tests

* remove incorrect line change; make generic

* update
Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs kylesayrs changed the base branch from main to kylesayrs/transform_save August 1, 2025 23:39
@kylesayrs kylesayrs marked this pull request as ready for review August 1, 2025 23:53

@brian-dellabetta brian-dellabetta left a comment

changes to transform config look good to me, so approving, but definitely need to confirm with @dsikka and @rahul-tuli the changes to quantization life cycle

Base automatically changed from kylesayrs/transform_save to main August 7, 2025 01:12
@dsikka dsikka dismissed brian-dellabetta’s stale review August 7, 2025 01:12

The base branch was changed.

dsikka previously approved these changes Aug 7, 2025

@dsikka dsikka left a comment


LGTM but needs rebase


@brian-dellabetta brian-dellabetta left a comment


one nit question, otherwise LGTM

@dsikka dsikka merged commit 0731aa5 into main Aug 11, 2025
1 check passed
@dsikka dsikka deleted the kylesayrs/serialize-tconfig branch August 11, 2025 18:13
dsikka added a commit that referenced this pull request Aug 12, 2025
@dsikka dsikka restored the kylesayrs/serialize-tconfig branch August 12, 2025 01:34
dsikka added a commit that referenced this pull request Aug 12, 2025
brian-dellabetta added a commit to vllm-project/llm-compressor that referenced this pull request Aug 13, 2025
## Purpose ##
* Enable offline spinquant-style transforms

## Prerequisites ##
* neuralmagic/compressed-tensors#370
* neuralmagic/compressed-tensors#412
* neuralmagic/compressed-tensors#414

## Changes ##
* Added `spinquant_example.py` to examples folder
* Added `SpinQuantModifier` which handles the construction of a
spinquant-style transform config

## Testing ##
* Added modifier serialization and correctness tests

## Evaluation ##
Using this branch, and [the original SpinQuant
code](https://github.com/facebookresearch/SpinQuant), we see very
similar results for `meta-llama/Llama-3.2-1B-Instruct` with W4A16
quantization. Results are equivalent in hf (in-memory vs serialized and
re-loaded), and very similar in vllm. The symmetric scales calculation
in `llm-compressor` is slightly different than original SpinQuant paper,
which uses the original GPTQ implementation. When this is swapped in,
results are consistent, with hadamard improving results on `gsm8k_llama`
and `arc_challenge_llama`:

Scheme | Impl | gsm8k | gsm8k_llama | arc_challenge_llama
-- | -- | -- | -- | --
Hadamard+W4A16 | LC | 0.2403 | 0.2835 | 0.5262
W4A16 | LC | 0.1964 | 0.1933 | 0.4781
Hadamard+W4A16 | LC+SQscales | 0.1721 | 0.2183 | 0.485
W4A16 | LC+SQscales | 0.207 | 0.1706 | 0.4498
Hadamard+W4A16 | SQ | 0.1736 | 0.2282 | 0.4807
W4A16 | SQ | 0.1986 | 0.1774 | 0.4489

To run LC+SQScales, change [this line in
CT](https://github.com/neuralmagic/compressed-tensors/blob/b2df366797b00330ec765f5891dde14e4cc74c9d/src/compressed_tensors/quantization/utils/helpers.py#L111)
from

```python
scales = max_val_pos / (float(bit_range) / 2)
```
to
```python
scales = max_val_pos / (float(bit_max))
```
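
To make the difference concrete, here is a small illustration for symmetric 4-bit weights, assuming the helper defines `bit_range = bit_max - bit_min` as in the linked file; the divisor changes from 7.5 to 7:

```python
# Illustration (assumed definitions): symmetric int4 range and the two scale formulas.
num_bits = 4
bit_min = -(2 ** (num_bits - 1))   # -8
bit_max = 2 ** (num_bits - 1) - 1  # 7
bit_range = bit_max - bit_min      # 15

max_val_pos = 1.0  # example per-group absolute-max weight value

lc_scale = max_val_pos / (float(bit_range) / 2)  # divides by 7.5
sq_scale = max_val_pos / float(bit_max)          # divides by 7, as in original SpinQuant/GPTQ

print(lc_scale, sq_scale)  # 0.1333... vs 0.1428...
```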

<details>
<summary>The following python script was used to generate these
results</summary>

Clone SpinQuant repo and paste this in the top-level directory:
```python
# coding=utf-8
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the license found in the
# LICENSE file in the root directory of this source tree.

import torch
from typing import Literal
import os

os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"

from torch import nn
import lm_eval

from transformers import LlamaForCausalLM, AutoTokenizer
import transformers
from train_utils.main import prepare_model
from train_utils.modeling_llama_quant import LlamaForCausalLM as LlamaForCausalLMQuant
from utils.hadamard_utils import random_hadamard_matrix, hadamard_matrix
from utils.process_args import process_args_ptq

# model_id = "meta-llama/Llama-3.1-8B-Instruct"
# model_id = "meta-llama/Llama-3.2-3B-Instruct"
model_id = "meta-llama/Llama-3.2-1B-Instruct"
dtype = torch.bfloat16


class RotateModule(nn.Module):
    def __init__(self, R_init):
        super(RotateModule, self).__init__()
        self.weight = nn.Parameter(R_init.to(torch.float32).to(torch.device("cuda")))

    def forward(self, x, transpose=False):
        if transpose:
            return x @ self.weight
        else:
            return self.weight @ x


def get_sq_model(
    r1r2: Literal["eye", "random-hadamard", "hadamard"],
    w_bits: Literal[4, 16],
    w_clip: bool = False,
) -> LlamaForCausalLMQuant:
    model_args, training_args, ptq_args = process_args_ptq()
    model_args.input_model = model_id
    if w_bits == 4:
        ptq_args.w_bits = 4
        ptq_args.w_groupsize = 128
        ptq_args.w_rtn = True  # if False, GPTQ is used
        ptq_args.w_clip = w_clip
    ptq_args.a_bits = 16
    ptq_args.k_bits = 16
    ptq_args.v_bits = 16

    print("=======ARGS=======", ptq_args)

    config = transformers.AutoConfig.from_pretrained(model_args.input_model)

    # Llama v3.2 specific: SpinQuant is not compatible with tie_word_embeddings; clone lm_head from embed_tokens
    process_word_embeddings = False
    if config.tie_word_embeddings:
        config.tie_word_embeddings = False
        process_word_embeddings = True

    model = LlamaForCausalLMQuant.from_pretrained(
        pretrained_model_name_or_path=model_args.input_model,
        config=config,
        torch_dtype=dtype,
        device_map="cuda",
    )

    if process_word_embeddings:
        model.lm_head.weight.data = model.model.embed_tokens.weight.data.clone()

    model = prepare_model(ptq_args, model)
    for param in model.parameters():
        param.requires_grad = False
    match r1r2:
        case "eye":
            R1 = torch.eye(model.config.hidden_size, device="cuda")
        case "random-hadamard":
            R1 = random_hadamard_matrix(model.config.hidden_size, "cuda")
        case _:
            R1 = hadamard_matrix(model.config.hidden_size, "cuda")
    model.R1 = RotateModule(R1)
    for i in range(model.config.num_hidden_layers):
        # Each head dim = 128 for Llama model
        match r1r2:
            case "eye":
                R2 = torch.eye(
                    model.config.hidden_size // model.config.num_attention_heads,
                    device="cuda",
                )
            case "random-hadamard":
                R2 = random_hadamard_matrix(
                    model.config.hidden_size // model.config.num_attention_heads, "cuda"
                )
            case _:
                R2 = hadamard_matrix(
                    model.config.hidden_size // model.config.num_attention_heads, "cuda"
                )
        model.model.layers[i].self_attn.R2 = RotateModule(R2)

    model.config.use_cache = False

    return model


def get_lc_model(
    r1r2: Literal["eye", "random-hadamard", "hadamard"], w_bits: Literal[4, 16]
) -> LlamaForCausalLM:
    from llmcompressor import oneshot
    from llmcompressor.modifiers.quantization import QuantizationModifier
    from llmcompressor.modifiers.transform import SpinQuantModifier

    model = LlamaForCausalLM.from_pretrained(
        pretrained_model_name_or_path=model_id,
        torch_dtype=dtype,
        device_map="cuda",
    )

    recipe = [
        SpinQuantModifier(
            rotations=[] if r1r2 == "eye" else ["R1", "R2"],
            transform_type="hadamard",
        )
    ]
    if w_bits == 4:
        recipe.append(
            QuantizationModifier(
                targets="Linear",
                scheme="W4A16",
                ignore=["lm_head"],
            )
        )

    oneshot(
        model=model,
        recipe=recipe,
        pipeline="datafree",
        log_dir=None,
    )

    return model


if __name__ == "__main__":
    for scales_impl in ["sq_min_hack", "lc_min_hack"]:
        for r1r2 in ["eye", "hadamard"]:
            for sq_lc in ["sq", "lc"]:
                w_bits = 4

                os.environ["SCALES_IMPL"] = scales_impl

                model = (
                    get_sq_model(r1r2=r1r2, w_bits=w_bits)
                    if sq_lc == "sq"
                    else get_lc_model(r1r2=r1r2, w_bits=w_bits)
                ).to("cuda")

                SAVE_DIR = model_id.split("/")[1] + f"-{scales_impl}-{r1r2}-w4a16"
                model.save_pretrained(SAVE_DIR, save_compressed=True)
                tokenizer = AutoTokenizer.from_pretrained(
                    model_id, trust_remote_code=True
                )
                tokenizer.save_pretrained(SAVE_DIR)

                del model
                del tokenizer
                torch.cuda.empty_cache()

                results = lm_eval.simple_evaluate(
                    # 1) hf in-memory
                    # model=lm_eval.models.huggingface.HFLM(
                    #     pretrained=model,
                    #     batch_size=32,
                    #     add_bos_token=False,
                    # ),
                    # 1/)
                    # 2) vllm serialized
                    model="vllm",
                    model_args={
                        "pretrained": SAVE_DIR,
                        "add_bos_token": False,
                        "dtype": "auto",
                        "max_model_len": 4096,
                        "gpu_memory_utilization": 0.5,
                        "enable_chunked_prefill": True,
                    },
                    # 2/)
                    # 3) hf serialized
                    # model="hf",
                    # model_args={
                    #     "pretrained": SAVE_DIR,
                    #     "add_bos_token": False,
                    #     "dtype": "auto",
                    # },
                    # device="cuda",
                    # 3/)
                    tasks=["gsm8k_llama", "gsm8k", "arc_challenge_llama"],
                    num_fewshot=8,
                    batch_size=32,
                    apply_chat_template=True,
                    fewshot_as_multiturn=True,
                )
                print(
                    f"RESULTS, {model_id} {sq_lc} R1R2 {r1r2} W_BITS {w_bits} SCALEIMPL {scales_impl}"
                )
                print(lm_eval.utils.make_table(results))
```
</details>


## Follow Ups ##
* Infer data free pipeline, even if a transform modifier is included
* Rotations R3 and R4
* Modify example to use GPTQ once basic evaluation has been performed

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Brian Dellabetta <[email protected]>
Co-authored-by: Kyle Sayers <[email protected]>
dsikka added a commit to vllm-project/llm-compressor that referenced this pull request Aug 14, 2025
## Purpose ##
* Enable quip-style transforms

## Prerequisites ##
* neuralmagic/compressed-tensors#370
* neuralmagic/compressed-tensors#412
* neuralmagic/compressed-tensors#414

## Changes ##
* Added `quip_example.py` to examples folder
* As made clear in the disclaimer, this example requires minimum
versions of compressed-tensors and transformers to run
* Added `QuIPModifier` which handles the construction of a quip-style
transform config (a rough usage sketch follows below)
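
By analogy with the SpinQuant recipe shown earlier in this thread, usage would look roughly like the sketch below; the exact `QuIPModifier` constructor arguments are assumptions and should be checked against `quip_example.py`.

```python
# Hedged sketch mirroring the SpinQuant recipe above; the QuIPModifier arguments are
# assumptions, not the confirmed API. See examples/quip_example.py for the real usage.
from transformers import AutoModelForCausalLM
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.modifiers.transform import QuIPModifier

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct", torch_dtype="auto", device_map="cuda"
)

recipe = [
    QuIPModifier(transform_type="random-hadamard"),
    QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

oneshot(model=model, recipe=recipe, pipeline="datafree")
model.save_pretrained("Llama-3.2-1B-Instruct-quip-w4a16", save_compressed=True)
```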

## Testing ##
* Added modifier serialization and correctness tests

## Evaluation ##
Evaluation performed by @brian-dellabetta 

Evals on Llama 3.2 1B with Quip (num_fewshot 8, limit 1000 to be
compatible with results
[here](https://github.com/vllm-project/llm-compressor/pull/1243/files#diff-bdc27f23c0dc2da352d5c83abdc0f267873edf4d36f88474038b975df75bd8c3R38-R64))
:

| Strat | gsm8k,strict | gsm8k_llama,strict |
|-|-|-|
| FP16 | .352 | .323 |
| Quip | .348 | .322 |
| W4A16 | .180 | .017 |
| Quip+W4A16 | .213 | .141 |

## Follow Ups ##
* Infer data free pipeline, even if a transform modifier is included
* Modify example to use GPTQ once basic evaluation has been performed

---------

Signed-off-by: Kyle Sayers <[email protected]>
Signed-off-by: Brian Dellabetta <[email protected]>
Co-authored-by: Brian Dellabetta <[email protected]>
Co-authored-by: Dipika Sikka <[email protected]>