Commit 3544a0e
[Distributed] [model_free_ptq] Eliminate reindexing step via fine-grained parallelized partial reads (#2498)
## Purpose
Eliminates the `reindex_fused_weights` preprocessing step for microscale schemes (NVFP4, MXFP4) by enabling each shard to be processed independently, with full parallelism, even when fused weight sets (q/k/v, gate/up) span multiple shards.
## Approach
Instead of grouping shards together (which reduces parallelism), each per-shard process fetches only the specific fused partner tensors it needs from other shards via targeted partial safetensors reads, computes the fused global scale locally, and writes only its own output shard. No cross-process coordination or file locking is required.
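Once a shard has fetched its missing partners, it holds every member of the fused set, so it can compute the shared scale on its own and agree with any other shard that sees the same members. The sketch below assumes an NVFP4-style global scale, `FP8_E4M3_MAX * FP4_E2M1_MAX / amax`, with `amax` taken over all fused members. `fused_global_scale` is a hypothetical helper, not the project's API, and plain Python lists stand in for `torch` tensors so it runs anywhere.

```python
# Hypothetical sketch: one shared global scale for a fused set (q/k/v or gate/up).
# Assumes an NVFP4-style formula; plain lists stand in for torch tensors.

FP8_E4M3_MAX = 448.0  # largest finite float8 e4m3 value
FP4_E2M1_MAX = 6.0  # largest fp4 e2m1 value


def fused_global_scale(members: dict[str, list[float]]) -> float:
    """Return a single global scale shared by every member of a fused set."""
    # amax over *all* members, whether native to this shard or fetched from another
    amax = max(abs(v) for values in members.values() for v in values)
    if amax == 0.0:
        return 1.0  # assumption: fall back to 1.0 for an all-zero set
    return FP8_E4M3_MAX * FP4_E2M1_MAX / amax


# Shard 1 stores q_proj/k_proj and fetched v_proj; shard 2 stores v_proj and
# fetched q_proj/k_proj. Both see the same members, so both get the same scale.
shard1_view = {"q_proj": [1.0, -2.0], "k_proj": [0.5], "v_proj": [3.0]}
shard2_view = {"v_proj": [3.0], "q_proj": [1.0, -2.0], "k_proj": [0.5]}
scale1 = fused_global_scale(shard1_view)
scale2 = fused_global_scale(shard2_view)
```

Taking the maximum over the whole set is what keeps the result independent of which shard computes it, which is why no cross-process coordination is needed.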
## Changes
### `helpers.py`
Added `build_tensor_file_index()`, which reads `index.json` once at startup and builds a flat mapping of `tensor_name → resolved_file_path`. This gives each worker process an O(1) lookup to find which file contains any fused partner tensor, without re-scanning headers at runtime.
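A minimal sketch of such an index, assuming the standard Hugging Face `model.safetensors.index.json` layout with a top-level `"weight_map"` of tensor name to relative shard filename. The function body is a guess at the behavior described above, not the code from this commit.

```python
import json
import os
import tempfile


def build_tensor_file_index(index_path: str) -> dict[str, str]:
    """Map every tensor name to the absolute path of the shard that stores it.

    Hypothetical sketch: assumes a top-level "weight_map" in the index file.
    """
    with open(index_path, "r", encoding="utf-8") as f:
        weight_map = json.load(f)["weight_map"]
    model_dir = os.path.dirname(os.path.abspath(index_path))
    # Resolve each relative shard filename once, so workers get O(1) dict lookups.
    return {name: os.path.join(model_dir, fname) for name, fname in weight_map.items()}


# Usage: a two-shard model whose q/k/v set spans both shards.
demo_dir = tempfile.mkdtemp()
demo_index_path = os.path.join(demo_dir, "model.safetensors.index.json")
with open(demo_index_path, "w", encoding="utf-8") as f:
    json.dump(
        {
            "metadata": {},
            "weight_map": {
                "layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
                "layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
                "layers.0.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
            },
        },
        f,
    )
tensor_file_index = build_tensor_file_index(demo_index_path)
```

Building the dictionary once and handing it to every worker avoids each worker opening shard headers just to locate a partner.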
### `process.py`
Updated `process_file_microscale_scheme()` with an optional `tensor_file_index` parameter. When it is provided:
- `_fetch_fused_partners()` identifies any fused-set members missing from the current shard, then fetches only those tensors via partial safetensors reads (headers and target tensors only, not full files)
- The fused global scale is computed locally from all members of the fused set
- `_belongs_to_shard()` ensures that only native tensors are written to the output shard; fetched partner tensors are used only for scale computation and are never written to the wrong shard
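A partial read is cheap because a safetensors file starts with an 8-byte little-endian header length, then a JSON header whose `data_offsets` are relative to the end of that header. Reading one tensor therefore means reading the header, seeking, and reading only that tensor's bytes. The stdlib sketch below illustrates the idea. `read_tensor_bytes`, `belongs_to_shard` (standing in for `_belongs_to_shard()`), and the tiny demo writer are hypothetical helpers, not the project's code, which relies on the `safetensors` library.

```python
import json
import os
import struct
import tempfile


def read_safetensors_header(path: str) -> tuple[dict, int]:
    """Read only the JSON header; return (header, byte offset where tensor data starts)."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64 little-endian length
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header, 8 + header_len


def read_tensor_bytes(path: str, name: str) -> bytes:
    """Partial read: fetch one tensor's raw bytes without reading the rest of the file."""
    header, data_start = read_safetensors_header(path)
    begin, end = header[name]["data_offsets"]  # offsets relative to data_start
    with open(path, "rb") as f:
        f.seek(data_start + begin)
        return f.read(end - begin)


def belongs_to_shard(name: str, shard_path: str, tensor_file_index: dict[str, str]) -> bool:
    """Hypothetical _belongs_to_shard: only tensors native to this shard are written out."""
    return tensor_file_index.get(name) == shard_path


def write_f32_safetensors(path: str, tensors: dict[str, list[float]]) -> None:
    """Tiny demo writer: 1-D float32 tensors, header padded to 8 bytes with spaces."""
    header, blobs, offset = {}, [], 0
    for name, values in tensors.items():
        blob = struct.pack(f"<{len(values)}f", *values)
        header[name] = {"dtype": "F32", "shape": [len(values)],
                        "data_offsets": [offset, offset + len(blob)]}
        blobs.append(blob)
        offset += len(blob)
    header_bytes = json.dumps(header).encode("utf-8")
    header_bytes += b" " * (-len(header_bytes) % 8)
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        for blob in blobs:
            f.write(blob)


# Usage: shard 1 needs v_proj for its fused scale, but v_proj lives in shard 2.
demo_dir = tempfile.mkdtemp()
shard1 = os.path.join(demo_dir, "model-00001-of-00002.safetensors")
shard2 = os.path.join(demo_dir, "model-00002-of-00002.safetensors")
write_f32_safetensors(shard2, {"mlp.down_proj.weight": [9.0] * 4,
                               "self_attn.v_proj.weight": [3.0, -4.0]})
index = {"self_attn.q_proj.weight": shard1, "self_attn.v_proj.weight": shard2}

raw = read_tensor_bytes(index["self_attn.v_proj.weight"], "self_attn.v_proj.weight")
v_proj = struct.unpack("<2f", raw)  # used for the fused scale only
write_v_proj = belongs_to_shard("self_attn.v_proj.weight", shard1, index)  # False: not native
```

Only the header and the 8 bytes of `v_proj` are read; the `down_proj` data is skipped entirely.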
### `__init__.py`
Simplified back to one job per shard, restoring full parallelism. For microscale schemes, builds the `tensor_file_index` once from `index.json` and passes it to each job. No union-find or grouping logic is needed.
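A sketch of the one-job-per-shard layout, assuming each job is just `(shard_path, tensor_file_index)`. `build_jobs` and `process_shard` are hypothetical stand-ins, and a thread pool is used only to keep the example self-contained; the real entrypoint may dispatch work differently.

```python
from concurrent.futures import ThreadPoolExecutor


def build_jobs(tensor_file_index: dict[str, str]) -> list[tuple[str, dict[str, str]]]:
    """One job per shard; every job shares the same read-only tensor_file_index."""
    shards = sorted(set(tensor_file_index.values()))
    return [(shard, tensor_file_index) for shard in shards]


def process_shard(shard: str, tensor_file_index: dict[str, str]) -> list[str]:
    """Stand-in for the per-shard worker: returns the tensors this job would write."""
    return sorted(name for name, path in tensor_file_index.items() if path == shard)


tensor_file_index = {
    "q_proj.weight": "shard-1.safetensors",
    "k_proj.weight": "shard-1.safetensors",
    "v_proj.weight": "shard-2.safetensors",
    "down_proj.weight": "shard-2.safetensors",
}
jobs = build_jobs(tensor_file_index)
with ThreadPoolExecutor() as pool:
    written = list(pool.map(lambda job: process_shard(*job), jobs))
```

Because no job depends on another, the number of concurrent jobs is bounded only by the shard count, rather than by the number of shard groups formed by union-find.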
### `validate.py`
Removed the `NotImplementedError` for cross-shard fused weights, since the case is now handled natively. It is replaced with a `logger.debug` message noting that partner tensors will be resolved via partial reads.
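A sketch of what that check might look like, assuming fused sets are identified by substituting projection names inside a tensor name. `FUSED_SETS` and `find_cross_shard_fused_sets` are hypothetical; the point is that a spanning set is logged at debug level instead of raising.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical fused sets; the first entry of each tuple is treated as the primary.
FUSED_SETS = [("q_proj", "k_proj", "v_proj"), ("gate_proj", "up_proj")]


def find_cross_shard_fused_sets(weight_map: dict[str, str]) -> list[str]:
    """Return primary tensor names whose fused set spans more than one shard."""
    spanning = []
    for name, shard in weight_map.items():
        for primary, *partners in FUSED_SETS:
            token = f".{primary}."
            if token not in name:
                continue
            partner_names = [name.replace(token, f".{p}.") for p in partners]
            shards = {shard} | {weight_map[p] for p in partner_names if p in weight_map}
            if len(shards) > 1:
                # Previously a NotImplementedError; now only noted for debugging.
                logger.debug("%s: fused set spans %d shards; partners will be "
                             "resolved via partial reads", name, len(shards))
                spanning.append(name)
    return spanning


weight_map = {
    "model.layers.0.self_attn.q_proj.weight": "shard-1.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "shard-1.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "shard-2.safetensors",
    "model.layers.0.mlp.gate_proj.weight": "shard-2.safetensors",
    "model.layers.0.mlp.up_proj.weight": "shard-2.safetensors",
}
spanning = find_cross_shard_fused_sets(weight_map)
```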
## Latest Updates: Eliminate reindexing step via `inverse_weights_map` with unified job signatures
## Approach
Each shard job receives a precomputed `inverse_weights_map` specifying exactly which tensors to load from which files. For cross-shard fused weights, only the shard owning the **primary** tensor (q_proj, gate_proj) fetches its partners, preventing double reads. All jobs share a unified signature for both standard and microscale schemes.
## Changes
### `microscale.py`
- Refactor `DEFAULT_FUSED_MAPPINGS` from a list of lists to `{primary_pattern: [partner_templates]}`, so only the shard owning the primary tensor fetches its partners, preventing double reads for cross-shard fused weights
- Move `build_inverse_weights_map()` here; it uses regex matches on primary patterns to construct partner names and locate them in other shards
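A sketch of the primary-pattern mapping and an inverse map builder. The regexes, the `assign_owners` helper, and the signature of `build_inverse_weights_maps` (which returns maps for all shards at once) are assumptions, not the code in `microscale.py`. The sketch reads "only the primary-owning shard fetches its partners" as: the primary's shard owns the whole fused set, so a partner's home shard skips it and every tensor is loaded by exactly one job.

```python
import re

# Hypothetical patterns: each primary regex captures a prefix reused by partner templates.
DEFAULT_FUSED_MAPPINGS = {
    r"^(?P<prefix>.+)\.q_proj\.weight$": ["{prefix}.k_proj.weight", "{prefix}.v_proj.weight"],
    r"^(?P<prefix>.+)\.gate_proj\.weight$": ["{prefix}.up_proj.weight"],
}


def assign_owners(weight_map: dict[str, str]) -> dict[str, str]:
    """Map each tensor to the shard whose job will load it."""
    owner = dict(weight_map)  # by default, a tensor belongs to the shard that stores it
    for name, shard in weight_map.items():
        for pattern, templates in DEFAULT_FUSED_MAPPINGS.items():
            match = re.match(pattern, name)
            if match is None:
                continue
            for template in templates:
                partner = template.format(**match.groupdict())
                if partner in weight_map:
                    owner[partner] = shard  # the primary's shard owns the fused set
    return owner


def build_inverse_weights_maps(weight_map: dict[str, str]) -> dict[str, dict[str, list[str]]]:
    """For each shard job: {source_file: [tensor names to read from that file]}."""
    owner = assign_owners(weight_map)
    maps = {shard: {} for shard in sorted(set(weight_map.values()))}
    for name, source_file in weight_map.items():
        maps[owner[name]].setdefault(source_file, []).append(name)
    return maps


weight_map = {
    "model.layers.0.self_attn.q_proj.weight": "shard-1",
    "model.layers.0.self_attn.k_proj.weight": "shard-2",
    "model.layers.0.self_attn.v_proj.weight": "shard-2",
    "model.layers.0.mlp.gate_proj.weight": "shard-2",
    "model.layers.0.mlp.up_proj.weight": "shard-1",
    "model.layers.0.mlp.down_proj.weight": "shard-1",
}
inverse_maps = build_inverse_weights_maps(weight_map)
```

Here the shard-1 job reads `k_proj` and `v_proj` from shard-2 because it owns `q_proj`, while the shard-2 job reads `up_proj` from shard-1 because it owns `gate_proj`. No tensor appears in two jobs.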
### `process.py`
- **Unified signature** for `validate_file`, `process_file`, and `process_file_microscale_scheme`: `(inverse_weights_map, save_path, scheme, ignore, device, converter)`
- All functions use `safe_open` + `f.get_tensor()` for true partial reads
- Partner tensors are re-saved into the requesting shard's output; the caller updates the safetensors index to reflect their new locations
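A sketch of why the shared signature matters, assuming a job is a tuple of `(function, *arguments)` so a validation job can reuse a processing job's arguments verbatim. Both functions below are hypothetical stand-ins that only report what they would do, and `"NVFP4"` is a placeholder for whatever scheme object the real code passes. The last loop shows the caller re-pointing relocated partner tensors at the output file.

```python
import os
from typing import Any, Callable, Optional

InverseWeightsMap = dict[str, list[str]]  # source file -> tensor names to read


def validate_file(inverse_weights_map: InverseWeightsMap, save_path: str, scheme: Any,
                  ignore: list[str], device: str, converter: Optional[Callable]) -> None:
    """Stand-in validator: fails early if a job has nothing to load."""
    if not any(inverse_weights_map.values()):
        raise ValueError(f"no tensors to load for {save_path}")


def process_file(inverse_weights_map: InverseWeightsMap, save_path: str, scheme: Any,
                 ignore: list[str], device: str, converter: Optional[Callable]) -> list[str]:
    """Stand-in processor: returns the tensor names it would save to save_path.

    A real implementation would read each listed tensor (e.g. via safetensors'
    safe_open/get_tensor), quantize it, and write everything to save_path.
    """
    return [name for names in inverse_weights_map.values() for name in names]


# A job is (function, *arguments); validation reuses the arguments unchanged.
job = (
    process_file,
    {"shard-1.safetensors": ["q_proj.weight"],
     "shard-2.safetensors": ["k_proj.weight", "v_proj.weight"]},
    "out/shard-1.safetensors",
    "NVFP4",
    [],
    "cpu",
    None,
)
validate_job = (validate_file, *job[1:])
validate_job[0](*validate_job[1:])

# After the job runs, the caller points every saved tensor at its new file.
weight_map = {
    "q_proj.weight": "shard-1.safetensors",
    "k_proj.weight": "shard-2.safetensors",
    "v_proj.weight": "shard-2.safetensors",
}
for name in job[0](*job[1:]):
    weight_map[name] = os.path.basename(job[2])
```

If a new argument is added to the job tuple later, `(validate_file, *job[1:])` picks it up without any change to the validation path.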
### `__init__.py`
- A single `_get_weights_map()` helper handles both single-file and multi-file models (reads `safetensors.index.json` or scans file headers via `safe_open`)
- A single `_build_quantization_jobs()` replaces the separate standard/microscale builders: one job per shard, with an identical tuple structure for both
- Validate jobs use `*job[1:]`, so they stay in sync with any future additions to the job tuple
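A stdlib sketch of the single-file/multi-file split, assuming the index file is named `model.safetensors.index.json` (the usual Hugging Face name) and that a model without it has one or more bare `*.safetensors` files whose headers list their tensors. `get_weights_map` is a hypothetical stand-in for `_get_weights_map()`, which uses `safe_open` rather than parsing headers by hand.

```python
import json
import os
import struct
import tempfile


def _read_header_names(path: str) -> list[str]:
    """List tensor names from a safetensors header without touching tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return [name for name in header if name != "__metadata__"]


def get_weights_map(model_dir: str) -> dict[str, str]:
    """Hypothetical _get_weights_map: tensor name -> shard filename."""
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    if os.path.exists(index_path):  # multi-file model: trust the index
        with open(index_path, "r", encoding="utf-8") as f:
            return dict(json.load(f)["weight_map"])
    weights_map = {}  # single-file (or unindexed) model: scan headers
    for fname in sorted(os.listdir(model_dir)):
        if fname.endswith(".safetensors"):
            for name in _read_header_names(os.path.join(model_dir, fname)):
                weights_map[name] = fname
    return weights_map


# Single-file model with no index: one float32 tensor, header padded to 8 bytes.
single_dir = tempfile.mkdtemp()
header = {"__metadata__": {"format": "pt"},
          "lm_head.weight": {"dtype": "F32", "shape": [1], "data_offsets": [0, 4]}}
header_bytes = json.dumps(header).encode("utf-8")
header_bytes += b" " * (-len(header_bytes) % 8)
with open(os.path.join(single_dir, "model.safetensors"), "wb") as f:
    f.write(struct.pack("<Q", len(header_bytes)))
    f.write(header_bytes)
    f.write(struct.pack("<f", 1.0))
single_file_map = get_weights_map(single_dir)

# Multi-file model: only the index is read.
sharded_dir = tempfile.mkdtemp()
with open(os.path.join(sharded_dir, "model.safetensors.index.json"), "w", encoding="utf-8") as f:
    json.dump({"metadata": {}, "weight_map": {"lm_head.weight": "model-00002-of-00002.safetensors"}}, f)
sharded_map = get_weights_map(sharded_dir)
```

Returning the same `{tensor_name: filename}` shape in both cases is what lets a single job builder serve single-file and sharded models alike.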
### `helpers.py`
- Removed `build_weights_map` and `build_inverse_weights_map` (moved to `microscale.py`)
### `validate.py`
- Removed `NotImplementedError` for cross-shard fused weights; they are now handled natively
- Updated to reflect the `inverse_weights_map`-based approach
## Testing
- `pytest tests/llmcompressor/entrypoints/model_free/`: all passing locally
- `make style && make quality`: all checks pass
Signed-off-by: David Zheng <dqzheng1996@gmail.com>
Closes #2497
Related to #2448
Parent commit: 31e585c
6 files changed (+664, -77 lines) under:
- `src/llmcompressor/entrypoints/model_free`
- `tests/llmcompressor/entrypoints/model_free`