Commit fd7d9e6
Fix MLX 1-token output, tracing to stderr, and shallow_clone perf
- Fix ConvSubsampling NHWC flatten order: transpose (T',F,C) → (T',C,F)
before flattening so the linear projection receives features in the same
order as PyTorch's NCHW layout. This was the root cause of the model
generating only 1 token on the MLX backend.
- Direct tracing output to stderr in both CLI and server binaries so
transcript text on stdout is not contaminated by log lines.
- Replace shallow_clone() CPU round-trip (to_vec_f32 + from_data_f32)
with mlx_array_set() for O(1) ref-counted sharing, eliminating the
~75s encoder construction overhead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 21a2bee commit fd7d9e6
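
The flatten-order fix in the first bullet can be illustrated with a small standalone sketch. It assumes a row-major buffer with the batch dimension dropped, and the function name `transpose_tfc_to_tcf` is hypothetical; it is not taken from the repository. The point is only that flattening (T', F, C) directly yields feature index `f * C + c`, while a linear projection trained on PyTorch's layout expects `c * F + f`.

```rust
/// Reorders a row-major (T', F, C) buffer into (T', C, F) order so that a
/// subsequent flatten to (T', C*F) places feature (c, f) at index c * F + f.
/// Hypothetical helper for illustration; not the repository's code.
fn transpose_tfc_to_tcf(x: &[f32], t: usize, f: usize, c: usize) -> Vec<f32> {
    assert_eq!(x.len(), t * f * c, "buffer length must equal T' * F * C");
    let mut out = vec![0.0f32; x.len()];
    for ti in 0..t {
        for fi in 0..f {
            for ci in 0..c {
                // Source index in channels-last order: (T', F, C).
                let src = (ti * f + fi) * c + ci;
                // Destination index in channels-first order: (T', C, F).
                let dst = (ti * c + ci) * f + fi;
                out[dst] = x[src];
            }
        }
    }
    out
}
```

With T' = 1, F = 2, C = 3 and each value encoded as `10 * c + f`, the channels-last buffer `[0, 10, 20, 1, 11, 21]` becomes `[0, 1, 10, 11, 20, 21]`, which is the order a channels-first flatten would produce.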
4 files changed, 11 insertions(+), 12 deletions(-)
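
The stdout/stderr separation described in the commit message's second bullet can be sketched with the standard library alone. The function `emit_result` and the log text are hypothetical. The actual change presumably configures the logging subscriber's writer (with the `tracing_subscriber` crate this is commonly done via `with_writer(std::io::stderr)`), but the commit's diff content is not shown here, so that remains an assumption.

```rust
use std::io::{self, Write};

/// Writes diagnostics to `log` and the transcript to `out`, so a consumer
/// reading `out` sees only transcript text.
/// Hypothetical helper for illustration; not the repository's code.
fn emit_result<O: Write, L: Write>(
    out: &mut O,
    log: &mut L,
    transcript: &str,
) -> io::Result<()> {
    // Diagnostic line: goes to the log stream (stderr in a real binary).
    writeln!(log, "INFO decoding finished ({} bytes)", transcript.len())?;
    // Payload: goes to the output stream (stdout in a real binary).
    writeln!(out, "{}", transcript)?;
    Ok(())
}
```

In a binary, calling `emit_result(&mut io::stdout(), &mut io::stderr(), text)` keeps a pipeline such as `cli input.wav > transcript.txt` free of log lines (the command name here is a placeholder).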
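
The difference between the old and new `shallow_clone()` behavior in the commit message's third bullet can be shown by analogy with `std::rc::Rc`. This is a sketch under stated assumptions: `ArrayHandle` is a hypothetical stand-in, not the MLX array type, and `Rc` only mimics the ref-counted sharing that the commit attributes to `mlx_array_set()`.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a backend array handle; not the real MLX type.
struct ArrayHandle {
    data: Rc<[f32]>,
}

impl ArrayHandle {
    fn from_data(values: &[f32]) -> Self {
        ArrayHandle { data: Rc::from(values) }
    }

    /// Round-trip copy: reads every element out and rebuilds a fresh buffer.
    /// Cost grows with the number of elements, O(n).
    fn round_trip_clone(&self) -> Self {
        let host: Vec<f32> = self.data.to_vec();
        ArrayHandle::from_data(&host)
    }

    /// Shared clone: only bumps the reference count, O(1).
    fn shallow_clone(&self) -> Self {
        ArrayHandle { data: Rc::clone(&self.data) }
    }

    /// True when both handles point at the same underlying buffer.
    fn shares_buffer_with(&self, other: &Self) -> bool {
        Rc::ptr_eq(&self.data, &other.data)
    }

    /// Number of live handles sharing this buffer.
    fn ref_count(&self) -> usize {
        Rc::strong_count(&self.data)
    }
}
```

Sharing is safe here because the buffer is immutable once wrapped. When weights are cloned many times while a model is being built, the per-clone cost of the round-trip version adds up, which is consistent with the ~75s construction overhead the commit reports removing.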