0.0.9
- Lock MCG and MUL1 multipliers, no longer flag as experimental
- Switch to MCG codebook by default for new models (use --codebook 3inst for previous default)
- Add more calibration data
- Increase default calibration size to 250 rows (use --cal_rows 100 for previous default)
- Fix quantized cache for bsz > 1
- Fix kernel selection on A100
- A few more TP-related fixes
Full Changelog: v0.0.8...v0.0.9