Releases: wejoncy/QLLM
v0.2.3.1
What's Changed
- ci: add PyPI deploy stage with manual approval by @wejoncy in #174
- docs: update README for CUDA 13.0, Python 3.11-3.13 by @wejoncy in #175
- fix: parallel_download_decorator compatible with transformers >= 5 by @wejoncy in #176
- Bump version to 0.2.3.1 by @wejoncy in #177
Full Changelog: v0.2.3...v0.2.3.1
v0.2.3
What's Changed
- enhance vptq by @wejoncy in #158
- patch parallel download shard by @wejoncy in #159
- fix vptq cache dir and model name by @wejoncy in #160
- add log by @wejoncy in #161
- fix moe hessian by @wejoncy in #162
- [vptq] fix caching by @wejoncy in #163
- [dataset] fix torch load by @wejoncy in #164
- [vptq] support save and load by @wejoncy in #165
- fix local checkpoint loading by @wejoncy in #167
- reduce memory usage during repack by @ReinForce-II in #168
- fix: compatibility with transformers >= 5 and support non-llama models in chat plugin by @wejoncy in #171
- Bump version to 0.2.3 by @wejoncy in #172
- fix: CI build - ubuntu-22.04, MSVC setup for Windows by @wejoncy in #173
New Contributors
- @ReinForce-II made their first contribution in #168
Full Changelog: v0.2.2.1...v0.2.3
v0.2.2.1
v0.2.1
What's Changed
- support more AWQ models && fix ONNX kernel bug when g=-1 by @wejoncy in #138
- feat: support new quantization algorithm 'Vptq' by @wejoncy in #141
- vptq: polish vptq config by @wejoncy in #142
- bump to 0.2.1 by @wejoncy in #143
- fix package by @wejoncy in #144
- fix ci by @wejoncy in #145
- support auto dtype by @wejoncy in #146
- quick fix by @wejoncy in #147
- fix package name by @wejoncy in #148
Full Changelog: v0.2.0...v0.2.1
v0.2.0
v0.1.9.1
What's Changed
- add assert message && upgrade CI torch to 2.2.2 by @wejoncy in #124
- Update README.md by @wejoncy in #125
- fix version match errors by @wejoncy in #128
- add macro GENERAL_TORCH to get rid of OptionalCUDAGuard by @wejoncy in #129
- quick fix by @wejoncy in #130
- v0.1.9.1 by @wejoncy in #131
Full Changelog: v0.1.9...v0.1.9.1
v0.1.9
What's Changed
- Bump to 0.1.8 by @wejoncy in #109
- new autogptq config format && parallel load by @wejoncy in #110
- bugfix by @wejoncy in #111
- fix issue by @wejoncy in #113
- Fix #112 by @wejoncy in #114
- Fix typos by @emphasis10 in #115
- minor fix, attn_implementation by @wejoncy in #120
- Bump to 0.1.9 by @wejoncy in #121
- -allow-unsupported-compiler by @wejoncy in #122
New Contributors
- @emphasis10 made their first contribution in #115
Full Changelog: v0.1.8...v0.1.9
v0.1.8
v0.1.7.1
v0.1.7
What's Changed
- ort ops support in main branch with act_order by @wejoncy in #92
- support export hqq to onnx by @wejoncy in #93
- Bump to 0.1.7 by @wejoncy in #94
- improve .cpu() with non_blocking by @wejoncy in #95
- disable win in release by @wejoncy in #96
- refactor args by @wejoncy in #97
Full Changelog: v0.1.6...v0.1.7