v0.4.4: improved vLLM performance with on-device sampling disabled, speculation algorithm fix, PEFT update for GRPO
What's Changed
Inference
- vLLM searches local configuration files by @tengomucho in #1046
- Fix speculation algorithm by @dacorvo in #1047 (acceptance-loop sketch after this list)
- Simplify LLM inference modeling and use longer sequences in tests by @dacorvo in #1052
- Remove Inf1 support by @tengomucho in #1054
- Improve vLLM performance when on-device sampling is disabled by @dacorvo in #1055 (generation sketch after this list)
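
PR #1047 corrects the speculation (speculative decoding) algorithm. For context, here is a minimal sketch of the standard acceptance rule used in speculative decoding, where a draft model proposes tokens that the target model verifies; this is illustrative only, not the optimum-neuron implementation, and `draft_probs`, `target_probs`, and `draft_tokens` are hypothetical inputs:

```python
import random

def speculative_accept(draft_probs, target_probs, draft_tokens):
    """Standard speculative-decoding acceptance loop (illustrative).

    draft_probs[i][t] and target_probs[i][t] are the probabilities of
    token t at step i under the draft and target models; draft_tokens
    are the tokens proposed by the draft model (so q > 0 for each).
    Returns the accepted prefix of the proposed tokens.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p = target_probs[i][tok]  # target model probability of the draft token
        q = draft_probs[i][tok]   # draft model probability of the same token
        # Accept with probability min(1, p / q); on rejection, the full
        # algorithm resamples from the residual max(0, p - q) (omitted here).
        if random.random() < min(1.0, p / q):
            accepted.append(tok)
        else:
            break
    return accepted
```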
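
Several of the inference changes touch the vLLM integration (#1046, #1055). Below is a minimal offline-generation sketch using vLLM's public `LLM` and `SamplingParams` API; the model name is illustrative, and the Neuron-specific knobs these PRs tune (local config discovery, on-device sampling) are backend settings not exercised here:

```python
from vllm import LLM, SamplingParams

# Illustrative checkpoint; any model exported for the Neuron backend works.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", max_model_len=2048)

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```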
Training
- PEFT update for GRPO by @michaelbenayoun in #1044 (LoRA sketch after this list)
- Collective ops in `optimum/neuron/accelerate` by @michaelbenayoun in #1042 (all-reduce sketch after this list)
- Gradient checkpointing fix by @michaelbenayoun in #1043
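
PR #1044 updates PEFT support for GRPO training. As a hedged sketch of what PEFT-backed GRPO looks like, here is the upstream `trl` API with a `peft_config`; optimum-neuron's own trainer wrapper may differ, and the dataset, reward function, and hyperparameters below are toy values:

```python
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Toy dataset; GRPO expects a "prompt" column.
dataset = Dataset.from_dict({"prompt": ["Write a haiku about the sea."] * 8})

def reward_len(completions, **kwargs):
    # Hypothetical reward: prefer completions close to 50 characters.
    return [-abs(50 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative checkpoint
    reward_funcs=reward_len,
    args=GRPOConfig(
        output_dir="grpo-lora",
        per_device_train_batch_size=2,
        num_generations=2,  # completions sampled per prompt
    ),
    train_dataset=dataset,
    # LoRA adapter so only a small set of weights is trained (PEFT).
    peft_config=LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"),
)
trainer.train()
```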
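PR #1042 reworks collective ops in `optimum/neuron/accelerate`. To illustrate the kind of primitive involved, here is a generic gradient-averaging sketch built on `torch.distributed.all_reduce`; `average_gradients` is a hypothetical helper, not code from the PR, and it assumes a process group has already been initialized:

```python
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module) -> None:
    # Sum each gradient across workers, then divide by the world size.
    # Assumes dist.init_process_group(...) was called by the launcher.
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size
```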
Other
- Update pyproject.toml for uv by @michaelbenayoun in #1040
- chore: update pyproject.toml to fix inconsistencies by @tengomucho in #1050
Full Changelog: v0.4.3...v0.4.4