v0.4.4: improved vLLM performance with on-device sampling disabled, speculation algorithm fix, PEFT update for GRPO
What's Changed
Inference
- vLLM searches local configuration files by @tengomucho in #1046
- Fix speculation algorithm by @dacorvo in #1047 (acceptance-loop sketch after this list)
- Simplify LLM inference modeling and use longer sequences in tests by @dacorvo in #1052
- Remove Inf1 support by @tengomucho in #1054
- Improve vLLM performance when on-device sampling is disabled by @dacorvo in #1055 (generation sketch after this list)
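
PR #1047 corrects the speculation (speculative decoding) algorithm. For context, here is a minimal sketch of the standard acceptance rule used in speculative decoding, where a draft model proposes tokens that the target model verifies; this is illustrative only, not the optimum-neuron implementation, and `draft_probs`, `target_probs`, and `draft_tokens` are hypothetical inputs:

```python
import random

def speculative_accept(draft_probs, target_probs, draft_tokens):
    """Standard speculative-decoding acceptance loop (illustrative).

    draft_probs[i][t] and target_probs[i][t] are the probabilities of
    token t at step i under the draft and target models; draft_tokens
    are the tokens proposed by the draft model (so q > 0 for each).
    Returns the accepted prefix of the proposed tokens.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p = target_probs[i][tok]  # target model probability of the draft token
        q = draft_probs[i][tok]   # draft model probability of the same token
        # Accept with probability min(1, p / q); on rejection, the full
        # algorithm resamples from the residual max(0, p - q) (omitted here).
        if random.random() < min(1.0, p / q):
            accepted.append(tok)
        else:
            break
    return accepted
```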
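
Several of the inference changes touch the vLLM integration (#1046, #1055). Below is a minimal offline-generation sketch using vLLM's public `LLM` and `SamplingParams` API; the model name is illustrative, and the Neuron-specific knobs these PRs tune (local config discovery, on-device sampling) are backend settings not exercised here:

```python
from vllm import LLM, SamplingParams

# Illustrative checkpoint; any model exported for the Neuron backend works.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", max_model_len=2048)

params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```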
Training
- PEFT update for GRPO by @michaelbenayoun in #1044 (LoRA sketch after this list)
- Collective ops in `optimum/neuron/accelerate` by @michaelbenayoun in #1042 (all-reduce sketch after this list)
- Gradient checkpointing fix by @michaelbenayoun in #1043
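
PR #1044 updates PEFT support for GRPO training. As a hedged sketch of what PEFT-backed GRPO looks like, here is the upstream `trl` API with a `peft_config`; optimum-neuron's own trainer wrapper may differ, and the dataset, reward function, and hyperparameters below are toy values:

```python
from datasets import Dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# Toy dataset; GRPO expects a "prompt" column.
dataset = Dataset.from_dict({"prompt": ["Write a haiku about the sea."] * 8})

def reward_len(completions, **kwargs):
    # Hypothetical reward: prefer completions close to 50 characters.
    return [-abs(50 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # illustrative checkpoint
    reward_funcs=reward_len,
    args=GRPOConfig(
        output_dir="grpo-lora",
        per_device_train_batch_size=2,
        num_generations=2,  # completions sampled per prompt
    ),
    train_dataset=dataset,
    # LoRA adapter so only a small set of weights is trained (PEFT).
    peft_config=LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"),
)
trainer.train()
```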
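PR #1042 reworks collective ops in `optimum/neuron/accelerate`. To illustrate the kind of primitive involved, here is a generic gradient-averaging sketch built on `torch.distributed.all_reduce`; `average_gradients` is a hypothetical helper, not code from the PR, and it assumes a process group has already been initialized:

```python
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module) -> None:
    # Sum each gradient across workers, then divide by the world size.
    # Assumes dist.init_process_group(...) was called by the launcher.
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size
```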
Other
- Update pyproject.toml for uv by @michaelbenayoun in #1040
- chore: update pyproject.toml to fix inconsistencies by @tengomucho in #1050
Full Changelog: v0.4.3...v0.4.4