v0.3.0
What's Changed
- Test decoder with long context by @kcirred in #117
- Get criteria from DT artifact by @lupalby in #118
- Open gpt.json read-only to support parallel reading by @gpaulsen in #126
- Drive Paged Program Script enhancements by @JRosenkranz in #128
- [dpp] Eliminate pad_token_id from print output by @kcirred in #130
- Add the ability to enforce homogeneous program ids in prefill in DPP script by @JRosenkranz in #131
- Update test scripts to work with the 4-layer micro model by @JRosenkranz in #134
- Fix inference.py for batch size 1 symbolic SDPA by @JRosenkranz in #135
- Make limits more flexible by @ani300 in #138
- Allow specific user prompts in DPP script by @JRosenkranz in #139
- Fix warmup to match vLLM by @JRosenkranz in #141
- Add ability in DPP script to select one or many programs that satisfy min batch and min sequence requirements by @JRosenkranz in #137
- Fix paged generate with too much padding by @ani300 in #142
- Fix incorrect token count after sampling caused by clean_up_tokenization_spaces=True (the default) by @JRosenkranz in #143 (see the sketch after this list)
- Fix issue where program_id was an int when it should have been a string by @JRosenkranz in #144
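
As background for the tokenization entry (#143), the sketch below shows how clean_up_tokenization_spaces changes decoded text and can break token-count accounting; it assumes a Hugging Face transformers tokenizer, and the gpt2 model name and example string are illustrative only, not taken from the PR.

```python
# Minimal sketch, assuming a Hugging Face transformers tokenizer; the model
# name and example text are illustrative, not taken from #143.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

# Token ids as they might come out of sampling.
sampled_ids = tok("I do n't know .").input_ids

# With cleanup enabled, decode() rewrites spacing around punctuation and
# contractions, so re-encoding the decoded text may not reproduce sampled_ids.
cleaned_text = tok.decode(sampled_ids, clean_up_tokenization_spaces=True)
raw_text = tok.decode(sampled_ids, clean_up_tokenization_spaces=False)

print(len(sampled_ids), len(tok(cleaned_text).input_ids))  # counts can differ
print(len(sampled_ids), len(tok(raw_text).input_ids))      # counts should match
```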
Full Changelog: v0.2.3...v0.3.0