Releases: MoonshotAI/checkpoint-engine

v0.2.2

11 Dec 09:33
089d185

What's Changed

  • fix: propagate remote exception traceback to parameter server by @SongXiaoXi in #59
  • misc: update README with environment variable instructions and vLLM version specified by @specture724 in #61
  • feat: reuse pin_memory when registering checkpoint by @specture724 in #56
  • feat: inplace pin memory for safetensors in /dev/shm/ by @specture724 in #58
  • feat: force unregister shared pin memory buffer supported by @specture724 in #62
  • feat: docs for force unregister by @specture724 in #63

Full Changelog: v0.2.1...v0.2.2

v0.2.1

24 Nov 13:29
279a908

What's Changed

  • [Hardware] broadcast support for Huawei Ascend NPU by @kip-cxj in #39
  • [Doc] add sglang usage document by @stmatengss in #45
  • Fix ValueError handling and add device type check by @HubertZhang in #47
  • Fix wrong device_type, refine documents in worker.py by @HubertZhang in #52
  • Expose uds in UpdateRequest by @HubertZhang in #49
  • fix: test_update.py failed because _get_physical_gpu_id doesn't get 'device_manager' argument by @specture724 in #53
  • fix: add log to hint to set NCCL_IB_HCA env when _get_my_rdma_device raise an assertion failure by @specture724 in #54
  • fix: force ps to quit when error occur during updating by @specture724 in #43
  • [Hardware] p2p support for Huawei Ascend NPU by @kip-cxj in #46
  • bugfix: reset global meta when gather meta by @HubertZhang in #57

Full Changelog: v0.2.0...v0.2.1

v0.2.0

30 Oct 02:27
a291782

Feature

See #25. We sped up the P2P implementation so that its speed is now close to that of broadcast. We also bind each GPU to its corresponding NUMA node to keep host-to-device (H2D) transfer speeds stable, which shortens the update duration. The updated test results follow.

| Model                                | Device Info  | GatherMetas | Update (Broadcast) | Update (P2P)     |
|--------------------------------------|--------------|-------------|--------------------|------------------|
| GLM-4.5-Air (BF16)                   | 8xH800 TP8   | 0.12s       | 3.47s (3.02GiB)    | 4.12s (3.02GiB)  |
| Qwen3-235B-A22B-Instruct-2507 (BF16) | 8xH800 TP8   | 0.33s       | 6.22s (2.67GiB)    | 7.10s (2.68GiB)  |
| DeepSeek-V3.1 (FP8)                  | 16xH20 TP16  | 1.17s       | 10.19s (5.39GiB)   | 11.80s (5.41GiB) |
| Kimi-K2-Instruct (FP8)               | 16xH20 TP16  | 1.33s       | 14.36s (5.89GiB)   | 17.49s (5.91GiB) |
| DeepSeek-V3.1 (FP8)                  | 256xH20 TP16 | 0.80s       | 11.33s (8.00GiB)   | 11.81s (8.00GiB) |
| Kimi-K2-Instruct (FP8)               | 256xH20 TP16 | 1.22s       | 16.04s (8.00GiB)   | 16.75s (8.00GiB) |
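
The NUMA binding mentioned above can be illustrated with a minimal sketch. This is not the project's implementation: the helper names (`parse_cpulist`, `numa_node_of_pci_device`, `bind_current_process_to_node`) are hypothetical, and the sketch assumes a Linux host where the GPU's PCI address is already known in sysfs form (for example `0000:3b:00.0`). It pins only CPU scheduling through `os.sched_setaffinity`; memory placement policy is a separate concern that this sketch does not handle.

```python
import os
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def numa_node_of_pci_device(pci_bus_id: str) -> int:
    """Read a PCI device's NUMA node from sysfs; -1 means the kernel does not know."""
    # Assumption: pci_bus_id is in sysfs form, e.g. "0000:3b:00.0".
    path = Path("/sys/bus/pci/devices") / pci_bus_id.lower() / "numa_node"
    return int(path.read_text().strip())


def bind_current_process_to_node(node: int) -> set[int]:
    """Restrict the calling process to the CPUs of one NUMA node (Linux only)."""
    cpulist = Path(f"/sys/devices/system/node/node{node}/cpulist").read_text()
    cpus = parse_cpulist(cpulist)
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus
```

A worker process that drives one GPU would call `numa_node_of_pci_device` with that GPU's PCI address and, if the result is not negative, pass it to `bind_current_process_to_node` before allocating pinned host buffers.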

Full Changelog: v0.1.3...v0.2.0

v0.1.3

14 Oct 08:10
8a60e65

What's Changed

  • fix register_files fastapi parameter parse error by @ruizhang1230 in #27
  • fix destroy process group error when using p2p update by @ruizhang1230 in #30
  • feat: support configurable gpu count and memory fraction by @zxpdemonio in #29

Full Changelog: v0.1.2...v0.1.3

v0.1.2

22 Sep 14:26
716c0da

Full Changelog: https://github.com/MoonshotAI/checkpoint-engine/commits/v0.1.2