Releases: MoonshotAI/checkpoint-engine

v0.4.0

02 Feb 14:06
e906b46

What's Changed

  • refactor: use a shared TCPStore in ParameterServer and create ProcessGroup using PrefixStore by @HubertZhang in #82
  • feat: add StatelessProcessGroup to extend collective library by @kip-cxj in #66
  • feat: release ipc buffers before calling update_weights_from_ipc's post_hook by @HubertZhang in #84
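The first item above combines one shared store with per-group prefixes. As a rough illustration of why that works, here is a toy, pure-Python model of key-prefix namespacing over a single key-value store. It does not use torch or checkpoint-engine: `SharedStore` and `PrefixedView` are hypothetical stand-ins for the roles that `TCPStore` and `PrefixStore` play, and the group names and addresses are made up.

```python
class SharedStore:
    """In-memory stand-in for one shared key-value store (hypothetical)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def keys(self):
        return sorted(self._data)


class PrefixedView:
    """Wraps a store and prepends a fixed prefix to every key (hypothetical)."""

    def __init__(self, prefix, store):
        self._prefix = prefix
        self._store = store

    def set(self, key, value):
        self._store.set(f"{self._prefix}/{key}", value)

    def get(self, key):
        return self._store.get(f"{self._prefix}/{key}")


# One underlying store, two isolated namespaces on top of it.
shared = SharedStore()
group_a = PrefixedView("pg_a", shared)
group_b = PrefixedView("pg_b", shared)

# Both groups use the same logical key without overwriting each other.
group_a.set("rank0_addr", "10.0.0.1:29500")
group_b.set("rank0_addr", "10.0.0.2:29500")
```

Under these assumptions, each process group sees only its own keys, so several groups can be created over one store connection instead of opening a separate store per group.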

Full Changelog: v0.3.4...v0.4.0

v0.4.0-rc0

02 Feb 11:23
e906b46

Pre-release

What's Changed

  • refactor: use a shared TCPStore in ParameterServer and create ProcessGroup using PrefixStore by @HubertZhang in #82
  • feat: add StatelessProcessGroup to extend collective library by @kip-cxj in #66
  • feat: release ipc buffers before calling update_weights_from_ipc's post_hook by @HubertZhang in #84

Full Changelog: v0.3.4...v0.4.0-rc0

v0.3.4

28 Jan 12:34
15446dd

What's Changed

New Contributors

Full Changelog: v0.3.3...v0.3.4

v0.3.3

20 Jan 11:52
f6910d6

What's Changed

Full Changelog: v0.3.2...v0.3.3

v0.3.2

09 Jan 12:24
4a73109

What's Changed

  • fix p2p update error when disable_h2d_buffer is true by @ruizhang1230 in #76
  • fix: set current CUDA device in _inplace_pin_memory function by @SongXiaoXi in #77

Full Changelog: v0.3.1...v0.3.2

v0.3.1

05 Jan 09:00

Same as v0.3.1-rc0

v0.3.1-rc0

04 Jan 07:52

Pre-release

What's Changed

Full Changelog: v0.3.0-rc1...v0.3.1-rc0

v0.3.0

23 Dec 11:21
88370e2

What's Changed

  • feat: docs added for auto_pg, and auto_pg default set to True by @specture724 in #65
  • hotfix: add a switch to disable inplace pinning of tensors by @specture724 in #68
  • hotfix: inplace pin memory caused cudaErrorHostMemoryAlreadyRegistered by @specture724 in #69
  • fix: CUDA OOM encountered with store based barrier by @specture724 in #70

Full Changelog: v0.3.0-rc0...v0.3.0-rc1

v0.2.3

18 Dec 07:18

  • Disable "inplace pin memory" feature by default from 0.2.2, as it may cause issues

v0.3.0-rc0

11 Dec 09:48
baf6f61

Pre-release

What's Changed

  • fix: use tcp store_based_barrier to control p2p update synchronization by @specture724 in #51
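For readers unfamiliar with the idea, a store-based barrier can be sketched as a shared counter that every rank increments and then polls until it reaches the world size. The code below is a minimal single-process illustration using threads. It is not the checkpoint-engine implementation: `CounterStore`, `store_barrier`, and `run_ranks` are hypothetical names, and an in-memory counter stands in for a TCP store.

```python
import threading
import time


class CounterStore:
    """In-memory stand-in for a store with an atomic add (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def add(self, key, amount):
        # Atomically add to the counter and return the new value.
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + amount
            return self._counts[key]


def store_barrier(store, key, world_size, timeout=5.0, poll=0.001):
    """Block until world_size callers have arrived at `key` (single use)."""
    store.add(key, 1)  # announce arrival
    deadline = time.monotonic() + timeout
    # Adding 0 reads the current count without changing it.
    while store.add(key, 0) < world_size:
        if time.monotonic() > deadline:
            raise TimeoutError(f"barrier {key!r} timed out")
        time.sleep(poll)


def run_ranks(world_size):
    """Run world_size threads as ranks; record what each saw after the barrier."""
    store = CounterStore()
    arrived = []
    passed = []

    def worker(rank):
        time.sleep(0.01 * rank)  # stagger arrival times
        arrived.append(rank)
        store_barrier(store, "barrier/0", world_size)
        # Every rank must have arrived before any rank gets here.
        passed.append((rank, len(arrived)))

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return passed


results = run_ranks(4)
```

In this sketch each barrier key is used once; a reusable barrier would need a fresh key (or a generation counter) per synchronization point.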

Full Changelog: v0.2.2...v0.3.0-rc0