This release brings major performance improvements to the CUDA backend. It adds a new Virtual Pool Memory Manager (VPMM) in openvm-cuda-common, which provides multi-stream memory management built on the CUDA driver's virtual memory APIs to avoid memory fragmentation. Several kernels in openvm-cuda-backend, including the quotient values and FRI reduced opening kernels, were also optimized for significant performance gains.
Added
- (CUDA common) New Virtual Pool memory manager (VPMM Spec) with multi-stream support, built on top of the CUDA Virtual Memory Management driver API
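The toy model below is a rough illustration of why a virtual-address pool avoids fragmentation. It is a hypothetical sketch: `VirtualPool`, `malloc`, `free_block` and the 2 MiB page size are assumptions, not the openvm-cuda-common API or the VPMM Spec. It makes no GPU calls and leaves out multi-stream ordering. Comments mark where the real CUDA driver calls (`cuMemAddressReserve`, `cuMemCreate`, `cuMemMap`, `cuMemSetAccess`, `cuMemUnmap`) would sit. The key idea is that physical pages stranded in scattered free ranges can be unmapped and remapped behind a fresh, contiguous virtual range.

```python
PAGE_SIZE = 2 << 20  # assumed 2 MiB granularity; real code would query cuMemGetAllocationGranularity


def round_up(n: int) -> int:
    """Round a request up to a whole number of pages."""
    return -(-n // PAGE_SIZE) * PAGE_SIZE


class VirtualPool:
    """Toy model of a virtual-address pool (hypothetical, not the openvm-cuda-common API)."""

    def __init__(self, reserved_bytes: int):
        self.reserved = reserved_bytes  # cuMemAddressReserve: large range, no physical memory yet
        self.virt_end = 0               # first virtual offset never handed out
        self.free = []                  # mapped but unused ranges, sorted: (offset, size)
        self.live = {}                  # offset -> size of live allocations
        self.idle_pages = 0             # physical pages unmapped but kept for reuse
        self.physical_pages = 0         # physical pages ever created (mapped + idle)

    def malloc(self, size: int) -> int:
        size = round_up(size)
        # 1. First fit among already-mapped free ranges.
        for i, (off, sz) in enumerate(self.free):
            if sz >= size:
                if sz == size:
                    self.free.pop(i)
                else:
                    self.free[i] = (off + size, sz - size)
                self.live[off] = size
                return off
        if self.virt_end + size > self.reserved:
            raise MemoryError("virtual reservation exhausted")
        # 2. No single free range fits: unmap every free range but keep its
        #    physical pages (cuMemUnmap without releasing the handle).
        for _, sz in self.free:
            self.idle_pages += sz // PAGE_SIZE
        self.free = []
        # 3. Back a fresh contiguous virtual span with idle pages first and create
        #    new pages only for the shortfall (cuMemCreate + cuMemMap + cuMemSetAccess).
        need = size // PAGE_SIZE
        reused = min(need, self.idle_pages)
        self.idle_pages -= reused
        self.physical_pages += need - reused
        off = self.virt_end
        self.virt_end += size
        self.live[off] = size
        return off

    def free_block(self, off: int) -> None:
        size = self.live.pop(off)
        merged = []
        # Insert the freed range and coalesce neighbours whose ranges touch.
        for o, s in sorted(self.free + [(off, size)]):
            if merged and merged[-1][0] + merged[-1][1] == o:
                merged[-1] = (merged[-1][0], merged[-1][1] + s)
            else:
                merged.append((o, s))
        self.free = merged
```

For example, allocate three 2 MiB blocks, then free the first and third, then request 4 MiB. A conventional allocator would need new memory for the request (five pages in total), because neither free block is large enough. In this model the two stranded pages are remapped behind one contiguous virtual range, so physical usage stays at three pages. Fragmentation costs only cheap virtual address space.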
Changed
- (CUDA common) Multi-arch build support
- (CUDA backend) Quotient values kernel optimization
- (CUDA backend) FRI reduced opening kernel optimization by removing bit reversal for better memory access patterns
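The hypothetical model below shows why dropping bit reversal can improve memory access. Each of the 32 threads in a warp reads one 4-byte BabyBear element, and the model counts how many 128-byte cache lines the warp touches. The line size, the one-element-per-thread layout and the helper names are assumptions for illustration. This is not the kernel changed in #135.

```python
ELEM_BYTES = 4       # one BabyBear element (31-bit prime field) stored in a u32
LINE_BYTES = 128     # assumed cache-line size for one coalesced warp access
WARP = 32


def bit_reverse(i: int, log_n: int) -> int:
    """Reverse the low log_n bits of i."""
    r = 0
    for _ in range(log_n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def warp_lines(log_n: int, bit_reversed: bool, warp_id: int = 0) -> int:
    """Count distinct cache lines touched when each thread reads one element."""
    threads = range(warp_id * WARP, (warp_id + 1) * WARP)
    # Natural order: thread t reads row t. Bit-reversed: thread t reads row rev(t).
    rows = [bit_reverse(t, log_n) if bit_reversed else t for t in threads]
    return len({(row * ELEM_BYTES) // LINE_BYTES for row in rows})
```

With `log_n = 20`, natural order keeps a warp inside a single 128-byte line. Bit-reversed indexing sends consecutive threads to rows `2^15` apart, so the warp touches 32 separate lines.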
What's Changed
- ci: add custom RunsOn runners by @jonathanpwang in #117
- ci: pin gpu image to cuda 12.9 by @jonathanpwang in #116
- feat(cuda): Virtual Pool Memory Manager by @gaxiom in #114
- feat: utility to generate SymbolicConstraintsDag statistics by @stephenh-axiom-xyz in #118
- fix(cuda): Stop using constant twiddle by @gaxiom in #119
- chore: use newer ami by @luffykai in #124
- chore(cuda): Ntt refactoring by @gaxiom in #123
- fix(cuda): NTT edge case by @gaxiom in #126
- feat: support multiple CUDA archs by @gaxiom in #127
- docs(readme): Fix Crate Docs link in README by @jonathanpwang in #129
- feat: Quotient evaluation optimization by @stephenh-axiom-xyz in #131
- chore: bump workspace to v1.2.1-rc.2 by @jonathanpwang in #132
- chore(cuda): universal babybear impl by @gaxiom in #130
- chore(cuda): CUDA_DEBUG by @gaxiom in #133
- chore: bump workspace to v1.2.1-rc.3 by @jonathanpwang in #134
- perf(cuda): Opener in natural order by @gaxiom in #135
- feat(cuda): Support multithreaded VPMM + bump up to 1.2.1-rc.4 by @gaxiom in #136
- fix(cuda): VPMM reallocate after auto-cleanup by @gaxiom in #138
- chore(cuda): auto-cleanup VPMM fix + multithreaded Ci + zero pages allow by @gaxiom in #139
- chore: bump workspace to v1.2.1-rc.5 by @jonathanpwang in #143
- chore: loosen tokio versioning by @jonathanpwang in #146
- feat(cuda): VPMM v3 by @gaxiom in #148
- fix: cost estimate unread variable by @jonathanpwang in #151
- fix: don't defrag what we already defraged by @gaxiom in #150
- fix(cuda): set device order by @gaxiom in #152
- chore(cuda): VPMM order changed by @gaxiom in #153
- release: v1.2.1 with updated changelog by @jonathanpwang in #156
Full Changelog: v1.2.0...v1.2.1