[feat] implement record_stream when using CUDA streams during group offloading
#20059
pr_dependency_test.yml
on: pull_request
check_dependencies
17s