I think the correct way to implement the constrained gradient in batch is to apply gradient rejection to each sample's gradient before reducing them into a single gradient.
I don't think we can take the mean of the gradients, do rejection on the mean, and get the same result: if rejection is only applied when a gradient conflicts with the constraint, the operation is nonlinear, so it doesn't commute with averaging.
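To make the difference concrete, here's a minimal numpy sketch. It assumes PCGrad-style conditional rejection (project out the constraint component only when the gradient conflicts with it, i.e. the dot product is negative); the constraint direction `c` and the two per-sample gradients are made-up values chosen so one conflicts and one doesn't:

```python
import numpy as np

def reject(g, c):
    # Conditional rejection: only project out the component along c
    # when g conflicts with the constraint direction (g . c < 0).
    d = g @ c
    if d < 0:
        g = g - (d / (c @ c)) * c
    return g

c = np.array([1.0, 0.0])                  # hypothetical constraint direction
g1 = np.array([-2.0, 1.0])                # conflicts: rejected to [0, 1]
g2 = np.array([3.0, 1.0])                 # no conflict: left unchanged

per_sample_then_mean = (reject(g1, c) + reject(g2, c)) / 2
mean_then_reject = reject((g1 + g2) / 2, c)

print(per_sample_then_mean)               # [1.5 1. ]
print(mean_then_reject)                   # [0.5 1. ]  -- mean had no conflict
```

Here the conflicting sample's component along `c` is removed in the per-sample version, but survives when you average first, because the mean gradient happens not to conflict. Note that if the rejection were unconditional (always project onto the constraint hyperplane), the operation would be linear and the two orders would agree; it's the conditional that breaks the equivalence.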
Any ideas?