I was running a computation whose basis alone takes 80 GB of RAM, most of it used by the P matrices in the nonlocal term.
Since compute_stresses_cart computes the gradient with ForwardDiff, it needs 7x as much memory (the primal part plus 6 dual parts). Unfortunately, 7 x 80 GB = 560 GB was too much memory for me, so my process got killed. :(
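To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the whole footprint scales with the number of dual components carried at once, which is a simplification. The function name `forward_mode_memory` and the `chunk_size` parameter are hypothetical and only for illustration. ForwardDiff itself does support splitting a gradient into chunks, but whether compute_stresses_cart could be evaluated that way is an open question here.

```python
import math


def forward_mode_memory(primal_gb, n_directions, chunk_size):
    """Estimate peak memory and number of passes for a forward-mode gradient.

    Assumption (illustrative only): every array is stored once for the
    primal value plus once per dual component carried in the same pass.
    """
    if not 1 <= chunk_size <= n_directions:
        raise ValueError("chunk_size must be between 1 and n_directions")
    # Each pass propagates `chunk_size` directional derivatives at once.
    passes = math.ceil(n_directions / chunk_size)
    # Primal part plus `chunk_size` dual parts live in memory together.
    peak_gb = primal_gb * (1 + chunk_size)
    return peak_gb, passes


# 80 GB basis, 6 independent strain components as in the stress computation.
for chunk in (6, 3, 2, 1):
    peak, passes = forward_mode_memory(80, 6, chunk)
    print(f"chunk_size={chunk}: peak ~{peak} GB over {passes} pass(es)")
```

Under these assumptions, carrying one dual component at a time would peak at about 2 x 80 GB instead of 7 x 80 GB, at the cost of six passes that each redo the primal work. That is the memory versus run time trade-off in question.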
Could there be a way to reduce the memory usage of stress computations without sacrificing (too much) performance?