Manopt for Hartree-Fock #500
Replies: 3 comments 5 replies
-
Thanks, this looks very interesting – and cool that you are giving Manopt a try! But to be honest, I am a mere beginner on the physics behind this; my knowledge starts once you have formulated the energy (and its gradient) as a function on a manifold. Here are a few things I noticed from the PR linked above and from discussions with both Michael from DFTK.jl and my student:
This might answer your first question, though such questions are super hard to impossible to answer in the generality in which they are phrased.

For your second question, I have no clue about your metric tensor – where it comes from or how it is used. Did you write a new metric on Grassmann in Manifolds.jl? And how a problem scales with time depends neither on Manifolds.jl nor on Manopt.jl but purely on your problem at hand, so I am not sure I can provide any insight here.

For your third question – I think I do not even understand the question, sorry. If you have a specific metric that is not the default one on Grassmann (cf. https://juliamanifolds.github.io/Manifolds.jl/stable/manifolds/grassmann/#ManifoldsBase.inner-Tuple{Grassmann,%20Any,%20Any,%20Any}), the embedding can still be the same, but you will simply not “inherit” a metric from the embedding and instead have to implement the inner product (and related functions) yourself. This might be a ramble in the completely wrong direction, since – again, sorry – I do not understand the question, but maybe it at least gives a bit of an insight into the right direction.
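If it helps, here is roughly what that could look like – a minimal sketch only, assuming the `MetricManifold` mechanism from Manifolds.jl; the name `FubiniStudyMetric` and the formula inside `inner` are placeholders, and depending on the solver further functions (e.g. converting a Euclidean to a Riemannian gradient) may be needed:

```julia
using LinearAlgebra, Manifolds

# Placeholder metric type – the name and the formula below are illustrative only.
struct FubiniStudyMetric <: RiemannianMetric end

d, n = 28, 7                       # example sizes
M = MetricManifold(Grassmann(d, n), FubiniStudyMetric())

# Provide the inner product for the new metric; solvers then use this for
# inner products and norms on M instead of the embedded (trace) metric.
function Manifolds.inner(
    ::MetricManifold{ℝ,<:Grassmann,FubiniStudyMetric}, p, X, Y
)
    return real(tr(X' * Y))        # placeholder: insert the physical metric here
end
```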
-
Hi!
Yes, definitely.
As Ronny wrote, it might be worthwhile to implement a new metric here. We can help with that.
This strongly depends on the retraction and vector transport methods that are used.
As far as I know we don't have that implemented.
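For reference, these can be selected via keyword arguments of the solver – a sketch, assuming the documented `quasi_Newton` interface, with `PolarRetraction`/`ProjectionTransport` purely as examples of choices available on Grassmann:

```julia
using Manifolds, Manopt

# M (the manifold), f (cost), grad_f (Riemannian gradient), and p0 (initial
# point) are assumed to be set up as in the minimal example linked below.
p_min = quasi_Newton(
    M, f, grad_f, p0;
    retraction_method = PolarRetraction(),
    vector_transport_method = ProjectionTransport(),
)
```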
-
Thanks a lot, I’ll try tweaking the settings a bit! I put my minimal working example at https://github.com/lukas-weber/hf-optim-test (energy and gradient evaluations are intentionally not very optimized in order to keep the example minimal and self-contained). I would also be very happy to read the master’s thesis (my e-mail is on my website, reachable from my profile).
-
Hi,
Inspired by the effort that is going into combining DFTK with Manopt and by a conversation at JuliaCon, I am running some comparisons to see if Manopt can speed up my Hartree-Fock calculations.
Some Background for Hartree-Fock
Here is a reformulation that may or may not be enjoyable for non-physicists. In Hartree-Fock, we want to minimize the Rayleigh quotient

$$E(Ψ) = \frac{Ψ^\dagger H Ψ}{Ψ^\dagger Ψ}$$

for a given operator $H$ under the constraint that $Ψ$ is a Slater determinant. $H$ and $Ψ$ live on the many-body Hilbert space $\mathcal{H}$. The Slater determinants $\mathrm{Slater}(\mathcal{H})$ are a subset of $\mathcal{H}$ (but not a subspace).
The dimension $\dim(\mathcal{H})=d^N$ is exponentially large in the number of electrons $N$, making it intractable to work with $\mathcal{H}$ directly. Instead, we can use a parameterization $ψ: ℝ^{n_\uparrow × d} × ℝ^{n_\downarrow × d} \to \mathrm{Slater}(\mathcal{H})$. Here $n_\uparrow$ and $n_\downarrow=N-n_\uparrow$ are the numbers of electrons with spin up and down, respectively.
This parameterization is very nice because there are simple analytic expressions for $E(ψ(x)) = f(G_\uparrow(x_\uparrow), G_\downarrow(x_\downarrow))$, where $G_i = x_i (x_i^\dagger x_i)^{-1} x_i^\dagger$, and for the inner product $ψ(x)^\dagger ψ(y) = \det(x_\uparrow^\dagger y_\uparrow) \det(x_\downarrow^\dagger y_\downarrow)$.
These expressions show that the parameterization is actually overcomplete, because $G$ is invariant under row permutations and rescalings. I think $ψ$ can be made bijective by considering the Grassmannian as the domain: $ψ': \mathrm{Gr}_{n_\uparrow}(ℝ^{d}) × \mathrm{Gr}_{n_\downarrow}(ℝ^{d}) \to \mathrm{Slater}(\mathcal{H})$.
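To make the formulas concrete, a small sketch (the sizes and the $d × n$ orientation of $x$ are assumptions for illustration; the text above writes the transposed convention):

```julia
using LinearAlgebra, Manifolds

d, n_up, n_dn = 28, 7, 7   # made-up sizes: basis dimension, electrons per spin

# The matrix G from above (x taken as d×n, columns = orbitals). Replacing x
# by x*A for any invertible A leaves G unchanged – the Grassmann invariance.
G(x) = x * inv(x' * x) * x'

# Overlap of two Slater determinants via the determinant formula above.
overlap(x_up, x_dn, y_up, y_dn) = det(x_up' * y_up) * det(x_dn' * y_dn)

# The (essentially) bijective domain is then a product of two Grassmannians:
M = ProductManifold(Grassmann(d, n_up), Grassmann(d, n_dn))
```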
Optimization
All in all, this seems like the perfect setting for Riemannian optimization, with a single catch: if I use the standard embedding of the Grassmannian in $ℝ^{n×d}$, will it not inherit the inner product $\operatorname{Tr}(x^\dagger y)$? But the natural metric (coming from physics) for the Slater determinants is the Fubini-Study metric based on the determinant inner product above.
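Concretely, the pullback of the Fubini-Study metric along $x \mapsto ψ(x)$ is (in its standard textbook form) the real part of the quantum geometric tensor,

$$g_{ij}(x) = \operatorname{Re}\left[\frac{(\partial_i ψ)^\dagger (\partial_j ψ)}{ψ^\dagger ψ} - \frac{\big((\partial_i ψ)^\dagger ψ\big)\big(ψ^\dagger \partial_j ψ\big)}{(ψ^\dagger ψ)^2}\right],$$

which is the “geometric tensor” that appears as a preconditioner below.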
For this reason, what I have done so far (and here my knowledge about manifolds is very limited, so it may be entirely wrong) is to use the geometric tensor of the parameterization as a preconditioner for an otherwise Euclidean optimizer, cf. the Optim.LBFGS call below.
This is very slow (inversion of the geometric tensor scales as $O((nd)^3)$ or so), but it is more reliable at finding the global minimum, which saves some time in the long run. It would be nice to replace it with Manopt, if I can get similar reliability.
So to test this, I did Hartree-Fock on the N₂ molecule with a bond separation of 2 Å, in the cc-pVDZ Gaussian basis set, starting from 10 fixed random points on the Grassmannian, using the following three methods:
Optim.LBFGS(P=geometric_tensor(x0), precondprep=(g,x)->copy!(g, geometric_tensor(x)))
Optim.LBFGS()
Manopt.quasi_Newton (standard settings)
The results are below:

[Figure: benchmark results for the three methods]
Next to each method is the fraction of runs that converged to the global minimum. While standard LBFGS() is quick, it only found the correct minimum once. The “preconditioned” version is better at finding the minimum, but when it does not, it gets completely stuck; I wonder if it runs into some ill-conditioned corner or if my “preconditioning” confuses the line search somehow. Manopt quasi_Newton with standard settings is as reliable as the preconditioned method, but when it reaches the global minimum, it does so more slowly than the other methods, both in iterations and in time.
I have the following questions: