Add TGLFNNukaeaTransportModel #1436

Merged
copybara-service[bot] merged 3 commits into main from tglfnn-ukaea on Oct 24, 2025

@theo-brown
Collaborator

@theo-brown theo-brown commented Aug 12, 2025

Implementing TGLF-NN-UKAEA in TORAX.

@theo-brown theo-brown force-pushed the tglfnn-ukaea branch 3 times, most recently from 9685829 to dba2acb Compare August 19, 2025 10:19
@theo-brown
Collaborator Author

theo-brown commented Aug 19, 2025

Initial tests, TGLFNN (solid) vs QLKNN (dashed)

  • All tests run by modifying test_iterhybrid_predictor_corrector.py.
  • Note TGLFNNukaea training data may not cover ITER domains, so take with a pinch of salt.
  • In TGLFNNukaea, VEXB_SHEAR input set to 0 - I wasn't sure what to do with this, as I believe the expression for ExB shear either needs an E-field or a rotation rate, neither of which are present in TORAX.
  • TGLFNNukaea solid lines, QLKNN dashed lines.

Te only

[Figure: tglfnn-vs-qlknn-Te]

Te, Ti

[Figure: tglfnn-vs-qlknn-Te-Ti]

Te, Ti, psi

[Figure: tglfnn-vs-qlknn-Te-Ti-psi]

Te, Ti, psi, ne

[Figure: tglfnn-vs-qlknn-Te-Ti-psi-ne]

Conclusions

  • Something wrong in calculation of particle diffusivities and convectivities from fluxes?
  • Thermal diffusivities seem pretty good, behaviour broadly matching QLKNN

@jcitrin
Collaborator

jcitrin commented Aug 20, 2025

In TGLFNNukaea, VEXB_SHEAR input set to 0 - I wasn't sure what to do with this, as I believe the expression for ExB shear either needs an E-field or a rotation rate, neither of which are present in TORAX.

This is fine for now. Rotation in TORAX is coming this Q3.

TGLFNNukaea solid lines, QLKNN dashed lines.

  1. No need for Te only. It doesn't add much beyond Te+Ti

  2. I think that the TGLF "transport barrier" at rho<0.4 (especially for Te+Ti+psi) is due to the negative magnetic shear. In QLKNN we reduce that impact with the avoid_big_negative_s flag, which is True by default, and I think this exacerbates the difference. From comparisons to higher-fidelity modelling we know that QuaLiKiz underrepresents slab modes, which should still play a role at low/negative magnetic shear; hence this choice, which is also made in QuaLiKiz itself. So either we implement something similar for TGLF and TGLFNN (if it's also relevant for TGLF), or we turn that flag off in QLKNN for the comparisons, or we just do the comparison for a case without negative magnetic shear.

  3. Particle transport. OMG. Yeah that doesn't look right
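
For readers unfamiliar with the flag in point 2, here is a minimal sketch of what a guard like avoid_big_negative_s could look like. Only the flag name comes from this thread; the function name, the use of the effective shear s - alpha, and the floor value of -0.2 are assumptions for illustration, not a statement of what QLKNN or TORAX actually do.

```python
def clamp_magnetic_shear(smag, alpha, avoid_big_negative_s=True, floor=-0.2):
    """Limit how negative the effective shear (smag - alpha) may become.

    Hypothetical helper: `floor` and the s - alpha form are assumed here.
    """
    if avoid_big_negative_s and (smag - alpha) < floor:
        # Replace smag so that the effective shear sits exactly at the floor.
        return alpha + floor
    return smag


# Strongly reversed shear gets clamped, moderate shear passes through.
clamped = clamp_magnetic_shear(smag=-1.0, alpha=0.3)
untouched = clamp_magnetic_shear(smag=0.5, alpha=0.3)
raw = clamp_magnetic_shear(smag=-1.0, alpha=0.3, avoid_big_negative_s=False)
```

With the flag off, the model sees the raw negative shear, which is one way the QLKNN side of the comparison could be made closer to what TGLFNN is given.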

@theo-brown theo-brown force-pushed the tglfnn-ukaea branch 3 times, most recently from 0c6356e to 7199b4e Compare August 27, 2025 10:52
@theo-brown
Collaborator Author

theo-brown commented Sep 1, 2025

We found an issue with the version of the NN we were using. Things are making a bit more sense now, I believe. chi profiles looking broadly good. Will look at De, Ve / particle fluxes when I next have a chance - as the NN outputs main ion flux I think there just needs to be some reweighting to get the electron flux.

@theo-brown
Collaborator Author

theo-brown commented Sep 18, 2025

iterhybrid_predictor_corrector, with multimachine TGLFNNukaea vs QLKNN.

Getting reasonable matches for χₑ, χᵢ in ρ > 0.5.

Mismatch for 0.2 < ρ < 0.5 looks like difference in turbulent modes.

Mismatch for ρ < 0.2 not particularly relevant due to patch transport?

Any thoughts @lorenzozanisi @fcasson @jcitrin?

t=0
Figure_1

t=5
Figure_1

@theo-brown
Collaborator Author

Progress update: been working on a toy case with @fcasson, we've now fixed various problems with the output normalisations and conversion to chis.

Next step is to make sure we're doing the right thing with particle transport.

Toy tokamak, initial condition, comparison between JETTO and TORAX
image

@theo-brown theo-brown force-pushed the tglfnn-ukaea branch 4 times, most recently from 0b8a368 to ee6cb6d Compare October 3, 2025 16:11
@theo-brown
Collaborator Author

Comparison with JETTO, toy case, including particle transport and pellet source:

Initial condition: shows exact match in chi profiles
image

Final condition: shows close match in T, n. Mismatch in chi_e is most likely due to solver differences (e.g. numerical temporal smoothing/damping due to theta method)
image
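
To illustrate the theta-method remark: for the linear test problem dy/dt = -lambda * y, one theta-method step multiplies y by an amplification factor that depends on theta and on z = lambda * dt. The sketch below compares that factor with the exact decay. The theta values and step sizes are arbitrary examples; the settings actually used by TORAX and JETTO in this comparison are not stated in this thread.

```python
import math


def theta_amplification(z, theta):
    """Per-step growth factor of the theta method for dy/dt = -lambda * y.

    z = lambda * dt; theta = 0 is explicit Euler, 0.5 is Crank-Nicolson,
    1 is implicit (backward) Euler.
    """
    return (1.0 - (1.0 - theta) * z) / (1.0 + theta * z)


def exact_amplification(z):
    """Exact decay over one step for the same test problem."""
    return math.exp(-z)


# A stiff mode (z = 10): implicit Euler damps it monotonically, while
# Crank-Nicolson flips its sign every step; neither matches the exact decay.
implicit_euler = theta_amplification(10.0, 1.0)
crank_nicolson = theta_amplification(10.0, 0.5)
exact = exact_amplification(10.0)
```

Two codes that differ in theta or in time step can therefore show different transient behaviour in fast-responding quantities even when the underlying transport model agrees.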

@theo-brown theo-brown marked this pull request as ready for review October 3, 2025 16:11
)
chi_i = -P_i / (
    core_profiles.n_i.face_value() * dT_i_drhon * geo.g1_over_vpr_face
)
Collaborator Author

The denormalisation of TGLF is slightly different to QuaLiKiz.

In particular, TGLF normalises everything with respect to the electron temperature and density, which means that chi_i can't be denormalised in the same way.

Due to the difficulties I had trying to track down bugs in this interface, I would strongly advocate for this sort of layout - denormalisation, conversion to power, conversion to chi - rather than what is done in QLKNN.

However, if there are any speed improvements that can be made by changing what is computed, they definitely take priority.
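
A minimal sketch of the three-stage layout described in this comment (denormalisation, conversion to power, conversion to chi), using plain Python scalars instead of TORAX arrays. The gyro-Bohm unit Q_GB = n_e T_e c_s (rho_s / a)^2 follows the electron-based TGLF convention as I understand it. The function names, the deuteron reference mass, and the use of a flux-surface area to turn a flux into a power are assumptions made for illustration; only the final chi expression mirrors the snippet under review.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # C
DEUTERON_MASS = 3.3435837768e-27  # kg, assumed reference mass


def gyrobohm_heat_flux(n_e, T_e, B, a, m_ref=DEUTERON_MASS):
    """Electron-based gyro-Bohm heat flux unit [W/m^2].

    n_e in m^-3, T_e in J, B in T, a (minor radius) in m.
    """
    c_s = math.sqrt(T_e / m_ref)  # sound speed built from T_e
    rho_s = m_ref * c_s / (ELEMENTARY_CHARGE * B)  # sound gyroradius
    return n_e * T_e * c_s * (rho_s / a) ** 2


def chi_from_power(P, n, dT_drhon, g1_over_vpr):
    """Invert P = -n * chi * dT/drhon * g1_over_vpr for chi [m^2/s]."""
    return -P / (n * dT_drhon * g1_over_vpr)


def ion_chi(q_i_norm, n_e, T_e, n_i, dT_i_drhon, B, a, area, g1_over_vpr):
    # 1. Denormalise: the ion flux is scaled by the *electron* gyro-Bohm unit.
    q_i = q_i_norm * gyrobohm_heat_flux(n_e, T_e, B, a)
    # 2. Convert the flux [W/m^2] to a power through the flux surface [W].
    P_i = q_i * area
    # 3. Convert to chi with the *ion* density and temperature gradient.
    return chi_from_power(P_i, n_i, dT_i_drhon, g1_over_vpr)
```

Keeping the three stages separate makes each intermediate quantity easy to print and compare against a reference code, which is the debugging benefit argued for above.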

)
# For stability, we also set purely diffusive transport at some minimum
# threshold of the temperature gradient
D_eff_mask &= abs(tglf_inputs.lref_over_lne) >= transport.An_min
Collaborator Author

The D/V splitting could plausibly be reused from QLK, I just found it easier writing it out when debugging.
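
For context, a scalar sketch of an effective D / effective V split of the kind discussed here: the particle flux is attributed entirely to diffusion when it runs down the density gradient and the gradient is large enough, and entirely to convection otherwise. Geometry factors are dropped, and the function and argument names are hypothetical; `gradient_min` plays the role of `transport.An_min` in the snippet above.

```python
def split_particle_flux(flux, n, dn_dr, normalised_gradient, gradient_min):
    """Return (D_eff, V_eff) such that flux = -D_eff * dn_dr + V_eff * n.

    normalised_gradient is positive for a peaked profile (dn_dr < 0).
    Exactly one of D_eff and V_eff is non-zero.
    """
    # Flux and gradient have the same sign: transport is down-gradient.
    down_gradient = (flux >= 0.0) == (normalised_gradient >= 0.0)
    # Near-flat profiles would make -flux / dn_dr blow up, so use convection.
    use_diffusion = down_gradient and abs(normalised_gradient) >= gradient_min
    if use_diffusion:
        return -flux / dn_dr, 0.0
    return 0.0, flux / n


# Outward flux on a peaked profile: pure effective diffusion.
d_eff, v_eff = split_particle_flux(1.0, 2.0, -4.0, 2.0, 0.05)
# Inward (up-gradient) flux on the same profile: pure effective convection.
d_pinch, v_pinch = split_particle_flux(-1.0, 2.0, -4.0, 2.0, 0.05)
```

In both branches the original flux is recovered from -D_eff * dn_dr + V_eff * n, so the split only changes how the solver sees the flux, not its value.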

Collaborator Author

@theo-brown theo-brown left a comment

Summarised some open questions / details. @jcitrin this is now ready for review :)

Thanks @fcasson for all your in-depth support in getting this over the line!

Some of these questions (e.g. memoisation and weights loading, which is currently the cause of the failing test) may also be worth revisiting with @hamelphi on fusion-surrogates. Currently fusion_surrogates[tglfnnukaea] requires installing PyTorch (!) because our weights are distributed as .pt or .onnx, and at the start of this process it was deemed preferable to reimplement the network in JAX (which means loading weights from PyTorch .pt files) rather than rely on ONNX loaders that may not have a reliable maintenance pathway.
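
As a rough illustration of what reimplementing a network outside PyTorch involves: PyTorch linear layers store their weight matrix with shape (out_features, in_features) and compute y = x W^T + b, so a port has to apply each row of W to the input. The sketch below does this with plain Python lists. The layer sizes, the ReLU activation, and the function names are assumptions; the actual TGLFNNukaea architecture is not described in this thread.

```python
def linear(x, weight, bias):
    """Apply y = x @ W.T + b with W stored as (out_features, in_features)."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


def mlp_forward(x, layers):
    """Run a list of (weight, bias) pairs, with ReLU between layers.

    ReLU is an assumed activation, used here only for illustration.
    """
    for index, (weight, bias) in enumerate(layers):
        x = linear(x, weight, bias)
        if index < len(layers) - 1:
            x = [max(0.0, value) for value in x]
    return x


# Toy weights standing in for tensors read from a checkpoint.
toy_layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[2.0, 1.0]], [0.5]),
]
toy_output = mlp_forward([3.0, 1.0], toy_layers)
```

Once the weights are available as plain arrays, nothing in the forward pass depends on PyTorch, which is why distributing them in a framework-neutral form would remove the install-time dependency.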

@theo-brown theo-brown requested a review from jcitrin October 3, 2025 16:26
Collaborator Author

Comparison with the base test

Figure_1

Collaborator

Yes that looks roughly as expected.

In general we see more impact of trapped electron drive in TGLF compared to QuaLiKiz. In lower-density reactor-relevant scenarios like the hybrid scenario, this will be more prevalent.

Intuitively, this means that while the case is ITG-dominated (see chi_i/chi_e > 1), the increased trapped electron drive should increase the density peaking as we move closer to the ITG-TEM boundary. This is indeed what is seen in TGLF.

The increased density peaking should slightly lower the ITG turbulence threshold, resulting in the lower Ti. The higher density due to peaking should result in more Ti-Te coupling, reducing Te/Ti, which is also what we see.

@theo-brown
Collaborator Author

As of this morning, a proposed new method for distributing the weights via a pip-installable package is under review in internal UKAEA repos. This would massively simplify loading here and in fusion_surrogates.

@theo-brown
Collaborator Author

theo-brown commented Oct 19, 2025

@jcitrin @hamelphi updated to reflect google-deepmind/fusion_surrogates#32. This fixes the problems with Git LFS, and also simplifies the dependency tree.

Tests are failing until fusion_surrogates PR is merged and new version is published to PyPI. Tests pass on a local install, with the version of fusion surrogates from google-deepmind/fusion_surrogates#32.
Lmk when it's in and I'll manually retrigger the tests.

@theo-brown theo-brown force-pushed the tglfnn-ukaea branch 4 times, most recently from 5ca0e5e to 0126a5c Compare October 23, 2025 11:06
Changes to TGLFBasedTransportModel
- Override _make_core_transport with TGLF-specific calculation
- Correct output normalisations
- Correct particle transport

Other changes:
- Update dependencies
Collaborator

@jcitrin jcitrin left a comment

Will start the internal review now.

Could we have a followup PR with documentation?

@jcitrin jcitrin added the copybara:import-manual Set when ready for copybara manual import label Oct 23, 2025
@theo-brown
Collaborator Author

Forgot about docs! Added now.

@jcitrin
Collaborator

jcitrin commented Oct 23, 2025

Looking at the pyproject.toml: it looks like tglfnnukaea is not installed by default. Wouldn't it be easier for users to have it installed by default? It may be a workhorse model, so users may be surprised if they don't have it from the get-go.

@hamelphi
Collaborator

Looking at the pyproject.toml: it looks like tglfnnukaea is not installed by default. Wouldn't it be easier for users to have it installed by default? It may be a workhorse model, so users may be surprised if they don't have it from the get-go.

Sounds good to me. We should keep in mind that this will add 60 MB to the deps. That said, the current default install is around 1 GB (the largest chunk being jaxlib at 300 MB), so it is relatively small.

We could also make it default for fusion_surrogates.

@theo-brown
Collaborator Author

Wouldn't it be easier for users to have it installed by default? It may be a workhorse model

Yep, not averse to this!

Before we do, it might be good to do some speed/performance testing, because I've found that some change over the last few commits has caused a big drop in performance. Could we have it as an optional dependency for the time being until it's a bit more stress-tested?

@jcitrin
Collaborator

jcitrin commented Oct 23, 2025

Yep, not averse to this! Before we do, it might be good to do some speed/performance testing, because I've found that some change over the last few commits has caused a big drop in performance. Could we have it as an optional dependency for the time being until it's a bit more stress-tested?

So shall we keep it as is for now, and then in a future PR make it default once the performance issues are solved?
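
While the model stays an optional dependency, user code may want to check for it before selecting it. Below is a minimal sketch of such a guard using only the standard library. The helper names are hypothetical, and the module name `tglfnn_ukaea` and the install hint are assumptions based on the package and extra names mentioned in this thread, not verified import paths.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def require_optional(module_name, install_hint):
    """Raise a readable error when an optional dependency is missing."""
    if not is_installed(module_name):
        raise ImportError(
            f"{module_name} is not installed. Try: {install_hint}"
        )


# Assumed names, for illustration only.
TGLFNN_MODULE = "tglfnn_ukaea"
TGLFNN_HINT = 'pip install "fusion_surrogates[tglfnnukaea]"'
tglfnn_available = is_installed(TGLFNN_MODULE)
```

Calling `require_optional(TGLFNN_MODULE, TGLFNN_HINT)` at model construction time would surface the missing extra immediately, rather than as an opaque import failure deep inside a simulation run.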

@theo-brown
Collaborator Author

Sounds good to me!

copybara-service bot pushed a commit that referenced this pull request Oct 23, 2025
--
f6eaa06 by Theo Brown <7982453+theo-brown@users.noreply.github.com>:

Add TGLFNNukaeaTransportModel

Changes to TGLFBasedTransportModel
- Override _make_core_transport with TGLF-specific calculation
- Correct output normalisations
- Correct particle transport

Other changes:
- Update dependencies

--
a07351b by Theo Brown <7982453+theo-brown@users.noreply.github.com>:

Switch to using fusion_surrogates==0.4.3 and tglfnn_ukaea Python package

--
3ffbc0f by Theo Brown <7982453+theo-brown@users.noreply.github.com>:

Add documentation

FUTURE_COPYBARA_INTEGRATE_REVIEW=#1436 from google-deepmind:tglfnn-ukaea 3ffbc0f
PiperOrigin-RevId: 823087120
@theo-brown
Collaborator Author

Timing comparison:

image

@hamelphi
Collaborator

hamelphi commented Oct 24, 2025

Thanks for the benchmark analysis.

I am not surprised that QLKNN_7_11 is faster than TGLFNN since it is a much smaller model. The specific model was picked with inference time as one of the metrics in mind. That also explains why GPUs do not help in that case, since I suppose the overhead is large compared to the inference time. GPUs could become more relevant for QLKNN with large batch inference.

On the flip side, I do expect TGLFNN predictions to be more accurate, or at least to correlate more closely with the base model.
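
For anyone repeating the timing comparison, a minimal sketch of a benchmark helper with warm-up runs, so that one-off costs such as JIT compilation are excluded from the measurement. This is a generic standard-library pattern, not the script used for the plot above; the function name and defaults are hypothetical. When timing JAX functions, the timed callable should also wait for the result (for example by calling `block_until_ready()` on the output), since dispatch is asynchronous.

```python
import statistics
import time


def benchmark(fn, *args, warmup=2, repeats=10):
    """Return the median wall-clock time of fn(*args) in seconds."""
    # Warm-up calls absorb compilation and cache effects.
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Toy usage with a cheap stand-in for a model call.
median_seconds = benchmark(sum, list(range(1000)))
```

Using the median rather than the mean keeps a single slow outlier from dominating the comparison between a small and a large model.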

copybara-service bot pushed a commit that referenced this pull request Oct 24, 2025
@copybara-service copybara-service bot merged commit 9b9504f into main Oct 24, 2025
18 of 19 checks passed
@jcitrin
Collaborator

jcitrin commented Oct 24, 2025

Congratulations @theo-brown, @lorenzozanisi! It's a great milestone having this in TORAX. Looking forward to using it! Rotation will come soon.

@lorenzozanisi
Collaborator

Thanks @theo-brown @hamelphi @fcasson and @jcitrin for pulling this together - excited to see the NNs in action within TORAX!
