
Conversation

@kctezcan
Contributor

Description

Issue Number

Closes #1587

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@kctezcan
Contributor Author

Some design choices to discuss:

  1. The latent space has the default dimensions of 12288x2048. The optimal predictor dimension is 384. We use a linear layer without bias at the entrance of the predictor to map 2048->384, and as the last layer another bias-free linear map of 384->2048. These dimensions are all config parameters (see the sketch after this list).
  2. We hardcoded 4 as the MLP factor (the factor for the hidden dimension in the MLP layers), as this is the optimal value from the literature.
  3. We have introduced new parameters only in the JEPA config: pred_xxx
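
For concreteness, below is a minimal PyTorch sketch of the predictor I/O described in point 1 (bias-free 2048->384 entrance and 384->2048 exit) with the MLP hidden factor of 4 from point 2. The class and argument names are illustrative only and the attention layers are left out; this is not the actual WeatherGenerator code.

import torch.nn as nn

class JEPAPredictorSketch(nn.Module):
    # Bias-free entrance/exit projections around a stack of small residual
    # MLP blocks; attention is omitted here to keep the sketch short.
    def __init__(self, dim_latent=2048, dim_pred=384, num_blocks=12, mlp_factor=4):
        super().__init__()
        self.proj_in = nn.Linear(dim_latent, dim_pred, bias=False)    # 2048 -> 384
        self.blocks = nn.ModuleList(
            nn.Sequential(
                nn.LayerNorm(dim_pred),
                nn.Linear(dim_pred, mlp_factor * dim_pred),           # hidden dim = 4 * 384
                nn.GELU(),
                nn.Linear(mlp_factor * dim_pred, dim_pred),
            )
            for _ in range(num_blocks)
        )
        self.proj_out = nn.Linear(dim_pred, dim_latent, bias=False)   # 384 -> 2048

    def forward(self, x):
        # x: (batch, num_tokens, dim_latent), e.g. num_tokens = 12288
        x = self.proj_in(x)
        for block in self.blocks:
            x = x + block(x)
        return self.proj_out(x)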

Contributor

@sophie-xhonneux left a comment

Broadly, I am happy to approve once the comments are addressed.

But I would like to discuss whether it makes sense for the global assimilation engine to be part of the encoder, or whether for JEPA we should instead use the AggregationEngine (global transformer without the masked tokens).

# ### Example validation and training config for student-teacher with JEPA
validation_config:
  losses:
    LossPhysical: {weight: 0.0, loss_fcts: [['mse', 0.8], ['mae', 0.2]]}
Contributor

can we remove this instead of setting the weight to 0?

Contributor Author

this file is removed with the new configs

training_mode: "student_teacher" # "masking", "student_teacher", "forecast"
target_and_aux_calc: "EMATeacher"
losses:
  LossPhysical: {weight: 0.0, loss_fcts: [['mse', 0.8], ['mae', 0.2]]}
Contributor

Same here

Contributor Author

this file is removed with the new configs

@kctezcan
Contributor Author

I renamed the new parameters related to the JEPA predictor from pred_ to sslpred_ because I realized there were already parameters called pred_ for the decoder (i.e. prediction heads)

forecast_att_dense_rate: 1.0
with_step_conditioning: True # False

sslpred_num_blocks: 12
Collaborator

Where would this go into the new config structure?

Collaborator

Option A would be the model config, but it's ultimately something specific to the JEPA loss term.


for _ in range(self.cf.sslpred_num_blocks):
    self.pred_blocks.append(
        MultiSelfAttentionHead(
Collaborator

We should have a transformer_block module for attention + MLP (can/should be in a separate PR)
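
For illustration, one possible shape for such a block, assuming a pre-norm layout; torch.nn.MultiheadAttention stands in for MultiSelfAttentionHead, whose real signature may differ:

import torch.nn as nn

class TransformerBlock(nn.Module):
    # Pre-norm transformer block: self-attention followed by an MLP,
    # each with a residual connection.
    def __init__(self, dim, num_heads, mlp_factor=4):
        super().__init__()
        self.norm_att = nn.LayerNorm(dim)
        self.att = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_factor * dim),
            nn.GELU(),
            nn.Linear(mlp_factor * dim, dim),
        )

    def forward(self, x):
        h = self.norm_att(x)
        x = x + self.att(h, h, h, need_weights=False)[0]   # attention + residual
        x = x + self.mlp(self.norm_mlp(x))                  # MLP + residual
        return x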

)
elif loss == "JEPA":
    self.latent_heads[loss] = LatentPredictionHead(
    self.latent_heads[loss] = TransformerPredictionHead(
Collaborator

Why is there no if-statement here to choose between LatentPredictionHead and TransformerPredictionHead? LatentPredictionHead should be renamed to LatentPredictionHeadMLP (and then I would also prefer LatentPredictionHeadTransformer).
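
A rough sketch of what that selection could look like; the stand-in class bodies and the head_type switch (e.g. a hypothetical sslpred_head_type config entry) are placeholders, not existing code:

import torch.nn as nn

# Stand-ins only: the real heads live in the WeatherGenerator code base
# and take different constructor arguments.
class LatentPredictionHeadMLP(nn.Module): ...
class LatentPredictionHeadTransformer(nn.Module): ...

def build_jepa_head(head_type):
    # head_type would come from the config, e.g. a (hypothetical) sslpred_head_type
    if head_type == "mlp":
        return LatentPredictionHeadMLP()
    if head_type == "transformer":
        return LatentPredictionHeadTransformer()
    raise ValueError(f"unknown JEPA head type: {head_type}")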

@sophie-xhonneux
Contributor

I cleaned up the PR a bit, but I didn't know how to open a PR against this PR, so I put it here: #1649

@clessig
Collaborator

clessig commented Jan 18, 2026

I cleaned up the PR a bit, but I didn't know how to open a PR against this PR, so I put it here: #1649

You need to open the PR against the repo Kerem is using (the MeteoSwiss fork of WeatherGenerator).

Successfully merging this pull request may close these issues: Implement transformer based predictors for JEPA
