    optimizer.step()
```

Lastly, you can even combine the two approaches by considering multiple tasks and each element of
the batch independently. We call that Instance-Wise Multitask Learning (IWMTL).

```python
import torch
from torch.nn import Linear, MSELoss, ReLU, Sequential
from torch.optim import SGD

from torchjd.aggregation import Flattening, UPGradWeighting
from torchjd.autogram import Engine

shared_module = Sequential(Linear(10, 5), ReLU(), Linear(5, 3), ReLU())
task1_module = Linear(3, 1)
task2_module = Linear(3, 1)
params = [
    *shared_module.parameters(),
    *task1_module.parameters(),
    *task2_module.parameters(),
]

optimizer = SGD(params, lr=0.1)
mse = MSELoss(reduction="none")
weighting = Flattening(UPGradWeighting())
engine = Engine(shared_module, batch_dim=0)

inputs = torch.randn(8, 16, 10)  # 8 batches of 16 random input vectors of length 10
task1_targets = torch.randn(8, 16)  # 8 batches of 16 targets for the first task
task2_targets = torch.randn(8, 16)  # 8 batches of 16 targets for the second task

for input, target1, target2 in zip(inputs, task1_targets, task2_targets):
    features = shared_module(input)  # shape: [16, 3]
    out1 = task1_module(features).squeeze(1)  # shape: [16]
    out2 = task2_module(features).squeeze(1)  # shape: [16]

    # Compute the matrix of losses: one loss per element of the batch and per task
    losses = torch.stack([mse(out1, target1), mse(out2, target2)], dim=1)  # shape: [16, 2]

    # Compute the gramian (inner products between pairs of gradients of the losses)
    gramian = engine.compute_gramian(losses)  # shape: [16, 2, 2, 16]

    # Obtain the weights that lead to no conflict between reweighted gradients
    weights = weighting(gramian)  # shape: [16, 2]

    optimizer.zero_grad()
    # Do the standard backward pass, but weighted using the obtained weights
    losses.backward(weights)
    optimizer.step()
```

Note that here, because the losses form a matrix instead of a simple vector, we compute a
*generalized Gramian* and extract weights from it using a
[GeneralizedWeighting](https://torchjd.org/docs/aggregation/index.html#torchjd.aggregation.GeneralizedWeighting).

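For intuition, here is a small self-contained sketch of what such a generalized Gramian contains and
how it relates to an ordinary Gramian. The toy Jacobian and the `[batch, task, task, batch]` index
ordering are assumptions made for illustration, not a description of the library's internals; in the
example above, `Flattening(UPGradWeighting())` consumes this object and returns weights with the same
`[batch, task]` shape as the loss matrix, which is what `losses.backward(weights)` expects.

```python
import torch

# Toy dimensions: 4 batch elements, 2 tasks, 5 (flattened) shared parameters.
B, T, P = 4, 2, 5

# jacobian[i, k] stands for the gradient of the loss of batch element i and task k
# with respect to the shared parameters.
jacobian = torch.randn(B, T, P)

# Generalized Gramian: entry [i, k, l, j] is the inner product between the gradients
# of losses (i, k) and (j, l). Shape: [B, T, T, B].
gen_gramian = torch.einsum("ikp,jlp->iklj", jacobian, jacobian)

# Grouping the (batch, task) axes together recovers an ordinary [B*T, B*T] Gramian,
# i.e. the Gramian of the flattened Jacobian.
flat_jacobian = jacobian.reshape(B * T, P)
flat_gramian = gen_gramian.permute(0, 1, 3, 2).reshape(B * T, B * T)
assert torch.allclose(flat_gramian, flat_jacobian @ flat_jacobian.T, atol=1e-5)
```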
More usage examples can be found [here](https://torchjd.org/stable/examples/).

## Supported Aggregators and Weightings