Disable compilation for training_step, validation_step, etc. #21569
Replies: 1 comment 4 replies
Ran into this myself when trying to optimize my workflow. It can be frustrating when everything gets compiled and you only want specific methods. The issue stems from how Fabric handles the model setup. You can compile just `forward` before handing the model to Fabric:

```python
# Create model
model = MyModel()

# Compile only the forward method
model.forward = torch.compile(model.forward)

# Setup with Fabric
model, optimizer = fabric.setup(model, optimizer)
```

This approach directly compiles only the `forward` method. Another possible way, in case your logic gets even more complex, is to separate concerns: isolate the computation-heavy parts into `forward` and keep the dynamic logic in `training_step`.

Keep in mind, though, this solution might not cover special Fabric internals if they change in future updates. It's always good to check out their documentation or raise an issue if new behaviors arise. Let me know if this approach changes anything for your use case!
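The separation-of-concerns suggestion can be sketched in plain PyTorch; the `MyModel` class below is a hypothetical stand-in, not code from Fabric or this thread:

```python
import torch

class MyModel(torch.nn.Module):
    """Hypothetical model: the compile-friendly math lives in forward."""

    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        # Computation-heavy, static-shape code only -- safe to compile.
        return torch.relu(self.linear(x))

    def training_step(self, batch):
        # Dynamic, hard-to-compile logic (metrics, logging) stays here;
        # this method itself is never passed to torch.compile.
        out = self.forward(batch)
        return out.mean()

model = MyModel()
# Compile just the hot path; training_step is left untouched.
model.forward = torch.compile(model.forward)
```

Since `model.forward` is replaced on the instance, `training_step`'s call to `self.forward` goes through the compiled wrapper while its own Python logic keeps running eagerly.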
Hi! I am training a model with Fabric and would like to compile only the `forward` method of the model and skip compilation for `training_step`, `validation_step`, etc. The reason for this is that `training_step` contains hard-to-compile logic like metrics tracking and has dynamic inputs; `forward`, however, is safe to compile.

It seems that the forward-wrapping magic in Fabric automatically results in `training_step` being compiled as well, and I haven't managed to exclude `training_step` from compilation. I tried wrapping it in `@torch.compiler.disable()`, but that didn't work. Calling `model.training_step = torch.compiler.disable(model.training_step, recursive=False)` doesn't work if called before `fabric.setup`, and results in a recursion-depth-exceeded error if called after `fabric.setup`. Any hints on how to achieve this, or what the best practice is for this scenario?
Code to reproduce:
Output:
Here the compile limit is reached because `filename` is different in every training step.

What does work is calling `torch.compile` only on the forward:
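One way to verify that only `forward` is traced is to pass a custom backend to `torch.compile` that counts graph compilations. This is a pure-PyTorch sketch outside of Fabric; `MyModel` is a hypothetical stand-in:

```python
import torch

compiled_graphs = []

def counting_backend(gm, example_inputs):
    # Record every FX graph Dynamo hands us, then run it unoptimized.
    compiled_graphs.append(gm)
    return gm.forward

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

    def training_step(self, batch):
        # Runs as eager Python: only the inner forward call hits the backend.
        return self.forward(batch).mean()

model = MyModel()
model.forward = torch.compile(model.forward, backend=counting_backend)

loss = model.training_step(torch.ones(2, 4))
# compiled_graphs now holds the graph for forward; training_step was not traced
```

If `training_step` had been swept into compilation, its metric/logging code would show up as extra graphs (or graph breaks) in `compiled_graphs`.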
But then `fabric.setup` doesn't re-apply compilation, because the whole model was not compiled. This is at least my understanding.