Conversation

@timmoon10
Collaborator

Description

This PR adds a basic usage guide for the op fuser and includes it in the autogenerated API docs.

It is ready as-is, but if reviews take a while I may expand it with a guide on creating custom fused ops.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

  • Add basic usage guide for op fuser
  • Include TE ops in autogenerated API docs
  • Debug TE ops docstrings

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@timmoon10 timmoon10 added the documentation Improvements or additions to documentation label Dec 3, 2025

greptile-apps bot commented Dec 3, 2025

Greptile Overview

Greptile Summary

This PR adds comprehensive documentation for the operation fuser API, a "bottom-up" alternative to Transformer Engine's monolithic modules that allows users to construct and fuse individual operations flexibly.

Key Changes

New Documentation (251 lines)

  • Created docs/examples/op_fuser/op_fuser.rst with detailed usage guide covering:
    • Motivation for the op fuser approach (flexibility vs. monolithic modules)
    • Basic usage with Sequential and FusibleOperation
    • Quantization workflows with FP8/FP4
    • Branching operations (AddExtraInput, MakeExtraOutput)
    • Implementation details (BasicOperation, FusedOperation, OperationFuser)
    • Common misconceptions (not a kernel compiler, not a graph compiler)
  • Includes three diagrams showing operation fusion examples

API Documentation Updates

  • Added "Operation fuser" section to docs/api/pytorch.rst with 26+ operations:
    • Container classes: Sequential, FusibleOperation
    • Basic operations: Linear, BasicLinear, LayerNorm, RMSNorm, Bias, etc.
    • Activation functions: GELU, SwiGLU, ReLU, GEGLU, etc.
    • Utility operations: Quantize, Reshape, AddExtraInput, MakeExtraOutput
    • Distributed operations: AllGather, AllReduce, ReduceScatter

Docstring Fixes (15 files)

  • Fixed RST formatting: single backticks (`) changed to double backticks (``) for inline code
  • Fixed hyperlink formatting: restored the space between link text and URL (e.g., `link <url>`__ instead of `link<url>`__)
  • Standardized boolean/None literals: `False` → ``False``, `None` → ``None``
  • Improved docstring structure and readability

Minor Fixes

  • Fixed .gitignore: changed *.DS_Store to .DS_Store (more accurate pattern)

Confidence Score: 5/5

  • This PR is completely safe to merge - it contains only documentation improvements with no code logic changes
  • Perfect confidence score because: (1) All changes are documentation-only - new RST guide, API reference updates, and docstring formatting fixes, (2) No functional code changes that could introduce bugs, (3) Docstring fixes improve documentation quality and RST rendering, (4) The new op fuser guide is comprehensive and well-structured with clear examples
  • No files require special attention - all changes are documentation improvements

Important Files Changed

File Analysis

  • docs/examples/op_fuser/op_fuser.rst (5/5): New comprehensive documentation guide for the op fuser API with examples and diagrams
  • docs/api/pytorch.rst (5/5): Added "Operation fuser" section with 26 fusible operations to the API documentation
  • .gitignore (5/5): Fixed pattern from *.DS_Store to .DS_Store
  • transformer_engine/pytorch/ops/basic/activation.py (5/5): Fixed RST hyperlink formatting (restored missing space between link text and URL) and improved docstring structure
  • transformer_engine/pytorch/ops/basic/basic_linear.py (5/5): Fixed docstring formatting: single to double backticks, False/None literal formatting

Sequence Diagram

sequenceDiagram
    participant User
    participant Sequential
    participant OperationFuser
    participant BasicOps
    participant FusedOps
    
    User->>Sequential: forward(input)
    Sequential->>Sequential: _make_module_groups()
    Sequential->>OperationFuser: __call__(input)
    OperationFuser->>OperationFuser: maybe_fuse_ops()
    OperationFuser->>BasicOps: Analyze fusion opportunities
    OperationFuser->>FusedOps: Create fused operations
    FusedOps->>BasicOps: fuser_forward()
    BasicOps-->>FusedOps: output
    FusedOps-->>OperationFuser: output
    OperationFuser-->>Sequential: output
    Sequential-->>User: output


timmoon10 and others added 2 commits December 2, 2025 22:03
Review suggestion from @greptile-apps

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: Tim Moon <[email protected]>

@greptile-apps greptile-apps bot left a comment


Additional Comments (1)

  1. transformer_engine/pytorch/ops/basic/activation.py, line 387 (link)

    syntax: Extra space before period.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

19 files reviewed, 1 comment


@pggPL pggPL self-requested a review December 17, 2025 12:47
@timmoon10
Collaborator Author

/te-ci core pytorch

Signed-off-by: Tim Moon <[email protected]>

@greptile-apps greptile-apps bot left a comment


No files reviewed, no comments


Comment on lines +40 to +54
At the most basic level, the operation fuser API involves two classes
in the ``transformer_engine.pytorch.ops`` submodule:

- ``FusibleOperation``: An abstract base class for tensor operations.
Examples include ``Linear``, ``LayerNorm``, and ``AllReduce``. It is
a subclass of ``torch.nn.Module``, so it can hold trainable
parameters and can be called to perform the operation's forward
pass.
- ``Sequential``: A container of modules in sequential order. It has a
very similar interface as ``torch.nn.Sequential``. If it contains
any ``FusibleOperation`` s, then it may attempt to fuse them in the
forward and backward passes.

Thus, using the operation fuser simply involves constructing
``FusibleOperation`` s and passing them into a ``Sequential``.
Member

Who is the intended audience of this documentation? On one hand it seems it is the user (since you show examples of how things could be written), on the other you also include the details of the implementation.

Comment on lines +151 to +153
This is an expert technique. Quantizer configurations can be quite
complicated, so the ``Quantize`` operation's quantizers may be
suboptimal.
Member

Not sure what that means - any examples?

Collaborator Author

@timmoon10 timmoon10 Jan 15, 2026

For MXFP8, it's not safe for the quantize op to produce an MXFP8Tensor with swizzled scales, since there's no way to know whether it will be consumed by a GEMM or by something else.

the block has been split into two sections, each with one branching
operation.

Implementation details
Member

Yeah, I think this file should be split into 2 (maybe 3) separate sections - one primarily user facing with the sections describing how to use sequential, maybe second one showing how to define your own fusion with a user-provided kernel, and then the third one showing those internal implementation details.

Comment on lines +246 to +251
- **The op fuser is not interchangeable with the monolithic TE
modules**: Modules like ``Linear``, ``LayerNormLinear``, and
``TransformerLayer`` support a wide range of features and advanced
workflows, which makes them challenging to decompose into simple
operations that work with the fuser. They are also carefully
hand-tuned to achieve maximum performance.
Member
We would like to get to the point where the sequential is the default, right? So while right now this is true, it may not be in the future.


Labels

2.12.0 documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants