Skip to content

Conversation

@PragmaTwice
Copy link
Member

This PR adds support in mlir-tblgen for generating docstrings for each Python class corresponding to an MLIR op. The docstrings are currently derived from the op’s description in ODS, with indentation adjusted to display nicely in Python. This makes it easier for Python users to see the op descriptions directly in their IDE or LSP while coding.

In the future, we can extend the docstrings to include explanations for each method, attribute, and so on.

This idea was previously discussed in the #mlir-python channel on Discord with @makslevental and @superbobry.

@llvmbot
Copy link
Member

llvmbot commented Sep 12, 2025

@llvm/pr-subscribers-mlir-core

Author: Twice (PragmaTwice)

Changes

This PR adds support in mlir-tblgen for generating docstrings for each Python class corresponding to an MLIR op. The docstrings are currently derived from the op’s description in ODS, with indentation adjusted to display nicely in Python. This makes it easier for Python users to see the op descriptions directly in their IDE or LSP while coding.

In the future, we can extend the docstrings to include explanations for each method, attribute, and so on.

This idea was previously discussed in the #mlir-python channel on Discord with @makslevental and @superbobry.


Full diff: https://github.com/llvm/llvm-project/pull/158198.diff

3 Files Affected:

  • (modified) mlir/test/mlir-tblgen/op-python-bindings.td (+28)
  • (modified) mlir/test/python/ir/auto_location.py (+3-3)
  • (modified) mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp (+27-3)
diff --git a/mlir/test/mlir-tblgen/op-python-bindings.td b/mlir/test/mlir-tblgen/op-python-bindings.td
index 3ec69c33b4bb9..9f36d2b1ad0f5 100644
--- a/mlir/test/mlir-tblgen/op-python-bindings.td
+++ b/mlir/test/mlir-tblgen/op-python-bindings.td
@@ -252,6 +252,34 @@ def DeriveResultTypesVariadicOp : TestOp<"derive_result_types_variadic_op", [Fir
 // CHECK: def derive_result_types_variadic_op(res, _gen_res_1, type_, *, loc=None, ip=None)
 // CHECK:   return _get_op_result_or_op_results(DeriveResultTypesVariadicOp(res=res, _gen_res_1=_gen_res_1, type_=type_, loc=loc, ip=ip))
 
+
+// CHECK: class DescriptionOp(_ods_ir.OpView):
+// CHECK:   """
+// CHECK:   This is a long description.
+// CHECK:   It has multiple lines.
+// CHECK:   A code block (to test the indent).
+// CHECK:   ```mlir
+// CHECK:   test.loop {
+// CHECK:     test.yield
+// CHECK:   }
+// CHECK:   ```
+// CHECK:   Add \"\"\" will not terminate the description.
+// CHECK:   """
+def DescriptionOp : TestOp<"description"> {
+  let description = [{
+    This is a long description.
+    It has multiple lines.
+
+    A code block (to test the indent).
+    ```mlir
+    test.loop {
+      test.yield
+    }
+    ```
+    Add """ will not terminate the description.
+  }];
+}
+
 // CHECK: @_ods_cext.register_operation(_Dialect)
 // CHECK: class EmptyOp(_ods_ir.OpView):
 // CHECK-LABEL: OPERATION_NAME = "test.empty"
diff --git a/mlir/test/python/ir/auto_location.py b/mlir/test/python/ir/auto_location.py
index 01b5542119b4e..24b4fb076afe0 100644
--- a/mlir/test/python/ir/auto_location.py
+++ b/mlir/test/python/ir/auto_location.py
@@ -51,7 +51,7 @@ def testInferLocations():
         _cext.globals.register_traceback_file_inclusion(_arith_ops_gen.__file__)
         three = arith.constant(IndexType.get(), 3)
         # fmt: off
-        # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":52:16 to :50) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4)))))
+        # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":52:16 to :50) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4)))))
         # fmt: on
         print(three.location)
 
@@ -60,14 +60,14 @@ def foo():
             print(four.location)
 
         # fmt: off
-        # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4))))))
+        # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4))))))
         # fmt: on
         foo()
 
         _cext.globals.register_traceback_file_exclusion(__file__)
 
         # fmt: off
-        # CHECK: loc("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235))
+        # CHECK: loc("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235))
         # fmt: on
         foo()
 
diff --git a/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp b/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
index 6a7aa9e3432d5..e45a142cf9d38 100644
--- a/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
+++ b/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
@@ -13,6 +13,7 @@
 
 #include "OpGenHelpers.h"
 
+#include "mlir/Support/IndentedOstream.h"
 #include "mlir/TableGen/GenInfo.h"
 #include "mlir/TableGen/Operator.h"
 #include "llvm/ADT/StringSet.h"
@@ -62,10 +63,11 @@ from ._{0}_ops_gen import _Dialect
 
 /// Template for operation class:
 ///   {0} is the Python class name;
-///   {1} is the operation name.
+///   {1} is the operation name;
+///   {2} is the docstring for this operation.
 constexpr const char *opClassTemplate = R"Py(
 @_ods_cext.register_operation(_Dialect)
-class {0}(_ods_ir.OpView):
+class {0}(_ods_ir.OpView):{2}
   OPERATION_NAME = "{1}"
 )Py";
 
@@ -1034,9 +1036,31 @@ static void emitValueBuilder(const Operator &op,
   }
 }
 
+/// Retrieve the description of the given op and generate a docstring for it.
+static std::string makeDocStringForOp(const Operator &op) {
+  if (!op.hasDescription())
+    return "";
+
+  auto desc = op.getDescription().rtrim(" \t").str();
+  // Replace all """ with \"\"\" to avoid early termination of the literal.
+  desc = llvm::join(llvm::split(desc, R"(""")"), R"(\"\"\")");
+
+  std::string docString = "\n";
+  llvm::raw_string_ostream os(docString);
+  raw_indented_ostream identedOs(os);
+  os << R"(  """)" << "\n";
+  identedOs.printReindented(desc, "  ");
+  if (!StringRef(desc).ends_with("\n"))
+    os << "\n";
+  os << R"(  """)" << "\n";
+
+  return docString;
+}
+
 /// Emits bindings for a specific Op to the given output stream.
 static void emitOpBindings(const Operator &op, raw_ostream &os) {
-  os << formatv(opClassTemplate, op.getCppClassName(), op.getOperationName());
+  os << formatv(opClassTemplate, op.getCppClassName(), op.getOperationName(),
+                makeDocStringForOp(op));
 
   // Sized segments.
   if (op.getTrait(attrSizedTraitForKind("operand")) != nullptr) {

@llvmbot
Copy link
Member

llvmbot commented Sep 12, 2025

@llvm/pr-subscribers-mlir

Author: Twice (PragmaTwice)

Changes

This PR adds support in mlir-tblgen for generating docstrings for each Python class corresponding to an MLIR op. The docstrings are currently derived from the op’s description in ODS, with indentation adjusted to display nicely in Python. This makes it easier for Python users to see the op descriptions directly in their IDE or LSP while coding.

In the future, we can extend the docstrings to include explanations for each method, attribute, and so on.

This idea was previously discussed in the #mlir-python channel on Discord with @makslevental and @superbobry.


Full diff: https://github.com/llvm/llvm-project/pull/158198.diff

3 Files Affected:

  • (modified) mlir/test/mlir-tblgen/op-python-bindings.td (+28)
  • (modified) mlir/test/python/ir/auto_location.py (+3-3)
  • (modified) mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp (+27-3)
diff --git a/mlir/test/mlir-tblgen/op-python-bindings.td b/mlir/test/mlir-tblgen/op-python-bindings.td
index 3ec69c33b4bb9..9f36d2b1ad0f5 100644
--- a/mlir/test/mlir-tblgen/op-python-bindings.td
+++ b/mlir/test/mlir-tblgen/op-python-bindings.td
@@ -252,6 +252,34 @@ def DeriveResultTypesVariadicOp : TestOp<"derive_result_types_variadic_op", [Fir
 // CHECK: def derive_result_types_variadic_op(res, _gen_res_1, type_, *, loc=None, ip=None)
 // CHECK:   return _get_op_result_or_op_results(DeriveResultTypesVariadicOp(res=res, _gen_res_1=_gen_res_1, type_=type_, loc=loc, ip=ip))
 
+
+// CHECK: class DescriptionOp(_ods_ir.OpView):
+// CHECK:   """
+// CHECK:   This is a long description.
+// CHECK:   It has multiple lines.
+// CHECK:   A code block (to test the indent).
+// CHECK:   ```mlir
+// CHECK:   test.loop {
+// CHECK:     test.yield
+// CHECK:   }
+// CHECK:   ```
+// CHECK:   Add \"\"\" will not terminate the description.
+// CHECK:   """
+def DescriptionOp : TestOp<"description"> {
+  let description = [{
+    This is a long description.
+    It has multiple lines.
+
+    A code block (to test the indent).
+    ```mlir
+    test.loop {
+      test.yield
+    }
+    ```
+    Add """ will not terminate the description.
+  }];
+}
+
 // CHECK: @_ods_cext.register_operation(_Dialect)
 // CHECK: class EmptyOp(_ods_ir.OpView):
 // CHECK-LABEL: OPERATION_NAME = "test.empty"
diff --git a/mlir/test/python/ir/auto_location.py b/mlir/test/python/ir/auto_location.py
index 01b5542119b4e..24b4fb076afe0 100644
--- a/mlir/test/python/ir/auto_location.py
+++ b/mlir/test/python/ir/auto_location.py
@@ -51,7 +51,7 @@ def testInferLocations():
         _cext.globals.register_traceback_file_inclusion(_arith_ops_gen.__file__)
         three = arith.constant(IndexType.get(), 3)
         # fmt: off
-        # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":52:16 to :50) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4)))))
+        # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":52:16 to :50) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4)))))
         # fmt: on
         print(three.location)
 
@@ -60,14 +60,14 @@ def foo():
             print(four.location)
 
         # fmt: off
-        # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4))))))
+        # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4))))))
         # fmt: on
         foo()
 
         _cext.globals.register_traceback_file_exclusion(__file__)
 
         # fmt: off
-        # CHECK: loc("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235))
+        # CHECK: loc("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235))
         # fmt: on
         foo()
 
diff --git a/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp b/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
index 6a7aa9e3432d5..e45a142cf9d38 100644
--- a/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
+++ b/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
@@ -13,6 +13,7 @@
 
 #include "OpGenHelpers.h"
 
+#include "mlir/Support/IndentedOstream.h"
 #include "mlir/TableGen/GenInfo.h"
 #include "mlir/TableGen/Operator.h"
 #include "llvm/ADT/StringSet.h"
@@ -62,10 +63,11 @@ from ._{0}_ops_gen import _Dialect
 
 /// Template for operation class:
 ///   {0} is the Python class name;
-///   {1} is the operation name.
+///   {1} is the operation name;
+///   {2} is the docstring for this operation.
 constexpr const char *opClassTemplate = R"Py(
 @_ods_cext.register_operation(_Dialect)
-class {0}(_ods_ir.OpView):
+class {0}(_ods_ir.OpView):{2}
   OPERATION_NAME = "{1}"
 )Py";
 
@@ -1034,9 +1036,31 @@ static void emitValueBuilder(const Operator &op,
   }
 }
 
+/// Retrieve the description of the given op and generate a docstring for it.
+static std::string makeDocStringForOp(const Operator &op) {
+  if (!op.hasDescription())
+    return "";
+
+  auto desc = op.getDescription().rtrim(" \t").str();
+  // Replace all """ with \"\"\" to avoid early termination of the literal.
+  desc = llvm::join(llvm::split(desc, R"(""")"), R"(\"\"\")");
+
+  std::string docString = "\n";
+  llvm::raw_string_ostream os(docString);
+  raw_indented_ostream identedOs(os);
+  os << R"(  """)" << "\n";
+  identedOs.printReindented(desc, "  ");
+  if (!StringRef(desc).ends_with("\n"))
+    os << "\n";
+  os << R"(  """)" << "\n";
+
+  return docString;
+}
+
 /// Emits bindings for a specific Op to the given output stream.
 static void emitOpBindings(const Operator &op, raw_ostream &os) {
-  os << formatv(opClassTemplate, op.getCppClassName(), op.getOperationName());
+  os << formatv(opClassTemplate, op.getCppClassName(), op.getOperationName(),
+                makeDocStringForOp(op));
 
   // Sized segments.
   if (op.getTrait(attrSizedTraitForKind("operand")) != nullptr) {

@makslevental
Copy link
Contributor

Hey good job!

@makslevental
Copy link
Contributor

So this is great but I'm worried there could be other illegal characters in people's descriptions (and then this change would break them). For example:

class DescriptionOp:
  """
  This is a long description.
  It has multiple lines.
  A code block (to test the indent).
  \`\`\`mlir
  test.loop {
    test.yield
  }
  \`\`\`
  Add \"\"\" will not terminate the description.
  c:\users\me\stuff.
  """
  pass

print(DescriptionOp.__doc__)

(ignore that I had to escape the backticks here for legal markdown).

This produces

/Users/mlevental/miniforge3/envs/iree/bin/python /Users/mlevental/dev_projects/llvm-project/demo.py 
  File "/Users/mlevental/dev_projects/llvm-project/demo.py", line 13
    """
       ^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 195-196: truncated \uXXXX escape

On the otherhand using a raw docstring

class DescriptionOp:
  r"""
  This is a long description.
  It has multiple lines.
  A code block (to test the indent).
  \`\`\`mlir
  test.loop {
    test.yield
  }
  \`\`\`
  Add \"\"\" will not terminate the description.
  c:\users\me\stuff.
  """
  pass

print(DescriptionOp.__doc__)

prints

  This is a long description.
  It has multiple lines.
  A code block (to test the indent).
  \`\`\`mlir
  test.loop {
    test.yield
  }
  \`\`\`
  Add \"\"\" will not terminate the description.
  c:\users\me\stuff.

So I think we should just be safe and emit raw docstrings.


# fmt: off
# CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4))))))
# CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4))))))
Copy link
Member Author

@PragmaTwice PragmaTwice Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems the line numbers are changed frequently as the tblgen code changed, which may introduce code conflicts. Maybe we can make it a regex like {{ \d+ }}? cc @makslevental

Copy link
Contributor

@makslevental makslevental Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i mean yes but then that defeats the purpose of verifying that the line numbers are correct?

Copy link
Contributor

@makslevental makslevental Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually yea sure why not - for the line numbers in generated code we can do {{ \d+ }} (ain't no one gonna check those anyway lol).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah you are right. Although practically we may just generate and then paste the line number here. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the generated line number it's fine to ignore - the line numbers that reference the test source file itself are good enough.

@PragmaTwice
Copy link
Member Author

Thank you all! I'm going to merge it.

@makslevental
Copy link
Contributor

Thank you all! I'm going to merge it.

image

@PragmaTwice PragmaTwice merged commit 10d0d95 into llvm:main Sep 16, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

mlir:core MLIR Core Infrastructure mlir

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants