-
Notifications
You must be signed in to change notification settings - Fork 15k
[MLIR][Python] Add docstring for generated python op classes #158198
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-mlir-core Author: Twice (PragmaTwice) ChangesThis PR adds support in mlir-tblgen for generating docstrings for each Python class corresponding to an MLIR op. The docstrings are currently derived from the op’s description in ODS, with indentation adjusted to display nicely in Python. This makes it easier for Python users to see the op descriptions directly in their IDE or LSP while coding. In the future, we can extend the docstrings to include explanations for each method, attribute, and so on. This idea was previously discussed in the Full diff: https://github.com/llvm/llvm-project/pull/158198.diff 3 Files Affected:
diff --git a/mlir/test/mlir-tblgen/op-python-bindings.td b/mlir/test/mlir-tblgen/op-python-bindings.td
index 3ec69c33b4bb9..9f36d2b1ad0f5 100644
--- a/mlir/test/mlir-tblgen/op-python-bindings.td
+++ b/mlir/test/mlir-tblgen/op-python-bindings.td
@@ -252,6 +252,34 @@ def DeriveResultTypesVariadicOp : TestOp<"derive_result_types_variadic_op", [Fir
// CHECK: def derive_result_types_variadic_op(res, _gen_res_1, type_, *, loc=None, ip=None)
// CHECK: return _get_op_result_or_op_results(DeriveResultTypesVariadicOp(res=res, _gen_res_1=_gen_res_1, type_=type_, loc=loc, ip=ip))
+
+// CHECK: class DescriptionOp(_ods_ir.OpView):
+// CHECK: """
+// CHECK: This is a long description.
+// CHECK: It has multiple lines.
+// CHECK: A code block (to test the indent).
+// CHECK: ```mlir
+// CHECK: test.loop {
+// CHECK: test.yield
+// CHECK: }
+// CHECK: ```
+// CHECK: Add \"\"\" will not terminate the description.
+// CHECK: """
+def DescriptionOp : TestOp<"description"> {
+ let description = [{
+ This is a long description.
+ It has multiple lines.
+
+ A code block (to test the indent).
+ ```mlir
+ test.loop {
+ test.yield
+ }
+ ```
+ Add """ will not terminate the description.
+ }];
+}
+
// CHECK: @_ods_cext.register_operation(_Dialect)
// CHECK: class EmptyOp(_ods_ir.OpView):
// CHECK-LABEL: OPERATION_NAME = "test.empty"
diff --git a/mlir/test/python/ir/auto_location.py b/mlir/test/python/ir/auto_location.py
index 01b5542119b4e..24b4fb076afe0 100644
--- a/mlir/test/python/ir/auto_location.py
+++ b/mlir/test/python/ir/auto_location.py
@@ -51,7 +51,7 @@ def testInferLocations():
_cext.globals.register_traceback_file_inclusion(_arith_ops_gen.__file__)
three = arith.constant(IndexType.get(), 3)
# fmt: off
- # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":52:16 to :50) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4)))))
+ # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":52:16 to :50) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4)))))
# fmt: on
print(three.location)
@@ -60,14 +60,14 @@ def foo():
print(four.location)
# fmt: off
- # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4))))))
+ # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4))))))
# fmt: on
foo()
_cext.globals.register_traceback_file_exclusion(__file__)
# fmt: off
- # CHECK: loc("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235))
+ # CHECK: loc("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235))
# fmt: on
foo()
diff --git a/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp b/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
index 6a7aa9e3432d5..e45a142cf9d38 100644
--- a/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
+++ b/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
@@ -13,6 +13,7 @@
#include "OpGenHelpers.h"
+#include "mlir/Support/IndentedOstream.h"
#include "mlir/TableGen/GenInfo.h"
#include "mlir/TableGen/Operator.h"
#include "llvm/ADT/StringSet.h"
@@ -62,10 +63,11 @@ from ._{0}_ops_gen import _Dialect
/// Template for operation class:
/// {0} is the Python class name;
-/// {1} is the operation name.
+/// {1} is the operation name;
+/// {2} is the docstring for this operation.
constexpr const char *opClassTemplate = R"Py(
@_ods_cext.register_operation(_Dialect)
-class {0}(_ods_ir.OpView):
+class {0}(_ods_ir.OpView):{2}
OPERATION_NAME = "{1}"
)Py";
@@ -1034,9 +1036,31 @@ static void emitValueBuilder(const Operator &op,
}
}
+/// Retrieve the description of the given op and generate a docstring for it.
+static std::string makeDocStringForOp(const Operator &op) {
+ if (!op.hasDescription())
+ return "";
+
+ auto desc = op.getDescription().rtrim(" \t").str();
+ // Replace all """ with \"\"\" to avoid early termination of the literal.
+ desc = llvm::join(llvm::split(desc, R"(""")"), R"(\"\"\")");
+
+ std::string docString = "\n";
+ llvm::raw_string_ostream os(docString);
+ raw_indented_ostream identedOs(os);
+ os << R"( """)" << "\n";
+ identedOs.printReindented(desc, " ");
+ if (!StringRef(desc).ends_with("\n"))
+ os << "\n";
+ os << R"( """)" << "\n";
+
+ return docString;
+}
+
/// Emits bindings for a specific Op to the given output stream.
static void emitOpBindings(const Operator &op, raw_ostream &os) {
- os << formatv(opClassTemplate, op.getCppClassName(), op.getOperationName());
+ os << formatv(opClassTemplate, op.getCppClassName(), op.getOperationName(),
+ makeDocStringForOp(op));
// Sized segments.
if (op.getTrait(attrSizedTraitForKind("operand")) != nullptr) {
|
|
@llvm/pr-subscribers-mlir Author: Twice (PragmaTwice) ChangesThis PR adds support in mlir-tblgen for generating docstrings for each Python class corresponding to an MLIR op. The docstrings are currently derived from the op’s description in ODS, with indentation adjusted to display nicely in Python. This makes it easier for Python users to see the op descriptions directly in their IDE or LSP while coding. In the future, we can extend the docstrings to include explanations for each method, attribute, and so on. This idea was previously discussed in the Full diff: https://github.com/llvm/llvm-project/pull/158198.diff 3 Files Affected:
diff --git a/mlir/test/mlir-tblgen/op-python-bindings.td b/mlir/test/mlir-tblgen/op-python-bindings.td
index 3ec69c33b4bb9..9f36d2b1ad0f5 100644
--- a/mlir/test/mlir-tblgen/op-python-bindings.td
+++ b/mlir/test/mlir-tblgen/op-python-bindings.td
@@ -252,6 +252,34 @@ def DeriveResultTypesVariadicOp : TestOp<"derive_result_types_variadic_op", [Fir
// CHECK: def derive_result_types_variadic_op(res, _gen_res_1, type_, *, loc=None, ip=None)
// CHECK: return _get_op_result_or_op_results(DeriveResultTypesVariadicOp(res=res, _gen_res_1=_gen_res_1, type_=type_, loc=loc, ip=ip))
+
+// CHECK: class DescriptionOp(_ods_ir.OpView):
+// CHECK: """
+// CHECK: This is a long description.
+// CHECK: It has multiple lines.
+// CHECK: A code block (to test the indent).
+// CHECK: ```mlir
+// CHECK: test.loop {
+// CHECK: test.yield
+// CHECK: }
+// CHECK: ```
+// CHECK: Add \"\"\" will not terminate the description.
+// CHECK: """
+def DescriptionOp : TestOp<"description"> {
+ let description = [{
+ This is a long description.
+ It has multiple lines.
+
+ A code block (to test the indent).
+ ```mlir
+ test.loop {
+ test.yield
+ }
+ ```
+ Add """ will not terminate the description.
+ }];
+}
+
// CHECK: @_ods_cext.register_operation(_Dialect)
// CHECK: class EmptyOp(_ods_ir.OpView):
// CHECK-LABEL: OPERATION_NAME = "test.empty"
diff --git a/mlir/test/python/ir/auto_location.py b/mlir/test/python/ir/auto_location.py
index 01b5542119b4e..24b4fb076afe0 100644
--- a/mlir/test/python/ir/auto_location.py
+++ b/mlir/test/python/ir/auto_location.py
@@ -51,7 +51,7 @@ def testInferLocations():
_cext.globals.register_traceback_file_inclusion(_arith_ops_gen.__file__)
three = arith.constant(IndexType.get(), 3)
# fmt: off
- # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":52:16 to :50) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4)))))
+ # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":52:16 to :50) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4)))))
# fmt: on
print(three.location)
@@ -60,14 +60,14 @@ def foo():
print(four.location)
# fmt: off
- # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4))))))
+ # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4))))))
# fmt: on
foo()
_cext.globals.register_traceback_file_exclusion(__file__)
# fmt: off
- # CHECK: loc("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235))
+ # CHECK: loc("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235))
# fmt: on
foo()
diff --git a/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp b/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
index 6a7aa9e3432d5..e45a142cf9d38 100644
--- a/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
+++ b/mlir/tools/mlir-tblgen/OpPythonBindingGen.cpp
@@ -13,6 +13,7 @@
#include "OpGenHelpers.h"
+#include "mlir/Support/IndentedOstream.h"
#include "mlir/TableGen/GenInfo.h"
#include "mlir/TableGen/Operator.h"
#include "llvm/ADT/StringSet.h"
@@ -62,10 +63,11 @@ from ._{0}_ops_gen import _Dialect
/// Template for operation class:
/// {0} is the Python class name;
-/// {1} is the operation name.
+/// {1} is the operation name;
+/// {2} is the docstring for this operation.
constexpr const char *opClassTemplate = R"Py(
@_ods_cext.register_operation(_Dialect)
-class {0}(_ods_ir.OpView):
+class {0}(_ods_ir.OpView):{2}
OPERATION_NAME = "{1}"
)Py";
@@ -1034,9 +1036,31 @@ static void emitValueBuilder(const Operator &op,
}
}
+/// Retrieve the description of the given op and generate a docstring for it.
+static std::string makeDocStringForOp(const Operator &op) {
+ if (!op.hasDescription())
+ return "";
+
+ auto desc = op.getDescription().rtrim(" \t").str();
+ // Replace all """ with \"\"\" to avoid early termination of the literal.
+ desc = llvm::join(llvm::split(desc, R"(""")"), R"(\"\"\")");
+
+ std::string docString = "\n";
+ llvm::raw_string_ostream os(docString);
+ raw_indented_ostream identedOs(os);
+ os << R"( """)" << "\n";
+ identedOs.printReindented(desc, " ");
+ if (!StringRef(desc).ends_with("\n"))
+ os << "\n";
+ os << R"( """)" << "\n";
+
+ return docString;
+}
+
/// Emits bindings for a specific Op to the given output stream.
static void emitOpBindings(const Operator &op, raw_ostream &os) {
- os << formatv(opClassTemplate, op.getCppClassName(), op.getOperationName());
+ os << formatv(opClassTemplate, op.getCppClassName(), op.getOperationName(),
+ makeDocStringForOp(op));
// Sized segments.
if (op.getTrait(attrSizedTraitForKind("operand")) != nullptr) {
|
|
Hey good job! |
|
So this is great but I'm worried there could be other illegal characters in people's descriptions (and then this change would break them). For example: class DescriptionOp:
"""
This is a long description.
It has multiple lines.
A code block (to test the indent).
\`\`\`mlir
test.loop {
test.yield
}
\`\`\`
Add \"\"\" will not terminate the description.
c:\users\me\stuff.
"""
pass
print(DescriptionOp.__doc__)(ignore that I had to escape the backticks here for legal markdown). This produces On the otherhand using a raw docstring class DescriptionOp:
r"""
This is a long description.
It has multiple lines.
A code block (to test the indent).
\`\`\`mlir
test.loop {
test.yield
}
\`\`\`
Add \"\"\" will not terminate the description.
c:\users\me\stuff.
"""
pass
print(DescriptionOp.__doc__)prints So I think we should just be safe and emit raw docstrings. |
Co-authored-by: Maksim Levental <[email protected]>
mlir/test/python/ir/auto_location.py
Outdated
|
|
||
| # fmt: off | ||
| # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":397:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4)))))) | ||
| # CHECK: loc(callsite("ConstantOp.__init__"("{{.*}}[[SEP]]mlir[[SEP]]dialects[[SEP]]_arith_ops_gen.py":649:4 to :235) at callsite("testInferLocations.<locals>.foo"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":59:19 to :53) at callsite("testInferLocations"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":65:8 to :13) at callsite("run"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":13:4 to :7) at "<module>"("{{.*}}[[SEP]]test[[SEP]]python[[SEP]]ir[[SEP]]auto_location.py":26:1 to :4)))))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the line numbers are changed frequently as the tblgen code changed, which may introduce code conflicts. Maybe we can make it a regex like {{ \d+ }}? cc @makslevental
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i mean yes but then that defeats the purpose of verifying that the line numbers are correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually yea sure why not - for the line numbers in generated code we can do {{ \d+ }} (ain't no one gonna check those anyway lol).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm yeah you are right. Although practically we may just generate and then paste the line number here. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the generated line number it's fine to ignore - the line numbers that reference the test source file itself are good enough.
|
Thank you all! I'm going to merge it. |

This PR adds support in mlir-tblgen for generating docstrings for each Python class corresponding to an MLIR op. The docstrings are currently derived from the op’s description in ODS, with indentation adjusted to display nicely in Python. This makes it easier for Python users to see the op descriptions directly in their IDE or LSP while coding.
In the future, we can extend the docstrings to include explanations for each method, attribute, and so on.
This idea was previously discussed in the
#mlir-pythonchannel on Discord with @makslevental and @superbobry.