Commit 8adf079

Merge OpenAI Triton commit 2a04155 (#3437)
This PR changes the Triton base from 27c8363 to 2a04155 (Feb 13). Pass rate: 98.11%. Please do not squash and merge this PR.
2 parents 766cab6 + caac355 commit 8adf079

65 files changed (+681, -398 lines)


RELEASE.md

Lines changed: 50 additions & 1 deletion

@@ -1,4 +1,53 @@
-# Release Process
+# Releasing Triton
+
+Triton releases provide a stable snapshot of the code base, encapsulated in a binary that can easily be consumed through PyPI. Additionally, releases are the points in time at which we, as the development team, can signal to the community which new features are available, what improvements have been made, and what upcoming changes may impact them (i.e. breaking changes).
+
+## Release Compatibility Matrix
+
+Following is the release compatibility matrix for Triton releases:
+
+| Triton version | Python version | Manylinux version |
+| --- | --- | --- |
+| 3.2.0 | >=3.9, <=3.13 | glibc 2.17+ x86-64 |
+| 3.1.0 | >=3.8, <=3.12 | glibc 2.17+ x86-64 |
+| 3.0.0 | >=3.8, <=3.12 | glibc 2.17+ x86-64 |
+| 2.3.1 | >=3.7, <=3.12 | glibc 2.17+ x86-64 |
+| 2.3.0 | >=3.7, <=3.12 | glibc 2.17+ x86-64 |
+| 2.2.0 | >=3.7, <=3.12 | glibc 2.17+ x86-64 |
+| 2.1.0 | >=3.7, <=3.11 | glibc 2.17+ x86-64 |
+| 2.0.0 | >=3.6, <=3.11 | glibc 2.17+ x86-64 |
+| 1.1.1 | >=3.6, <=3.9 | glibc 2.17+ x86-64 |
+| 1.1.0 | >=3.6, <=3.9 | glibc 2.17+ x86-64 |
+| 1.0.0 | >=3.6, <=3.9 | glibc 2.17+ x86-64 |
+
+## Release Cadence
+
+Following is the release cadence for 2024/2025. All future release dates below are tentative. Please note: patch releases are optional.
+
+| Minor version | Release branch cut | Release date | Patch release date |
+| --- | --- | --- | --- |
+| 3.5.0 | Sep 2025 | Oct 2025 | --- |
+| 3.4.0 | Jun 2025 | Jul 2025 | --- |
+| 3.3.0 | Feb/Mar 2025 | Apr 2025 | --- |
+| 3.2.0 | Dec 2024 | Jan 2025 | --- |
+| 3.1.0 | Jun 2024 | Oct 2024 | --- |
+| 3.0.0 | Jun 2024 | Jul 2024 | --- |
+| 2.3.0 | Dec 2023 | Apr 2024 | May 2024 |
+| 2.2.0 | Dec 2023 | Jan 2024 | --- |
+
+## Release Cherry-Pick Criteria
+
+After the branch cut, we finalize the release branch with clear criteria for which cherry-picks are allowed in. (A cherry-pick is the process of landing a PR on the release branch after the branch cut.) Cherry-picks are deliberately limited so that the team has sufficient time to complete a thorough round of testing on a stable code base.
+
+* Regression fixes that address a functional or performance regression against the most recent release (e.g. 3.2 for the 3.3 release)
+* Critical fixes for severe issues such as silent incorrectness, backwards incompatibility, crashes, deadlocks, or (large) memory leaks
+* Fixes to new features introduced in the most recent release (e.g. 3.2 for the 3.3 release)
+* Documentation improvements
+* Release-branch-specific changes (e.g. version identifier changes or CI fixes)
+
+Please note: **No feature work is allowed in cherry-picks**. All PRs considered for cherry-picking must be merged on trunk; the only exception is release-branch-specific changes. An issue for tracking cherry-picks to the release branch is created after the branch cut. **Only issues that have ‘cherry-picks’ in the issue tracker will be considered for the release.**
+
 # Intel Release Process
 
 Intel XPU Backend for Triton releases are aligned to the upstream `triton-lang/triton` project and to `PyTorch`. To make a release:

include/triton/Conversion/TritonGPUToLLVM/PatternTritonGPUOpToLLVM.h

Lines changed: 1 addition & 1 deletion

@@ -100,7 +100,7 @@ void populateSPMDOpToLLVMPattern(LLVMTypeConverter &typeConverter,
                                 PatternBenefit benefit);
 
 void populateFuncOpConversionPattern(LLVMTypeConverter &typeConverter,
-                                     RewritePatternSet &patterns, int numWarps,
+                                     RewritePatternSet &patterns,
                                      const TargetInfoBase &targetInfo,
                                      PatternBenefit benefit);

include/triton/Conversion/TritonGPUToLLVM/TypeConverter.h

Lines changed: 5 additions & 1 deletion

@@ -14,7 +14,11 @@ class TritonGPUToLLVMTypeConverter : public LLVMTypeConverter {
 public:
   using TypeConverter::convertType;
 
-  TritonGPUToLLVMTypeConverter(MLIRContext *ctx, LowerToLLVMOptions &option,
+  TritonGPUToLLVMTypeConverter(MLIRContext *ctx,
+                               const LowerToLLVMOptions &option,
+                               const TargetInfoBase &targetInfo,
+                               const DataLayoutAnalysis *analysis = nullptr);
+  TritonGPUToLLVMTypeConverter(MLIRContext *ctx,
                                const TargetInfoBase &targetInfo,
                                const DataLayoutAnalysis *analysis = nullptr);
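The converter thus gains an overload that takes the lowering options as a const reference alongside the target info. A hedged usage sketch of the new overload; the variable names are illustrative, not from this commit:

// Hypothetical call site for the new constructor overload.
mlir::LowerToLLVMOptions options(context);
TritonGPUToLLVMTypeConverter typeConverter(context, options, targetInfo);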

include/triton/Conversion/TritonToTritonGPU/TritonToTritonGPUPass.h

Lines changed: 0 additions & 6 deletions

@@ -12,12 +12,6 @@ template <typename T> class OperationPass;
 
 namespace triton {
 
-constexpr static char AttrNumWarpsName[] = "ttg.num-warps";
-constexpr static char AttrNumCTAsName[] = "ttg.num-ctas";
-constexpr static char AttrTargetName[] = "ttg.target";
-
-constexpr static char AttrNumThreadsPerWarp[] = "ttg.threads-per-warp";
-
 // Create the pass with numWarps passed from cl::opt.
 std::unique_ptr<OperationPass<ModuleOp>> createConvertTritonToTritonGPUPass();

include/triton/Dialect/Triton/IR/Dialect.h

Lines changed: 1 addition & 1 deletion

@@ -91,7 +91,7 @@ class DialectVerifyTensorLayoutInterface
   DialectVerifyTensorLayoutInterface(Dialect *dialect) : Base(dialect) {}
 
   virtual LogicalResult
-  verifyTensorLayout(Attribute layout, RankedTensorType type, ModuleOp module,
+  verifyTensorLayout(Attribute layout, RankedTensorType type, Operation *op,
                      function_ref<InFlightDiagnostic()> emitError) const = 0;
 };
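Backends implementing this interface must adapt their overrides from `ModuleOp module` to `Operation *op`. A hedged sketch of the adaptation (the class name is hypothetical); the enclosing module stays reachable when module-level attributes are still needed:

// Hypothetical backend override adapted to the new signature: callers now
// pass the op whose tensor layout is being verified, not the ModuleOp.
LogicalResult MyVerifyTensorLayoutInterface::verifyTensorLayout(
    Attribute layout, RankedTensorType type, Operation *op,
    function_ref<InFlightDiagnostic()> emitError) const {
  // Recover the module from the op if an implementation needs it.
  auto module = op->getParentOfType<ModuleOp>();
  if (!module)
    return emitError() << "expected op to be nested in a module";
  // ... layout-specific checks go here ...
  return success();
}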

include/triton/Dialect/Triton/IR/TritonOps.td

Lines changed: 5 additions & 1 deletion

@@ -1118,7 +1118,11 @@ def CallOp : TT_Op<"call", [CallOpInterface, /*MemRefsNormalizable, */DeclareOpI
   }];
 }
 
-def FuncOp : TT_Op<"func", [AffineScope, AutomaticAllocationScope, CallableOpInterface, FunctionOpInterface, IsolatedFromAbove, OpAsmOpInterface]> {
+def FuncOp : TT_Op<"func", [
+    AffineScope, AutomaticAllocationScope, CallableOpInterface,
+    FunctionOpInterface, IsolatedFromAbove, OpAsmOpInterface,
+    HasParent<"ModuleOp">
+]> {
   let summary = "An operation with a name containing a single `SSACFG` region";
   let description = [{
     Operations within the function cannot implicitly capture values defined

include/triton/Dialect/TritonGPU/IR/Dialect.h

Lines changed: 7 additions & 0 deletions

@@ -39,6 +39,13 @@ template <> struct hash<CacheKey> {
 
 namespace mlir::triton::gpu {
 
+constexpr static char AttrNumWarpsName[] = "ttg.num-warps";
+constexpr static char AttrNumCTAsName[] = "ttg.num-ctas";
+constexpr static char AttrTargetName[] = "ttg.target";
+constexpr static char AttrNumThreadsPerWarp[] = "ttg.threads-per-warp";
+
+int lookupNumWarps(Operation *op);
+
 class LinearLayoutCache {
 public:
   std::optional<LinearLayout> get(const CacheKey &key) {
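The attribute-name constants relocate here from TritonToTritonGPUPass.h (deleted above), and the new free function `lookupNumWarps` supersedes the module-level `getNumWarps` helper removed from TritonGPUDialect.td below. A minimal sketch of what such a lookup could do, assuming it walks outward from `op`; the actual definition lives in the dialect's .cpp and may differ:

// Hedged sketch, not the committed implementation: resolve ttg.num-warps by
// walking up the parent chain; a well-formed TritonGPU program carries the
// attribute at module level, so the walk terminates there at the latest.
int lookupNumWarps(Operation *op) {
  while (op) {
    if (auto attr = op->getAttrOfType<IntegerAttr>(AttrNumWarpsName))
      return attr.getInt();
    op = op->getParentOp();
  }
  llvm::report_fatal_error(
      "TritonGPU module should contain a ttg.num-warps attribute");
}

Taking an `Operation *` rather than a `ModuleOp` lets callers resolve num-warps from any op, which matches the removal of the explicit `int numWarps` parameter from `populateFuncOpConversionPattern` above.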

include/triton/Dialect/TritonGPU/IR/TritonGPUDialect.td

Lines changed: 3 additions & 22 deletions

@@ -20,32 +20,13 @@ def TritonGPU_Dialect : Dialect {
   ];
 
   let extraClassDeclaration = [{
-    static std::string getNumWarpsAttrName() { return "ttg.num-warps"; }
-    static int getNumWarps(ModuleOp mod) {
-      if (!mod->hasAttr("ttg.num-warps"))
-        llvm::report_fatal_error(
-            "TritonGPU module should contain a ttg.num-warps attribute");
-      return cast<IntegerAttr>(mod->getAttr("ttg.num-warps")).getInt();
-    }
-    static int getNumCTAs(ModuleOp mod) {
-      if (!mod->hasAttr("ttg.num-ctas"))
-        return 1;
-      return cast<IntegerAttr>(mod->getAttr("ttg.num-ctas")).getInt();
-    }
     void registerTypes();
 
-    static std::string getThreadsPerWarpAttrName() { return "ttg.threads-per-warp"; }
-
-    static int getThreadsPerWarp(ModuleOp mod) {
-      Attribute threadsPerWarp = mod->getDiscardableAttr("ttg.threads-per-warp");
-      if (!threadsPerWarp) {
-        return 32;
-      }
-      return cast<IntegerAttr>(threadsPerWarp).getInt();
-    }
-
     LinearLayout toLinearLayout(ArrayRef<int64_t> shape, Attribute layout);
 
+    static int getNumCTAs(ModuleOp mod);
+    static int getThreadsPerWarp(ModuleOp mod);
+
   private:
     LinearLayoutCache llCache;
   }];
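The inline bodies deleted above presumably move into the dialect's .cpp file behind the new out-of-line declarations. Reconstructed from the removed code, the definitions would look roughly like this; the exact destination file, and the use of the new attribute-name constants instead of string literals, are assumptions:

// Sketch reconstructed from the inline bodies removed above; the defaults
// (1 CTA, 32 threads per warp) are preserved from the original code.
int TritonGPUDialect::getNumCTAs(ModuleOp mod) {
  if (!mod->hasAttr(AttrNumCTAsName))
    return 1;
  return cast<IntegerAttr>(mod->getAttr(AttrNumCTAsName)).getInt();
}

int TritonGPUDialect::getThreadsPerWarp(ModuleOp mod) {
  Attribute threadsPerWarp = mod->getDiscardableAttr(AttrNumThreadsPerWarp);
  if (!threadsPerWarp)
    return 32;
  return cast<IntegerAttr>(threadsPerWarp).getInt();
}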

include/triton/Dialect/TritonGPU/Transforms/Schedule.h

Lines changed: 2 additions & 19 deletions

@@ -106,27 +106,10 @@ class CoarseSchedule {
     return true;
   }
 
-  void insertMinimum(Operation *op, int stage, Cluster cluster) {
-    auto res = opToStageAndCluster.insert({op, {stage, cluster}});
-    if (res.second) {
-      return;
-    }
-    auto &[existingStage, existingCluster] = res.first->second;
-    existingStage = std::min(stage, existingStage);
-
-    // If existingCluster is reachable from cluster,
-    // then cluster is earlier in the list
-    auto it = cluster;
-    for (auto it = cluster; it != clusters.end(); ++it) {
-      if (it == existingCluster) {
-        existingCluster = cluster;
-        return;
-      }
-    }
-  }
+  bool insertMinimum(Operation *op, int stage, Cluster cluster);
 
   bool insertDepsOfOp(Operation *op, int stage, CoarseSchedule::Cluster cluster,
-                      bool includeArg);
+                      bool includeArg, bool insertIfEarlier = false);
 
   void erase(Operation *op) { opToStageAndCluster.erase(op); }
include/triton/Dialect/TritonNvidiaGPU/IR/Dialect.h

Lines changed: 3 additions & 7 deletions

@@ -42,9 +42,7 @@
 #define GET_OP_CLASSES
 #include "triton/Dialect/TritonNvidiaGPU/IR/Ops.h.inc"
 
-namespace mlir {
-namespace triton {
-namespace nvidia_gpu {
+namespace mlir::triton::nvidia_gpu {
 
 struct TensorMemory : public SideEffects::Resource::Base<TensorMemory> {
   StringRef getName() final { return "<TensorMemory>"; }
@@ -63,12 +61,10 @@ Attribute getTmemCompatibleLayout(unsigned M, unsigned N,
                                   ArrayRef<int64_t> shape, unsigned numWarps,
                                   triton::gpu::CTALayoutAttr ctaLayout);
 
-bool isDistributedLayoutTMemCompatible(ModuleOp mod,
+bool isDistributedLayoutTMemCompatible(Operation *op,
                                        RankedTensorType tensorType,
                                        gpu::MemDescType memType);
 
-} // namespace nvidia_gpu
-} // namespace triton
-} // namespace mlir
+} // namespace mlir::triton::nvidia_gpu
 
 #endif // TRITON_DIALECT_TRITONNVIDIAGPU_IR_DIALECT_H_
