Merged
Changes from 83 commits
84 commits
8734655
[release/2.8] Enable wheels
jithunnair-amd Apr 22, 2025
dc95b0c
Updates to build for Noble (Ubuntu 24.04) and py3.12
jithunnair-amd Jul 15, 2025
b741af3
[release/2.8] Make triton build ROCm version agnostic
ethanwee1 May 20, 2025
b4c293a
[release/2.8] Replace upstream install_rocm_magma.sh with rocm fork v…
jithunnair-amd Jul 16, 2025
9ed3d2e
[release/2.8] Upgrade numpy versions; Use different package versions …
jithunnair-amd Jul 16, 2025
12508fd
[release/2.8] Removing --user flag from all pip install commands
ethanwee1 Jun 19, 2025
90d7d4b
[ROCm] Remove use of warpsize on host-side compilation (pytorch#156979)
jithunnair-amd Jul 16, 2025
186180d
[release/2.8] Improve C10_WARP_SIZE compatibility
xinyazhang Jul 16, 2025
8e7b99f
Fix sha256 for aotriton ROCm7.0 tarball
jithunnair-amd Jul 16, 2025
d7c64fc
Update third_party/composable_kernel submodule commit as per https://…
jithunnair-amd Jul 16, 2025
b81d4d1
Use ROCm/triton and update triton.txt
jithunnair-amd Jul 16, 2025
98e9537
Add related_commits file (#2396)
pragupta Jul 22, 2025
12a145a
Add QA automation scripts for running PyTorch unit tests
jithunnair-amd Feb 19, 2025
3c7ddbf
[release/2.6] enable NHWC batchnorm with MIOpen (#2023)
dnikolaev-amd Apr 11, 2025
fb20451
test_decompose_mem_bound_mm.py tolerance increase for navi3x
iupaikov-amd May 13, 2025
32449c9
[release/2.7] enable NHWC batchnorm by default on ROCm7.0+ (#2180)
dnikolaev-amd May 22, 2025
23f0b5f
[release/2.7] import 'Dict' to fix common_utils.py (#2181)
dnikolaev-amd May 24, 2025
48630d8
[AUTOGENERATED] [release/2.7] [rocm6.4_internal_testing] Replaced ROC…
okakarpa May 29, 2025
ae17c3a
[release/2.7] [SWDEV-535259] enable miopen channels last 3d for conv …
okakarpa Jun 4, 2025
e4d62b1
[AUTOGENERATED] [release/2.7] Add 3D batchnorm tests (#2243)
okakarpa Jun 4, 2025
d40f3c8
[AUTOGENERATED] [release/2.5] [ROCm][layer_norm] Use __builtin_amdgcn…
rocm-mici Dec 18, 2024
dbb9f2a
[release/2.6] remove xfail from 'batch_norm_with_update' (#2070)
dnikolaev-amd Apr 30, 2025
e62e394
[release/2.7] Enable mx fp8 support on ROCm (#2199)
jagadish-amd Jun 4, 2025
e0160f1
Extend CK gemm/sdpa support to gfx950 (#45)
alugorey Apr 2, 2025
08390c7
[release/2.6] [SWDEV-529824] Fix Float16 CooperativeReduction Test Fa…
pmaybank May 29, 2025
01857c6
[ROCm] Set thread_work_size to 16 for vectorized elementwise kernels …
jerrymannil Jun 10, 2025
e60c0c4
[release/2.7] Fix SDPA skip logic (#2281)
AmdSampsa Jun 19, 2025
01eaee8
[release/2.7] Update test_binary_ufuncs.py after numpy upgrade (#2289)
ethanwee1 Jul 1, 2025
80e8974
[AUTOGENERATED] [release/2.7] fix jit_utils.cpp (#2320)
okakarpa Jul 8, 2025
bb44c0c
Clean up CUDA state between tests (#2335)
rraminen Jul 14, 2025
1f312c4
cublaslt/hipblaslt persistent workspace (#156495)
jeffdaily Jun 28, 2025
3b7f377
[AUTOGENERATED] [release/2.7] [release/2.6] Fix dtype before comparin…
okakarpa Jul 15, 2025
8b23614
[ROCm][Windows] Fixing undefined symbol linker error after exposing M…
tvukovic-amd Jun 27, 2025
5446c03
[MPS] Fix `index_kernel` for large tensors (#158239)
pytorchbot Jul 16, 2025
71c68bc
Add flag to fx.passes.split_module to normalize input names (#157793)
pytorchbot Jul 16, 2025
4c1d666
Add warning about removed sm50 and sm60 arches (#158478)
pytorchbot Jul 16, 2025
352edf2
[cherry-pick][inductor][triton] Update HAS_WARP_SPEC to check triton.…
atalman Jul 18, 2025
66b89d1
[CUDA] Use runtime driver API for cuStreamWriteValue32 (#158585)
pytorchbot Jul 18, 2025
10eb3f2
Add stride check for attn_mask on non-cpu device (#158618)
CaoE Jul 18, 2025
117d9d4
[cherry-pick] temporarily disabling generation of weblinks for torch …
Sidharth123-cpu Jul 18, 2025
88d04c8
[Reland] Add warning about removed sm50 and sm60 arches (#158744)
atalman Jul 21, 2025
3006279
[cherry-pick][release 2.8] Update OpenBLAS commit (#151547) (#158243)
Camyll Jul 21, 2025
45ef46b
[cherry-pick][Docker builds] Move from Miniconda to Miniforge (#15837…
atalman Jul 21, 2025
e5e8a38
[async-TP] Turn asserts back into silent skips (#158736)
pytorchbot Jul 21, 2025
a3dea79
[cherry-pick] Fix AArch64 segfaults by disabling strict-aliasing in G…
robert-hardwick Jul 21, 2025
d3960e5
Pull latest Sphinx theme (#158595) (#158673)
svekars Jul 22, 2025
2f85ac2
[Dynamo] Use proper sources for constructing dataclass defaults (#158…
pytorchbot Jul 22, 2025
9298444
[cherry-pick] Unify torch.tensor and torch.ops.aten.scalar_tensor beh…
atalman Jul 22, 2025
29973ff
Cherry pick PR 158746 (#158801)
svekars Jul 22, 2025
d007588
[MPS] Reimplement `tri[ul]` as Metal shaders (#158867)
pytorchbot Jul 22, 2025
9176b69
[MPS] Switch Cholesky decomp to column wise (#158237)
pytorchbot Jul 22, 2025
2d0385b
Revert "[Dynamo] Allow inlining into AO quantization modules (#152934…
atalman Jul 22, 2025
c1f2017
Move out super large one off foreach_copy test (#158880)
pytorchbot Jul 22, 2025
947a201
[Release Only] Remove nvshmem from list of preload libraries (#158925)
atalman Jul 23, 2025
360aa17
Use ROCm/triton and update triton.txt
jithunnair-amd Jul 16, 2025
f34b83a
[release/2.8] [Bugfix][Inductor] Fix dependency list merged incorrect…
pragupta Jul 25, 2025
bbb1d6e
[release/2.8] enable py3.13 (#2366)
ethanwee1 Jul 17, 2025
af2ce88
[SWDEV-539076] Initial naive foreach autotune support (#2377)
jataylo Jul 18, 2025
41956f1
[release/2.7][ROCm][tunableop] UT tolerance increase for matmul_small…
naromero77amd Jul 23, 2025
0826c75
[release/2.7] [SWDEV-543214] Reland #2416 Fix warps runtime (#2421)
jataylo Jul 30, 2025
af7b538
[AUTOGENERATED] [release/2.8] [ROCm] Use opportunistic fastatomics ba…
okakarpa Jul 31, 2025
b10cd6b
Update triton pin for gfx950 improvements (#2443)
jataylo Aug 2, 2025
5413133
[AUTOGENERATED] [release/2.8] [release/2.7] [SWDEV-543214] Reland #24…
okakarpa Aug 4, 2025
d6a6383
[AUTOGENERATED] [release/2.8] [ROCm] Limit number of values per threa…
okakarpa Aug 6, 2025
3995f1a
[release/2.8] Define datatypes when ROCM_VERSION >= 70000 (#2470)
rraminen Aug 7, 2025
4fe2355
[release/2.8] Add mx fp4 support (#2472)
jagadish-amd Aug 7, 2025
016bbef
[AUTOGENERATED] [release/2.8] [rocm7.0_internal_testing] skip test_tr…
okakarpa Aug 7, 2025
8e96f16
Update version as 2.8.0
jithunnair-amd Aug 8, 2025
29b4c24
[release/2.8] pin requirements.txt (#2481)
ethanwee1 Aug 8, 2025
16cac0c
[AUTOGENERATED] [release/2.8] [SWDEV-539215] - Autotune support for p…
okakarpa Aug 11, 2025
0856917
[release/2.8] fp8: skip rowwise tests (#2477)
jagadish-amd Aug 11, 2025
0da7d02
[release/2.8] update related_commit (#2490)
amd-sriram Aug 12, 2025
f7921f4
[SWDEV-539119] [release/2.8] Add fast_tanh support (#2484)
jataylo Aug 12, 2025
2b29216
[AUTOGENERATED] [release/2.8] remove extra transposes in NHWC convolu…
okakarpa Aug 12, 2025
4634272
[release/2.8] [triton] Triton bump to fix ROCm 7.0 issues (#2498)
iupaikov-amd Aug 13, 2025
0e1a3e9
[AUTOGENERATED] [release/2.8] [ROCm] Improve reduction sum performanc…
dhonnappa-amd Aug 15, 2025
fe840fa
[release/2.8] Using c10d.barrier() in test_extra_cuda_context test in…
dhonnappa-amd Aug 15, 2025
d9d5b96
[AUTOGENERATED] [release/2.8] Change triton package name depending on…
dhonnappa-amd Aug 15, 2025
608069b
[AUTOGENERATED] [release/2.8] NAVI32 specific fixes (#2467)
okakarpa Aug 18, 2025
d6007d3
[release/2.8] Define uint32_t when ROCM_VERSION >= 70000 (#2513)
rraminen Aug 19, 2025
5d3dec1
[AUTOGENERATED] [release/2.8] Remove tb-nightly (#2538)
dhonnappa-amd Aug 20, 2025
8ade7b5
Use ROCm/triton and update triton.txt
jithunnair-amd Jul 16, 2025
e4df565
Merge branch 'release/2.8' into fix_torch_macros_for_miopen
tvukovic-amd Aug 20, 2025
52684a6
revert triton change
tvukovic-amd Aug 25, 2025
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/triton.txt
@@ -1 +1 @@
-f9e5bf54a2fe1a6262a41b27b38180cdb6fae6a2
+f9e5bf54a2fe1a6262a41b27b38180cdb6fae6a2
Collaborator
Nope, don't do this. Remove the triton change.

12 changes: 6 additions & 6 deletions aten/src/ATen/miopen/Descriptors.h
@@ -39,7 +39,7 @@ struct DescriptorDeleter {
// function.
template <typename T, miopenStatus_t (*ctor)(T**), miopenStatus_t (*dtor)(T*)>
// NOLINTNEXTLINE(bugprone-exception-escape)
-class TORCH_CUDA_CPP_API Descriptor {
+class TORCH_HIP_CPP_API Descriptor {
public:
// Use desc() to access the underlying descriptor pointer in
// a read-only fashion. Most client code should use this.
@@ -65,7 +65,7 @@ class TORCH_CUDA_CPP_API Descriptor {
std::unique_ptr<T, DescriptorDeleter<T, dtor>> desc_;
};

-class TORCH_CUDA_CPP_API TensorDescriptor : public Descriptor<
+class TORCH_HIP_CPP_API TensorDescriptor : public Descriptor<
miopenTensorDescriptor,
&miopenCreateTensorDescriptor,
&miopenDestroyTensorDescriptor> {
@@ -88,7 +88,7 @@ class TORCH_CUDA_CPP_API TensorDescriptor : public Descriptor<

std::ostream& operator<<(std::ostream & out, const TensorDescriptor& d);

-class TORCH_CUDA_CPP_API FilterDescriptor : public Descriptor<
+class TORCH_HIP_CPP_API FilterDescriptor : public Descriptor<
miopenTensorDescriptor,
&miopenCreateTensorDescriptor,
&miopenDestroyTensorDescriptor> {
@@ -105,7 +105,7 @@ class TORCH_CUDA_CPP_API FilterDescriptor : public Descriptor<
}
};

-struct TORCH_CUDA_CPP_API ConvolutionDescriptor
+struct TORCH_HIP_CPP_API ConvolutionDescriptor
: public Descriptor<
miopenConvolutionDescriptor,
&miopenCreateConvolutionDescriptor,
@@ -121,7 +121,7 @@ struct TORCH_CUDA_CPP_API ConvolutionDescriptor
};

// NOLINTNEXTLINE(bugprone-exception-escape)
-struct TORCH_CUDA_CPP_API DropoutDescriptor
+struct TORCH_HIP_CPP_API DropoutDescriptor
: public Descriptor<
miopenDropoutDescriptor,
&miopenCreateDropoutDescriptor,
@@ -137,7 +137,7 @@ struct TORCH_CUDA_CPP_API DropoutDescriptor
}
};

-struct TORCH_CUDA_CPP_API RNNDescriptor
+struct TORCH_HIP_CPP_API RNNDescriptor
: public Descriptor<miopenRNNDescriptor,
&miopenCreateRNNDescriptor,
&miopenDestroyRNNDescriptor>
2 changes: 1 addition & 1 deletion aten/src/ATen/miopen/Handle.h
@@ -5,5 +5,5 @@

namespace at::native {

-TORCH_CUDA_CPP_API miopenHandle_t getMiopenHandle();
+TORCH_HIP_CPP_API miopenHandle_t getMiopenHandle();
} // namespace at::native
2 changes: 1 addition & 1 deletion aten/src/ATen/miopen/Types.h
@@ -6,7 +6,7 @@

namespace at::native {

-TORCH_CUDA_CPP_API miopenDataType_t getMiopenDataType(const at::Tensor& tensor);
+TORCH_HIP_CPP_API miopenDataType_t getMiopenDataType(const at::Tensor& tensor);

int64_t miopen_version();
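
Note on the macro swap in the hunks above: a minimal sketch of the export/import pattern that API macros of this kind commonly follow, included only to illustrate why the choice of macro can matter when linking (for example on Windows). Every `DEMO_`-prefixed name, `DemoDescriptor`, and `demo_handle_id` is hypothetical and is not taken from this PR or from PyTorch; the actual definitions of `TORCH_CUDA_CPP_API` and `TORCH_HIP_CPP_API` are not shown in this diff and may differ.

```cpp
// Hypothetical illustration only: none of these names come from the PR.
#if defined(_WIN32)
#  define DEMO_EXPORT __declspec(dllexport)
#  define DEMO_IMPORT __declspec(dllimport)
#else
#  define DEMO_EXPORT __attribute__((visibility("default")))
#  define DEMO_IMPORT
#endif

// Assumption: the library that owns the symbols defines this while it is
// being compiled; consumers of the library leave it undefined.
#define DEMO_BUILD_HIP_LIB 1

#ifdef DEMO_BUILD_HIP_LIB
#  define DEMO_HIP_API DEMO_EXPORT  // building the library: export symbols
#else
#  define DEMO_HIP_API DEMO_IMPORT  // using the library: import symbols
#endif

// A class annotated the same way the descriptor classes are annotated above.
class DEMO_HIP_API DemoDescriptor {
 public:
  explicit DemoDescriptor(int dims) : dims_(dims) {}
  int dims() const { return dims_; }

 private:
  int dims_;
};

// A free function annotated the same way as the handle/type getters above.
DEMO_HIP_API int demo_handle_id() { return 42; }
```

Under this pattern, a header annotated with a macro that belongs to a different library's build flag would resolve to the import form while the owning library is compiled, which is one plausible way to end up with undefined-symbol linker errors. Whether that is the exact failure this PR addresses is an assumption based on the branch name and commit titles.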
