Merged
Commits
694 commits
76a841f
Port OpSchema.__post_init__ and OpSchema._recompute_comparison_key to…
swolchok Sep 18, 2025
46c647d
[vllm hash update] update the pinned vllm hash (#163304)
pytorchupdatebot Sep 19, 2025
3016616
[BE] Update Python min version to 3.10 (#162310)
malfet Sep 19, 2025
c91f59b
Fix performance regression when indexing by Numpy arrays (#163280)
ezyang Sep 18, 2025
ce5637b
Fix invalid indices bug for max_unpool2d/3d on MPS (#163036)
can-gaa-hou Sep 19, 2025
5780478
Revert "[BE] Update Python min version to 3.10 (#162310)"
pytorchmergebot Sep 19, 2025
1708120
Revert "[CI] Move Windows build/tests to Python-3.10 (#162862)"
pytorchmergebot Sep 19, 2025
e0bcd58
[MTIA] Add MTIA dispatch for kernel foreach_maximum(Add D80022242 bac…
DoubleBiao Sep 19, 2025
1302637
Revert "[dynamo][guards] Do not construct entire framelocals dict for…
pytorchmergebot Sep 19, 2025
32ad29b
Revert "[dynamo][guards] Fail on an unknown framelocals to dict conve…
pytorchmergebot Sep 19, 2025
0815091
[CP][BE] Cosmetic refactors for CP code base (#163115)
fegin Sep 18, 2025
ab5086a
[WOQ] Add XPU kernel for _weight_int8pack_mm (#160938)
xiaowangintel Sep 19, 2025
33e6c5a
[Dependabot] Update(deps): Bump transformers from 4.54.0 to 4.56.0 in…
dependabot[bot] Sep 19, 2025
bee362c
[ROCm][SymmMem] Fix skip condition for PLATFORM_SUPPORTS_SYMM_MEM (#1…
pragupta Sep 19, 2025
264e7f6
[ROCm] Fix mx fp8 and fp4 code after scaling refactor changes. (#163127)
jagadish-amd Sep 19, 2025
f8f230a
[FP8][cuBLAS][H100] only test fp32 outputs for rowwise `_scaled_mm` o…
eqy Sep 19, 2025
e631d76
[Flex] Changing how bwd configs are setup and updating default b200 c…
drisspg Sep 19, 2025
4967ad8
[Graph Partition] improve custom op output alias (#163227)
BoyuanFeng Sep 19, 2025
3e663ce
[Inductor][Triton][FP8] Add a Blackwell-specific scaled persistent + …
jananisriram Sep 19, 2025
2984bfe
[ez][CI] Run vllm workflow on vllm pin updates (#163353)
clee2000 Sep 19, 2025
a3b68c7
Revert "Fix boxcox to return same result for same input in one batch …
pytorchmergebot Sep 19, 2025
607469b
Revert "[ROCm] Bump FBGEMM commit to avoid CK errors (#162590)"
pytorchmergebot Sep 19, 2025
a0d2d84
Handling overflow for long int overflow for the product of kernel_hei…
arkadip-maitra Sep 19, 2025
b8c5ec5
[CD] Simplify NVIDIA driver installation step (#163349)
malfet Sep 19, 2025
52dd7a8
Move ROCM trunk wheel builds to 3.10 (#163339)
malfet Sep 19, 2025
03f34fd
Add explicit typing to nn.Module.__init__() parameters (#157389)
dsashidh Sep 19, 2025
bc7b17a
Realize LazyVariableTracker before raising exception (#163350)
guilhermeleobas Sep 19, 2025
979e10f
[Bugfix] Match eager stride semantics for cloned tensors with preserv…
Lucaskabela Sep 19, 2025
a273475
[BE] Introduce `CONDA_ROOT_DIR` (#163341)
malfet Sep 19, 2025
4a160da
[CUDA] revert PR 130472 (#162950)
thenumberouscode Sep 19, 2025
2a308c7
Revert "Improve device info with new flops and bandwidth formula base…
pytorchmergebot Sep 19, 2025
f8fb437
[SymmMem] Barrier on team instead of world (#163298)
kwen2501 Sep 18, 2025
7130b17
[SymmMem] Fix memory allocation hold-up (#162680)
kwen2501 Sep 18, 2025
ba3c2c8
SDP Backend function fix (#161169)
ahkush Sep 19, 2025
466122b
[inductor] avoid creating LoopBody twice (#162101)
shunting314 Sep 11, 2025
e88460f
[Inductor] don't call sympy_str when not needed (#162126)
shunting314 Sep 11, 2025
248156e
[Inductor] do loop reordering in a separate final round (#162355)
shunting314 Sep 11, 2025
df9a482
Bugfix for doing negative padding (#161639)
skpark-rh Sep 19, 2025
9f8a311
[Inductor][Intel GPU] Save `threads_per_warp` from tirton compiled ke…
etaf Sep 19, 2025
fab8455
Don't use declarations in global namespace in stable headers (#163352)
mikaylagawarecki Sep 19, 2025
e6a9db5
Add analytics ID to cpp docs (#163370)
svekars Sep 19, 2025
9b5ec0f
Use computed buffer sizes of torch for cusparseLt metadata (#163125)
aartbik Sep 19, 2025
0098e56
[CI] Move Windows build/tests to Python-3.10 (#162862)
malfet Sep 19, 2025
ee7bdd8
[graph partition] Add way to register custom rule (#163310)
zou3519 Sep 19, 2025
093f064
[CP][BE] Correct an incorrect docstring (#163131)
fegin Sep 18, 2025
8225a26
[dynamo] Fix issue with namedtuple slicing (#163351)
jansel Sep 19, 2025
bfe9e60
Simplify PrecompileContext to no longer be a CacheArtifactManager (#1…
jamesjwu Sep 20, 2025
a1df0b4
Lazy import to avoid circular import issue for DebugMode (#163381)
SherlockNoMad Sep 20, 2025
a31acf3
Clean up obsoleted vLLM tests (#163383)
huydhn Sep 20, 2025
e56dd5d
[Inductor-FX] Support torch.cond (#163234)
blaine-rister Sep 20, 2025
a87aea0
Update RandomSampler docstring. data_source must be Sized not Dataset…
dsashidh Sep 20, 2025
0b5a99b
remove duplicate import for defaultdict (#160519)
parsshar-RH Sep 20, 2025
df5d6d5
[inductor][triton heuristics] move allow tf32 out of config params (#…
coconutruben Sep 20, 2025
0ee331b
[inductor][choices] move extra kwargs out of get_template_configs (#1…
coconutruben Sep 20, 2025
d55c9d5
[CP] Fix cuDNN CP LSE dimension bug (#163231)
fegin Sep 18, 2025
5050cfa
[Opitmus] fix fp8 activation quatization for duplicates forward outpu…
mengluy0125 Sep 20, 2025
eb11d17
[Caffe2] Improve SVE batch box cox by 2% (#163360)
Nicoshev Sep 20, 2025
f9074c7
[STABLE ABI] Add copy_ operation. (#161895)
pearu Sep 19, 2025
d70c0ba
minimize graph capture output (#162211)
avikchaudhuri Sep 20, 2025
3938175
[1/n] Support cpu_tensor.to("cuda:0") in FakeTensorMode on cuda-less …
SherlockNoMad Sep 20, 2025
9e3725e
make fullgraph_capture work on mod, args, kwargs (#162849)
avikchaudhuri Sep 20, 2025
8e3fd3d
[AI Codemod][DevmatePerfOptimizationVectorReallocation] fbcode/caffe2…
yfeldblum Sep 20, 2025
e37b600
[CUDA][cuBLAS][FP8] Forward-fix #162022 (#163354)
eqy Sep 21, 2025
2887f3f
[BE] Slight improvements to documentation in python_dispatch (#162963)
ezyang Sep 19, 2025
97eb7a2
torchdim Python port (#160236)
ezyang Sep 20, 2025
5b386ee
[vllm hash update] update the pinned vllm hash (#163392)
pytorchupdatebot Sep 21, 2025
1ca9445
[BE][Ez]: Prevent copies of std::vector in CUDA ForeachOps (#163416)
Skylion007 Sep 21, 2025
f591bb5
Remove data_source argument from Sampler (#163134)
cyyever Sep 21, 2025
4a96a6f
[Docs] Fix indentations in cond.md (#156147)
windsonsea Sep 21, 2025
1faf636
Delete functorch C extension entirely. (#163340)
ezyang Sep 21, 2025
9ba9180
Add api info for torch._C._nn.pyi (#162707)
orangeH25 Sep 21, 2025
d8cbbc0
[Easy][AMP] Refactor the AMP logic for getting dtype (#162796)
fffrog Sep 12, 2025
5d8a226
[SymmMem] Promote `@requires_nvshmem` instead of `enable_triton` (#16…
kwen2501 Sep 21, 2025
f34744d
[inductor] bugfix: keep WeakDeps (WAR deps) during fusion (#162316)
v0i0 Sep 19, 2025
51152ef
Remove autograd code for Python < 3.9 (#163313)
cyyever Sep 21, 2025
5599f48
Fully native DTensor.__new__ (#162508)
swolchok Sep 18, 2025
4d3d32f
Add torchfuzz initial impl. (#163417)
laithsakka Sep 20, 2025
8b14f43
[torch] DRY a couple of lines in unpickler (#163447)
yfeldblum Sep 21, 2025
6ac2b3a
[BE] Adding aliases for CUDA and XPU API documentation (#162984)
jiannanWang Sep 21, 2025
8a281d7
[submodule] Bump libfmt to 12.0.0 (#163441)
cyyever Sep 21, 2025
0b59492
[export] Fix wrap_with_set_grad_enabled retracing (#163295)
angelayi Sep 21, 2025
01f927e
Remove workarounds for Python 3.6 (#163440)
cyyever Sep 22, 2025
281bb56
Enable half precision types on test_conv_cudnn_nhwc_support (#163444)
cyyever Sep 22, 2025
3a7db34
Revert "[SymmMem] Promote `@requires_nvshmem` instead of `enable_trit…
pytorchmergebot Sep 22, 2025
f007894
Revert "[RELAND] Always build USE_DISTRIBUTED (#160449) and Make dist…
pytorchmergebot Sep 22, 2025
ae5be03
Revert "Delete functorch C extension entirely. (#163340)"
pytorchmergebot Sep 22, 2025
edafc90
Revert "[BE] Make PyObjectSlot use a global PyInterpreter (#162659)"
pytorchmergebot Sep 22, 2025
96a3afb
Simplify BFLOAT16_AVAILABLE (#163445)
cyyever Sep 22, 2025
60b4791
[MPS] Fix compile linalg inv (#163452)
Isalia20 Sep 22, 2025
9f5a644
[BE] Update Python min version to 3.10 (#162310)
malfet Sep 22, 2025
10adeb9
Revert "[BE] Update Python min version to 3.10 (#162310)"
pytorchmergebot Sep 22, 2025
509c4e8
Update cutlass version for fbcode (#163091)
henrylhtsang Sep 19, 2025
eaac218
[ROCm] Fix environment variable AOTRITON_INSTALLED_PREFIX (#163373)
xinyazhang Sep 22, 2025
e310cc5
Update fbgemm submodule (#163411)
cthi Sep 22, 2025
9ca183e
switch from stack based to graph based aproach (#163459)
laithsakka Sep 22, 2025
06fe5b9
[AOTI] fix TestAOTInductorPackage temp file locked handler. (#163499)
xuhancn Sep 22, 2025
5e7be98
[BE] Update Python min version to 3.10 (#162310)
malfet Sep 22, 2025
281f8f4
Combine strong and weak refcounts in intrusive_ptr in a single refcou…
mcfi Sep 22, 2025
d279a6a
ci: Add a way to lint all files in a PR from label (#163525)
seemethere Sep 22, 2025
bec967e
Remove C++ and test branches for CUDA<12 (#163443)
cyyever Sep 22, 2025
3be9c86
[opaque obj] Initial OpaqueObject (#162660)
angelayi Sep 22, 2025
dd30667
[opaque_obj] Add set_payload + docs (#163276)
angelayi Sep 22, 2025
4941719
Enable logging for absolute memory estimation (#158799)
basilwong Sep 22, 2025
7e97811
Fix lint (#163542)
angelayi Sep 22, 2025
1818c36
[Fix] Restrict stride normalization to 1D tensors on export (#163282)
Kathryn-cat Sep 22, 2025
eaa613b
Revert "[opaque_obj] Add set_payload + docs (#163276)"
pytorchmergebot Sep 22, 2025
bf28990
Add support for NestedTensor share_memory_ (#162272)
adabeyta Sep 22, 2025
d150484
[opaque_obj] Add set_payload + docs (#163276)
angelayi Sep 22, 2025
6f9aef5
[2/n] Support module.to("cuda:0") in FakeTensorMode on cuda-less mach…
SherlockNoMad Sep 22, 2025
d008670
[triton] update 3.5 pin to bbb06c0334a6772b92d24bde54956e675c8c6604 (…
davidberard98 Sep 19, 2025
fd785b1
Add NestedTensor dispatch for _is_any_true/_is_all_true (#162096)
adabeyta Sep 22, 2025
e065d35
[BE]: Add a few more missing move from return indices (#163456)
Skylion007 Sep 22, 2025
46e1b7d
remove allow-untyped-defs from ./torch/utils/data/datapipes/iter/file…
bobrenjc93 Sep 22, 2025
cf28ab2
remove allow-untyped-defs from ./torch/ao/quantization/pt2e/duplicate…
bobrenjc93 Sep 22, 2025
02da475
Triton template IMA reads on B200 (#163460)
drisspg Sep 22, 2025
8abc2af
[STABLE ABI] Add clone method to torch::stable::Tensor (#161896)
pearu Sep 22, 2025
8e62d01
Add dynamic shapes doc (#159428)
svekars Sep 22, 2025
4027e97
[BE] Delete `skipIfMPSOnMacOS13` (#163515)
malfet Sep 22, 2025
09cb34c
[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed …
ezyang Sep 22, 2025
e558f7a
[vllm hash update] update the pinned vllm hash (#163463)
pytorchupdatebot Sep 22, 2025
da05aa7
[BE] Use `output_t` directly (#163518)
malfet Sep 22, 2025
0256f91
[BUG] MaxUnpool2d/3d should check output dim before accessing its ele…
can-gaa-hou Sep 22, 2025
2b03663
Allow add_persistent_r_block to scale up rblock up to a limit (#162296)
PaulZhang12 Sep 17, 2025
7ea8998
Better decomp for torch.eye (#163386)
jansel Sep 22, 2025
36c2a13
[inductor] Fix bug where viewed outputs get padded (#163398)
jansel Sep 22, 2025
a1bd924
[inductor] Fallback on strided complex add (#163387)
jansel Sep 22, 2025
c8fd2b4
[inductor] Skip test_baddmm on XPU (#163414)
jansel Sep 22, 2025
4fc271e
[inductor] Don't require_dense for grid_sampler_2d_backward (#163415)
jansel Sep 22, 2025
e0cbab4
[Inductor] avoid CUDA__equal when constant tensors are from different…
cp2923 Sep 22, 2025
b756b58
Improve fake tensor leakage detection in export by not relying on gc …
tugsbayasgalan Sep 22, 2025
60c2bde
Replace Literal[None] with None in typing (#163489)
cyyever Sep 22, 2025
33daaad
dynamo: Handle objects in graph that do not support weakref (#163168)
c00w Sep 17, 2025
fa15fb0
[EZ] Remove XLA from unstable.yml (#163564)
malfet Sep 22, 2025
8da0086
Remove outdated commented CMake code (#163442)
cyyever Sep 22, 2025
68e75be
Update pytorch_sphinx_theme2 to latest hash (#163269)
svekars Sep 22, 2025
539e84e
[precompile] Add option to disable guard check on aot-compiled functi…
zhxchen17 Sep 23, 2025
3ef1bef
[sdpa] make sure to recompile if alignment is different than before (…
ColinPeppler Sep 19, 2025
2c7959e
[ignore][codex-test] Add typing to simple library registry (#161367)
bobrenjc93 Sep 23, 2025
8f30a8d
[AOTInductor] Add grid information for Triton Kernels (#160131)
muchulee8 Sep 22, 2025
e9300b2
remove allow-untyped-defs from ./torch/onnx/_internal/torchscript_exp…
bobrenjc93 Sep 22, 2025
6a48f57
[1/N] Remove 'type: ignore' suppressions (#163468)
cyyever Sep 23, 2025
447b8fc
[2/N] Use filesystem in inductor (#163465)
cyyever Sep 23, 2025
27164b6
Add fake_impl for _native_multi_head_attention (#163167)
ydwu4 Sep 23, 2025
0b75a16
[torchfuzz] Encapsulate fuzzing and codegen logic into ops (#163547)
bobrenjc93 Sep 22, 2025
95ac7d7
Rename to _debug_mode.py to make it private (#163534)
SherlockNoMad Sep 23, 2025
fcd79d5
[vllm hash update] update the pinned vllm hash (#163590)
pytorchupdatebot Sep 23, 2025
0e12238
[torchfuzz] remove supports_variable_inputs for now (#163553)
bobrenjc93 Sep 22, 2025
bb5be56
[torch][cuda][device_limits] Library for querying device hardware lim…
valentinandrei Sep 23, 2025
e3b392b
[BC breaking] Remove deprecated imports for torch.utils.data.datapipe…
cyyever Sep 23, 2025
d3a1345
Use functools.cache on has_efa (#163439)
cyyever Sep 23, 2025
19b754d
Revert "Update cutlass version for fbcode (#163091)"
pytorchmergebot Sep 23, 2025
08c5efd
[torchfuzz] cache operators (#163554)
bobrenjc93 Sep 22, 2025
d5e51d3
[torchfuzz] decompose -> fuzz_inputs_specs (#163555)
bobrenjc93 Sep 22, 2025
1545bb1
[torchfuzz] shuffle compatible ops (#163556)
bobrenjc93 Sep 22, 2025
309fe03
[torchfuzz] remove unneeded try catch (#163557)
bobrenjc93 Sep 22, 2025
45d9dcc
Update Kineto Submodule (#162222)
sraikund16 Sep 23, 2025
375f3e3
[OpenReg][Docs] Correct docs about `openreg` usage example. (#163235)
KarhouTam Sep 23, 2025
b426ba1
[torchfuzz] introduce tensor and scalar pointwise ops (#163558)
bobrenjc93 Sep 22, 2025
8d81564
[pt2][cache] rework cache for true generic usage + better tests (#163…
nmacchioni Sep 23, 2025
5d749ce
Remove test conditions for CUDA<12 (#163495)
cyyever Sep 23, 2025
3c64b2a
CUDA 13.0 Warning update for supported architectures (#163585)
atalman Sep 23, 2025
bda9ab2
[inductor] fix as_strided lowering with .view(dtype) inputs (#163319)
xmfan Sep 22, 2025
1a42656
[Flex attention] Fix flex attention head broadcast (#163426)
Isalia20 Sep 23, 2025
aff76c0
Revert "Add fake_impl for _native_multi_head_attention (#163167)"
pytorchmergebot Sep 23, 2025
e05c9c0
[ROCm][CI] cudagraph trees ut fixes (#163592)
jeffdaily Sep 23, 2025
4264fd3
Add basic tests for torch.distributed.tensor._utils.compute_global_te…
swolchok Sep 18, 2025
518c320
[inductor] libdevice.sqrt => tl.sqrt_rn (#163419)
jansel Sep 23, 2025
ed84e80
[inductor] Freeze layouts in FlexAttention (#163434)
jansel Sep 23, 2025
9c4d9f9
[inductor] Support out_dtype arg to matmul (#163393)
jansel Sep 23, 2025
6ef7487
[dynamo] Fix TorchFunctionMode handling with get_rng_state (#163412)
jansel Sep 23, 2025
49e7b2f
[inductor] Fix error from custom CUDA allocators (#163422)
jansel Sep 23, 2025
720a7b2
[export] Remove .contiguous() when saving weights to raw bytes (#163587)
yiming0416 Sep 23, 2025
0f67407
Large tests failing on bfloat16 (#163537)
drisspg Sep 22, 2025
b3cf5c7
Skip on sm100 later since Tests are non determinisitic (#163552)
drisspg Sep 22, 2025
5f0c7cb
Add B200 smoke test (#159494)
drisspg Sep 22, 2025
ebddbe7
[ROCm][CI] skip test_sparse_triangular_solve (#163651)
jeffdaily Sep 23, 2025
6e5dddb
Use accelerator API in common_dtensor (#163498)
dilililiwhy Sep 23, 2025
221ac81
Revert "[precompile] Add option to disable guard check on aot-compile…
pytorchmergebot Sep 23, 2025
134dfbe
[DCP] DTensor slice dequantization with proper block alignment (#163532)
saumishr Sep 23, 2025
fde929c
[AOTI] Fix model_package_loader get_cpp_compile_command (#163561)
xuhancn Sep 23, 2025
2aadcea
[ROCm] Improve perf for elementwise broadcast with mixed dtype (#163562)
jerrymannil Sep 23, 2025
649ceda
[export] handling NamedTuple inputs (#162959)
Raman-RH Sep 23, 2025
ca35dc2
[EZ] Fix UP041 violations (#163648)
malfet Sep 23, 2025
0696a4b
[EZ] Perma-ignore UP038 (#163649)
malfet Sep 23, 2025
8e6b0c7
[Inductor] Remove `no_type_check` annotation on properties (#163570)
blaine-rister Sep 23, 2025
bcb893a
[ROCm] Build FBGEMM_GENAI for gfx942 only (#162648)
jithunnair-amd Sep 23, 2025
22c5e8c
Add num_store to inductor_meta and use it to scale persistent reducti…
PaulZhang12 Sep 22, 2025
2a9745d
[multi-kernel] shape-similarity kernel selection (#163090)
pianpwk Sep 23, 2025
fc84743
Implement CUDA stream protocol (#163614)
msaroufim Sep 23, 2025
e671dcc
Update tests to check for more robust pattern (#163107)
tugsbayasgalan Sep 23, 2025
5ca563e
symintify fill_diagonol_ (#163485)
bobrenjc93 Sep 23, 2025
b182365
[ez] use list initializer syntax in fill_diagonal_ (#163607)
bobrenjc93 Sep 23, 2025
8c8416b
Update pytorch.org links in docs/conf.py (#163682)
svekars Sep 23, 2025
29af258
Less aggressive persistent reduction when it could induce large maski…
eellison Sep 23, 2025
c3d9f08
[torchfuzz] introduce multi process fuzzer (#163560)
bobrenjc93 Sep 23, 2025
c63e417
use reduction hint for aggressive rblock (#163371)
eellison Sep 23, 2025
b879ef7
[ROCm][CI] skip TestCudaPrimaryCtx.test_set_device_0 (#163693)
jeffdaily Sep 23, 2025
2014908
[MPS] Compute `offset2bag/bag_size/max_indices` in `_embedding_bag` (…
kurtamohler Sep 19, 2025
6b5ad5f
[Kineto] Add list of string parsing for profiler (#163593)
muchulee8 Sep 23, 2025
f3f67ff
Fix warn message (#163578)
drisspg Sep 22, 2025
f9fa138
[BE] Delete all pre py-3.10 checks (#163653)
malfet Sep 23, 2025
ee75c3d
Support for amin, amax, and aminmax (#163669)
srsuryadev Sep 23, 2025
eb3fbf5
[inductor] in emulate_precision_casts, disable fma fusion in triton (…
v0i0 Sep 23, 2025
4535254
[3/N] Use std::filesystem in inductor (#163632)
cyyever Sep 24, 2025
dc93529
[Triton] [Inductor] Restrict subprocess autotuning to just Triton (#1…
njriasan Sep 24, 2025
1e754d5
docs and optional kwargs for full graph capture (#163550)
avikchaudhuri Sep 24, 2025
be6c127
[AOTI] Pass comments from metadata to the autotune block (#163600)
desertfire Sep 23, 2025
e2ce79e
[Flex] Fix silent correctness w/ backpropping grads (#163677)
drisspg Sep 23, 2025
c261c71
Simplify _compute_local_shape_and_global_offset and make it SPMD. (#1…
ezyang Sep 19, 2025
ca512af
[inductor] Fix issue with scalar arg handling (#163481)
jansel Sep 23, 2025
6fa9727
[inductor] Fix bugs in emulate_precision_casts (#163520)
jansel Sep 23, 2025
d746b98
[inductor] Fix divmod error in decomp (#163482)
jansel Sep 23, 2025
42e9902
cd: Move arm64 to linux.arm64.r7g.12xlarge.memory (#163681)
seemethere Sep 23, 2025
6f1d962
[vllm hash update] update the pinned vllm hash (#163711)
pytorchupdatebot Sep 24, 2025
20eeb54
Add api info for torch._C._nn.pyi (#162936)
orangeH25 Sep 24, 2025
124dd36
[hop] support local_map + SAC (#163322)
xmfan Sep 23, 2025
0390798
[Triton] [Inductor] Enable Epilogue Subtiling in the blackwell ws tem…
njriasan Sep 24, 2025
a8e9ed2
[inductor] turn on loaf (for oss) by default (#162030)
shunting314 Sep 22, 2025
f68de58
[Inductor-FX] Support symbol and dynamic scalar graph inputs and outp…
blaine-rister Sep 24, 2025
2c5a3d7
Delete functorch C extension entirely. (#163340)
ezyang Sep 24, 2025
dad54ca
Add mistral/gpt-oss to benchmarks (#163565)
angelayi Sep 24, 2025
11a231e
[c10d] P2P tensors must be dense (#163719)
kwen2501 Sep 24, 2025
bf0747c
[Code Clean] Remove deadcodes about Python3.9 [1/N] (#163626)
fffrog Sep 24, 2025
0bca779
[Code Clean] Remove deadcodes about Python3.9 [2/N] (#163627)
fffrog Sep 24, 2025
33aabdd
[Code Clean] Remove deadcodes about Python3.9 [3/N] (#163629)
fffrog Sep 24, 2025
ec0cd81
[Code Clean] Remove deadcodes about Python3.9 [4/N] (#163643)
fffrog Sep 24, 2025
6f34cc0
[Code Clean] Remove deadcodes about Python3.9 [5/N] (#163644)
fffrog Sep 24, 2025
a635505
[Code Clean] Remove deadcodes about Python3.9 [6/N] (#163645)
fffrog Sep 24, 2025
2390d34
[Code Clean] Remove deadcodes about Python3.9 [7/N] (#163646)
fffrog Sep 24, 2025
3e1b1a3
Revert "[inductor] Fix issue with scalar arg handling" (#163737)
jansel Sep 24, 2025
207f104
[Triton] [Inductor] Set default configs for Blackwell Matmul Template…
njriasan Sep 24, 2025
b66aa1a
[ARM] Add test_memory_profiler to aarch64 tests (#145260)
robert-hardwick Sep 24, 2025
141fc72
[CD] CUDA 13.0 fix preload logic to include nvidia/cu13/lib/ (#163661)
atalman Sep 24, 2025
3b73841
update test_quantization tests to run weekly (#163077)
liangel-02 Sep 24, 2025
9d0d98a
Use cuda nvrtc so file based on cuda version used by torch (#163642)
atalman Sep 24, 2025
5d0f639
Make `Tensor.__dlpack__(stream=None)` capture-safe during CUDA Graph …
eee4017 Sep 24, 2025
4c2c401
Record redistribute_local_tensor in DebugMode (#163704)
SherlockNoMad Sep 24, 2025
9341ede
Revert to old behaviour of not padding strides if shape or stride is …
nandesuka Sep 24, 2025
768361e
Add less warps config to inner reductions (#162447)
PaulZhang12 Sep 24, 2025
c414f75
[WOQ][Inductor] Enable CUDA coverage for _weight_int8pack_mm (#163461)
bbeckca Sep 24, 2025
0456b23
[AOTI] Add verbose error information for extract file (#163718)
xuhancn Sep 24, 2025
71eec6a
[dist] handle discontiguous allgather/reducescatter inputs (#163712)
ngimel Sep 24, 2025
0dce2af
[ROCm][CI] adjust tf32 tolerance for test_compile_kernel_advanced (#1…
jeffdaily Sep 24, 2025
90a2825
Add `inference_mode` hint message to use `eval` with inference. (#163…
zeshengzong Sep 24, 2025
1495b35
Remove Python 3.9 for Triton builds (#163778)
atalman Sep 24, 2025
b40191b
Merge remote-tracking branch 'upstream/main' into rocm7.1_internal_te…
github-actions[bot] Sep 24, 2025
f3e8213
Fix merge conflicts
pragupta Sep 24, 2025
0ad8381
Address review comments wrt triton_heuristics and install_rocm
pragupta Sep 30, 2025
63fcd9b
update related_commits
pragupta Sep 30, 2025
77f4534
Fix more conflicts with triton_heuristics.py
pragupta Sep 30, 2025
2 changes: 0 additions & 2 deletions .ci/docker/common/install_rocm.sh
@@ -114,8 +114,6 @@ EOF
 rm -rf HIP clr
 fi

-# temporary hipblasLT dependency install
-apt install libmsgpackc2
 pip_install "git+https://github.com/rocm/composable_kernel@$ROCM_COMPOSABLE_KERNEL_VERSION"

# Cleanup
43 changes: 22 additions & 21 deletions torch/_inductor/runtime/triton_heuristics.py
@@ -2959,31 +2959,32 @@ def _persistent_reduction_configs(
if "y" in size_hints:
pass
# TODO(jansel): we should be able to improve these heuristics
elif reduction_hint == ReductionHint.INNER:
if rnumel > 1024:
configs = configs[:1]
else:
x_block = 8
if xnumel // x_block < 128 or (loads_and_stores >= 5 and rnumel >= 256):
# If loads/stores greater than 5, a lot of register pressure
# rnumel < 256 means no vectorized loads if we split up r dim
# so xblock still needs to be larger
x_block = 1

configs = [
triton_config_reduction(
size_hints,
x_block,
rnumel,
register_intensive=True,
reduction_hint=reduction_hint,
)
]
elif not max_autotune_enabled: # Don't filter if tuning enabled
if reduction_hint == ReductionHint.INNER:
if rnumel > 1024:
configs = configs[:1]
else:
x_block = 8
if xnumel // x_block < 128 or (loads_and_stores >= 5 and rnumel >= 256):
# If loads/stores greater than 5, a lot of register pressure
# rnumel < 256 means no vectorized loads if we split up r dim
# so xblock still needs to be larger
x_block = 1

configs = [
triton_config_reduction(
size_hints,
x_block,
rnumel,
register_intensive=True,
reduction_hint=reduction_hint,
)
]

elif reduction_hint == ReductionHint.OUTER:
configs = configs[-1:]
elif reduction_hint == ReductionHint.OUTER_TINY:
configs = [
tiny_configs = [
triton_config_reduction(
size_hints,
2 * (256 // rnumel) if rnumel <= 256 else 1,

Here are some corrections:

  1. `tiny_configs` should be defined before the if-clause, i.e. before the block that starts with:

     # defer to more autotuning, initially
     if "y" in size_hints:

  2. The two `elif` branches on lines 2984 and 2986 should be indented one level in. In other words, they belong inside the `elif not max_autotune_enabled` branch.

  3. For the `elif reduction_hint == ReductionHint.OUTER_TINY:` branch, the body should just be:

     configs = tiny_configs

  4. The outermost if/elif clause also needs an "else" part:

     else:
         # If autotune is enabled append tiny configs
         for conf in tiny_configs:
             if conf not in configs:
                 configs.append(conf)

Collaborator Author

Thank you for catching these! Addressed them with the new commit. Please verify.
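Pieced together, the reviewer's four corrections imply roughly the control flow below. This is a hypothetical, simplified reconstruction, not the actual PyTorch code: `ReductionHint` values are replaced by plain strings, and `triton_config_reduction` is stubbed out so the branch logic can be exercised on its own.

```python
def triton_config_reduction(size_hints, x_block, rnumel,
                            register_intensive=False, reduction_hint=None):
    # Stub: the real helper builds a triton.Config; a tuple suffices here.
    return ("cfg", x_block, rnumel)


def persistent_reduction_configs(size_hints, reduction_hint, xnumel, rnumel,
                                 loads_and_stores, configs,
                                 max_autotune_enabled):
    # (1) tiny_configs is defined up front, before any branching
    tiny_configs = [
        triton_config_reduction(
            size_hints,
            2 * (256 // rnumel) if rnumel <= 256 else 1,
            rnumel,
        )
    ]
    if "y" in size_hints:
        pass
    elif not max_autotune_enabled:  # don't filter when autotuning is enabled
        if reduction_hint == "INNER":
            if rnumel > 1024:
                configs = configs[:1]
            else:
                x_block = 8
                if xnumel // x_block < 128 or (loads_and_stores >= 5
                                               and rnumel >= 256):
                    # heavy register pressure, or no vectorized loads if the
                    # r dim is split, so keep x_block small
                    x_block = 1
                configs = [
                    triton_config_reduction(
                        size_hints, x_block, rnumel,
                        register_intensive=True,
                        reduction_hint=reduction_hint,
                    )
                ]
        # (2) these elif branches sit inside `elif not max_autotune_enabled`
        elif reduction_hint == "OUTER":
            configs = configs[-1:]
        elif reduction_hint == "OUTER_TINY":
            configs = tiny_configs  # (3)
    else:
        # (4) autotune enabled: append tiny configs rather than filtering
        for conf in tiny_configs:
            if conf not in configs:
                configs.append(conf)
    return configs
```

The key structural point is that hint-based filtering only runs when max-autotune is off; when it is on, the candidate list is only ever extended.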
