Merged
652 commits
d33d125
[inductor] Remove output copy_ for pallas backend in some cases (#167…
jansel Nov 12, 2025
6bf51de
harden backed_size_oblivious and broadcast_shapes (#167232)
laithsakka Nov 12, 2025
780e325
Move XPUEvent to c10 (#158336)
guangyey Nov 12, 2025
4714eb7
Update dynamic_inductor_timm_training.csv (#167609)
zou3519 Nov 12, 2025
4cff8b5
Add option to disable applying side effects in dynamo (#167239)
tugsbayasgalan Nov 11, 2025
4f6aae3
Revert "[MPS] SparseMps mv op (#166708)"
pytorchmergebot Nov 12, 2025
a328326
Revert "[xpu][feature] Add XPU support on torch.accelerator.get_memor…
pytorchmergebot Nov 12, 2025
c5d91d9
Revert "Introduce a new API torch.accelerator.get_memory_info (#156812)"
pytorchmergebot Nov 12, 2025
7557e38
[ROCm] hipSPARSELt support - Update cuda_to_hip_mappings.py (#167335)
rraminen Nov 12, 2025
dd7a45a
[5/N] Use Python 3.10 typing (#167449)
cyyever Nov 12, 2025
bdb3753
Add Tests (#167392)
drisspg Nov 12, 2025
10a1578
Revert "Update Kineto Submodule (#167343)"
pytorchmergebot Nov 12, 2025
ed79693
[ROCm][CI] dynamo benchmark repvgg_a2 is flaky (#167660)
jeffdaily Nov 12, 2025
d105e3a
[dynamo][DebugMode] mask python keys in dispatch_key_set guard checks…
pianpwk Nov 12, 2025
760c901
[torch] Update caffe2/torch/csrc to build under CUDA 13 (#167401)
q10 Nov 12, 2025
bc09a84
Hide all symbols (except stable/headeronly/shim) if TORCH_STABLE_ONLY…
mikaylagawarecki Nov 12, 2025
2ad70c9
[CI] manually gen json for segfaults (#167250)
clee2000 Nov 12, 2025
a95eee6
[user-streams] Add API for accessing current stream given a device (#…
mlazos Nov 12, 2025
a6a0379
[caffe2] Address -Wswitch-default warnings in headers (#167563)
NSProgrammer Nov 12, 2025
74e85c6
Add TORCH_BOX helper for STABLE_TORCH_LIBRARY_IMPL (#167582)
janeyx99 Nov 12, 2025
5f0a5b8
Revert "Use stable topological sort in fuse_by_partitions (#167397)"
pytorchmergebot Nov 12, 2025
1311385
Revert "fix failure of exporting compiled model with nested dynamic s…
pytorchmergebot Nov 12, 2025
2ca428c
[CD] Preload `libnvrtc-builtinso.so` (#167614)
malfet Nov 12, 2025
0184ef2
[inductor][NFC][1/X] extract create_no_valid_choices from AlgorithmSe…
nmacchioni Nov 12, 2025
158e724
[torch] Update caffe2/c10/cuda to build under CUDA 13 (#167534)
q10 Nov 12, 2025
0dac408
MatMal - fix folding logic (#166891)
dstaay-fb Nov 12, 2025
537167a
Fix thread safety in getCurrentCUDABlasHandle and getCUDABlasLtWorksp…
t-ivan-gr Nov 12, 2025
2fa18d1
[export] Codemod more tests to use dynamo_graph_capture_for_export (#…
zhxchen17 Nov 12, 2025
a76dd6b
[MPS] SparseMps mv op (#166708)
Isalia20 Nov 12, 2025
273babe
[precompile] Integrate AOTI as a backend. (#167338)
zhxchen17 Nov 13, 2025
2d73900
[dynamo] speculate_subgraph_with_auto_output_flattening (#167438)
anijain2305 Nov 12, 2025
5f98a03
[dynamo] Make HintsWrapperHigherOrderVariable follow wrap semantics (…
anijain2305 Nov 12, 2025
0c5d5c7
[dynamo][invoke_subgraph] Do not restore side effects on invoke_subgr…
anijain2305 Nov 12, 2025
485f2b6
ProxyTorchDispatchMode: Decomposing missing sympy.SymExpr should hand…
aorenste Nov 12, 2025
35571fe
[effects] Add register_effectful_op (#163284)
angelayi Nov 12, 2025
c9b09a3
[opaque obj] Allow non-effectful scriptobjs (#163714)
angelayi Nov 12, 2025
e3dadb1
[opaque obj] torch.compile support (#163936)
angelayi Nov 12, 2025
19c8678
[opqaue obj] Add attribute support (#167230)
angelayi Nov 12, 2025
8919f69
[Inductor][2/2] Decouple flags for optimization and debug symbols (#1…
bbeckca Nov 13, 2025
8f5f89c
Revert "Fix thread safety in getCurrentCUDABlasHandle and getCUDABlas…
pytorchmergebot Nov 13, 2025
9b68682
[ROCm] Enable several DISABLED issues (#167183)
pragupta Nov 13, 2025
2984331
[inductor][NFC][2/X] extract do_autotuning/autotune/benchmark from Al…
nmacchioni Nov 13, 2025
d9a50bf
[dynamo] [3.14] Support np._CopyMode (#167619)
rtimpe Nov 12, 2025
eeebf9f
[dynamo] [3.14] Update broken numpy test (#167681)
rtimpe Nov 12, 2025
f9851af
Add Attention ops to CI (#165915)
jainapurva Nov 13, 2025
f570e58
Add C++ fast path for `DTensor.__torch_dispatch__` (#167051)
swolchok Nov 13, 2025
480b4ff
Avoid creating Python OpSchema in the DTensor dispatch fast path (#16…
swolchok Nov 13, 2025
2034ca9
extend C++ DTensor fast path to local operator dispatch (#166808)
swolchok Nov 13, 2025
3d801a4
DTensor fast path: port return_and_correct_aliasing and inplace/out c…
swolchok Nov 13, 2025
1a67403
Move MemPool out of c10 and into ATen. (#167506)
galv Nov 11, 2025
782fc3c
[DTensor] Add CPU instruction count benchmark for dispatch (#167394)
wconstab Nov 11, 2025
8f96e7b
Only remove_noop in pre_grad passes if remove_noop is not in the remo…
ShyamalShah3 Nov 13, 2025
8c86ccf
[DebugMode] .show_stack_trace inline (#167589)
pianpwk Nov 13, 2025
2c846bb
[xpu][test]port embedding indexing and native_mha test files for Inte…
wincent8 Nov 13, 2025
ce4f31f
[OpenReg][Feat][Docs] Enrich hook implementation and add focused docu…
KarhouTam Nov 13, 2025
9ae0ece
Introduce a new API torch.accelerator.get_memory_info (#156812)
guangyey Nov 13, 2025
f2d0a47
[xpu][feature] Add XPU support on torch.accelerator.get_memory_info (…
guangyey Nov 13, 2025
4de24bc
[Fix XPU typo] Fix a comment typo of FindSYCLToolkit.cmake (#165884)
luoyu-intel Nov 13, 2025
c940b1f
address DDE in matmul decomp (#166541)
laithsakka Nov 13, 2025
d3ca4a3
[CUDA][64-bit indexing] Handle 64-bit outer dim `cumsum` case (#167326)
eqy Nov 13, 2025
698aa0f
[MPS] sparse_mask_projection (#166260)
Isalia20 Nov 13, 2025
374ee9e
Fix missing thrust includes (#167450)
miscco Nov 13, 2025
7aac506
Revert "[precompile] Integrate AOTI as a backend. (#167338)"
pytorchmergebot Nov 13, 2025
460c7e1
Handle only a Tensor for IntList parsing (#167606)
ezyang Nov 13, 2025
6ea7791
[DebugMode] torch.hash_tensor option (#167486)
pianpwk Nov 13, 2025
b5e0e69
Correctly populate storage offset in DTensor constructor (#167597)
ezyang Nov 12, 2025
e5eb89e
remove allocation of new unbacked symbols during mod eval (#167123)
laithsakka Nov 13, 2025
fadb62f
[PyTorch] fix profiler issue with empty exported trace file (#167601)
TroyGarden Nov 13, 2025
d273422
[CUDA] Large max pool fix (#167427)
Isalia20 Nov 13, 2025
d8384e2
[Inductor] Remove bf16 fallback for atomic_add (#167380)
karthickai Nov 12, 2025
cfb3a6b
[2/N][BugFix][Refactor] fix several instances which use f = open(...)…
lingebeng Nov 13, 2025
38806f3
[inductor, 3.14] fix itertools.product pickle error in test_cpu_repro…
williamwen42 Nov 13, 2025
9ac3fc0
[inductor, 3.14] catch pickle.PicklingError exceptions (#167383)
williamwen42 Nov 13, 2025
23f4f32
[dynamo, 3.14] enable dynamo in 3.14 (#167384)
williamwen42 Nov 13, 2025
4fc6886
[3.14, dataloader] handle forkserver default mp start method in 3.14 …
williamwen42 Nov 13, 2025
940979a
[export, 3.14] handle patching methods with functools.partial correct…
williamwen42 Nov 13, 2025
21f32e4
[dynamo] clean up BaseUserFunctionVariable and LocalGeneratorObjectVa…
williamwen42 Nov 13, 2025
8f00ec3
[dynamo, nested graph breaks] disallow graph breaks in functorch ops,…
williamwen42 Nov 13, 2025
0b3bdb0
[EZ][BE] Remove unnecessary semicolon in Module.cpp (#167756)
malfet Nov 13, 2025
3d06351
[inductor][ez] skip cache for unit test via envvar (#167237)
shunting314 Nov 6, 2025
f79cdc8
[CD] [aarch64] unify the build.sh to build for aarch64 wheel (#166044)
tinglvv Nov 13, 2025
a954242
[MPS] Add Metal complex mm implementation (#167755)
malfet Nov 13, 2025
fe33d7c
Revert "address DDE in matmul decomp (#166541)"
pytorchmergebot Nov 13, 2025
0cd0bd7
address DDE in matmul decomp (#166541)
laithsakka Nov 13, 2025
08de54f
[3.14] Skip failing spherical_bessel_j0 tests (#167691)
rtimpe Nov 13, 2025
532389f
[torchelastic] Add flush option to TailLog (#167169)
cnphil Nov 14, 2025
2ef236e
[3.14, jit] skip jit tests on 3.14+, add jit deprecation warnings to …
williamwen42 Nov 13, 2025
813e5ea
[fx, 3.14] fix assert detection for 3.14 (#167700)
williamwen42 Nov 13, 2025
8cf0bdd
[xpu][fix] Fix conv1d precision error (#162944)
yucai-intel Nov 14, 2025
05bcfcc
[Profiler] Add Documentation for FunctionEvent (#167688)
sraikund16 Nov 14, 2025
96a4c4b
add device generalization support for distributed tests (#165067)
harikodali Nov 14, 2025
79317dc
Fix no source name in backward kernel names; Add flex_attention HOP t…
yushangdi Nov 14, 2025
5e6ac5c
[Pytorch] Improve conversion to bfloat16 on aarch64/NEON (#166958)
Nicoshev Nov 14, 2025
5b1e112
[Dynamo] Imporve-graph-break-skip-logs (#167067)
parsshar-RH Nov 14, 2025
45b2c3d
[OpenReg][Feat][Docs] Enrich OpenReg device management implementation…
KarhouTam Nov 14, 2025
2aba180
Always track _local_scalar_dense output in tensorify_python_scalars. …
laithsakka Nov 14, 2025
5623628
[SymmMem] op to get remote tensors (#167779)
kwen2501 Nov 14, 2025
c78e646
Fix different seq length (#167481)
Microve Nov 14, 2025
50bf1f0
deprecate check_is_size and guard_size_oblivious (#167198)
laithsakka Nov 14, 2025
3522e0c
Revert "Fix different seq length (#167481)"
pytorchmergebot Nov 14, 2025
0e7235e
[xpu][feature] [1/3] add fp8 scaled_mm implementation for XPU (#165978)
Stonepia Nov 14, 2025
e2c6834
Revert "deprecate check_is_size and guard_size_oblivious (#167198)"
pytorchmergebot Nov 14, 2025
f8a2ce3
Fix inplace ops on Partial DTensors to preserve aliasing semantics (#…
RohitRathore1 Nov 14, 2025
226850c
[ATen][CUDA] Add sm_121a flag for RowwiseScaledMM (#167734)
Aidyn-A Nov 14, 2025
b657061
[precompile] Integrate AOTI as a backend. (#167338)
zhxchen17 Nov 14, 2025
bfddfde
Add basic spin config and linting commands (#167226)
zklaus Nov 13, 2025
40e6f09
Re-land "Fix thread safety in getCurrentCUDABlasHandle and getCUDABla…
t-ivan-gr Nov 14, 2025
9d1a74c
Fix mvlgamma_ FPE crash on x86 with integer input (#164230)
RohitRathore1 Nov 14, 2025
99fdca8
[ROCm] Enable StaticCudaLauncher for ROCm (#166492)
chinmaydk99 Nov 14, 2025
02ee7dd
[CUDA][Test] Add `serialTest()` to some `largeTensorTest` tests (#167…
eqy Nov 14, 2025
065176c
[export] Add pytree input check for dynamo_graph_capture_for_export (…
zhxchen17 Nov 14, 2025
7ede33b
Tiling bug fix (#167771)
eellison Nov 14, 2025
e0fff31
[dynamo] Make global state guards and torch function stack guards dro…
zhxchen17 Nov 14, 2025
5eac46a
add assume_32bit_indexing inductor config (#167784)
laithsakka Nov 14, 2025
a74adcf
[codemod][lowrisk] Remove unused exception parameter from caffe2/caff…
r-barnes Nov 14, 2025
dd37a1a
Fix NaN gradients in atan2_backward when both inputs are zero (#166787)
pushkar-hue Nov 14, 2025
1176b2b
[BE]: Update NVTX submodule to 3.3.0 (#167751)
Skylion007 Nov 14, 2025
c429b1f
Ops convolution_backward optional flag bug (#165008)
trichmo Nov 14, 2025
9e2bf12
[MPS] addmm complex fix (#167826)
Isalia20 Nov 14, 2025
caca3f2
Revert "Re-land "Fix thread safety in getCurrentCUDABlasHandle and ge…
pytorchmergebot Nov 14, 2025
5b42a5d
[doc] Add example for torch.is_storage (#161898)
parsshar-RH Nov 14, 2025
8378abd
[torch.export] Fix for flaky test_annotate_on_assert (#167805)
malaybag Nov 14, 2025
d99c6bc
[export] Disable side effects on dynamo_graph_capture_for_export and …
zhxchen17 Nov 14, 2025
2ef85be
Add empty to stable ops (#167592)
mikaylagawarecki Nov 13, 2025
52b45c1
Add reshape, view, flatten to torch/csrc/stable (#167600)
mikaylagawarecki Nov 13, 2025
a2daf3f
[Inductor] Add support bound methods in pattern matcher (#167795)
karthickai Nov 14, 2025
200156e
DTensor: avoid unnecessary DTensorSpec creation in _ToTorchTensor.bac…
swolchok Nov 13, 2025
602102b
Revert "Hide all symbols (except stable/headeronly/shim) if TORCH_STA…
pytorchmergebot Nov 14, 2025
5a368b8
Revert "[CodeClean] Replace std::runtime_error with TORCH_CHECK (#165…
pytorchmergebot Nov 14, 2025
7aa210d
Revert "[CodeClean] Remove the Unused MACRO for AOT Inductor Runtime …
pytorchmergebot Nov 14, 2025
c87295c
[precompile] Support captured global tensors. (#167846)
zhxchen17 Nov 14, 2025
0922ba5
[BE] No need to pass const enum values by reference (#167868)
malfet Nov 14, 2025
d629b7a
Move CppTypeToScalarType to torch/headeronly (#167610)
mikaylagawarecki Nov 14, 2025
f4b8c4f
backed size oblivious checks for expand() (#167689)
aneeshgupta42 Nov 14, 2025
4c79305
[targets2buck] Clean up get_pt_ops_deps (#167690)
bigfootjon Nov 14, 2025
4ed26f7
distributed/debug: add an HTTP server for debugging running jobs (#16…
d4l3k Nov 14, 2025
e20ca3b
Remove python workaround for ContextDecorator (#167049)
cyyever Nov 14, 2025
08042bb
[6/N] Use Python 3.10 typing (#167649)
cyyever Nov 14, 2025
fcfb213
[inductor] layout constraint for weight-norm-bwd (#167667)
shunting314 Nov 13, 2025
ee0b5b4
Add new CI jobs to run dynamo tests on all python versions supported …
guilhermeleobas Nov 13, 2025
1c16382
Revert "distributed/debug: add an HTTP server for debugging running j…
pytorchmergebot Nov 15, 2025
da91bf5
Fix incorrect attention example in ONNX exporter docstring (#167646)
gaurav-redhat Nov 15, 2025
f6b54d8
flight_recorder: move to torch.distributed (#167782)
d4l3k Nov 15, 2025
b7f5277
Add meta registration for scaled_mm_v2 and test (#167653)
slayton58 Nov 14, 2025
cfe799b
Revert "Ops convolution_backward optional flag bug (#165008)"
pytorchmergebot Nov 15, 2025
fb04e9a
[CUDA][CUDA Graphs] Respect node-priority in `cudaGraphInstantiate` (…
eqy Nov 15, 2025
d7782dd
[ATEN][CUDA] Reduce register pressure introduced by CUDA_KERNEL_ASSER…
YyWangCS Nov 15, 2025
bc60b86
Skip stable diffusion models in torchbench, get tests and benchmarks …
zou3519 Nov 15, 2025
de0d69b
Remove useless super() delegation (#167791)
cyyever Nov 15, 2025
3d7a8b7
MPS: Fix clamp scalar cache key to store floats in hex representation…
jhavukainen Nov 15, 2025
c66a6c4
[HOP][print] Add functionalization (make sure ordering) for print (#1…
fxdawnn Nov 15, 2025
530e782
[codemod][lowrisk] Remove unused exception parameter from caffe2/torc…
r-barnes Nov 15, 2025
6ef3a62
Fix typo in FP16 accumulation section (#167703)
jiahy0825 Nov 15, 2025
79d2397
Fix grammar issues in C++ frontend documentation (#167702)
ryanzhou147 Nov 15, 2025
deabb3e
Use c10::filesystem (#167821)
cyyever Nov 15, 2025
d01a7b0
Back out "MatMal - fix folding logic" (#167884)
dstaay-fb Nov 15, 2025
79fc0a9
[xpu][fix]Fall back deterministic `index_copy` to `index_put` on XPU …
chunhuanMeng Nov 15, 2025
0ec53be
Refactor TensorAccessor for headeronly. (#166855)
pearu Nov 15, 2025
5cdbda1
[vision hash update] update the pinned vision hash (#167890)
pytorchupdatebot Nov 16, 2025
98b94b9
[pallas backend] implement gpu tiles/mask for power of 2 (#167584)
oulgen Nov 16, 2025
2245d7d
Improve char printing (#167899)
cyyever Nov 16, 2025
5d99a79
[xpu][test] Migrated two test files to XPU (#166684)
shangerxin Nov 16, 2025
e2e1075
Allow same triton kernels in export (#167862)
minjang Nov 16, 2025
363385a
s/Stragety/Strategy/ (#167916)
ezyang Nov 16, 2025
4322354
[Inductor] optimize scalar welford_reduce (#162709)
jiayisunx Nov 14, 2025
d8ce6f8
Enable PyTorch OSS numerics changes, inductor heuristics (#167799)
PaulZhang12 Nov 17, 2025
aa504d4
[audio hash update] update the pinned audio hash (#167914)
pytorchupdatebot Nov 17, 2025
f2e6f94
deprecate check_is_size and guard_size_oblivious (#167198)
laithsakka Nov 15, 2025
ca3aaef
Fix clamp broadcasting on MPS (Fixes #160734) (#165058)
roei-shlezinger Nov 17, 2025
b9bccec
Revert "[ATen][CUDA] Add sm_121a flag for RowwiseScaledMM (#167734)"
pytorchmergebot Nov 17, 2025
99117c1
Remove old NVTX interface (#167637)
Aidyn-A Nov 17, 2025
5804408
[1/3][XPU][feature] The implementation of memory private pool in XPU …
majing921201 Nov 17, 2025
93ddd38
Re-land#2 "Fix thread safety in getCurrentCUDABlasHandle and getCUDAB…
t-ivan-gr Nov 17, 2025
53809f9
[ARM] Improve LLM performance & mem usage using int4-bf16 KleidiAI ke…
usamahz Nov 17, 2025
661d165
[xla hash update] update the pinned xla hash (#167968)
pytorchupdatebot Nov 17, 2025
6fdb974
Update torch-xpu-ops commit pin (#167698)
CuiYifeng Nov 17, 2025
9ff95f6
[inductor] Expose config for fx bucket all_reduces (#167634)
IvanKobzarev Nov 12, 2025
2b5eabc
Rework PyObject preservation (v2) (#167564)
colesbury Nov 17, 2025
2f74916
Do not hardfail on use nccl estimations for non-nccl (#167827)
IvanKobzarev Nov 17, 2025
2b69673
[CD] Add libopenblas to dep list for AArch64+CPU whl (#167841)
robert-hardwick Nov 17, 2025
1b43d6c
[ROCm] enable fastSpecializedAtomicAdd for gfx950 (#167661)
jeffdaily Nov 17, 2025
4c152a7
Revert "add device generalization support for distributed tests (#165…
pytorchmergebot Nov 17, 2025
39ebab1
Revert "Remove python workaround for ContextDecorator (#167049)"
pytorchmergebot Nov 17, 2025
22ccd44
Revert "Improve char printing (#167899)"
pytorchmergebot Nov 17, 2025
a4c7bf7
Revert "Use c10::filesystem (#167821)"
pytorchmergebot Nov 17, 2025
094e529
[MPS] Fix repeat_interleave with slices (#167961)
malfet Nov 17, 2025
95d1df7
Disable CUDA MXFP4 on non-B200 GPUs (#167857)
slayton58 Nov 17, 2025
77acc66
[ROCm][CI] Upgrade ROCm CI to 7.1 (#166743)
xinyazhang Nov 17, 2025
567dcdb
Fix longstanding race condition around getAllOperatorsFor (#167860)
swolchok Nov 14, 2025
2f3bb74
Improve benchmarks/dynamo:check_perf_csv output and failure summary (…
adabeyta Nov 17, 2025
ae3ce54
Revert "[ROCm] Enable StaticCudaLauncher for ROCm (#166492)"
pytorchmergebot Nov 17, 2025
02b55c3
Move isQIntType to headeronly (#167772)
pearu Nov 16, 2025
1233be0
[STABLE ABI] Add mutable_data_ptr() and const_data_ptr() methods to t…
pearu Nov 16, 2025
01deee2
Fix dataloader tests failing on python 3.14 (#167429)
divyanshk Nov 17, 2025
694f9b9
Revert "[ROCm][CI] Upgrade ROCm CI to 7.1 (#166743)"
pytorchmergebot Nov 17, 2025
4414e1b
Cleanup in inductor usage of nccl estimator after its fix (#167633)
IvanKobzarev Nov 17, 2025
b288d00
[inductor] unittest for run2run determinism (#167482)
shunting314 Nov 15, 2025
689d731
[inductor] fix the decision of inner reduction (#167697)
shunting314 Nov 15, 2025
2ddcf53
Logaddexp complex inconsistent bw cpu and cuda (#163509)
cleonard530 Nov 17, 2025
a892f76
[MPS] mm out sparse (#167908)
Isalia20 Nov 17, 2025
927899d
fixes a few issues with out_dtype overload for addmm/baddbmm (#167931)
ngimel Nov 17, 2025
9d8ceaa
Revert "[ARM] Improve LLM performance & mem usage using int4-bf16 Kle…
pytorchmergebot Nov 17, 2025
bdd3c3a
Support SymInt placeholder in wrapper fxir (#167757)
nandesuka Nov 17, 2025
4e1b772
Fix: Improve fallback behavior in `deserialize_torch_artifact` and re…
abhitorch81 Nov 17, 2025
661fb53
Revert "Remove old NVTX interface (#167637)"
pytorchmergebot Nov 17, 2025
1c04a43
Revert "Tiling bug fix (#167771)"
pytorchmergebot Nov 17, 2025
f69815d
[pallas backend] remove unnecessary mypy comment (#167954)
oulgen Nov 17, 2025
b720887
Revert "deprecate check_is_size and guard_size_oblivious (#167198)"
pytorchmergebot Nov 17, 2025
c4f3d7d
[MPS] remove expected failure for a test (#167922)
Isalia20 Nov 17, 2025
86f9a9a
Revert "[CD] Add libopenblas to dep list for AArch64+CPU whl (#167841)"
pytorchmergebot Nov 17, 2025
9b39276
Revert "[CD] [aarch64] unify the build.sh to build for aarch64 wheel …
pytorchmergebot Nov 17, 2025
71f28f4
[export] Support module type with only __call__ override. (#167874)
zhxchen17 Nov 18, 2025
8a8c634
Tiling bug fix (#167771)
eellison Nov 17, 2025
66f3e4e
[torchfuzz] set default device cuda (#167938)
bobrenjc93 Nov 16, 2025
ee9008a
[torchfuzz] update IGNORE_PATTERNS (#167939)
bobrenjc93 Nov 16, 2025
510cc2e
[torchfuzz] check in test_fuzzer_issue_167937 (#168005)
bobrenjc93 Nov 17, 2025
bc30c98
[torchfuzz] clean up ignore patterns (#168006)
bobrenjc93 Nov 17, 2025
654f3f6
Fix: Dynamo log always emits ANSI color codes into torch_compile_debu…
wmhst7 Nov 18, 2025
bbf39ca
[inductor][fix] subproc autotuning respect cache dir changes (#167918)
nmacchioni Nov 18, 2025
8bb1152
[DTensor] Fix convolution ops with bias=None in torch.compile (#167258)
stmcgovern Nov 18, 2025
2d14e86
[HOP][print][dynamo]Add dynamo for hop print (#167571)
fxdawnn Nov 17, 2025
6eb71ce
[user-streams] Assign streams to gradient accum in bwd (#167513)
mlazos Nov 13, 2025
39f5e0e
[user-streams] Move user object bytecode generation after calling use…
mlazos Nov 13, 2025
1a0a198
Add multiple hiding nodes (#167847)
eellison Nov 17, 2025
63b012a
[CI] Remove --no-use-pep517 from .ci/onnx/test.sh (#168026)
jeffdaily Nov 18, 2025
7ffeb34
[XPU] [Feature] [2/3] add fp8 scaled_mm_v2 implementation for XPU (#1…
Stonepia Nov 18, 2025
ef7fa96
dist: add list_keys to Store API (#167883)
d4l3k Nov 18, 2025
e5e94ec
Introduce HOP for inductor compiled regions to allow torch dispatch (…
jamesjwu Nov 18, 2025
5df0e49
[pallas backend] implement complex numbers (#167947)
oulgen Nov 17, 2025
9ff1922
[pallas backend] implement more ops (#167951)
oulgen Nov 17, 2025
35dae27
[pallas backend] support reductions (#167953)
oulgen Nov 17, 2025
01f94d4
[xpu][test] [1/N] Enable missing Intel GPU inductor tests (#167047)
daisyden Nov 18, 2025
7392106
[user-streams] Stash graph created objects in keep_alive list for bac…
mlazos Nov 18, 2025
db1551b
[pytree][compile] Slightly faster TreeSpec init (#168024)
anijain2305 Nov 17, 2025
2b92b31
[simplefsdp] fix DSV3 autobucketing issue (#167797)
ruisizhang123 Nov 18, 2025
8cb8b6c
[SymmMem] Skip multicast init if any CUDA call fails (#168049)
kwen2501 Nov 18, 2025
d2ccb5b
Follow up on #161891 move additions to stable shim and use version gu…
mikaylagawarecki Nov 18, 2025
3beb378
Fix TORCH_FEATURE_VERSION guards (#167802)
mikaylagawarecki Nov 18, 2025
4c127f1
Split libtorch agnostic tests by feature version (#167803)
mikaylagawarecki Nov 18, 2025
2e907f4
Test libtorch_agnostic with TORCH_TARGET_VERSION on target pytorch ve…
mikaylagawarecki Nov 18, 2025
9760a63
Test that TORCH_FEATURE_VERSION guards are used where needed (#167962)
mikaylagawarecki Nov 18, 2025
2f023bf
[ATen][CUDA] Add sm_121a flag for RowwiseScaledMM (#167734)
Aidyn-A Nov 18, 2025
5605fce
Improve char printing (#167899)
cyyever Nov 18, 2025
d0e7d2e
[xpu][feature][inductor] Enable pad_mm Pass on Intel GPU (#166618)
jianyizh Nov 18, 2025
ee5610f
[BE] Check that swizzle arguments are passed to the call (#167869)
malfet Nov 14, 2025
57f36c9
[ROCm][CI] Upgrade ROCm CI to 7.1 (#166743)
xinyazhang Nov 18, 2025
f077eca
Fix inductor collective runtime units (#168055)
soulitzer Nov 18, 2025
e2b53ba
Do not autolabel PRs with `oncall:distributed` (#168084)
malfet Nov 18, 2025
da5ac4a
Merge remote-tracking branch 'upstream/main' into develop_IFU_20251118
github-actions[bot] Nov 18, 2025
a3c49a9
Fix conflicts and move triton ver to 3.5.0
Nov 19, 2025
25 changes: 21 additions & 4 deletions .ci/docker/almalinux/Dockerfile
@@ -7,13 +7,13 @@ ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8

ARG DEVTOOLSET_VERSION=11
ARG DEVTOOLSET_VERSION=13

RUN yum -y update
RUN yum -y install epel-release
# install glibc-langpack-en make sure en_US.UTF-8 locale is available
RUN yum -y install glibc-langpack-en
RUN yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel openssl-devel yum-utils autoconf automake make gcc-toolset-${DEVTOOLSET_VERSION}-toolchain
RUN yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel openssl-devel yum-utils autoconf automake make gcc-toolset-${DEVTOOLSET_VERSION}-gcc gcc-toolset-${DEVTOOLSET_VERSION}-gcc-c++ gcc-toolset-${DEVTOOLSET_VERSION}-gcc-gfortran gcc-toolset-${DEVTOOLSET_VERSION}-gdb
# Just add everything as a safe.directory for git since these will be used in multiple places with git
RUN git config --global --add safe.directory '*'
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
@@ -41,6 +41,7 @@ RUN bash ./install_conda.sh && rm install_conda.sh
# Install CUDA
FROM base as cuda
ARG CUDA_VERSION=12.6
ARG DEVTOOLSET_VERSION=13
RUN rm -rf /usr/local/cuda-*
ADD ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
@@ -50,7 +51,8 @@ ENV CUDA_HOME=/usr/local/cuda-${CUDA_VERSION}
# Preserve CUDA_VERSION for the builds
ENV CUDA_VERSION=${CUDA_VERSION}
# Make things in our path by default
ENV PATH=/usr/local/cuda-${CUDA_VERSION}/bin:$PATH
ENV PATH=/usr/local/cuda-${CUDA_VERSION}/bin:/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH


FROM cuda as cuda12.6
RUN bash ./install_cuda.sh 12.6
@@ -68,8 +70,22 @@ FROM cuda as cuda13.0
RUN bash ./install_cuda.sh 13.0
ENV DESIRED_CUDA=13.0

FROM ${ROCM_IMAGE} as rocm
FROM ${ROCM_IMAGE} as rocm_base
ARG DEVTOOLSET_VERSION=13
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
# Install devtoolset on ROCm base image
RUN yum -y update && \
yum -y install epel-release && \
yum -y install glibc-langpack-en && \
yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel openssl-devel yum-utils autoconf automake make gcc-toolset-${DEVTOOLSET_VERSION}-gcc gcc-toolset-${DEVTOOLSET_VERSION}-gcc-c++ gcc-toolset-${DEVTOOLSET_VERSION}-gcc-gfortran gcc-toolset-${DEVTOOLSET_VERSION}-gdb
RUN git config --global --add safe.directory '*'
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH

FROM rocm_base as rocm
ARG PYTORCH_ROCM_ARCH
ARG DEVTOOLSET_VERSION=13
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
@@ -88,6 +104,7 @@ COPY --from=cuda13.0 /usr/local/cuda-13.0 /usr/local/cuda-13.0

# Final step
FROM ${BASE_TARGET} as final
ARG DEVTOOLSET_VERSION=13
COPY --from=openssl /opt/openssl /opt/openssl
COPY --from=patchelf /patchelf /usr/local/bin/patchelf
COPY --from=conda /opt/conda /opt/conda
8 changes: 2 additions & 6 deletions .ci/docker/almalinux/build.sh
@@ -36,11 +36,7 @@ case ${DOCKER_TAG_PREFIX} in
;;
rocm*)
BASE_TARGET=rocm
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
# add gfx950, gfx115x conditionally starting in ROCm 7.0
if [[ "$ROCM_VERSION" == *"7.0"* ]]; then
PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950;gfx1150;gfx1151"
fi
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201;gfx950;gfx1150;gfx1151"
EXTRA_BUILD_ARGS="${EXTRA_BUILD_ARGS} --build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}"
;;
*)
@@ -63,7 +59,7 @@ docker build \
--target final \
--progress plain \
--build-arg "BASE_TARGET=${BASE_TARGET}" \
--build-arg "DEVTOOLSET_VERSION=11" \
--build-arg "DEVTOOLSET_VERSION=13" \
${EXTRA_BUILD_ARGS} \
-t ${tmp_tag} \
$@ \
43 changes: 36 additions & 7 deletions .ci/docker/build.sh
@@ -168,6 +168,18 @@ case "$tag" in
VISION=yes
TRITON=yes
;;
pytorch-linux-jammy-py3.11-clang12)
ANACONDA_PYTHON_VERSION=3.11
CLANG_VERSION=12
VISION=no
TRITON=no
;;
pytorch-linux-jammy-py3.12-clang12)
ANACONDA_PYTHON_VERSION=3.12
CLANG_VERSION=12
VISION=no
TRITON=no
;;
pytorch-linux-jammy-rocm-n-py3 | pytorch-linux-jammy-rocm-n-py3-benchmarks | pytorch-linux-noble-rocm-n-py3)
if [[ $tag =~ "jammy" ]]; then
ANACONDA_PYTHON_VERSION=3.10
@@ -176,7 +188,7 @@
fi
GCC_VERSION=11
VISION=yes
ROCM_VERSION=7.0
ROCM_VERSION=7.1
NINJA_VERSION=1.9.0
TRITON=yes
KATEX=yes
@@ -195,9 +207,9 @@
NINJA_VERSION=1.9.0
TRITON=yes
;;
pytorch-linux-jammy-xpu-n-py3 | pytorch-linux-jammy-xpu-n-py3-inductor-benchmarks)
pytorch-linux-noble-xpu-n-py3 | pytorch-linux-noble-xpu-n-py3-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
GCC_VERSION=13
VISION=yes
XPU_VERSION=2025.2
NINJA_VERSION=1.9.0
@@ -248,6 +260,12 @@ case "$tag" in
HALIDE=yes
TRITON=yes
;;
pytorch-linux-jammy-cuda12.8-py3.12-pallas)
CUDA_VERSION=12.8.1
ANACONDA_PYTHON_VERSION=3.12
GCC_VERSION=11
PALLAS=yes
;;
pytorch-linux-jammy-py3.12-triton-cpu)
CUDA_VERSION=12.6
ANACONDA_PYTHON_VERSION=3.12
@@ -261,19 +279,29 @@ case "$tag" in
PYTHON_VERSION=3.10
CUDA_VERSION=12.8.1
;;
pytorch-linux-jammy-aarch64-py3.10-gcc11)
pytorch-linux-jammy-aarch64-py3.10-gcc13)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
GCC_VERSION=13
ACL=yes
VISION=yes
OPENBLAS=yes
# snadampal: skipping llvm src build install because the current version
# from pytorch/llvm:9.0.1 is x86 specific
SKIP_LLVM_SRC_BUILD_INSTALL=yes
;;
pytorch-linux-jammy-aarch64-py3.10-gcc11-inductor-benchmarks)
pytorch-linux-jammy-aarch64-py3.10-clang21)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
CLANG_VERSION=21
ACL=yes
VISION=yes
OPENBLAS=yes
# snadampal: skipping llvm src build install because the current version
# from pytorch/llvm:9.0.1 is x86 specific
SKIP_LLVM_SRC_BUILD_INSTALL=yes
;;
pytorch-linux-jammy-aarch64-py3.10-gcc13-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=13
ACL=yes
VISION=yes
OPENBLAS=yes
@@ -359,6 +387,7 @@ docker build \
--build-arg "INDUCTOR_BENCHMARKS=${INDUCTOR_BENCHMARKS}" \
--build-arg "EXECUTORCH=${EXECUTORCH}" \
--build-arg "HALIDE=${HALIDE}" \
--build-arg "PALLAS=${PALLAS}" \
--build-arg "XPU_VERSION=${XPU_VERSION}" \
--build-arg "UNINSTALL_DILL=${UNINSTALL_DILL}" \
--build-arg "ACL=${ACL:-}" \
1 change: 1 addition & 0 deletions .ci/docker/ci_commit_pins/jax.txt
@@ -0,0 +1 @@
0.8.0
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/triton.txt
@@ -1 +1 @@
ac80c4190aa0321f761a08af97e1e1eee41f01d9
5df9c723de8c23508773b07fe16dd34e4c444541
4 changes: 2 additions & 2 deletions .ci/docker/common/install_clang.sh
@@ -8,8 +8,8 @@ if [ -n "$CLANG_VERSION" ]; then
# work around ubuntu apt-get conflicts
sudo apt-get -y -f install
wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
if [[ $CLANG_VERSION == 18 ]]; then
apt-add-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main"
if [[ $CLANG_VERSION -ge 18 ]]; then
apt-add-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-${CLANG_VERSION} main"
fi
fi

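The install_clang.sh change above relaxes a hard-coded `== 18` check to `-ge 18`, so any clang version from 18 up (e.g. the new clang-21 aarch64 image) gets the matching llvm.org apt repository. A minimal standalone sketch of the generalized check (the `llvm_repo_line` helper name is hypothetical, not part of the script):

```shell
#!/bin/bash
# Hypothetical sketch of the generalized version check: emit the LLVM apt
# repo line for clang >= 18 (as install_clang.sh now does), nothing otherwise.
llvm_repo_line() {
  local CLANG_VERSION="$1"
  if [[ $CLANG_VERSION -ge 18 ]]; then
    echo "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-${CLANG_VERSION} main"
  fi
}

llvm_repo_line 21  # -> deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-21 main
llvm_repo_line 12  # no output: stock Ubuntu clang needs no extra repo
```

With the old `== 18` comparison, clang-21 would have silently skipped the repo setup and the later `apt-get install` of that version would fail.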
4 changes: 2 additions & 2 deletions .ci/docker/common/install_gcc.sh
@@ -7,11 +7,11 @@ if [ -n "$GCC_VERSION" ]; then
# Need the official toolchain repo to get alternate packages
add-apt-repository ppa:ubuntu-toolchain-r/test
apt-get update
apt-get install -y g++-$GCC_VERSION
apt-get install -y g++-$GCC_VERSION gfortran-$GCC_VERSION
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50

update-alternatives --install /usr/bin/gfortran gfortran /usr/bin/gfortran-"$GCC_VERSION" 50

# Cleanup package manager
apt-get autoclean && apt-get clean
40 changes: 40 additions & 0 deletions .ci/docker/common/install_jax.sh
@@ -0,0 +1,40 @@
#!/bin/bash

set -ex

source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"

# Get the pinned JAX version (same for all CUDA versions)
JAX_VERSION=$(get_pinned_commit /ci_commit_pins/jax)

function install_jax_12() {
echo "Installing JAX ${JAX_VERSION} with CUDA 12 support"
pip_install "jax[cuda12]==${JAX_VERSION}" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

# Verify installation
python -c "import jax" # check for errors
echo "JAX ${JAX_VERSION} installation completed successfully for CUDA 12"
}

function install_jax_13() {
echo "Installing JAX ${JAX_VERSION} with CUDA 13 support"
pip_install "jax[cuda13]==${JAX_VERSION}" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

# Verify installation
python -c "import jax" # check for errors
echo "JAX ${JAX_VERSION} installation completed successfully for CUDA 13"
}

# idiomatic parameter and option handling in sh
while test $# -gt 0
do
case "$1" in
12.4|12.6|12.6.*|12.8|12.8.*|12.9|12.9.*) install_jax_12;
;;
13.0|13.0.*) install_jax_13;
;;
*) echo "bad argument $1"; exit 1
;;
esac
shift
done
56 changes: 56 additions & 0 deletions .ci/docker/common/install_libgomp.sh
@@ -0,0 +1,56 @@
#!/bin/bash
# Script used only in CD pipeline

set -ex

# install dependencies
dnf -y install gmp-devel libmpc-devel texinfo flex bison

cd /usr/local/src
# fetch source for gcc 13
git clone --depth 1 --single-branch -b releases/gcc-13.3.0 https://github.com/gcc-mirror/gcc.git gcc-13.3.0

mkdir -p gcc-13.3.0/build-gomp
cd gcc-13.3.0/build-gomp

# configure gcc build
# I got these flags by:
# 1. downloading the source rpm for gcc-11 on AlmaLinux 8 container
# dnf install -y dnf-plugins-core rpmdevtools
# dnf download --source libgomp
# 2. extracting the gcc.spec from the source.
# rpmdev-extract gcc-xx.src.rpm
# 3. extracting optflags and ld_flags from gcc.spec:
# rpm --eval '%{optflags}'
# rpm --eval '%{build_ldflags}'
#
# I had to remove the following flags because they didn't compile for this version of libgomp:
# -Werror=format-security
# -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1
# -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1
#
# I added -march=armv8-a -mtune=generic to make them explicit. I don't think they're strictly needed.

OPT_FLAGS='-O2 -march=armv8-a -mtune=generic'\
' -fexceptions -g -grecord-gcc-switches -pipe -Wall'\
' -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS'\
' -fstack-protector-strong -fasynchronous-unwind-tables'\
' -fstack-clash-protection'

LDFLAGS='-Wl,-z,relro -Wl,--as-needed -Wl,-z,now'

CFLAGS="$OPT_FLAGS" \
CXXFLAGS="$OPT_FLAGS" \
LDFLAGS="$LDFLAGS" \
../configure \
--prefix=/usr \
--libdir=/usr/lib64 \
--enable-languages=c,c++ \
--disable-multilib \
--disable-bootstrap \
--enable-libgomp

# only build libgomp
make -j$(nproc) all-target-libgomp

make install-target-libgomp
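The long `OPT_FLAGS` assignment above relies on a shell idiom worth noting: a backslash-newline continues the assignment, and adjacent quoted fragments concatenate into a single string. A small sketch of the same pattern (the flag values here are a shortened subset for illustration):

```shell
#!/bin/bash
# Adjacent quoted fragments joined by backslash-newline concatenate
# into one space-separated value, as in the OPT_FLAGS assignment above.
FLAGS='-O2 -march=armv8-a'\
' -fexceptions -g'\
' -fstack-protector-strong'

echo "$FLAGS"
```

Each continuation fragment must begin with a leading space inside the quotes, otherwise the flags run together.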
1 change: 1 addition & 0 deletions .ci/docker/common/install_openblas.sh
@@ -10,6 +10,7 @@ git clone https://github.com/OpenMathLib/OpenBLAS.git -b "${OPENBLAS_VERSION}" -
 
 OPENBLAS_CHECKOUT_DIR="OpenBLAS"
 OPENBLAS_BUILD_FLAGS="
+CC=gcc
 NUM_THREADS=128
 USE_OPENMP=1
 NO_SHARED=0
18 changes: 10 additions & 8 deletions .ci/docker/common/install_rocm.sh
@@ -60,14 +60,16 @@ EOF
     DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated rocm-llvm-dev
 fi
 
-# precompiled miopen kernels added in ROCm 3.5, renamed in ROCm 5.5
-# search for all unversioned packages
-# if search fails it will abort this script; use true to avoid case where search fails
-MIOPENHIPGFX=$(apt-cache search --names-only miopen-hip-gfx | awk '{print $1}' | grep -F -v . || true)
-if [[ "x${MIOPENHIPGFX}" = x ]]; then
-    echo "miopen-hip-gfx package not available" && exit 1
-else
-    DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENHIPGFX}
+if [[ $(ver $ROCM_VERSION) -lt $(ver 7.1) ]]; then
+    # precompiled miopen kernels added in ROCm 3.5, renamed in ROCm 5.5, removed in ROCm 7.1
+    # search for all unversioned packages
+    # if search fails it will abort this script; use true to avoid case where search fails
+    MIOPENHIPGFX=$(apt-cache search --names-only miopen-hip-gfx | awk '{print $1}' | grep -F -v . || true)
+    if [[ "x${MIOPENHIPGFX}" = x ]]; then
+        echo "miopen-hip-gfx package not available" && exit 1
+    else
+        DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated ${MIOPENHIPGFX}
+    fi
 fi
 
 # ROCm 6.0 had a regression where journal_mode was enabled on the kdb files resulting in permission errors at runtime
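The new ROCm-version guard compares versions through a `ver` helper defined elsewhere in this script. A sketch of the usual zero-padding implementation such helpers use; this is an assumption for illustration, and the script's actual definition may differ:

```shell
#!/bin/bash
# Assumed sketch of a `ver`-style helper: each dotted component is
# zero-padded into one fixed-width integer, so a plain numeric
# comparison orders versions correctly (7.0.2 < 7.1, 7.10 > 7.9).
ver() {
  printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ')
}

if (( $(ver 7.0.2) < $(ver 7.1) )); then
  echo "pre-7.1: still install precompiled miopen kernels"
fi
```

Padding avoids the classic lexicographic pitfall where `"7.10" < "7.9"` as strings.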
4 changes: 2 additions & 2 deletions .ci/docker/common/install_rocm_magma.sh
@@ -12,8 +12,8 @@ function do_install() {
 
 rocm_version_nodot=${rocm_version//./}
 
-# post merge of https://github.com/icl-utk-edu/magma/pull/65
-MAGMA_VERSION=c0792ae825fb36872784892ea643dd6f3456bc5f
+# https://github.com/icl-utk-edu/magma/pull/65
+MAGMA_VERSION=d6e4117bc88e73f06d26c6c2e14f064e8fc3d1ec
 magma_archive="magma-rocm${rocm_version_nodot}-${MAGMA_VERSION}-1.tar.bz2"
 
 rocm_dir="/opt/rocm"