Merged
1668 commits
0442125
[Inductor] Restore original dtype for rank-0 CPU tensors (#166118)
blaine-rister Oct 24, 2025
31584f2
Add a Claude skill for writing docstrings. (#166175)
ezyang Oct 24, 2025
2c851c1
[FX][ez] fix the split_module tutorial code (#166154)
shunting314 Oct 24, 2025
6038e47
[Dynamo][Logging]Fix regression on stack adding to latest bytecode by…
fxdawnn Oct 24, 2025
c9b49e5
[MPS] Add `linalg.householder_product` for MPS (#166090)
kurtamohler Oct 24, 2025
32ac38f
[lint] workflow consistency linter to look at all files instead of ju…
clee2000 Oct 24, 2025
b04173b
[ONNX] Add a test to backed_size_oblivious patch in onnx (#166196)
titaiwangms Oct 24, 2025
b6a4236
[label_to_label] minor updates (#166172)
zou3519 Oct 24, 2025
82473c3
[torch.export] Add original module type to UnflattenedModule class (#…
malaybag Oct 24, 2025
4fc06f2
Use std::min for #165927 (#166199)
clee2000 Oct 24, 2025
bc11a42
[inductor][ez] fix score fusion memory typo (#166029)
shunting314 Oct 21, 2025
cc20b7a
[FlexFlash] update names (#166193)
drisspg Oct 24, 2025
98c8183
Add HIDDEN_NAMESPACE_BEGIN and END macros for hiding header APIs (#16…
janeyx99 Oct 24, 2025
dfdb68e
Hide all APIs in torch::stable (#166077)
janeyx99 Oct 24, 2025
cddd5f7
Hide stable Library structs instead of using anon namespace (#166078)
janeyx99 Oct 24, 2025
d486eee
Hide APIs in torch::headeronly (#166079)
janeyx99 Oct 24, 2025
9d0b77f
[10/N] Apply ruff UP035 rule (#165709)
cyyever Oct 25, 2025
79a4a9c
Fix race condition and make CUDA kthvalue deterministic (#165762)
nick-kuhn Oct 25, 2025
0c9763a
[Autograd] Add Default Autograd Fallback for PrivateUse1 in PyTorch (…
fffrog Oct 24, 2025
1d13c31
[OpenReg] Remove the Unnecessary Fallback Implementation for Autograd…
fffrog Oct 24, 2025
42bd210
[dynamo] Avoid ID_MATCH on methods - use CLOSURE_MATCH on functions (…
anijain2305 Oct 24, 2025
0a5d68d
[dynamo] Remove unnecessary NAME_MATCH guard (#166112)
anijain2305 Oct 24, 2025
8aa465f
[MPS] Migrate `angle` to Metal ops (#166210)
malfet Oct 25, 2025
761f946
[ROCm] new implementation of upsample_bilinear2d_backward (#164572)
glen-amd Oct 25, 2025
2efcf3c
Reverts #163712 and forces allgather/scatter inputs/outputs to be con…
ngimel Oct 25, 2025
b31bad1
[Pytorch] Enable autovec on aarch64 for type conversion (#166049)
Nicoshev Oct 25, 2025
de7fdfe
Export flex attention with kwargs and DTensor (#166045)
yiming0416 Oct 25, 2025
1d58d5f
[hops] fix unbacked runtime asserts for cond higher order op (#165893)
bobrenjc93 Oct 25, 2025
003601a
Set prefer_deferred_runtime_asserts_over_guards to True (#165820)
justinchuby Oct 25, 2025
1e2e7cb
Add doc for Symmetric Memory (#166148)
kwen2501 Oct 25, 2025
78bcfcf
[fx] Optimize torch.fx.Node.replace_all_uses_with (#165889)
jansel Oct 24, 2025
7924e3a
Remove likely unnecessary _EXPAND trick for non-windows in HIDDEN_NAM…
janeyx99 Oct 24, 2025
eb83c3c
Clean up unused Pyrefly suppressions (#166178)
maggiemoss Oct 25, 2025
ec51b13
Factor out shared scaled mm routines (#166139)
slayton58 Oct 23, 2025
c9bc00f
Split grouped_mm methods into their own file (#166140)
slayton58 Oct 23, 2025
661a560
[AI Codemod][DevmateFBSourceTestFailureBot] Fix for T241916639 ("Your…
XilunWu Oct 25, 2025
b0e9c86
[MPS] Move hypot to Metal (#166216)
malfet Oct 25, 2025
798a6d2
[Inductor][Autotune] Gracefully restart the autotune process after UL…
Aidyn-A Oct 25, 2025
74e53d0
[TorchScript] clearer debug for ConcreteModuleType::findSubmoduleConc…
cp2923 Oct 25, 2025
b55b779
Add file size limits to linters and refactor grep_linter (#166202)
aorenste Oct 24, 2025
516e589
Revert "Export flex attention with kwargs and DTensor (#166045)"
pytorchmergebot Oct 25, 2025
d97f655
[Intel GPU] Xpu matmul implementation for complex dtype (#160867)
PawelSwider2000 Oct 25, 2025
39a70ce
[feat]: add optimized exp_u20 implementation from Arm Optimized Routi…
Anallear Oct 25, 2025
621ba05
[cuDNN][SDPA] Handle `c10:Error` when checking device capability for …
eqy Oct 25, 2025
c7eee49
Fix pyrefly ignores 1/n (#166239)
maggiemoss Oct 26, 2025
25909d2
Simplify SingletonOrSharedTypePtr (#166183)
swolchok Oct 24, 2025
cdb60e4
[Inductor] Naive foreach autotune support (#162053)
jataylo Oct 26, 2025
8f80892
Use correct pyrefly syntax in suppressions distributed/... (#166241)
maggiemoss Oct 26, 2025
5121499
Fix pyrefly ignore syntax in /tools/... (#166240)
maggiemoss Oct 26, 2025
84b14f3
Fix error suppression syntax in utils and nn (#166242)
maggiemoss Oct 26, 2025
f863550
[dtensor] fix incorrect norm calculation for Partial DTensors (#159856)
ring00 Oct 26, 2025
a60d9e1
Fix flake8 B028 warnings (#166224)
cyyever Oct 26, 2025
e4c0101
Mark FlexAttentionBackward as cacheable (#165996)
jamesjwu Oct 26, 2025
262830d
[dynamo] Repro for 166238 (#166252)
anijain2305 Oct 26, 2025
a2b6afe
[dynamo][guards] CLASS_MATCH guard for readability (#166217)
anijain2305 Oct 26, 2025
154e4d3
Fix pyrelfy ignore syntax in distributions and ao (#166248)
maggiemoss Oct 26, 2025
86f9f1d
Enable local tensor model for DTensor redistribute tests (#166081)
dzmitry-huba Oct 26, 2025
507614b
Add GraphModule.recompile_submodules, use for regional inductor (#166…
jamesjwu Oct 26, 2025
27302a4
Fix error suppression syntax in onnx, jit, _dynamo (#166249)
maggiemoss Oct 27, 2025
9940e89
Fix pyrefly ignore syntax in _inductor (#166247)
maggiemoss Oct 27, 2025
000f495
[DeviceMesh] Use _flatten_rank_map to replace _flatten_mesh_list so t…
fduwjj Oct 27, 2025
c58d0ad
Propose Out-of-tree Backend Integration (PrivateUse1) as a module and…
jgong5 Oct 21, 2025
fa4cb91
add support for ir scalar literal parsing for inf/-inf/True/False (#…
weinanliu Oct 27, 2025
79aa88c
Remove old ROCm version checks and branches (#166111)
cyyever Oct 27, 2025
4e6afa8
[BE][Opinfo] Mark `[c]double` as unsupported for MPS (#166213)
malfet Oct 27, 2025
81fa4a2
Enable Intel GPU on 4 unit test cases (#165405)
daisyden Oct 27, 2025
4c38887
[rfc] add debug mode to print meta in fx graphs (#165874)
bobrenjc93 Oct 22, 2025
6530bc7
[DeviceMesh] Implement a device mesh concatenate api for submesh and …
fduwjj Oct 27, 2025
173bcda
Quick fix of torch.save memory leak (#165204)
ppwwyyxx Oct 27, 2025
90b30eb
Update torch-xpu-ops commit pin (#166129)
CuiYifeng Oct 27, 2025
8d4e488
Remove JITFunction constexpr and some arg_names (#166280)
oulgen Oct 27, 2025
90d7be3
Update slow tests (#165894)
pytorchupdatebot Oct 27, 2025
4295a9a
[xla hash update] update the pinned xla hash (#165895)
pytorchupdatebot Oct 27, 2025
7ce723d
[AOTI] Remove c10 as linked library (#165489)
desertfire Oct 24, 2025
e214af6
[Pytorch] Improve float32 erf() on aarch64 (#166262)
Nicoshev Oct 27, 2025
f2c8163
[DeviceMesh][2D] Use concatenate for 2D (FSDP+TP) instead of getting …
fduwjj Oct 27, 2025
f89a7e9
[1/N][Fix] Fix typo in aten folder (#166126)
lingebeng Oct 27, 2025
61bad3c
[dynamo] Move some FUNCTION_MATCH to CLOSURE_MATCH (#166244)
anijain2305 Oct 26, 2025
610c09f
[dynamo] Fix python_type for UserDefinedClassExceptionVariable (#166251)
anijain2305 Oct 26, 2025
99e07c3
[dynamo][misc] Replace UserFunctionVariable with VariableTracker buil…
anijain2305 Oct 26, 2025
a988510
Revert "Simplify the CUPTI CMake check for kineto (#161370)"
pytorchmergebot Oct 27, 2025
a076b4d
Use std::min for #166021 (#166195)
clee2000 Oct 27, 2025
eb2bad5
[Inductor] Make combo kernel MAX_NUM_ARGS configurable (#166274)
andyanwang Oct 26, 2025
a04edcb
[inductor] a few workspace api change (#166204)
shunting314 Oct 27, 2025
3f69b4d
[ROCm][tunableop] Fixes flaky test issue (#166084)
sarthaktandon9amd Oct 27, 2025
6ecd6b2
Document limitations of weights_only in SECURITY.md and torch.load do…
mikaylagawarecki Oct 27, 2025
c6a02ea
Add XLAHooksInterface to bazel file (#166179)
clee2000 Oct 27, 2025
36a48e7
Fix existing pyrefly errors on main (#166312)
maggiemoss Oct 27, 2025
8887a33
[PyTorch] Improve conversion from/to FP16 on aarch64+sve (#166306)
Nicoshev Oct 27, 2025
f6951cb
[dynamo] Fix recompilation error message to point to new programming …
eun2ce Oct 27, 2025
6096c0f
Export should use aot_export_joint_with_descriptors (#165931)
tugsbayasgalan Oct 27, 2025
9901d44
[torch/utils][Code Clean] Clean asserts in `torch/utils/*.py` (#165410)
KarhouTam Oct 27, 2025
d049ed2
[BE] Fix metal compilation warnings (#166315)
malfet Oct 27, 2025
ee7434b
[dynamo][guards] 1/N Guard selectively for DTensor (#165824)
anijain2305 Oct 27, 2025
60bcb4e
[pipeline][be] refactored pipeline composability tests (#165701)
anshul-si Oct 17, 2025
483845a
[DTensor][Op] fix for DTensor ops with Partial placements (#165962)
anshul-si Oct 21, 2025
7d16fcf
Re-re-re-re-apply "C++-accessible Placements via pybind11 (#163030)" …
swolchok Oct 27, 2025
904abfc
Export flex attention with kwargs and DTensor (#166045)
fduwjj Oct 27, 2025
47ec1e9
Support regional inductor with custom config (#166269)
tugsbayasgalan Oct 27, 2025
2ce894b
[dynamo] Dont guard on numpy Cython functions (#166328)
anijain2305 Oct 27, 2025
840d63c
Update cuDNN 9.10.2 in Manylinux 2.28 Docker files (#165913)
atalman Oct 27, 2025
2a5f87d
[cuDNN] Smoke-test runtime cuDNN version matches compile time version…
eqy Oct 27, 2025
92381a5
[ROCm] Custom OpenBLAS library name (#166333)
jayhawk-commits Oct 27, 2025
9a91486
[Inductor-FX] Don't flatten constant args (#166144)
chenmillie Oct 27, 2025
1e836bc
[MPS] fix large matmul test device (#166271)
Isalia20 Oct 27, 2025
8e1e4ee
[reland][dynamo][easy] Support torch.accelerator.current_accelerator …
anijain2305 Oct 27, 2025
b44423b
[inductor][choices] lookup table choices 1/3 (#164978)
coconutruben Oct 27, 2025
a51f877
Enable local tensor mode for another set of DTensor tests (#166105)
dzmitry-huba Oct 27, 2025
47f50cf
[torchfuzz] check in more ignore regexes (#166187)
bobrenjc93 Oct 27, 2025
0ae3e30
[torchfuzz] fix group norm operator (#166188)
bobrenjc93 Oct 27, 2025
5e769ff
[CD] Upgrade to CUDA 13.0.2 for nightly binaries (#165470)
tinglvv Oct 28, 2025
e95920e
[Optimus] Rename the post_grad_graph tlparse log (#166109)
mengluy0125 Oct 28, 2025
dc011d3
[inductor][ez] add overridable env var for disabling fx graph cache (…
shunting314 Oct 23, 2025
46d17e8
[Symm mem] Add a unit test for mempool tensor with dist collective (#…
fduwjj Oct 24, 2025
f245079
[torchfuzz] make pointwise subclasses defined torch_op_name (#166220)
bobrenjc93 Oct 27, 2025
7ae8aaf
[torchfuzz] add sdpa operator (#166189)
bobrenjc93 Oct 27, 2025
7045aab
[torchfuzz] add mhaf operator (#166190)
bobrenjc93 Oct 27, 2025
8af9ed0
[torchfuzz] split, chunk, stack, cat, expand, gather, cumsum, clamp, …
bobrenjc93 Oct 27, 2025
1425b40
[inductor] Fix argmin/argmax returning incorrect indices for non-cont…
karthickai Oct 27, 2025
add37ba
[MPS] Better error checking for FFT ops (#166272)
malfet Oct 28, 2025
17bdb23
[GR v0] AOTI Enablement - Fix GR model AOTI inplace update by skippin…
yingjizhang Oct 28, 2025
236ce73
[reland] Add provenance to inductor IR nodes created after graph.run …
yushangdi Oct 28, 2025
74336f8
Revert "[CD] Upgrade to CUDA 13.0.2 for nightly binaries (#165470)"
pytorchmergebot Oct 28, 2025
a76b59c
[dynamo] local_map error message for reordered inputs (#164780)
xmfan Oct 26, 2025
06e71c8
[hop] local_map MoE: fix unbacked symints during tracing and symint a…
xmfan Oct 27, 2025
8417981
[dynamo, nested graph breaks] add TestCaseWithNestedGraphBreaks subcl…
williamwen42 Oct 27, 2025
e0ca304
[dynamo, nested graph breaks] remove _dynamo.utils.counter patch on i…
williamwen42 Oct 27, 2025
d8283a3
[dynamo, nested graph breaks] fix RETURN_VALUE tx skipping in nested …
williamwen42 Oct 27, 2025
7f7a280
[dynamo, nested graph breaks] disable nested graph breaks in generato…
williamwen42 Oct 27, 2025
ea698e8
[dynamo, nested graph breaks] disallow nested graph breaks in HOPs (#…
williamwen42 Oct 27, 2025
f452edd
[dynamo, 3.14] fix misc. bugs to get most dynamo unittests passing lo…
williamwen42 Oct 27, 2025
ff46d5a
[Inductor][Triton][FP8] Support deepseek-style scaling in Inductor (#…
jananisriram Oct 28, 2025
a77f5d9
[ROCm] Use a ROCm version string without hash. (#166336)
naromero77amd Oct 28, 2025
f93ea7d
[export] Update dynamo_graph_capture_for_export to return GraphModule…
zhxchen17 Oct 28, 2025
6586815
[dynamo] Guard selectively on the torch APIs (#166329)
anijain2305 Oct 27, 2025
02095cc
[dynamo] Dont guard on getset descriptors for torch_function (#166346)
anijain2305 Oct 27, 2025
9139368
[PyTorch] Use events from pool in copy_device_to_device (#165647)
banitag1 Oct 28, 2025
5d0b3e2
[inductor] generate fused rms/layer norm bwd (#165370)
shunting314 Oct 27, 2025
13413b3
[AMP][Refactor] Autocast dtype handling to simplify device-specific c…
KarhouTam Oct 28, 2025
ebb2b2e
[dynamo] fix store attr graph break in with block (#166036)
williamwen42 Oct 21, 2025
32fe4f6
[dynamo] fix keyerror in resume_execution (again) (#166040)
williamwen42 Oct 27, 2025
85a7c74
[triton][nativert] Add num_cpu_threads for triton-cpu (#166255)
minjang Oct 28, 2025
be28329
[Pytorch] Update Kineto Submodule (#166317)
sraikund16 Oct 28, 2025
e137cd0
docs: fix typos (#164879)
RomanKrasavtsev Oct 28, 2025
110efe4
Revert "[inductor][choices] lookup table choices 1/3 (#164978)"
pytorchmergebot Oct 28, 2025
34d6ef7
Update gm.print_readable to include Annotation (#165397)
SherlockNoMad Oct 28, 2025
3041ede
Improve eig tests in preparation for new eig backends (#166322)
johannesz-codes Oct 28, 2025
544b443
[CD] Upgrade to CUDA 13.0.2 for nightly binaries (#165470)
tinglvv Oct 28, 2025
5016e7b
[FlexAttention] Add mechanism to get optimal autotune decision (#165817)
drisspg Oct 28, 2025
0eacd93
Revert "Update cuDNN 9.10.2 in Manylinux 2.28 Docker files (#165913)"
pytorchmergebot Oct 28, 2025
ac84126
[ROCm] skip AsyncTP test class as AsyncTP is not supported on ROCm (#…
pragupta Oct 28, 2025
a4a0378
Revert "[cuDNN] Smoke-test runtime cuDNN version matches compile time…
pytorchmergebot Oct 28, 2025
acd936c
[1/2] Split `cublasCommonArgs` into its own file (#166313)
slayton58 Oct 27, 2025
5ebf74a
[2/2] Move scaled_mm routines to their own file (#166314)
slayton58 Oct 27, 2025
43c30f6
Use correct layout convention for skills (#166265)
ezyang Oct 26, 2025
8110ce0
Add a skill for writing skills (#166266)
ezyang Oct 26, 2025
2dc5645
refactor: pull _replace_node common functionality out of Scheduler.fi…
aorenste Oct 27, 2025
895795f
[ROCm][CI] forward fix kineto submodule bump (#166421)
jeffdaily Oct 28, 2025
687c15c
[AOTI][BE] Change test_aoti_inference to one-pass build (#164277)
desertfire Oct 28, 2025
1abfa5f
[EZ][MPS] Improve distribution error checking (#166425)
malfet Oct 28, 2025
e3e93c7
[MPS] Fix random in-place ops on non-contiguous tensors (#165267)
ElanaPearl Oct 28, 2025
a25818c
Fix image display on pypi project description section (#166404)
atalman Oct 28, 2025
0e46a10
[ONNX] Warn when it's training (#166412)
titaiwangms Oct 28, 2025
009ea77
Remove not needed code path. (#166278)
laithsakka Oct 28, 2025
21b48f8
Fixes torch.compile(nn.ModuleList()) changes bool() behavior (#159208)
i3hz Oct 28, 2025
b903018
[CD] Windows builds migrate python 3.14rc1->3.14.0 (#166408)
atalman Oct 28, 2025
7379972
Revert "[Inductor] Naive foreach autotune support (#162053)"
pytorchmergebot Oct 28, 2025
8aa087a
[ez] Fix print for failing test when entire file fails (#166420)
clee2000 Oct 28, 2025
3895ce0
[inductor] add in-kernel nan-check (#166008)
shunting314 Oct 27, 2025
b5189e2
NVFP4 grouped gemm support via. FBGEMM kernels (#166308)
slayton58 Oct 28, 2025
551921d
Change t.is_cuda to t.device.type == 'cuda' in torch/utils/viz (#156418)
azzhipa Oct 28, 2025
08ae550
support batch size=0 for flash attention (#166318)
liangel-02 Oct 28, 2025
1fdef66
Revert "[Pytorch] Update Kineto Submodule (#166317)"
pytorchmergebot Oct 28, 2025
572cc12
Move MaskPartial to placement_types to improve discoverability (#164414)
swolchok Oct 28, 2025
84a2715
[dynamo] Revert C++-fying of symbolic shape guards (#166427)
anijain2305 Oct 28, 2025
fea819e
added type annotation to _NoParamDecoratorContextManager.__new__ (#16…
randolf-scholz Oct 28, 2025
d9483d4
[dynamo] Clean up assert in dynamo [3/N] (#165903)
can-gaa-hou Oct 28, 2025
f36f372
bwd pass (#164504)
liangel-02 Oct 28, 2025
a1eb6b5
[dynamo][guards] Do not guard on the queue_callback (#166437)
anijain2305 Oct 28, 2025
68b3984
[xpu][test] Enable skipped `SparseAdam` UTs (#166375)
hoshibara Oct 28, 2025
f167fd0
[annotation] Override metadata on regenerated node in functional mode…
yushangdi Oct 28, 2025
3cc5949
Remove global pytree registration for blockmask (#166434)
yiming0416 Oct 28, 2025
6d5e651
[user-streams] update stream context to use fork/join (#162903)
mlazos Oct 28, 2025
b060e5c
[dynamo] Move more FUNCTION_MATCH to CLOSURE_MATCH (#166444)
anijain2305 Oct 28, 2025
0d4992c
[dynamo][easy] Use CONSTANT_MATCH for __code__ guard (#166445)
anijain2305 Oct 28, 2025
a9b29ca
Add attention benchmarking numbers to pytorch operator microbenchmark…
jainapurva Oct 28, 2025
31e42eb
Fix pyrefly ignore syntax (#166438)
maggiemoss Oct 29, 2025
2a058bf
[ROCm][tunableop] Fixed Offline Tuning file writing (#166074)
sarthaktandon9amd Oct 29, 2025
56afad4
[precompile] Pickle and check closure variable properly. (#166351)
zhxchen17 Oct 29, 2025
84fe848
Fix pyrefly error syntax (2/n) (#166448)
maggiemoss Oct 29, 2025
afaaaa3
[BE] Move GreenContext implementation details to cpp (#166462)
malfet Oct 29, 2025
48e672d
[dcp][state_dict] Make `_flatten_optim_state_dict` and `_unflatten_op…
wz337 Oct 29, 2025
bea89d6
[PyTorch] Improve conversion from/to bool on aarch64+sve (#166330)
Nicoshev Oct 29, 2025
adedf26
Support python slicing with tensor inputs. (#165074)
laithsakka Oct 28, 2025
76b2c37
[1/N] Remove unused loop variables (#166258)
cyyever Oct 29, 2025
4fada51
Fix existing Pyrefly errors (#166439)
maggiemoss Oct 29, 2025
877f126
[MPS] Improve index_select error checking (#166468)
malfet Oct 28, 2025
f8b4c00
intfs + unit tests (#164723)
nmacchioni Oct 29, 2025
aab27b0
[user-streams] Move StreamContextVariable into streams module (#164343)
mlazos Oct 28, 2025
e105a47
[user-streams] Have StreamVariable inherit from StreamContextVariable…
mlazos Oct 28, 2025
c201a1c
[OpenReg] Update Installation in README.md (#166235)
can-gaa-hou Oct 29, 2025
c9eabad
Suppress std::hardware_destructive_interference_size warning on GCC 1…
guangyey Oct 27, 2025
1764f3a
[Fix] fix gramma error in PyTorch docs (#166158)
lingebeng Oct 29, 2025
695cb0d
[2/N][Fix] Fix typo in test folder (#166374)
lingebeng Oct 29, 2025
dd1fe7c
Remove clang-tidy type conversion suppressions (#166398)
cyyever Oct 29, 2025
753d9bd
Introduce a new API torch.xpu.set_per_process_memory_fraction (#165510)
guangyey Oct 15, 2025
94eaeb9
[Conv1d] Check overflow before we compute padding size. (#162363)
thenumberouscode Oct 29, 2025
20be077
[Inductor] support masked vectorization for the tail_loop for float64…
jiayisunx Oct 28, 2025
924482a
Replace NUMA inheritance approach (#166026)
pytorchmergebot Oct 29, 2025
5849eea
[vision hash update] update the pinned vision hash (#166356)
pytorchupdatebot Oct 29, 2025
c2e3cc7
[Inductor] No longer throw error in bmm out_dtype lowering due to tem…
PaulZhang12 Oct 28, 2025
1fa520e
[ROCm] Enable group gemm through CK (#166334)
jagadish-amd Oct 29, 2025
0e19561
Add back Windows and macOS to tensorboard tests (#166389)
cyyever Oct 29, 2025
774abb0
[ptd] Fix test config in destroy_pg (#166463)
fduwjj Oct 29, 2025
e8d887a
[user-streams] Support streams as contexts (#164507)
mlazos Oct 29, 2025
23669d0
[user-cuda-streams] Add cuda streams test suite (#162901)
mlazos Oct 29, 2025
c5701d0
[ONNX] Create fake implementations for onnx ops; fix boolean mask in …
justinchuby Oct 29, 2025
bfc2050
[user-streams] Make device-agnostic streams weakref compatible (#164304)
mlazos Oct 29, 2025
cde81e9
[User-streams] Make torch.Event weakref compatible (#164522)
mlazos Oct 29, 2025
17d5aa4
disable jiterator for complex tan and tanh (#165250)
Aminsed Oct 29, 2025
cb69667
Add merge rule for PrivateUse1 Module (#166394)
fffrog Oct 28, 2025
1b655a8
[xpu][test] Enable more UTs for Intel GPU. (#166047)
etaf Oct 29, 2025
96b6184
[BE]: Update nvshmem to 3.4.5 (#164046)
Skylion007 Oct 29, 2025
8b18864
[2/N] Fix unused loop variables (#166500)
cyyever Oct 29, 2025
284716a
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec mod…
XuehaiPan Oct 29, 2025
1dd6b76
Revert "[1/N] Remove unused loop variables (#166258)"
pytorchmergebot Oct 29, 2025
5e7272b
Revert "[BE] Move GreenContext implementation details to cpp (#166462)"
pytorchmergebot Oct 29, 2025
4a94591
filter out alloc-free pairs from trace plot (#165752)
waysg Oct 29, 2025
467c21a
`nn.Linear`: nD contiguous input + bias -- dispatch to addmm also whe…
nikitaved Oct 29, 2025
d6d6fa2
Revert "bwd pass (#164504)"
pytorchmergebot Oct 29, 2025
fefb546
Add TORCH_TARGET_VERSION for stable ABI (#164356)
mikaylagawarecki Oct 28, 2025
c0bbda3
Move static from_ivalue/to_ivalue to new shim_common.cpp (#166373)
mikaylagawarecki Oct 28, 2025
8f51556
Add scaffolding for aoti_torch_call_dispatcher BC with native ops (#1…
mikaylagawarecki Oct 28, 2025
eae701c
Add scaffolding for StableIValue FC/BC (no PoC) (#164332)
mikaylagawarecki Oct 28, 2025
5cdbcb5
Revert "[User-streams] Make torch.Event weakref compatible (#164522)"
pytorchmergebot Oct 29, 2025
14102fb
add new line in log (#164240)
laithsakka Sep 30, 2025
c594950
Revert "`nn.Linear`: nD contiguous input + bias -- dispatch to addmm …
pytorchmergebot Oct 29, 2025
5fd1d41
Revert "[user-streams] Make device-agnostic streams weakref compatibl…
pytorchmergebot Oct 29, 2025
398fdd3
[Inductor] Lower fallback nodes annotated with "should_fallback" (#16…
chenmillie Oct 29, 2025
bc5111c
[Inductor] Prevent kernel fusion with too many unique inputs and outp…
andyanwang Oct 27, 2025
35f3572
Revert "[ROCm] Enable group gemm through CK (#166334)"
pytorchmergebot Oct 29, 2025
d7040e6
Revert "[dynamo][guards] 1/N Guard selectively for DTensor (#165824)"
pytorchmergebot Oct 29, 2025
deb7763
[ROCm] Reduce duplication in bfloat16_support_literal definition (#16…
rraminen Oct 29, 2025
a3fe182
Fix incomplete torch.cdist tests (#166507)
cyyever Oct 29, 2025
fa560e1
[ao][pruning] Replace assert statements with AssertionError exception…
RohitRathore1 Oct 29, 2025
d1a6e00
Fix syntax for pyrefly errors (#166496)
maggiemoss Oct 29, 2025
fc540ce
set pg name based on ranks (#166182)
tushar00jain Oct 29, 2025
b483030
Merge remote-tracking branch 'upstream/main' into rocm7.1_internal_te…
github-actions[bot] Oct 29, 2025
c56fe7d
Fix merge conflicts
pragupta Oct 29, 2025
74d7455
Fix bad merge of triton_heuristics.py
pragupta Oct 30, 2025
4 changes: 4 additions & 0 deletions .ci/aarch64_linux/aarch64_ci_build.sh
Original file line number Diff line number Diff line change
@@ -8,13 +8,17 @@ if [[ "$GPU_ARCH_VERSION" == *"12.6"* ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;9.0"
elif [[ "$GPU_ARCH_VERSION" == *"12.8"* ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;12.0"
elif [[ "$GPU_ARCH_VERSION" == *"12.9"* ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;12.0"
elif [[ "$GPU_ARCH_VERSION" == *"13.0"* ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;11.0;12.0+PTX"
fi

# Compress the fatbin with -compress-mode=size for CUDA 13
if [[ "$DESIRED_CUDA" == *"13"* ]]; then
export TORCH_NVCC_FLAGS="-compress-mode=size"
# Bundle ptxas into the cu13 wheel, see https://github.com/pytorch/pytorch/issues/163801
export BUILD_BUNDLE_PTXAS=1
fi

SCRIPTPATH="$( cd -- "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )"
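The hunk above maps the `GPU_ARCH_VERSION` string to a `TORCH_CUDA_ARCH_LIST`, adding the CUDA 12.9 and 13.0 branches. A standalone sketch of that selection logic (the function name is hypothetical; the arch lists mirror the diff, not any other source):

```shell
#!/usr/bin/env bash
# Sketch of the version-to-arch-list mapping from the diff above.
# pick_arch_list is a made-up helper, not part of the CI scripts.
pick_arch_list() {
  local gpu_arch_version="$1"
  case "$gpu_arch_version" in
    *12.6*)        echo "8.0;9.0" ;;
    *12.8*|*12.9*) echo "8.0;9.0;10.0;12.0" ;;
    *13.0*)        echo "8.0;9.0;10.0;11.0;12.0+PTX" ;;
    *)             echo "" ;;  # no export when the version is unrecognized
  esac
}

pick_arch_list "cuda13.0"
```

Note that only the 13.0 branch appends `+PTX`, so forward-compatible PTX is bundled for the newest toolkit alone.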
57 changes: 4 additions & 53 deletions .ci/aarch64_linux/aarch64_wheel_ci_build.py
@@ -13,49 +13,6 @@ def list_dir(path: str) -> list[str]:
return check_output(["ls", "-1", path]).decode().split("\n")


def build_ArmComputeLibrary() -> None:
"""
Using ArmComputeLibrary for aarch64 PyTorch
"""
print("Building Arm Compute Library")
acl_build_flags = [
"debug=0",
"neon=1",
"opencl=0",
"os=linux",
"openmp=1",
"cppthreads=0",
"arch=armv8a",
"multi_isa=1",
"fixed_format_kernels=1",
"build=native",
]
acl_install_dir = "/acl"
acl_checkout_dir = os.getenv("ACL_SOURCE_DIR", "ComputeLibrary")
if os.path.isdir(acl_install_dir):
shutil.rmtree(acl_install_dir)
if not os.path.isdir(acl_checkout_dir) or not len(os.listdir(acl_checkout_dir)):
check_call(
[
"git",
"clone",
"https://github.com/ARM-software/ComputeLibrary.git",
"-b",
"v25.02",
"--depth",
"1",
"--shallow-submodules",
]
)

check_call(
["scons", "Werror=1", f"-j{os.cpu_count()}"] + acl_build_flags,
cwd=acl_checkout_dir,
)
for d in ["arm_compute", "include", "utils", "support", "src", "build"]:
shutil.copytree(f"{acl_checkout_dir}/{d}", f"{acl_install_dir}/{d}")


def replace_tag(filename) -> None:
with open(filename) as f:
lines = f.readlines()
@@ -356,23 +313,17 @@ def parse_arguments():
build_vars += f"BUILD_TEST=0 PYTORCH_BUILD_VERSION={branch[1 : branch.find('-')]} PYTORCH_BUILD_NUMBER=1 "

if enable_mkldnn:
build_ArmComputeLibrary()
print("build pytorch with mkldnn+acl backend")
build_vars += (
"USE_MKLDNN=ON USE_MKLDNN_ACL=ON "
"ACL_ROOT_DIR=/acl "
"LD_LIBRARY_PATH=/pytorch/build/lib:/acl/build:$LD_LIBRARY_PATH "
"ACL_INCLUDE_DIR=/acl/build "
"ACL_LIBRARY=/acl/build "
)
build_vars += "USE_MKLDNN=ON USE_MKLDNN_ACL=ON "
build_vars += "ACL_ROOT_DIR=/acl "
if enable_cuda:
build_vars += "BLAS=NVPL "
else:
build_vars += "BLAS=OpenBLAS OpenBLAS_HOME=/OpenBLAS "
build_vars += "BLAS=OpenBLAS OpenBLAS_HOME=/opt/OpenBLAS "
else:
print("build pytorch without mkldnn backend")

os.system(f"cd /pytorch; {build_vars} python3 setup.py bdist_wheel")
os.system(f"cd /pytorch; {build_vars} python3 -m build --wheel --no-isolation")
if enable_cuda:
print("Updating Cuda Dependency")
filename = os.listdir("/pytorch/dist/")
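A recurring change in this PR swaps the deprecated `python3 setup.py bdist_wheel` invocation for the PEP 517 `build` frontend (`python3 -m build --wheel --no-isolation`), with extra setup.py options passed through `-C--build-option=…`. A sketch of composing the new command (the helper name is made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the migration in this PR: build the new
# PEP 517 command line, including the optional build number that was
# previously passed directly to setup.py as "--build-number N".
wheel_build_cmd() {
  # --no-isolation reuses the CI environment's preinstalled build deps
  # instead of creating a fresh isolated venv.
  cmd="python3 -m build --wheel --no-isolation"
  if [ -n "$1" ]; then
    # setup.py options now travel through build's config-setting passthrough.
    cmd="$cmd -C--build-option=--build-number=$1"
  fi
  echo "$cmd"
}

wheel_build_cmd 1
```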
60 changes: 15 additions & 45 deletions .ci/aarch64_linux/build_aarch64_wheel.py
@@ -299,40 +299,6 @@ def install_condaforge_python(host: RemoteHost, python_version="3.8") -> None:
)


def build_OpenBLAS(host: RemoteHost, git_clone_flags: str = "") -> None:
print("Building OpenBLAS")
host.run_cmd(
f"git clone https://github.com/xianyi/OpenBLAS -b v0.3.28 {git_clone_flags}"
)
make_flags = "NUM_THREADS=64 USE_OPENMP=1 NO_SHARED=1 DYNAMIC_ARCH=1 TARGET=ARMV8"
host.run_cmd(
f"pushd OpenBLAS && make {make_flags} -j8 && sudo make {make_flags} install && popd && rm -rf OpenBLAS"
)


def build_ArmComputeLibrary(host: RemoteHost, git_clone_flags: str = "") -> None:
print("Building Arm Compute Library")
acl_build_flags = " ".join(
[
"debug=0",
"neon=1",
"opencl=0",
"os=linux",
"openmp=1",
"cppthreads=0",
"arch=armv8a",
"multi_isa=1",
"fixed_format_kernels=1",
"build=native",
]
)
host.run_cmd(
f"git clone https://github.com/ARM-software/ComputeLibrary.git -b v25.02 {git_clone_flags}"
)

host.run_cmd(f"cd ComputeLibrary && scons Werror=1 -j8 {acl_build_flags}")


def embed_libgomp(host: RemoteHost, use_conda, wheel_name) -> None:
host.run_cmd("pip3 install auditwheel")
host.run_cmd(
@@ -442,7 +408,7 @@ def build_torchvision(
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"

host.run_cmd(f"cd vision && {build_vars} python3 setup.py bdist_wheel")
host.run_cmd(f"cd vision && {build_vars} python3 -m build --wheel --no-isolation")
vision_wheel_name = host.list_dir("vision/dist")[0]
embed_libgomp(host, use_conda, os.path.join("vision", "dist", vision_wheel_name))

@@ -497,7 +463,7 @@ def build_torchdata(
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"

host.run_cmd(f"cd data && {build_vars} python3 setup.py bdist_wheel")
host.run_cmd(f"cd data && {build_vars} python3 -m build --wheel --no-isolation")
wheel_name = host.list_dir("data/dist")[0]
embed_libgomp(host, use_conda, os.path.join("data", "dist", wheel_name))

@@ -553,7 +519,7 @@ def build_torchtext(
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"

host.run_cmd(f"cd text && {build_vars} python3 setup.py bdist_wheel")
host.run_cmd(f"cd text && {build_vars} python3 -m build --wheel --no-isolation")
wheel_name = host.list_dir("text/dist")[0]
embed_libgomp(host, use_conda, os.path.join("text", "dist", wheel_name))

@@ -614,7 +580,7 @@ def build_torchaudio(
host.run_cmd(
f"cd audio && export FFMPEG_ROOT=$(pwd)/third_party/ffmpeg && export USE_FFMPEG=1 \
&& ./packaging/ffmpeg/build.sh \
&& {build_vars} python3 setup.py bdist_wheel"
&& {build_vars} python3 -m build --wheel --no-isolation"
)

wheel_name = host.list_dir("audio/dist")[0]
@@ -700,7 +666,6 @@ def start_build(
configure_system(
host, compiler=compiler, use_conda=use_conda, python_version=python_version
)
build_OpenBLAS(host, git_clone_flags)

if host.using_docker():
print("Move libgfortant.a into a standard location")
@@ -723,10 +688,12 @@
f"git clone --recurse-submodules -b {branch} https://github.com/pytorch/pytorch {git_clone_flags}"
)

host.run_cmd("pytorch/.ci/docker/common/install_openblas.sh")

print("Building PyTorch wheel")
build_opts = ""
if pytorch_build_number is not None:
build_opts += f" --build-number {pytorch_build_number}"
build_opts += f" -C--build-option=--build-number={pytorch_build_number}"
# Breakpad build fails on aarch64
build_vars = "USE_BREAKPAD=0 "
if branch == "nightly":
@@ -743,15 +710,18 @@
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"
if enable_mkldnn:
build_ArmComputeLibrary(host, git_clone_flags)
host.run_cmd("pytorch/.ci/docker/common/install_acl.sh")
print("build pytorch with mkldnn+acl backend")
build_vars += " USE_MKLDNN=ON USE_MKLDNN_ACL=ON"
build_vars += " BLAS=OpenBLAS"
build_vars += " OpenBLAS_HOME=/opt/OpenBLAS"
build_vars += " ACL_ROOT_DIR=/acl"
host.run_cmd(
f"cd $HOME/pytorch && export ACL_ROOT_DIR=$HOME/ComputeLibrary && {build_vars} python3 setup.py bdist_wheel{build_opts}"
f"cd $HOME/pytorch && {build_vars} python3 -m build --wheel --no-isolation{build_opts}"
)
print("Repair the wheel")
pytorch_wheel_name = host.list_dir("pytorch/dist")[0]
ld_library_path = "$HOME/acl/build:$HOME/pytorch/build/lib"
ld_library_path = "/acl/build:$HOME/pytorch/build/lib"
host.run_cmd(
f"export LD_LIBRARY_PATH={ld_library_path} && auditwheel repair $HOME/pytorch/dist/{pytorch_wheel_name}"
)
@@ -763,7 +733,7 @@ def start_build(
else:
print("build pytorch without mkldnn backend")
host.run_cmd(
f"cd pytorch && {build_vars} python3 setup.py bdist_wheel{build_opts}"
f"cd pytorch && {build_vars} python3 -m build --wheel --no-isolation{build_opts}"
)

print("Deleting build folder")
@@ -907,7 +877,7 @@ def terminate_instances(instance_type: str) -> None:
def parse_arguments():
from argparse import ArgumentParser

parser = ArgumentParser("Builid and test AARCH64 wheels using EC2")
parser = ArgumentParser("Build and test AARCH64 wheels using EC2")
parser.add_argument("--key-name", type=str)
parser.add_argument("--debug", action="store_true")
parser.add_argument("--build-only", action="store_true")
3 changes: 2 additions & 1 deletion .ci/docker/almalinux/Dockerfile
@@ -69,7 +69,8 @@ RUN bash ./install_cuda.sh 13.0
ENV DESIRED_CUDA=13.0

FROM ${ROCM_IMAGE} as rocm
- ENV PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
+ ARG PYTORCH_ROCM_ARCH
+ ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
ENV MKLROOT /opt/intel
6 changes: 6 additions & 0 deletions .ci/docker/almalinux/build.sh
@@ -36,6 +36,12 @@ case ${DOCKER_TAG_PREFIX} in
;;
rocm*)
BASE_TARGET=rocm
+ PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
+ # add gfx950, gfx115x conditionally starting in ROCm 7.0
+ if [[ "$ROCM_VERSION" == *"7.0"* ]]; then
+ PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950;gfx1150;gfx1151"
+ fi
+ EXTRA_BUILD_ARGS="${EXTRA_BUILD_ARGS} --build-arg PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}"
;;
*)
echo "ERROR: Unknown docker tag ${DOCKER_TAG_PREFIX}"
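The new `rocm*` branch above builds the gfx architecture list dynamically and appends the ROCm 7.0-only targets before forwarding the list as a Docker build arg. A standalone sketch of that conditional, exercised with a hardcoded version string standing in for the CI-provided `$ROCM_VERSION`:

```shell
# Assumption: ROCM_VERSION is normally supplied by the CI environment.
ROCM_VERSION="7.0.1"
PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
# gfx950 and the gfx115x parts are only targeted from ROCm 7.0 onward
if [ "${ROCM_VERSION#*7.0}" != "$ROCM_VERSION" ]; then
  PYTORCH_ROCM_ARCH="${PYTORCH_ROCM_ARCH};gfx950;gfx1150;gfx1151"
fi
echo "$PYTORCH_ROCM_ARCH"
```

Because the Dockerfile now declares `ARG PYTORCH_ROCM_ARCH` instead of hardcoding the list, the arch set is owned by this build script rather than baked into the image definition.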
34 changes: 8 additions & 26 deletions .ci/docker/build.sh
@@ -88,8 +88,8 @@ fi
_UCX_COMMIT=7836b165abdbe468a2f607e7254011c07d788152
_UCC_COMMIT=430e241bf5d38cbc73fc7a6b89155397232e3f96
if [[ "$image" == *rocm* ]]; then
- _UCX_COMMIT=cc312eaa4655c0cc5c2bcd796db938f90563bcf6
- _UCC_COMMIT=0c0fc21559835044ab107199e334f7157d6a0d3d
+ _UCX_COMMIT=29831d319e6be55cb8c768ca61de335c934ca39e
+ _UCC_COMMIT=9f4b242cbbd8b1462cbc732eb29316cdfa124b77
fi

tag=$(echo $image | awk -F':' '{print $2}')
@@ -117,6 +117,7 @@ case "$tag" in
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
TRITON=yes
+ INSTALL_MINGW=yes
;;
pytorch-linux-jammy-cuda13.0-cudnn9-py3-gcc11)
CUDA_VERSION=13.0.0
@@ -179,28 +180,17 @@ case "$tag" in
fi
GCC_VERSION=11
VISION=yes
- ROCM_VERSION=6.4
+ ROCM_VERSION=7.0
NINJA_VERSION=1.9.0
TRITON=yes
KATEX=yes
UCX_COMMIT=${_UCX_COMMIT}
UCC_COMMIT=${_UCC_COMMIT}
PYTORCH_ROCM_ARCH="gfx90a;gfx942;gfx950;gfx1100"
if [[ $tag =~ "benchmarks" ]]; then
INDUCTOR_BENCHMARKS=yes
fi
;;
- pytorch-linux-noble-rocm-alpha-py3)
- ANACONDA_PYTHON_VERSION=3.12
- GCC_VERSION=11
- VISION=yes
- ROCM_VERSION=7.0
- NINJA_VERSION=1.9.0
- TRITON=yes
- KATEX=yes
- UCX_COMMIT=${_UCX_COMMIT}
- UCC_COMMIT=${_UCC_COMMIT}
- PYTORCH_ROCM_ARCH="gfx90a;gfx942;gfx950"
- ;;
pytorch-linux-jammy-xpu-n-1-py3)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
@@ -298,7 +288,7 @@ case "$tag" in
;;
*)
# Catch-all for builds that are not hardcoded.
PROTOBUF=yes
PROTOBUF=yes
VISION=yes
echo "image '$image' did not match an existing build configuration"
if [[ "$image" == *py* ]]; then
@@ -371,7 +361,7 @@ docker build \
--build-arg "NINJA_VERSION=${NINJA_VERSION:-}" \
--build-arg "KATEX=${KATEX:-}" \
--build-arg "ROCM_VERSION=${ROCM_VERSION:-}" \
- --build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH:-gfx90a;gfx942}" \
+ --build-arg "PYTORCH_ROCM_ARCH=${PYTORCH_ROCM_ARCH}" \
--build-arg "IMAGE_NAME=${IMAGE_NAME}" \
--build-arg "UCX_COMMIT=${UCX_COMMIT}" \
--build-arg "UCC_COMMIT=${UCC_COMMIT}" \
@@ -389,6 +379,7 @@ docker build \
--build-arg "OPENBLAS=${OPENBLAS:-}" \
--build-arg "SKIP_SCCACHE_INSTALL=${SKIP_SCCACHE_INSTALL:-}" \
--build-arg "SKIP_LLVM_SRC_BUILD_INSTALL=${SKIP_LLVM_SRC_BUILD_INSTALL:-}" \
+ --build-arg "INSTALL_MINGW=${INSTALL_MINGW:-}" \
-f $(dirname ${DOCKERFILE})/Dockerfile \
-t "$tmp_tag" \
"$@" \
@@ -469,12 +460,3 @@ elif [ "$HAS_TRITON" = "yes" ]; then
echo "expecting triton to not be installed, but it is"
exit 0
fi

-# Sanity check cmake version. Executorch reinstalls cmake and I'm not sure if
-# they support 4.0.0 yet, so exclude them from this check.
-CMAKE_VERSION=$(drun cmake --version)
-if [[ "$EXECUTORCH" != *yes* && "$CMAKE_VERSION" != *4.* ]]; then
- echo "CMake version is not 4.0.0:"
- drun cmake --version
- exit 0
-fi
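Most of the `--build-arg` lines in this script rely on `${VAR:-}` expansion, so tags that never set a variable (such as the new `INSTALL_MINGW`) still yield a valid, empty build arg instead of aborting under `set -u`. A quick illustration of the difference:

```shell
set -u
# unset-ing a variable that does not exist is a no-op, not an error
unset INSTALL_MINGW 2>/dev/null || true
# A bare "${INSTALL_MINGW}" would abort here under set -u;
# the ":-" form substitutes an empty string instead.
arg="INSTALL_MINGW=${INSTALL_MINGW:-}"
echo "$arg"
```

This is also why dropping the `:-gfx90a;gfx942` fallback from the `PYTORCH_ROCM_ARCH` build arg matters: after that change, the arch list must be set explicitly by the caller (as the new `case` branches do) or it expands to empty.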
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/executorch.txt
@@ -1 +1 @@
- e0dda9059d082537cee36be6c5e4fe3b18c880c0
+ deb42f2a8e48f5032b4a98ee781a15fa87a157cf
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/nccl-cu12.txt
@@ -1 +1 @@
v2.27.5-1
v2.27.5-1
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/triton.txt
@@ -1 +1 @@
- d704bc6e69c1a588c8edd3cbb67505d554ed65f6
+ ac80c4190aa0321f761a08af97e1e1eee41f01d9
27 changes: 19 additions & 8 deletions .ci/docker/common/install_acl.sh
100644 → 100755
@@ -1,16 +1,27 @@
-set -euo pipefail
#!/bin/bash
# Script used only in CD pipeline

-readonly version=v25.02
-readonly src_host=https://github.com/ARM-software
-readonly src_repo=ComputeLibrary
+set -eux

-# Clone ACL
-[[ ! -d ${src_repo} ]] && git clone ${src_host}/${src_repo}.git
-cd ${src_repo}
+ACL_VERSION=${ACL_VERSION:-"v25.02"}
+ACL_INSTALL_DIR="/acl"

-git checkout $version
+# Clone ACL
+git clone https://github.com/ARM-software/ComputeLibrary.git -b "${ACL_VERSION}" --depth 1 --shallow-submodules

ACL_CHECKOUT_DIR="ComputeLibrary"
# Build with scons
pushd $ACL_CHECKOUT_DIR
scons -j8 Werror=0 debug=0 neon=1 opencl=0 embed_kernels=0 \
os=linux arch=armv8a build=native multi_isa=1 \
fixed_format_kernels=1 openmp=1 cppthreads=0
popd

# Install ACL
sudo mkdir -p ${ACL_INSTALL_DIR}
for d in arm_compute include utils support src build
do
sudo cp -r ${ACL_CHECKOUT_DIR}/${d} ${ACL_INSTALL_DIR}/${d}
done

rm -rf $ACL_CHECKOUT_DIR
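The rewritten script's install step simply mirrors a fixed list of checkout subdirectories into `/acl`. The same copy loop, sketched against throwaway temp directories (hypothetical paths, a shortened directory list, and no `sudo`):

```shell
# Stand-ins for the ComputeLibrary checkout and the /acl install prefix.
ACL_CHECKOUT_DIR=$(mktemp -d)
ACL_INSTALL_DIR=$(mktemp -d)
# Fake a minimal checkout layout (the real script copies arm_compute,
# include, utils, support, src, and build).
mkdir -p "$ACL_CHECKOUT_DIR/arm_compute" "$ACL_CHECKOUT_DIR/include" "$ACL_CHECKOUT_DIR/build"
for d in arm_compute include build
do
  cp -r "$ACL_CHECKOUT_DIR/$d" "$ACL_INSTALL_DIR/$d"
done
ls "$ACL_INSTALL_DIR"
```

Copying `build/` alongside the headers is what lets consumers point `ACL_ROOT_DIR=/acl` at the prefix and link against `/acl/build`, matching the `LD_LIBRARY_PATH` change in the wheel-repair step earlier in this PR.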