1425 commits
00636e0
[Reland][Inductor] Prune configs that require more shared memory than…
wychi Sep 3, 2025
8875d6e
[vllm hash update] update the pinned vllm hash (#161929)
pytorchupdatebot Sep 3, 2025
e381d4b
[DTensor] forbid view ops to redistribute when local split is impossi…
tianyu-l Sep 2, 2025
50fc22d
[Intel GPU] Fix XPU SDPA default priority_order UT fail (#161690)
LuFinch Sep 3, 2025
f8ffa91
Perf nitpicks on python_arg_parser's is_int_or_symint_list (#161998)
swolchok Sep 2, 2025
2c03f0a
[MPS] enable cat op for sparse (#162007)
Isalia20 Sep 3, 2025
827f0d4
Using get_paths() to get correct installation path for PYTHONPATY (#1…
fffrog Sep 3, 2025
fa1514a
Outline SymInt::maybe_as_int_slow_path (#161466)
swolchok Sep 2, 2025
b0a3e58
Add inline fast paths for SymInt operators (#161586)
swolchok Sep 3, 2025
90b0864
Always build USE_DISTRIBUTED. (#160449)
ezyang Sep 3, 2025
4ae57d4
Make distributed modules importable even when backend not built (#159…
ezyang Sep 3, 2025
b16d3f4
[AOTI] Fix a bug from load_constants (#161887)
hl475 Sep 3, 2025
aed33a8
[Inductor][Tritonparse] Get Inductor kernel params (#161953)
NikhilAPatel Sep 3, 2025
02c83f1
[BLAS] Avoid downcasts for fp16fp16->fp32 BLAS (#161999)
malfet Sep 2, 2025
b40d943
[BE] Cleanup stale comments/copy from `gemm` (#162001)
malfet Sep 2, 2025
0cd6c56
Revert "test: ensure editable cached wrapper is respected (#160943)"
pytorchmergebot Sep 3, 2025
f27985b
Revert "[CUDAGraph] add config to error on skipping cudagraph (#161862)"
pytorchmergebot Sep 3, 2025
bb95028
Revert "[inductor][ez] add hook for heuristics to adjust kernel input…
pytorchmergebot Sep 3, 2025
c157cf6
port distributed tensor parallel test files for Intel GPU (#161261)
wincent8 Sep 3, 2025
9491d28
Support generic dynamic shape with padding (#160997)
nandesuka Sep 3, 2025
451ed93
[inductor] fix split_aot_inductor_output_path on Windows. (#162058)
xuhancn Sep 3, 2025
889f01e
Add CPython test `test_range` (#161799)
guilhermeleobas Sep 1, 2025
eb18d32
Add `range_iterator` (#161800)
guilhermeleobas Sep 1, 2025
d647185
Contiguous subgraph decomposition (#161241)
exclamaforte Sep 3, 2025
71992dd
S390x: build nightly binaries for new pythons (#161920)
AlekseiNikiforovIBM Sep 3, 2025
3559c35
stop suggesting using guard_size_oblivious on data dependent errors (…
laithsakka Sep 2, 2025
f00445b
[inductor][ez] add hook for heuristics to adjust kernel input nodes (…
coconutruben Sep 2, 2025
8076a18
Offload set method execution to CPython when possible (#160763)
guilhermeleobas Sep 3, 2025
62c3f9a
[inductor] Follow integer overflow rules in TypedExpr (#161922)
isuruf Sep 1, 2025
cd529b6
[ROCm] Use MI325 (gfx942) runners for binary smoke testing (#162044)
jithunnair-amd Sep 3, 2025
92576a5
Prototype for building non-strict leak detector (#160456)
tugsbayasgalan Sep 3, 2025
f4c33cd
[pt2e] Avoid getting model device once per node (#159901)
andrewor14 Sep 3, 2025
c465b3d
[2/n][export] Refactor PT2 Archive weight saving and loading (#161520)
yiming0416 Sep 3, 2025
3c0ff1b
[SymmMem] Add root argument to broadcast op (#161090)
kwen2501 Sep 3, 2025
850e138
[hipify] Replace cudaStreamCaptureStatusNone (#161992)
YulunW Sep 3, 2025
8e23a12
[ROCm/Windows] Fix build failures and support some BLAS calls (#161981)
jammm Sep 3, 2025
1aa7476
fix to segmentation fault when empty tensor is passed to choose_qpara…
arkadip-maitra Sep 3, 2025
d1706d9
[Symmetric memory] set handle type for ROCm (#161741)
ngimel Sep 3, 2025
994f2a5
[SymmMem][CI] Make sure group names are consistent (#162035)
kwen2501 Sep 3, 2025
98efc9e
[ROCm] Bump AOTriton to 0.11b (#161754)
xinyazhang Sep 3, 2025
734ce8e
Rename propagate_tensor_meta to make private again (#161744)
azahed98 Sep 3, 2025
abc4471
[PP] Add profiling to schedule execution (#160753)
H-Huang Sep 3, 2025
b1bb98d
[ROCm] TunableOp should use HIP version, not ROCm version (#162067)
jeffdaily Sep 3, 2025
0af70e2
Modify ROCm MI2xx-based workflows to run on cron schedule (#162103)
jithunnair-amd Sep 3, 2025
99f356f
[ROCm] revamp miopen integration (#161687)
jeffdaily Sep 3, 2025
36d207f
[CI] viable strict upgrade: Explicitly name which linux binary wheels…
clee2000 Sep 3, 2025
8ec551b
[aot-compile] strip internal tracebacks for non-verbose graph breaks …
dolpm Sep 3, 2025
a918bba
[inductor] fix test output path 2 (#162085)
xuhancn Sep 4, 2025
5f3cbc9
fixed typo error (#162055)
rohit-kumar-manav Sep 4, 2025
aad96a2
Revert "Contiguous subgraph decomposition (#161241)"
pytorchmergebot Sep 4, 2025
3c45af0
kill allow_complex_guards_as_runtime_asserts (#161794)
avikchaudhuri Sep 4, 2025
9458d1a
[inductor] pdl inductor option (disabled by default) (#160928)
v0i0 Sep 2, 2025
1281470
[DCP][HuggingFace] Add Support for dequantization of SafeTensors chec…
saumishr Sep 4, 2025
8678d83
[dynamo] rename set_fullgraph to error_on_graph_break (#161739)
williamwen42 Sep 3, 2025
fbf3d20
use sym_or instead of any to avoid dde in calc_conv_nd_return_shape (…
laithsakka Sep 3, 2025
6598593
expose number of outputs in native runtime for unified runtime (#161723)
JacobSzwejbka Sep 4, 2025
cec0ff1
[Quant][Inductor][CPU] add qlinear int8-mixed-bf16 patterns (#161486)
jiayisunx Sep 3, 2025
57278d4
[Quant][Inductor][CPU] add qconv int8-mixed-bf16 patterns (#161487)
jiayisunx Sep 3, 2025
1ef7efa
Add `range_equals` (#161801)
guilhermeleobas Sep 3, 2025
485a7bd
Add `range_count` and `range.__contains__` (#161802)
guilhermeleobas Sep 3, 2025
c8255c6
redirect `iter(range)` to `range.__iter__()` (#161803)
guilhermeleobas Sep 3, 2025
d636c18
Fix `range.__getitem__()` (#161804)
guilhermeleobas Sep 3, 2025
8975cda
[pt] strip error messages in profile builds (#162076)
rmaz Sep 4, 2025
dec72ea
[reland] Add inductor provenance mapping for cpp extern kernel (#1616…
yushangdi Sep 4, 2025
302df2a
[vllm hash update] update the pinned vllm hash (#162115)
pytorchupdatebot Sep 4, 2025
66f3b4a
Contiguous subgraph decomposition (#161241)
exclamaforte Sep 4, 2025
480c739
Capture TypeError in `CONTAINS_OP` (#161069)
guilhermeleobas Sep 4, 2025
8906266
[DLPACK] Optimize toDLPack Conversion Speed (#162111)
tqchen Sep 4, 2025
69a25f6
[ROCm] Enable USE_FBGEMM_GENAI (#160676)
cthi Sep 4, 2025
e19e02c
port distributed tensor test files for Intel GPU (#161604)
wincent8 Sep 4, 2025
8fd3c9c
Optimize AMP custom_backend_name error message (#162037)
zeshengzong Sep 4, 2025
c024b1f
[AMD] [Reland] Fix AMD User Defined Kernel Autotune (#161521)
oniononion36 Sep 4, 2025
09587da
Adding missing example of torch.full_like Issue#161899 (#162051)
vishalgoyal316 Sep 4, 2025
d67c29a
[inductor] Fix int64 from MutationOutput Buffer (#162020)
yushangdi Sep 4, 2025
ea1883d
Fixes #154982: add missing to_result_dtype in vector_norm (#155111)
kbabiuchx Sep 4, 2025
acece97
[Intel GPU] Upgrade OneDNN XPU Tag to v3.9.1 (#161932)
LuFinch Sep 4, 2025
9c95772
Replace setup.py develop with pip install -e (#156710)
zklaus Sep 2, 2025
040d00a
[2/N]Port several test files under test/distributed to Intel GPU (#15…
daisyden Sep 4, 2025
34aa782
Revert "Make distributed modules importable even when backend not bui…
pytorchmergebot Sep 4, 2025
e532c9d
Relax tolerance for test_quick_baddbmm_cpu_complex64 (#152424)
Flamefire Sep 4, 2025
b7dad7d
Revert "Always build USE_DISTRIBUTED. (#160449)"
pytorchmergebot Sep 4, 2025
601ae8e
[CUDAGraph] add config to error on skipping cudagraph (#161862)
BoyuanFeng Sep 4, 2025
6b8b3ac
Revert "[ROCm] Use MI325 (gfx942) runners for binary smoke testing (#…
pytorchmergebot Sep 4, 2025
3a20a20
Fix largeTensorTest malfunction on XPU (#161988)
guangyey Sep 3, 2025
cc5bdd1
Keep default `CMAKE_PREFIX_PATH` in test_aot_inductor_package (#161907)
Flamefire Sep 4, 2025
1ebd70d
Fix usage of forwarding references (#161094)
lakshayg Sep 4, 2025
248355f
Don't require FakeStore to be passed into fake backend (#162164)
ezyang Sep 4, 2025
81aeefa
Add torch.compile support for triton.constexpr_function (#162106)
oulgen Sep 3, 2025
019fed3
[ROCm] [CK] Composable Kernel integration for inductor backend (#158747)
iupaikov-amd Sep 4, 2025
43b7c86
Add dependency-groups.dev to pyproject.toml (#161216)
lakshayg Sep 4, 2025
ba7f546
Update torch-xpu-ops commit pin (#162062)
CuiYifeng Sep 4, 2025
f36f285
[dynamo] change error_on_graph_break/fullgraph semantics (#161747)
williamwen42 Sep 3, 2025
0c0e056
[CUDA] Reuse blocks with record_stream during CUDA Graph capture in t…
eee4017 Sep 4, 2025
869cbcc
[SymmMem] Add a helper API to distinguish intra- and inter- node (#16…
kwen2501 Sep 2, 2025
8bb213b
[SymmMem] Increase signal pad size for NVL72 (#162026)
kwen2501 Sep 2, 2025
8a736fa
create torch._grouped_mm fallback path with for loops / bmm (#161407)
vkuzo Sep 3, 2025
61fb632
move `_grouped_mm` fallback to composite explicit autograd (#161717)
vkuzo Sep 3, 2025
9eadb37
enable float32 and float16 in `torch._grouped_mm` fallback (#162059)
vkuzo Sep 4, 2025
3302859
[dynamo] Make the MRO walk more narrow (#162105)
anijain2305 Sep 3, 2025
d1a15ab
export: add explicit decomposition for aten.expand_copy and unit test…
albertw7711 Sep 4, 2025
6f7608d
[cuDNN][SDPA] Enable cuDNN SDPA by default for SM 9.0, SM 10.0 (#162073)
eqy Sep 4, 2025
9480cdc
Modified the docs to add example for torch.is_floating_point and torc…
mansiag05 Sep 4, 2025
6b1900c
[dynamo][hops] Remove const outputs from the speculated subgraph (#16…
anijain2305 Aug 28, 2025
1f51056
[BE]: Update cpp-httplib submodule to 0.26.0 (#162181)
Skylion007 Sep 4, 2025
3dde5d7
[nativert] triton runtime implementation (#161798)
dolpm Sep 4, 2025
c371032
Always build USE_DISTRIBUTED. (#160449)
ezyang Sep 4, 2025
9e5247f
Revert "[MPS] enable cat op for sparse (#162007)"
pytorchmergebot Sep 4, 2025
afa6e56
Revert "[BE] Cleanup stale comments/copy from `gemm` (#162001)"
pytorchmergebot Sep 4, 2025
c3d54de
Revert "[BLAS] Avoid downcasts for fp16fp16->fp32 BLAS (#161999)"
pytorchmergebot Sep 4, 2025
dbec087
Fix Arm64 OSS pytorch build with FBGEMM (#161527)
mcfi Sep 4, 2025
95ee0bf
Revert "[nativert] triton runtime implementation (#161798)"
pytorchmergebot Sep 4, 2025
ef3be67
Make distributed modules importable even when backend not built (#159…
ezyang Sep 4, 2025
a3d72b0
Apply Triton tensor descriptor for flex-decoding for performance (#16…
EikanWang Sep 4, 2025
48bedd7
Revert "Fix usage of forwarding references (#161094)"
pytorchmergebot Sep 4, 2025
d5b3841
Revert "[SymmMem] Add root argument to broadcast op (#161090)"
pytorchmergebot Sep 4, 2025
b9ba612
[ROCm] Enabling several UTs (#161715)
pragupta Sep 4, 2025
9bdcee0
[SymmMem] Add root argument to broadcast op (#161090)
kwen2501 Sep 3, 2025
89d41d3
[SymmMem] Feed tensor.data_ptr instead of handle.buffer_ptr into kern…
kwen2501 Sep 4, 2025
0d71a9d
fix incorrect interaction between DDPOptimizer and donated buffers (#…
bdhirsh Aug 15, 2025
b04e922
Fix memory leak in AOTI when calling `aoti_torch_as_strided` (#162118)
yushangdi Sep 4, 2025
1ec2c15
Revert "Fix Arm64 OSS pytorch build with FBGEMM (#161527)"
pytorchmergebot Sep 4, 2025
0d84ff3
[PGO] log add_extra_remote PGO to tlparse (#161751)
pianpwk Sep 4, 2025
09be189
[export] Fix torch.export.load with storage offset (#162172)
yiming0416 Sep 4, 2025
3a20781
Forward fix for user defined triton kernel grid calc (#162162)
nandesuka Sep 4, 2025
9499c87
[Inductor][Intel GPU] Register triton template heuristic for addmm tm…
etaf Sep 4, 2025
c7e4107
[B200][MXFP8] Fix regex in `test_blockwise_mxfp8_nvfp4_error_messages…
eqy Sep 4, 2025
d2d4c8e
[BLAS] Avoid downcasts for fp16fp16->fp32 BLAS (#161999)
malfet Sep 2, 2025
5c67426
[dynamo] Add support for const prop on .item (#162204)
angelayi Sep 5, 2025
2928086
Add new parameter for gen_pyi.py to make it more configureable. (#161…
0xjeffro Sep 5, 2025
73eb451
[B200][NVFP4] Fix argument passing in `test_blockwise_mxfp8_nvfp4_mxf…
eqy Sep 5, 2025
be5b03d
Allow for using a dedicated binary for the torch subproc pool. (#162093)
c00w Sep 5, 2025
b67c410
[BE] [Inductor] Add Kernel name to all coor-desc tuning (#161409)
njriasan Sep 5, 2025
3bbc2e3
[vllm hash update] update the pinned vllm hash (#162226)
pytorchupdatebot Sep 5, 2025
494878a
[audio hash update] update the pinned audio hash (#162114)
pytorchupdatebot Sep 5, 2025
5da573c
[PGO] handle PGO profile merges (#162097)
pianpwk Sep 5, 2025
5c473e9
[1/N] Port 5 _composable/fsdp distributed test cases to Intel GPU (#1…
zxd1997066 Sep 5, 2025
bffc7dd
[CD] Add cuda 13.0 libtorch builds, remove CUDA 12.9 builds (#161916)
atalman Sep 5, 2025
a714437
[ez][inductor] add a few outer dimension reduction cases for LOAF (#1…
shunting314 Sep 3, 2025
2dd529d
A basic CLAUDE.md based on bad things I see claude code doing (#162163)
ezyang Sep 4, 2025
06da7c0
[DCP][Quantization] Fix for FP8 multiplication during dequantization …
saumishr Sep 5, 2025
f3cebec
Revert "Rename propagate_tensor_meta to make private again (#161744)"
pytorchmergebot Sep 5, 2025
b2c7b9a
[Intel GPU][FlexAttention] Enable TMA path on Intel GPU (#162138)
hoshibara Sep 5, 2025
c2a3024
[cuBLASLt][FP8] `cuBLASLt` appears to support float8 rowwise-scaling …
eqy Sep 5, 2025
9837461
[Intel GPU] Update Intel triton commit pin to Triton 3.5.x (#161777)
etaf Sep 5, 2025
261a84a
[CD][BE] Remove unnecessary checks for XCode version (#162263)
malfet Sep 5, 2025
d711f27
Revert "[ROCm] [CK] Composable Kernel integration for inductor backen…
pytorchmergebot Sep 5, 2025
b18bb67
Add const to stable amax (#162082)
mikaylagawarecki Sep 3, 2025
2ef665a
[inductor][contigous mm] mild refactor (#162075)
coconutruben Sep 5, 2025
9602590
[inductor] move scaled_mm input nodes logic (#161340)
coconutruben Sep 5, 2025
4902c76
[inductor][ez] add template/externchoice uid (#161341)
coconutruben Sep 5, 2025
af590cb
[inductor][aten] treat like a template in GEMMs (#161342)
coconutruben Sep 5, 2025
a301dc3
[inductor][ez] pass template rather than template.uid (#161343)
coconutruben Sep 5, 2025
031d79c
[inductor] move max-autotune logic inside V.choices.get_mm_configs (#…
coconutruben Sep 5, 2025
d63ad53
[inductor][ez] return choicecallers directly (#161345)
coconutruben Sep 5, 2025
e02e9ed
[inductor] V.choice.get_mm_configs takes a stack of templates (#161346)
coconutruben Sep 5, 2025
9a8d454
[inductor] add kernel template choice (ktc) (#161347)
coconutruben Sep 5, 2025
c321111
[inductor][ez] V.choices.get_mm_configs returns list of ChoiceCallers…
coconutruben Sep 5, 2025
88d94d1
Add torch.Tensor._make_dtensor to accelerate DTensor.__new__ further …
swolchok Sep 4, 2025
70f865a
Revert "Make distributed modules importable even when backend not bui…
pytorchmergebot Sep 5, 2025
adae7f6
Revert "Always build USE_DISTRIBUTED. (#160449)"
pytorchmergebot Sep 5, 2025
3771380
[ONNX] Hide draft export under a flag (#162225)
justinchuby Sep 5, 2025
a3c7f77
[EZ][CD] Update MacOS deployment platform to 11.0 (#162264)
malfet Sep 5, 2025
6087ef4
[BE] Cleanup stale comments/copy from `gemm` (#162001)
malfet Sep 2, 2025
de893e9
Always build USE_DISTRIBUTED. (#160449)
ezyang Sep 4, 2025
01edcd4
Make distributed modules importable even when backend not built (#159…
ezyang Sep 4, 2025
2fa0520
[BE][pytree] cleanup parameterized pytree tests (#160842)
XuehaiPan Sep 5, 2025
92a4302
[cutlass backend] Add FP8 tests for multiple linears (#160782)
henrylhtsang Sep 4, 2025
771f369
[Inductor] Improve RoPE (#161420)
BoyuanFeng Sep 5, 2025
c10195e
[C10d][Gloo] Enable complex datatype support in ProcessGroupGloo (#15…
shunzhiwen Sep 5, 2025
a00cdc1
[CD][BE] Get rid of SETUPTOOLS and PYYAML extra pins (#162266)
malfet Sep 5, 2025
70d36e0
Making batching rule for F.embedding DTensor-aware (#162117)
zou3519 Sep 4, 2025
79fcd52
symbolic cpp channels_last_contiguous (#160402)
laithsakka Sep 5, 2025
01ab325
[DCP][Quantization] Fix the issue when scale vector is in a different…
saumishr Sep 5, 2025
e0a62b2
[aot-precompile] default-filter global guards (#162090)
dolpm Sep 5, 2025
8d50355
[CD][EZ] Update libtorch python version to 3.10 (#162297)
malfet Sep 5, 2025
9c03d6b
[CD][BE] Delete Python-3.9 case (#162265)
malfet Sep 5, 2025
4d4abec
allow user to pass in custom partitioner function (#157580)
xuanzhang816 Sep 5, 2025
486b20b
Add return-max-scores to flex-attention (#161667)
drisspg Sep 5, 2025
081cab0
Resize to 0 if not going to be used (#161730)
drisspg Sep 5, 2025
1463714
[dynamo] Graph break on on user-defined class in compiled region (#16…
rtimpe Sep 4, 2025
4f72d93
re-land triton runtime implementation" (#162217)
dolpm Sep 6, 2025
0f45aaf
Disable autocast when running joint graph passes (#162304)
yf225 Sep 6, 2025
7f4ff79
remove deprecated vllm test (#162306)
yangw-dev Sep 6, 2025
291cd11
[inductor] estimate peak memory in codegen only when buffer reuse (#1…
ruisizhang123 Sep 6, 2025
145a3a7
[CUDA 13][cuDNN] Bump CUDA 13 to cuDNN 9.13.0 (#162268)
eqy Sep 6, 2025
c3ceca2
codebase structure documentation to include torchgen (#162261)
Raman-RH Sep 6, 2025
20629b1
Add contiguous subgraph transformation threshold (#162192)
exclamaforte Sep 6, 2025
b2b4add
Docs on export joint with descriptors (#159006)
ezyang Aug 12, 2025
c0983e6
[Graph Partition] interface for custom cg wrapper (#162207)
BoyuanFeng Sep 6, 2025
a3e5466
Revert "Resize to 0 if not going to be used (#161730)"
pytorchmergebot Sep 6, 2025
da4db4b
Fix `DeviceMesh._flatten` docstring example (#162277)
mariosasko Sep 6, 2025
20b47ac
[fx] fix qualified name for methods of torch.Tensor (#162224)
isuruf Sep 4, 2025
aac1a50
Add api info for torch._C._nn.pyi (#162148)
orangeH25 Sep 6, 2025
bc50597
torch.zeros bound checks for symint (#161976)
morrison-turnansky Sep 6, 2025
c98ddac
Fixed comment to match logic in distributed_c10d.py (#162158)
Codeboi007 Sep 6, 2025
28f4ab0
Add -Wno-ctad-maybe-unsupported compiler flag (#162223)
0xjeffro Sep 6, 2025
0ff8eab
Revert "[dynamo] Graph break on on user-defined class in compiled reg…
pytorchmergebot Sep 6, 2025
9aedb3c
[AOTI-FX] Support registering custom FX backends (#162317)
blaine-rister Sep 6, 2025
5985e28
[CUDA 13][cuDNN][Windows] Roll back cuDNN upgrade from 9.13 to 9.12 o…
eqy Sep 6, 2025
b6d0a9e
MXFP8 grouped GEMM support for torch._scaled_grouped_mm + submodule b…
danielvegamyhre Sep 6, 2025
ae0edc1
[3/N] Enable 6 fsdp test on Intel GPU (#161601)
daisyden Sep 6, 2025
047603d
New export implementation with flat inp/out (#162167)
tugsbayasgalan Sep 4, 2025
541aa23
[inductor] fix TemplateBuffer.extract_read_writes (#162221)
shunting314 Sep 6, 2025
1a588ac
[inductor] rename deps during refreshing (#162303)
shunting314 Sep 6, 2025
5927a70
NLLLoss: validate target is 0D when input is 1D (#161412)
mansiag05 Sep 6, 2025
48e3be3
[while_loop][autograd] add hop while_loop_stack_output (#160467)
ydwu4 Sep 5, 2025
2b8a839
[while_loop][autograd] support autograd_key of while_loop (#160483)
ydwu4 Sep 5, 2025
5211f1f
[export] Move example inputs in move_to_device_pass (#162301)
yiming0416 Sep 6, 2025
e3068cd
[dynamo] Use relaxed CLOSURE_MATCH guard then ID_MATCH (#162247)
anijain2305 Sep 5, 2025
b919560
[nativert] AOTI lowering and packaging as NativeRT delegate (#162285)
yiming0416 Sep 7, 2025
2a45837
[inductor] fuse for scalar shared data (#162311)
shunting314 Sep 6, 2025
fea2077
[vllm hash update] update the pinned vllm hash (#162314)
pytorchupdatebot Sep 7, 2025
eac3d6f
Revert "[inductor] fuse for scalar shared data (#162311)"
pytorchmergebot Sep 7, 2025
104f268
Revert "Add return-max-scores to flex-attention (#161667)"
pytorchmergebot Sep 7, 2025
93fb23d
Build vLLM nightly wheels (#162000)
huydhn Sep 7, 2025
ada43ed
Revert "[inductor] pdl inductor option (disabled by default) (#160928)"
pytorchmergebot Sep 7, 2025
7a83cf4
Revert " [while_loop][autograd] support autograd_key of while_loop (#…
pytorchmergebot Sep 7, 2025
9ad5e8e
Improve typing of ONNX decorators with ParamSpec (#162332)
Vinayak-Pawar Sep 7, 2025
4348db0
Revert "[inductor][ez] V.choices.get_mm_configs returns list of Choic…
pytorchmergebot Sep 7, 2025
093ab5f
Revert "[inductor] add kernel template choice (ktc) (#161347)"
pytorchmergebot Sep 7, 2025
df59c21
Revert "[BE] Cleanup stale comments/copy from `gemm` (#162001)"
pytorchmergebot Sep 7, 2025
e246a85
Revert "[1/N] Port 5 _composable/fsdp distributed test cases to Intel…
pytorchmergebot Sep 7, 2025
8235c4f
Revert "[ROCm] Enabling several UTs (#161715)"
pytorchmergebot Sep 7, 2025
ff2de5d
Revert "[2/N]Port several test files under test/distributed to Intel …
pytorchmergebot Sep 7, 2025
ec2e368
[while_loop][autograd] support autograd_key of while_loop (#160483)
ydwu4 Sep 7, 2025
eb9073a
[easy] [precompile] Convert CompileArtifacts to callable (#162169)
jamesjwu Sep 7, 2025
5babb4d
Add BundledAOTAutogradSerializableCallable (#162170)
jamesjwu Sep 7, 2025
103f725
[associative_scan] Autograd separated (#139939)
bohnstingl Sep 8, 2025
c9ac8c2
[audio hash update] update the pinned audio hash (#162315)
pytorchupdatebot Sep 8, 2025
29e09a6
Revert "Make distributed modules importable even when backend not bui…
pytorchmergebot Sep 8, 2025
1e0656f
Revert "Always build USE_DISTRIBUTED. (#160449)"
pytorchmergebot Sep 8, 2025
fb0afa8
[inductor][triton] more JITCallable._hash_lock support (#162244)
davidberard98 Sep 5, 2025
31d5c67
[inductor][triton] support static cuda launcher after triton # 7866 (…
davidberard98 Sep 5, 2025
5b90e85
[AsyncTP] Fixes AsyncMM (#162040)
fegin Sep 8, 2025
32911ff
[xla hash update] update the pinned xla hash (#162372)
pytorchupdatebot Sep 8, 2025
e101411
Update slow tests (#161395)
pytorchupdatebot Sep 8, 2025
3f59933
[upstream triton] update triton pin to triton 3.5 (#162278)
davidberard98 Sep 5, 2025
25c170b
[inductor] Runtime estimations: use nccl estimator; mm only benchmark…
IvanKobzarev Sep 8, 2025
53297f6
Revert "[audio hash update] update the pinned audio hash (#162315)"
pytorchmergebot Sep 8, 2025
a92773e
Revert "Use vectorized stores for all dtypes in cat (#161649)"
pytorchmergebot Sep 8, 2025
f044fa2
[AsyncTP] Use assertEqual instead of allClose for bf16 tests (#162041)
fegin Sep 8, 2025
8e076d8
Don't call check_has_torch_dispatch in THPVariable_NewWithVar if we a…
swolchok Sep 6, 2025
49c446c
Add C++ function for torch.distributed.tensor._op_schema.is_view_op (…
swolchok Sep 6, 2025
5793dd7
[Intel GPU] Integrate OneDNN SDPA training forward and backward (#161…
LuFinch Sep 8, 2025
ebd29a1
[inductor] fuse for scalar shared data (#162311)
shunting314 Sep 8, 2025
72e6717
Avoid crash with release_available_cached_blocks (#162269)
morrison-turnansky Sep 8, 2025
de5dc1f
[cuDNN][SDPA][Nested Tensor] add forward/backward caching support for…
eqy Sep 8, 2025
bc4176c
CD Windows CUDA 13.0 build - fix packaging of cuda dlls (#162383)
atalman Sep 8, 2025
2b05fbd
Merge remote-tracking branch 'upstream/main' into rocm7.1_internal_te…
github-actions[bot] Sep 8, 2025
b88024c
Fix merge conflicts
pragupta Sep 9, 2025
15 changes: 15 additions & 0 deletions .bc-linter.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
version: 1
paths:
include:
- "**/*.py"
exclude:
- ".*"
- ".*/**"
- "**/.*/**"
- "**/.*"
- "**/_*/**"
- "**/_*.py"
- "**/test/**"
- "**/benchmarks/**"
- "**/test_*.py"
- "**/*_test.py"
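The `.bc-linter.yml` rules above effectively restrict BC checking to public Python files: hidden and underscore-prefixed paths, test trees, and benchmark trees are all excluded. A minimal Python sketch of that predicate follows; the helper name `bc_linter_considers` is illustrative only, and the linter's actual glob semantics may differ in edge cases:

```python
from pathlib import PurePosixPath


def bc_linter_considers(path: str) -> bool:
    """Approximate the include/exclude rules in .bc-linter.yml (illustrative)."""
    p = PurePosixPath(path)
    if p.suffix != ".py":  # include: "**/*.py"
        return False
    dirs, name = p.parts[:-1], p.name
    # exclude: hidden and private components ("**/.*", "**/_*/**", "**/_*.py")
    if any(d.startswith((".", "_")) for d in dirs) or name.startswith((".", "_")):
        return False
    # exclude: test and benchmark trees ("**/test/**", "**/benchmarks/**")
    if "test" in dirs or "benchmarks" in dirs:
        return False
    # exclude: test-named files ("**/test_*.py", "**/*_test.py")
    if name.startswith("test_") or name.endswith("_test.py"):
        return False
    return True
```

For example, `torch/nn/modules/linear.py` passes the filter, while `torch/_dynamo/utils.py` and `test/test_nn.py` do not.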
9 changes: 9 additions & 0 deletions .ci/aarch64_linux/aarch64_ci_build.sh
@@ -7,6 +7,15 @@ if [[ "$GPU_ARCH_VERSION" == *"12.9"* ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;12.0"
fi

if [[ "$GPU_ARCH_VERSION" == *"13.0"* ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;11.0;12.0"
fi

# Compress the fatbin with -compress-mode=size for CUDA 13
if [[ "$DESIRED_CUDA" == *"13"* ]]; then
export TORCH_NVCC_FLAGS="-compress-mode=size"
fi

SCRIPTPATH="$( cd -- "$(dirname "$0")" >/dev/null 2>&1 ; pwd -P )"
source $SCRIPTPATH/aarch64_ci_setup.sh

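The two shell conditionals added above pick an arch list per CUDA version and turn on fatbin compression for CUDA 13. A Python sketch of the same decisions, where the `ci_env` helper and the exact `GPU_ARCH_VERSION`/`DESIRED_CUDA` string formats are assumptions for illustration:

```python
def ci_env(gpu_arch_version: str, desired_cuda: str) -> dict:
    """Sketch of the aarch64_ci_build.sh logic above (substring checks, as in the shell globs)."""
    env = {}
    if "12.9" in gpu_arch_version:
        env["TORCH_CUDA_ARCH_LIST"] = "8.0;9.0;10.0;12.0"
    if "13.0" in gpu_arch_version:
        env["TORCH_CUDA_ARCH_LIST"] = "8.0;9.0;10.0;11.0;12.0"
    if "13" in desired_cuda:
        # CUDA 13 fatbins are compressed for size
        env["TORCH_NVCC_FLAGS"] = "-compress-mode=size"
    return env
```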
70 changes: 46 additions & 24 deletions .ci/aarch64_linux/aarch64_wheel_ci_build.py
@@ -77,44 +77,66 @@ def package_cuda_wheel(wheel_path, desired_cuda) -> None:
wheelname = os.path.basename(wheel_path)
os.mkdir(f"{folder}/tmp")
os.system(f"unzip {wheel_path} -d {folder}/tmp")
libs_to_copy = [
"/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12",
# Common libraries for all CUDA versions
common_libs = [
# Non-NVIDIA system libraries
"/lib64/libgomp.so.1",
"/usr/lib64/libgfortran.so.5",
"/acl/build/libarm_compute.so",
"/acl/build/libarm_compute_graph.so",
# Common CUDA libraries (same for all versions)
"/usr/local/lib/libnvpl_lapack_lp64_gomp.so.0",
"/usr/local/lib/libnvpl_blas_lp64_gomp.so.0",
"/usr/local/lib/libnvpl_lapack_core.so.0",
"/usr/local/lib/libnvpl_blas_core.so.0",
"/usr/local/cuda/extras/CUPTI/lib64/libnvperf_host.so",
"/usr/local/cuda/lib64/libcudnn.so.9",
"/usr/local/cuda/lib64/libcublas.so.12",
"/usr/local/cuda/lib64/libcublasLt.so.12",
"/usr/local/cuda/lib64/libcudart.so.12",
"/usr/local/cuda/lib64/libcufft.so.11",
"/usr/local/cuda/lib64/libcusparse.so.12",
"/usr/local/cuda/lib64/libcusparseLt.so.0",
"/usr/local/cuda/lib64/libcusolver.so.11",
"/usr/local/cuda/lib64/libcurand.so.10",
"/usr/local/cuda/lib64/libnccl.so.2",
"/usr/local/cuda/lib64/libnvJitLink.so.12",
"/usr/local/cuda/lib64/libnvrtc.so.12",
"/usr/local/cuda/lib64/libnvshmem_host.so.3",
"/usr/local/cuda/lib64/libcudnn_adv.so.9",
"/usr/local/cuda/lib64/libcudnn_cnn.so.9",
"/usr/local/cuda/lib64/libcudnn_graph.so.9",
"/usr/local/cuda/lib64/libcudnn_ops.so.9",
"/usr/local/cuda/lib64/libcudnn_engines_runtime_compiled.so.9",
"/usr/local/cuda/lib64/libcudnn_engines_precompiled.so.9",
"/usr/local/cuda/lib64/libcudnn_heuristic.so.9",
"/lib64/libgomp.so.1",
"/usr/lib64/libgfortran.so.5",
"/acl/build/libarm_compute.so",
"/acl/build/libarm_compute_graph.so",
"/usr/local/lib/libnvpl_lapack_lp64_gomp.so.0",
"/usr/local/lib/libnvpl_blas_lp64_gomp.so.0",
"/usr/local/lib/libnvpl_lapack_core.so.0",
"/usr/local/lib/libnvpl_blas_core.so.0",
"/usr/local/cuda/lib64/libcufile.so.0",
"/usr/local/cuda/lib64/libcufile_rdma.so.1",
"/usr/local/cuda/lib64/libcusparse.so.12",
]

if "129" in desired_cuda:
libs_to_copy += [
"/usr/local/cuda/lib64/libnvrtc-builtins.so.12.9",
"/usr/local/cuda/lib64/libcufile.so.0",
"/usr/local/cuda/lib64/libcufile_rdma.so.1",
# CUDA version-specific libraries
if "130" in desired_cuda:
version_specific_libs = [
"/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.13",
"/usr/local/cuda/lib64/libcublas.so.13",
"/usr/local/cuda/lib64/libcublasLt.so.13",
"/usr/local/cuda/lib64/libcudart.so.13",
"/usr/local/cuda/lib64/libcufft.so.12",
"/usr/local/cuda/lib64/libcusolver.so.12",
"/usr/local/cuda/lib64/libnvJitLink.so.13",
"/usr/local/cuda/lib64/libnvrtc.so.13",
"/usr/local/cuda/lib64/libnvrtc-builtins.so.13.0",
]
elif "12" in desired_cuda:
# Get the last character for libnvrtc-builtins version (e.g., "129" -> "9")
minor_version = desired_cuda[-1]
version_specific_libs = [
"/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.12",
"/usr/local/cuda/lib64/libcublas.so.12",
"/usr/local/cuda/lib64/libcublasLt.so.12",
"/usr/local/cuda/lib64/libcudart.so.12",
"/usr/local/cuda/lib64/libcufft.so.11",
"/usr/local/cuda/lib64/libcusolver.so.11",
"/usr/local/cuda/lib64/libnvJitLink.so.12",
"/usr/local/cuda/lib64/libnvrtc.so.12",
f"/usr/local/cuda/lib64/libnvrtc-builtins.so.12.{minor_version}",
]

# Combine all libraries
libs_to_copy = common_libs + version_specific_libs

# Copy libraries to unzipped_folder/a/lib
for lib_path in libs_to_copy:
@@ -208,7 +230,7 @@ def parse_arguments():
build_vars = "CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000 "
# MAX_JOB=5 is not required for CPU backend (see commit 465d98b)
if enable_cuda:
build_vars = "MAX_JOBS=5 " + build_vars
build_vars += "MAX_JOBS=5 "

override_package_version = os.getenv("OVERRIDE_PACKAGE_VERSION")
desired_cuda = os.getenv("DESIRED_CUDA")
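The refactor above splits `libs_to_copy` into common and version-specific lists, keyed by substring checks on `desired_cuda`. Note the ordering: `"130"` must be tested before `"12"`, since `"12" in "cu130"` is false but `"12" in "cu129"` is true. A condensed sketch of just the `libnvrtc-builtins` selection (assuming `desired_cuda` strings like `"cu129"` or `"cu130"`):

```python
def select_nvrtc_builtins(desired_cuda: str) -> str:
    """Mirror the version-specific branch in package_cuda_wheel (illustrative)."""
    if "130" in desired_cuda:  # checked first, before the broader "12" case
        return "/usr/local/cuda/lib64/libnvrtc-builtins.so.13.0"
    elif "12" in desired_cuda:
        # last character gives the minor version, e.g. "cu129" -> "9"
        minor = desired_cuda[-1]
        return f"/usr/local/cuda/lib64/libnvrtc-builtins.so.12.{minor}"
    raise ValueError(f"unsupported CUDA version: {desired_cuda}")
```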
16 changes: 4 additions & 12 deletions .ci/aarch64_linux/build_aarch64_wheel.py
@@ -438,9 +438,7 @@ def build_torchvision(
)
build_vars += f"BUILD_VERSION={version}.dev{build_date}"
elif build_version is not None:
build_vars += (
f"BUILD_VERSION={build_version} PYTORCH_VERSION={branch[1:].split('-')[0]}"
)
build_vars += f"BUILD_VERSION={build_version} PYTORCH_VERSION={branch[1:].split('-', maxsplit=1)[0]}"
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"

@@ -495,9 +493,7 @@ def build_torchdata(
)
build_vars += f"BUILD_VERSION={version}.dev{build_date}"
elif build_version is not None:
build_vars += (
f"BUILD_VERSION={build_version} PYTORCH_VERSION={branch[1:].split('-')[0]}"
)
build_vars += f"BUILD_VERSION={build_version} PYTORCH_VERSION={branch[1:].split('-', maxsplit=1)[0]}"
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"

@@ -553,9 +549,7 @@ def build_torchtext(
)
build_vars += f"BUILD_VERSION={version}.dev{build_date}"
elif build_version is not None:
build_vars += (
f"BUILD_VERSION={build_version} PYTORCH_VERSION={branch[1:].split('-')[0]}"
)
build_vars += f"BUILD_VERSION={build_version} PYTORCH_VERSION={branch[1:].split('-', maxsplit=1)[0]}"
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"

@@ -613,9 +607,7 @@ def build_torchaudio(
)
build_vars += f"BUILD_VERSION={version}.dev{build_date}"
elif build_version is not None:
build_vars += (
f"BUILD_VERSION={build_version} PYTORCH_VERSION={branch[1:].split('-')[0]}"
)
build_vars += f"BUILD_VERSION={build_version} PYTORCH_VERSION={branch[1:].split('-', maxsplit=1)[0]}"
if host.using_docker():
build_vars += " CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000"

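The same expression is reformatted in all four build functions above: `branch[1:]` drops the leading `v`, and `split('-', maxsplit=1)[0]` keeps only the part before the first dash. Since only index `[0]` is used, `maxsplit=1` does not change the result versus a plain `split('-')`; it just avoids splitting the remainder (a common lint suggestion). A sketch, where the sample branch names are assumptions for illustration:

```python
def pytorch_version_from_branch(branch: str) -> str:
    """Derive PYTORCH_VERSION from a release branch name, e.g. "v2.1.0-rc1" -> "2.1.0"."""
    # branch[1:] strips the leading "v"; the first dash (if any) ends the version
    return branch[1:].split("-", maxsplit=1)[0]
```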
4 changes: 2 additions & 2 deletions .ci/docker/README.md
@@ -120,8 +120,8 @@ If your new Docker image needs a library installed from a specific pinned commit
If you're introducing a new argument to the Docker build, make sure to add it in the Docker build step in `.ci/docker/build.sh`:
```bash
docker build \
....
--build-arg "NEW_ARG_1=${NEW_ARG_1}"
....
--build-arg "NEW_ARG_1=${NEW_ARG_1}"
```

3. **Update Dockerfile logic**:
6 changes: 5 additions & 1 deletion .ci/docker/almalinux/Dockerfile
@@ -64,6 +64,10 @@ FROM cuda as cuda12.9
RUN bash ./install_cuda.sh 12.9
ENV DESIRED_CUDA=12.9

FROM cuda as cuda13.0
RUN bash ./install_cuda.sh 13.0
ENV DESIRED_CUDA=13.0

FROM ${ROCM_IMAGE} as rocm
ENV PYTORCH_ROCM_ARCH="gfx900;gfx906;gfx908;gfx90a;gfx942;gfx1030;gfx1100;gfx1101;gfx1102;gfx1200;gfx1201"
ADD ./common/install_mkl.sh install_mkl.sh
@@ -76,10 +80,10 @@ ADD ./common/install_mnist.sh install_mnist.sh
RUN bash ./install_mnist.sh

FROM base as all_cuda
COPY --from=cuda11.8 /usr/local/cuda-11.8 /usr/local/cuda-11.8
COPY --from=cuda12.6 /usr/local/cuda-12.6 /usr/local/cuda-12.6
COPY --from=cuda12.8 /usr/local/cuda-12.8 /usr/local/cuda-12.8
COPY --from=cuda12.9 /usr/local/cuda-12.9 /usr/local/cuda-12.9
COPY --from=cuda13.0 /usr/local/cuda-13.0 /usr/local/cuda-13.0

# Final step
FROM ${BASE_TARGET} as final