Merged

694 commits
76a841f
Port OpSchema.__post_init__ and OpSchema._recompute_comparison_key to…
swolchok Sep 18, 2025
46c647d
[vllm hash update] update the pinned vllm hash (#163304)
pytorchupdatebot Sep 19, 2025
3016616
[BE] Update Python min version to 3.10 (#162310)
malfet Sep 19, 2025
c91f59b
Fix performance regression when indexing by Numpy arrays (#163280)
ezyang Sep 18, 2025
ce5637b
Fix invalid indices bug for max_unpool2d/3d on MPS (#163036)
can-gaa-hou Sep 19, 2025
5780478
Revert "[BE] Update Python min version to 3.10 (#162310)"
pytorchmergebot Sep 19, 2025
1708120
Revert "[CI] Move Windows build/tests to Python-3.10 (#162862)"
pytorchmergebot Sep 19, 2025
e0bcd58
[MTIA] Add MTIA dispatch for kernel foreach_maximum(Add D80022242 bac…
DoubleBiao Sep 19, 2025
1302637
Revert "[dynamo][guards] Do not construct entire framelocals dict for…
pytorchmergebot Sep 19, 2025
32ad29b
Revert "[dynamo][guards] Fail on an unknown framelocals to dict conve…
pytorchmergebot Sep 19, 2025
0815091
[CP][BE] Cosmetic refactors for CP code base (#163115)
fegin Sep 18, 2025
ab5086a
[WOQ] Add XPU kernel for _weight_int8pack_mm (#160938)
xiaowangintel Sep 19, 2025
33e6c5a
[Dependabot] Update(deps): Bump transformers from 4.54.0 to 4.56.0 in…
dependabot[bot] Sep 19, 2025
bee362c
[ROCm][SymmMem] Fix skip condition for PLATFORM_SUPPORTS_SYMM_MEM (#1…
pragupta Sep 19, 2025
264e7f6
[ROCm] Fix mx fp8 and fp4 code after scaling refactor changes. (#163127)
jagadish-amd Sep 19, 2025
f8f230a
[FP8][cuBLAS][H100] only test fp32 outputs for rowwise `_scaled_mm` o…
eqy Sep 19, 2025
e631d76
[Flex] Changing how bwd configs are setup and updating default b200 c…
drisspg Sep 19, 2025
4967ad8
[Graph Partition] improve custom op output alias (#163227)
BoyuanFeng Sep 19, 2025
3e663ce
[Inductor][Triton][FP8] Add a Blackwell-specific scaled persistent + …
jananisriram Sep 19, 2025
2984bfe
[ez][CI] Run vllm workflow on vllm pin updates (#163353)
clee2000 Sep 19, 2025
a3b68c7
Revert "Fix boxcox to return same result for same input in one batch …
pytorchmergebot Sep 19, 2025
607469b
Revert "[ROCm] Bump FBGEMM commit to avoid CK errors (#162590)"
pytorchmergebot Sep 19, 2025
a0d2d84
Handling overflow for long int overflow for the product of kernel_hei…
arkadip-maitra Sep 19, 2025
b8c5ec5
[CD] Simplify NVIDIA driver installation step (#163349)
malfet Sep 19, 2025
52dd7a8
Move ROCM trunk wheel builds to 3.10 (#163339)
malfet Sep 19, 2025
03f34fd
Add explicit typing to nn.Module.__init__() parameters (#157389)
dsashidh Sep 19, 2025
bc7b17a
Realize LazyVariableTracker before raising exception (#163350)
guilhermeleobas Sep 19, 2025
979e10f
[Bugfix] Match eager stride semantics for cloned tensors with preserv…
Lucaskabela Sep 19, 2025
a273475
[BE] Introduce `CONDA_ROOT_DIR` (#163341)
malfet Sep 19, 2025
4a160da
[CUDA] revert PR 130472 (#162950)
thenumberouscode Sep 19, 2025
2a308c7
Revert "Improve device info with new flops and bandwidth formula base…
pytorchmergebot Sep 19, 2025
f8fb437
[SymmMem] Barrier on team instead of world (#163298)
kwen2501 Sep 18, 2025
7130b17
[SymmMem] Fix memory allocation hold-up (#162680)
kwen2501 Sep 18, 2025
ba3c2c8
SDP Backend function fix (#161169)
ahkush Sep 19, 2025
466122b
[inductor] avoid creating LoopBody twice (#162101)
shunting314 Sep 11, 2025
e88460f
[Inductor] don't call sympy_str when not needed (#162126)
shunting314 Sep 11, 2025
248156e
[Inductor] do loop reordering in a separate final round (#162355)
shunting314 Sep 11, 2025
df9a482
Bugfix for doing negative padding (#161639)
skpark-rh Sep 19, 2025
9f8a311
[Inductor][Intel GPU] Save `threads_per_warp` from tirton compiled ke…
etaf Sep 19, 2025
fab8455
Don't use declarations in global namespace in stable headers (#163352)
mikaylagawarecki Sep 19, 2025
e6a9db5
Add analytics ID to cpp docs (#163370)
svekars Sep 19, 2025
9b5ec0f
Use computed buffer sizes of torch for cusparseLt metadata (#163125)
aartbik Sep 19, 2025
0098e56
[CI] Move Windows build/tests to Python-3.10 (#162862)
malfet Sep 19, 2025
ee7bdd8
[graph partition] Add way to register custom rule (#163310)
zou3519 Sep 19, 2025
093f064
[CP][BE] Correct an incorrect docstring (#163131)
fegin Sep 18, 2025
8225a26
[dynamo] Fix issue with namedtuple slicing (#163351)
jansel Sep 19, 2025
bfe9e60
Simplify PrecompileContext to no longer be a CacheArtifactManager (#1…
jamesjwu Sep 20, 2025
a1df0b4
Lazy import to avoid circular import issue for DebugMode (#163381)
SherlockNoMad Sep 20, 2025
a31acf3
Clean up obsoleted vLLM tests (#163383)
huydhn Sep 20, 2025
e56dd5d
[Inductor-FX] Support torch.cond (#163234)
blaine-rister Sep 20, 2025
a87aea0
Update RandomSampler docstring. data_source must be Sized not Dataset…
dsashidh Sep 20, 2025
0b5a99b
remove duplicate import for defaultdict (#160519)
parsshar-RH Sep 20, 2025
df5d6d5
[inductor][triton heuristics] move allow tf32 out of config params (#…
coconutruben Sep 20, 2025
0ee331b
[inductor][choices] move extra kwargs out of get_template_configs (#1…
coconutruben Sep 20, 2025
d55c9d5
[CP] Fix cuDNN CP LSE dimension bug (#163231)
fegin Sep 18, 2025
5050cfa
[Opitmus] fix fp8 activation quatization for duplicates forward outpu…
mengluy0125 Sep 20, 2025
eb11d17
[Caffe2] Improve SVE batch box cox by 2% (#163360)
Nicoshev Sep 20, 2025
f9074c7
[STABLE ABI] Add copy_ operation. (#161895)
pearu Sep 19, 2025
d70c0ba
minimize graph capture output (#162211)
avikchaudhuri Sep 20, 2025
3938175
[1/n] Support cpu_tensor.to("cuda:0") in FakeTensorMode on cuda-less …
SherlockNoMad Sep 20, 2025
9e3725e
make fullgraph_capture work on mod, args, kwargs (#162849)
avikchaudhuri Sep 20, 2025
8e3fd3d
[AI Codemod][DevmatePerfOptimizationVectorReallocation] fbcode/caffe2…
yfeldblum Sep 20, 2025
e37b600
[CUDA][cuBLAS][FP8] Forward-fix #162022 (#163354)
eqy Sep 21, 2025
2887f3f
[BE] Slight improvements to documentation in python_dispatch (#162963)
ezyang Sep 19, 2025
97eb7a2
torchdim Python port (#160236)
ezyang Sep 20, 2025
5b386ee
[vllm hash update] update the pinned vllm hash (#163392)
pytorchupdatebot Sep 21, 2025
1ca9445
[BE][Ez]: Prevent copies of std::vector in CUDA ForeachOps (#163416)
Skylion007 Sep 21, 2025
f591bb5
Remove data_source argument from Sampler (#163134)
cyyever Sep 21, 2025
4a96a6f
[Docs] Fix indentations in cond.md (#156147)
windsonsea Sep 21, 2025
1faf636
Delete functorch C extension entirely. (#163340)
ezyang Sep 21, 2025
9ba9180
Add api info for torch._C._nn.pyi (#162707)
orangeH25 Sep 21, 2025
d8cbbc0
[Easy][AMP] Refactor the AMP logic for getting dtype (#162796)
fffrog Sep 12, 2025
5d8a226
[SymmMem] Promote `@requires_nvshmem` instead of `enable_triton` (#16…
kwen2501 Sep 21, 2025
f34744d
[inductor] bugfix: keep WeakDeps (WAR deps) during fusion (#162316)
v0i0 Sep 19, 2025
51152ef
Remove autograd code for Python < 3.9 (#163313)
cyyever Sep 21, 2025
5599f48
Fully native DTensor.__new__ (#162508)
swolchok Sep 18, 2025
4d3d32f
Add torchfuzz initial impl. (#163417)
laithsakka Sep 20, 2025
8b14f43
[torch] DRY a couple of lines in unpickler (#163447)
yfeldblum Sep 21, 2025
6ac2b3a
[BE] Adding aliases for CUDA and XPU API documentation (#162984)
jiannanWang Sep 21, 2025
8a281d7
[submodule] Bump libfmt to 12.0.0 (#163441)
cyyever Sep 21, 2025
0b59492
[export] Fix wrap_with_set_grad_enabled retracing (#163295)
angelayi Sep 21, 2025
01f927e
Remove workarounds for Python 3.6 (#163440)
cyyever Sep 22, 2025
281bb56
Enable half precision types on test_conv_cudnn_nhwc_support (#163444)
cyyever Sep 22, 2025
3a7db34
Revert "[SymmMem] Promote `@requires_nvshmem` instead of `enable_trit…
pytorchmergebot Sep 22, 2025
f007894
Revert "[RELAND] Always build USE_DISTRIBUTED (#160449) and Make dist…
pytorchmergebot Sep 22, 2025
ae5be03
Revert "Delete functorch C extension entirely. (#163340)"
pytorchmergebot Sep 22, 2025
edafc90
Revert "[BE] Make PyObjectSlot use a global PyInterpreter (#162659)"
pytorchmergebot Sep 22, 2025
96a3afb
Simplify BFLOAT16_AVAILABLE (#163445)
cyyever Sep 22, 2025
60b4791
[MPS] Fix compile linalg inv (#163452)
Isalia20 Sep 22, 2025
9f5a644
[BE] Update Python min version to 3.10 (#162310)
malfet Sep 22, 2025
10adeb9
Revert "[BE] Update Python min version to 3.10 (#162310)"
pytorchmergebot Sep 22, 2025
509c4e8
Update cutlass version for fbcode (#163091)
henrylhtsang Sep 19, 2025
eaac218
[ROCm] Fix environment variable AOTRITON_INSTALLED_PREFIX (#163373)
xinyazhang Sep 22, 2025
e310cc5
Update fbgemm submodule (#163411)
cthi Sep 22, 2025
9ca183e
switch from stack based to graph based aproach (#163459)
laithsakka Sep 22, 2025
06fe5b9
[AOTI] fix TestAOTInductorPackage temp file locked handler. (#163499)
xuhancn Sep 22, 2025
5e7be98
[BE] Update Python min version to 3.10 (#162310)
malfet Sep 22, 2025
281f8f4
Combine strong and weak refcounts in intrusive_ptr in a single refcou…
mcfi Sep 22, 2025
d279a6a
ci: Add a way to lint all files in a PR from label (#163525)
seemethere Sep 22, 2025
bec967e
Remove C++ and test branches for CUDA<12 (#163443)
cyyever Sep 22, 2025
3be9c86
[opaque obj] Initial OpaqueObject (#162660)
angelayi Sep 22, 2025
dd30667
[opaque_obj] Add set_payload + docs (#163276)
angelayi Sep 22, 2025
4941719
Enable logging for absolute memory estimation (#158799)
basilwong Sep 22, 2025
7e97811
Fix lint (#163542)
angelayi Sep 22, 2025
1818c36
[Fix] Restrict stride normalization to 1D tensors on export (#163282)
Kathryn-cat Sep 22, 2025
eaa613b
Revert "[opaque_obj] Add set_payload + docs (#163276)"
pytorchmergebot Sep 22, 2025
bf28990
Add support for NestedTensor share_memory_ (#162272)
adabeyta Sep 22, 2025
d150484
[opaque_obj] Add set_payload + docs (#163276)
angelayi Sep 22, 2025
6f9aef5
[2/n] Support module.to("cuda:0") in FakeTensorMode on cuda-less mach…
SherlockNoMad Sep 22, 2025
d008670
[triton] update 3.5 pin to bbb06c0334a6772b92d24bde54956e675c8c6604 (…
davidberard98 Sep 19, 2025
fd785b1
Add NestedTensor dispatch for _is_any_true/_is_all_true (#162096)
adabeyta Sep 22, 2025
e065d35
[BE]: Add a few more missing move from return indices (#163456)
Skylion007 Sep 22, 2025
46e1b7d
remove allow-untyped-defs from ./torch/utils/data/datapipes/iter/file…
bobrenjc93 Sep 22, 2025
cf28ab2
remove allow-untyped-defs from ./torch/ao/quantization/pt2e/duplicate…
bobrenjc93 Sep 22, 2025
02da475
Triton template IMA reads on B200 (#163460)
drisspg Sep 22, 2025
8abc2af
[STABLE ABI] Add clone method to torch::stable::Tensor (#161896)
pearu Sep 22, 2025
8e62d01
Add dynamic shapes doc (#159428)
svekars Sep 22, 2025
4027e97
[BE] Delete `skipIfMPSOnMacOS13` (#163515)
malfet Sep 22, 2025
09cb34c
[RELAND] Always build USE_DISTRIBUTED (#160449) and Make distributed …
ezyang Sep 22, 2025
e558f7a
[vllm hash update] update the pinned vllm hash (#163463)
pytorchupdatebot Sep 22, 2025
da05aa7
[BE] Use `output_t` directly (#163518)
malfet Sep 22, 2025
0256f91
[BUG] MaxUnpool2d/3d should check output dim before accessing its ele…
can-gaa-hou Sep 22, 2025
2b03663
Allow add_persistent_r_block to scale up rblock up to a limit (#162296)
PaulZhang12 Sep 17, 2025
7ea8998
Better decomp for torch.eye (#163386)
jansel Sep 22, 2025
36c2a13
[inductor] Fix bug where viewed outputs get padded (#163398)
jansel Sep 22, 2025
a1bd924
[inductor] Fallback on strided complex add (#163387)
jansel Sep 22, 2025
c8fd2b4
[inductor] Skip test_baddmm on XPU (#163414)
jansel Sep 22, 2025
4fc271e
[inductor] Don't require_dense for grid_sampler_2d_backward (#163415)
jansel Sep 22, 2025
e0cbab4
[Inductor] avoid CUDA__equal when constant tensors are from different…
cp2923 Sep 22, 2025
b756b58
Improve fake tensor leakage detection in export by not relying on gc …
tugsbayasgalan Sep 22, 2025
60c2bde
Replace Literal[None] with None in typing (#163489)
cyyever Sep 22, 2025
33daaad
dynamo: Handle objects in graph that do not support weakref (#163168)
c00w Sep 17, 2025
fa15fb0
[EZ] Remove XLA from unstable.yml (#163564)
malfet Sep 22, 2025
8da0086
Remove outdated commented CMake code (#163442)
cyyever Sep 22, 2025
68e75be
Update pytorch_sphinx_theme2 to latest hash (#163269)
svekars Sep 22, 2025
539e84e
[precompile] Add option to disable guard check on aot-compiled functi…
zhxchen17 Sep 23, 2025
3ef1bef
[sdpa] make sure to recompile if alignment is different than before (…
ColinPeppler Sep 19, 2025
2c7959e
[ignore][codex-test] Add typing to simple library registry (#161367)
bobrenjc93 Sep 23, 2025
8f30a8d
[AOTInductor] Add grid information for Triton Kernels (#160131)
muchulee8 Sep 22, 2025
e9300b2
remove allow-untyped-defs from ./torch/onnx/_internal/torchscript_exp…
bobrenjc93 Sep 22, 2025
6a48f57
[1/N] Remove 'type: ignore' suppressions (#163468)
cyyever Sep 23, 2025
447b8fc
[2/N] Use filesystem in inductor (#163465)
cyyever Sep 23, 2025
27164b6
Add fake_impl for _native_multi_head_attention (#163167)
ydwu4 Sep 23, 2025
0b75a16
[torchfuzz] Encapsulate fuzzing and codegen logic into ops (#163547)
bobrenjc93 Sep 22, 2025
95ac7d7
Rename to _debug_mode.py to make it private (#163534)
SherlockNoMad Sep 23, 2025
fcd79d5
[vllm hash update] update the pinned vllm hash (#163590)
pytorchupdatebot Sep 23, 2025
0e12238
[torchfuzz] remove supports_variable_inputs for now (#163553)
bobrenjc93 Sep 22, 2025
bb5be56
[torch][cuda][device_limits] Library for querying device hardware lim…
valentinandrei Sep 23, 2025
e3b392b
[BC breaking] Remove deprecated imports for torch.utils.data.datapipe…
cyyever Sep 23, 2025
d3a1345
Use functools.cache on has_efa (#163439)
cyyever Sep 23, 2025
19b754d
Revert "Update cutlass version for fbcode (#163091)"
pytorchmergebot Sep 23, 2025
08c5efd
[torchfuzz] cache operators (#163554)
bobrenjc93 Sep 22, 2025
d5e51d3
[torchfuzz] decompose -> fuzz_inputs_specs (#163555)
bobrenjc93 Sep 22, 2025
1545bb1
[torchfuzz] shuffle compatible ops (#163556)
bobrenjc93 Sep 22, 2025
309fe03
[torchfuzz] remove unneeded try catch (#163557)
bobrenjc93 Sep 22, 2025
45d9dcc
Update Kineto Submodule (#162222)
sraikund16 Sep 23, 2025
375f3e3
[OpenReg][Docs] Correct docs about `openreg` usage example. (#163235)
KarhouTam Sep 23, 2025
b426ba1
[torchfuzz] introduce tensor and scalar pointwise ops (#163558)
bobrenjc93 Sep 22, 2025
8d81564
[pt2][cache] rework cache for true generic usage + better tests (#163…
nmacchioni Sep 23, 2025
5d749ce
Remove test conditions for CUDA<12 (#163495)
cyyever Sep 23, 2025
3c64b2a
CUDA 13.0 Warning update for supported architectures (#163585)
atalman Sep 23, 2025
bda9ab2
[inductor] fix as_strided lowering with .view(dtype) inputs (#163319)
xmfan Sep 22, 2025
1a42656
[Flex attention] Fix flex attention head broadcast (#163426)
Isalia20 Sep 23, 2025
aff76c0
Revert "Add fake_impl for _native_multi_head_attention (#163167)"
pytorchmergebot Sep 23, 2025
e05c9c0
[ROCm][CI] cudagraph trees ut fixes (#163592)
jeffdaily Sep 23, 2025
4264fd3
Add basic tests for torch.distributed.tensor._utils.compute_global_te…
swolchok Sep 18, 2025
518c320
[inductor] libdevice.sqrt => tl.sqrt_rn (#163419)
jansel Sep 23, 2025
ed84e80
[inductor] Freeze layouts in FlexAttention (#163434)
jansel Sep 23, 2025
9c4d9f9
[inductor] Support out_dtype arg to matmul (#163393)
jansel Sep 23, 2025
6ef7487
[dynamo] Fix TorchFunctionMode handling with get_rng_state (#163412)
jansel Sep 23, 2025
49e7b2f
[inductor] Fix error from custom CUDA allocators (#163422)
jansel Sep 23, 2025
720a7b2
[export] Remove .contiguous() when saving weights to raw bytes (#163587)
yiming0416 Sep 23, 2025
0f67407
Large tests failing on bfloat16 (#163537)
drisspg Sep 22, 2025
b3cf5c7
Skip on sm100 later since Tests are non determinisitic (#163552)
drisspg Sep 22, 2025
5f0c7cb
Add B200 smoke test (#159494)
drisspg Sep 22, 2025
ebddbe7
[ROCm][CI] skip test_sparse_triangular_solve (#163651)
jeffdaily Sep 23, 2025
6e5dddb
Use accelerator API in common_dtensor (#163498)
dilililiwhy Sep 23, 2025
221ac81
Revert "[precompile] Add option to disable guard check on aot-compile…
pytorchmergebot Sep 23, 2025
134dfbe
[DCP] DTensor slice dequantization with proper block alignment (#163532)
saumishr Sep 23, 2025
fde929c
[AOTI] Fix model_package_loader get_cpp_compile_command (#163561)
xuhancn Sep 23, 2025
2aadcea
[ROCm] Improve perf for elementwise broadcast with mixed dtype (#163562)
jerrymannil Sep 23, 2025
649ceda
[export] handling NamedTuple inputs (#162959)
Raman-RH Sep 23, 2025
ca35dc2
[EZ] Fix UP041 violations (#163648)
malfet Sep 23, 2025
0696a4b
[EZ] Perma-ignore UP038 (#163649)
malfet Sep 23, 2025
8e6b0c7
[Inductor] Remove `no_type_check` annotation on properties (#163570)
blaine-rister Sep 23, 2025
bcb893a
[ROCm] Build FBGEMM_GENAI for gfx942 only (#162648)
jithunnair-amd Sep 23, 2025
22c5e8c
Add num_store to inductor_meta and use it to scale persistent reducti…
PaulZhang12 Sep 22, 2025
2a9745d
[multi-kernel] shape-similarity kernel selection (#163090)
pianpwk Sep 23, 2025
fc84743
Implement CUDA stream protocol (#163614)
msaroufim Sep 23, 2025
e671dcc
Update tests to check for more robust pattern (#163107)
tugsbayasgalan Sep 23, 2025
5ca563e
symintify fill_diagonol_ (#163485)
bobrenjc93 Sep 23, 2025
b182365
[ez] use list initializer syntax in fill_diagonal_ (#163607)
bobrenjc93 Sep 23, 2025
8c8416b
Update pytorch.org links in docs/conf.py (#163682)
svekars Sep 23, 2025
29af258
Less aggressive persistent reduction when it could induce large maski…
eellison Sep 23, 2025
c3d9f08
[torchfuzz] introduce multi process fuzzer (#163560)
bobrenjc93 Sep 23, 2025
c63e417
use reduction hint for aggressive rblock (#163371)
eellison Sep 23, 2025
b879ef7
[ROCm][CI] skip TestCudaPrimaryCtx.test_set_device_0 (#163693)
jeffdaily Sep 23, 2025
2014908
[MPS] Compute `offset2bag/bag_size/max_indices` in `_embedding_bag` (…
kurtamohler Sep 19, 2025
6b5ad5f
[Kineto] Add list of string parsing for profiler (#163593)
muchulee8 Sep 23, 2025
f3f67ff
Fix warn message (#163578)
drisspg Sep 22, 2025
f9fa138
[BE] Delete all pre py-3.10 checks (#163653)
malfet Sep 23, 2025
ee75c3d
Support for amin, amax, and aminmax (#163669)
srsuryadev Sep 23, 2025
eb3fbf5
[inductor] in emulate_precision_casts, disable fma fusion in triton (…
v0i0 Sep 23, 2025
4535254
[3/N] Use std::filesystem in inductor (#163632)
cyyever Sep 24, 2025
dc93529
[Triton] [Inductor] Restrict subprocess autotuning to just Triton (#1…
njriasan Sep 24, 2025
1e754d5
docs and optional kwargs for full graph capture (#163550)
avikchaudhuri Sep 24, 2025
be6c127
[AOTI] Pass comments from metadata to the autotune block (#163600)
desertfire Sep 23, 2025
e2ce79e
[Flex] Fix silent correctness w/ backpropping grads (#163677)
drisspg Sep 23, 2025
c261c71
Simplify _compute_local_shape_and_global_offset and make it SPMD. (#1…
ezyang Sep 19, 2025
ca512af
[inductor] Fix issue with scalar arg handling (#163481)
jansel Sep 23, 2025
6fa9727
[inductor] Fix bugs in emulate_precision_casts (#163520)
jansel Sep 23, 2025
d746b98
[inductor] Fix divmod error in decomp (#163482)
jansel Sep 23, 2025
42e9902
cd: Move arm64 to linux.arm64.r7g.12xlarge.memory (#163681)
seemethere Sep 23, 2025
6f1d962
[vllm hash update] update the pinned vllm hash (#163711)
pytorchupdatebot Sep 24, 2025
20eeb54
Add api info for torch._C._nn.pyi (#162936)
orangeH25 Sep 24, 2025
124dd36
[hop] support local_map + SAC (#163322)
xmfan Sep 23, 2025
0390798
[Triton] [Inductor] Enable Epilogue Subtiling in the blackwell ws tem…
njriasan Sep 24, 2025
a8e9ed2
[inductor] turn on loaf (for oss) by default (#162030)
shunting314 Sep 22, 2025
f68de58
[Inductor-FX] Support symbol and dynamic scalar graph inputs and outp…
blaine-rister Sep 24, 2025
2c5a3d7
Delete functorch C extension entirely. (#163340)
ezyang Sep 24, 2025
dad54ca
Add mistral/gpt-oss to benchmarks (#163565)
angelayi Sep 24, 2025
11a231e
[c10d] P2P tensors must be dense (#163719)
kwen2501 Sep 24, 2025
bf0747c
[Code Clean] Remove deadcodes about Python3.9 [1/N] (#163626)
fffrog Sep 24, 2025
0bca779
[Code Clean] Remove deadcodes about Python3.9 [2/N] (#163627)
fffrog Sep 24, 2025
33aabdd
[Code Clean] Remove deadcodes about Python3.9 [3/N] (#163629)
fffrog Sep 24, 2025
ec0cd81
[Code Clean] Remove deadcodes about Python3.9 [4/N] (#163643)
fffrog Sep 24, 2025
6f34cc0
[Code Clean] Remove deadcodes about Python3.9 [5/N] (#163644)
fffrog Sep 24, 2025
a635505
[Code Clean] Remove deadcodes about Python3.9 [6/N] (#163645)
fffrog Sep 24, 2025
2390d34
[Code Clean] Remove deadcodes about Python3.9 [7/N] (#163646)
fffrog Sep 24, 2025
3e1b1a3
Revert "[inductor] Fix issue with scalar arg handling" (#163737)
jansel Sep 24, 2025
207f104
[Triton] [Inductor] Set default configs for Blackwell Matmul Template…
njriasan Sep 24, 2025
b66aa1a
[ARM] Add test_memory_profiler to aarch64 tests (#145260)
robert-hardwick Sep 24, 2025
141fc72
[CD] CUDA 13.0 fix preload logic to include nvidia/cu13/lib/ (#163661)
atalman Sep 24, 2025
3b73841
update test_quantization tests to run weekly (#163077)
liangel-02 Sep 24, 2025
9d0d98a
Use cuda nvrtc so file based on cuda version used by torch (#163642)
atalman Sep 24, 2025
5d0f639
Make `Tensor.__dlpack__(stream=None)` capture-safe during CUDA Graph …
eee4017 Sep 24, 2025
4c2c401
Record redistribute_local_tensor in DebugMode (#163704)
SherlockNoMad Sep 24, 2025
9341ede
Revert to old behaviour of not padding strides if shape or stride is …
nandesuka Sep 24, 2025
768361e
Add less warps config to inner reductions (#162447)
PaulZhang12 Sep 24, 2025
c414f75
[WOQ][Inductor] Enable CUDA coverage for _weight_int8pack_mm (#163461)
bbeckca Sep 24, 2025
0456b23
[AOTI] Add verbose error information for extract file (#163718)
xuhancn Sep 24, 2025
71eec6a
[dist] handle discontiguous allgather/reducescatter inputs (#163712)
ngimel Sep 24, 2025
0dce2af
[ROCm][CI] adjust tf32 tolerance for test_compile_kernel_advanced (#1…
jeffdaily Sep 24, 2025
90a2825
Add `inference_mode` hint message to use `eval` with inference. (#163…
zeshengzong Sep 24, 2025
1495b35
Remove Python 3.9 for Triton builds (#163778)
atalman Sep 24, 2025
b40191b
Merge remote-tracking branch 'upstream/main' into rocm7.1_internal_te…
github-actions[bot] Sep 24, 2025
f3e8213
Fix merge conflicts
pragupta Sep 24, 2025
0ad8381
Address review comments wrt triton_heuristics and install_rocm
pragupta Sep 30, 2025
63fcd9b
update related_commits
pragupta Sep 30, 2025
77f4534
Fix more conflicts with triton_heuristics.py
pragupta Sep 30, 2025
13 changes: 4 additions & 9 deletions .ci/aarch64_linux/aarch64_ci_build.sh
Original file line number Diff line number Diff line change
@@ -5,9 +5,9 @@ GPU_ARCH_VERSION=${GPU_ARCH_VERSION:-}

# Set CUDA architecture lists to match x86 build_cuda.sh
if [[ "$GPU_ARCH_VERSION" == *"12.6"* ]]; then
export TORCH_CUDA_ARCH_LIST="5.0;6.0;7.0;8.0;9.0"
export TORCH_CUDA_ARCH_LIST="8.0;9.0"
elif [[ "$GPU_ARCH_VERSION" == *"12.8"* ]]; then
export TORCH_CUDA_ARCH_LIST="7.0;8.0;9.0;10.0;12.0"
export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;12.0"
elif [[ "$GPU_ARCH_VERSION" == *"13.0"* ]]; then
export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;11.0;12.0+PTX"
fi
@@ -31,8 +31,7 @@ pip install -r /pytorch/requirements.txt
pip install auditwheel==6.2.0 wheel
if [ "$DESIRED_CUDA" = "cpu" ]; then
echo "BASE_CUDA_VERSION is not set. Building cpu wheel."
#USE_PRIORITIZED_TEXT_FOR_LD for enable linker script optimization https://github.com/pytorch/pytorch/pull/121975/files
USE_PRIORITIZED_TEXT_FOR_LD=1 python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn
python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn
else
echo "BASE_CUDA_VERSION is set to: $DESIRED_CUDA"
export USE_SYSTEM_NCCL=1
@@ -42,13 +41,9 @@ else
echo "Bundling CUDA libraries with wheel for aarch64."
else
echo "Using nvidia libs from pypi for aarch64."
# Fix platform constraints in PYTORCH_EXTRA_INSTALL_REQUIREMENTS for aarch64
# Replace 'platform_machine == "x86_64"' with 'platform_machine == "aarch64"'
export PYTORCH_EXTRA_INSTALL_REQUIREMENTS="${PYTORCH_EXTRA_INSTALL_REQUIREMENTS//platform_machine == \'x86_64\'/platform_machine == \'aarch64\'}"
echo "Updated PYTORCH_EXTRA_INSTALL_REQUIREMENTS for aarch64: $PYTORCH_EXTRA_INSTALL_REQUIREMENTS"
export USE_NVIDIA_PYPI_LIBS=1
fi

#USE_PRIORITIZED_TEXT_FOR_LD for enable linker script optimization https://github.com/pytorch/pytorch/pull/121975/files
USE_PRIORITIZED_TEXT_FOR_LD=1 python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn --enable-cuda
python /pytorch/.ci/aarch64_linux/aarch64_wheel_ci_build.py --enable-mkldnn --enable-cuda
fi
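The arch-list selection at the top of this script is plain bash glob matching on `GPU_ARCH_VERSION`; a standalone sketch with the post-PR lists (the sample `GPU_ARCH_VERSION` value here is an assumption for illustration):

```shell
#!/bin/bash
# Pick TORCH_CUDA_ARCH_LIST by glob-matching the CUDA version embedded
# in GPU_ARCH_VERSION, mirroring aarch64_ci_build.sh after this PR.
GPU_ARCH_VERSION="cuda12.8"   # hypothetical value for illustration

if [[ "$GPU_ARCH_VERSION" == *"12.6"* ]]; then
    export TORCH_CUDA_ARCH_LIST="8.0;9.0"
elif [[ "$GPU_ARCH_VERSION" == *"12.8"* ]]; then
    export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;12.0"
elif [[ "$GPU_ARCH_VERSION" == *"13.0"* ]]; then
    export TORCH_CUDA_ARCH_LIST="8.0;9.0;10.0;11.0;12.0+PTX"
fi
echo "$TORCH_CUDA_ARCH_LIST"
```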
20 changes: 9 additions & 11 deletions .ci/aarch64_linux/aarch64_wheel_ci_build.py
@@ -138,6 +138,8 @@ def package_cuda_wheel(wheel_path, desired_cuda) -> None:
folder = os.path.dirname(wheel_path)
os.mkdir(f"{folder}/tmp")
os.system(f"unzip {wheel_path} -d {folder}/tmp")
# Delete original wheel since it will be repackaged
os.system(f"rm {wheel_path}")

# Check if we should use PyPI NVIDIA libraries or bundle system libraries
use_nvidia_pypi_libs = os.getenv("USE_NVIDIA_PYPI_LIBS", "0") == "1"
@@ -211,7 +213,8 @@ def package_cuda_wheel(wheel_path, desired_cuda) -> None:
]

# CUDA version-specific libraries
if "130" in desired_cuda:
if "13" in desired_cuda:
minor_version = desired_cuda[-1]
version_specific_libs = [
"/usr/local/cuda/extras/CUPTI/lib64/libcupti.so.13",
"/usr/local/cuda/lib64/libcublas.so.13",
@@ -221,7 +224,7 @@ def package_cuda_wheel(wheel_path, desired_cuda) -> None:
"/usr/local/cuda/lib64/libcusolver.so.12",
"/usr/local/cuda/lib64/libnvJitLink.so.13",
"/usr/local/cuda/lib64/libnvrtc.so.13",
"/usr/local/cuda/lib64/libnvrtc-builtins.so.13.0",
f"/usr/local/cuda/lib64/libnvrtc-builtins.so.13.{minor_version}",
]
elif "12" in desired_cuda:
# Get the last character for libnvrtc-builtins version (e.g., "129" -> "9")
@@ -237,6 +240,8 @@ def package_cuda_wheel(wheel_path, desired_cuda) -> None:
"/usr/local/cuda/lib64/libnvrtc.so.12",
f"/usr/local/cuda/lib64/libnvrtc-builtins.so.12.{minor_version}",
]
else:
raise ValueError(f"Unsupported CUDA version: {desired_cuda}.")

# Combine all libraries
libs_to_copy = common_libs + version_specific_libs
@@ -275,14 +280,7 @@ def complete_wheel(folder: str) -> str:
f"/{folder}/dist/{repaired_wheel_name}",
)
else:
repaired_wheel_name = wheel_name.replace(
"linux_aarch64", "manylinux_2_28_aarch64"
)
print(f"Renaming {wheel_name} wheel to {repaired_wheel_name}")
os.rename(
f"/{folder}/dist/{wheel_name}",
f"/{folder}/dist/{repaired_wheel_name}",
)
repaired_wheel_name = list_dir(f"/{folder}/dist")[0]

print(f"Copying {repaired_wheel_name} to artifacts")
shutil.copy2(
@@ -319,7 +317,7 @@ def parse_arguments():
).decode()

print("Building PyTorch wheel")
build_vars = "CMAKE_SHARED_LINKER_FLAGS=-Wl,-z,max-page-size=0x10000 "
build_vars = ""
# MAX_JOB=5 is not required for CPU backend (see commit 465d98b)
if enable_cuda:
build_vars += "MAX_JOBS=5 "
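The CUDA version-specific library selection in the hunks above keys off a compact version string and takes its last character as the minor version. A minimal sketch of the idea (the helper name is hypothetical; version strings like "129"/"130" match the comment in the diff):

```python
def nvrtc_builtins_lib(desired_cuda: str) -> str:
    """Pick the libnvrtc-builtins path for a compact CUDA version string.

    The last character of e.g. "129" or "130" is the minor version, so
    CUDA 13.0 maps to libnvrtc-builtins.so.13.0 and 12.9 to .so.12.9.
    """
    if "13" in desired_cuda:
        major = "13"
    elif "12" in desired_cuda:
        major = "12"
    else:
        # Mirrors the new fallthrough added in this PR.
        raise ValueError(f"Unsupported CUDA version: {desired_cuda}.")
    minor = desired_cuda[-1]
    return f"/usr/local/cuda/lib64/libnvrtc-builtins.so.{major}.{minor}"
```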
4 changes: 2 additions & 2 deletions .ci/aarch64_linux/build_aarch64_wheel.py
@@ -241,7 +241,7 @@ def wait_for_connection(addr, port, timeout=15, attempt_cnt=5):
try:
with socket.create_connection((addr, port), timeout=timeout):
return
except (ConnectionRefusedError, socket.timeout): # noqa: PERF203
except (ConnectionRefusedError, TimeoutError): # noqa: PERF203
if i == attempt_cnt - 1:
raise
time.sleep(timeout)
@@ -1004,7 +1004,7 @@ def parse_arguments():
install_condaforge_python(host, args.python_version)
sys.exit(0)

python_version = args.python_version if args.python_version is not None else "3.9"
python_version = args.python_version if args.python_version is not None else "3.10"

if args.use_torch_from_pypi:
configure_system(host, compiler=args.compiler, python_version=python_version)
12 changes: 4 additions & 8 deletions .ci/docker/build.sh
@@ -214,8 +214,7 @@ case "$tag" in
TRITON=yes
;;
pytorch-linux-jammy-py3-gcc11-inductor-benchmarks)
# TODO (huydhn): Upgrade this to Python >= 3.10
ANACONDA_PYTHON_VERSION=3.9
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
VISION=yes
KATEX=yes
@@ -263,13 +262,10 @@ case "$tag" in
TRITON_CPU=yes
;;
pytorch-linux-jammy-linter)
# TODO: Use 3.9 here because of this issue https://github.com/python/mypy/issues/13627.
# We will need to update mypy version eventually, but that's for another day. The task
# would be to upgrade mypy to 1.0.0 with Python 3.11
PYTHON_VERSION=3.9
PYTHON_VERSION=3.10
;;
pytorch-linux-jammy-cuda12.8-cudnn9-py3.9-linter)
PYTHON_VERSION=3.9
pytorch-linux-jammy-cuda12.8-cudnn9-py3.10-linter)
PYTHON_VERSION=3.10
CUDA_VERSION=12.8.1
;;
pytorch-linux-jammy-aarch64-py3.10-gcc11)
6 changes: 5 additions & 1 deletion .ci/docker/centos-rocm/Dockerfile
@@ -59,9 +59,13 @@ ENV INSTALLED_VISION ${VISION}

# Install rocm
ARG ROCM_VERSION
RUN mkdir ci_commit_pins
COPY ./common/common_utils.sh common_utils.sh
COPY ./ci_commit_pins/rocm-composable-kernel.txt ci_commit_pins/rocm-composable-kernel.txt
COPY ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh
RUN rm install_rocm.sh common_utils.sh
RUN rm -r ci_commit_pins
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION}
RUN rm install_rocm_magma.sh
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/executorch.txt
@@ -1 +1 @@
56392aa978594cc155fa8af48cd949f5b5f1823a
e0dda9059d082537cee36be6c5e4fe3b18c880c0
2 changes: 1 addition & 1 deletion .ci/docker/ci_commit_pins/huggingface-requirements.txt
@@ -1,2 +1,2 @@
transformers==4.54.0
transformers==4.56.0
soxr==0.5.0
1 change: 1 addition & 0 deletions .ci/docker/ci_commit_pins/rocm-composable-kernel.txt
@@ -0,0 +1 @@
7fe50dc3da2069d6645d9deb8c017a876472a977
23 changes: 14 additions & 9 deletions .ci/docker/common/install_executorch.sh
@@ -42,22 +42,27 @@ install_pip_dependencies() {
# A workaround, ExecuTorch has moved to numpy 2.0 which is not compatible with the current
# numba and scipy version used in PyTorch CI
conda_run pip uninstall -y numba scipy
# Yaspin is needed for running CI test (get_benchmark_analysis_data.py)
pip_install yaspin==3.1.0

popd
}

setup_executorch() {
pushd executorch

export PYTHON_EXECUTABLE=python
export CMAKE_ARGS="-DEXECUTORCH_BUILD_PYBIND=ON -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON"
export CMAKE_ARGS="-DEXECUTORCH_BUILD_PYBIND=ON -DEXECUTORCH_BUILD_XNNPACK=ON -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON -DEXECUTORCH_BUILD_TESTS=ON"

as_jenkins .ci/scripts/setup-linux.sh --build-tool cmake || true
popd
}

clone_executorch
install_buck2
install_conda_dependencies
install_pip_dependencies
setup_executorch
if [ $# -eq 0 ]; then
clone_executorch
install_buck2
install_conda_dependencies
install_pip_dependencies
pushd executorch
setup_executorch
popd
else
"$@"
fi
14 changes: 10 additions & 4 deletions .ci/docker/common/install_rocm.sh
@@ -2,6 +2,11 @@

set -ex

# for pip_install function
source "$(dirname "${BASH_SOURCE[0]}")/common_utils.sh"

ROCM_COMPOSABLE_KERNEL_VERSION="$(cat $(dirname $0)/../ci_commit_pins/rocm-composable-kernel.txt)"

ver() {
printf "%3d%03d%03d%03d" $(echo "$1" | tr '.' ' ');
}
@@ -109,8 +114,7 @@ EOF
rm -rf HIP clr
fi

# temporary hipblasLT dependency install
apt install libmsgpackc2
pip_install "git+https://github.com/rocm/composable_kernel@$ROCM_COMPOSABLE_KERNEL_VERSION"

# Cleanup
apt-get autoclean && apt-get clean
@@ -122,8 +126,8 @@ install_centos() {
yum update -y
yum install -y kmod
yum install -y wget
if [[ $OS_VERSION == 9 ]]; then

if [[ $OS_VERSION == 9 ]]; then
dnf install -y openblas-serial
dnf install -y dkms kernel-headers kernel-devel
else
@@ -195,6 +199,8 @@ install_centos() {
sqlite3 $kdb "PRAGMA journal_mode=off; PRAGMA VACUUM;"
done

pip_install "git+https://github.com/rocm/composable_kernel@$ROCM_COMPOSABLE_KERNEL_VERSION"

# Cleanup
yum clean all
rm -rf /var/cache/yum
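Both the Ubuntu and CentOS paths above now install composable_kernel from a commit pinned in `ci_commit_pins/rocm-composable-kernel.txt` instead of from a distro package. A hedged sketch of that pin-file pattern (the variable name and URL mirror the diff; the real `pip_install` helper comes from `common_utils.sh`, so a stand-in echo is used here):

```shell
#!/bin/bash
set -euo pipefail

# Stand-in for the pip_install helper sourced from common_utils.sh; the real
# helper wraps `pip install` with CI-specific flags. Replace the echo to
# actually install.
pip_install() {
  echo "pip install $*"
}

# A pin file is laid out like ci_commit_pins/rocm-composable-kernel.txt:
# a single commit hash on one line.
pin_file=$(mktemp)
echo "7fe50dc3da2069d6645d9deb8c017a876472a977" > "$pin_file"

# Read the pinned commit and build the VCS requirement, as install_rocm.sh does.
ROCM_COMPOSABLE_KERNEL_VERSION="$(cat "$pin_file")"
pip_install "git+https://github.com/rocm/composable_kernel@$ROCM_COMPOSABLE_KERNEL_VERSION"
rm -f "$pin_file"
```

Pinning to an exact commit keeps the dependency reproducible across image rebuilds, unlike tracking a branch.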
5 changes: 2 additions & 3 deletions .ci/docker/requirements-ci.txt
@@ -93,8 +93,9 @@ librosa==0.10.2 ; python_version == "3.12" and platform_machine != "s390x"
#Pinned versions:
#test that import:

mypy==1.16.0
mypy==1.16.0 ; platform_system != "Windows"
# Pin MyPy version because new errors are likely to appear with each release
# Skip on Windows as lots of type annotations are POSIX specific
#Description: linter
#Pinned versions: 1.16.0
#test that import: test_typing.py, test_type_hints.py
@@ -322,8 +323,6 @@ lxml==5.3.0 ; python_version <= "3.12"
lxml==6.0.0 ; python_version == "3.13"
#Description: This is a requirement of unittest-xml-reporting

# Python-3.9 binaries

PyGithub==2.3.0

sympy==1.13.3
2 changes: 1 addition & 1 deletion .ci/docker/requirements-docs.txt
@@ -1,7 +1,7 @@
sphinx==5.3.0
#Description: This is used to generate PyTorch docs
#Pinned versions: 5.3.0
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git@1657ad2fc1acdc98aa719eebecbb0128a7c13ce4#egg=pytorch_sphinx_theme2
-e git+https://github.com/pytorch/pytorch_sphinx_theme.git@d53b0ffb9b1cda68260693ea98f3483823c88d8e#egg=pytorch_sphinx_theme2

# TODO: sphinxcontrib.katex 0.9.0 adds a local KaTeX server to speed up pre-rendering
# but it doesn't seem to work and hangs around idly. The initial thought that it is probably
6 changes: 5 additions & 1 deletion .ci/docker/ubuntu-rocm/Dockerfile
@@ -52,9 +52,13 @@ ENV INSTALLED_VISION ${VISION}

# Install rocm
ARG ROCM_VERSION
RUN mkdir ci_commit_pins
COPY ./common/common_utils.sh common_utils.sh
COPY ./ci_commit_pins/rocm-composable-kernel.txt ci_commit_pins/rocm-composable-kernel.txt
COPY ./common/install_rocm.sh install_rocm.sh
RUN bash ./install_rocm.sh
RUN rm install_rocm.sh
RUN rm install_rocm.sh common_utils.sh
RUN rm -r ci_commit_pins
COPY ./common/install_rocm_magma.sh install_rocm_magma.sh
RUN bash ./install_rocm_magma.sh ${ROCM_VERSION}
RUN rm install_rocm_magma.sh
2 changes: 1 addition & 1 deletion .ci/libtorch/build.sh
@@ -7,4 +7,4 @@ set -ex

SCRIPTPATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

USE_NVSHMEM=0 USE_CUSPARSELT=0 BUILD_PYTHONLESS=1 DESIRED_PYTHON="3.9" ${SCRIPTPATH}/../manywheel/build.sh
USE_NVSHMEM=0 USE_CUSPARSELT=0 BUILD_PYTHONLESS=1 DESIRED_PYTHON="3.10" ${SCRIPTPATH}/../manywheel/build.sh
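The context line above uses the common `SCRIPTPATH` idiom to resolve the script's own directory, so the sibling `manywheel/build.sh` can be invoked by relative path no matter where the caller's working directory is. A minimal sketch of the idiom:

```shell
#!/bin/bash
set -e

# Resolve the directory containing this script, robust to being invoked from
# any working directory (same idiom as .ci/libtorch/build.sh).
SCRIPTPATH="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"

echo "$SCRIPTPATH"
# Per-invocation environment variables can then be prefixed onto the call,
# as the build script does, e.g.:
#   BUILD_PYTHONLESS=1 DESIRED_PYTHON="3.10" "$SCRIPTPATH/../manywheel/build.sh"
```

The `cd ... && pwd` pair canonicalizes the path even when the script was invoked through a relative path.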
8 changes: 2 additions & 6 deletions .ci/lumen_cli/cli/lib/core/vllm/lib.py
@@ -41,7 +41,6 @@ def sample_vllm_test_library():
"pytest -v -s basic_correctness/test_cumem.py",
"pytest -v -s basic_correctness/test_basic_correctness.py",
"pytest -v -s basic_correctness/test_cpu_offload.py",
"VLLM_TEST_ENABLE_ARTIFICIAL_PREEMPT=1 pytest -v -s basic_correctness/test_preemption.py",
],
},
"vllm_basic_models_test": {
@@ -68,15 +67,12 @@ def sample_vllm_test_library():
"-v",
"-s",
"entrypoints/llm",
"--ignore=entrypoints/llm/test_lazy_outlines.py",
"--ignore=entrypoints/llm/test_generate.py",
"--ignore=entrypoints/llm/test_generate_multiple_loras.py",
"--ignore=entrypoints/llm/test_collective_rpc.py",
]
),
"pytest -v -s entrypoints/llm/test_lazy_outlines.py",
"pytest -v -s entrypoints/llm/test_generate.py ",
"VLLM_USE_V1=0 pytest -v -s entrypoints/offline_mode",
"pytest -v -s entrypoints/llm/test_generate.py",
"pytest -v -s entrypoints/offline_mode",
],
},
"vllm_regression_test": {
11 changes: 11 additions & 0 deletions .ci/lumen_cli/cli/lib/core/vllm/vllm_build.py
@@ -66,6 +66,11 @@ class VllmBuildParameters:
"DOCKERFILE_PATH", ".github/ci_configs/vllm/Dockerfile.tmp_vllm"
)

# the cleaning script to remove torch dependencies from pip
cleaning_script: Path = env_path_field(
"cleaning_script", ".github/ci_configs/vllm/use_existing_torch.py"
)

# OUTPUT_DIR: where docker buildx (local exporter) will write artifacts
output_dir: Path = env_path_field("OUTPUT_DIR", "external/vllm")

@@ -160,6 +165,7 @@ def run(self):
logger.info("Running vllm build with inputs: %s", inputs)
vllm_commit = clone_vllm()

self.cp_torch_cleaning_script(inputs)
self.cp_dockerfile_if_exist(inputs)
# cp torch wheels from root direct to vllm workspace if exist
self.cp_torch_whls_if_exist(inputs)
@@ -205,6 +211,11 @@ def cp_torch_whls_if_exist(self, inputs: VllmBuildParameters) -> str:
copy(inputs.torch_whls_path, tmp_dir)
return tmp_dir

def cp_torch_cleaning_script(self, inputs: VllmBuildParameters):
script = get_path(inputs.cleaning_script, resolve=True)
vllm_script = Path(f"./{self.work_directory}/use_existing_torch.py")
copy(script, vllm_script)

def cp_dockerfile_if_exist(self, inputs: VllmBuildParameters):
if not inputs.use_local_dockerfile:
logger.info("using vllm default dockerfile.torch_nightly for build")
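The new `cp_torch_cleaning_script` step copies vLLM's `use_existing_torch.py` into the build workspace so the image reuses the CI-built torch wheels rather than pulling torch from an index. As a hedged illustration only (the real script is Python and its exact behavior is not shown in this diff), removing torch-family pins from a requirements file can be sketched as:

```shell
#!/bin/bash
set -euo pipefail

# Illustrative only: drop torch-family pins from a requirements file so an
# externally provided torch wheel is used instead. The real
# use_existing_torch.py may differ in scope and in which files it touches.
req=$(mktemp)
printf '%s\n' "torch==2.5.0" "torchvision==0.20.0" "numpy==2.0.0" > "$req"

# Keep every requirement whose name is not torch/torchvision/torchaudio.
grep -vE '^(torch|torchvision|torchaudio)([=<>]|$)' "$req" > "$req.cleaned"
cat "$req.cleaned"
```

The anchored regex matches only the package-name position, so packages that merely contain "torch" in a version specifier or comment are left alone.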