
Conversation

jithunnair-amd (Collaborator) commented Oct 29, 2025

blaine-rister and others added 30 commits October 24, 2025 19:59

# Problem
Inductor implicitly upcasts certain rank-0 kernel arguments from float16 to float32. Currently, this happens only on the `"cpu"` device, which appears to be related to float16 support in CPU Triton. However, it can also affect the behavior of GPU kernels, when a model contains tensors from multiple devices. Upcasting may be undesirable on some platforms, so users can typically disable it with the `config.triton.codegen_upcast_to_fp32` flag. However, this flag was not respected by the rank-0 kernel argument codepath.

Through an improbable series of events, float32 upcasting caused an internal model to fail compilation on MTIA. (Internal reviewers see T242444110.)

# Fix
If `config.triton.codegen_upcast_to_fp32` evaluates to `False`, cast the kernel argument to the original dtype.
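For illustration, a minimal sketch of opting out of the upcast from user code (the mixed-device tensors just mirror the scenario above; shapes are illustrative):

```python
import torch
import torch._inductor.config as inductor_config

# Keep rank-0 fp16 kernel arguments in fp16 instead of upcasting to fp32.
inductor_config.triton.codegen_upcast_to_fp32 = False

def scale(x, s):
    # `s` is a rank-0 CPU tensor mixed with a GPU tensor, which is the
    # situation where the implicit upcast used to leak into GPU kernels.
    return x * s

compiled = torch.compile(scale)
x = torch.randn(1024, dtype=torch.float16, device="cuda")
s = torch.tensor(2.0, dtype=torch.float16)  # CPU scalar tensor
out = compiled(x, s)
```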

# Test plan
Added a new CI test checking for the downcast iff the config flag is false. The test mixes GPU and CPU tensors to generate a GPU kernel with the implicit float32 upcast and explicit float16 downcast.

Pull Request resolved: pytorch#166118
Approved by: https://github.com/jfix71, https://github.com/jansel, https://github.com/kundaMwiza
Generated with prompt:

> torch/_tensor_docs.py and torch/nn/functional.py contain the "gold standard" for docstrings in the PyTorch project. Write a skill describing how to write a docstring for a function/method in the PyTorch project. Note that add_docstring is specifically for C-bound functions; a native Python function can just use a direct docstring. Sphinx is used to generate docs.

Signed-off-by: Edward Yang <[email protected]>
Pull Request resolved: pytorch#166175
Approved by: https://github.com/Skylion007
pytorch#165946)

… adding verbose check (pytorch#165926)

[ghstack-poisoned]

Fixes #ISSUE_NUMBER

Pull Request resolved: pytorch#165946
Approved by: https://github.com/williamwen42
…st changed files (pytorch#165171)

As in title

If you change only one workflow file, lintrunner (with default args, which is also what CI uses since it only passes in changed files) won't look at other files in the repo, but the sync-tag might come from those other files.

This change makes the linter look at all workflow files, so it will catch those failures.

Also changes the output line so it prints which file and which job the sync-tag differs from.

Pros:
catches errors

Cons:
unusual behavior (getting around what lintrunner says the linter should run on)
Pull Request resolved: pytorch#165171
Approved by: https://github.com/malfet, https://github.com/izaitsevfb, https://github.com/atalman
vllm-compile implies "module: vllm" and "oncall: pt2".
The volume of issues in Flex -> HigherOrderOperators is too noisy,
plus we have a different set of folks looking at each, so I'm going to
make that not automatic anymore. We can still manually label flex issues
as higher order operator issues.
Pull Request resolved: pytorch#166172
Approved by: https://github.com/angelayi
…ytorch#166145)

Summary: Currently all submodules of an UnflattenedModule carry their original type name. This diff also preserves the original type for the UnflattenedModule itself.

Test Plan:
```
buck test mode/opt caffe2/test:test_export
```
https://www.internalfb.com/intern/testinfra/testrun/17732923654320197

Differential Revision: D85373454

Pull Request resolved: pytorch#166145
Approved by: https://github.com/angelayi
Summary: Like D85463674 (PR pytorch#166195) but for D85357351 (pytorch#165927)

Differential Revision: D85464917

Pull Request resolved: pytorch#166199
Approved by: https://github.com/Camyll, https://github.com/malfet, https://github.com/Skylion007
Fixes pytorch#165724.
The typo does not affect the compilation result; it just affects compilation time a little.

Pull Request resolved: pytorch#166029
Approved by: https://github.com/eellison
…orch#166076)

Spurred by the conversation started in pytorch#163343.

Context:
* Header implementations may be inlined _but_ are not necessarily inlined, even when using the `inline` keyword.
* When someone wants to use multiple extensions in the same runtime, e.g., with FA3 and AO, then 2 `.so`s are loaded that may have been built with different libtorch versions. Thus, if an API is not inlined and is implemented differently in the two builds, one implementation will be arbitrarily picked up and used across the runtime, depending on link order. This is bad!
* Consequently, we need to be very good at guaranteeing that we don't modify header implementations within a namespace. This is easy to mess up by accident, which would be a dire mistake.

Solution:
In essence, we want APIs in torch::headeronly and torch::stable to be visible in each individual extension only, and nowhere else. We want to hide these symbols! Thankfully, pybind already solved this problem (thanks @malfet for bringing that to my attention). This PR is heavily inspired by the code in pybind here: https://github.com/pybind/pybind11/blob/e6984c805ec09c0e5f826e3081a32f322a6bfe63/include/pybind11/detail/pybind11_namespace_macros.h#L73-L82.

In this PR, we introduce the macros for defining hidden namespaces in PyTorch.

Pull Request resolved: pytorch#166076
Approved by: https://github.com/malfet
This is a follow-up to pytorch#165515. The ruff `UP035` rule is applied to dynamo code to use Python 3.10+ typing.
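A representative example of the kind of change `UP035` drives (hypothetical snippet, not from the diff):

```python
# Before: deprecated typing aliases flagged by ruff UP035/UP006.
from typing import Dict, List

def bucketize(xs: List[int]) -> Dict[int, List[int]]:
    ...

# After: builtin generics, valid on Python 3.10+.
def bucketize(xs: list[int]) -> dict[int, list[int]]:
    ...
```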

Pull Request resolved: pytorch#165709
Approved by: https://github.com/ezyang
The gatherKthValue kernel had a race condition where multiple threads could write to the same output location without synchronization when duplicate k-th values exist, resulting in non-deterministic output.

Changes:
- aten/src/ATen/native/cuda/Sorting.cu: Use atomicMin with shared memory to deterministically find minimum index. Add early termination and remove redundant inRange checks. (We have to cast the index to `int32_t`, but this is already assumed to fit earlier in the kernel.)
- aten/src/ATen/native/cuda/Sorting.cpp: Remove non-deterministic alert since kthvalue is now deterministic on CUDA.
- torch/__init__.py: Remove kthvalue from non-deterministic operations list and remove kthvalue example from use_deterministic_algorithms() docstring.
- test/test_torch.py: Remove test_nondeterministic_alert_kthvalue since kthvalue no longer raises alerts on CUDA.

Benefits:
- Deterministic: always returns minimum index when duplicates exist
- Potential performance improvement on large arrays with repetitions

Test Results:
- All existing PyTorch tests pass (test_kthvalue)
- Custom determinism tests confirm consistent results
- Custom CUDA vs CPU correctness validated across 50+ scenarios
- Custom performance benchmarks show improvements with no visible regressions

Addresses pytorch#165227
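A small determinism smoke test along these lines (illustrative, requires a CUDA device; not the CI test itself):

```python
import torch

# All duplicates: every index holds a valid 3rd-smallest value,
# so a non-deterministic kernel could return any of them.
x = torch.zeros(1024, device="cuda")
_, idx = torch.kthvalue(x, k=3)
for _ in range(10):
    _, idx2 = torch.kthvalue(x, k=3)
    assert torch.equal(idx, idx2), "kthvalue index should now be deterministic"
```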

Pull Request resolved: pytorch#165762
Approved by: https://github.com/ngimel, https://github.com/eqy
…ytorch#165315)

Please refer to this [link](pytorch#163979) for more background.

- Allow registering a fallback for AutogradPrivateUse1 multiple times.
- Add an Autograd fallback implementation for AutogradPrivateUse1.

PyTorch can provide a common implementation for AutogradPrivateUse1, and the user can override it based on the needs of a specific accelerator.
Pull Request resolved: pytorch#165315
Approved by: https://github.com/albanD
…Private1 (pytorch#165316)

As the title states.

The fallback for AutogradPrivateUse1 is built into PyTorch, so there is no need to register a general implementation for an out-of-tree backend.
Pull Request resolved: pytorch#165316
Approved by: https://github.com/ezyang, https://github.com/albanD
ghstack dependencies: pytorch#165315
…ytorch#166155)

`id` on bound methods can change from invocation to invocation. Here we guard on
`__code__` objects, which do not change.
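For illustration (hypothetical module, not the guard implementation itself):

```python
class M:
    def forward(self, x):
        return x

m = M()
a, b = m.forward, m.forward        # two fresh bound-method objects
print(id(a) == id(b))              # False: id is not a stable guard key
print(a.__code__ is b.__code__)    # True: __code__ is stable across accesses
```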

Pull Request resolved: pytorch#166155
Approved by: https://github.com/jansel
…4572)

Changed the implementation from an output-based approach to an input-based one to remove `atomicAdd` operations, and it appears to deliver at least a 20× speedup.

The changes are from Yu-Yun <[email protected]>.

# Summary: Refactor of the implementation of the `upsample_bilinear2d_backward` operation on MI300X/MI325X
- The original "scatter-add" approach
  - Each thread, representing an output pixel, scattered gradient contributions to four input pixels, using costly atomic operations on MI300X/MI325X GPUs.
- The new "gather-sum" approach
  - Each thread is responsible for a single input pixel and gathers all relevant gradient contributions from a small, calculated region of the output tensor (done by the `compute_output_range` device function).
# Breakdown of the code changes
- Inversion of the parallelization strategy of the kernel function `upsample_bilinear2d_backward_out_frame`
  - Originally, the main kernel loop was parallelized over the number of elements in the output gradient tensor (`const size_t o_numel = nc * width2 * height2;`).
    - Each thread processed one output pixel.
  - The new loop is parallelized over the number of elements in the input gradient tensor (`const size_t i_numel = nc * height1 * width1;`).
    - Each thread is responsible for calculating the final gradient for a single input pixel.
  - The kernel launch changes accordingly in the function `upsample_bilinear2d_backward_out_cuda_template`.
- Added a device function for calculating the range of output pixels that could have used the input pixel (`input_pos`) during the forward-pass interpolation
  - This is essentially the mathematical inverse of the forward pass.
  - This function tries to prune a thread's search space so that it only needs to inspect a small, local window of the output tensor.
- Gradient calculation approach switching from "scatter-add" to "gather-sum"
  - Scatter-add
    - For each output pixel, the thread calculated 4 gradient contributions and used `fastAtomicAdd` 4 times to add these values to 4 different (and potentially highly contended) memory locations in the input gradient tensor.
  - Gather-sum
    - A thread responsible for one input pixel calls `compute_output_range` to determine the small rectangular region of output pixels that influence the input's final gradient value.
    - The thread iterates through this region, and for each output pixel in the region, it re-calculates the interpolation weights to determine the exact contribution to its specific input pixel.
    - All these contributions are accumulated into a private, per-thread register variable (`accscalar_t grad_sum = 0;`).
      - Without any global memory access, this accumulation is extremely fast.
    - When the loops are done, the thread performs a single, direct write (non-atomic) of the final summed gradient to its designated location in global memory (`idata[index] = static_cast<scalar_t>(grad_sum);`).
# Why performance gets boosted
- Analysis of the root cause of performance drop
  - Ref. (internal only) - https://amd.atlassian.net/wiki/spaces/~glencao2/pages/1140493327/PyTorch__upsample_bilinear2d_backward
- First and foremost, elimination of the contention of atomic operations
  - Many parallel threads called `atomicAdd` frequently attempting to update the exact same memory location in the input gradient tensor at the same time.
    - The GPU's memory controller has to serialize these operations, effectively nullifying the benefit of parallel capability at those contention points.
  - The MI300X/MI325X chiplet-based CDNA 3 architecture amplified the issue.
    - When contending threads reside on different XCDs, resolving the atomic operation requires high-latency coherence traffic across the Infinity Fabric interconnect.
  - The implementation change eliminates hardware-level serialization and cross-chiplet coherence traffic caused by many `atomicAdd`.
- Improved memory access pattern and locality
  - Write coalescing
    - The regular sum writes `idata[index] = static_cast<scalar_t>(grad_sum);` can be perfectly coalesced by GPUs.
  - Read locality
    - Even though there are many (potentially repeated) reads from the output tensor (`static_cast<accscalar_t>(odata[output_idx])`), these are highly cache-friendly, meaning the data for one thread is likely to be in the L1 or L2 cache already due to an access from a neighboring thread.
- Trade-off: computation for memory synchronization
  - The recalculation of interpolation weights fits well on high-computational-throughput modern GPUs like MI300X/MI325X.
  - Removal of atomic operations avoids expensive memory synchronization.
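A quick Python-side sanity check of the refactor, comparing CUDA/HIP gradients against the CPU reference (illustrative, not the PR's benchmark):

```python
import torch
import torch.nn.functional as F

x_cpu = torch.randn(2, 3, 64, 64, requires_grad=True)
x_gpu = x_cpu.detach().clone().cuda().requires_grad_(True)

for x in (x_cpu, x_gpu):
    y = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
    y.sum().backward()

# The gather-sum backward should match the reference CPU gradients.
torch.testing.assert_close(x_gpu.grad.cpu(), x_cpu.grad, rtol=1e-4, atol=1e-4)
```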

---

Optimizations of `grid_sampler_2d_backward` will be addressed in a separate PR.
Doc for reference: (internal only) https://amd.atlassian.net/wiki/spaces/~glencao2/pages/1162750701/PyTorch__grid_sampler_2d_backward

Pull Request resolved: pytorch#164572
Approved by: https://github.com/jeffdaily

Co-authored-by: Eli Uriegas <[email protected]>
Summary:
Implementing an autovec template for type conversions on aarch64 NEON.

Generated code can be seen here: https://godbolt.org/z/1K6T1d9TE

We've seen significant performance improvements for converting to and from bytes, compiling using clang with -march=armv9-a+sve2:

Before:
float->uint8->float ===> 683.212us
float->int8->float ===> 687.846us
int32->uint8->int32 ===> 497.121us
int32->int8->int32 ===> 481.889us

After:
float->uint8->float ===> 198.204us  ----> 245% higher throughput
float->int8->float ===> 200.241us ----> 244% higher throughput
int32->uint8->int32 ===> 197.970us ----> 151% higher throughput
int32->int8->int32 ===> 198.206us ----> 143% higher throughput
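A rough micro-benchmark in the same spirit (illustrative; absolute timings depend on hardware and build flags):

```python
import time
import torch

x = torch.randn(1 << 20)  # ~1M float32 elements

def roundtrip_us(t, dtype, iters=100):
    _ = t.to(dtype).to(t.dtype)  # warm-up
    start = time.perf_counter()
    for _ in range(iters):
        _ = t.to(dtype).to(t.dtype)
    return (time.perf_counter() - start) / iters * 1e6

print(f"float->uint8->float: {roundtrip_us(x, torch.uint8):.1f} us")
print(f"float->int8->float:  {roundtrip_us(x, torch.int8):.1f} us")
```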

Test Plan:

buck2 test mode/opt //caffe2/test:test_ops
buck2 test mode/opt //caffe2/test:torch

Differential Revision: D85213420

Pull Request resolved: pytorch#166049
Approved by: https://github.com/ezyang, https://github.com/mcfi, https://github.com/aditew01
Fixes pytorch#165948

Adding registration of the BlockMask makes flex attention with kwargs exportable.

Also modified unit tests to accept kwargs.

```
python test/distributed/tensor/test_dtensor_export.py -k test_flex_attention_dtensor_export

python test/inductor/test_flex_attention.py -k test_pytree_
```
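A hedged sketch of the export path this enables (module structure and sizes are assumptions; it relies on `BlockMask` being registered as a pytree node, which is what this PR adds):

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

class Attn(torch.nn.Module):
    def forward(self, q, k, v, *, block_mask):
        return flex_attention(q, k, v, block_mask=block_mask)

B, H, S, D = 2, 4, 128, 64
q = k = v = torch.randn(B, H, S, D, device="cuda")
block_mask = create_block_mask(causal, B, H, S, S, device="cuda")

# Passing the BlockMask as a kwarg now round-trips through export.
ep = torch.export.export(Attn(), (q, k, v), kwargs={"block_mask": block_mask})
```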

Pull Request resolved: pytorch#166045
Approved by: https://github.com/drisspg
…#165893)

At a high level after this fix we get the following nice tlparse https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/bobren/54a57665-7dcc-41e0-8ca7-df01393cd4aa/custom/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000

As seen in this doc, previously we were simply dropping asserts post-dynamo: https://docs.google.com/document/d/1nRQwvw_gWL0_9T3VKb5Ly3_tNI1fgqG9WtryeD6qaZI/edit?tab=t.0

The fixes are a couple things:

1) Actually run the runtime assertion fx graph pass on subgraphs
2) Reset fake mode unbacked memos across speculate_subgraph invocations,
   since the memos break runtime assertion insertion: calls like nonzero
   end up not allocating new unbacked symints and hence not populating
   pending_unbacked, which then results in incorrect unbacked_bindings
   on fx nodes in subgraphs.

This is a first step in hardening runtime asserts across all phases of
the compiler (eager, aot_eager, inductor, etc.). I will continue kicking
tires and fixing bugs until we get runtime assert generation in a good
place. One obvious next step is the added test case in this PR fails
when compiled with inductor with the following error (NB: it fails before this PR as well):

```
  File "/data/users/bobren/a/pytorch/torch/_inductor/ir.py", line 659, in get_dtype
    return self.dtype
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
LoweringException: AttributeError: 'ShapeAsConstantBuffer' object has no attribute 'dtype'
  target: cond
  args[0]: Eq(Mod(s77, 4), 0)
  args[1]: Subgraph(name='true_graph_0', graph_module=<lambda>(), graph=<torch._inductor.graph.SubgraphLowering object at 0x7fbcbb11e110>)
  args[2]: Subgraph(name='false_graph_0', graph_module=<lambda>(), graph=<torch._inductor.graph.SubgraphLowering object at 0x7fbcbb21cf70>)
  args[3]: (s77, TensorBox(StorageBox(
    ComputedBuffer(name='buf0', layout=FlexibleLayout('cuda:0', torch.float32, size=[s77, s77], stride=[s77, 1]), data=Pointwise(device=device(type='cuda', index=0), dtype=torch.float32, inner_fn=<function make_pointwise.<locals>.inner.<locals>.inner_fn at 0x7fbcbb2f37f0>, ranges=[s77, s77]))
  )))
```
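A reduced repro in the spirit of the added test case (shapes and backend are assumptions):

```python
import torch

def f(x):
    def true_fn(x):
        return x.sin()

    def false_fn(x):
        return x.cos()

    # With dynamic shapes, the predicate becomes a symbolic expression like
    # Eq(Mod(s77, 4), 0), which is where subgraph runtime asserts come in.
    return torch.cond(x.shape[0] % 4 == 0, true_fn, false_fn, (x,))

compiled = torch.compile(f, backend="aot_eager", dynamic=True)
out = compiled(torch.randn(8, 8))
```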

Pull Request resolved: pytorch#165893
Approved by: https://github.com/zou3519
Set prefer_deferred_runtime_asserts_over_guards to True and allow a flag to control the behavior, just in case.

This option has enabled the gemma3 model export with transformers==4.57. I am not sure how best to test it, though.

Pull Request resolved: pytorch#165820
Approved by: https://github.com/titaiwangms
rraminen and others added 6 commits October 29, 2025 16:59
…orch#166147)

This PR refactors the bfloat16_support_literal constant in the PyTorch build logic to eliminate duplicated ROCm-specific code.

Previously, there were two nearly identical branches for ROCM_VERSION < 70000 and ROCM_VERSION >= 70000, differing only by a single typedef. These have been unified into one conditional block with a minimal version guard inside. (#2502)

Pull Request resolved: pytorch#166147
Approved by: https://github.com/jerrymannil, https://github.com/jeffdaily
Because the `p` value is not used.
Pull Request resolved: pytorch#166507
Approved by: https://github.com/Skylion007
pytorch#164926)

Replace assert statement with explicit ValueError exception to ensure the validation check is not removed when Python runs with optimization flag (-O).
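The pattern is roughly the following (hypothetical function name):

```python
# Before: stripped out under `python -O`, so the check silently disappears.
def set_ratio(ratio: float) -> None:
    assert 0.0 <= ratio <= 1.0, f"ratio must be in [0, 1], got {ratio}"

# After: always validated, regardless of interpreter optimization flags.
def set_ratio(ratio: float) -> None:
    if not (0.0 <= ratio <= 1.0):
        raise ValueError(f"ratio must be in [0, 1], got {ratio}")
```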

This is a draft PR to confirm the process.

Partially fixes pytorch#164878.

Pull Request resolved: pytorch#164926
Approved by: https://github.com/fffrog, https://github.com/albanD

Co-authored-by: Jiawei Li <[email protected]>
Last one! This ensures all existing suppressions match the expected syntax and will silence only one error code.

pyrefly check
lintrunner

Pull Request resolved: pytorch#166496
Approved by: https://github.com/Skylion007, https://github.com/mlazos
Summary:
- in torchft we have multiple default pg's, 1 for each task group
- for flight recorder to work, each of these need to have a different name, so entries can be matched
- change the `init_process_group` API to optionally take a list of ranks. If provided, we use the hash of the ranks as the name of the PG. For torchft, we'll pass global ranks here so the default PG has a different name on each task group (see the sketch below).
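A hedged sketch of the naming idea only; the actual argument name and hashing scheme used by `init_process_group` may differ:

```python
import hashlib

def pg_name_from_ranks(ranks):
    # Hash the (global) ranks so each task group's default PG gets a distinct,
    # stable name that flight recorder can use to match entries across hosts.
    key = ",".join(str(r) for r in sorted(ranks))
    return hashlib.sha256(key.encode()).hexdigest()[:16]

# Hypothetical torchft-style usage: each task group passes its global ranks.
print(pg_name_from_ranks([0, 1, 2, 3]))
```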

Pull Request resolved: pytorch#166182
Approved by: https://github.com/fduwjj
…sting_IFU_2025-10-29

# Conflicts:
#	.ci/docker/build.sh
#	.ci/docker/ci_commit_pins/triton.txt
#	.ci/docker/libtorch/build.sh
#	CMakeLists.txt
#	aten/src/ATen/native/cuda/Blas.cpp
#	aten/src/ATen/native/cuda/Normalization.cuh
#	aten/src/ATen/native/sparse/cuda/SparseMatMul.cu
#	requirements-build.txt
#	test/dynamo/test_structured_trace.py
#	test/inductor/test_cuda_repro.py
#	test/inductor/test_decompose_mem_bound_mm.py
#	test/inductor/test_max_autotune.py
#	test/test_linalg.py
#	test/test_matmul_cuda.py
#	torch/_inductor/runtime/coordinate_descent_tuner.py
#	torch/_inductor/runtime/triton_heuristics.py
#	torch/testing/_internal/common_utils.py

rocm-repo-management-api bot commented Oct 29, 2025

Jenkins build for b29013612303b29d7e9322876c89e137e3102092 commit finished as ABORTED
Links: Blue Ocean view / Build artifacts



rocm-repo-management-api bot commented Oct 29, 2025

Jenkins build for b29013612303b29d7e9322876c89e137e3102092 commit finished as NOT_BUILT
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Oct 29, 2025

Jenkins build for b29013612303b29d7e9322876c89e137e3102092 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

[7446/8136] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_v.hip.o
clang++: warning: argument unused during compilation: '--offload-compress' [-Wunused-command-line-argument]
[7447/8136] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/hip/torch_hip_generated_shifted_chebyshev_polynomial_u.hip.o
clang++: warning: argument unused during compilation: '--offload-compress' [-Wunused-command-line-argument]
[7448/8136] Building HIPCC object caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/sparse/hip/torch_hip_generated_SparseMatMul.hip.o
FAILED: caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/sparse/hip/torch_hip_generated_SparseMatMul.hip.o /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/sparse/hip/torch_hip_generated_SparseMatMul.hip.o 
cd /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/sparse/hip && /opt/conda/envs/py_3.12/lib/python3.12/site-packages/cmake/data/bin/cmake -E make_directory /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/sparse/hip/. && /opt/conda/envs/py_3.12/lib/python3.12/site-packages/cmake/data/bin/cmake -D verbose:BOOL=OFF -D build_configuration:STRING=RELEASE -D generated_file:STRING=/var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/sparse/hip/./torch_hip_generated_SparseMatMul.hip.o -P /var/lib/jenkins/pytorch/build/caffe2/CMakeFiles/torch_hip.dir/__/aten/src/ATen/native/sparse/hip/torch_hip_generated_SparseMatMul.hip.o.cmake
clang++: warning: argument unused during compilation: '--offload-compress' [-Wunused-command-line-argument]
/var/lib/jenkins/pytorch/aten/src/ATen/native/sparse/hip/SparseMatMul.hip:62:2: error: unterminated conditional directive
   62 | #if IS_CUSPARSE11_AVAILABLE()
      |  ^

@pragupta pragupta force-pushed the rocm7.1_internal_testing_IFU_2025-10-29 branch from b290136 to fe6df6c Compare October 30, 2025 04:20

rocm-repo-management-api bot commented Oct 30, 2025

Jenkins build for fe6df6c0d464c6f533050c4f9c8eb8a2235bf119 commit finished as NOT_BUILT
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Oct 30, 2025

Jenkins build for fe6df6c0d464c6f533050c4f9c8eb8a2235bf119 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts


rocm-repo-management-api bot commented Oct 30, 2025

Jenkins build for 74d7455145b66f3b3208be43991b5d1f14d2e23c commit finished as NOT_BUILT
Links: Blue Ocean view / Build artifacts

Even though I had picked all upstream changes during merge conflicts,
other parts that didn't have conflicts still picked local changes. Now,
this file is broken with missing symbols. I am just copying the upstream
file into this branch now.
@pragupta pragupta force-pushed the rocm7.1_internal_testing_IFU_2025-10-29 branch from 2fa8e0f to 74d7455 Compare October 30, 2025 14:43

rocm-repo-management-api bot commented Oct 30, 2025

Jenkins build for 74d7455145b66f3b3208be43991b5d1f14d2e23c commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

pruthvistony (Collaborator) commented:

The build is fine.

pruthvistony (Collaborator) commented:

Triton commit is bumped to a newer 3.5.x commit.

@pragupta pragupta merged commit 5fc1aea into rocm7.1_internal_testing Nov 3, 2025
61 of 65 checks passed
@pragupta pragupta deleted the rocm7.1_internal_testing_IFU_2025-10-29 branch November 3, 2025 16:55
@pragupta pragupta restored the rocm7.1_internal_testing_IFU_2025-10-29 branch November 3, 2025 17:01
@pragupta pragupta deleted the rocm7.1_internal_testing_IFU_2025-10-29 branch November 3, 2025 18:19