[AUTOGENERATED] develop_IFU_20251118 #2812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
…rch#167516) Pull Request resolved: pytorch#167516 Approved by: https://github.com/oulgen
We probably need something similar for expand Pull Request resolved: pytorch#167232 Approved by: https://github.com/ColinPeppler
# Motivation Move `XPUEvent` to `c10/xpu` to keep consistent with `XPUStream`, which is already in `c10/xpu`. The most important thing is that we will leverage `XPUEvent` in our caching allocator instead of a raw sycl event. Pull Request resolved: pytorch#158336 Approved by: https://github.com/EikanWang, https://github.com/albanD
These tests have been failing since they were added in pytorch#165381. Evidence: scroll back in HUD; on that commit they were already failing. I'm going to (1) set the accuracy to get CI green and (2) file an issue for this. Pull Request resolved: pytorch#167609 Approved by: https://github.com/choijon5, https://github.com/desertfire
There are two motivating use cases for this change: (1) export: when we trace pytree calls into a graph, we don't want to accidentally trace the side-effect bytecode, which would pollute the initial state, so we want to warn about side effects without actually applying them; (2) vLLM: they want to detect side effects and error out. We implement this with two configs: one controls whether side effects are applied (yes by default), the other sets the warning level for side effects (warn for export, error for vLLM). We intentionally ignore input side effects because they are captured in the graph, and export never traces the actual dynamo graph module when tracing the pytree calls. Pull Request resolved: pytorch#167239 Approved by: https://github.com/williamwen42, https://github.com/anijain2305
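For illustration, a minimal Python sketch of the kind of side effect in question: a compiled function mutating global state, which Dynamo normally replays after running the graph. The config knobs in the comments are hypothetical placeholders; the summary above does not give the real names.
```python
import torch

calls = []  # global state mutated from inside the compiled region

@torch.compile
def f(x):
    calls.append(x.shape)  # side effect: Dynamo normally replays this after the graph runs
    return x * 2

f(torch.randn(3))
print(calls)  # [torch.Size([3])] when side effects are applied (the default)

# Hypothetical knobs in the spirit of this change (names are assumptions, not real configs):
# torch._dynamo.config.apply_side_effects = False        # export: trace but do not apply
# torch._dynamo.config.side_effect_strictness = "error"  # vLLM: detect and fail
```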
This reverts commit 406719c. Reverted pytorch#166708 on behalf of https://github.com/jeanschmidt due to breaks internal signals, see D86606212 ([comment](pytorch#166708 (comment)))
…y_info (pytorch#162564)" This reverts commit 3cfbf98. Reverted pytorch#162564 on behalf of https://github.com/jeanschmidt due to seems to be breaking 1000s of internal build rules, see D86638790 ([comment](pytorch#156812 (comment)))
…h#156812)" This reverts commit abf31db. Reverted pytorch#156812 on behalf of https://github.com/jeanschmidt due to seems to be breaking 1000s of internal build rules, see D86638790 ([comment](pytorch#156812 (comment)))
…167335) Modified cuda_to_hip_mappings.py to map cuSPARSELt headers and types to their hipSPARSELt counterparts, improving compatibility and functionality for ROCm users. Pull Request resolved: pytorch#167335 Approved by: https://github.com/jeffdaily, https://github.com/Skylion007
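As a rough, hedged sketch of what such mappings look like (the real entries live in torch/utils/hipify/cuda_to_hip_mappings.py and use the hipify tooling's tuple format with conversion-type constants; the names and header path below are illustrative assumptions):
```python
# Illustrative only: real entries use the hipify mapping tables and their
# conversion-type/API-group constants, not a bare dict like this.
CUSPARSELT_TO_HIPSPARSELT = {
    "cusparseLt.h": "hipsparselt/hipsparselt.h",                # header include (path is an assumption)
    "cusparseLtHandle_t": "hipsparseLtHandle_t",                # library handle type
    "cusparseLtMatDescriptor_t": "hipsparseLtMatDescriptor_t",  # matrix descriptor type
}
```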
This PR applies new Union and Optional typing syntax to some files. Pull Request resolved: pytorch#167449 Approved by: https://github.com/albanD
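For context, the new syntax referred to here is the PEP 604 union style; a small before/after illustration (not code from the PR):
```python
from typing import Optional, Union

# Old style: typing.Optional / typing.Union
def scale_old(x: Optional[int], factor: Union[int, float]) -> Union[int, float]:
    return (x or 1) * factor

# New style applied by this PR: PEP 604 unions
def scale_new(x: int | None, factor: int | float) -> int | float:
    return (x or 1) * factor
```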
Need to wait for: Dao-AILab/flash-attention#1998 to land Pull Request resolved: pytorch#167392 Approved by: https://github.com/jbschlosser ghstack dependencies: pytorch#167348
This reverts commit c7007e7. Reverted pytorch#167343 on behalf of https://github.com/jeffdaily due to causing ROCm distributed jobs to time out ([comment](pytorch#167343 (comment)))
Update dynamo results due to flaky model https://github.com/pytorch/pytorch/actions/runs/19283051320/job/55139788014 Pull Request resolved: pytorch#167660 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <[email protected]>
…pytorch#164992) I found that running any compiled function under DebugMode more than once will trigger recompilations, e.g. with the really simple modified test case in `test_compile`:
```
[0/1] [__recompiles] Recompiling function f in /data/users/pianpwk/ptclone/pytorch/test/distributed/tensor/debug/test_debug_mode.py:268
[0/1] [__recompiles]     triggered by the following guard failure(s):
[0/1] [__recompiles]     - 0/0:
[0/2] [__recompiles] Recompiling function f in /data/users/pianpwk/ptclone/pytorch/test/distributed/tensor/debug/test_debug_mode.py:268
[0/2] [__recompiles]     triggered by the following guard failure(s):
[0/2] [__recompiles]     - 0/1:
[0/2] [__recompiles]     - 0/0:
```
Digging deeper, the guard failures were due to TENSOR_MATCH guards failing on dispatch key set checks (seemingly on the Python dispatch key): https://github.com/pytorch/pytorch/blob/5a1fbf45ad727353e367740ecd8825ca7ee857e9/torch/csrc/dynamo/guards.cpp#L199-L203 This seems to be due to the `ignore_compile_internals=True` flag on custom dispatch modes, which causes these modes to "hide" themselves during compilation, so dynamo guards on the Python dispatch key being off. The (maybe imperfect) solution is to mask out the Python keys for guard comparisons when `_is_in_any_mode_without_ignore_compile_internals` is False. Pull Request resolved: pytorch#164992 Approved by: https://github.com/williamwen42
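A rough repro of the symptom described above; the `DebugMode` import path is an assumption and may differ between releases, and `error_on_recompile` is only used here to surface the recompile loudly:
```python
import torch
from torch.utils._debug_mode import DebugMode  # import path is an assumption

torch._dynamo.config.error_on_recompile = True  # turn the silent recompile into a hard error

@torch.compile
def f(x):
    return x.sin() + 1

x = torch.randn(8)
with DebugMode():
    f(x)  # first call compiles
with DebugMode():
    f(x)  # before the fix: TENSOR_MATCH guard on the dispatch key set fails, forcing a recompile
```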
Summary: Update caffe2/torch/csrc to build under CUDA 13. As of CUDA 13, CCCL v3 is the default, and as such, nvToolsExt.h has been moved to nvtx3/nvtx3.hpp. This is needed for building FBGEMM_GPU under CUDA 13 (see D86372925) Test Plan:
```
# Default build
buck build --flagfile fbcode//mode/dev-nosan fbcode//caffe2:_C_impl
buck build --flagfile fbcode//mode/dev-nosan fbcode//caffe2:_C_impl_cuda

# CUDA 13 build
buck build @//mode/opt -c fbcode.arch=aarch64 -c fbcode.nvcc_arch=b200 -c fbcode.platform010_cuda_version=13.0 fbcode//caffe2:_C_impl
buck build @//mode/opt -c fbcode.arch=aarch64 -c fbcode.nvcc_arch=b200 -c fbcode.platform010_cuda_version=13.0 fbcode//caffe2:_C_impl_cuda
```
Differential Revision: D86517946 Pull Request resolved: pytorch#167401 Approved by: https://github.com/Skylion007
… is defined (pytorch#167496) Fixes pytorch#161660 This extends the `TORCH_STABLE_ONLY` stopgap added in pytorch#161658 Pull Request resolved: pytorch#167496 Approved by: https://github.com/janeyx99 ghstack dependencies: pytorch#167495
Segfaults don't generate XML, so we get no info about them in ClickHouse, the XML, or the JSON; this change manually generates a report and uploads it to S3 to be ingested. At some point, some of the existing code for test reports should be changed to just use the JSON that gets uploaded in the job. Pull Request resolved: pytorch#167250 Approved by: https://github.com/huydhn
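A hedged sketch of the idea (the test path is a placeholder, the pytest invocation stands in for the real CI runner, and the S3 upload wiring is omitted):
```python
import json
import signal
import subprocess
import sys

# Run a test shard; if the process dies on a signal (e.g. SIGSEGV), no XML/JSON
# report exists, so synthesize a minimal record that can still be ingested.
proc = subprocess.run([sys.executable, "-m", "pytest", "test/example_test.py"])  # placeholder path
if proc.returncode < 0:  # negative return code means the process was killed by a signal
    stub = {
        "invoking_file": "test/example_test.py",
        "failure": f"killed by signal {signal.Signals(-proc.returncode).name}",
    }
    with open("segfault_report.json", "w") as f:
        json.dump(stub, f)
```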
…ytorch#167620) Pull Request resolved: pytorch#167620 Approved by: https://github.com/xuanzhang816
Summary: Improve compatibility with projects that have -Wswitch-default errors/warnings enabled by suppressing those errors/warnings in caffe2 headers. Test Plan: CI Pass Differential Revision: D86785451 Pull Request resolved: pytorch#167563 Approved by: https://github.com/shoumikhin
Implementation greatly adapted from @lw's pytorch#163505. TORCH_BOX is the StableIValue version of `make_boxed_from_unboxed_functor`. The differences:
- uses headeronly concepts
- adds an unbox type mapping to support user kernels taking in torch::headeronly::HeaderOnlyArrayRef<T> (by calling to<std::vector<T>> in those cases)
Pull Request resolved: pytorch#167582 Approved by: https://github.com/swolchok ghstack dependencies: pytorch#167386
…7397)" This reverts commit 7886070. Reverted pytorch#167397 on behalf of https://github.com/jeanschmidt due to seems to be breaking executorch signals internally, see D86780724 ([comment](pytorch#167397 (comment)))
…hapes (pytorch#166358)" This reverts commit 416421c. Reverted pytorch#166358 on behalf of https://github.com/jeanschmidt due to seems to be breaking internal signals, see D86790405, @angelayi may you help the author get this change landed? ([comment](pytorch#166358 (comment)))
This is a regression introduced by pytorch#167046 that causes cuDNN SDPA to fail with the actionable `cuDNN Frontend error: [cudnn_frontend] Error: No valid execution plans built.` error. Change `cuda_libs` from a dict to a list, and add a `test_sdpa` regression test to the binary smoke tests. Fixes pytorch#167602 Pull Request resolved: pytorch#167614 Approved by: https://github.com/Aidyn-A, https://github.com/atalman, https://github.com/nWEIdia
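A minimal smoke test in the spirit of the `test_sdpa` check described above (not the exact test added by the PR); it pins the cuDNN backend so a broken cuDNN setup surfaces as the frontend error instead of silently falling back to another backend:
```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# Requires a CUDA build with cuDNN SDPA support.
q, k, v = (torch.randn(2, 8, 128, 64, device="cuda", dtype=torch.float16) for _ in range(3))
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 128, 64])
```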
…lectorCache.__call__ (pytorch#167487) Summary: What: moves `create_no_valid_choices` out of `AlgorithmSelectorCache.__call__` and into the body of `AlgorithmSelectorCache` Why: nested function definitions make it harder to understand what `AlgorithmSelectorCache.__call__` is doing, on top of making patching/testing/etc more difficult Test Plan: CI Differential Revision: D86712921 Pull Request resolved: pytorch#167487 Approved by: https://github.com/aorenste
Summary: Update caffe2/c10/cuda to build under CUDA 13. As of CUDA 13, cudaMemAdvise() has been updated to take a `cudaMemLocation` argument instead of an `int` device id. This is needed for building FBGEMM_GPU under CUDA 13 (see D86372925) Test Plan:
```
# Default build
buck build @//mode/opt fbcode//caffe2/c10/cuda:cuda

# CUDA 13 build
buck build @//mode/opt -c fbcode.arch=aarch64 -c fbcode.nvcc_arch=b200 -c fbcode.platform010_cuda_version=13.0 fbcode//caffe2/c10/cuda:cuda

# AMD build
buck build --flagfile fbcode//mode/dev-nosan-amd-gpu fbcode//caffe2/c10/cuda:cuda
```
Reviewed By: atalman Differential Revision: D86578286 Pull Request resolved: pytorch#167534 Approved by: https://github.com/seemethere
Summary: Folding logic on matmul can decompose it to BMM or to folding + MM. The current common training path for a 3D * 2D matmul: the library will always fold, since Tensor1 and Tensor2 BOTH require a grad, so we fold because Tensor2 has grad. But the reasoning isn't really sound; it was done as a memory optimization, even though it is also generally the same or more performant. However, in chemistry / molecular modeling it is common to directly calculate forces as the derivative of energy (i.e. dl/dX, but NOT dl/dW) in inference. This exposed a bug where only 1 of the 2 tensors requires grad and we may choose NOT to fold, resulting in a 30% regression due to the suboptimal BMM decomposition of torch.nn.Linear (which calls into matmul). I actually think that even in cases where we need either dl/dX or dl/dW, we should be folding when working with inputs of [B, M, N] and weights of [N, K]. It is strictly better for memory and the same or faster when you consider both forward + backward runtime, and M's that are not multiples of 8 are particularly brutally slow using BMM vs MM. Also, the compiler out of the box could not solve this issue, which raises another concern (this was actually highlighted 2 years ago in comments, but still seems to be the case today: pytorch#118548 (comment)) Differential Revision: D86128493 Pull Request resolved: pytorch#166891 Approved by: https://github.com/ngimel
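To make the shape pattern concrete, a hedged illustration (not code from the PR) of the 3D-input, 2D-weight case where only the input requires grad; the fold path collapses the batch dimensions into a single 2D MM, which is what this change prefers over the BMM decomposition:
```python
import torch

B, M, N, K = 32, 7, 512, 512                    # note: M is not a multiple of 8
x = torch.randn(B, M, N, requires_grad=True)    # dl/dX is needed (e.g. forces from energy)
linear = torch.nn.Linear(N, K, bias=False)
linear.weight.requires_grad_(False)             # dl/dW is not needed (inference-style)

# Fold path: collapse batch dims and run one 2D matmul.
y_fold = (x.reshape(B * M, N) @ linear.weight.t()).reshape(B, M, K)

# Decomposed path: broadcasted/batched matmul, the slow case this PR avoids.
y_bmm = torch.matmul(x, linear.weight.t())

torch.testing.assert_close(y_fold, y_bmm)
```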
…ace (pytorch#167248) Summary: getCurrentCUDABlasHandle() and getCUDABlasLtWorkspace() use static mutable maps that are not protected from concurrent read-and-write. This leads to crashes. This diff adds mutexes to synchronize access to the static maps. Test Plan: Use a GPU OD, run multi-threaded tests with TSAN: ``` buck test fbcode//mode/dev-tsan fbcode//caffe2:cuda_cublas_handle_pool_test -- --stress-runs 100 ``` https://www.internalfb.com/intern/testinfra/testrun/14355223937501118 TSAN: P2026731804 Differential Revision: D86316117 Pull Request resolved: pytorch#167248 Approved by: https://github.com/Skylion007, https://github.com/malfet
…ytorch#167663) Summary: as title. Test Plan: CI Fixes #ISSUE_NUMBER Pull Request resolved: pytorch#167663 Approved by: https://github.com/tugsbayasgalan
Should be merged after pytorch#166561 Pull Request resolved: pytorch#166708 Approved by: https://github.com/malfet
Fixes #ISSUE_NUMBER Pull Request resolved: pytorch#167338 Approved by: https://github.com/jamesjwu
Pull Request resolved: pytorch#167947 Approved by: https://github.com/jansel
Pull Request resolved: pytorch#167951 Approved by: https://github.com/jansel ghstack dependencies: pytorch#167947
Pull Request resolved: pytorch#167953 Approved by: https://github.com/jansel ghstack dependencies: pytorch#167947, pytorch#167951
…7047) Pull Request resolved: pytorch#167047 Approved by: https://github.com/etaf, https://github.com/jansel Co-authored-by: xinan.lin <[email protected]>
…kwards (pytorch#167705) Pull Request resolved: pytorch#167705 Approved by: https://github.com/williamwen42
Helps with reducing Dynamo tracing time. Previously, the generator object would cause more polyfills. Pull Request resolved: pytorch#168024 Approved by: https://github.com/williamwen42
Fix for this issue in the DSV3 autobucketing pass: pytorch/torchtitan#2037. Now users should be able to run DSV3 autobucketing E2E. It fixes three things: (1) a bug in NCCL estimation support for all-to-all; (2) for dynamic token dispatch/combine in MoE, adds a fall_back value hint to all-to-all's collective size estimation; (3) previously, for the schedulable node check, I directly modified `is_wait` in bucketing.py; it is safer to add these criteria in overlap_scheduling.py as another function, `_schedulable_wait_node`. Pull Request resolved: pytorch#167797 Approved by: https://github.com/eellison
Pull Request resolved: pytorch#168049 Approved by: https://github.com/fduwjj
…sion guards (pytorch#168025) Address pytorch#161891 (comment) Pull Request resolved: pytorch#168025 Approved by: https://github.com/janeyx99
This is tested by pytorch#167962 which ensures we get compilation errors when using functions that convert Device/HeaderOnlyArrayRef to StableIValue and target 2.9 Pull Request resolved: pytorch#167802 Approved by: https://github.com/janeyx99 ghstack dependencies: pytorch#168025
Tests are split into libtorch_agnostic_2_9_extension and libtorch_agnostic_2_10_extension depending on the minimum version they should compile+run in Pull Request resolved: pytorch#167803 Approved by: https://github.com/janeyx99 ghstack dependencies: pytorch#168025, pytorch#167802
…rsion (pytorch#167804) Adds a CI workflow that tests the wheel built on current main targeting 2.9 with a 2.9 runtime Pull Request resolved: pytorch#167804 Approved by: https://github.com/janeyx99 ghstack dependencies: pytorch#168025, pytorch#167802, pytorch#167803
…#167962) Splits each torch library registration in the 2.10 folder into its own file -- I had a script that parsed kernel.cpp to do this, but I felt like forcing this responsibility on the user might be less error prone. Compiles each file targeting 2.9 and asserts that compilation fails. (There are 2 2.9 kernels we use as negative tests where compilation is expected to succeed.) Pull Request resolved: pytorch#167962 Approved by: https://github.com/janeyx99 ghstack dependencies: pytorch#168025, pytorch#167802, pytorch#167803, pytorch#167804
This PR adds an sm_121a flag for row-wise scaled matmuls on DGX Spark. Pull Request resolved: pytorch#167734 Approved by: https://github.com/eqy, https://github.com/cyyever
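For context, a hedged sketch of the row-wise scaled matmul this targets, using the private `torch._scaled_mm` op; the exact signature, required layouts, and hardware support vary across versions and GPUs, so treat this as illustrative rather than a guaranteed repro:
```python
import torch

M, K, N = 64, 128, 32
a = torch.randn(M, K, device="cuda").to(torch.float8_e4m3fn)
b = torch.randn(N, K, device="cuda").to(torch.float8_e4m3fn).t()  # column-major second operand
scale_a = torch.ones(M, 1, device="cuda")  # one scale per row of a
scale_b = torch.ones(1, N, device="cuda")  # one scale per column of b
out = torch._scaled_mm(a, b, scale_a=scale_a, scale_b=scale_b, out_dtype=torch.bfloat16)
print(out.shape)  # torch.Size([64, 32])
```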
This PR outputs chars to streams without building temporary strings. The changes were made by running (on fish) ``` sed -i -e 's/<< "\([^\\\']\)"/<< \'\1\'/g' (grep '<< "."' -r torch c10 aten -l) ``` and then reverting some invalid changes. Pull Request resolved: pytorch#167899 Approved by: https://github.com/Skylion007
) Pull Request resolved: pytorch#166618 Approved by: https://github.com/EikanWang, https://github.com/desertfire, https://github.com/jansel
) Otherwise it causes a null pointer deref Pull Request resolved: pytorch#167869 Approved by: https://github.com/slayton58, https://github.com/Skylion007 ghstack dependencies: pytorch#167868
Upgrade all the ROCm docker images to ROCm 7.1 release version. Pull Request resolved: pytorch#166743 Approved by: https://github.com/atalman Co-authored-by: Jeff Daily <[email protected]> Co-authored-by: Prachi Gupta <[email protected]>
Pull Request resolved: pytorch#168055 Approved by: https://github.com/eellison ghstack dependencies: pytorch#166536
Removed distributed related paths from labeler configuration. Pull Request resolved: pytorch#168084 Approved by: https://github.com/wconstab
# Conflicts:
#   .ci/docker/ci_commit_pins/triton.txt
#   requirements.txt
Jenkins build for commit da5ac4a82178862a6da89a7b573bdba2c4f6c3c0 finished as FAILURE. Detected error during base docker image building.
To keep the Triton version consistent with what is in rocm/triton's release/internal/3.5.x branch, we need to keep triton_version.txt at 3.5.0 and move the Triton hash to the ToT of that branch.
Jenkins build for commit a3c49a95de48914e369aa08899a683c2db88ed5f finished as SUCCESS.
rocm_base: 3d74218
upstream_main: e2b53ba