Changes from all commits (131 commits)
56dfd4c
Add CUDA MXFP4 scaled mm support via. FBGEMM (#166526)
slayton58 Nov 3, 2025
afd50bd
[CI] Use smaller amx + avx2 runners for inductor test? (#164989)
clee2000 Nov 4, 2025
8d4b8ab
[ez] Print some more test timing info in the logs (#166447)
clee2000 Nov 4, 2025
68eb55c
Add model code stack trace to cuda.memory._snapshot (#166676)
yushangdi Nov 4, 2025
d02f68f
[BE] Use `[[maybe_unused]]` (#166865)
malfet Nov 3, 2025
eefa163
[Inductor] addmm with bias -> unfuse bias if there is a pointwise/red…
nikitaved Nov 4, 2025
3144713
subproc_pool: Add support for enabling quiesce via a timer (#166467)
c00w Nov 3, 2025
527b110
Delete deprecated fp32 precision warnings (#166956)
zou3519 Nov 4, 2025
53f75cd
Fixed some syntax errors in SECURITY.md file. (#166718)
wenlinchong17-web Nov 4, 2025
496277a
[ROCm][CI] Lower runner check gpu count for distributed jobs (#166961)
amdfaa Nov 4, 2025
1d3f5e1
[cuDNN] Smoke-test runtime cuDNN version matches compile time version…
eqy Nov 4, 2025
a5f3035
More pyrefly local errors (#166976)
oulgen Nov 4, 2025
52ea135
[BE] Delete Python-3.9 stdlib definitions from torch.package (#166768)
malfet Nov 3, 2025
cef98ae
[aotd] Compiled saved tensor hooks context (#166887)
IvanKobzarev Nov 4, 2025
d77c24c
Revert "[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel (#165036)"
pytorchmergebot Nov 4, 2025
397d9fe
[inductor] coordesc not tune XBLOCK for mix-order-reduction (#166669)
shunting314 Nov 4, 2025
3283eaa
Upload test stats for trunk/sha tag (#166916)
izaitsevfb Nov 4, 2025
b4e4ee8
Update triton to 3.5.1 release (#166968)
atalman Nov 4, 2025
2bba373
[inductor] runtime estimations disable use_nccl_estimator by default …
IvanKobzarev Nov 4, 2025
871d0cd
If USE_CUDA=1 is set, do not fallback to no CUDA (#166982)
oulgen Nov 4, 2025
4e1bd16
inductor: Switch quiesce to use timer based implementation. (#166581)
c00w Nov 4, 2025
2673f8b
Fix torch.linalg.eig inductor stride mismatch (#162484)
parsshar-RH Nov 4, 2025
7f0e932
[dynamo] don't use LocalSource for temp variables created by side_ef…
williamwen42 Nov 4, 2025
ed45c5f
Avoid DDE in narrow with unbacked start (#166361)
laithsakka Nov 4, 2025
cdca63d
Fix quoting in pytest_cache.py invocations (#166955)
Flamefire Nov 4, 2025
a64c7d7
[DebugMode] output, tensor id annotations for DebugMode (#165076)
pianpwk Nov 4, 2025
e8052f2
Add model code stack trace to torch.profile (#166677)
yushangdi Nov 4, 2025
e020fb3
[Minor][Inductor] move some combo kernel log from warning to debug (#…
BoyuanFeng Nov 4, 2025
81038fd
Revert "Add model code stack trace to torch.profile (#166677)"
pytorchmergebot Nov 4, 2025
d7e2d0a
make narrow_tensor_symint DDE-free (#166379)
laithsakka Nov 1, 2025
c1e91bd
[export] Codemod unittests to use new graph capture API (#166957)
zhxchen17 Nov 4, 2025
a96728d
Clarify safety of CUDA graph memory pool sharing across graphs that a…
galv Nov 4, 2025
0cd809f
[inductor][AMD] Filter out invalid Triton Configs for MI350X _scaled_…
JChunX Nov 4, 2025
661b639
use_cpp_bmm_template supports more use cases (#165469)
helloguo Nov 4, 2025
4b12c03
Add default `.github/copilot-instructions.md` and item in `.gitignore…
KarhouTam Nov 4, 2025
7eefcfb
[BE][Typing][Dynamo] Type torch/_dynamo/variables/ctx_manager.py (#16…
Lucaskabela Nov 4, 2025
4271ffe
don't produce invalid grid configs (#166974)
ngimel Nov 5, 2025
f2fbc81
[RFC] Add experimental Pallas TorchInductor backend (#166822)
oulgen Nov 4, 2025
39160db
shrink_group implementation to expose ncclCommShrink API (#164518)
brchang24 Nov 5, 2025
45da6e1
[CD] Upload XPU inductor benchmark test reports to s3 (#166954)
chuanqi129 Nov 5, 2025
64ae31c
[HOP][print] Add HOP subclass for printing (#166660)
fxdawnn Nov 4, 2025
bcd159b
Fix the vmap op fallback bug (#166032)
haifeng-jin Nov 5, 2025
01e6e35
Send / recv support in local tensor (#166595)
dzmitry-huba Nov 4, 2025
cd5d810
Annotation should be deepcopied (#167017)
yushangdi Nov 5, 2025
53b03f1
Revert "make narrow_tensor_symint DDE-free (#166379)"
pytorchmergebot Nov 5, 2025
a743f9e
Revert "Avoid DDE in narrow with unbacked start (#166361)"
pytorchmergebot Nov 5, 2025
5863ba1
[12/N] Apply ruff UP035 rule (#166929)
cyyever Nov 5, 2025
56fc999
Fix typos in complex numbers docs (#166671)
klamike Nov 5, 2025
08ef852
[unified v2][apple] Clean up `APPLETVOS` from caffe2 (#166953)
mzlee Nov 5, 2025
066c5c5
Fix typo in gloo_hip library name (#166502)
AngryLoki Nov 5, 2025
14956ea
[ROCm][CI] revert ROCm magma commit hash to last known good (#167044)
jeffdaily Nov 5, 2025
9ffc480
Add min/max support for barebones uint types (#166813)
ezyang Nov 4, 2025
c006961
Add model code stack trace to torch.profile (#166677)
yushangdi Nov 5, 2025
431dfe8
[dynamo] extend `collections.defaultdict` support with `*args`, `**kw…
XuehaiPan Nov 4, 2025
59a6c83
[fx] Add strict argument validation to Interpreter.boxed_run (#166784)
meghendra6 Nov 5, 2025
658c5f8
[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel (#167003)
NikhilAPatel Nov 5, 2025
edd8d35
fixes keyerror when loading parameter with unsaved optimizer state (#…
arkadip-maitra Nov 5, 2025
0b4dd08
[dynamo] Introduce _set_lru_cache (#167038)
xmfan Nov 5, 2025
5c63946
Revert "[Inductor][Grouped Gemm] Add Blackwell CuTeDSL Kernel (#167003)"
pytorchmergebot Nov 5, 2025
59563df
Refactor out headeronly ArrayRef (#164991)
janeyx99 Nov 4, 2025
7a6ff88
Widen ops support to take in IntHOArrayRef vs only std::vec (#165152)
janeyx99 Nov 4, 2025
d2d13bf
Invert unary read and write for fusion (#161404)
eellison Nov 5, 2025
aba2fa3
Fix clang-21 warnings (#166859)
Sqvid Nov 5, 2025
d4dcd03
[pytree][dynamo] add test to ensure `tree_map` preserves `dict` order…
XuehaiPan Nov 5, 2025
9c2c3db
Revert "Update triton to 3.5.1 release (#166968)"
pytorchmergebot Nov 5, 2025
f93ee16
[CI] Parse xml and upload json while running (#166988)
clee2000 Nov 5, 2025
0c7a4a6
[Inductor] Fix unbacked float symbol handling in kernel codegen (#166…
karthickai Nov 3, 2025
4ff068c
[Code Clean] Replace `assert` with if statement and raise `AssertionE…
KarhouTam Nov 5, 2025
c17aa0f
[ROCm] Enable group gemm through CK (#166334)
jagadish-amd Nov 5, 2025
c86540f
Revert "Add model code stack trace to torch.profile (#166677)"
pytorchmergebot Nov 5, 2025
ad5c7c2
Revert "[cuDNN] Smoke-test runtime cuDNN version matches compile time…
pytorchmergebot Nov 5, 2025
dcc2ba4
Add some code for exploring the space of accessible size/stride confi…
ezyang Nov 5, 2025
89165c0
Update triton to 3.5.1 release (#166968)
atalman Nov 5, 2025
641de23
ci: Add aarch64 docker builds for modern clang (#166416)
seemethere Nov 5, 2025
14b153b
include DTensor metadata when pretty-printing fx.Graphs (#166750)
bdhirsh Nov 5, 2025
6052a01
[BE][Typing][Dynamo] Type torch/_dynamo/variables/dicts.py (#167022)
Lucaskabela Nov 5, 2025
6c5db82
[Inductor] Naive foreach autotune support (#162053)
jataylo Nov 5, 2025
fbd70fb
Update typing docs to reference pyrefly (#166883)
maggiemoss Nov 5, 2025
8e8cbb8
Revert "[Inductor] Fix unbacked float symbol handling in kernel codeg…
pytorchmergebot Nov 5, 2025
6d30666
Revert "[12/N] Apply ruff UP035 rule (#166929)"
pytorchmergebot Nov 5, 2025
a74fe75
Don't hardcode double argument for reduction base (#166951)
ezyang Nov 5, 2025
ea44f12
[13/N] Apply ruff UP035 rule (#167048)
cyyever Nov 5, 2025
ef3f953
Revert "[DebugMode] output, tensor id annotations for DebugMode (#165…
pytorchmergebot Nov 5, 2025
c6c913d
Add torch::stable::Tensor sizes and strides (#165153)
janeyx99 Nov 4, 2025
13d2cc7
Remove python workaround for ContextDecorator (#167049)
cyyever Nov 5, 2025
fd8f368
[user-streams] Add graph annotation checks (#167019)
mlazos Nov 5, 2025
e69aaaf
[user-streams] Add backward test (#167021)
mlazos Nov 5, 2025
e9a688f
[DebugMode] output, tensor id annotations for DebugMode (#165076)
pianpwk Nov 5, 2025
711a775
fix nccl estimations (#167093)
IvanKobzarev Nov 5, 2025
ad7a572
[12/N] Apply ruff UP035 rule (#166929)
cyyever Nov 5, 2025
0820028
[CP][BE][3/N] Add _templated_ring_attention to the backward compatili…
fegin Nov 4, 2025
47eb34b
[ATEN][CUDA] Reduce register pressure in radix_sort_pairs to improve …
YyWangCS Nov 5, 2025
3869aa1
fix fr reset api (#166970)
tushar00jain Nov 5, 2025
af829c0
[ROCm] Skip nvfp4 tests on ROCm (#167066)
jagadish-amd Nov 5, 2025
a344069
Add missing skipIf(not PLATFORM_SUPPORTS_MEM_EFF_ATTENTION) to test/t…
xinyazhang Nov 5, 2025
d29efba
Move almalinux docker image to DEVTOOLSET 13 (#167018)
atalman Nov 6, 2025
6cd57e6
[cuBLAS] Force tensor-core-no-reduction algo in `cuBLASLt` for `n=1` …
eqy Nov 6, 2025
872d1da
Avoid DDE in narrow with unbacked start (#166361)
laithsakka Nov 5, 2025
fd5edda
Reland "Add model code stack trace to torch.profile (#166677)" (#167110)
yushangdi Nov 6, 2025
7432676
[MPS] Fix crash in BCELoss backwards with reduction="none" and inputs…
inventshah Nov 6, 2025
69af749
Bugfix to forward autodiff causing different datatype 2 (#165784)
skpark-rh Nov 6, 2025
3a2d75a
Change template 'Release highlight for proposed Feature'->'New Featur…
atalman Nov 6, 2025
943227f
[c10d] Fix split_group bug by having the parent pg option deep copied…
fduwjj Nov 6, 2025
e1a1aea
[1/N] Use `key in dict` for existence checks (#167035)
cyyever Nov 6, 2025
c08ce30
[ci][cpu] Update compiler to GCC-13 in jammy-aarch64 (#166849)
fadara01 Nov 5, 2025
85fab6c
Fix duplicate benchmarking entries for addmm (#166652)
jainapurva Nov 6, 2025
d31599f
[7/N] Fix unused loop variables in tests (#167043)
cyyever Nov 6, 2025
981dd71
Refactor: extract OperatorArgsKwargsView from parseIValuesToPyArgsKwa…
swolchok Nov 1, 2025
f72772b
[PP] make runtime dbg log print custom actions (#167113)
wconstab Nov 5, 2025
c3c3653
[1/N] Add return types of Python functions (#167162)
cyyever Nov 6, 2025
3feea29
torch.fx: add debug-level logging to Interpreter.run_node (#117351) (…
mdbarnesUCSD Nov 6, 2025
eea9517
[dynamo, 3.14] disable dynamo cpython tests in 3.14 (again) (#167000)
williamwen42 Nov 5, 2025
91337ae
[audio hash update] update the pinned audio hash (#167031)
pytorchupdatebot Nov 6, 2025
f7b7f40
[user-streams] Enable stream ops to work in eager (#167141)
mlazos Nov 6, 2025
46b3f91
[user-streams] Add record/wait ops (#167151)
mlazos Nov 6, 2025
7b423c2
[user-streams] Mark stream ops as side effectful (#167152)
mlazos Nov 6, 2025
8b23650
Expose torch.compiler.config.force_disable_caches as a public API (#1…
gmagogsfm Nov 6, 2025
09d8953
Update `tensorpipe` submodule (#167108)
malfet Nov 5, 2025
9eebda9
make narrow_tensor_symint DDE-free (#166379)
laithsakka Nov 5, 2025
ed4aa44
CustomOp Inline Fusion (#165952)
tianrengao Nov 6, 2025
a51208c
Check cluster_dims attribute exists before access (#167187)
yf225 Nov 6, 2025
c724f00
[2/N] Use `key in dict` for existence checks (#167174)
cyyever Nov 6, 2025
80ec2ab
[8/N] Fix unused loop variables in tests (#166921)
cyyever Nov 6, 2025
b2d72a4
Revert "Don't hardcode double argument for reduction base (#166951)"
pytorchmergebot Nov 6, 2025
2005b5f
[inductor] Use runtime estimations in iterative reorder collectives p…
IvanKobzarev Nov 5, 2025
da2eb31
[MTIA][PyTorch] Add mtia as native device for PyTorch tests (#167089)
jvandebon Nov 6, 2025
7b055a0
Add per_process_memory_fraction to PYTORCH_CUDA_ALLOC_CONF (#161035)
lakshayg Nov 6, 2025
cc477f6
[inductor] Use runtime estimations in iterative sink waits pass (#167…
IvanKobzarev Nov 5, 2025
3fdc5db
Make CUDA preload logic more straightforward (#167046)
malfet Nov 6, 2025
bfc0ba4
`nn.Linear`: nD contiguous input + bias -- dispatch to addmm also whe…
nikitaved Nov 6, 2025
d81ea9c
Merge remote-tracking branch 'upstream/main' into develop_IFU_20251106
github-actions[bot] Nov 6, 2025
25 changes: 21 additions & 4 deletions .ci/docker/almalinux/Dockerfile
Original file line number Diff line number Diff line change
@@ -7,13 +7,13 @@ ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8

ARG DEVTOOLSET_VERSION=11
ARG DEVTOOLSET_VERSION=13

RUN yum -y update
RUN yum -y install epel-release
# install glibc-langpack-en make sure en_US.UTF-8 locale is available
RUN yum -y install glibc-langpack-en
RUN yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel openssl-devel yum-utils autoconf automake make gcc-toolset-${DEVTOOLSET_VERSION}-toolchain
RUN yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel openssl-devel yum-utils autoconf automake make gcc-toolset-${DEVTOOLSET_VERSION}-gcc gcc-toolset-${DEVTOOLSET_VERSION}-gcc-c++ gcc-toolset-${DEVTOOLSET_VERSION}-gcc-gfortran gcc-toolset-${DEVTOOLSET_VERSION}-gdb
# Just add everything as a safe.directory for git since these will be used in multiple places with git
RUN git config --global --add safe.directory '*'
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH
@@ -41,6 +41,7 @@ RUN bash ./install_conda.sh && rm install_conda.sh
# Install CUDA
FROM base as cuda
ARG CUDA_VERSION=12.6
ARG DEVTOOLSET_VERSION=13
RUN rm -rf /usr/local/cuda-*
ADD ./common/install_cuda.sh install_cuda.sh
COPY ./common/install_nccl.sh install_nccl.sh
@@ -50,7 +51,8 @@ ENV CUDA_HOME=/usr/local/cuda-${CUDA_VERSION}
# Preserve CUDA_VERSION for the builds
ENV CUDA_VERSION=${CUDA_VERSION}
# Make things in our path by default
ENV PATH=/usr/local/cuda-${CUDA_VERSION}/bin:$PATH
ENV PATH=/usr/local/cuda-${CUDA_VERSION}/bin:/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH


FROM cuda as cuda12.6
RUN bash ./install_cuda.sh 12.6
@@ -68,8 +70,22 @@ FROM cuda as cuda13.0
RUN bash ./install_cuda.sh 13.0
ENV DESIRED_CUDA=13.0

FROM ${ROCM_IMAGE} as rocm
FROM ${ROCM_IMAGE} as rocm_base
ARG DEVTOOLSET_VERSION=13
ENV LC_ALL en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US.UTF-8
# Install devtoolset on ROCm base image
RUN yum -y update && \
yum -y install epel-release && \
yum -y install glibc-langpack-en && \
yum install -y sudo wget curl perl util-linux xz bzip2 git patch which perl zlib-devel openssl-devel yum-utils autoconf automake make gcc-toolset-${DEVTOOLSET_VERSION}-gcc gcc-toolset-${DEVTOOLSET_VERSION}-gcc-c++ gcc-toolset-${DEVTOOLSET_VERSION}-gcc-gfortran gcc-toolset-${DEVTOOLSET_VERSION}-gdb
RUN git config --global --add safe.directory '*'
ENV PATH=/opt/rh/gcc-toolset-${DEVTOOLSET_VERSION}/root/usr/bin:$PATH

FROM rocm_base as rocm
ARG PYTORCH_ROCM_ARCH
ARG DEVTOOLSET_VERSION=13
ENV PYTORCH_ROCM_ARCH ${PYTORCH_ROCM_ARCH}
ADD ./common/install_mkl.sh install_mkl.sh
RUN bash ./install_mkl.sh && rm install_mkl.sh
@@ -88,6 +104,7 @@ COPY --from=cuda13.0 /usr/local/cuda-13.0 /usr/local/cuda-13.0

# Final step
FROM ${BASE_TARGET} as final
ARG DEVTOOLSET_VERSION=13
COPY --from=openssl /opt/openssl /opt/openssl
COPY --from=patchelf /patchelf /usr/local/bin/patchelf
COPY --from=conda /opt/conda /opt/conda
2 changes: 1 addition & 1 deletion .ci/docker/almalinux/build.sh
@@ -63,7 +63,7 @@ docker build \
--target final \
--progress plain \
--build-arg "BASE_TARGET=${BASE_TARGET}" \
--build-arg "DEVTOOLSET_VERSION=11" \
--build-arg "DEVTOOLSET_VERSION=13" \
${EXTRA_BUILD_ARGS} \
-t ${tmp_tag} \
$@ \
18 changes: 14 additions & 4 deletions .ci/docker/build.sh
@@ -261,19 +261,29 @@ case "$tag" in
PYTHON_VERSION=3.10
CUDA_VERSION=12.8.1
;;
pytorch-linux-jammy-aarch64-py3.10-gcc11)
pytorch-linux-jammy-aarch64-py3.10-gcc13)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
GCC_VERSION=13
ACL=yes
VISION=yes
OPENBLAS=yes
# snadampal: skipping llvm src build install because the current version
# from pytorch/llvm:9.0.1 is x86 specific
SKIP_LLVM_SRC_BUILD_INSTALL=yes
;;
pytorch-linux-jammy-aarch64-py3.10-gcc11-inductor-benchmarks)
pytorch-linux-jammy-aarch64-py3.10-clang21)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=11
CLANG_VERSION=21
ACL=yes
VISION=yes
OPENBLAS=yes
# snadampal: skipping llvm src build install because the current version
# from pytorch/llvm:9.0.1 is x86 specific
SKIP_LLVM_SRC_BUILD_INSTALL=yes
;;
pytorch-linux-jammy-aarch64-py3.10-gcc13-inductor-benchmarks)
ANACONDA_PYTHON_VERSION=3.10
GCC_VERSION=13
ACL=yes
VISION=yes
OPENBLAS=yes
4 changes: 4 additions & 0 deletions .ci/docker/ci_commit_pins/triton.txt
@@ -1 +1,5 @@
<<<<<<< HEAD
ac80c4190aa0321f761a08af97e1e1eee41f01d9
=======
bfeb066872bc1e8b2d2bc0a3b295b99dd77206e7
>>>>>>> upstream/main
4 changes: 2 additions & 2 deletions .ci/docker/common/install_clang.sh
@@ -8,8 +8,8 @@ if [ -n "$CLANG_VERSION" ]; then
# work around ubuntu apt-get conflicts
sudo apt-get -y -f install
wget --no-check-certificate -O - https://apt.llvm.org/llvm-snapshot.gpg.key | sudo apt-key add -
if [[ $CLANG_VERSION == 18 ]]; then
apt-add-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-18 main"
if [[ $CLANG_VERSION -ge 18 ]]; then
apt-add-repository "deb http://apt.llvm.org/jammy/ llvm-toolchain-jammy-${CLANG_VERSION} main"
fi
fi

4 changes: 2 additions & 2 deletions .ci/docker/common/install_gcc.sh
@@ -7,11 +7,11 @@ if [ -n "$GCC_VERSION" ]; then
# Need the official toolchain repo to get alternate packages
add-apt-repository ppa:ubuntu-toolchain-r/test
apt-get update
apt-get install -y g++-$GCC_VERSION
apt-get install -y g++-$GCC_VERSION gfortran-$GCC_VERSION
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-"$GCC_VERSION" 50
update-alternatives --install /usr/bin/gcov gcov /usr/bin/gcov-"$GCC_VERSION" 50

update-alternatives --install /usr/bin/gfortran gfortran /usr/bin/gfortran-"$GCC_VERSION" 50

# Cleanup package manager
apt-get autoclean && apt-get clean
1 change: 1 addition & 0 deletions .ci/docker/common/install_openblas.sh
@@ -10,6 +10,7 @@ git clone https://github.com/OpenMathLib/OpenBLAS.git -b "${OPENBLAS_VERSION}" -

OPENBLAS_CHECKOUT_DIR="OpenBLAS"
OPENBLAS_BUILD_FLAGS="
CC=gcc
NUM_THREADS=128
USE_OPENMP=1
NO_SHARED=0
2 changes: 1 addition & 1 deletion .ci/docker/triton_version.txt
@@ -1 +1 @@
3.5.0
3.5.1
6 changes: 3 additions & 3 deletions .ci/magma-rocm/build_magma.sh
@@ -6,8 +6,8 @@ set -eou pipefail
# The script expects DESIRED_CUDA and PACKAGE_NAME to be set
ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"

# post merge of https://github.com/icl-utk-edu/magma/pull/65
MAGMA_VERSION=c0792ae825fb36872784892ea643dd6f3456bc5f
# https://github.com/icl-utk-edu/magma/pull/65
MAGMA_VERSION=d6e4117bc88e73f06d26c6c2e14f064e8fc3d1ec

# Folders for the build
PACKAGE_FILES=${ROOT_DIR}/magma-rocm/package_files # metadata
@@ -20,7 +20,7 @@ mkdir -p ${PACKAGE_DIR} ${PACKAGE_OUTPUT}/linux-64 ${PACKAGE_BUILD} ${PACKAGE_RE

# Fetch magma sources and verify checksum
pushd ${PACKAGE_DIR}
git clone https://github.com/icl-utk-edu/magma
git clone https://github.com/jeffdaily/magma
pushd magma
git checkout ${MAGMA_VERSION}
popd
2 changes: 1 addition & 1 deletion .ci/pytorch/test.sh
@@ -337,7 +337,7 @@ test_python() {

test_python_smoke() {
# Smoke tests for H100/B200
time python test/run_test.py --include test_matmul_cuda test_scaled_matmul_cuda inductor/test_fp8 inductor/test_max_autotune inductor/test_cutedsl_grouped_mm $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
time python test/run_test.py --include test_matmul_cuda test_scaled_matmul_cuda inductor/test_fp8 inductor/test_max_autotune $PYTHON_TEST_EXTRA_OPTION --upload-artifacts-while-running
assert_git_not_dirty
}

4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/release-feature-request.yml
@@ -1,11 +1,11 @@
name: 🚀 Release highlight for proposed Feature
name: 🚀 New Feature for Release
description: Submit a Release highlight for proposed Feature
labels: ["release-feature-request"]

body:
- type: textarea
attributes:
label: Release highlight for proposed Feature
label: New Feature for Release
description: >
Example: “A torch.special module, analogous to SciPy's special module.”
- type: input
12 changes: 6 additions & 6 deletions .github/actions/pytest-cache-download/action.yml
@@ -38,9 +38,9 @@ runs:
run: |
python3 .github/scripts/pytest_cache.py \
--download \
--cache_dir $GITHUB_WORKSPACE/$CACHE_DIR \
--pr_identifier $GITHUB_REF \
--job_identifier $JOB_IDENTIFIER \
--temp_dir $RUNNER_TEMP \
--repo $REPO \
--bucket $BUCKET \
--cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \
--pr_identifier "$GITHUB_REF" \
--job_identifier "$JOB_IDENTIFIER" \
--temp_dir "$RUNNER_TEMP" \
--repo "$REPO" \
--bucket "$BUCKET" \
16 changes: 8 additions & 8 deletions .github/actions/pytest-cache-upload/action.yml
@@ -47,11 +47,11 @@ runs:
run: |
python3 .github/scripts/pytest_cache.py \
--upload \
--cache_dir $GITHUB_WORKSPACE/$CACHE_DIR \
--pr_identifier $GITHUB_REF \
--job_identifier $JOB_IDENTIFIER \
--sha $SHA \
--test_config $TEST_CONFIG \
--shard $SHARD \
--repo $REPO \
--temp_dir $RUNNER_TEMP \
--cache_dir "$GITHUB_WORKSPACE/$CACHE_DIR" \
--pr_identifier "$GITHUB_REF" \
--job_identifier "$JOB_IDENTIFIER" \
--sha "$SHA" \
--test_config "$TEST_CONFIG" \
--shard "$SHARD" \
--repo "$REPO" \
--temp_dir "$RUNNER_TEMP" \
2 changes: 1 addition & 1 deletion .github/ci_commit_pins/audio.txt
@@ -1 +1 @@
3b0e7a6f192ca2715e7e6cbe5db007aea7165fe2
ad5816f0eee1c873df1b7d371c69f1f811a89387
125 changes: 125 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,125 @@
# PyTorch Copilot Instructions

This is the PyTorch machine learning framework codebase. These instructions help AI agents navigate and contribute effectively.

## Architecture Overview

### Core Components

- **c10/** - Core library (C++-10 compatible) for essential, binary-size-conscious functionality
- **aten/** - ATen tensor library (C++), PyTorch's foundation without autograd
- `aten/src/ATen/native/` - Modern operator implementations (CPU/CUDA/MPS/sparse)
- `aten/src/ATen/native/native_functions.yaml` - **Critical**: Declarative operator registry
- **torch/** - Python bindings and public API
- `torch/csrc/` - C++ Python bindings (hand-written and generated)
- `torch/csrc/autograd/` - Reverse-mode automatic differentiation
- `torch/csrc/jit/` - TorchScript JIT compiler
- **torchgen/** - Code generation tooling that reads `native_functions.yaml`
- **tools/** - Build scripts, autograd derivatives, code generation

### The Code Generation Workflow

**Most operator changes require editing `native_functions.yaml`**, not direct C++ files. This YAML file:
1. Declares operator signatures, variants (function/method), and dispatch behavior
2. Gets processed by `torchgen/` to generate C++/Python bindings
3. Produces headers in `build/aten/src/ATen/` during compilation

Example entry structure:
```yaml
- func: my_op(Tensor self, Scalar alpha=1) -> Tensor
variants: function, method
dispatch:
CPU: my_op_cpu
CUDA: my_op_cuda
```

After editing `native_functions.yaml`, implement kernels in `aten/src/ATen/native/` (see `aten/src/ATen/native/README.md`).
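As a toy illustration of what such an entry declares (this is not the real `torchgen/` code; the entry and parsing below are simplified stand-ins), the metadata can be sketched in Python:

```python
import re

# Simplified native_functions.yaml-style entry: signature, variants, dispatch keys.
ENTRY = {
    "func": "my_op(Tensor self, Scalar alpha=1) -> Tensor",
    "variants": "function, method",
    "dispatch": {"CPU": "my_op_cpu", "CUDA": "my_op_cuda"},
}

def parse_signature(func: str) -> dict:
    """Split a schema string into op name, argument list, and return type."""
    m = re.match(r"(\w+)\((.*)\) -> (.+)", func)
    name, args, ret = m.group(1), m.group(2), m.group(3)
    return {"name": name, "args": [a.strip() for a in args.split(",")], "returns": ret}

def kernel_for(entry: dict, backend: str) -> str:
    """Look up the backend-specific kernel name declared under `dispatch`."""
    return entry["dispatch"][backend]

sig = parse_signature(ENTRY["func"])
print(sig["name"], kernel_for(ENTRY, "CPU"))  # my_op my_op_cpu
```

The real code generator consumes this same kind of metadata to emit C++ declarations and Python bindings; the kernel names under `dispatch` are the functions you then implement by hand.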

## Development Workflows

### Building from Source

**Never run `setup.py` directly** - use pip with editable install:
```bash
python -m pip install --no-build-isolation -v -e .
```

Speed up builds:
- `DEBUG=1` - Debug symbols with `-g -O0`
- `USE_CUDA=0` - Skip CUDA compilation
- `BUILD_TEST=0` - Skip C++ test binaries
- Install `ninja` (`pip install ninja`) for faster builds
- Use `ccache` for incremental compilation caching

Rebuild specific targets: `(cd build && ninja <target>)`

### Testing

**Critical**: DO NOT run entire test suites. Run specific tests only:
```bash
python test/test_torch.py TestTorch.test_specific_case
```

**Test structure**: All tests use `torch.testing._internal.common_utils`:
```python
from torch.testing._internal.common_utils import run_tests, TestCase

class TestFeature(TestCase):
def test_something(self):
# Use self.assertEqual for tensor comparisons
pass

if __name__ == "__main__":
run_tests()
```

**For bug fixes**: Create a standalone reproduction script first, verify it fails, then fix and add to appropriate test file.
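One possible shape for such a repro script (the op, input, and expected value here are hypothetical placeholders standing in for the real failing call):

```python
# Hypothetical repro skeleton: deterministic seed, smallest input, expected vs actual.
import random

def suspect_op(xs):
    # Stand-in for the torch call under suspicion; returns the mean here.
    return sum(xs) / len(xs)

def main():
    random.seed(0)  # make any randomness deterministic
    data = [1.0, 2.0, 3.0, 4.0]  # placeholder for the smallest input from the bug report
    expected = 2.5               # placeholder expected value
    actual = suspect_op(data)
    if abs(actual - expected) > 1e-6:
        raise AssertionError(f"bug reproduced: got {actual}, expected {expected}")
    print("no repro: behavior is correct")

if __name__ == "__main__":
    main()
```

Run it before the fix to confirm it raises, and after the fix to confirm it passes, then fold the same check into the appropriate test file.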

### Linting

Run linter (not pre-commit): `lintrunner -a` (auto-applies fixes)

## Project-Specific Conventions

### Memory and Storage
- **Storage is never nullptr** (but `StorageImpl.data` may be nullptr for unallocated outputs)
- CUDA device info lives in storage objects

### Python-C++ Integration (`torch/csrc/`)
- Always include `Python.h` **first** to avoid `_XOPEN_SOURCE` redefinition errors
- Use `pybind11::gil_scoped_acquire` before calling Python API or using `THPObjectPtr`
- Wrap entry points with `HANDLE_TH_ERRORS` / `END_HANDLE_TH_ERRORS` for exception conversion

### Dispatch System
- PyTorch uses operator dispatch to route calls to backend-specific kernels
- Prefer `CompositeExplicitAutograd` dispatch when writing device-agnostic compound ops
- See `aten/src/ATen/native/README.md` for dispatch keyword guidance
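A toy Python sketch of the keyed-dispatch idea (PyTorch's actual dispatcher is C++ and far richer; the table and names below are purely illustrative):

```python
# Toy dispatch table keyed by (op, backend); falls back to a device-agnostic
# composite implementation when no backend-specific kernel is registered,
# loosely analogous to a CompositeExplicitAutograd fallback.
KERNELS = {
    ("add", "CPU"): lambda a, b: a + b,
    ("add", "CUDA"): lambda a, b: a + b,  # same math; real kernels differ by device
}

def composite_add(a, b):
    return a + b  # device-agnostic fallback

def dispatch(op, backend, *args):
    fallback = composite_add if op == "add" else None
    kernel = KERNELS.get((op, backend), fallback)
    if kernel is None:
        raise NotImplementedError(f"{op} has no kernel for backend {backend}")
    return kernel(*args)

print(dispatch("add", "CPU", 2, 3))  # 5
```

The point of the sketch: callers invoke one op name, and routing to a backend kernel (or a composite fallback) happens in one central place rather than in each call site.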

## Git Workflow (AI Agent Specific)

When preparing PRs from this environment:
```bash
git stash -u
git reset --hard $(cat /tmp/orig_work.txt) # Reset to LOCAL branch
git stash pop
# Resolve conflicts if necessary
```

## Common Gotchas

1. **Editing generated files** - If it's in `build/`, don't edit it. Edit the source template or `native_functions.yaml`
2. **NVCC template compilation** - NVCC is stricter about C++ than gcc/clang; code working on Linux may fail Windows CI
3. **Windows symbol visibility** - Use `TORCH_API` macros for exported symbols (required on Windows, optional on Linux)
4. **No internet access** - DO NOT attempt to install dependencies during development

## Key Files Reference

- `AGENTS.md` - Instructions specific to AI coding agents
- `CONTRIBUTING.md` - Comprehensive human contributor guide
- `GLOSSARY.md` - Terminology (ATen, kernels, operations, JIT, TorchScript)
- `aten/src/ATen/native/README.md` - Operator implementation guide
- `tools/autograd/derivatives.yaml` - Gradient definitions for autograd

## Performance Debugging

Use `TORCH_SHOW_CPP_STACKTRACES=1` for C++ traces in Python errors. For profiling, prefer `py-spy` over manual instrumentation.
4 changes: 2 additions & 2 deletions .github/workflows/_rocm-test.yml
@@ -97,8 +97,8 @@ jobs:
shell: bash
run: |
ngpu=$(rocminfo | grep -c -E 'Name:.*\sgfx')
if [[ $ngpu -lt 4 ]]; then
echo "Error: only $ngpu GPU(s) detected, at least 4 GPUs are needed for distributed jobs"
if [[ $ngpu -lt 2 ]]; then #We are temporarily reducing this down to 2 from 4 so that we can run tests on nodes with less gpus.
echo "Error: only $ngpu GPU(s) detected, at least 2 GPUs are needed for distributed jobs"
exit 1
fi

16 changes: 16 additions & 0 deletions .github/workflows/_xpu-test.yml
@@ -344,5 +344,21 @@ jobs:
if-no-files-found: ignore
path: ./**/core.[1-9]*

- name: Authenticate with AWS
uses: aws-actions/configure-aws-credentials@ececac1a45f3b08a01d2dd070d28d111c5fe6722 # v4.1.0
with:
role-to-assume: arn:aws:iam::308535385114:role/gha_workflow_upload-benchmark-results
# The max duration enforced by the server side
role-duration-seconds: 18000
aws-region: us-east-1

- name: Upload the benchmark results
uses: pytorch/test-infra/.github/actions/upload-benchmark-results@main
with:
benchmark-results-dir: test/test-reports
dry-run: false
schema-version: v3
github-token: ${{ secrets.GITHUB_TOKEN }}

- name: Teardown XPU
uses: ./.github/actions/teardown-xpu