## What's Changed
- Install libtpu directly instead of torch_xla[tpu] by @bhavya01 in #9423
- Optimize w8a8 quantized matmul kernel by @vanbasten23 in #9412
- Update CI docker images to use 3.12 by @bhavya01 in #9427
- Add Build Trigger for 2.8-rc1 release by @pgmoka in #9425
- Error Handling: refactor `XlaCoordinator` to use status types. by @ysiraichi in #9386
- Add JAX dependency for Python 3.12 by @tengyifei in #9438
- Update TPU CI container image by @bhavya01 in #9434
- Change update_deps script so that latest stable version can be pulled instead of latest nightly by @bfolie in #9424
- Fix CPU tests for python 3.12 by @bhavya01 in #9443
- Partially disable tpu-info CLI tests by @bhavya01 in #9463
- Change nightly_package_version to 2.9 by @pgmoka in #9461
- Make assume_pure able to work with functions that depend on random by @qihqi in #9460
- Support editable install with setuptools>=80.0.0 by @tengyifei in #9428
- Unify the return type of w8a8 matmul between fallback and the actual impl. by @vanbasten23 in #9452
- Remove the clamp op when we do symmetric quantization on a tensor by @vanbasten23 in #9465
- By default, to("jax") should go to TPU by @zzzwen in #9468
- Error Handling: refactor the PjRt registry to use status QOL functions. by @ysiraichi in #9419
- Suppress C++ stacktrace on `XLA_CHECK*()` calls. by @ysiraichi in #9448
- Error Handling: replace `XLA_CHECK_OK()` with status functions. by @ysiraichi in #9457
- Fix status source code location logic. by @ysiraichi in #9440
- Update cuda version check by @pgmoka in #9469
- Remove duplicate artifact creation for 2.8.0-rc1 by @pgmoka in #9471
- Calculate vmem limit dynamically in the quantized matmul kernel. by @vanbasten23 in #9470
- Update torch compat version to 2.7.1 by @qihqi in #9455
- Add dtensor placement test by @jeffhataws in #9458
- [Kernel] Update ragged attention block table by @yaochengji in #9476
- Error Handling: replace `ConsumeValue` with `GetValueOrThrow`. by @ysiraichi in #9464
- implement collective reduce op by @bfolie in #9437
- Add dtensor mesh conversion test by @aws-cph in #9474
- Update TPU CI with latest docker container by @bhavya01 in #9483
- Torchax: Allow reuse of the jittable procedure with different functio… by @zmelumian972 in #9374
- Optimize w8a8 pallas kernel by @kyuyeunk in #9473
- Error Handling: refactor the existing `ComputationClient` implementations to use status QOL functions. by @ysiraichi in #9420
- Bump version by @qihqi in #9480
- Fix duplicate labels and other docs build warnings by @melissawm in #9446
- Update Setuptools version in build dependencies. by @bhavya01 in #9487
- Fix spmd sharding visualization when device index is >= 10 by @jeffhataws in #9475
- Add r2.9 release nightly tests to README by @pgmoka in #9478
- Improve error message for shape promotion on lowering. by @ysiraichi in #9486
- Rename `XLA_SHOW_CPP_ERROR_CONTEXT` to `TORCH_SHOW_CPP_STACKTRACES` by @ysiraichi in #9482
- implement collective all_to_all op by @bfolie in #9442
- Implement collective gather op by @bfolie in #9435
- Add support for callable in torchax.interop.JittableModule.functional_call in the first parameter by @zmelumian972 in #9451
- Update README.md to reflect supported python versions by @bhavya01 in #9484
- Remove support for one-process-per-device style of distributed. by @qihqi in #9490
- Allow mixed tensor type math if one of them is a scalar by @qihqi in #9453
- Fix nested stableHLO composite regions by @Carlomus in #9385
- Misc fixes: by @qihqi in #9491
- Fix python 3.11 cuda wheel link in the readme by @vfdev-5 in #9493
- [Bugfix] fix ragged attention kernel auto-tuning table key by @yaochengji in #9497
- Error Handling: refactor `ComputationClient::TransferFromDevice` to propagate status. by @ysiraichi in #9429
- Implement XLAShardedTensor._spec and test by @aws-cph in #9488
- Clean up quantized matmul condition code by @kyuyeunk in #9506
- Move mutable properties of env to thread local, misc changes by @qihqi in #9501
- Optimize w8a8 kernel vmem limit by @kyuyeunk in #9508
- Error Handling: return status value when loading PjRt dynamic plugin. by @ysiraichi in #9495
- Add block sizes for Qwen/Qwen2.5-32B-Instruct by @vanbasten23 in #9516
- Error Handling: propagate status for `ReleaseGilAndTransferData` and `XlaDataToTensors`. by @ysiraichi in #9431
- Error Handling: refactor `ExecuteComputation` and `ExecuteReplicated` to propagate status. by @ysiraichi in #9445
- Error Handling: refactor `GetXlaTensor` and related functions to use status types. by @ysiraichi in #9510
- Dump C++ and Status propagation stacktraces. by @ysiraichi in #9492
- Add w8a8 kernel blocks for Qwen 2.5 7B by @kyuyeunk in #9517
- Deduplicate `GetXlaTensors()` function. by @ysiraichi in #9518
- [XLA] Add placements property to XLAShardedTensor for DTensor compatibility by @Hoomaaan in #9509
- Update artifacts_builds.tf for 2.8.0-rc2 by @bhavya01 in #9522
- Update artifacts_builds.tf for 2.8.0-rc3 wheel by @bhavya01 in #9527
- make jax as an optional dependency by @qihqi in #9521
- Reorganize PyTorch/XLA Overview page by @melissawm in #9498
- Support torch.nn.functional.one_hot by @vanbasten23 in #9523
- Introduce PlatformVersion bindings by @rpsilva-aws in #9513
- Update artifacts_builds.tf for 2.8.0-rc4 by @bhavya01 in #9532
- Fix pip install torch_xla[pallas] by @bhavya01 in #9531
- Remove cuda builds for release wheels by @bhavya01 in #9533
- Optimize KV cache dequantization performance by @kyuyeunk in #9528
- Add gemini edited docstring by @qihqi in #9534
- Revert 2 accidental commits that I made. by @qihqi in #9536
- Implement XLAShardedTensor.redistribute and test by @aws-cph in #9529
- Do not set `PJRT_DEVICE=CUDA` automatically on import. by @ysiraichi in #9540
- Add triggers for release 2.8.0 by @bhavya01 in #9545
- Update torchbench pin location. by @ysiraichi in #9543
- Improve error message of functions related to `GetXlaTensor()`. by @ysiraichi in #9520
- Refactor the status error message builder. by @ysiraichi in #9546
- Use `TORCH_CHECK()` instead of throwing `std::runtime_error` in `XLA_CHECK*()` macros' implementation. by @ysiraichi in #9542
- Error Handling: make `XLATensor::Create()` return status type. by @ysiraichi in #9544
- `cat`: improve error handling and error messages. by @ysiraichi in #9548
- `div`: improve error handling and error messages. by @ysiraichi in #9549
- Bug fixes by @qihqi in #9554
- Run torchprime CI only when the pull requests have torchprimeci label by @bhavya01 in #9551
- [Documentation] Fixed typo in C++ debugging docs by @hinriksnaer in #9559
- Update README.md to mention 2.8 release by @bhavya01 in #9560
- `flip`: improve error handling and error messages. by @ysiraichi in #9550
- Generalize crash message for non-ok status. by @ysiraichi in #9552
- Rename `MaybeThrow` to `OkOrThrow`. by @ysiraichi in #9561
- Add xla random generator. by @iwknow in #9539
- [EZ] Replace `pytorch-labs` with `meta-pytorch` by @ZainRizvi in #9556
- Added missing "#"s for the comments in triton.md by @SriRangaTarun in #9571
- Remove tests that are defined outside of this repo. by @qihqi in #9577
- Update XLA pin then fix up to make it compile by @qihqi in #9565
- Create mapping for FP8 torch dtypes by @kyuyeunk in #9573
- refactor: DTensor inheritance for XLAShardedTensor by @aws-cph in #9576
- `full`: improve error handling and error messages. by @ysiraichi in #9564
- `gather`: improve error handling and error messages. by @ysiraichi in #9566
- `random_`: improve error handling and error messages. by @ysiraichi in #9567
- Remove `XLA_CUDA` and other CUDA build flags. by @ysiraichi in #9582
- Remove OpenXLA CUDA fallback and `_XLAC_cuda_functions.so` extension. by @ysiraichi in #9581
- Fix case when both device & dtype are given in .to by @qihqi in #9583
- implement send and recv using collective_permute by @bfolie in #9373
- Set environment variables for tpu7x by @bhavya01 in #9586
- Create new macros for throwing status errors. by @ysiraichi in #9588
- `test`: Use new macros for throwing exceptions. by @ysiraichi in #9590
- `runtime`: Use new macros for throwing exceptions. by @ysiraichi in #9591
- `ops`: Use new macros for throwing exceptions. by @ysiraichi in #9592
- `init_python_bindings.cpp`: Use new macros for throwing exceptions. by @ysiraichi in #9595
- `aten_xla_type.cpp`: Use new macros for throwing exceptions. by @ysiraichi in #9596
- Remove CUDA plugin. by @ysiraichi in #9597
- Remove triton. by @ysiraichi in #9601
- `torch_xla`: Use new macros for throwing exceptions (part 1). by @ysiraichi in #9593
- `torch_xla`: Use new macros for throwing exceptions (part 2). by @ysiraichi in #9594
- Remove CUDA specific logic from runtime. by @ysiraichi in #9598
- Remove `gpu_custom_call` logic. by @ysiraichi in #9600
- Remove functions that throw status error. by @ysiraichi in #9602
- Remove CUDA logic from C++ files in `torch_xla/csrc` directory. by @ysiraichi in #9603
- Remove CUDA specific path from internal Python packages. by @ysiraichi in #9606
- Move `_jax_forward` and `_jax_backward` inside `j2t_autograd` to avoid cache collisions by @jialei777 in #9585
- Remove remaining GPU/CUDA mentions in `torch_xla` directory. by @ysiraichi in #9608
- Update version to 0.0.6 by @qihqi in #9611
- Remove CUDA from PyTorch/XLA build. by @ysiraichi in #9609
- Remove CUDA from `benchmarks` directory. by @ysiraichi in #9610
- Remove CUDA tests from distributed tests. by @ysiraichi in #9612
- Make torch_xla package PEP 561 compliant by @wirthual in #9515
- Remove other CUDA usage from PyTorch/XLA repository. by @ysiraichi in #9618
- Remove CUDA from remaining tests. by @ysiraichi in #9613
- Miscellaneous cleanup by @qihqi in #9619
- Replace `GetComputationClientOrDie()` with `GetComputationClient()` (part 1). by @ysiraichi in #9617
- `mm`: improve error handling and error messages. by @ysiraichi in #9621
- Replace `GetComputationClientOrDie()` with `GetComputationClient()` (part 2). by @ysiraichi in #9620
- Upgrade build infra to use debian-12 and gcc-11 by @bhavya01 in #9631
- Remove libopenblas-dev from ansible dependencies by @bhavya01 in #9632
- support load and save checkpoint in torchax by @junjieqian in #9616
- Set `allow_broken_conditionals` configuration variable at `ansible.cfg`. by @ysiraichi in #9634
- Move torch ops error message tests into a new file. by @ysiraichi in #9622
- Fix `test_ops_error_message.py` and run it on CI. by @ysiraichi in #9640
- Do not warn on jax usage when workarounds are available by @bhavya01 in #9624
- `roll`: improve error handling and error messages. by @ysiraichi in #9628
- `stack`: improve error handling and error messages. by @ysiraichi in #9629
- `expand`: improve error handling and error messages. by @ysiraichi in #9645
- update gcc by @qihqi in #9650
- Add default args for _aten_conv2d by @hsjts0u in #9623
- Pin `flax` and skip C++ test `SiLUBackward`. by @ysiraichi in #9660
- `trace`: improve error handling and error messages. by @ysiraichi in #9630
- Fix Terraform usage of `cuda_version`. by @ysiraichi in #9655
- Create PyTorch commit pin. by @ysiraichi in #9654
- Accept conda channels' ToS when building the upstream docker image. by @ysiraichi in #9661
- Revert "Fix Terraform usage of `cuda_version`. (#9655)" by @ysiraichi in #9664
- Bump Python version of `ci-tpu-test-trigger` to 3.12. by @ysiraichi in #9665
- fix(xla): convert group-local to global ranks in broadcast by @Hoomaaan in #9657
- Accept conda channels' ToS with environment variable. by @ysiraichi in #9666
- mul: remove opmath cast sequence by @sshonTT in #9663
- Update PyTorch and XLA pin. by @ysiraichi in #9668
- Support manylinux build for r2.9. by @bhavya01 in #9674
- Revert XLA backward incompatible breaking change - #9668 - Fixes Issue #9685 by @rajkthakur in #9686
- Work-around for protobuf init crash in sentencepiece by @jeffhataws in #9693
- Revert "Work-around for protobuf init crash in sentencepiece" by @jeffhataws in #9697
- [cherry-pick] Make API visibility hidden by default. by @ysiraichi in #9705
- Add `torchax` maximum version. by @ysiraichi in #9706
- Revert "mul: remove opmath cast sequence (#9663)" by @rajkthakur in #9701
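Two debugging-related changes above interact: #9448 suppresses C++ stacktraces on `XLA_CHECK*()` failures by default, and #9482 renames the variable that controls them. A minimal sketch of opting back in to verbose stacktraces, assuming a shell session that sets environment variables before launching a workload:

```shell
# C++ stacktraces on XLA_CHECK*() failures are suppressed by default (#9448).
# To re-enable them, set the variable renamed in #9482; the old name
# XLA_SHOW_CPP_ERROR_CONTEXT no longer has any effect.
export TORCH_SHOW_CPP_STACKTRACES=1
```

The new name matches the `TORCH_SHOW_CPP_STACKTRACES` variable already used by upstream PyTorch, so one setting covers both.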
## New Contributors
- @aws-cph made their first contribution in #9474
- @kyuyeunk made their first contribution in #9473
- @melissawm made their first contribution in #9446
- @Carlomus made their first contribution in #9385
- @Hoomaaan made their first contribution in #9509
- @hinriksnaer made their first contribution in #9559
- @ZainRizvi made their first contribution in #9556
- @jialei777 made their first contribution in #9585
- @wirthual made their first contribution in #9515
- @junjieqian made their first contribution in #9616
- @hsjts0u made their first contribution in #9623
**Full Changelog**: v2.8.1...v2.9.0