## What's Changed
- Install libtpu directly instead of torch_xla[tpu] by @bhavya01 in #9423
- Optimize w8a8 quantized matmul kernel by @vanbasten23 in #9412
- Update CI docker images to use 3.12 by @bhavya01 in #9427
- Add Build Trigger for 2.8-rc1 release by @pgmoka in #9425
- Error Handling: refactor `XlaCoordinator` to use status types. by @ysiraichi in #9386
- Add JAX dependency for Python 3.12 by @tengyifei in #9438
- Update TPU CI container image by @bhavya01 in #9434
- Change update_deps script so that latest stable version can be pulled instead of latest nightly by @bfolie in #9424
- Fix CPU tests for python 3.12 by @bhavya01 in #9443
- Partially disable tpu-info CLI tests by @bhavya01 in #9463
- Change nightly_package_version to 2.9 by @pgmoka in #9461
- Make assume_pure able to work with functions that depend on random by @qihqi in #9460
- Support editable install with setuptools>=80.0.0 by @tengyifei in #9428
- Unify the return type of w8a8 matmul between fallback and the actual impl. by @vanbasten23 in #9452
- Remove the clamp op when we do symmetric quantization on a tensor by @vanbasten23 in #9465
- By default, to("jax") should go to TPU by @zzzwen in #9468
- Error Handling: refactor the PjRt registry to use status QOL functions. by @ysiraichi in #9419
- Suppress C++ stacktrace on `XLA_CHECK*()` calls. by @ysiraichi in #9448
- Error Handling: replace `XLA_CHECK_OK()` with status functions. by @ysiraichi in #9457
- Fix status source code location logic. by @ysiraichi in #9440
- Update cuda version check by @pgmoka in #9469
- Remove duplicate artifact creation for 2.8.0-rc1 by @pgmoka in #9471
- Calculate vmem limit dynamically in the quantized matmul kernel. by @vanbasten23 in #9470
- Update torch compat version to 2.7.1 by @qihqi in #9455
- Add dtensor placement test by @jeffhataws in #9458
- [Kernel] Update ragged attention block table by @yaochengji in #9476
- Error Handling: replace `ConsumeValue` with `GetValueOrThrow`. by @ysiraichi in #9464
- implement collective reduce op by @bfolie in #9437
- Add dtensor mesh conversion test by @aws-cph in #9474
- Update TPU CI with latest docker container by @bhavya01 in #9483
- Torchax: Allow reuse of the jittable procedure with different functio… by @zmelumian972 in #9374
- Optimize w8a8 pallas kernel by @kyuyeunk in #9473
- Error Handling: refactor the existing `ComputationClient` implementations to use status QOL functions. by @ysiraichi in #9420
- Bump version by @qihqi in #9480
- Fix duplicate labels and other docs build warnings by @melissawm in #9446
- Update Setuptools version in build dependencies. by @bhavya01 in #9487
- Fix spmd sharding visualization when device index is >= 10 by @jeffhataws in #9475
- Add r2.9 release nightly tests to README by @pgmoka in #9478
- Improve error message for shape promotion on lowering. by @ysiraichi in #9486
- Rename `XLA_SHOW_CPP_ERROR_CONTEXT` to `TORCH_SHOW_CPP_STACKTRACES` by @ysiraichi in #9482
- implement collective all_to_all op by @bfolie in #9442
- Implement collective gather op by @bfolie in #9435
- Add support for callable in torchax.interop.JittableModule.functional_call in the first parameter by @zmelumian972 in #9451
- Update README.md to reflect supported python versions by @bhavya01 in #9484
- Remove support for one-process-per-device style of distributed. by @qihqi in #9490
- Allow mixed tensor type math if one of them is a scalar by @qihqi in #9453
- Fix nested stableHLO composite regions by @Carlomus in #9385
- Misc fixes: by @qihqi in #9491
- Fix python 3.11 cuda wheel link in the readme by @vfdev-5 in #9493
- [Bugfix] fix ragged attention kernel auto-tuning table key by @yaochengji in #9497
- Error Handling: refactor `ComputationClient::TransferFromDevice` to propagate status. by @ysiraichi in #9429
- Implement XLAShardedTensor._spec and test by @aws-cph in #9488
- Clean up quantized matmul condition code by @kyuyeunk in #9506
- Move mutable properties of env to thread local, misc changes by @qihqi in #9501
- Optimize w8a8 kernel vmem limit by @kyuyeunk in #9508
- Error Handling: return status value when loading PjRt dynamic plugin. by @ysiraichi in #9495
- Add block sizes for Qwen/Qwen2.5-32B-Instruct by @vanbasten23 in #9516
- Error Handling: propagate status for `ReleaseGilAndTransferData` and `XlaDataToTensors`. by @ysiraichi in #9431
- Error Handling: refactor `ExecuteComputation` and `ExecuteReplicated` to propagate status. by @ysiraichi in #9445
- Error Handling: refactor `GetXlaTensor` and related functions to use status types. by @ysiraichi in #9510
- Dump C++ and Status propagation stacktraces. by @ysiraichi in #9492
- Add w8a8 kernel blocks for Qwen 2.5 7B by @kyuyeunk in #9517
- Deduplicate `GetXlaTensors()` function. by @ysiraichi in #9518
- [XLA] Add placements property to XLAShardedTensor for DTensor compatibility by @Hoomaaan in #9509
- Update artifacts_builds.tf for 2.8.0-rc2 by @bhavya01 in #9522
- Update artifacts_builds.tf for 2.8.0-rc3 wheel by @bhavya01 in #9527
- make jax as an optional dependency by @qihqi in #9521
- Reorganize PyTorch/XLA Overview page by @melissawm in #9498
- Support torch.nn.functional.one_hot by @vanbasten23 in #9523
- Introduce PlatformVersion bindings by @rpsilva-aws in #9513
- Update artifacts_builds.tf for 2.8.0-rc4 by @bhavya01 in #9532
- Fix pip install torch_xla[pallas] by @bhavya01 in #9531
- Remove cuda builds for release wheels by @bhavya01 in #9533
- Optimize KV cache dequantization performance by @kyuyeunk in #9528
- Add gemini edited docstring by @qihqi in #9534
- Revert 2 accidental commits that I made. by @qihqi in #9536
- Implement XLAShardedTensor.redistribute and test by @aws-cph in #9529
- Do not set `PJRT_DEVICE=CUDA` automatically on import. by @ysiraichi in #9540
- Add triggers for release 2.8.0 by @bhavya01 in #9545
- Update torchbench pin location. by @ysiraichi in #9543
- Improve error message of functions related to `GetXlaTensor()`. by @ysiraichi in #9520
- Refactor the status error message builder. by @ysiraichi in #9546
- Use `TORCH_CHECK()` instead of throwing `std::runtime_error` in `XLA_CHECK*()` macros' implementation. by @ysiraichi in #9542
- Error Handling: make `XLATensor::Create()` return status type. by @ysiraichi in #9544
- `cat`: improve error handling and error messages. by @ysiraichi in #9548
- `div`: improve error handling and error messages. by @ysiraichi in #9549
- Bug fixes by @qihqi in #9554
- Run torchprime CI only when the pull requests have torchprimeci label by @bhavya01 in #9551
- [Documentation] Fixed typo in C++ debugging docs by @hinriksnaer in #9559
- Update README.md to mention 2.8 release by @bhavya01 in #9560
- `flip`: improve error handling and error messages. by @ysiraichi in #9550
- Generalize crash message for non-ok status. by @ysiraichi in #9552
- Rename `MaybeThrow` to `OkOrThrow`. by @ysiraichi in #9561
- Add xla random generator. by @iwknow in #9539
- [EZ] Replace `pytorch-labs` with `meta-pytorch` by @ZainRizvi in #9556
- Added missing "#"s for the comments in triton.md by @SriRangaTarun in #9571
- Remove tests that are defined outside of this repo. by @qihqi in #9577
- Update XLA pin then fix up to make it compile by @qihqi in #9565
- Create mapping for FP8 torch dtypes by @kyuyeunk in #9573
- refactor: DTensor inheritance for XLAShardedTensor by @aws-cph in #9576
- `full`: improve error handling and error messages. by @ysiraichi in #9564
- `gather`: improve error handling and error messages. by @ysiraichi in #9566
- `random_`: improve error handling and error messages. by @ysiraichi in #9567
- Remove `XLA_CUDA` and other CUDA build flags. by @ysiraichi in #9582
- Remove OpenXLA CUDA fallback and `_XLAC_cuda_functions.so` extension. by @ysiraichi in #9581
- Fix case when both device & dtype are given in .to by @qihqi in #9583
- implement send and recv using collective_permute by @bfolie in #9373
- Set environment variables for tpu7x by @bhavya01 in #9586
- Create new macros for throwing status errors. by @ysiraichi in #9588
- `test`: Use new macros for throwing exceptions. by @ysiraichi in #9590
- `runtime`: Use new macros for throwing exceptions. by @ysiraichi in #9591
- `ops`: Use new macros for throwing exceptions. by @ysiraichi in #9592
- `init_python_bindings.cpp`: Use new macros for throwing exceptions. by @ysiraichi in #9595
- `aten_xla_type.cpp`: Use new macros for throwing exceptions. by @ysiraichi in #9596
- Remove CUDA plugin. by @ysiraichi in #9597
- Remove triton. by @ysiraichi in #9601
- `torch_xla`: Use new macros for throwing exceptions (part 1). by @ysiraichi in #9593
- `torch_xla`: Use new macros for throwing exceptions (part 2). by @ysiraichi in #9594
- Remove CUDA specific logic from runtime. by @ysiraichi in #9598
- Remove `gpu_custom_call` logic. by @ysiraichi in #9600
- Remove functions that throw status error. by @ysiraichi in #9602
- Remove CUDA logic from C++ files in `torch_xla/csrc` directory. by @ysiraichi in #9603
- Remove CUDA specific path from internal Python packages. by @ysiraichi in #9606
- Move `_jax_forward` and `_jax_backward` inside `j2t_autograd` to avoid cache collisions by @jialei777 in #9585
- Remove remaining GPU/CUDA mentions in `torch_xla` directory. by @ysiraichi in #9608
- Update version to 0.0.6 by @qihqi in #9611
- Remove CUDA from PyTorch/XLA build. by @ysiraichi in #9609
- Remove CUDA from `benchmarks` directory. by @ysiraichi in #9610
- Remove CUDA tests from distributed tests. by @ysiraichi in #9612
- Make torch_xla package PEP 561 compliant by @wirthual in #9515
- Remove other CUDA usage from PyTorch/XLA repository. by @ysiraichi in #9618
- Remove CUDA from remaining tests. by @ysiraichi in #9613
- Miscellaneous cleanup by @qihqi in #9619
- Replace `GetComputationClientOrDie()` with `GetComputationClient()` (part 1). by @ysiraichi in #9617
- `mm`: improve error handling and error messages. by @ysiraichi in #9621
- Replace `GetComputationClientOrDie()` with `GetComputationClient()` (part 2). by @ysiraichi in #9620
- Upgrade build infra to use debian-12 and gcc-11 by @bhavya01 in #9631
- Remove libopenblas-dev from ansible dependencies by @bhavya01 in #9632
- support load and save checkpoint in torchax by @junjieqian in #9616
- Set `allow_broken_conditionals` configuration variable at `ansible.cfg`. by @ysiraichi in #9634
- Move torch ops error message tests into a new file. by @ysiraichi in #9622
- Fix `test_ops_error_message.py` and run it on CI. by @ysiraichi in #9640
- Do not warn on jax usage when workarounds are available by @bhavya01 in #9624
- `roll`: improve error handling and error messages. by @ysiraichi in #9628
- `stack`: improve error handling and error messages. by @ysiraichi in #9629
- `expand`: improve error handling and error messages. by @ysiraichi in #9645
- update gcc by @qihqi in #9650
- Add default args for _aten_conv2d by @hsjts0u in #9623
- Pin `flax` and skip C++ test `SiLUBackward`. by @ysiraichi in #9660
- `trace`: improve error handling and error messages. by @ysiraichi in #9630
- Fix Terraform usage of `cuda_version`. by @ysiraichi in #9655
- Create PyTorch commit pin. by @ysiraichi in #9654
- Accept conda channels' ToS when building the upstream docker image. by @ysiraichi in #9661
- Revert "Fix Terraform usage of `cuda_version`. (#9655)" by @ysiraichi in #9664
- Bump Python version of `ci-tpu-test-trigger` to 3.12. by @ysiraichi in #9665
- fix(xla): convert group-local to global ranks in broadcast by @Hoomaaan in #9657
- Accept conda channels' ToS with environment variable. by @ysiraichi in #9666
- mul: remove opmath cast sequence by @sshonTT in #9663
- Update PyTorch and XLA pin. by @ysiraichi in #9668
- Support manylinux build for r2.9. by @bhavya01 in #9674
- Revert XLA backward incompatible breaking change - #9668 - Fixes Issue #9685 by @rajkthakur in #9686
- Work-around for protobuf init crash in sentencepiece by @jeffhataws in #9693
- Revert "Work-around for protobuf init crash in sentencepiece" by @jeffhataws in #9697
- [cherry-pick] Make API visibility hidden by default. by @ysiraichi in #9705
- Add `torchax` maximum version. by @ysiraichi in #9706
- Revert "mul: remove opmath cast sequence (#9663)" by @rajkthakur in #9701
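Two debugging-related changes above interact: #9448 suppresses C++ stacktraces on `XLA_CHECK*()` failures by default, and #9482 renames the variable that controls them. A minimal sketch of opting back in to verbose stacktraces, assuming a shell session that sets environment variables before launching a workload:

```shell
# C++ stacktraces on XLA_CHECK*() failures are suppressed by default (#9448).
# To re-enable them, set the variable renamed in #9482; the old name
# XLA_SHOW_CPP_ERROR_CONTEXT no longer has any effect.
export TORCH_SHOW_CPP_STACKTRACES=1
```

The new name matches the `TORCH_SHOW_CPP_STACKTRACES` variable already used by upstream PyTorch, so one setting covers both.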
## New Contributors
- @aws-cph made their first contribution in #9474
- @kyuyeunk made their first contribution in #9473
- @melissawm made their first contribution in #9446
- @Carlomus made their first contribution in #9385
- @Hoomaaan made their first contribution in #9509
- @hinriksnaer made their first contribution in #9559
- @ZainRizvi made their first contribution in #9556
- @jialei777 made their first contribution in #9585
- @wirthual made their first contribution in #9515
- @junjieqian made their first contribution in #9616
- @hsjts0u made their first contribution in #9623
**Full Changelog**: v2.8.1...v2.9.0