Releases: apache/tvm
Apache TVM v0.13.0
Introduction
The TVM community has worked since the v0.12.0 release to deliver the following exciting improvements! The main tags are below (bold text indicates areas with substantial progress):
- Community, RFC;
- Frontend: TensorFlow/TFLite, Pytorch/Torch, Paddle, keras;
- Runtime: Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, Vulkan, Hexagon, Metal, others about runtime;
- Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule;
- microTVM, AOT, TVMC, LLVM;
- CI, BugFix, Docs, Docker, Misc;
Please visit the full listing of commits for a complete view: v0.12.0...v0.13.0.
Community
- #15086 - Aleksei-grovety -> Reviewer
- #14676 - Jiajun Jiang -> Reviewer
- #14677 - Qiang Zhang -> Reviewer
- #14622 - Sunghyun Park -> Reviewer
- #14578 - Zihao Ye -> Committer
- #14853 - Anirudh Sundar Subramaniam -> Committer
- #14772 - Add new key for release signing
RFC
Frontend
- #14830 - Use f-strings for string formatting, NFC
- Keras
- #15122 - [Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute
- #15107 - [Relay][Keras] Fix a wrong variable name in keras frontend
- #15053 - [Relay][Keras] Fix the wrong implementation logic about cropping2D
- #15082 - [Relay][Keras] Fix UpSampling2D about the wrong assertion about size
- #15060 - [Relay][keras] Fix the bug about the attribute 'output_padding' in Deconv
- #14707 - [Keras] Fix a bug with the alpha attribute in LeakyReLU which led to a pass conflict
- #15175 - [Relay][Keras] Fix concatenate convert function in axis parsing
- Paddle
- #14801 - [Paddle] [PaddlePaddle Hackathon 4]add attribute support for gaussian_random/softplus/Conv3d/Conv2d
- #14973 - [Paddle] [PaddlePaddle Hackathon 4] add convert support for tanhshrink/pool3d/set_value ops for paddle frontend
- #14826 - [Paddle] [PaddlePaddle Hackathon 4] add convert support for p_norm/roi_align/softmax_with_cross_entropy
- #14575 - [Paddle] [PaddlePaddle Hackathon 4]add attribute support for dropout/hard_sigmoid/pixel_shuffle
- TFLite
- TensorFlow
- #14546 - [Tensorflow] Fix conv2d_transpose for NHWC layout
- PyTorch
- ONNX
- #15017 - [ONNX] Fix bug in scatter_elements
Runtime
- #15182 - Add weak symbol to builtin fp16
- #15161 - Clean TVM stacktrace in error messages
- #15162 - Support void as dtype in FFI
- #14902 - Update Module and Registry to use String Container
- #14967 - [Runtime,RPC] Use f-strings for string formatting, NFC
- #14887 - Make systemlib unique per prefix
- #14775 - Added `__str__` for tvm._ffi.runtime_ctypes.TVMArray
- #14656 - Fix Can't "query_imports" Bug of VM Executable
Adreno
CMSIS-NN
- #15059 - Update CMSIS-NN release to v4.1.0
OpenCL & CLML
- #14972 - [OPENCL] Always use convert_T for type conversion
- #14995 - [OpenCL] Improve diagnostic message
- #14833 - [Codegen][OpenCL] Fix ambiguous selection operator call
- #14792 - [OpenCL] Refactor OpenCL runtime to support SPIRV binary ingestion
- #14922 - [OpenCLML] Refactor and introduce on-chip memory and memory planner
- #14949 - [CodegenC] Updated unit test for sorted CodegenC output
- #14767 - [OpenCLML] Transposed convolution support and other fixes
CUDA & CUTLASS & TensorRT
- #14751 - [CUDA] Fixed the call of the min function in the schedule for cuda
- #14798 - [CUTLASS] Add NDEBUG option to CUTLASS compile to speed up attention kernel
- #14782 - [Bugfix][Codegen][CUDA] Wrong casting in ASM
Metal
- #14962 - Fix int8 vectorized cast
- #14846 - Fix vectorized select
- #14727 - Update metal runtime to directly store kernel map
- #14671 - Fix flaky memory issue due to racing
Vulkan
Hexagon
- #14997 - Remove "c" as aot_host_target tvm/contrib/hexagon/pytest_pl…
- #14948 - Update instructions to compile hexagon runtime
- #14965 - Add support for v73, make v68 default
- #14720 - [TIR] Add get_vtcm_allocation_sizes with lowering
- #14567 - [TIR] Use the "target" value in T.func_attr for VTCM limit
ROCm
- #15106 - [TensorIR]AMD Matrix Core Support
- #15088 - [Target]Replace rocm arch parsing from int to string
microTVM
- #14872 - Use self.close_transport() on error
AOT
- #15033 - Avoid Var-to-Var Let binding in AOTExecutorCodegen
- #15032 - Remove duplication in tvm.testing.aot.compile_models
- #14529 - Fix warning on dropping const in TVMAotExecutor_GetInputName
microNPU
- #15159 - [microNPU][ETHOSU] Fix compiler attributes types
- #15147 - [microNPU][ETHOSU] Add option to disable copying constants for case without cascader
- #15069 - [microNPU][ETHOSU] Fix SoftMax legalization parameters
- #15115 - [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers
- #15114 - [microNPU] Upgrade Vela to v3.8.0
- #15104 - [microNPU][ETHOSU] Fix minimum buffer size
- #15063 - [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments
- #14861 - [microNPU][ETHOSU] Add offloading to the NPU the nn.avg_pool2d operator with a stride > 3
- #14765 - [microNPU][ETHOSU] Channel pad offloaded to NPU
- #14774 - [microNPU][ETHOSU] Fix Softmax quantization parameters
- #14629 - [microNPU][ETHOSU] Softmax int8 legalization support
- #14353 - [microNPU] Add support for MEAN with uint8 ifm
- #14587 - [microNPU] Fix skip tests when Vela is not present
- #14464 - [microNPU][ETHOSU] Add restrictions to convert to NHCWB16 layout in LayoutOptimization pass
BYOC
Re...
Apache TVM v0.12.0
Introduction
The TVM community has worked since the v0.11.1 release to deliver the following exciting improvements! The main tags are below (bold text indicates areas with substantial progress):
- Community, RFC;
- Runtime: ACL(ArmComputeLibrary), Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, CRT, Hexagon, Metal, Web & WASM, others about runtime;
- Frontend: TensorFlow/tflite, Pytorch/Torch, Paddle, OneFlow, keras;
- TE, Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule, Schedule;
- CI, Tests, BugFix, Docs, Docker, Build;
- Android, microTVM, Target, AutoTVM, AOT, LLVM.
Please visit the full listing of commits for a complete view: v0.11.1...v0.12.0.
Thanks to @ysh329 for the great effort on the release process as the release manager.
Community
- Reviewer
- Committer
- PMC
RFC
- [RFC] Introduce PresburgerSet (#99) (e17994b)
- [RFC] Further Unify Packed and Object in TVM Runtime (#97) (d646a22)
Runtime
ArmComputeLibrary
- [ACL][TESTING] Use pytest.mark.parametrize in ACL conv2d tests
- [ACL] Prevent offloading of per-channel quantized operators
- [CL] Update Compute Library from v22.11 to v23.02.1
Adreno
- [Adreno] Extend pack_filter for HWIO layout
- [Adreno] Update interface of AnnotateMemoryScope pass
- [Adreno] Optimize reduction schedule
- [BENCHMARK][ADRENO] Adreno Benchmarks with texture
- [BENCHMARKS][CLML] Adreno benchmarks with CLML BYOC path added
- [BENCHMARKS][ADRENO] Documentation for Adreno (Texture) benchmarks
- [DOCS][ADRENO] Improved Adreno documentation
OpenCL & CLML
- OpenCL
- CLML
- [CLML][RUNTIME] Enable more ops in CLML runtime
- [CLML][RELAY] Enable Pad and Conv2d layer fusion
- [CLML][CODEGEN] CLML native codegen utility
- [CLML] Version compatibility and various test cases
- [CLML] Changes corresponding to OpenCL workspace refactorization
- [RUNTIME][CLML] OpenCLML tuning and profiling enhanced
ROCm
CMSIS-NN
- [CMSIS-NN] Global function that provides range based on dtype
- [CMSIS-NN] Add int16 add and mul operator support
- [CMSIS-NN] Add a runtime error message
- [CMSIS-NN] Reduction in code size of AOT test runner binary
- [CMSIS-NN] Remove support for the old CMSIS NN project
- [CMSIS-NN] Support CMSIS NN from new GitHub location
- [CMSIS-NN] Add Cortex-M85 support
CUDA & CUTLASS & TensorRT
- [CUDA][Schedule] Better Layout Transform Schedules
- [Profiler] Allow user to flush L2 cache in `time_evaluator` function for profiling CUDA kernels
- [Codegen][CUDA] Add error message for missing fragment info
- [CUTLASS][Ansor] Combine CUTLASS and Ansor
- [TensorRT] Fix BiasAdd with correct axis attribute
- [TRT][BYOC] allow strided_slice ops on selected dimensions (#14142)
Ethosn
- [ETHOSN] Update driver stack version to 22.11
- [ETHOSN] Support for addition with constant input
- [ETHOSN] Apply FoldConstant before NPU partitioning
- [ETHOSN] Remove support for NPU driver 22.08
- [ETHOSN] Fix for the mock inference after NPU driver update
- [ETHOSN] Remove requantize dependency on resize
- [ETHOSN] Add support for experimental compiler option
CRT
- [CRT] USE CMake for CRT standalone libraries
- [CRT][microTVM] Enable USMP by default for AoTExecutor + CRT runtime
- [CRT] Cleanup unused macros in crt_config.h.template
Hexagon
- [Hexagon][TOPI] Use IndexMap axis separator instead of TE
- [Hexagon] Add concept of DMA groups
- [Hexagon] Improve cache management strategy for HexagonBuffer
- [Hexagon] Denote DMA cache bypass as experimental feature
- [Hexagon] Adapt some intrinsics for high vector lanes
- Hexagon compilation on MacOS system
- [Hexagon] Enable depthwise conv2d NHWC with an HWIO kernel layout
- [Hexagon][QNN] Improve performance w/o QNN canonicalization
- [Hexagon][Metaschedule] Add timeout_sec arg to get_hexagon_local_builder
- [Hexagon] Fix deprecated call for data layout size in bits
- [Hexagon] Allow scalar tensors to have null shape during allocation
- [Hexagon][runtime] Make HexagonThreadManager::CheckSemaphore thread safe
- [Hexagon] Float and quantized dense operators with schedules
- [Hexagon][CI] Updated sha for builder LLVM
- [Hexagon][CI] Update the docker image ID to reflect newer LLVM
- [Hexagon] Switch from default_rng to random in Hexagon tests
- [Hexagon] Add hexagon user DMA intrins for tensorization
- [hexagon] Hexagon inference fix
Metal
- [METAL][CODEGEN] testcase for ramp codegen
- [CODEGEN][METAL] Fix unaligned vector load
- [CODEGEN][METAL] Fix ramp codegen
MicroNPU
- [microNPU] Sum legalization support
- [microNPU] Add rescale parameters for binary elementwise
- [microNPU] Add hardware constraints for binary elementwise
- [microNPU] Add support for TFLite PAD
- [microNPU] Upgrade Vela to v3.7.0
- [microNPU] Merge LUT activation with binary elementwise operation
- [microNPU] Upgrade to 22.08 version of Arm(R) Ethos(TM)-U NPU drivers
- [microNPU] Add relu6 relu_n1_to_1 test cases for Ethos-U
- [microNPU] Add a legalization test for TFLite PAD
- [microNPU] Disable copying weights to SRAM for FullyConnected ops in CopyConstants scheduler (https://github.com/ap...
Apache TVM v0.11.1
Apache TVM v0.11.0
Introduction
The TVM community has worked since the v0.10.0 release to deliver the following new exciting improvements!
- Metaschedule
  - Tuning API improvements and anchor-block tuning
- TVMScript metaprogramming
  - Lots of progress with TVMScript, with the introduction of a core parser, AST, Evaluator, Source and diagnostics
And many other general improvements to microTVM, code quality, CI, frontends, and more! Please visit the full listing of commits for a complete view: v0.10.0...v0.11.0.
RFCs
These RFCs have been merged in apache/tvm-rfcs since the last release.
What's Changed
Note that this list is not comprehensive of all PRs and discussions since v0.10. Please visit the full listing of commits for a complete view: v0.10.0...v0.11.0.
Adreno
- [Adreno] Add global pooling schedule (#13573)
- [Adreno] Add documentation for Adreno deployment (#13393)
- [Adreno] Fix mem_scope annotations for prim funcs having several heads (#13153)
- [Adreno] Adapt reduction schedule for adreno (#13100)
- [Adreno] Fix winograd accuracy (#13117)
- [Adreno][Textures] Fix static memory planner (#13253)
- [DOCKER][Adreno] Docker infra for Adreno target with CLML support (#12833)
AoT
- [AOT] Add CreateExecutorMetadata analysis pass (#13250)
- [AOT] Add CreateFunctionMetadata analysis pass (#13095)
- [AOT] Sanitize input/output name in runtime (#13046)
Arith
- [Arith] Add internal NarrowPredicateExpression utility (#13041)
- [Arith] Optional rewriting and simplification into AND of ORs (#12972)
arm
- [bfloat16] Fixed dtype conversion in the arm_cpu injective schedule (#13417)
AutoTVM
- [AutoTVM] Introducing multi_filter into ConfigSpace autotvm (#12545)
Build
- [BUILD] Re-enable ccache by default (#12839)
CI
- [ci] Fix docs deploy (#13570)
- [ci] Split Jenkinsfile into platform-specific jobs (#13300)
- [ci] Dis-allow any non-S3 URLs in CI (#13283)
- [ci] Split out C++ unittests (#13335)
- [CI] Separate the ci scripts into Github and Jenkins scripts (#13368)
- [ci] Assert some tests are not skipped in the CI (#12915)
- [ci] Ignore JUnit upload failures (#13142)
- [ci] Lint for trailing newlines and spaces (#13058)
- [ci] Template build steps (#12983)
- [ci][docker] Allow usage of ECR images in PRs (#13590)
- [ci][docker] Read docker image tags during CI runs (#13572)
- [ci][wasm] Add package-lock.json to git (#13505)
CL
- [ACL] Enable int8 data type in pooling operators (#13488)
CMSIS-NN
- [CMSIS-NN] Support for int16 conv2d (#12950)
- [CMSIS-NN] Support for int16 in fully connected layer (#13484)
DNNL
- [AMP] refine AMP and the corresponding tests for bfloat16 (#12787)
Docker
- [Docker] Refactor timezone script and NRF installation (#13342)
Docs
- [docs] Fix empty code blocks in tutorials (#13188)
Ethos-N
- [ETHOSN] Consolidate target string usage (#13159)
- [ETHOSN] Throw error message when inference fails (#13022)
- [ETHOSN] Inline non-compute-intensive partitions (#13092)
- [ETHOSN] Transpose fully connected weights (#12970)
- [ETHOSN] Support conversion of add/mul to requantize where possible (#12887)
Frontend
- [TFLite] Enable int64 biases for int16 quantized operators (#12042)
Hexagon
- [Hexagon] Add HVX quant conv2d implementation (#13256)
- [Hexagon] Add test to show scheduling of resnet50 with async dma pipe… (#13352)
- [Hexagon] Enable Hexagon User DMA bypass mode (#13381)
- [Hexagon] Lint tests part 2 (#13271)
- [Hexagon] Add pylint on tests (#13233)
- [Hexagon] Add E2E test demonstrating how to apply blocked layout schedule to conv2d via metaschedule (#13180)
- [Hexagon] Add a test to show how to use multi input async dma pipelin… (#13110)
- [Hexagon]: Add upload function to hexagon session (#13161)
- [Hexagon] Add support for instrumentation based profiling for Hexagon (#12971)
- [Hexagon] Add power manager (#13162)
- [Hexagon] Add scripts for e2e MetaSchedule tuning demonstration (#13135)
- [Hexagon] Add feature to copy logcat to --hexagon-debug and add new --sysmon-profile option to run sysmon profiler during the test (#13107)
- [Hexagon] Async DMA pipelining test suite (#13005)
- [Hexagon] Enable multi input Async DMA; same queue / stage (#13037)
- [Hexagon] Do not use `target` test fixture in Hexagon tests (#12981)
- [Hexagon] 3-stage pipeline; multi queue async DMA for cache read / write (#12954)
- [Hexagon] vrmpy tensorization for e2e compilation of int8 models (#12911)
- [Hexagon] Support template-free meta schedule tuning (#12854)
- [Hexagon] depth_to_space slice op (#12669)
- [Hexagon] Make allocate_hexagon_array a hexagon contrib API (#13336)
- [Hexagon] Add fix for vtcm allocation searches (#13197)
- [MetaSchedule][Hexagon] Add postproc for verifying VTCM usage (#13538)
- [Hexagon][QNN] Add TOPI strategies for qnn ops mul/tanh/subtract (#13416)
- [Logging][Hexagon] Improve logging on Hexagon (#13072)
- [Hexagon] [runtime] Per-thread hardware resource management (#13181)
- [Hexagon] [runtime] Create objects to manage thread hardware resources (#13111)
- [QNN][Hexagon] Disable QNN canonicalization pass (#12398)
- [Hexagon] [runtime] Manage RPC and runtime buffers separately (#13028)
- [Hexagon] [runtime] VTCM Allocator (#12947)
- [TOPI][Hexagon] Add schedule and test for maxpool uint8 layout (#12826)
- [TOPI][Hexagon] Implement quantize op for hexagon (#12820)
- [Meta Schedule][XGBoost] Update the custom callback function of xgboost in meta schedule (#12141)
- [TIR] [Hexagon] Add vdmpy intrinsic and transform_layout for tests (#13557)
- [Hexagon] [runtime] Support VTCM alignments of 128 or 2k (#12999)
- [HEXAGON][QHL] Clipping the inputs of HVX version of QHL Sigmoid operation (#12919)
- [Hexagon] [runtime] Add user DMA to device API resource management (#12918)
LLVM
- [LLVM] Emit fp16/fp32 builtins directly into target module (#12877)
- [LLVM] Switch to using New Pass Manager (NPM) with LLVM 16+ (#13515)
MetaSchedule
- [MetaSchedule] Make `MultiLevelTiling` apply condition customizable (#13535)
- [MetaSchedule] Enhance Database Validation Script (#13459)
- [MetaSchedule] Fix Dynamic Loop from AutoBinding (#13421)
- [MetaSchedule] Support schedules with cache read in RewriteLayout (#13384)
- [MetaSchedule] Improve inlining and `VerifyGPUCode` for quantized model workload (#13334)
- [MetaSchedule] Add JSON Database Validation Scripts (#12948)
- [MetaSchedule] Fix the order of applying `AutoInline` in `ScheduleUsingAnchorTrace` (#13329)
- [MetaSchedule] Refactor ScheduleRule Attributes (#13195)
- [MetaSchedule] Improve the script for TorchBench model tuning & benchmarking (#13255)
- [MetaSchedule] Enable anchor-block tuning (#13206)
- [MetaSchedule] Introduce a variant of ModuleEquality to enable ignoring NDArray raw data (#13091)
- [MetaSchedule] Consolidate module hashing and equality testing (#13050)
- [MetaSchedule] Support RewriteLayout postproc on AllocateConst (#12991)
- [MetaSchedule] Tuning API cleanup & ergonomics (#12895)
- [MetaSchedule] Fix XGBoost Import Issue (#12936)
- [MetaSchedule] Add Script for TorchBench Model Tuning & Benchmarking (#12914)
- [MetaSchedule] Restore `num_threads` parameter in tuning API (#13561)
- [MetaSchedule] TorchBench tuning script: add option to disallow operators in sub graph (#13453)
- [MetaSchedule] Fix segfault in gradient based scheduler (#13399)
- [MetaSchedule] Add `from-target` Defaults for x86 VNNI Targets (#13383)
- [MetaSchedule] Fix Task Hanging in EvolutionarySearch (#13246)
- [MetaSchedule] Allow skipping exact NDArray rewrite in RemoveWeightLayoutRewriteBlock (#13052)
- [MetaSchedule][UX] Support Interactive Performance Table Printing in Notebook (#13006)
- [MetaSchedule][UX] User Interface for Jupyter Notebook (#12866)
microNPU
- [microNPU] Upgrade Vela to v3.5.0 (#13394)
- [microNPU] Fixed MergeConstants pass on striped networks (#13281)
microTVM
- [microTVM] Modernize Arm Cortex-M convolution schedules (#13242)
- [microTVM] Improve code reuse in Corstone300 conv2d tests (#13051)
- [microTVM] Add Cortex-M DSP schedules for optimal conv2d layouts (#12969)
- [microTVM] Use default Project Options in template projects and add Makefile for Arduino template project (#12818)
- [microTVM] Generalize depthwise_conv2d schedule (#12856)
- [microTVM] add the option to open a saved micro project for debugging (#12495)
- Added macro generation in MLF export (#12789)
- [microTVM][Arduino] Add `serial_number` to project options and tests (#13518)
- [microTVM][Zephyr] Add 'serial_number' option (#13377)
- [microTVM][PyTorch][Tutorial] Adding a PyTorch tutorial for microTVM with CRT (#13324)
Misc
- [CodegenC] Explicit forward function declarations (#13522)
- [FQ2I] Support converting `dense -> add` to `qnn.dense -> add -> requantize` (#13578)
- [Minor][Testing] Consolidate IRs into corresponding functions (#13339)
- Add recursive on loop with marked kUnrolled (#13536)
- Skip stride check if shape is 1 in IsContiguous (#13121)
- [TEST] CPU feature detection for x86 and ARM dot product instructions (#12980)
- [Node] Expose StructuralEqual/Hash handler implementation...
Apache TVM v0.10.0
Introduction
The TVM community has worked since the v0.9 release to deliver the following exciting new improvements!
- Metaschedule
  - Software pipelining and padding for irregular shapes for auto tensorization
  - Stabilized and polished user interfaces (e.g. `database` changes, `tune_relay`)
  - A new MLP-based cost model
- TIR
  - New schedule primitive for `PadEinsum`
  - A new TIR node: `DeclBuffer`
  - INT8 intrinsics for TensorCores for CUDA!
- microTVM
  - Improved schedule primitives for Arm v8-M ISA
And many other general improvements to code quality, TVMScript, and more! Please visit the full listing of commits for a complete view: v0.9.0...v0.10.0rc0.
RFCs
These RFCs have been merged in apache/tvm-rfcs since the last release.
What's Changed
Please visit the full listing of commits for a complete view: v0.9.0...v0.10.0rc0.
Note that this list is not comprehensive of all PRs and discussions since v0.9. A non-truncated summary can be found here: #12979
TIR
- #12720 - [TIR] Implement API for padded layout transformations
- #12797 - [TIR] Construct the inverse in SuggestIndexMap
- #12827 - [TIR] Support pattern matching argmax/argmin generated by TOPI
- #12750 - [TIR, Schedule] Add schedule primitive PadEinsum
- #11639 - [TIR][Meta-Schedule] Tuple-reduction scheduling support
- #12515 - [TIR][Arith] Add more strict checking in imm construction and folding.
- #12717 - [TIR, Schedule] Check consumer in-bound and covered in reverse_compute_inline
- #12652 - [TIR] Handle axis_separators during FlattenBuffer
- #12623 - [TIR] Expose MMA-related PTX builtins
- #12607 - [TIR][Schedule] enhance compute_at and reverse_compute_at primitive to choose possible position
...
Apache TVM v0.9.0
Introduction
The TVM community has worked since the v0.8 release to deliver many exciting features and improvements. v0.9.0 is the first release on the new quarterly release schedule and includes many highlights, such as:
- MetaSchedule's full implementation
- ARM cascading scheduler for Arm Ethos(TM)-U NPUs
- Collage which brings tuning to BYOC
- Several microTVM improvements
- New `tvm.relay.build` parameters: `runtime=`, `executor=`
- AOT
  - Support for the C++ runtime (with `llvm` and `c` targets only) and support for host-driven AOT in the C runtime
- Hexagon RPC support
  - Testing via Hexagon SDK simulator and on device via Snapdragon-based HDK boards and phones
  - AOT and USMP support
  - Threading
  - Initial op support
- MLF
  - Support for multiple modules in a single MLF artifact
- Several TIR schedule primitives and transforms including (abridged):
  - `schedule.transform_layout` - Applies a layout transformation to a buffer as specified by an IndexMap.
  - `schedule.transform_block_layout` - Applies a schedule transformation to a block as specified by an IndexMap.
  - `schedule.set_axis_separators` - Sets axis separators in a buffer to lower to multi-dimensional memory (e.g. texture memory).
  - `transform.InjectSoftwarePipeline` - Transforms an annotated loop nest into a pipeline prologue, body, and epilogue where producers and consumers are overlapped.
  - `transform.CommonSubexprElimTIR` - Implements common-subexpression elimination for TIR.
  - `transform.InjectPTXAsyncCopy` - Rewrites global-to-shared memory copies in CUDA with async copy when annotated with `tir::attr::async_scope`.
  - `transform.LowerCrossThreadReduction` - Enables support for reductions across threads on GPUs.
- And many more! See the list of RFCs and PRs included in v0.9.0 for a complete view, as well as the full change list.
RFCs
These RFCs have been merged in apache/tvm-rfcs since the last release.
- [RFC] TUNIP: TVMScript Unified Printer (#74) (48d47c5)
- [RFC][Backend] RFC-CSI-NN2-Integration (#75) (cfcf114)
- [RFC] Introducing DeclBuffer (#70) (87ff1fa)
- [RFC][MLF] Model Library Format with Multiple Modules (#76) (f47c6ad)
- [RFC] UMA Universal Modular Accelerator Interface (#60) (6990e13)
- [RFC] DietCode: An Auto-Scheduler for Dynamic Tensor Programs (#72) (a518000)
- [RFC] Quarterly Releases (#67) (70293c7)
- RFC-BYOC-DNNL-Integration (#73) (7aed0ca)
- [RFC] Relay Next Roadmap (#69) (ac15f2a)
- RFC: clarifying buffer declaration and access (#63) (de4fe97)
- Inclusive Language RFC (#68) (4203bd2)
- [USMP] Adding U4 usecase (#65) (b9e246f)
- Collage RFC (#62) (23250f5)
- Replace codeowners with more relevant automation (#58) (540c1f8)
- [RFC][TIR] Layout transformations on buffer access (#39) (b675ef8)
- Module Based Model Runtime for AOT (#46) (d9dd6eb)
- @slow test RFC (#55) (9b6203a)
- [RFC][Roadmap] TVM Continuous Integration & Testing Roadmap (#54) (41e5ba0)
- Bring `PackedFunc` into TVM Object System (#51) (2e0de6c)
- [RFC][OpenCLML] OpenCLML integration as BYOC (#52) (f5ef65f)
- Introduce the Arm(R) Ethos(TM)-U Cascading Scheduler (#37) (f9fa824)
- [RFC][Roadmap] microTVM roadmap (#53) (1b14456)
- Add Managed Jenkins Infrastructure for TVM RFC (#49) (a3a7d2c)
- TVM Roadmap RFC (#50) (263335f)
- [RFC] Integrate LIBXSMM with TVM (#47) (1a3d4f1)
- [RELAY][AST] Add virtual device as a first class field to Relay expressions (#45) (67c39d2)
What's Changed
Note that this list is not comprehensive of all PRs and discussions since v0.8. Please visit the full listing of commits for a complete view: v0.8.0...v0.9.0.rc0.
AOT
- #11208 - Calculate used memory at the callsite of primitive functions
- #11365 - Fix function number datatype from char to uint16_t
- #11091 - Enable A-Normal Form in the AOT executor
- #10753 - Support LLVM backend with C++ runtime
- #10518 - Use python temporary directory for AOT tests
- #10337 - BugFix of workspace calculation
- #10282 - [runtime] Add Metadata classes for AOTExecutor
- #9501 - [3/3][DeviceAPI] Wire up cpacked Device API context
- #9500 - [2/3][DeviceAPI] Add Hooks for Activate/Deactivate/Open/Close
- #9395 - [1/3][DeviceAPI] Connecting devices structure to relevant operators
BYOC
- #11474 - Two helper passes for external codegen using RelayToTIR custom pass machinery
- #11144 - Remove support for run-time linked-params from codegen
- #10590 - Add order to functions in C Codegen
- #11638 - [DNNL][CBLAS] Unifies all MKLDNN/DNNL to DNNL
- #11619 - RelayToTIR custom codegen passes can still depend on dynamic shape functions
- DNNL - #11902, #11642, #11513, #11571, #11560, #11345, #11111, #10837, #10421, #9995, #9797
- TensorRT - #11923, #11203, #10759, #10772, #10388
- CMSIS-NN - #11732, #11625, #10939, #11013, #10817, #10563, #10224, #10148, #10100, #9338, #9531, #9409, #9331
- OpenCLML - #10243
- CUTLASS - #11631, #10185, #10177, #10110, #10036, #9899, #9820, #9800, #9795, #9746, #9737, #9698, #95...
Apache TVM v0.8 Release Note
Overview
Apache TVM v0.8 brings several major exciting experimental features, including:
- PaddlePaddle frontend
- TVMScript: round-trippable python-based syntax for TIR
- TorchScript integration
- TensorIR scheduling language
- TensorRT and CUTLASS integration via BYOC
- Int4 TensorCore support in AutoTVM
- MicroTVM Project API and Zephyr, Arduino support
- AOT executor
- Robust Windows support
- Affine analysis infra: iter-affine-map
- Improved Vulkan backend
- CUDA graph support in TVM runtime
Besides, the community has been working together to refactor and evolve the existing infrastructure, including but not limited to:
- Relay compilation engine
- Relay pattern language
- CI and build process
- Refactoring documentation and tutorials
- Stabilizing AutoScheduler
- Stabilizing TVMC command line driver interface
- Stabilizing target system
- Frontend coverage, quantization, dynamic shape, training
Full changelog: https://gist.github.com/junrushao1994/c669905dbc41edc2e691316df49d8562.
Accepted RFCs
The community has adopted a formal RFC process. Below is a list of the formal RFCs accepted by the community since then:
- [RFC-0005] Meta schedule (AutoTIR)
- [RFC-0006] Automatic mixed-precision pass and support
- [RFC-0007] Parametrized unit tests
- [RFC-0008] MicroTVM Project API
- [RFC-0009] Unified static memory planner
- [RFC-0010] Target-registered compiler flow customisation
- [RFC-0011] Arm® Ethos-U integration
- [RFC-0014] Pipeline executor
- [RFC-0015] Use CMSIS-NN with TVM
- [RFC-0019] Add PaddlePaddle frontend
- [RFC-0020] Extend metadata in project option
- [RFC-0022] TIR non-scalar constants
- [RFC-0023] Adding annotation field to `tir.allocate` nodes
- [RFC-0025] PyTorchTVM
- [RFC-0027] Formalize TVM documentation organization
- [RFC-0028] Command line composition from internal registry
- [RFC-0029] Migrating target attributes to IRModule
- [RFC-0030] Command line configuration files
- [RFC-0031] C Device API
- [RFC-0036] TVMScript namespace
- [RFC-0041] Update TVMScript block syntax
Features and Improvements
TE, TIR, TVMScript
- TVMScript parser and printer #7630 #9115 #9286
- Scheduleable TIR (S-TIR) infrastructure, analysis and lowering passes #7553 #7765 #7847 #8114 #8121 #7873 #7923 #7962 #7848 #8044 #7806
- S-TIR schedule primitives: `compute-inline`, `reverse-compute-inline`, `fuse`, `split`, `rfactor`, `storage-align`, `vectorize`, `unroll`, `bind`, `reorder`, `cache-read`, `cache-write`, `compute-at`, `reverse-compute-at`, `decompose-reduction` #8170 #8467 #8544 #8693 #8716 #8767 #8863 #8943 #9041
- While loop in TIR #7425 #9004
- Metaprogramming in S-TIR via `specialize` #8354
- Support Return value in TIR #7084 #7932
- Storage scope support in `PointerType` #8017 #8366 #8463
- Creation of S-TIR via TE compute #7987
AutoTVM, AutoScheduler, Meta Schedule
- PopenPoolExecutor replaces the native Python multiprocessing library to provide better multiprocessing support as well as to enable auto-tuning in Jupyter notebooks for AutoTVM and AutoScheduler #6959 #8492 #8913 #8820 #8851
- AutoScheduler improvement and stabilization: task scheduler, layout rewrite, early stopping, dispatching #6945 #6750 #6987 #7156 #8862 #8995 #7571 #7376 #7377 #7344 #7185
- AutoScheduler support for sparse workloads #7313 #7635 #8065
- AutoScheduler support for Vulkan, ROCm, Mali #7626 #7038 #7132
- AutoTVM support for int4 TensorCore #7831 #8402
- Meta Schedule core infrastructure, builder runner and database #8615 #8623 #8642 #8817 #9079 #9132 #9154 #9053 #9059 #9044 #9111 #9061 #9153
Operator Coverage
- Operators for Int-8 vision transformer on GPU #7814
- Optimizing NMS and ROI-related kernel on GPU #7257 #7172 #7136 #7796 #7463 #6516 #7440 #7666 #8174
- Support and optimize sparse operators #8605 #7477 #7435 #6889 #6580 #8437
- Sort-related operators and optimization #9184 #7669 #8672 #7611 #7195 #7056 #6978
- Support for einsum operator #6370
- Matmul, dense operators and their optimization #8921 #8527 #8234 #8250 #6616 #8229 #8401 #7404 #8669
- Convolution and pooling operators and their optimization #8620 #8936 #8584 #7075 #7142 #7515 #6999 #6899 #6840 #6137 #6802 #6445 [#671...
Apache TVM (incubating) v0.7.0
Apache TVM (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
Introduction
v0.7 brings many major features. The community worked together to refactor the internal code base to bring a unified IR code structure with a unified IRModule, type system, and pass infrastructure. We have also brought many exciting new features; some highlights include:
- Initial automatic scheduling support
- Initial command line driver interface
- WebGPU and webassembly support
- Better first class rust support in the codebase
- Initial Hexagon support
- Bring your own codegen (BYOC) support
The community also continues to bring high quality improvements to the existing modules including, but not limited to: better frontend coverage, performance, quantization, uTVM and dynamic shape support.
New Features
Automatic Scheduling (Experimental)
- Phase 0: Ansor minimum system for auto schedule generating #5962
- Phase 1: Access Analyzer #6103
- Phase 1: Add `follow_split` and `follow_fused_split` steps #6142
- Phase 1: Add `pragma`/`storage_align`/`rfactor` steps #6141
- Phase 1: Add RPC Runner #6077
- Phase 1: Add `annotation`/`compute_at`/`compute_root`/`compute_inline` steps #6073
- Phase 1: Add `cache_read`/`cache_write` steps #6107
- Phase 1: Rename namespace from `auto_schedule` to `auto_scheduler` #6059
- Phase 1: The base class for cost models #6187
- Phase 1: feature extraction for cost models #6190
- Phase 1: XGBoost Cost Model #6270
- Phase 2: Basic GPU Sketch Search Policy #6269
- Phase 2: Evolutionary Search #6310
- Phase 2: Update heavy operations with `parallel_for` #6348
- Parallel the InitPopulation (#6512)
- Tutorial: Using the template-free auto-scheduler on CPU (#6488)
BYOC
- External codegen support in Relay (#4482),(#4544)
- Bring Your Own Codegen Guide -- Part 1 #4602
- Bring Your Own Codegen Guide -- Part 2 #4718
- Relay annotation and partitioning for external compilers #4570
- JSON Runtime with DNNL End-to-End Flow #5919
- Handle one symbol for each runtime #5989
- Run accelerator specific optimizations #6068
- Arm Compute Library integration #5915
- Retire the example json runtime #6177
- `json_node.h` should include `data_type.h` #6224
- Improve installation tutorial #6170
- Add support for dense (fully connected) layer #6254
- Introduce the Ethos-N BYOC integration #6222
- Enable remote device via environment variables #6279
- Improved pooling support #6248
- Add support for quantized convolution #6335
- CoreML codegen #5634
Operator Coverage
- Add `strided_set` operation (#4303)
- Add support for conv3d (#4400), pool3d (#4478), 3d upsampling ops (#4584)
- Add group convolution for VTA (#4421)
- Add 1d deconvolution op (#4476)
- Allow batch matmul to be fused into injective ops (#4537)
- Add native depthtospace and spacetodepth operators (#4566)
- Add CUDNN conv3d support (#4418)
- Dilation2D operator support #5033
- Isfinite operator #4981
- Unravel Index operator #5082
- Add thrust support for nms #5116
- Resize3d, Upsample3d op support #5633
- Add operator Correlation #5628
- `affine_grid` and `grid_sample` #5657
- Sparse to dense operator #5447
- `Conv3d_transpose` op support added #5737
- add op `crop_and_resize` #4417
- Add bitwise ops #4815
- support dynamic NMS(Non Maximum Suppression), symbolic begin, end, and strides for strided_slice #4312
- ReverseSequence operator #5495
- Conv1D #4639
- 1D Pooling #4663
Quantization
- Channel wise quantization - Quantize & Requantize #4629
- Support QNN ops. #5066
- Adding support for QNN subtract op #5153
- TFLite QNN Tutorial #5595
- Tutorial: Deploy Quantized Model on CUDA #4667
- Support asymmetric per-layer quantized operators #6109
Relay
- Add convertlayout pass in Relay (#4335, #4600)
- Added Merge Composite pass #4771
- Call graph for relay #4922
- Add inline pass #4927
- Target annotation for external codegen #4933
- GradientCell Relay Pass #5039
- Add MergeCompilerRegions pass #5134
- Non-recursive Graph Visitor and Rewriter (#4886)
- [Blocksparse] Pipeline for lowering dense model to sparse-dense (#5377)
- Relay op strategy #4644
- Static Tensor Array (#5103)
- Memory planner (part 1) #5144
- ONNX codegen #5052
- Add Parser 2.0 #5932, part 2 #6162
- Basic block normal form #6152
- Convert Layout pass. #4664
- Pattern Language, Matcher, Rewriter, and Function Partitioner #5231
Runtime and Backend
- Add ADTObject POD container type (#4346)
- TFLite RPC runtime (#4439)
- Standardized graph runtime export (#4532)
- MISRA-C compliant TVM runtime #3934
- Add String container #4628
- Introduce Virtual Memory Allocator to CRT (#5124)
- Initial implementation of Hexagon runtime support (#5252)
- FastRPC interface for Hexagon runtime (#5353)
- CoreML Runtime (#5283)
- AutoTVM + uTVM for Cortex-M7 (#5417)
- Windows Support for cpp_rpc (#4857)
- Implement TVMDSOOp(TensorFlow custom op) for TVM runtime (#4459)
- WebGPU support #5545
- TVM WebAssembly JS Runtime #5506
- Hexagon driver for offloading kernels to simulator #5492
- Introduce runtime::Array #5585
- Allow non-nullable ObjectRef, introduce Optional. (#5314)
- Introduce static slots for common objects. (#5423)
- Introduce RValue reference(move) support to TypedPackedFunc (#5271)
- Introduce MetadataModule to separate code compilation/interpretation and weight initialization #5770
- Support module based interface runtime #5753
- Add TVM application extension with WASM runtime #5892
- Provide guide to users who have difficulty registering SEqualReduce (#5300)
Rust Support
- Revive the Rust + SGX refactor #4976
- Improve Rust bindings: Map, Array, String, various IR nodes #6339
- Rust Refactor Stage 4: Rewrite Rust graph runtime to use new APIs #5830
- Second stage of Rust Refactor #5527
- tvm crate stage 3 of Rust refactor #5769
- Add first stage of updating and rewriting Rust bindings. #5526
TIR
- Introduce StructuralHash for the Unified IR. #5160
- Introduce StructuralEqual Infra for the unified IR. #5154
- Introduce ExprDeepEqual, Remove IRDeepCompare #5206
- [TIR] Introduce BufferLoad/Store (#5205)
- Improved massive build times caused by tir.floormod and tir.floordiv. Fixed Topi testcase. #5666
- Buffer logger assert removed #6147
- Enhance VerifyGPUCode #6194
- HoistIfThenElse added #6066
- Hybrid Script Support for TIR #6227
- Migrate Low-level Passes to Pass Manager #5198
- Block scope hoisting added #6238
TE
- reverse-mode autodiff without any optimization #5121
- Tensor Expression Debug Display (TEDD) #4651
- Optimize and eliminate the Jacobian tensor for te.autodiff #6078
TVMC(Experimental)
- TVMC - A command line driver for TVM (Part 1) #6112
- TVMC - Linting error on onnx command line driver frontend #6536
- TVMC - Command line driver 'compile' (part 2/4) #6302
- TVMC - Introduce 'tune' subcommand (part 3/4) #6537
- TVMC - Introduce 'run' subcommand (part 4/4) #6578
- TVMC - Getting started tutorial for TVMC #6597
Feature Improvement
Accelerator and Microcontroller Support
- Cleanup legacy verilog code (#4576)
- uTVM support for ARM STM32F746XX boards (#4274)
- Add --runtime=c, remove `micro_dev` target, enable LLVM backend #6145
Arithmetic Analysis
- Linear system and equation solver (#5171)
- Inequalities solver #5618
- Improve IntervalSet's floormod (#5367)
- Remove legacy const pattern functions (#5387)
- Handle likely in IRMutatorWithAnalyzer #5665
- ExtendedEuclidean merge impl to int_operator #5625
- Rewrite simplify fix for Vectorized Cooperative Fetching #5924
AutoTVM and Graph Tuner
- Adding ROCM schedules for TOPI (#4507)
- NHWC conv2d schedule templates for ARM (#3859)
- Use VM compile to extract autotvm tasks #4328
- Download fallback schedule file if it does not exist #4671
- Ignore error when removing tmpdir #4781
- Fix a bug in generating the search space #4779
- Minor bug fixes in AutoTVM for QNN graphs #4797
- Fix autotvm customized template #5034
- Add opt out operator for `has_multiple_inputs` for graph tuner #5000
- Customize SI prefix in logging (#5411)
- Update XGBoost verbosity option #5649
- Support range in index based tuners #4870
- Enable random fill and CPU cache flush for AutoTVM and Ansor (#6391)
- Auto-scheduler tutorial for GPU and necessary refactor/fix (#6512)
BYOC
- [BYOC] Bind constant tuples in graph partitioner (#5476)
- [BYOC] Add support for composite functions in BYOC (#5261)
- [BYOC] Register pattern tables from external codegens (#5262)
- [BYOC] Enhance partitioning and external codegen (#5310)
- [BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes (#5277)
- [BYOC] Use Non-Recursive Visitor/Mutator (#5410)
- [BYOC] Refine DNNL Codegen (#5288)
- [BYOC] Add example of Composite + Annotate for DNNL fused op (#5272)
- [BYOC] Prevent duplicate outputs in subgraph Tuple (#5320)
- [BYOC] Introduce further operator support (#6355)
- [BYOC] Support input nodes with multiple entries (#6368)
- [BYOC] Add maximum support for float32 (#6506)
Codegen
Apache TVM (incubating) v0.6.1
Apache TVM (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
Apache TVM (incubating) 0.6.1 is a maintenance release incorporating important bug fixes and important performance improvements. All users of Apache TVM (incubating) 0.6.0 are advised to upgrade. Please review the following release notes to learn about the bug fixes.
Bug Fixes
- Fixed process termination routine in windows #4844
- [Runtime] Fix NDArray SaveDLTensor declaration and implementation signature different #4586
- [NODE][Serialization]fix serialization precision loss in float #4503
- [Relay][Frontend][TF] fix _parse_param bug #4711
- Fix bias_add gradient #4516
- Make sure to visit the arguments of inlined functions #4783
- Fix Python syntax error in start_rpc_server_to_tracker.py #4682
- [Bugfix] Fixed crash caused by reversing bitwise operations #4852
- [Fix][VM] Fix copy constructor #5237
- fix small bug about dense_grad #5695
- [Fix] Fix conv2d alter op for arm cpu #5532
- [Fix] Fix dense x86 schedule #4728
- [Relay][Fix] Fix alter op layout when calling a global var #4454
- [Relay][Pass] Fix lambda lift pass for recursive call #4432
- [BUGFIX] Fix search path for libtvm_topi.so #4467
- [Bugfix] Fix Python debugger segfaults with TVM built with LLVM #5685
- [RUNTIME] Fix compile errors of OpenCL FPGA backend #4492
- [BUGFIX][BACKPORT-0.6][ARITH] Fix FloorMod Simplifier #5509
- Some Windows and MSVC fixes #4569
- [Chisel][VTA] Fix multiple transfer issue in LoadUop module #4442
- [VTA] Fix an issue in updating uop_idx in the TensorGemm module #4694
- [VTA] Fixed a crash issue in TSIM driver #4527
- [VTA] Enable streamlined GEMM execution #4392
- [VTA][Chisel] End-to-end Inference with Chisel VTA #4574
- Added declare of aluBits for TensorAlu #4624
- [Quantization] Fix annotation for multiply op #4458
- LRN only supports 4D tensors, remove it from alter_op_layout #5520
- fix topi.nn.global_pool layout="NHWC" #4656
- [FFI][Windows] Fix hasattr by extracting Python error type from Windows error message #4780
- [Runtime] Export GraphRuntime in tvm_runtime.dll #5002
- Fix Base64OutStream portability issue #4668
- [AUTOTVM] Fix a bug in generating the search space #4779
- [Relay][VM] Fix compilation of If-Elses #5040
- [RELAY][FRONTEND][TENSORFLOW] Fix FuseBatchNorm output cast error if need_cast is True #4894
- [Bugfix] fskip of EliminateCommonSubexpr cannot always return false #4620
- [Fix] Add ConstantNode to IsAtomic #5457
- [Fix] Fix RemoveUnusedFunctions pass #4700
- [Realy][fix] Fix alpha_equal bug for attribute check #4897
- [Arith] keep div_mode during floordiv simplify #5922
- [ARITH][BACKPORT-0.6] fix a min/max simplify bug #5761
- [0.6-BACKPORT] Improve robustness of the docs build #5583
Apache TVM (incubating) v0.6.0
Apache TVM (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
New Features
Relay in Production
Relay is a functional, differentiable programming language designed to be an expressive intermediate representation for machine learning systems. Relay supports algebraic data types, closures, control flow, and recursion, allowing it to directly represent more complex models than computation graph-based IRs (e.g., NNVM) can. In TVM v0.6, Relay is in stable phase and is ready for production.
- Algebraic Data Types (ADT) support (#2442, #2575). ADT provides an expressive, efficient, and safe way to realize recursive computation (e.g., RNN). Refer to https://docs.tvm.ai/langref/relay_adt.html for more information.
- Pass manager for Relay (#2546, #3226, #3234, #3191)
- Most frameworks have been supported in Relay, including ONNX, Keras, Tensorflow, Caffe2, CoreML, NNVMv1, MXNet (#2246).
- Explicitly manifest memory and tensor allocations in Relay. (#3560)
Relay Virtual Machine
The Relay Virtual Machine (Relay VM) is the new generation of runtime that strikes a balance between performance and flexibility when deploying and executing Relay programs. Previously, the graph runtime was able to exploit the fully static nature of the input graphs to perform aggressive optimizations such as fully static allocation and optimal memory reuse. When we introduce models that make use of control flow, recursion, dynamic shapes, and dynamic allocation, we must change how execution works.
Relay VM is now usable and is able to achieve decent performance for a variety of models and targets.
- Design (#2810 #2915) and a first version of implementation (#2889),
- Add VM runtime for Relay and compiler support (#3120, #3121, #2889, #3139)
- Relay VM (pattern matching #3470, port to python #3391, serialization #3647)
- Relay VM Profiler (#3727)
- Support execution on devices for Relay VM (#3678)
- [Relay][VM] Add more passes to VMCompiler (#4058)
- [relay][vm] Separate VM runtime with executable (#4100)
- Port VM, VM compiler, and Object into Python (#3391)
- VM: Add AllocTensor instruction and better instruction printer (#3306)
- [Relay][VM][Interpreter] Enable first-class constructors in VM and interpreter via eta expansion. (#4218)
- [Relay][VM] Clean up the VM and VM profiler code (#4391)
Training
Relay is designed to natively support first-order and higher-order differentiation. The automatic differentiation infrastructure is now usable, and a number of operators with gradient support are available in the v0.6 release.
- Higher order reverse mode automatic differentiation that works with control flow (#2496)
- Higher order continuation passing style (#3456, #3485 )
- Relay gradient registration (clip #3509, max_pool2d and avg_pool2d #3601)
- Relay AD algorithm (#3585)
- Relay Training - allow gradient to return a tuple (#3600), numerical gradient check (#3630)
- Improve AD for concatenate (#3729)
- [Relay][Training] Add missing gradient check to gradient pass (#4169)
- As a part of Relay's automatic differentiation system, we are adding primal gradients for Relay operators. Please refer to #2562 for tracking the progress.
- Gradient for Conv2d (#3636)
- Add gradient operators (#3857, #3894, #3901, #3915)
- Add gradient for log-softmax (#4069)
- [Relay][Training] Add gradient for Crossentropy (#3925)
- [Relay][Training] Add and fix gradients (#4126)
Quantization
Low-bit inference is getting more and more popular as it benefits both performance and storage usage. TVM now supports two types of quantization. 1. Automatic quantization takes a floating-point precision model, performs per-layer calibration, and generates a low-bit model. 2. TVM also imports pre-quantized models from TensorFlow and MXNet; a new dialect, QNN, is introduced to handle further lowering to normal operators.
- Automatic Quantization
- Low-bit automatic quantization supported. (#2116). The workflow includes annotation, calibration and transformation.
- Refactor quantization codebase and fix model accuracy. (#3543)
- KL-divergence-based per-layer calibration. (#3538)
- Add option to select which convolution layers are quantized. (#3173)
- [Relay][Quantize] Integrate data-aware calibration into quantization. (#4295)
- Pre-quantized model support (QNN operators and legalize pass).
- Add a legalize pass to Relay (#3672)
- Qnn Concatenate, quantize, dequantize and requantize operators (#3819, #3730, #3745, #3531)
- QNNtoRelay & QNNLegalize Pass utility (#3838, #3782)
- Requantize: Optimize lowering for some corner cases. (#3864)
- New quantized operator support: conv2d, add, dense (#3580, #3736, #3896, #3910)
- Do type checking for the input and kernel in the qnn conv2d (#3904)
- Legalize and AlterOpLayout for Intel int8. (#3961)
- Renaming tests to follow the Relay nomenclature. (#3975)
- Fix padding changes due to #3739 (#3989)
- Memorizing quantize node mapping to avoid duplicated simulated quantization (#3233)
- Infrastructure to support pre-quantized models (QNN) (#3971).
- [Relay][AlterOp] NHWC to NCHWc support for Pool, concatenate, sum. (#4059)
- [TOPI][x86] Cascade lake support. (#4123)
- [TOPI][x86] Legalize - Support int8xint8 convolution to use VNNI inst (#4196)
- Qnn dequantize with min max using Mxnet flavor to support Mxnet prequantized models. (#3945)
- Improve the lowering of Qnn Dense (#4213)
- Adding support for dequantizing from int32 to float32. (#4130)
- [QNN] Refactor fixed point multiplicat...