Releases: apache/tvm
Apache TVM v0.13.0
Introduction
The TVM community has worked since the v0.12.0 release to deliver the following exciting improvements! The main tags are below (bold text indicates areas with substantial progress):
- Community, RFC;
- Frontend: TensorFlow/TFLite, Pytorch/Torch, Paddle, keras;
- Runtime: Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, Vulkan, Hexagon, Metal, others about runtime;
- Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule;
- microTVM, AOT, TVMC, LLVM;
- CI, BugFix, Docs, Docker, Misc;
Please visit the full listing of commits for a complete view: v0.12.0...v0.13.0.
Community
- #15086 - Aleksei-grovety -> Reviewer
- #14676 - Jiajun Jiang -> Reviewer
- #14677 - Qiang Zhang -> Reviewer
- #14622 - Sunghyun Park -> Reviewer
- #14578 - Zihao Ye -> Committer
- #14853 - Anirudh Sundar Subramaniam -> Committer
- #14772 - Add new key for release signing
RFC
Frontend
- #14830 - Use f-strings for string formatting, NFC
- Keras
- #15122 - [Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute
- #15107 - [Relay][Keras] Fix a wrong variable name in keras frontend
- #15053 - [Relay][Keras] Fix the wrong implementation logic about cropping2D
- #15082 - [Relay][Keras] Fix UpSampling2D about the wrong assertion about size
- #15060 - [Relay][keras] Fix the bug about the attribute 'output_padding' in Deconv
- #14707 - [Keras] Fix a bug with the alpha attribute in LeakyReLU which led to a pass conflict
- #15175 - [Relay][Keras] Fix concatenate convert function in axis parsing
- Paddle
- #14801 - [Paddle] [PaddlePaddle Hackathon 4]add attribute support for gaussian_random/softplus/Conv3d/Conv2d
- #14973 - [Paddle] [PaddlePaddle Hackathon 4] add convert support for tanhshrink/pool3d/set_value ops for paddle frontend
- #14826 - [Paddle] [PaddlePaddle Hackathon 4] add convert support for p_norm/roi_align/softmax_with_cross_entropy
- #14575 - [Paddle] [PaddlePaddle Hackathon 4]add attribute support for dropout/hard_sigmoid/pixel_shuffle
- TFLite
- TensorFlow
- #14546 - [Tensorflow] Fix conv2d_transpose for NHWC layout
- PyTorch
- ONNX
- #15017 - [ONNX] Fix bug in scatter_elements
Runtime
- #15182 - Add weak symbol to builtin fp16
- #15161 - Clean TVM stacktrace in error messages
- #15162 - Support void as dtype in FFI
- #14902 - Update Module and Registry to use String Container
- #14967 - [Runtime,RPC] Use f-strings for string formatting, NFC
- #14887 - Make systemlib unique per prefix
- #14775 - Added `__str__` for tvm._ffi.runtime_ctypes.TVMArray
- #14656 - Fix Can't "query_imports" Bug of VM Executable
Adreno
CMSIS-NN
- #15059 - Update CMSIS-NN release to v4.1.0
OpenCL & CLML
- #14972 - [OPENCL] Always use convert_T for type conversion
- #14995 - [OpenCL] Improve diagnostic message
- #14833 - [Codegen][OpenCL] Fix ambiguous selection operator call
- #14792 - [OpenCL] Refactor OpenCL runtime to support SPIRV binary ingestion
- #14922 - [OpenCLML] Refactor and introduce on-chip memory and memory planner
- #14949 - [CodegenC] Updated unit test for sorted CodegenC output
- #14767 - [OpenCLML] Transposed convolution support and other fixes
CUDA & CUTLASS & TensorRT
- #14751 - [CUDA] Fixed the call of the min function in the schedule for cuda
- #14798 - [CUTLASS] Add NDEBUG option to CUTLASS compile to speed up attention kernel
- #14782 - [Bugfix][Codegen][CUDA] Wrong casting in ASM
Metal
- #14962 - Fix int8 vectorized cast
- #14846 - Fix vectorized select
- #14727 - Update metal runtime to directly store kernel map
- #14671 - Fix flaky memory issue due to racing
Vulkan
Hexagon
- #14997 - Remove "c" as aot_host_target tvm/contrib/hexagon/pytest_pl…
- #14948 - Update instructions to compile hexagon runtime
- #14965 - Add support for v73, make v68 default
- #14720 - [TIR] Add get_vtcm_allocation_sizes with lowering
- #14567 - [TIR] Use the "target" value in T.func_attr for VTCM limit
ROCm
- #15106 - [TensorIR]AMD Matrix Core Support
- #15088 - [Target]Replace rocm arch parsing from int to string
microTVM
- #14872 - Use self.close_transport() on error
AOT
- #15033 - Avoid Var-to-Var Let binding in AOTExecutorCodegen
- #15032 - Remove duplication in tvm.testing.aot.compile_models
- #14529 - Fix warning on dropping const in TVMAotExecutor_GetInputName
microNPU
- #15159 - [microNPU][ETHOSU] Fix compiler attributes types
- #15147 - [microNPU][ETHOSU] Add option to disable copying constants for case without cascader
- #15069 - [microNPU][ETHOSU] Fix SoftMax legalization parameters
- #15115 - [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers
- #15114 - [microNPU] Upgrade Vela to v3.8.0
- #15104 - [microNPU][ETHOSU] Fix minimum buffer size
- #15063 - [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments
- #14861 - [microNPU][ETHOSU] Add offloading to the NPU the nn.avg_pool2d operator with a stride > 3
- #14765 - [microNPU][ETHOSU] Channel pad offloaded to NPU
- #14774 - [microNPU][ETHOSU] Fix Softmax quantization parameters
- #14629 - [microNPU][ETHOSU] Softmax int8 legalization support
- #14353 - [microNPU] Add support for MEAN with uint8 ifm
- #14587 - [microNPU] Fix skip tests when Vela is not present
- #14464 - [microNPU][ETHOSU] Add restrictions to convert to NHCWB16 layout in LayoutOptimization pass
BYOC
Re...
Apache TVM v0.12.0
Introduction
The TVM community has worked since the v0.11.1 release to deliver the following exciting improvements! The main tags are below (bold text indicates areas with substantial progress):
- Community, RFC;
- Runtime: ACL(ArmComputeLibrary), Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, CRT, Hexagon, Metal, Web & WASM, others about runtime;
- Frontend: TensorFlow/tflite, Pytorch/Torch, Paddle, OneFlow, keras;
- TE, Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule, Schedule;
- CI, Tests, BugFix, Docs, Docker, Build;
- Android, microTVM, Target, AutoTVM, AOT, LLVM.
Please visit the full listing of commits for a complete view: v0.11.1...v0.12.0.
Thanks to @ysh329 for the great effort on the release process as the release manager.
Community
- Reviewer
- Committer
- PMC
RFC
- [RFC] Introduce PresburgerSet (#99) (e17994b)
- [RFC] Further Unify Packed and Object in TVM Runtime (#97) (d646a22)
Runtime
ArmComputeLibrary
- [ACL][TESTING] Use pytest.mark.parametrize in ACL conv2d tests
- [ACL] Prevent offloading of per-channel quantized operators
- [CL] Update Compute Library from v22.11 to v23.02.1
Adreno
- [Adreno] Extend pack_filter for HWIO layout
- [Adreno] Update interface of AnnotateMemoryScope pass
- [Adreno] Optimize reduction schedule
- [BENCHMARK][ADRENO] Adreno Benchmarks with texture
- [BENCHMARKS][CLML] Adreno benchmarks with CLML BYOC path added
- [BENCHMARKS][ADRENO] Documentation for Adreno (Texture) benchmarks
- [DOCS][ADRENO] Improved Adreno documentation
OpenCL & CLML
- OpenCL
- CLML
- [CLML][RUNTIME] Enable more ops in CLML runtime
- [CLML][RELAY] Enable Pad and Conv2d layer fusion
- [CLML][CODEGEN] CLML native codegen utility
- [CLML] Version compatibility and various test cases
- [CLML] Changes corresponding to OpenCL workspace refactorization
- [RUNTIME][CLML] OpenCLML tuning and profiling enhanced
ROCm
CMSIS-NN
- [CMSIS-NN] Global function that provides range based on dtype
- [CMSIS-NN] Add int16 add and mul operator support
- [CMSIS-NN] Add a runtime error message
- [CMSIS-NN] Reduction in code size of AOT test runner binary
- [CMSIS-NN] Remove support for the old CMSIS NN project
- [CMSIS-NN] Support CMSIS NN from new GitHub location
- [CMSIS-NN] Add Cortex-M85 support
CUDA & CUTLASS & TensorRT
- [CUDA][Schedule] Better Layout Transform Schedules
- [Profiler] Allow user to flush L2 cache in `time_evaluator` function for profiling CUDA kernels
- [Codegen][CUDA] Add error message for missing fragment info
- [CUTLASS][Ansor] Combine CUTLASS and Ansor
- [TensorRT] Fix BiasAdd with correct axis attribute
- [TRT][BYOC] allow strided_slice ops on selected dimensions (#14142)
Ethosn
- [ETHOSN] Update driver stack version to 22.11
- [ETHOSN] Support for addition with constant input
- [ETHOSN] Apply FoldConstant before NPU partitioning
- [ETHOSN] Remove support for NPU driver 22.08
- [ETHOSN] Fix for the mock inference after NPU driver update
- [ETHOSN] Remove requantize dependency on resize
- [ETHOSN] Add support for experimental compiler option
CRT
- [CRT] USE CMake for CRT standalone libraries
- [CRT][microTVM] Enable USMP by default for AoTExecutor + CRT runtime
- [CRT] Cleanup unused macros in crt_config.h.template
Hexagon
- [Hexagon][TOPI] Use IndexMap axis separator instead of TE
- [Hexagon] Add concept of DMA groups
- [Hexagon] Improve cache management strategy for HexagonBuffer
- [Hexagon] Denote DMA cache bypass as experimental feature
- [Hexagon] Adapt some intrinsics for high vector lanes
- Hexagon compilation on MacOS system
- [Hexagon] Enable depthwise conv2d NHWC with an HWIO kernel layout
- [Hexagon][QNN] Improve performance w/o QNN canonicalization
- [Hexagon][Metaschedule] Add timeout_sec arg to get_hexagon_local_builder
- [Hexagon] Fix deprecated call for data layout size in bits
- [Hexagon] Allow scalar tensors to have null shape during allocation
- [Hexagon][runtime] Make HexagonThreadManager::CheckSemaphore thread safe
- [Hexagon] Float and quantized dense operators with schedules
- [Hexagon][CI] Updated sha for builder LLVM
- [Hexagon][CI] Update the docker image ID to reflect newer LLVM
- [Hexagon] Switch from default_rng to random in Hexagon tests
- [Hexagon] Add hexagon user DMA intrins for tensorization
- [hexagon] Hexagon inference fix
Metal
- [METAL][CODEGEN] testcase for ramp codegen
- [CODEGEN][METAL] Fix unaligned vector load
- [CODEGEN][METAL] Fix ramp codegen
MicroNPU
- [microNPU] Sum legalization support
- [microNPU] Add rescale parameters for binary elementwise
- [microNPU] Add hardware constraints for binary elementwise
- [microNPU] Add support for TFLite PAD
- [microNPU] Upgrade Vela to v3.7.0
- [microNPU] Merge LUT activation with binary elementwise operation
- [microNPU] Upgrade to 22.08 version of Arm(R) Ethos(TM)-U NPU drivers
- [microNPU] Add relu6 relu_n1_to_1 test cases for Ethos-U
- [microNPU] Add a legalization test for TFLite PAD
- [microNPU] Disable copying weights to SRAM for FullyConnected ops in CopyConstants scheduler (https://github.com/ap...
Apache TVM v0.11.1
Apache TVM v0.11.0
Introduction
The TVM community has worked since the v0.10.0 release to deliver the following new exciting improvements!
- Metaschedule
  - Tuning API improvements and anchor-block tuning
- TVMScript metaprogramming
  - Lots of progress with TVMScript, with the introduction of a core parser, AST, Evaluator, Source and diagnostics
And many other general improvements to microTVM, code quality, CI, frontends, and more! Please visit the full listing of commits for a complete view: v0.10.0...v0.11.0.
RFCs
These RFCs have been merged in apache/tvm-rfcs since the last release.
What's Changed
Note that this list is not comprehensive of all PRs and discussions since v0.10. Please visit the full listing of commits for a complete view: v0.10.0...v0.11.0.
Adreno
- [Adreno] Add global pooling schedule (#13573)
- [Adreno] Add documentation for Adreno deployment (#13393)
- [Adreno] Fix mem_scope annotations for prim funcs having several heads (#13153)
- [Adreno] Adapt reduction schedule for adreno (#13100)
- [Adreno] Fix winograd accuracy (#13117)
- [Adreno][Textures] Fix static memory planner (#13253)
- [DOCKER][Adreno] Docker infra for Adreno target with CLML support (#12833)
AoT
- [AOT] Add CreateExecutorMetadata analysis pass (#13250)
- [AOT] Add CreateFunctionMetadata analysis pass (#13095)
- [AOT] Sanitize input/output name in runtime (#13046)
Arith
- [Arith] Add internal NarrowPredicateExpression utility (#13041)
- [Arith] Optional rewriting and simplification into AND of ORs (#12972)
arm
- [bfloat16] Fixed dtype conversion in the arm_cpu injective schedule (#13417)
AutoTVM
- [AutoTVM] Introducing multi_filter into ConfigSpace autotvm (#12545)
Build
- [BUILD] Re-enable ccache by default (#12839)
CI
- [ci] Fix docs deploy (#13570)
- [ci] Split Jenkinsfile into platform-specific jobs (#13300)
- [ci] Dis-allow any non-S3 URLs in CI (#13283)
- [ci] Split out C++ unittests (#13335)
- [CI] Separate the ci scripts into Github and Jenkins scripts (#13368)
- [ci] Assert some tests are not skipped in the CI (#12915)
- [ci] Ignore JUnit upload failures (#13142)
- [ci] Lint for trailing newlines and spaces (#13058)
- [ci] Template build steps (#12983)
- [ci][docker] Allow usage of ECR images in PRs (#13590)
- [ci][docker] Read docker image tags during CI runs (#13572)
- [ci][wasm] Add package-lock.json to git (#13505)
CL
- [ACL] Enable int8 data type in pooling operators (#13488)
CMSIS-NN
- [CMSIS-NN] Support for int16 conv2d (#12950)
- [CMSIS-NN] Support for int16 in fully connected layer (#13484)
DNNL
- [AMP] refine AMP and the corresponding tests for bfloat16 (#12787)
Docker
- [Docker] Refactor timezone script and NRF installation (#13342)
Docs
- [docs] Fix empty code blocks in tutorials (#13188)
Ethos-N
- [ETHOSN] Consolidate target string usage (#13159)
- [ETHOSN] Throw error message when inference fails (#13022)
- [ETHOSN] Inline non-compute-intensive partitions (#13092)
- [ETHOSN] Transpose fully connected weights (#12970)
- [ETHOSN] Support conversion of add/mul to requantize where possible (#12887)
Frontend
- [TFLite] Enable int64 biases for int16 quantized operators (#12042)
Hexagon
- [Hexagon] Add HVX quant conv2d implementation (#13256)
- [Hexagon] Add test to show scheduling of resnet50 with async dma pipe… (#13352)
- [Hexagon] Enable Hexagon User DMA bypass mode (#13381)
- [Hexagon] Lint tests part 2 (#13271)
- [Hexagon] Add pylint on tests (#13233)
- [Hexagon] Add E2E test demonstrating how to apply blocked layout schedule to conv2d via metaschedule (#13180)
- [Hexagon] Add a test to show how to use multi input async dma pipelin… (#13110)
- [Hexagon]: Add upload function to hexagon session (#13161)
- [Hexagon] Add support for instrumentation based profiling for Hexagon (#12971)
- [Hexagon] Add power manager (#13162)
- [Hexagon] Add scripts for e2e MetaSchedule tuning demonstration (#13135)
- [Hexagon] Add feature to copy logcat to --hexagon-debug and add new --sysmon-profile option to run sysmon profiler during the test (#13107)
- [Hexagon] Async DMA pipelining test suite (#13005)
- [Hexagon] Enable multi input Async DMA; same queue / stage (#13037)
- [Hexagon] Do not use `target` test fixture in Hexagon tests (#12981)
- [Hexagon] 3-stage pipeline; multi queue async DMA for cache read / write (#12954)
- [Hexagon] vrmpy tensorization for e2e compilation of int8 models (#12911)
- [Hexagon] Support template-free meta schedule tuning (#12854)
- [Hexagon] depth_to_space slice op (#12669)
- [Hexagon] Make allocate_hexagon_array a hexagon contrib API (#13336)
- [Hexagon] Add fix for vtcm allocation searches (#13197)
- [MetaSchedule][Hexagon] Add postproc for verifying VTCM usage (#13538)
- [Hexagon][QNN] Add TOPI strategies for qnn ops mul/tanh/subtract (#13416)
- [Logging][Hexagon] Improve logging on Hexagon (#13072)
- [Hexagon] [runtime] Per-thread hardware resource management (#13181)
- [Hexagon] [runtime] Create objects to manage thread hardware resources (#13111)
- [QNN][Hexagon] Disable QNN canonicalization pass (#12398)
- [Hexagon] [runtime] Manage RPC and runtime buffers separately (#13028)
- [Hexagon] [runtime] VTCM Allocator (#12947)
- [TOPI][Hexagon] Add schedule and test for maxpool uint8 layout (#12826)
- [TOPI][Hexagon] Implement quantize op for hexagon (#12820)
- [Meta Schedule][XGBoost] Update the custom callback function of xgboost in meta schedule (#12141)
- [TIR] [Hexagon] Add vdmpy intrinsic and transform_layout for tests (#13557)
- [Hexagon] [runtime] Support VTCM alignments of 128 or 2k (#12999)
- [HEXAGON][QHL] Clipping the inputs of HVX version of QHL Sigmoid operation (#12919)
- [Hexagon] [runtime] Add user DMA to device API resource management (#12918)
LLVM
- [LLVM] Emit fp16/fp32 builtins directly into target module (#12877)
- [LLVM] Switch to using New Pass Manager (NPM) with LLVM 16+ (#13515)
MetaSchedule
- [MetaSchedule] Make `MultiLevelTiling` apply condition customizable (#13535)
- [MetaSchedule] Enhance Database Validation Script (#13459)
- [MetaSchedule] Fix Dynamic Loop from AutoBinding (#13421)
- [MetaSchedule] Support schedules with cache read in RewriteLayout (#13384)
- [MetaSchedule] Improve inlining and `VerifyGPUCode` for quantized model workload (#13334)
- [MetaSchedule] Add JSON Database Validation Scripts (#12948)
- [MetaSchedule] Fix the order of applying `AutoInline` in `ScheduleUsingAnchorTrace` (#13329)
- [MetaSchedule] Refactor ScheduleRule Attributes (#13195)
- [MetaSchedule] Improve the script for TorchBench model tuning & benchmarking (#13255)
- [MetaSchedule] Enable anchor-block tuning (#13206)
- [MetaSchedule] Introduce a variant of ModuleEquality to enable ignoring NDArray raw data (#13091)
- [MetaSchedule] Consolidate module hashing and equality testing (#13050)
- [MetaSchedule] Support RewriteLayout postproc on AllocateConst (#12991)
- [MetaSchedule] Tuning API cleanup & ergonomics (#12895)
- [MetaSchedule] Fix XGBoost Import Issue (#12936)
- [MetaSchedule] Add Script for TorchBench Model Tuning & Benchmarking (#12914)
- [MetaSchedule] Restore `num_threads` parameter in tuning API (#13561)
- [MetaSchedule] TorchBench tuning script: add option to disallow operators in sub graph (#13453)
- [MetaSchedule] Fix segfault in gradient based scheduler (#13399)
- [MetaSchedule] Add `from-target` Defaults for x86 VNNI Targets (#13383)
- [MetaSchedule] Fix Task Hanging in EvolutionarySearch (#13246)
- [MetaSchedule] Allow skipping exact NDArray rewrite in RemoveWeightLayoutRewriteBlock (#13052)
- [MetaSchedule][UX] Support Interactive Performance Table Printing in Notebook (#13006)
- [MetaSchedule][UX] User Interface for Jupyter Notebook (#12866)
microNPU
- [microNPU] Upgrade Vela to v3.5.0 (#13394)
- [microNPU] Fixed MergeConstants pass on striped networks (#13281)
microTVM
- [microTVM] Modernize Arm Cortex-M convolution schedules (#13242)
- [microTVM] Improve code reuse in Corstone300 conv2d tests (#13051)
- [microTVM] Add Cortex-M DSP schedules for optimal conv2d layouts (#12969)
- [microTVM] Use default Project Options in template projects and add Makefile for Arduino template project (#12818)
- [microTVM] Generalize depthwise_conv2d schedule (#12856)
- [microTVM] add the option to open a saved micro project for debugging (#12495)
- Added macro generation in MLF export (#12789)
- [microTVM][Arduino] Add `serial_number` to project options and tests (#13518)
- [microTVM][Zephyr] Add 'serial_number' option (#13377)
- [microTVM][PyTorch][Tutorial] Adding a PyTorch tutorial for microTVM with CRT (#13324)
Misc
- [CodegenC] Explicit forward function declarations (#13522)
- [FQ2I] Support converting `dense -> add` to `qnn.dense -> add -> requantize` (#13578)
- [Minor][Testing] Consolidate IRs into corresponding functions (#13339)
- Add recursive on loop with marked kUnrolled (#13536)
- Skip stride check if shape is 1 in IsContiguous (#13121)
- [TEST] CPU feature detection for x86 and ARM dot product instructions (#12980)
- [Node] Expose StructuralEqual/Hash handler implementation...
Apache TVM v0.10.0
Introduction
The TVM community has worked since the v0.9 release to deliver the following exciting new improvements!
- Metaschedule
  - Software pipelining and padding for irregular shapes for auto tensorization
  - Stabilized and polished user interfaces (e.g. `database` changes, `tune_relay`)
  - A new MLP-based cost model
- TIR
  - New schedule primitive for `PadEinsum`
  - A new TIR node: `DeclBuffer`
  - INT8 intrinsics for TensorCores for CUDA!
- microTVM
  - Improved schedule primitives for Arm v8-M ISA
And many other general improvements to code quality, TVMScript, and more! Please visit the full listing of commits for a complete view: v0.9.0...v0.10.0rc0.
RFCs
These RFCs have been merged in apache/tvm-rfcs since the last release.
What's Changed
Please visit the full listing of commits for a complete view: v0.9.0...v0.10.0rc0.
Note that this list is not comprehensive of all PRs and discussions since v0.9. A non-truncated summary can be found here: #12979
TIR
- #12720 - [TIR] Implement API for padded layout transformations
- #12797 - [TIR] Construct the inverse in SuggestIndexMap
- #12827 - [TIR] Support pattern matching argmax/argmin generated by TOPI
- #12750 - [TIR, Schedule] Add schedule primitive PadEinsum
- #11639 - [TIR][Meta-Schedule] Tuple-reduction scheduling support
- #12515 - [TIR][Arith] Add more strict checking in imm construction and folding.
- #12717 - [TIR, Schedule] Check consumer in-bound and covered in reverse_compute_inline
- #12652 - [TIR] Handle axis_separators during FlattenBuffer
- #12623 - [TIR] Expose MMA-related PTX builtins
- #12607 - [TIR][Schedule] enhance compute_at and reverse_compute_at primitive to choose possible position
...
Apache TVM v0.9.0
Introduction
The TVM community has worked since the v0.8 release to deliver many exciting features and improvements. v0.9.0 is the first release on the new quarterly release schedule and includes many highlights, such as:
- MetaSchedule's full implementation
- ARM cascading scheduler for Arm Ethos(TM)-U NPUs
- Collage which brings tuning to BYOC
- Several microTVM improvements
- New `tvm.relay.build` parameters: `runtime=`, `executor=`
- AOT
  - Support for the C++ runtime (with `llvm` and `c` targets only) and support for host-driven AOT in the C runtime
- Hexagon RPC support
  - Testing via Hexagon SDK simulator and on device via Snapdragon-based HDK boards and phones
  - AOT and USMP support
  - Threading
  - Initial op support
- MLF
  - Support for multiple modules in a single MLF artifact
- Several TIR schedule primitives and transforms including (abridged):
  - `schedule.transform_layout` - Applies a layout transformation to a buffer as specified by an IndexMap.
  - `schedule.transform_block_layout` - Applies a schedule transformation to a block as specified by an IndexMap.
  - `schedule.set_axis_separators` - Sets axis separators in a buffer to lower to multi-dimensional memory (e.g. texture memory).
  - `transform.InjectSoftwarePipeline` - Transforms an annotated loop nest into a pipeline prologue, body, and epilogue where producers and consumers are overlapped.
  - `transform.CommonSubexprElimTIR` - Implements common-subexpression elimination for TIR.
  - `transform.InjectPTXAsyncCopy` - Rewrites global-to-shared memory copies in CUDA with async copy when annotated with `tir::attr::async_scope`.
  - `transform.LowerCrossThreadReduction` - Enables support for reductions across threads on GPUs.
- And many more! See the list of RFCs and PRs included in v0.9.0 for a complete view, as well as the full change list.
RFCs
These RFCs have been merged in apache/tvm-rfcs since the last release.
- [RFC] TUNIP: TVMScript Unified Printer (#74) (48d47c5)
- [RFC][Backend] RFC-CSI-NN2-Integration (#75) (cfcf114)
- [RFC] Introducing DeclBuffer (#70) (87ff1fa)
- [RFC][MLF] Model Library Format with Multiple Modules (#76) (f47c6ad)
- [RFC] UMA Universal Modular Accelerator Interface (#60) (6990e13)
- [RFC] DietCode: An Auto-Scheduler for Dynamic Tensor Programs (#72) (a518000)
- [RFC] Quarterly Releases (#67) (70293c7)
- RFC-BYOC-DNNL-Integration (#73) (7aed0ca)
- [RFC] Relay Next Roadmap (#69) (ac15f2a)
- RFC: clarifying buffer declaration and access (#63) (de4fe97)
- Inclusive Language RFC (#68) (4203bd2)
- [USMP] Adding U4 usecase (#65) (b9e246f)
- Collage RFC (#62) (23250f5)
- Replace codeowners with more relevant automation (#58) (540c1f8)
- [RFC][TIR] Layout transformations on buffer access (#39) (b675ef8)
- Module Based Model Runtime for AOT (#46) (d9dd6eb)
- @slow test RFC (#55) (9b6203a)
- [RFC][Roadmap] TVM Continuous Integration & Testing Roadmap (#54) (41e5ba0)
- Bring `PackedFunc` into TVM Object System (#51) (2e0de6c)
- [RFC][OpenCLML] OpenCLML integration as BYOC (#52) (f5ef65f)
- Introduce the Arm(R) Ethos(TM)-U Cascading Scheduler (#37) (f9fa824)
- [RFC][Roadmap] microTVM roadmap (#53) (1b14456)
- Add Managed Jenkins Infrastructure for TVM RFC (#49) (a3a7d2c)
- TVM Roadmap RFC (#50) (263335f)
- [RFC] Integrate LIBXSMM with TVM (#47) (1a3d4f1)
- [RELAY][AST] Add virtual device as a first class field to Relay expressions (#45) (67c39d2)
What's Changed
Note that this list is not comprehensive of all PRs and discussions since v0.8. Please visit the full listing of commits for a complete view: v0.8.0...v0.9.0.rc0.
AOT
- #11208 - Calculate used memory at the callsite of primitive functions
- #11365 - Fix function number datatype from char to uint16_t
- #11091 - Enable A-Normal Form in the AOT executor
- #10753 - Support LLVM backend with C++ runtime
- #10518 - Use python temporary directory for AOT tests
- #10337 - BugFix of workspace calculation
- #10282 - [runtime] Add Metadata classes for AOTExecutor
- #9501 - [3/3][DeviceAPI] Wire up cpacked Device API context
- #9500 - [2/3][DeviceAPI] Add Hooks for Activate/Deactivate/Open/Close
- #9395 - [1/3][DeviceAPI] Connecting devices structure to relevant operators
BYOC
- #11474 - Two helper passes for external codegen using RelayToTIR custom pass machinery
- #11144 - Remove support for run-time linked-params from codegen
- #10590 - Add order to functions in C Codegen
- #11638 - [DNNL][CBLAS] Unifies all MKLDNN/DNNL to DNNL
- #11619 - RelayToTIR custom codegen passes can still depend on dynamic shape functions
- DNNL - #11902, #11642, #11513, #11571, #11560, #11345, #11111, #10837, #10421, #9995, #9797
- TensorRT - #11923, #11203, #10759, #10772, #10388
- CMSIS-NN - #11732, #11625, #10939, #11013, #10817, #10563, #10224, #10148, #10100, #9338, #9531, #9409, #9331
- OpenCLML - #10243
- CUTLASS - #11631, #10185, #10177, #10110, #10036, #9899, #9820, #9800, #9795, #9746, #9737, #9698, #95...
Apache TVM v0.8 Release Note
Overview
Apache TVM v0.8 brings several major exciting experimental features, including:
- PaddlePaddle frontend
- TVMScript: round-trippable python-based syntax for TIR
- TorchScript integration
- TensorIR scheduling language
- TensorRT and CUTLASS integration via BYOC
- Int4 TensorCore support in AutoTVM
- MicroTVM Project API and Zephyr, Arduino support
- AOT executor
- Robust Windows support
- Affine analysis infra: iter-affine-map
- Improved Vulkan backend
- CUDA graph support in TVM runtime
Besides, the community has been working together to refactor and evolve the existing infrastructure, including but not limited to:
- Relay compilation engine
- Relay pattern language
- CI and build process
- Refactoring documentation and tutorials
- Stabilizing AutoScheduler
- Stabilizing TVMC command line driver interface
- Stabilizing target system
- Frontend coverage, quantization, dynamic shape, training
Full changelog: https://gist.github.com/junrushao1994/c669905dbc41edc2e691316df49d8562.
Accepted RFCs
The community has adopted a formal RFC process. Below is a list of the formal RFCs accepted by the community since then:
- [RFC-0005] Meta schedule (AutoTIR)
- [RFC-0006] Automatic mixed-precision pass and support
- [RFC-0007] Parametrized unit tests
- [RFC-0008] MicroTVM Project API
- [RFC-0009] Unified static memory planner
- [RFC-0010] Target-registered compiler flow customisation
- [RFC-0011] Arm® Ethos-U integration
- [RFC-0014] Pipeline executor
- [RFC-0015] Use CMSIS-NN with TVM
- [RFC-0019] Add PaddlePaddle frontend
- [RFC-0020] Extend metadata in project option
- [RFC-0022] TIR non-scalar constants
- [RFC-0023] Adding annotation field to `tir.allocate` nodes
- [RFC-0025] PyTorchTVM
- [RFC-0027] Formalize TVM documentation organization
- [RFC-0028] Command line composition from internal registry
- [RFC-0029] Migrating target attributes to IRModule
- [RFC-0030] Command line configuration files
- [RFC-0031] C Device API
- [RFC-0036] TVMScript namespace
- [RFC-0041] Update TVMScript block syntax
Features and Improvements
TE, TIR, TVMScript
- TVMScript parser and printer #7630 #9115 #9286
- Scheduleable TIR (S-TIR) infrastructure, analysis and lowering passes #7553 #7765 #7847 #8114 #8121 #7873 #7923 #7962 #7848 #8044 #7806
- S-TIR schedule primitives: `compute-inline`, `reverse-compute-inline`, `fuse`, `split`, `rfactor`, `storage-align`, `vectorize`, `unroll`, `bind`, `reorder`, `cache-read`, `cache-write`, `compute-at`, `reverse-compute-at`, `decompose-reduction` #8170 #8467 #8544 #8693 #8716 #8767 #8863 #8943 #9041
- While loop in TIR #7425 #9004
- Metaprogramming in S-TIR via `specialize` #8354
- Support Return value in TIR #7084 #7932
- Storage scope support in `PointerType` #8017 #8366 #8463
- Creation of S-TIR via TE compute #7987
AutoTVM, AutoScheduler, Meta Schedule
- PopenPoolExecutor replaces the native Python multiprocessing library to provide better multiprocessing support as well as to enable auto-tuning in Jupyter notebooks for AutoTVM and AutoScheduler #6959 #8492 #8913 #8820 #8851
- AutoScheduler improvement and stabilization: task scheduler, layout rewrite, early stopping, dispatching #6945 #6750 #6987 #7156 #8862 #8995 #7571 #7376 #7377 #7344 #7185
- AutoScheduler support for sparse workloads #7313 #7635 #8065
- AutoScheduler support for Vulkan, ROCm, Mali #7626 #7038 #7132
- AutoTVM support for int4 TensorCore #7831 #8402
- Meta Schedule core infrastructure, builder runner and database #8615 #8623 #8642 #8817 #9079 #9132 #9154 #9053 #9059 #9044 #9111 #9061 #9153
Operator Coverage
- Operators for Int-8 vision transformer on GPU #7814
- Optimizing NMS and ROI-related kernel on GPU #7257 #7172 #7136 #7796 #7463 #6516 #7440 #7666 #8174
- Support and optimize sparse operators #8605 #7477 #7435 #6889 #6580 #8437
- Sort-related operators and optimization #9184 #7669 #8672 #7611 #7195 #7056 #6978
- Support for einsum operator #6370
- Matmul, dense operators and their optimization #8921 #8527 #8234 #8250 #6616 #8229 #8401 #7404 #8669
- Convolution and pooling operators and their optimization #8620 #8936 #8584 #7075 #7142 #7515 #6999 #6899 #6840 #6137 #6802 #6445 [#671...
Apache TVM (incubating) v0.7.0
Apache TVM (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
Introduction
v0.7 brings many major features. The community worked together to refactor the internal code base to bring a unified IR code structure with a unified IRModule, type system, and pass infrastructure. We have also brought many exciting new features; some highlights include:
- Initial automatic scheduling support
- Initial command line driver interface
- WebGPU and webassembly support
- Better first class rust support in the codebase
- Initial Hexagon support
- Bring your own codegen (BYOC) support
The community also continues to bring high quality improvements to the existing modules including, but not limited to: better frontend coverage, performance, quantization, uTVM and dynamic shape support.
New Features
Automatic Scheduling (Experimental)
- Phase 0: Ansor minimum system for auto schedule generating #5962
- Phase 1: Access Analyzer #6103
- Phase 1: Add `follow_split` and `follow_fused_split` steps #6142
- Phase 1: Add `pragma`/`storage_align`/`rfactor` steps #6141
- Phase 1: Add RPC Runner #6077
- Phase 1: Add `annotation`/`compute_at`/`compute_root`/`compute_inline` steps #6073
- Phase 1: Add `cache_read`/`cache_write` steps #6107
- Phase 1: Rename namespace from `auto_schedule` to `auto_scheduler` #6059
- Phase 1: The base class for cost models #6187
- Phase 1: feature extraction for cost models #6190
- Phase 1: XGBoost Cost Model #6270
- Phase 2: Basic GPU Sketch Search Policy #6269
- Phase 2: Evolutionary Search #6310
- Phase 2: Update heavy operations with `parallel_for` #6348
- Parallel the InitPopulation (#6512)
- Tutorial: Using the template-free auto-scheduler on CPU (#6488)
BYOC
- External codegen support in Relay (#4482),(#4544)
- Bring Your Own Codegen Guide -- Part 1 #4602
- Bring Your Own Codegen Guide -- Part 2 #4718
- Relay annotation and partitioning for external compilers #4570
- JSON Runtime with DNNL End-to-End Flow #5919
- Handle one symbol for each runtime #5989
- Run accelerator specific optimizations #6068
- Arm Compute Library integration #5915
- Retire the example json runtime #6177
- `json_node.h` should include `data_type.h` #6224
- Improve installation tutorial #6170
- Add support for dense (fully connected) layer #6254
- Introduce the Ethos-N BYOC integration #6222
- Enable remote device via environment variables #6279
- Improved pooling support #6248
- Add support for quantized convolution #6335
- CoreML codegen #5634
Operator Coverage
- Add `strided_set` operation (#4303)
- Add support for conv3d (#4400), pool3d (#4478), 3d upsampling ops (#4584)
- Add group convolution for VTA (#4421)
- Add 1d deconvolution op (#4476)
- Allow batch matmul to be fused into injective ops (#4537)
- Add native depthtospace and spacetodepth operators (#4566)
- Add CUDNN conv3d support (#4418)
- Dilation2D operator support #5033
- Isfinite operator #4981
- Unravel Index operator #5082
- Add thrust support for nms #5116
- Resize3d, Upsample3d op support #5633
- Add operator Correlation #5628
- `affine_grid` and `grid_sample` #5657
- Sparse to dense operator #5447
- `Conv3d_transpose` op support added #5737
- add op `crop_and_resize` #4417
- Add bitwise ops #4815
- support dynamic NMS(Non Maximum Suppression), symbolic begin, end, and strides for strided_slice #4312
- ReverseSequence operator #5495
- Conv1D #4639
- 1D Pooling #4663
Quantization
- Channel wise quantization - Quantize & Requantize #4629
- Support QNN ops. #5066
- Adding support for QNN subtract op #5153
- TFLite QNN Tutorial #5595
- Tutorial: Deploy Quantized Model on CUDA #4667
- Support asymmetric per-layer quantized operators #6109
Relay
- Add convertlayout pass in Relay (#4335, #4600)
- Added Merge Composite pass #4771
- Call graph for relay #4922
- Add inline pass #4927
- Target annotation for external codegen #4933
- GradientCell Relay Pass #5039
- Add MergeCompilerRegions pass #5134
- Non-recursive Graph Visitor and Rewriter (#4886)
- [Blocksparse] Pipeline for lowering dense model to sparse-dense (#5377)
- Relay op strategy #4644
- Static Tensor Array (#5103)
- Memory planner (part 1) #5144
- ONNX codegen #5052
- Add Parser 2.0 #5932, part 2 #6162
- Basic block normal form #6152
- Convert Layout pass. #4664
- Pattern Language, Matcher, Rewriter, and Function Partitioner #5231
Runtime and Backend
- Add ADTObject POD container type (#4346)
- TFLite RPC runtime (#4439)
- Standardized graph runtime export (#4532)
- MISRA-C compliant TVM runtime #3934
- Add String container #4628
- Introduce Virtual Memory Allocator to CRT (#5124)
- Initial implementation of Hexagon runtime support (#5252)
- FastRPC interface for Hexagon runtime (#5353)
- CoreML Runtime (#5283)
- AutoTVM + uTVM for Cortex-M7 (#5417)
- Windows Support for cpp_rpc (#4857)
- Implement TVMDSOOp(TensorFlow custom op) for TVM runtime (#4459)
- WebGPU support #5545
- TVM WebAssembly JS Runtime #5506
- Hexagon driver for offloading kernels to simulator #5492
- Introduce runtime::Array #5585
- Allow non-nullable ObjectRef, introduce Optional. (#5314)
- Introduce static slots for common objects. (#5423)
- Introduce RValue reference(move) support to TypedPackedFunc (#5271)
- Introduce MetadataModule to separate code compilation/interpretation and weight initialization #5770
- Support module based interface runtime #5753
- Add TVM application extension with WASM runtime #5892
- Provide guide to users who have difficulty registering SEqualReduce (#5300)
Rust Support
- Revive the Rust + SGX refactor #4976
- Improve Rust bindings: Map, Array, String, various IR nodes #6339
- Rust Refactor Stage 4: Rewrite Rust graph runtime to use new APIs #5830
- Second stage of Rust Refactor #5527
- tvm crate stage 3 of Rust refactor #5769
- Add first stage of updating and rewriting Rust bindings. #5526
TIR
- Introduce StructuralHash for the Unified IR. #5160
- Introduce StructuralEqual Infra for the unified IR. #5154
- Introduce ExprDeepEqual, Remove IRDeepCompare #5206
- [TIR] Introduce BufferLoad/Store (#5205)
- Improved massive build times caused by tir.floormod and tir.floordiv. Fixed Topi testcase. #5666
- Buffer logger assert removed #6147
- Enhance VerifyGPUCode #6194
- HoistIfThenElse added #6066
- Hybrid Script Support for TIR #6227
- Migrate Low-level Passes to Pass Manager #5198
- Block scope hoisting added #6238
TE
- reverse-mode autodiff without any optimization #5121
- Tensor Expression Debug Display (TEDD) #4651
- Optimize and eliminate the Jacobian tensor for te.autodiff #6078
TVMC(Experimental)
- TVMC - A command line driver for TVM (Part 1) #6112
- TVMC - Linting error on onnx command line driver frontend #6536
- TVMC - Command line driver 'compile' (part 2/4) #6302
- TVMC - Introduce 'tune' subcommand (part 3/4) #6537
- TVMC - Introduce 'run' subcommand (part 4/4) #6578
- TVMC - Getting started tutorial for TVMC #6597
Feature Improvement
Accelerator and Microcontroller Support
- Cleanup legacy verilog code (#4576)
- uTVM support for ARM STM32F746XX boards (#4274)
- Add --runtime=c, remove `micro_dev` target, enable LLVM backend #6145
Arithmetic Analysis
- Linear system and equation solver (#5171)
- Inequalities solver #5618
- Improve IntervalSet's floormod (#5367)
- Remove legacy const pattern functions (#5387)
- Handle likely in IRMutatorWithAnalyzer #5665
- ExtendedEuclidean merge impl to int_operator #5625
- Rewrite simplify fix for Vectorized Cooperative Fetching #5924
AutoTVM and Graph Tuner
- Adding ROCM schedules for TOPI (#4507)
- NHWC conv2d schedule templates for ARM (#3859)
- Use VM compile to extract autotvm tasks #4328
- Download fallback schedule file if it does not exist #4671
- Ignore error when removing tmpdir #4781
- Fix a bug in generating the search space #4779
- Minor bug fixes in AutoTVM for QNN graphs #4797
- Fix autotvm customized template #5034
- Add opt out operator for `has_multiple_inputs` for graph tuner #5000
- Customize SI prefix in logging (#5411)
- Update XGBoost verbosity option #5649
- Support range in index based tuners #4870
- Enable random fill and CPU cache flush for AutoTVM and Ansor (#6391)
- Auto-scheduler tutorial for GPU and necessary refactor/fix (#6512)
BYOC
- [BYOC] Bind constant tuples in graph partitioner (#5476)
- [BYOC] Add support for composite functions in BYOC (#5261)
- [BYOC] Register pattern tables from external codegens (#5262)
- [BYOC] Enhance partitioning and external codegen (#5310)
- [BYOC] Refine AnnotateTarget and MergeCompilerRegion Passes (#5277)
- [BYOC] Use Non-Recursive Visitor/Mutator (#5410)
- [BYOC] Refine DNNL Codegen (#5288)
- [BYOC] Add example of Composite + Annotate for DNNL fused op (#5272)
- [BYOC] Prevent duplicate outputs in subgraph Tuple (#5320)
- [BYOC] Introduce further operator support (#6355)
- [BYOC] Support input nodes with multiple entries (#6368)
- [BYOC] Add maximum support for float32 (#6506)
Codegen
Apache TVM (incubating) v0.6.1
Apache TVM (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
Apache TVM (incubating) 0.6.1 is a maintenance release incorporating important bug fixes and important performance improvements. All users of Apache TVM (incubating) 0.6.0 are advised to upgrade. Please review the following release notes to learn about the bug fixes.
Bug Fixes
- Fixed process termination routine in windows #4844
- [Runtime] Fix NDArray SaveDLTensor declaration and implementation signature different #4586
- [NODE][Serialization]fix serialization precision loss in float #4503
- [Relay][Frontend][TF] fix _parse_param bug #4711
- Fix bias_add gradient #4516
- Make sure to visit the arguments of inlined functions #4783
- Fix Python syntax error in start_rpc_server_to_tracker.py #4682
- [Bugfix] Fixed crash caused by reversing bitwise operations #4852
- [Fix][VM] Fix copy constructor #5237
- fix small bug about dense_grad #5695
- [Fix] Fix conv2d alter op for arm cpu #5532
- [Fix] Fix dense x86 schedule #4728
- [Relay][Fix] Fix alter op layout when calling a global var #4454
- [Relay][Pass] Fix lambda lift pass for recursive call #4432
- [BUGFIX] Fix search path for libtvm_topi.so #4467
- [Bugfix] Fix Python debugger segfaults with TVM built with LLVM #5685
- [RUNTIME] Fix compile errors of OpenCL FPGA backend #4492
- [BUGFIX][BACKPORT-0.6][ARITH] Fix FloorMod Simplifier #5509
- Some Windows and MSVC fixes #4569
- [Chisel][VTA] Fix multiple transfer issue in LoadUop module #4442
- [VTA] Fix an issue in updating uop_idx in the TensorGemm module #4694
- [VTA] Fixed a crash issue in TSIM driver #4527
- [VTA] Enable streamlined GEMM execution #4392
- [VTA][Chisel] End-to-end Inference with Chisel VTA #4574
- Added declare of aluBits for TensorAlu #4624
- [Quantization] Fix annotation for multiply op #4458
- LRN only supports 4D tensors, remove it from alter_op_layout #5520
- fix topi.nn.global_pool layout="NHWC" #4656
- [FFI][Windows] Fix hasattr by extracting Python error type from Windows error message #4780
- [Runtime] Export GraphRuntime in tvm_runtime.dll #5002
- Fix Base64OutStream portability issue #4668
- [AUTOTVM] Fix a bug in generating the search space #4779
- [Relay][VM] Fix compilation of If-Elses #5040
- [RELAY][FRONTEND][TENSORFLOW] Fix FuseBatchNorm output cast error if need_cast is True #4894
- [Bugfix] fskip of EliminateCommonSubexpr cannot always return false #4620
- [Fix] Add ConstantNode to IsAtomic #5457
- [Fix] Fix RemoveUnusedFunctions pass #4700
- [Realy][fix] Fix alpha_equal bug for attribute check #4897
- [Arith] keep div_mode during floordiv simplify #5922
- [ARITH][BACKPORT-0.6] fix a min/max simplify bug #5761
- [0.6-BACKPORT] Improve robustness of the docs build #5583
Apache TVM (incubating) v0.6.0
Apache TVM (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.
New Features
Relay in Production
Relay is a functional, differentiable programming language designed to be an expressive intermediate representation for machine learning systems. Relay supports algebraic data types, closures, control flow, and recursion, allowing it to directly represent more complex models than computation graph-based IRs (e.g., NNVM) can. In TVM v0.6, Relay is in stable phase and is ready for production.
- Algebraic Data Types (ADT) support (#2442, #2575). ADT provides an expressive, efficient, and safe way to realize recursive computation (e.g., RNN). Refer to https://docs.tvm.ai/langref/relay_adt.html for more information.
- Pass manager for Relay (#2546, #3226, #3234, #3191)
- Most frameworks have been supported in Relay, including ONNX, Keras, Tensorflow, Caffe2, CoreML, NNVMv1, MXNet (#2246).
- Explicitly manifest memory and tensor allocations in Relay. (#3560)
Relay Virtual Machine
The Relay Virtual Machine (Relay VM) is the new generation of runtime that strikes a balance between performance and flexibility when deploying and executing Relay programs. Previously, the graph runtime was able to exploit the fully static nature of the input graphs to perform aggressive optimizations such as fully static allocation and optimal memory reuse. When we introduce models that make use of control flow, recursion, dynamic shapes, and dynamic allocation, we must change how execution works.
Relay VM is now usable and is able to achieve decent performance for a variety of models and targets.
- Design (#2810 #2915) and a first version of implementation (#2889),
- Add VM runtime for Relay and compiler support (#3120, #3121, #2889, #3139)
- Relay VM (pattern matching #3470, port to python #3391, serialization #3647)
- Relay VM Profiler (#3727)
- Support execution on devices for Relay VM (#3678)
- [Relay][VM] Add more passes to VMCompiler (#4058)
- [relay][vm] Separate VM runtime with executable (#4100)
- Port VM, VM compiler, and Object into Python (#3391)
- VM: Add AllocTensor instruction and better instruction printer (#3306)
- [Relay][VM][Interpreter] Enable first-class constructors in VM and interpreter via eta expansion. (#4218)
- [Relay][VM] Clean up the VM and VM profiler code (#4391)
Training
Relay is designed to natively support first-order and higher-order differentiation. The automatic differentiation infrastructure is now usable, and a number of operators with gradient support are available in the v0.6 release.
- Higher order reverse mode automatic differentiation that works with control flow (#2496)
- Higher order continuation passing style (#3456, #3485 )
- Relay gradient registration (clip #3509, max_pool2d and avg_pool2d #3601)
- Relay AD algorithm (#3585)
- Relay Training - allow gradient to return a tuple (#3600), numerical gradient check (#3630)
- Improve AD for concatenate (#3729)
- [Relay][Training] Add missing gradient check to gradient pass (#4169)
- As a part of Relay's automatic differentiation system, we are adding primal gradients for Relay operators. Please refer to #2562 for tracking the progress.
- Gradient for Conv2d (#3636)
- Add gradient operators (#3857, #3894, #3901, #3915)
- Add gradient for log-softmax (#4069)
- [Relay][Training] Add gradient for Crossentropy (#3925)
- [Relay][Training] Add and fix gradients (#4126)
Quantization
Low-bit inference is getting more and more popular as it benefits both performance and storage usage. TVM now supports two types of quantization. 1. Automatic quantization takes a floating-point precision model, performs per-layer calibration, and generates a low-bit model. 2. TVM also imports pre-quantized models from TensorFlow and MXNet; a new dialect, QNN, is introduced to handle further lowering to normal operators.
- Automatic Quantization
- Low-bit automatic quantization supported. (#2116). The workflow includes annotation, calibration and transformation.
- Refactor quantization codebase and fix model accuracy. (#3543)
- KL-divergence-based per-layer calibration. (#3538)
- Add option to select which convolution layers are quantized. (#3173)
- [Relay][Quantize] Integrate data-aware calibration into quantization. (#4295)
- Pre-quantized model support (QNN operators and legalize pass).
- Add a legalize pass to Relay (#3672)
- Qnn Concatenate, quantize, dequantize and requantize operators (#3819, #3730, #3745, #3531)
- QNNtoRelay & QNNLegalize Pass utility (#3838, #3782)
- Requantize: Optimize lowering for some corner cases. (#3864)
- New quantized operator support: conv2d, add, dense (#3580, #3736, #3896, #3910)
- Do type checking for the input and kernel in the qnn conv2d (#3904)
- Legalize and AlterOpLayout for Intel int8. (#3961)
- Renaming tests to follow the Relay nomenclature. (#3975)
- Fix padding changes due to #3739 (#3989)
- Memorizing quantize node mapping to avoid duplicated simulated quantization (#3233)
- Infrastructure to support pre-quantized models (QNN) (#3971).
- [Relay][AlterOp] NHWC to NCHWc support for Pool, concatenate, sum. (#4059)
- [TOPI][x86] Cascade lake support. (#4123)
- [TOPI][x86] Legalize - Support int8xint8 convolution to use VNNI inst (#4196)
- Qnn dequantize with min max using Mxnet flavor to support Mxnet prequantized models. (#3945)
- Improve the lowering of Qnn Dense (#4213)
- Adding support for dequantizing from int32 to float32. (#4130)
- [QNN] Refactor fixed point multiplicat...