Changes from all commits (59 commits)
- 198fc90 Cherry-pick 1st Round (#17308) (Lafi7e, Aug 28, 2023)
- 4296043 Cherry-pick 2nd Round (#17386) (Lafi7e, Sep 7, 2023)
- 2406e9c [rel-1.16.0] Use name of temporary provisioning profile. (#17456) (edgchen1, Sep 8, 2023)
- 196df08 [rel-1.16.0] Disable QNN QDQ test for release branch (#17463) (HectorSVC, Sep 8, 2023)
- a9df3ae Remove 52 from CMAKE_CUDA_ARCHITECTURES to reduce Nuget package size … (snnn, Sep 8, 2023)
- 0772d54 [rel-1.16.0] Cherry-pick 17507 (#17520) (chilo-ms, Sep 12, 2023)
- 06ea28b [rel-1.16.0] Cherry-pick 16940 and 17523 (#17506) (Lafi7e, Sep 14, 2023)
- e7a0495 Cherry-picks pipeline changes to 1.16.0 release branch (#17577) (snnn, Sep 18, 2023)
- 264a740 Cherry-picks for 1.16.1 release (#17741) (snnn, Oct 2, 2023)
- 6df4211 Cancel EP check in python for 1.16.1 (#17768) (RandySheriffH, Oct 3, 2023)
- f480a36 [hotfix] fix session option access in Node.js binding (#17762) (fs-eire, Oct 4, 2023)
- c3fd281 Fix onnx quantizer activation and weight type attribute (yufenglee, Oct 5, 2023)
- 2a1fd25 Upgrade transformers to fix CI (#17830) (snnn, Oct 9, 2023)
- c829550 Increase version number for preparing the 1.16.2 release (#18070) (snnn, Oct 26, 2023)
- 53cb942 [DML EP] Enable more MHA masks (#18120) (PatriceVignola, Oct 30, 2023)
- 99b0f62 [DML EP] Complete python IO binding implementation (#18124) (PatriceVignola, Oct 30, 2023)
- 6ae7c51 Revert "Disable dml stage in windows GPU pipeline temporarily. (#1803… (snnn, Oct 31, 2023)
- 749bcc7 [DML EP] Add subgraph fusion support (#18125) (PatriceVignola, Oct 31, 2023)
- 0240274 Add support for GCC 13 (#18178) (snnn, Nov 1, 2023)
- c273f7a Cherry-pick LLaMA/SDXL to rel-1.16.2 (#18202) (tianleiwu, Nov 1, 2023)
- bc533a6 [DML EP] Add dynamic graph compilation (#18199) (PatriceVignola, Nov 2, 2023)
- 2f57f1e Some cherry-picks for the 1.16.2 release (#18218) (snnn, Nov 2, 2023)
- 70b8cda Cherry pick LLaMA to rel-1.16.2 (round 2) (#18245) (tianleiwu, Nov 3, 2023)
- 27b0910 cherry pick resize grad pr (#18255) (askhade, Nov 3, 2023)
- 95c20d0 Cherry-pick two pipeline changes for the 1.16.2 patch release (#18249) (snnn, Nov 3, 2023)
- ad7cecb Update eigen's URL (#18301) (snnn, Nov 6, 2023)
- 0ccca88 Update eigen version (#18308) (snnn, Nov 7, 2023)
- 8f06330 Cherry pick LLaMA or SDXL to 1.16.2 release (round 3) (#18323) (tianleiwu, Nov 8, 2023)
- 0c5b95f Cherry-pick LLaMA GQA mask to rel-1.16.2 (round 4) (#18350) (tianleiwu, Nov 8, 2023)
- 96451b1 Always quantize global average pool (mapetre, Dec 7, 2022)
- 792873d added qlinearconvtranpose (xdrBogdan22, Jan 4, 2023)
- 8f6dd3d updated code for convtranspose2d to work for the quadric version of o… (xdrBogdan22, Feb 28, 2023)
- 92f3be0 fixed registry (xdrBogdan22, Feb 28, 2023)
- 6271d0a Adds shape inference for conv2d_transpose (mapetre, Mar 31, 2023)
- a17bd73 Add QLinearConvTranspose CPU implementation (mapetre, Apr 5, 2023)
- b0b734c Added int32_t templated version of Col2im (mapetre, Apr 6, 2023)
- fb57d97 ci: Create wheel and release upon each push to main (#1) (syassami, May 4, 2023)
- 83d722c Shape inference for QLinearAdd and QLinearConcat (mapetre, Apr 13, 2023)
- 1f2b1eb Shape inference for QLinearMul (mapetre, Apr 13, 2023)
- 5533b52 Shape inference for QLinearLeakyReLU (mapetre, Apr 18, 2023)
- 5ad0ee1 Adds shape inference for remaining QLinear operators (mapetre, Apr 19, 2023)
- 7b0fdc8 Add unittest for QLinear shape inference (mapetre, Apr 19, 2023)
- 846eb6a Updated shape inference to default to ONNX implementation (mapetre, Apr 21, 2023)
- b6f98f7 Applied python linter (mapetre, Apr 21, 2023)
- 66be256 Refactored QLinear shape inference test (mapetre, Apr 25, 2023)
- fab74f8 Addressed PR comments (mapetre, Apr 25, 2023)
- 9c27dce Add type inference support for custom operators (mapetre, May 17, 2023)
- fd2196f Removed formatting changes (mapetre, May 18, 2023)
- 0a347e6 Add shape inference for QLinearConvTranspose (mapetre, May 19, 2023)
- 9ae4f2e Add unnittest for QLinearConvTranspose shape inference (mapetre, May 19, 2023)
- e8f4122 ci: Enable workflow_call (syassami, Sep 21, 2023)
- 314b40b ci: Add mac python3.10 release (syassami, Sep 23, 2023)
- c70f411 ci: Create release on github.ref == refs/heads/main (syassami, Sep 28, 2023)
- 365a6db wheel.yaml: No warning as error during compile (ndrego, Sep 29, 2023)
- 3989509 QuadricCustomOp handling (#12) (ndrego, Oct 1, 2023)
- 4f4d738 ci: Use self hosted arm64 macos runner (#16) (syassami, Oct 2, 2023)
- 77c6035 QuadricCustomOp: Handle multiple outputs when shape inferencing (#17) (ndrego, Oct 3, 2023)
- 7845828 quadric_custom_op: Handle duplicated input names (#18) (ndrego, Nov 14, 2023)
- 190e428 Add calibrator option to add extra dtypes (mgleonard425, Jan 26, 2024)
5 files renamed without changes.
133 changes: 0 additions & 133 deletions .github/workflows/sca.yml

This file was deleted.

97 changes: 97 additions & 0 deletions .github/workflows/wheel.yaml
@@ -0,0 +1,97 @@
name: CI && Release & Upload Wheel

on:
  workflow_call:
    inputs:
      onnxruntime_branch:
        type: string
        default: "main"
  workflow_dispatch:
    inputs:
      onnxruntime_branch:
        type: string
        default: "main"
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  build_and_upload_wheel_linux:
    runs-on: The_CTOs_Choice
    container:
      image: ghcr.io/quadric-io/tvm:devel
      options: "--mount type=bind,source=${{ github.workspace }},target=/workspace"
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          repository: quadric-io/onnxruntime
          ref: ${{ inputs.onnxruntime_branch || github.ref }}
      - name: Build ONNX Runtime wheel
        working-directory: /workspace
        run: |
          python3 -m pip install cmake --upgrade
          ./build.sh --build_wheel --config Release --parallel ${{ github.event_name == 'pull_request' && ' ' || '--skip_tests'}} --skip_submodule_sync --allow_running_as_root --compile_no_warning_as_error
          wheel_path=$(find . -name '*.whl' | xargs readlink -f)
          echo "wheel_path=$wheel_path" >> $GITHUB_ENV
      - name: Upload Artifact
        uses: actions/upload-artifact@v3
        with:
          name: ort-wheel-linux
          path: ${{ env.wheel_path }}

  build_and_upload_wheel_mac:
    runs-on: [self-hosted, macOS, ARM64]
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          repository: quadric-io/onnxruntime
          ref: ${{ inputs.onnxruntime_branch || github.ref }}
      - name: Build ONNX Runtime wheel
        run: |
          ./build.sh --build_wheel --config Release --parallel ${{ github.event_name == 'pull_request' && ' ' || '--skip_tests'}} --skip_submodule_sync --compile_no_warning_as_error --apple_deploy_target 12
          wheel_path=$(find . -name '*.whl' | xargs readlink -f)
          echo "wheel_path=$wheel_path" >> $GITHUB_ENV
      - name: Upload Artifact
        uses: actions/upload-artifact@v3
        with:
          name: ort-wheel-mac
          path: ${{ env.wheel_path }}

  create_release:
    if: (github.ref == 'refs/heads/main') && (github.event_name != 'workflow_call' && github.event_name != 'workflow_dispatch')
    needs: [build_and_upload_wheel_mac, build_and_upload_wheel_linux]
    runs-on: ubuntu-latest
    steps:
      - name: Download ort-wheel-linux artifact
        uses: actions/download-artifact@v3
        with:
          name: ort-wheel-linux
          path: artifacts/
      - name: Download ort-wheel-mac artifact
        uses: actions/download-artifact@v3
        with:
          name: ort-wheel-mac
          path: artifacts/
      - name: Count releases
        id: count_releases
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          count=$(curl --request GET \
            --url https://api.github.com/repos/${{ github.repository }}/releases \
            --header "Authorization: Bearer $GITHUB_TOKEN" | jq length)
          echo "count=$count" >> $GITHUB_ENV
      - name: Create Release and Upload Both Assets
        uses: softprops/action-gh-release@v1
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          tag_name: v${{ env.count }}
          name: Release v${{ env.count }}
          files: |
            artifacts/*.whl
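For reference, the `Count releases` step above derives the next tag from the length of the JSON array returned by the GitHub releases API (the `| jq length` pipe). That numbering scheme can be paraphrased as a small Python sketch; the function name is illustrative, not part of the workflow:

```python
import json

def next_release_tag(releases_json: str) -> str:
    """Mirror the workflow's 'Count releases' step: the next tag is
    'v' + the number of releases the API already reports."""
    releases = json.loads(releases_json)  # the releases endpoint returns a JSON array
    return f"v{len(releases)}"

# With two existing releases the next one is tagged v2; note the tag is the
# count itself (not count + 1), so the very first release is v0.
print(next_release_tag('[{"tag_name": "v0"}, {"tag_name": "v1"}]'))  # prints v2
```

One consequence of this design: if a release is ever deleted, the count drops and the workflow will try to reuse an existing tag.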
3 changes: 0 additions & 3 deletions .gitmodules
@@ -8,6 +8,3 @@
  path = cmake/external/emsdk
  url = https://github.com/emscripten-core/emsdk.git
  branch = 3.1.44
-[submodule "cmake/external/onnxruntime-extensions"]
-  path = cmake/external/onnxruntime-extensions
-  url = https://github.com/microsoft/onnxruntime-extensions.git
29 changes: 29 additions & 0 deletions README_EPU.md
@@ -0,0 +1,29 @@
# The Quadric Version of onnxruntime

This repository contains a distribution of onnxruntime with additional operator quantization capabilities.


## Prerequisites:
- python 3.9
- pip

## Clone repository and build:
```
git clone --recursive https://github.com/quadric-io/onnxruntime onnxruntime
cd onnxruntime
python3.9 -m venv venv
source venv/bin/activate
# Install required packages. numpy version is restricted by TVM
pip3 install wheel packaging numpy==1.24.4
# Build the python package
./build.sh --build_wheel --config Release --parallel
```

## Install
```
# Find the wheel you just created
$ find . -name '*.whl'
./build/MacOS/Release/dist/onnxruntime-1.16.0-cp39-cp39-macosx_13_0_arm64.whl
# Install it
pip3 install ./build/MacOS/Release/dist/onnxruntime-1.16.0-cp39-cp39-macosx_13_0_arm64.whl
```
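The README locates the freshly built wheel with `find . -name '*.whl'`; the same lookup can be done from Python, which is handy in scripts that install the wheel automatically. This is an illustrative helper, not part of the repository (the exact output directory varies by platform, e.g. `build/MacOS/Release/dist` vs `build/Linux/Release/dist`):

```python
from pathlib import Path

def find_wheel(build_root: str) -> Path:
    """Return the most recently built wheel under build_root,
    the Python equivalent of `find . -name '*.whl'`."""
    wheels = sorted(Path(build_root).rglob("*.whl"),
                    key=lambda p: p.stat().st_mtime)
    if not wheels:
        raise FileNotFoundError(f"no wheel found under {build_root}")
    return wheels[-1]  # newest wheel wins if several builds exist
```

The returned path can then be passed straight to `pip3 install`.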
34 changes: 34 additions & 0 deletions ThirdPartyNotices.txt
@@ -6230,3 +6230,37 @@ https://github.com/intel/neural-compressor
terms, and open source software license terms. These separate license terms
govern your use of the third party programs as set forth in the
"THIRD-PARTY-PROGRAMS" file.

_____

FlashAttention, https://github.com/Dao-AILab/flash-attention

BSD 3-Clause License

Copyright (c) 2022, the respective contributors, as shown by the AUTHORS file.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

* Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
2 changes: 1 addition & 1 deletion VERSION_NUMBER
@@ -1 +1 @@
-1.16.0
+1.16.2
2 changes: 1 addition & 1 deletion cgmanifests/cgmanifest.json
@@ -568,7 +568,7 @@
      "component": {
        "type": "git",
        "git": {
-         "commitHash": "d10b27fe37736d2944630ecd7557cefa95cf87c9",
+         "commitHash": "e7248b26a1ed53fa030c5c459f7ea095dfd276ac",
          "repositoryUrl": "https://gitlab.com/libeigen/eigen.git"
        }
      }
11 changes: 10 additions & 1 deletion cmake/CMakeLists.txt
@@ -84,7 +84,8 @@ option(onnxruntime_USE_PREINSTALLED_EIGEN "Use pre-installed EIGEN. Need to prov
option(onnxruntime_BUILD_BENCHMARKS "Build ONNXRuntime micro-benchmarks" OFF)
option(onnxruntime_USE_LLVM "Build TVM with LLVM" OFF)

-option(onnxruntime_USE_FLASH_ATTENTION "Build memory efficient attention kernel for scaled dot product attention" ON)
+cmake_dependent_option(onnxruntime_USE_FLASH_ATTENTION "Build flash attention kernel for scaled dot product attention" ON "NOT WIN32; onnxruntime_USE_CUDA" OFF)
+option(onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION "Build memory efficient attention kernel for scaled dot product attention" ON)

option(onnxruntime_BUILD_FOR_NATIVE_MACHINE "Enable this option for turning on optimization specific to this machine" OFF)
option(onnxruntime_USE_AVX "Use AVX instructions" OFF)
@@ -666,13 +667,16 @@ if (onnxruntime_USE_CUDA)

  if (onnxruntime_DISABLE_CONTRIB_OPS)
    set(onnxruntime_USE_FLASH_ATTENTION OFF)
+   set(onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION OFF)
  endif()
  if (CMAKE_CUDA_COMPILER_VERSION VERSION_LESS 11.6)
    message( STATUS "Turn off flash attention since CUDA compiler version < 11.6")
    set(onnxruntime_USE_FLASH_ATTENTION OFF)
+   set(onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION OFF)
  endif()
else()
  set(onnxruntime_USE_FLASH_ATTENTION OFF)
+ set(onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION OFF)
endif()

if (onnxruntime_USE_CUDA)
@@ -685,6 +689,11 @@ if (onnxruntime_USE_CUDA)
    list(APPEND ORT_PROVIDER_FLAGS -DUSE_FLASH_ATTENTION=1)
    list(APPEND ORT_PROVIDER_CMAKE_FLAGS -Donnxruntime_USE_FLASH_ATTENTION=1)
  endif()
+ if (onnxruntime_USE_MEMORY_EFFICIENT_ATTENTION)
+   message( STATUS "Enable memory efficient attention for CUDA EP")
+   list(APPEND ORT_PROVIDER_FLAGS -DUSE_MEMORY_EFFICIENT_ATTENTION=1)
+   list(APPEND ORT_PROVIDER_CMAKE_FLAGS -Donnxruntime_USE_MEMORY_EFFICIENT_ATTENTION=1)
+ endif()

endif()
if (onnxruntime_USE_VITISAI)
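The CMakeLists.txt hunks above gate the two attention kernels: flash attention becomes a `cmake_dependent_option` that can only be ON for CUDA builds on non-Windows hosts, and both kernels are forced OFF when contrib ops are disabled or the CUDA compiler is older than 11.6. The effective-flag logic can be paraphrased in Python (an illustrative sketch of the CMake conditions, not code from the repository):

```python
def attention_kernel_flags(use_cuda: bool,
                           is_windows: bool,
                           cuda_compiler_version: tuple,
                           disable_contrib_ops: bool) -> dict:
    """Compute the effective attention-kernel flags, mirroring the
    cmake_dependent_option and the if() blocks in the diff above."""
    # cmake_dependent_option: ON only when "NOT WIN32; onnxruntime_USE_CUDA" holds
    flash = use_cuda and not is_windows
    memory_efficient = True  # plain option(), defaults to ON
    if use_cuda:
        if disable_contrib_ops or cuda_compiler_version < (11, 6):
            flash = False
            memory_efficient = False
    else:
        # non-CUDA builds: both kernels are forced OFF
        flash = False
        memory_efficient = False
    return {"USE_FLASH_ATTENTION": flash,
            "USE_MEMORY_EFFICIENT_ATTENTION": memory_efficient}
```

So a Windows CUDA build with a new-enough compiler still gets the memory-efficient kernel but never the flash kernel, matching the `NOT WIN32` dependency.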
4 changes: 2 additions & 2 deletions cmake/deps.txt
@@ -11,6 +11,7 @@ abseil_cpp;https://github.com/abseil/abseil-cpp/archive/refs/tags/20220623.1.zip
cxxopts;https://github.com/jarro2783/cxxopts/archive/3c73d91c0b04e2b59462f0a741be8c07024c1bc0.zip;6c6ca7f8480b26c8d00476e0e24b7184717fe4f0
date;https://github.com/HowardHinnant/date/archive/refs/tags/v2.4.1.zip;ea99f021262b1d804a872735c658860a6a13cc98
dlpack;https://github.com/dmlc/dlpack/archive/refs/tags/v0.6.zip;4d565dd2e5b31321e5549591d78aa7f377173445
+eigen;https://gitlab.com/libeigen/eigen/-/archive/e7248b26a1ed53fa030c5c459f7ea095dfd276ac/eigen-e7248b26a1ed53fa030c5c459f7ea095dfd276ac.zip;be8be39fdbc6e60e94fa7870b280707069b5b81a
flatbuffers;https://github.com/google/flatbuffers/archive/refs/tags/v1.12.0.zip;ba0a75fd12dbef8f6557a74e611b7a3d0c5fe7bf
fp16;https://github.com/Maratyszcza/FP16/archive/0a92994d729ff76a58f692d3028ca1b64b145d91.zip;b985f6985a05a1c03ff1bb71190f66d8f98a1494
fxdiv;https://github.com/Maratyszcza/FXdiv/archive/63058eff77e11aa15bf531df5dd34395ec3017c8.zip;a5658f4036402dbca7cebee32be57fb8149811e1
@@ -41,5 +42,4 @@ re2;https://github.com/google/re2/archive/refs/tags/2022-06-01.zip;aa77313b76e91
safeint;https://github.com/dcleblanc/SafeInt/archive/ff15c6ada150a5018c5ef2172401cb4529eac9c0.zip;913a4046e5274d329af2806cb53194f617d8c0ab
tensorboard;https://github.com/tensorflow/tensorboard/archive/373eb09e4c5d2b3cc2493f0949dc4be6b6a45e81.zip;67b833913605a4f3f499894ab11528a702c2b381
cutlass;https://github.com/NVIDIA/cutlass/archive/refs/tags/v3.0.0.zip;0f95b3c1fc1bd1175c4a90b2c9e39074d1bccefd
-extensions;https://github.com/microsoft/onnxruntime-extensions/archive/94142d8391c9791ec71c38336436319a2d4ac7a0.zip;4365ac5140338b4cb75a39944a4be276e3829b3c
-eigen;https://gitlab.com/libeigen/eigen/-/archive/3.4/eigen-3.4.zip;ee201b07085203ea7bd8eb97cbcb31b07cfa3efb
+extensions;https://github.com/microsoft/onnxruntime-extensions/archive/94142d8391c9791ec71c38336436319a2d4ac7a0.zip;4365ac5140338b4cb75a39944a4be276e3829b3c