diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/LICENSE.md b/extension/llm/custom_ops/spinquant/third-party/FFHT/LICENSE.md
new file mode 100644
index 00000000000..52c4e01cd49
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/LICENSE.md
@@ -0,0 +1,22 @@
+The MIT License (MIT)
+
+Copyright (c) 2015 Alexandr Andoni, Piotr Indyk, Thijs Laarhoven,
+Ilya Razenshteyn, Ludwig Schmidt
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in
+all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+THE SOFTWARE.
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/Makefile b/extension/llm/custom_ops/spinquant/third-party/FFHT/Makefile
new file mode 100644
index 00000000000..cb68ad9f6f9
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/Makefile
@@ -0,0 +1,21 @@
+CC = gcc
+CFLAGS = -O3 -march=native -std=c99 -pedantic -Wall -Wextra -Wshadow -Wpointer-arith -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes
+
+all: test_float test_double fast_copy.o fht.o
+
+OBJ := fast_copy.o fht.o
+
+%.o: %.c
+ $(CC) $< -o $@ -c $(CFLAGS)
+
+test_%: test_%.c $(OBJ)
+ $(CC) $< $(OBJ) -o $@ $(CFLAGS)
+
+test_double_header_only: test_double_header_only.c
+ $(CC) $< -o $@ $(CFLAGS)
+
+test_float_header_only: test_double_header_only.c
+ $(CC) $< -o $@ $(CFLAGS)
+
+clean:
+ rm -f test_float test_double test_float_header_only test_double_header_only $(OBJ)
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/README.md b/extension/llm/custom_ops/spinquant/third-party/FFHT/README.md
new file mode 100644
index 00000000000..7e00d0eca9a
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/README.md
@@ -0,0 +1,115 @@
+# Fast Fast Hadamard Transform
+
+FFHT (Fast Fast Hadamard Transform) is a library that provides a heavily
+optimized C99 implementation of the Fast Hadamard Transform. FFHT also provides
+a thin Python wrapper that allows to perform the Fast Hadamard Transform on
+one-dimensional [NumPy](http://www.numpy.org/) arrays.
+
+The Hadamard Transform is a linear orthogonal map defined on real vectors whose
+length is a _power of two_. For the precise definition, see the
+[Wikipedia entry](https://en.wikipedia.org/wiki/Hadamard_transform). The
+Hadamard Transform has been recently used a lot in various machine learning
+and numerical algorithms.
+
+FFHT uses [AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions)
+to speed up the computation.
+
+The header file `fht.h` exports two functions: `int fht_float(float *buf, int
+log_n)` and `int fht_double(double *buf, int log_n)`. The
+only difference between them is the type of vector entries. So, in what follows,
+we describe how the version for floats `fht_float` works.
+
+The function `fht_float` takes two parameters:
+
+* `buf` is a pointer to the data on which one needs to perform the Fast
+Hadamard Transform.
+* `log_n` is the binary logarithm of the length of `buffer`.
+That is, the length is equal to `2^log_n`.
+
+The return value is -1 if the input is invalid and is zero otherwise.
+
+A header-only version of the library is provided in `fht_header_only.h`.
+
+In addition to the Fast Hadamard Transform, we provide two auxiliary programs:
+`test_float` and `test_double`, which are implemented in C99. The exhaustively
+test and benchmark the library.
+
+FFHT has been tested on 64-bit versions of Linux, OS X and Windows (the latter
+is via Cygwin).
+
+To install the Python package, run `python setup.py install`. The script
+`example.py` shows how to use FFHT from Python.
+
+## Benchmarks
+
+Below are the times for the Fast Hadamard Transform for vectors of
+various lengths. The benchmarks were run on a machine with Intel
+Core i7-6700K and 2133 MHz DDR4 RAM. We compare FFHT,
+[FFTW 3.3.6](http://fftw.org/), and
+[fht](https://github.com/nbarbey/fht) by
+[Nicolas Barbey](https://github.com/nbarbey).
+
+Let us stress that FFTW is a great versatile tool, and the authors of FFTW did
+not try to optimize the performace of the Fast Hadamard Transform. On the other
+hand, FFHT does one thing (the Fast Hadamard Transform), but does it extremely
+well.
+
+Vector size | FFHT (float) | FFHT (double) | FFTW 3.3.6 (float) | FFTW 3.3.6 (double) | fht (float) | fht (double)
+:---: | :---: | :---: | :---: | :---: | :---: | :---:
+210 | 0.31 us | 0.49 us | 4.48 us | 7.72 us | 17.4 us | 19.3 us
+220 | 0.68 ms | 1.39 ms | 8.81 ms | 17.07 ms | 29.8 ms | 35.0 ms
+227 | 0.22 s | 0.50 s | 2.08 s | 3.57 s | 6.89 s | 7.49 s
+
+## Troubleshooting
+
+For some versions of OS X the native `clang` compiler (that mimicks `gcc`) may
+not recognize the availability of AVX. A solution for this problem is to use a
+genuine `gcc` (say from [Homebrew](http://brew.sh/)) or to use `-march=corei7-avx`
+instead of `-march=native` for compiler flags.
+
+A symptom of the above happening is the undefined macros `__AVX__`.
+
+## Related Work
+
+FFHT has been created as a part of
+[FALCONN](https://github.com/falconn-lib/falconn): a library for similarity
+search over high-dimensional data. FALCONN's underlying algorithms are described
+and analyzed in the following research paper:
+
+> Alexandr Andoni, Piotr Indyk, Thijs Laarhoven, Ilya Razenshteyn and Ludwig
+> Schmidt, "Practical and Optimal LSH for Angular Distance", NIPS 2015, full
+> version available at [arXiv:1509.02897](http://arxiv.org/abs/1509.02897)
+
+This is the right paper to cite, if you use FFHT for your research projects.
+
+## Acknowledgments
+
+We thank Ruslan Savchenko for useful discussions.
+
+Thanks to:
+
+* Clement Canonne
+* Michal Forisek
+* Rati Gelashvili
+* Daniel Grier
+* Dhiraj Holden
+* Justin Holmgren
+* Aleksandar Ivanovic
+* Vladislav Isenbaev
+* Jacob Kogler
+* Ilya Kornakov
+* Anton Lapshin
+* Rio LaVigne
+* Oleg Martynov
+* Linar Mikeev
+* Cameron Musco
+* Sam Park
+* Sunoo Park
+* Amelia Perry
+* Andrew Sabisch
+* Abhishek Sarkar
+* Ruslan Savchenko
+* Vadim Semenov
+* Arman Yessenamanov
+
+for helping us with testing FFHT.
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/_ffht_2.c b/extension/llm/custom_ops/spinquant/third-party/FFHT/_ffht_2.c
new file mode 100644
index 00000000000..2041e5eedec
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/_ffht_2.c
@@ -0,0 +1,128 @@
+#include
+#include
+#include "fht.h"
+
+#define UNUSED(x) (void)(x)
+
+static char module_docstring[] =
+ "A C extension that computes the Fast Hadamard Transform";
+static char fht_docstring[] =
+ "Compute the Fast Hadamard Transform (FHT) for a given "
+ "one-dimensional NumPy array.\n\n"
+ "The Hadamard Transform is a linear orthogonal map defined on real vectors "
+ "whose length is a _power of two_. For the precise definition, see the "
+ "[Wikipedia entry](https://en.wikipedia.org/wiki/Hadamard_transform). The "
+ "Hadamard Transform has been recently used a lot in various machine "
+ "learning "
+ "and numerical algorithms.\n\n"
+ "The implementation uses "
+ "[AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) "
+ "to speed up the computation. If AVX is not supported on your machine, "
+ "a simpler implementation without (explicit) vectorization is used.\n\n"
+ "The function takes two parameters:\n\n"
+ "* `buffer` is a NumPy array which is being transformed. It must be a "
+ "one-dimensional array with `dtype` equal to `float32` or `float64` (the "
+ "former is recommended unless you need high accuracy) and of size being a "
+ "power "
+ "of two. If your CPU supports AVX, then `buffer` must be aligned to 32 "
+ "bytes. "
+ "To allocate such an aligned buffer, use the function `created_aligned` "
+ "from this "
+ "module.\n"
+ "* `chunk` is a positive integer that controls when the implementation "
+ "switches "
+ "from recursive to iterative algorithm. The overall algorithm is "
+ "recursive, but as "
+ "soon as the vector becomes no longer than `chunk`, the iterative "
+ "algorithm is "
+ "invoked. For technical reasons, `chunk` must be at least 8. A good choice "
+ "is to "
+ "set `chunk` to 1024. But to fine-tune the performance one should use a "
+ "program "
+ "`best_chunk` supplied with the library.\n";
+
+static PyObject *ffht_fht(PyObject *self, PyObject *args);
+
+static PyMethodDef module_methods[] = {
+ {"fht", ffht_fht, METH_VARARGS, fht_docstring}, {NULL, NULL, 0, NULL}};
+
+PyMODINIT_FUNC initffht(void);
+
+PyMODINIT_FUNC initffht(void) {
+ PyObject *m = Py_InitModule3("ffht", module_methods, module_docstring);
+ if (!m) return;
+
+ import_array();
+}
+
+static PyObject *ffht_fht(PyObject *self, PyObject *args) {
+ UNUSED(self);
+
+ PyObject *buffer_obj;
+
+ if (!PyArg_ParseTuple(args, "O", &buffer_obj)) {
+ return NULL;
+ }
+
+ PyArray_Descr *dtype;
+ int ndim;
+ npy_intp dims[NPY_MAXDIMS];
+ PyArrayObject *arr = NULL;
+
+ if (PyArray_GetArrayParamsFromObject(buffer_obj, NULL, 1, &dtype, &ndim, dims,
+ &arr, NULL) < 0) {
+ return NULL;
+ }
+
+ if (arr == NULL) {
+ PyErr_SetString(PyExc_TypeError, "not a numpy array");
+ return NULL;
+ }
+
+ dtype = PyArray_DESCR(arr);
+
+ if (dtype->type_num != NPY_FLOAT && dtype->type_num != NPY_DOUBLE) {
+ PyErr_SetString(PyExc_TypeError, "array must consist of floats or doubles");
+ Py_DECREF(arr);
+ return NULL;
+ }
+
+ if (PyArray_NDIM(arr) != 1) {
+ PyErr_SetString(PyExc_TypeError, "array must be one-dimensional");
+ Py_DECREF(arr);
+ return NULL;
+ }
+
+ int n = PyArray_DIM(arr, 0);
+
+ if (n == 0 || (n & (n - 1))) {
+ PyErr_SetString(PyExc_ValueError, "array's length must be a power of two");
+ Py_DECREF(arr);
+ return NULL;
+ }
+
+ int log_n = 0;
+ while ((1 << log_n) < n) {
+ ++log_n;
+ }
+
+ void *raw_buffer = PyArray_DATA(arr);
+ int res;
+ if (dtype->type_num == NPY_FLOAT) {
+ float *buffer = (float *)raw_buffer;
+ res = fht_float(buffer, log_n);
+ } else {
+ double *buffer = (double *)raw_buffer;
+ res = fht_double(buffer, log_n);
+ }
+
+ if (res) {
+ PyErr_SetString(PyExc_RuntimeError, "FHT did not work properly");
+ Py_DECREF(arr);
+ return NULL;
+ }
+
+ Py_DECREF(arr);
+
+ return Py_BuildValue("");
+}
\ No newline at end of file
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/_ffht_3.c b/extension/llm/custom_ops/spinquant/third-party/FFHT/_ffht_3.c
new file mode 100644
index 00000000000..1afe8013e46
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/_ffht_3.c
@@ -0,0 +1,142 @@
+#include
+#include
+#include "fht.h"
+
+#define UNUSED(x) (void)(x)
+
+static char module_docstring[] =
+ "A C extension that computes the Fast Hadamard Transform";
+static char fht_docstring[] =
+ "Compute the Fast Hadamard Transform (FHT) for a given "
+ "one-dimensional NumPy array.\n\n"
+ "The Hadamard Transform is a linear orthogonal map defined on real vectors "
+ "whose length is a _power of two_. For the precise definition, see the "
+ "[Wikipedia entry](https://en.wikipedia.org/wiki/Hadamard_transform). The "
+ "Hadamard Transform has been recently used a lot in various machine "
+ "learning "
+ "and numerical algorithms.\n\n"
+ "The implementation uses "
+ "[AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) "
+ "to speed up the computation. If AVX is not supported on your machine, "
+ "a simpler implementation without (explicit) vectorization is used.\n\n"
+ "The function takes two parameters:\n\n"
+ "* `buffer` is a NumPy array which is being transformed. It must be a "
+ "one-dimensional array with `dtype` equal to `float32` or `float64` (the "
+ "former is recommended unless you need high accuracy) and of size being a "
+ "power "
+ "of two. If your CPU supports AVX, then `buffer` must be aligned to 32 "
+ "bytes. "
+ "To allocate such an aligned buffer, use the function `created_aligned` "
+ "from this "
+ "module.\n"
+ "* `chunk` is a positive integer that controls when the implementation "
+ "switches "
+ "from recursive to iterative algorithm. The overall algorithm is "
+ "recursive, but as "
+ "soon as the vector becomes no longer than `chunk`, the iterative "
+ "algorithm is "
+ "invoked. For technical reasons, `chunk` must be at least 8. A good choice "
+ "is to "
+ "set `chunk` to 1024. But to fine-tune the performance one should use a "
+ "program "
+ "`best_chunk` supplied with the library.\n";
+
+static PyObject *ffht_fht(PyObject *self, PyObject *args) {
+ UNUSED(self);
+
+ PyObject *buffer_obj;
+
+ if (!PyArg_ParseTuple(args, "O", &buffer_obj)) {
+ return NULL;
+ }
+
+ PyArray_Descr *dtype;
+ int ndim;
+ npy_intp dims[NPY_MAXDIMS];
+ PyArrayObject *arr = NULL;
+
+ if (PyArray_GetArrayParamsFromObject(buffer_obj, NULL, 1, &dtype, &ndim, dims,
+ &arr, NULL) < 0) {
+ return NULL;
+ }
+
+ if (arr == NULL) {
+ PyErr_SetString(PyExc_TypeError, "not a numpy array");
+ return NULL;
+ }
+
+ dtype = PyArray_DESCR(arr);
+
+ if (dtype->type_num != NPY_FLOAT && dtype->type_num != NPY_DOUBLE) {
+ PyErr_SetString(PyExc_TypeError, "array must consist of floats or doubles");
+ Py_DECREF(arr);
+ return NULL;
+ }
+
+ if (PyArray_NDIM(arr) != 1) {
+ PyErr_SetString(PyExc_TypeError, "array must be one-dimensional");
+ Py_DECREF(arr);
+ return NULL;
+ }
+
+ int n = PyArray_DIM(arr, 0);
+
+ if (n == 0 || (n & (n - 1))) {
+ PyErr_SetString(PyExc_ValueError, "array's length must be a power of two");
+ Py_DECREF(arr);
+ return NULL;
+ }
+
+ int log_n = 0;
+ while ((1 << log_n) < n) {
+ ++log_n;
+ }
+
+ void *raw_buffer = PyArray_DATA(arr);
+ int res;
+ if (dtype->type_num == NPY_FLOAT) {
+ float *buffer = (float *)raw_buffer;
+ res = fht_float(buffer, log_n);
+ } else {
+ double *buffer = (double *)raw_buffer;
+ res = fht_double(buffer, log_n);
+ }
+
+ if (res) {
+ PyErr_SetString(PyExc_RuntimeError, "FHT did not work properly");
+ Py_DECREF(arr);
+ return NULL;
+ }
+
+ Py_DECREF(arr);
+
+ return Py_BuildValue("");
+}
+
+static PyMethodDef module_methods[] = {
+ {"fht", ffht_fht, METH_VARARGS, fht_docstring},
+ {NULL, NULL, 0, NULL}
+};
+
+
+static struct PyModuleDef ffhtmodule = {
+ PyModuleDef_HEAD_INIT,
+ "ffht",
+ module_docstring,
+ -1,
+ module_methods
+};
+
+PyMODINIT_FUNC PyInit_ffht(void) {
+ PyObject *module = PyModule_Create(&ffhtmodule);
+
+ if (module == NULL) {
+ printf("NULL");
+ return NULL;
+ }
+
+ import_array();
+ return module;
+}
+
+
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/example.py b/extension/llm/custom_ops/spinquant/third-party/FFHT/example.py
new file mode 100644
index 00000000000..576c89830da
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/example.py
@@ -0,0 +1,20 @@
+import numpy as np
+import ffht
+import timeit
+import sys
+
+reps = 1000
+n = 2**20
+chunk_size = 1024
+
+a = np.random.randn(n).astype(np.float32)
+
+t1 = timeit.default_timer()
+for i in range(reps):
+ ffht.fht(a)
+t2 = timeit.default_timer()
+
+if sys.version_info[0] == 2:
+ print (t2 - t1 + 0.0) / (reps + 0.0)
+if sys.version_info[0] == 3:
+ print('{}'.format((t2 - t1 + 0.0) / (reps + 0.0)))
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.clang-format b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.clang-format
new file mode 100644
index 00000000000..4b3f13fa55e
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.clang-format
@@ -0,0 +1,5 @@
+---
+Language: Cpp
+BasedOnStyle: Google
+...
+
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.gitignore b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.gitignore
new file mode 100644
index 00000000000..3c1b4f2183e
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.gitignore
@@ -0,0 +1,46 @@
+*.a
+*.so
+*.so.?*
+*.dll
+*.exe
+*.dylib
+*.cmake
+!/cmake/*.cmake
+*~
+*.pyc
+__pycache__
+
+# lcov
+*.lcov
+/lcov
+
+# cmake files.
+/Testing
+CMakeCache.txt
+CMakeFiles/
+cmake_install.cmake
+
+# makefiles.
+Makefile
+
+# in-source build.
+bin/
+lib/
+/test/*_test
+
+# exuberant ctags.
+tags
+
+# YouCompleteMe configuration.
+.ycm_extra_conf.pyc
+
+# ninja generated files.
+.ninja_deps
+.ninja_log
+build.ninja
+install_manifest.txt
+rules.ninja
+
+# out-of-source build top-level folders.
+build/
+_build/
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.travis-libcxx-setup.sh b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.travis-libcxx-setup.sh
new file mode 100644
index 00000000000..a591743c6a6
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.travis-libcxx-setup.sh
@@ -0,0 +1,28 @@
+#!/usr/bin/env bash
+
+# Install a newer CMake version
+curl -sSL https://cmake.org/files/v3.6/cmake-3.6.1-Linux-x86_64.sh -o install-cmake.sh
+chmod +x install-cmake.sh
+sudo ./install-cmake.sh --prefix=/usr/local --skip-license
+
+# Checkout LLVM sources
+git clone --depth=1 https://github.com/llvm-mirror/llvm.git llvm-source
+git clone --depth=1 https://github.com/llvm-mirror/libcxx.git llvm-source/projects/libcxx
+git clone --depth=1 https://github.com/llvm-mirror/libcxxabi.git llvm-source/projects/libcxxabi
+
+# Setup libc++ options
+if [ -z "$BUILD_32_BITS" ]; then
+ export BUILD_32_BITS=OFF && echo disabling 32 bit build
+fi
+
+# Build and install libc++ (Use unstable ABI for better sanitizer coverage)
+mkdir llvm-build && cd llvm-build
+cmake -DCMAKE_C_COMPILER=${C_COMPILER} -DCMAKE_CXX_COMPILER=${COMPILER} \
+ -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_INSTALL_PREFIX=/usr \
+ -DLIBCXX_ABI_UNSTABLE=ON \
+ -DLLVM_USE_SANITIZER=${LIBCXX_SANITIZER} \
+ -DLLVM_BUILD_32_BITS=${BUILD_32_BITS} \
+ ../llvm-source
+make cxx -j2
+sudo make install-cxxabi install-cxx
+cd ../
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.travis.yml b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.travis.yml
new file mode 100644
index 00000000000..36df088446c
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/.travis.yml
@@ -0,0 +1,157 @@
+sudo: required
+dist: trusty
+language: cpp
+
+env:
+ global:
+ - /usr/local/bin:$PATH
+
+matrix:
+ include:
+ - compiler: gcc
+ addons:
+ apt:
+ packages:
+ - lcov
+ env: COMPILER=g++ C_COMPILER=gcc BUILD_TYPE=Coverage
+ - compiler: gcc
+ env: COMPILER=g++ C_COMPILER=gcc BUILD_TYPE=Debug
+ - compiler: gcc
+ env: COMPILER=g++ C_COMPILER=gcc BUILD_TYPE=Release
+ - compiler: gcc
+ addons:
+ apt:
+ packages:
+ - g++-multilib
+ env: COMPILER=g++ C_COMPILER=gcc BUILD_TYPE=Debug BUILD_32_BITS=ON
+ - compiler: gcc
+ addons:
+ apt:
+ packages:
+ - g++-multilib
+ env: COMPILER=g++ C_COMPILER=gcc BUILD_TYPE=Release BUILD_32_BITS=ON
+ - compiler: gcc
+ addons:
+ apt:
+ sources:
+ - ubuntu-toolchain-r-test
+ packages:
+ - g++-6
+ env:
+ - COMPILER=g++-6 C_COMPILER=gcc-6 BUILD_TYPE=Debug
+ - EXTRA_FLAGS="-fno-omit-frame-pointer -g -O2 -fsanitize=undefined,address -fuse-ld=gold"
+ - compiler: clang
+ env: COMPILER=clang++ C_COMPILER=clang BUILD_TYPE=Debug
+ - compiler: clang
+ env: COMPILER=clang++ C_COMPILER=clang BUILD_TYPE=Release
+ # Clang w/ libc++
+ - compiler: clang
+ addons:
+ apt:
+ packages:
+ clang-3.8
+ env:
+ - COMPILER=clang++-3.8 C_COMPILER=clang-3.8 BUILD_TYPE=Debug
+ - LIBCXX_BUILD=1
+ - EXTRA_FLAGS="-stdlib=libc++"
+ - compiler: clang
+ addons:
+ apt:
+ packages:
+ clang-3.8
+ env:
+ - COMPILER=clang++-3.8 C_COMPILER=clang-3.8 BUILD_TYPE=Release
+ - LIBCXX_BUILD=1
+ - EXTRA_FLAGS="-stdlib=libc++"
+ # Clang w/ 32bit libc++
+ - compiler: clang
+ addons:
+ apt:
+ packages:
+ - clang-3.8
+ - g++-multilib
+ env:
+ - COMPILER=clang++-3.8 C_COMPILER=clang-3.8 BUILD_TYPE=Debug
+ - LIBCXX_BUILD=1
+ - BUILD_32_BITS=ON
+ - EXTRA_FLAGS="-stdlib=libc++ -m32"
+ # Clang w/ 32bit libc++
+ - compiler: clang
+ addons:
+ apt:
+ packages:
+ - clang-3.8
+ - g++-multilib
+ env:
+ - COMPILER=clang++-3.8 C_COMPILER=clang-3.8 BUILD_TYPE=Release
+ - LIBCXX_BUILD=1
+ - BUILD_32_BITS=ON
+ - EXTRA_FLAGS="-stdlib=libc++ -m32"
+ # Clang w/ libc++, ASAN, UBSAN
+ - compiler: clang
+ addons:
+ apt:
+ packages:
+ clang-3.8
+ env:
+ - COMPILER=clang++-3.8 C_COMPILER=clang-3.8 BUILD_TYPE=Debug
+ - LIBCXX_BUILD=1 LIBCXX_SANITIZER="Undefined;Address"
+ - EXTRA_FLAGS="-stdlib=libc++ -g -O2 -fno-omit-frame-pointer -fsanitize=undefined,address -fno-sanitize-recover=all"
+ - UBSAN_OPTIONS=print_stacktrace=1
+ # Clang w/ libc++ and MSAN
+ - compiler: clang
+ addons:
+ apt:
+ packages:
+ clang-3.8
+ env:
+ - COMPILER=clang++-3.8 C_COMPILER=clang-3.8 BUILD_TYPE=Debug
+ - LIBCXX_BUILD=1 LIBCXX_SANITIZER=MemoryWithOrigins
+ - EXTRA_FLAGS="-stdlib=libc++ -g -O2 -fno-omit-frame-pointer -fsanitize=memory -fsanitize-memory-track-origins"
+ # Clang w/ libc++ and MSAN
+ - compiler: clang
+ addons:
+ apt:
+ packages:
+ clang-3.8
+ env:
+ - COMPILER=clang++-3.8 C_COMPILER=clang-3.8 BUILD_TYPE=RelWithDebInfo
+ - LIBCXX_BUILD=1 LIBCXX_SANITIZER=Thread
+ - EXTRA_FLAGS="-stdlib=libc++ -g -O2 -fno-omit-frame-pointer -fsanitize=thread -fno-sanitize-recover=all"
+
+ - os: osx
+ osx_image: xcode8.3
+ compiler: clang
+ env:
+ - COMPILER=clang++ BUILD_TYPE=Debug
+ - os: osx
+ osx_image: xcode8.3
+ compiler: clang
+ env:
+ - COMPILER=clang++ BUILD_TYPE=Release
+
+before_script:
+ - if [ -z "$BUILD_32_BITS" ]; then
+ export BUILD_32_BITS=OFF && echo disabling 32 bit build;
+ fi
+ - if [ -n "${LIBCXX_BUILD}" ]; then
+ source .travis-libcxx-setup.sh;
+ fi
+ - mkdir build && cd build
+
+install:
+ - if [ "${BUILD_TYPE}" == "Coverage" -a "${TRAVIS_OS_NAME}" == "linux" ]; then
+ PATH=~/.local/bin:${PATH};
+ pip install --user --upgrade pip;
+ pip install --user cpp-coveralls;
+ fi
+
+script:
+ - cmake -DCMAKE_C_COMPILER=${C_COMPILER} -DCMAKE_CXX_COMPILER=${COMPILER} -DCMAKE_BUILD_TYPE=${BUILD_TYPE} -DCMAKE_CXX_FLAGS="${EXTRA_FLAGS}" -DBENCHMARK_BUILD_32_BITS=${BUILD_32_BITS} ..
+ - make
+ - ctest -C ${BUILD_TYPE} --output-on-failure
+
+after_success:
+ - if [ "${BUILD_TYPE}" == "Coverage" -a "${TRAVIS_OS_NAME}" == "linux" ]; then
+ coveralls --include src --include include --gcov-options '\-lp' --root .. --build-root .;
+ fi
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/AUTHORS b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/AUTHORS
new file mode 100644
index 00000000000..ae278df4046
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/AUTHORS
@@ -0,0 +1,40 @@
+# This is the official list of benchmark authors for copyright purposes.
+# This file is distinct from the CONTRIBUTORS files.
+# See the latter for an explanation.
+#
+# Names should be added to this file as:
+# Name or Organization
+# The email address is not required for organizations.
+#
+# Please keep the list sorted.
+
+Albert Pretorius
+Arne Beer
+Christopher Seymour
+David Coeurjolly
+Dominic Hamon
+Eric Fiselier
+Eugene Zhuk
+Evgeny Safronov
+Felix Homann
+Google Inc.
+International Business Machines Corporation
+Ismael Jimenez Martinez
+Jern-Kuan Leong
+Joao Paulo Magalhaes
+JianXiong Zhou
+Jussi Knuuttila
+Kaito Udagawa
+Lei Xu
+Matt Clarkson
+Maxim Vafin
+Nick Hutchinson
+Oleksandr Sochka
+Paul Redmond
+Radoslav Yovchev
+Shuo Chen
+Yixuan Qiu
+Yusuke Suzuki
+Dirac Research
+Zbigniew Skowron
+Dominik Czarnota
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/CMakeLists.txt b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/CMakeLists.txt
new file mode 100644
index 00000000000..f7f1566f569
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/CMakeLists.txt
@@ -0,0 +1,202 @@
+cmake_minimum_required (VERSION 2.8.12)
+
+project (benchmark)
+
+foreach(p
+ CMP0054 # CMake 3.1
+ CMP0056 # export EXE_LINKER_FLAGS to try_run
+ )
+ if(POLICY ${p})
+ cmake_policy(SET ${p} NEW)
+ endif()
+endforeach()
+
+option(BENCHMARK_ENABLE_TESTING "Enable testing of the benchmark library." ON)
+option(BENCHMARK_ENABLE_EXCEPTIONS "Enable the use of exceptions in the benchmark library." ON)
+option(BENCHMARK_ENABLE_LTO "Enable link time optimisation of the benchmark library." OFF)
+option(BENCHMARK_USE_LIBCXX "Build and test using libc++ as the standard library." OFF)
+option(BENCHMARK_BUILD_32_BITS "Build a 32 bit version of the library" OFF)
+
+# Make sure we can import out CMake functions
+list(APPEND CMAKE_MODULE_PATH "${CMAKE_CURRENT_SOURCE_DIR}/cmake")
+
+# Read the git tags to determine the project version
+include(GetGitVersion)
+get_git_version(GIT_VERSION)
+
+# Tell the user what versions we are using
+string(REGEX MATCH "[0-9]+\\.[0-9]+\\.[0-9]+" VERSION ${GIT_VERSION})
+message("-- Version: ${VERSION}")
+
+# The version of the libraries
+set(GENERIC_LIB_VERSION ${VERSION})
+string(SUBSTRING ${VERSION} 0 1 GENERIC_LIB_SOVERSION)
+
+# Import our CMake modules
+include(CheckCXXCompilerFlag)
+include(AddCXXCompilerFlag)
+include(CXXFeatureCheck)
+
+if (BENCHMARK_BUILD_32_BITS)
+ add_required_cxx_compiler_flag(-m32)
+endif()
+
+if ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "MSVC")
+ # Turn compiler warnings up to 11
+ string(REGEX REPLACE "[-/]W[1-4]" "" CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
+ set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /W4")
+ add_definitions(-D_CRT_SECURE_NO_WARNINGS)
+
+ if (NOT BENCHMARK_ENABLE_EXCEPTIONS)
+ add_cxx_compiler_flag(-EHs-)
+ add_cxx_compiler_flag(-EHa-)
+ endif()
+ # Link time optimisation
+ if (BENCHMARK_ENABLE_LTO)
+ set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /GL")
+ set(CMAKE_STATIC_LINKER_FLAGS_RELEASE "${CMAKE_STATIC_LINKER_FLAGS_RELEASE} /LTCG")
+ set(CMAKE_SHARED_LINKER_FLAGS_RELEASE "${CMAKE_SHARED_LINKER_FLAGS_RELEASE} /LTCG")
+ set(CMAKE_EXE_LINKER_FLAGS_RELEASE "${CMAKE_EXE_LINKER_FLAGS_RELEASE} /LTCG")
+
+ set(CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO} /GL")
+ string(REGEX REPLACE "[-/]INCREMENTAL" "/INCREMENTAL:NO" CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO "${CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO}")
+ set(CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO "${CMAKE_STATIC_LINKER_FLAGS_RELWITHDEBINFO} /LTCG")
+ string(REGEX REPLACE "[-/]INCREMENTAL" "/INCREMENTAL:NO" CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO "${CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO}")
+ set(CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO "${CMAKE_SHARED_LINKER_FLAGS_RELWITHDEBINFO} /LTCG")
+ string(REGEX REPLACE "[-/]INCREMENTAL" "/INCREMENTAL:NO" CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO "${CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO}")
+ set(CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO "${CMAKE_EXE_LINKER_FLAGS_RELWITHDEBINFO} /LTCG")
+
+ set(CMAKE_CXX_FLAGS_MINSIZEREL "${CMAKE_CXX_FLAGS_MINSIZEREL} /GL")
+ set(CMAKE_STATIC_LINKER_FLAGS_MINSIZEREL "${CMAKE_STATIC_LINKER_FLAGS_MINSIZEREL} /LTCG")
+ set(CMAKE_SHARED_LINKER_FLAGS_MINSIZEREL "${CMAKE_SHARED_LINKER_FLAGS_MINSIZEREL} /LTCG")
+ set(CMAKE_EXE_LINKER_FLAGS_MINSIZEREL "${CMAKE_EXE_LINKER_FLAGS_MINSIZEREL} /LTCG")
+ endif()
+else()
+ # Try and enable C++11. Don't use C++14 because it doesn't work in some
+ # configurations.
+ add_cxx_compiler_flag(-std=c++11)
+ if (NOT HAVE_CXX_FLAG_STD_CXX11)
+ add_cxx_compiler_flag(-std=c++0x)
+ endif()
+
+ # Turn compiler warnings up to 11
+ add_cxx_compiler_flag(-Wall)
+
+ add_cxx_compiler_flag(-Wextra)
+ add_cxx_compiler_flag(-Wshadow)
+ add_cxx_compiler_flag(-Werror RELEASE)
+ add_cxx_compiler_flag(-Werror RELWITHDEBINFO)
+ add_cxx_compiler_flag(-Werror MINSIZEREL)
+ add_cxx_compiler_flag(-pedantic)
+ add_cxx_compiler_flag(-pedantic-errors)
+ add_cxx_compiler_flag(-Wshorten-64-to-32)
+ add_cxx_compiler_flag(-Wfloat-equal)
+ add_cxx_compiler_flag(-fstrict-aliasing)
+ if (NOT BENCHMARK_ENABLE_EXCEPTIONS)
+ add_cxx_compiler_flag(-fno-exceptions)
+ endif()
+ if (NOT BENCHMARK_USE_LIBCXX)
+ add_cxx_compiler_flag(-Wzero-as-null-pointer-constant)
+ endif()
+ if (HAVE_CXX_FLAG_FSTRICT_ALIASING)
+ if (NOT CMAKE_CXX_COMPILER_ID STREQUAL "Intel") #ICC17u2: Many false positives for Wstrict-aliasing
+ add_cxx_compiler_flag(-Wstrict-aliasing)
+ endif()
+ endif()
+ # ICC17u2: overloaded virtual function "benchmark::Fixture::SetUp" is only partially overridden
+ # (because of deprecated overload)
+ add_cxx_compiler_flag(-wd654)
+ add_cxx_compiler_flag(-Wthread-safety)
+ if (HAVE_CXX_FLAG_WTHREAD_SAFETY)
+ cxx_feature_check(THREAD_SAFETY_ATTRIBUTES)
+ endif()
+
+ # On most UNIX like platforms g++ and clang++ define _GNU_SOURCE as a
+ # predefined macro, which turns on all of the wonderful libc extensions.
+ # However g++ doesn't do this in Cygwin so we have to define it ourselfs
+ # since we depend on GNU/POSIX/BSD extensions.
+ if (CYGWIN)
+ add_definitions(-D_GNU_SOURCE=1)
+ endif()
+
+ # Link time optimisation
+ if (BENCHMARK_ENABLE_LTO)
+ add_cxx_compiler_flag(-flto)
+ if ("${CMAKE_C_COMPILER_ID}" STREQUAL "GNU")
+ find_program(GCC_AR gcc-ar)
+ if (GCC_AR)
+ set(CMAKE_AR ${GCC_AR})
+ endif()
+ find_program(GCC_RANLIB gcc-ranlib)
+ if (GCC_RANLIB)
+ set(CMAKE_RANLIB ${GCC_RANLIB})
+ endif()
+ endif()
+ endif()
+
+ # Coverage build type
+ set(CMAKE_CXX_FLAGS_COVERAGE "${CMAKE_CXX_FLAGS_DEBUG}" CACHE STRING
+ "Flags used by the C++ compiler during coverage builds."
+ FORCE)
+ set(CMAKE_EXE_LINKER_FLAGS_COVERAGE
+ "${CMAKE_EXE_LINKER_FLAGS_DEBUG}" CACHE STRING
+ "Flags used for linking binaries during coverage builds."
+ FORCE)
+ set(CMAKE_SHARED_LINKER_FLAGS_COVERAGE
+ "${CMAKE_SHARED_LINKER_FLAGS_DEBUG}" CACHE STRING
+ "Flags used by the shared libraries linker during coverage builds."
+ FORCE)
+ mark_as_advanced(
+ CMAKE_CXX_FLAGS_COVERAGE
+ CMAKE_EXE_LINKER_FLAGS_COVERAGE
+ CMAKE_SHARED_LINKER_FLAGS_COVERAGE)
+ set(CMAKE_BUILD_TYPE "${CMAKE_BUILD_TYPE}" CACHE STRING
+ "Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel Coverage."
+ FORCE)
+ add_cxx_compiler_flag(--coverage COVERAGE)
+endif()
+
+if (BENCHMARK_USE_LIBCXX)
+ if ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
+ add_cxx_compiler_flag(-stdlib=libc++)
+ elseif ("${CMAKE_CXX_COMPILER_ID}" STREQUAL "GNU" OR
+ "${CMAKE_CXX_COMPILER_ID}" STREQUAL "Intel")
+ add_cxx_compiler_flag(-nostdinc++)
+ message("libc++ header path must be manually specified using CMAKE_CXX_FLAGS")
+ # Adding -nodefaultlibs directly to CMAKE__LINKER_FLAGS will break
+ # configuration checks such as 'find_package(Threads)'
+ list(APPEND BENCHMARK_CXX_LINKER_FLAGS -nodefaultlibs)
+ # -lc++ cannot be added directly to CMAKE__LINKER_FLAGS because
+ # linker flags appear before all linker inputs and -lc++ must appear after.
+ list(APPEND BENCHMARK_CXX_LIBRARIES c++)
+ else()
+ message(FATAL "-DBENCHMARK_USE_LIBCXX:BOOL=ON is not supported for compiler")
+ endif()
+endif(BENCHMARK_USE_LIBCXX)
+
+# C++ feature checks
+# Determine the correct regular expression engine to use
+cxx_feature_check(STD_REGEX)
+cxx_feature_check(GNU_POSIX_REGEX)
+cxx_feature_check(POSIX_REGEX)
+if(NOT HAVE_STD_REGEX AND NOT HAVE_GNU_POSIX_REGEX AND NOT HAVE_POSIX_REGEX)
+ message(FATAL_ERROR "Failed to determine the source files for the regular expression backend")
+endif()
+if (NOT BENCHMARK_ENABLE_EXCEPTIONS AND HAVE_STD_REGEX
+ AND NOT HAVE_GNU_POSIX_REGEX AND NOT HAVE_POSIX_REGEX)
+ message(WARNING "Using std::regex with exceptions disabled is not fully supported")
+endif()
+cxx_feature_check(STEADY_CLOCK)
+# Ensure we have pthreads
+find_package(Threads REQUIRED)
+
+# Set up directories
+include_directories(${PROJECT_SOURCE_DIR}/include)
+
+# Build the targets
+add_subdirectory(src)
+
+if (BENCHMARK_ENABLE_TESTING)
+ enable_testing()
+ add_subdirectory(test)
+endif()
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/CONTRIBUTING.md b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/CONTRIBUTING.md
new file mode 100644
index 00000000000..43de4c9d470
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/CONTRIBUTING.md
@@ -0,0 +1,58 @@
+# How to contribute #
+
+We'd love to accept your patches and contributions to this project. There are
+a just a few small guidelines you need to follow.
+
+
+## Contributor License Agreement ##
+
+Contributions to any Google project must be accompanied by a Contributor
+License Agreement. This is not a copyright **assignment**, it simply gives
+Google permission to use and redistribute your contributions as part of the
+project.
+
+ * If you are an individual writing original source code and you're sure you
+ own the intellectual property, then you'll need to sign an [individual
+ CLA][].
+
+ * If you work for a company that wants to allow you to contribute your work,
+ then you'll need to sign a [corporate CLA][].
+
+You generally only need to submit a CLA once, so if you've already submitted
+one (even if it was for a different project), you probably don't need to do it
+again.
+
+[individual CLA]: https://developers.google.com/open-source/cla/individual
+[corporate CLA]: https://developers.google.com/open-source/cla/corporate
+
+Once your CLA is submitted (or if you already submitted one for
+another Google project), make a commit adding yourself to the
+[AUTHORS][] and [CONTRIBUTORS][] files. This commit can be part
+of your first [pull request][].
+
+[AUTHORS]: AUTHORS
+[CONTRIBUTORS]: CONTRIBUTORS
+
+
+## Submitting a patch ##
+
+ 1. It's generally best to start by opening a new issue describing the bug or
+ feature you're intending to fix. Even if you think it's relatively minor,
+ it's helpful to know what people are working on. Mention in the initial
+ issue that you are planning to work on that bug or feature so that it can
+ be assigned to you.
+
+ 1. Follow the normal process of [forking][] the project, and setup a new
+ branch to work in. It's important that each group of changes be done in
+ separate branches in order to ensure that a pull request only includes the
+ commits related to that bug or feature.
+
+ 1. Do your best to have [well-formed commit messages][] for each change.
+ This provides consistency throughout the project, and ensures that commit
+ messages are able to be formatted properly by various git tools.
+
+ 1. Finally, push the commits to your fork and submit a [pull request][].
+
+[forking]: https://help.github.com/articles/fork-a-repo
+[well-formed commit messages]: http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html
+[pull request]: https://help.github.com/articles/creating-a-pull-request
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/CONTRIBUTORS b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/CONTRIBUTORS
new file mode 100644
index 00000000000..9abb60865eb
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/CONTRIBUTORS
@@ -0,0 +1,59 @@
+# People who have agreed to one of the CLAs and can contribute patches.
+# The AUTHORS file lists the copyright holders; this file
+# lists people. For example, Google employees are listed here
+# but not in AUTHORS, because Google holds the copyright.
+#
+# Names should be added to this file only after verifying that
+# the individual or the individual's organization has agreed to
+# the appropriate Contributor License Agreement, found here:
+#
+# https://developers.google.com/open-source/cla/individual
+# https://developers.google.com/open-source/cla/corporate
+#
+# The agreement for individuals can be filled out on the web.
+#
+# When adding J Random Contributor's name to this file,
+# either J's name or J's organization's name should be
+# added to the AUTHORS file, depending on whether the
+# individual or corporate CLA was used.
+#
+# Names should be added to this file as:
+# Name
+#
+# Please keep the list sorted.
+
+Albert Pretorius
+Arne Beer
+Billy Robert O'Neal III
+Chris Kennelly
+Christopher Seymour
+David Coeurjolly
+Dominic Hamon
+Eric Fiselier
+Eugene Zhuk
+Evgeny Safronov
+Felix Homann
+Ismael Jimenez Martinez
+Jern-Kuan Leong
+Joao Paulo Magalhaes
+JianXiong Zhou
+Jussi Knuuttila
+Kaito Udagawa
+Kai Wolf
+Lei Xu
+Matt Clarkson
+Maxim Vafin
+Nick Hutchinson
+Oleksandr Sochka
+Pascal Leroy
+Paul Redmond
+Pierre Phaneuf
+Radoslav Yovchev
+Ray Glover
+Shuo Chen
+Tom Madams
+Yixuan Qiu
+Yusuke Suzuki
+Tobias Ulvgård
+Zbigniew Skowron
+Dominik Czarnota
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/LICENSE b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/LICENSE
new file mode 100644
index 00000000000..d6456956733
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/LICENSE
@@ -0,0 +1,202 @@
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/README.md b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/README.md
new file mode 100644
index 00000000000..2430d93bf9c
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/README.md
@@ -0,0 +1,726 @@
+# benchmark
+[](https://travis-ci.org/google/benchmark)
+[](https://ci.appveyor.com/project/google/benchmark/branch/master)
+[](https://coveralls.io/r/google/benchmark)
+
+A library to support the benchmarking of functions, similar to unit-tests.
+
+Discussion group: https://groups.google.com/d/forum/benchmark-discuss
+
+IRC channel: https://freenode.net #googlebenchmark
+
+[Known issues and common problems](#known-issues)
+
+[Additional Tooling Documentation](docs/tools.md)
+
+## Example usage
+### Basic usage
+Define a function that executes the code to be measured.
+
+```c++
+static void BM_StringCreation(benchmark::State& state) {
+ while (state.KeepRunning())
+ std::string empty_string;
+}
+// Register the function as a benchmark
+BENCHMARK(BM_StringCreation);
+
+// Define another benchmark
+static void BM_StringCopy(benchmark::State& state) {
+ std::string x = "hello";
+ while (state.KeepRunning())
+ std::string copy(x);
+}
+BENCHMARK(BM_StringCopy);
+
+BENCHMARK_MAIN();
+```
+
+### Passing arguments
+Sometimes a family of benchmarks can be implemented with just one routine that
+takes an extra argument to specify which one of the family of benchmarks to
+run. For example, the following code defines a family of benchmarks for
+measuring the speed of `memcpy()` calls of different lengths:
+
+```c++
+static void BM_memcpy(benchmark::State& state) {
+ char* src = new char[state.range(0)];
+ char* dst = new char[state.range(0)];
+ memset(src, 'x', state.range(0));
+ while (state.KeepRunning())
+ memcpy(dst, src, state.range(0));
+ state.SetBytesProcessed(int64_t(state.iterations()) *
+ int64_t(state.range(0)));
+ delete[] src;
+ delete[] dst;
+}
+BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
+```
+
+The preceding code is quite repetitive, and can be replaced with the following
+short-hand. The following invocation will pick a few appropriate arguments in
+the specified range and will generate a benchmark for each such argument.
+
+```c++
+BENCHMARK(BM_memcpy)->Range(8, 8<<10);
+```
+
+By default the arguments in the range are generated in multiples of eight and
+the command above selects [ 8, 64, 512, 4k, 8k ]. In the following code the
+range multiplier is changed to multiples of two.
+
+```c++
+BENCHMARK(BM_memcpy)->RangeMultiplier(2)->Range(8, 8<<10);
+```
+Now arguments generated are [ 8, 16, 32, 64, 128, 256, 512, 1024, 2k, 4k, 8k ].
+
+You might have a benchmark that depends on two or more inputs. For example, the
+following code defines a family of benchmarks for measuring the speed of set
+insertion.
+
+```c++
+static void BM_SetInsert(benchmark::State& state) {
+ while (state.KeepRunning()) {
+ state.PauseTiming();
+ std::set data = ConstructRandomSet(state.range(0));
+ state.ResumeTiming();
+ for (int j = 0; j < state.range(1); ++j)
+ data.insert(RandomNumber());
+ }
+}
+BENCHMARK(BM_SetInsert)
+ ->Args({1<<10, 1})
+ ->Args({1<<10, 8})
+ ->Args({1<<10, 64})
+ ->Args({1<<10, 512})
+ ->Args({8<<10, 1})
+ ->Args({8<<10, 8})
+ ->Args({8<<10, 64})
+ ->Args({8<<10, 512});
+```
+
+The preceding code is quite repetitive, and can be replaced with the following
+short-hand. The following macro will pick a few appropriate arguments in the
+product of the two specified ranges and will generate a benchmark for each such
+pair.
+
+```c++
+BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {1, 512}});
+```
+
+For more complex patterns of inputs, passing a custom function to `Apply` allows
+programmatic specification of an arbitrary set of arguments on which to run the
+benchmark. The following example enumerates a dense range on one parameter,
+and a sparse range on the second.
+
+```c++
+static void CustomArguments(benchmark::internal::Benchmark* b) {
+ for (int i = 0; i <= 10; ++i)
+ for (int j = 32; j <= 1024*1024; j *= 8)
+ b->Args({i, j});
+}
+BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
+```
+
+### Calculate asymptotic complexity (Big O)
+Asymptotic complexity might be calculated for a family of benchmarks. The
+following code will calculate the coefficient for the high-order term in the
+running time and the normalized root-mean square error of string comparison.
+
+```c++
+static void BM_StringCompare(benchmark::State& state) {
+ std::string s1(state.range(0), '-');
+ std::string s2(state.range(0), '-');
+ while (state.KeepRunning()) {
+ benchmark::DoNotOptimize(s1.compare(s2));
+ }
+ state.SetComplexityN(state.range(0));
+}
+BENCHMARK(BM_StringCompare)
+ ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity(benchmark::oN);
+```
+
+As shown in the following invocation, asymptotic complexity might also be
+calculated automatically.
+
+```c++
+BENCHMARK(BM_StringCompare)
+ ->RangeMultiplier(2)->Range(1<<10, 1<<18)->Complexity();
+```
+
+The following code will specify asymptotic complexity with a lambda function,
+that might be used to customize high-order term calculation.
+
+```c++
+BENCHMARK(BM_StringCompare)->RangeMultiplier(2)
+ ->Range(1<<10, 1<<18)->Complexity([](int n)->double{return n; });
+```
+
+### Templated benchmarks
+Templated benchmarks work the same way: This example produces and consumes
+messages of size `sizeof(v)` `range_x` times. It also outputs throughput in the
+absence of multiprogramming.
+
+```c++
+template int BM_Sequential(benchmark::State& state) {
+ Q q;
+ typename Q::value_type v;
+ while (state.KeepRunning()) {
+ for (int i = state.range(0); i--; )
+ q.push(v);
+ for (int e = state.range(0); e--; )
+ q.Wait(&v);
+ }
+ // actually messages, not bytes:
+ state.SetBytesProcessed(
+ static_cast(state.iterations())*state.range(0));
+}
+BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue)->Range(1<<0, 1<<10);
+```
+
+Three macros are provided for adding benchmark templates.
+
+```c++
+#if __cplusplus >= 201103L // C++11 and greater.
+#define BENCHMARK_TEMPLATE(func, ...) // Takes any number of parameters.
+#else // C++ < C++11
+#define BENCHMARK_TEMPLATE(func, arg1)
+#endif
+#define BENCHMARK_TEMPLATE1(func, arg1)
+#define BENCHMARK_TEMPLATE2(func, arg1, arg2)
+```
+
+## Passing arbitrary arguments to a benchmark
+In C++11 it is possible to define a benchmark that takes an arbitrary number
+of extra arguments. The `BENCHMARK_CAPTURE(func, test_case_name, ...args)`
+macro creates a benchmark that invokes `func` with the `benchmark::State` as
+the first argument followed by the specified `args...`.
+The `test_case_name` is appended to the name of the benchmark and
+should describe the values passed.
+
+```c++
+template `
+void BM_takes_args(benchmark::State& state, ExtraArgs&&... extra_args) {
+ [...]
+}
+// Registers a benchmark named "BM_takes_args/int_string_test` that passes
+// the specified values to `extra_args`.
+BENCHMARK_CAPTURE(BM_takes_args, int_string_test, 42, std::string("abc"));
+```
+Note that elements of `...args` may refer to global variables. Users should
+avoid modifying global state inside of a benchmark.
+
+## Using RegisterBenchmark(name, fn, args...)
+
+The `RegisterBenchmark(name, func, args...)` function provides an alternative
+way to create and register benchmarks.
+`RegisterBenchmark(name, func, args...)` creates, registers, and returns a
+pointer to a new benchmark with the specified `name` that invokes
+`func(st, args...)` where `st` is a `benchmark::State` object.
+
+Unlike the `BENCHMARK` registration macros, which can only be used at the global
+scope, the `RegisterBenchmark` can be called anywhere. This allows for
+benchmark tests to be registered programmatically.
+
+Additionally `RegisterBenchmark` allows any callable object to be registered
+as a benchmark. Including capturing lambdas and function objects. This
+allows the creation
+
+For Example:
+```c++
+auto BM_test = [](benchmark::State& st, auto Inputs) { /* ... */ };
+
+int main(int argc, char** argv) {
+ for (auto& test_input : { /* ... */ })
+ benchmark::RegisterBenchmark(test_input.name(), BM_test, test_input);
+ benchmark::Initialize(&argc, argv);
+ benchmark::RunSpecifiedBenchmarks();
+}
+```
+
+### Multithreaded benchmarks
+In a multithreaded test (benchmark invoked by multiple threads simultaneously),
+it is guaranteed that none of the threads will start until all have called
+`KeepRunning`, and all will have finished before KeepRunning returns false. As
+such, any global setup or teardown can be wrapped in a check against the thread
+index:
+
+```c++
+static void BM_MultiThreaded(benchmark::State& state) {
+ if (state.thread_index == 0) {
+ // Setup code here.
+ }
+ while (state.KeepRunning()) {
+ // Run the test as normal.
+ }
+ if (state.thread_index == 0) {
+ // Teardown code here.
+ }
+}
+BENCHMARK(BM_MultiThreaded)->Threads(2);
+```
+
+If the benchmarked code itself uses threads and you want to compare it to
+single-threaded code, you may want to use real-time ("wallclock") measurements
+for latency comparisons:
+
+```c++
+BENCHMARK(BM_test)->Range(8, 8<<10)->UseRealTime();
+```
+
+Without `UseRealTime`, CPU time is used by default.
+
+
+## Manual timing
+For benchmarking something for which neither CPU time nor real-time are
+correct or accurate enough, completely manual timing is supported using
+the `UseManualTime` function.
+
+When `UseManualTime` is used, the benchmarked code must call
+`SetIterationTime` once per iteration of the `KeepRunning` loop to
+report the manually measured time.
+
+An example use case for this is benchmarking GPU execution (e.g. OpenCL
+or CUDA kernels, OpenGL or Vulkan or Direct3D draw calls), which cannot
+be accurately measured using CPU time or real-time. Instead, they can be
+measured accurately using a dedicated API, and these measurement results
+can be reported back with `SetIterationTime`.
+
+```c++
+static void BM_ManualTiming(benchmark::State& state) {
+ int microseconds = state.range(0);
+ std::chrono::duration sleep_duration {
+ static_cast(microseconds)
+ };
+
+ while (state.KeepRunning()) {
+ auto start = std::chrono::high_resolution_clock::now();
+ // Simulate some useful workload with a sleep
+ std::this_thread::sleep_for(sleep_duration);
+ auto end = std::chrono::high_resolution_clock::now();
+
+ auto elapsed_seconds =
+ std::chrono::duration_cast>(
+ end - start);
+
+ state.SetIterationTime(elapsed_seconds.count());
+ }
+}
+BENCHMARK(BM_ManualTiming)->Range(1, 1<<17)->UseManualTime();
+```
+
+### Preventing optimisation
+To prevent a value or expression from being optimized away by the compiler
+the `benchmark::DoNotOptimize(...)` and `benchmark::ClobberMemory()`
+functions can be used.
+
+```c++
+static void BM_test(benchmark::State& state) {
+ while (state.KeepRunning()) {
+ int x = 0;
+ for (int i=0; i < 64; ++i) {
+ benchmark::DoNotOptimize(x += i);
+ }
+ }
+}
+```
+
+`DoNotOptimize()` forces the *result* of `` to be stored in either
+memory or a register. For GNU based compilers it acts as read/write barrier
+for global memory. More specifically it forces the compiler to flush pending
+writes to memory and reload any other values as necessary.
+
+Note that `DoNotOptimize()` does not prevent optimizations on ``
+in any way. `` may even be removed entirely when the result is already
+known. For example:
+
+```c++
+ /* Example 1: `` is removed entirely. */
+ int foo(int x) { return x + 42; }
+ while (...) DoNotOptimize(foo(0)); // Optimized to DoNotOptimize(42);
+
+ /* Example 2: Result of '' is only reused */
+ int bar(int) __attribute__((const));
+ while (...) DoNotOptimize(bar(0)); // Optimized to:
+ // int __result__ = bar(0);
+ // while (...) DoNotOptimize(__result__);
+```
+
+The second tool for preventing optimizations is `ClobberMemory()`. In essence
+`ClobberMemory()` forces the compiler to perform all pending writes to global
+memory. Memory managed by block scope objects must be "escaped" using
+`DoNotOptimize(...)` before it can be clobbered. In the below example
+`ClobberMemory()` prevents the call to `v.push_back(42)` from being optimized
+away.
+
+```c++
+static void BM_vector_push_back(benchmark::State& state) {
+ while (state.KeepRunning()) {
+ std::vector v;
+ v.reserve(1);
+ benchmark::DoNotOptimize(v.data()); // Allow v.data() to be clobbered.
+ v.push_back(42);
+ benchmark::ClobberMemory(); // Force 42 to be written to memory.
+ }
+}
+```
+
+Note that `ClobberMemory()` is only available for GNU or MSVC based compilers.
+
+### Set time unit manually
+If a benchmark runs a few milliseconds it may be hard to visually compare the
+measured times, since the output data is given in nanoseconds per default. In
+order to manually set the time unit, you can specify it manually:
+
+```c++
+BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
+```
+
+## Controlling number of iterations
+In all cases, the number of iterations for which the benchmark is run is
+governed by the amount of time the benchmark takes. Concretely, the number of
+iterations is at least one, not more than 1e9, until CPU time is greater than
+the minimum time, or the wallclock time is 5x minimum time. The minimum time is
+set as a flag `--benchmark_min_time` or per benchmark by calling `MinTime` on
+the registered benchmark object.
+
+## Reporting the mean and standard devation by repeated benchmarks
+By default each benchmark is run once and that single result is reported.
+However benchmarks are often noisy and a single result may not be representative
+of the overall behavior. For this reason it's possible to repeatedly rerun the
+benchmark.
+
+The number of runs of each benchmark is specified globally by the
+`--benchmark_repetitions` flag or on a per benchmark basis by calling
+`Repetitions` on the registered benchmark object. When a benchmark is run
+more than once the mean and standard deviation of the runs will be reported.
+
+Additionally the `--benchmark_report_aggregates_only={true|false}` flag or
+`ReportAggregatesOnly(bool)` function can be used to change how repeated tests
+are reported. By default the result of each repeated run is reported. When this
+option is 'true' only the mean and standard deviation of the runs is reported.
+Calling `ReportAggregatesOnly(bool)` on a registered benchmark object overrides
+the value of the flag for that benchmark.
+
+## Fixtures
+Fixture tests are created by
+first defining a type that derives from ::benchmark::Fixture and then
+creating/registering the tests using the following macros:
+
+* `BENCHMARK_F(ClassName, Method)`
+* `BENCHMARK_DEFINE_F(ClassName, Method)`
+* `BENCHMARK_REGISTER_F(ClassName, Method)`
+
+For Example:
+
+```c++
+class MyFixture : public benchmark::Fixture {};
+
+BENCHMARK_F(MyFixture, FooTest)(benchmark::State& st) {
+ while (st.KeepRunning()) {
+ ...
+ }
+}
+
+BENCHMARK_DEFINE_F(MyFixture, BarTest)(benchmark::State& st) {
+ while (st.KeepRunning()) {
+ ...
+ }
+}
+/* BarTest is NOT registered */
+BENCHMARK_REGISTER_F(MyFixture, BarTest)->Threads(2);
+/* BarTest is now registered */
+```
+
+
+## User-defined counters
+
+You can add your own counters with user-defined names. The example below
+will add columns "Foo", "Bar" and "Baz" in its output:
+
+```c++
+static void UserCountersExample1(benchmark::State& state) {
+ double numFoos = 0, numBars = 0, numBazs = 0;
+ while (state.KeepRunning()) {
+ // ... count Foo,Bar,Baz events
+ }
+ state.counters["Foo"] = numFoos;
+ state.counters["Bar"] = numBars;
+ state.counters["Baz"] = numBazs;
+}
+```
+
+The `state.counters` object is a `std::map` with `std::string` keys
+and `Counter` values. The latter is a `double`-like class, via an implicit
+conversion to `double&`. Thus you can use all of the standard arithmetic
+assignment operators (`=,+=,-=,*=,/=`) to change the value of each counter.
+
+In multithreaded benchmarks, each counter is set on the calling thread only.
+When the benchmark finishes, the counters from each thread will be summed;
+the resulting sum is the value which will be shown for the benchmark.
+
+The `Counter` constructor accepts two parameters: the value as a `double`
+and a bit flag which allows you to show counters as rates and/or as
+per-thread averages:
+
+```c++
+ // sets a simple counter
+ state.counters["Foo"] = numFoos;
+
+ // Set the counter as a rate. It will be presented divided
+ // by the duration of the benchmark.
+ state.counters["FooRate"] = Counter(numFoos, benchmark::Counter::kIsRate);
+
+ // Set the counter as a thread-average quantity. It will
+ // be presented divided by the number of threads.
+ state.counters["FooAvg"] = Counter(numFoos, benchmark::Counter::kAvgThreads);
+
+ // There's also a combined flag:
+ state.counters["FooAvgRate"] = Counter(numFoos,benchmark::Counter::kAvgThreadsRate);
+```
+
+When you're compiling in C++11 mode or later you can use `insert()` with
+`std::initializer_list`:
+
+```c++
+ // With C++11, this can be done:
+ state.counters.insert({{"Foo", numFoos}, {"Bar", numBars}, {"Baz", numBazs}});
+ // ... instead of:
+ state.counters["Foo"] = numFoos;
+ state.counters["Bar"] = numBars;
+ state.counters["Baz"] = numBazs;
+```
+
+### Counter reporting
+
+When using the console reporter, by default, user counters are are printed at
+the end after the table, the same way as ``bytes_processed`` and
+``items_processed``. This is best for cases in which there are few counters,
+or where there are only a couple of lines per benchmark. Here's an example of
+the default output:
+
+```
+------------------------------------------------------------------------------
+Benchmark Time CPU Iterations UserCounters...
+------------------------------------------------------------------------------
+BM_UserCounter/threads:8 2248 ns 10277 ns 68808 Bar=16 Bat=40 Baz=24 Foo=8
+BM_UserCounter/threads:1 9797 ns 9788 ns 71523 Bar=2 Bat=5 Baz=3 Foo=1024m
+BM_UserCounter/threads:2 4924 ns 9842 ns 71036 Bar=4 Bat=10 Baz=6 Foo=2
+BM_UserCounter/threads:4 2589 ns 10284 ns 68012 Bar=8 Bat=20 Baz=12 Foo=4
+BM_UserCounter/threads:8 2212 ns 10287 ns 68040 Bar=16 Bat=40 Baz=24 Foo=8
+BM_UserCounter/threads:16 1782 ns 10278 ns 68144 Bar=32 Bat=80 Baz=48 Foo=16
+BM_UserCounter/threads:32 1291 ns 10296 ns 68256 Bar=64 Bat=160 Baz=96 Foo=32
+BM_UserCounter/threads:4 2615 ns 10307 ns 68040 Bar=8 Bat=20 Baz=12 Foo=4
+BM_Factorial 26 ns 26 ns 26608979 40320
+BM_Factorial/real_time 26 ns 26 ns 26587936 40320
+BM_CalculatePiRange/1 16 ns 16 ns 45704255 0
+BM_CalculatePiRange/8 73 ns 73 ns 9520927 3.28374
+BM_CalculatePiRange/64 609 ns 609 ns 1140647 3.15746
+BM_CalculatePiRange/512 4900 ns 4901 ns 142696 3.14355
+```
+
+If this doesn't suit you, you can print each counter as a table column by
+passing the flag `--benchmark_counters_tabular=true` to the benchmark
+application. This is best for cases in which there are a lot of counters, or
+a lot of lines per individual benchmark. Note that this will trigger a
+reprinting of the table header any time the counter set changes between
+individual benchmarks. Here's an example of corresponding output when
+`--benchmark_counters_tabular=true` is passed:
+
+```
+---------------------------------------------------------------------------------------
+Benchmark Time CPU Iterations Bar Bat Baz Foo
+---------------------------------------------------------------------------------------
+BM_UserCounter/threads:8 2198 ns 9953 ns 70688 16 40 24 8
+BM_UserCounter/threads:1 9504 ns 9504 ns 73787 2 5 3 1
+BM_UserCounter/threads:2 4775 ns 9550 ns 72606 4 10 6 2
+BM_UserCounter/threads:4 2508 ns 9951 ns 70332 8 20 12 4
+BM_UserCounter/threads:8 2055 ns 9933 ns 70344 16 40 24 8
+BM_UserCounter/threads:16 1610 ns 9946 ns 70720 32 80 48 16
+BM_UserCounter/threads:32 1192 ns 9948 ns 70496 64 160 96 32
+BM_UserCounter/threads:4 2506 ns 9949 ns 70332 8 20 12 4
+--------------------------------------------------------------
+Benchmark Time CPU Iterations
+--------------------------------------------------------------
+BM_Factorial 26 ns 26 ns 26392245 40320
+BM_Factorial/real_time 26 ns 26 ns 26494107 40320
+BM_CalculatePiRange/1 15 ns 15 ns 45571597 0
+BM_CalculatePiRange/8 74 ns 74 ns 9450212 3.28374
+BM_CalculatePiRange/64 595 ns 595 ns 1173901 3.15746
+BM_CalculatePiRange/512 4752 ns 4752 ns 147380 3.14355
+BM_CalculatePiRange/4k 37970 ns 37972 ns 18453 3.14184
+BM_CalculatePiRange/32k 303733 ns 303744 ns 2305 3.14162
+BM_CalculatePiRange/256k 2434095 ns 2434186 ns 288 3.1416
+BM_CalculatePiRange/1024k 9721140 ns 9721413 ns 71 3.14159
+BM_CalculatePi/threads:8 2255 ns 9943 ns 70936
+```
+Note above the additional header printed when the benchmark changes from
+``BM_UserCounter`` to ``BM_Factorial``. This is because ``BM_Factorial`` does
+not have the same counter set as ``BM_UserCounter``.
+
+## Exiting Benchmarks in Error
+
+When errors caused by external influences, such as file I/O and network
+communication, occur within a benchmark the
+`State::SkipWithError(const char* msg)` function can be used to skip that run
+of benchmark and report the error. Note that only future iterations of the
+`KeepRunning()` are skipped. Users may explicitly return to exit the
+benchmark immediately.
+
+The `SkipWithError(...)` function may be used at any point within the benchmark,
+including before and after the `KeepRunning()` loop.
+
+For example:
+
+```c++
+static void BM_test(benchmark::State& state) {
+ auto resource = GetResource();
+ if (!resource.good()) {
+ state.SkipWithError("Resource is not good!");
+ // KeepRunning() loop will not be entered.
+ }
+ while (state.KeepRunning()) {
+ auto data = resource.read_data();
+ if (!resource.good()) {
+ state.SkipWithError("Failed to read data!");
+ break; // Needed to skip the rest of the iteration.
+ }
+ do_stuff(data);
+ }
+}
+```
+
+## Running a subset of the benchmarks
+
+The `--benchmark_filter=` option can be used to only run the benchmarks
+which match the specified ``. For example:
+
+```bash
+$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
+Run on (1 X 2300 MHz CPU )
+2016-06-25 19:34:24
+Benchmark Time CPU Iterations
+----------------------------------------------------
+BM_memcpy/32 11 ns 11 ns 79545455
+BM_memcpy/32k 2181 ns 2185 ns 324074
+BM_memcpy/32 12 ns 12 ns 54687500
+BM_memcpy/32k 1834 ns 1837 ns 357143
+```
+
+
+## Output Formats
+The library supports multiple output formats. Use the
+`--benchmark_format=` flag to set the format type. `console`
+is the default format.
+
+The Console format is intended to be a human readable format. By default
+the format generates color output. Context is output on stderr and the
+tabular data on stdout. Example tabular output looks like:
+```
+Benchmark Time(ns) CPU(ns) Iterations
+----------------------------------------------------------------------
+BM_SetInsert/1024/1 28928 29349 23853 133.097kB/s 33.2742k items/s
+BM_SetInsert/1024/8 32065 32913 21375 949.487kB/s 237.372k items/s
+BM_SetInsert/1024/10 33157 33648 21431 1.13369MB/s 290.225k items/s
+```
+
+The JSON format outputs human readable json split into two top level attributes.
+The `context` attribute contains information about the run in general, including
+information about the CPU and the date.
+The `benchmarks` attribute contains a list of ever benchmark run. Example json
+output looks like:
+```json
+{
+ "context": {
+ "date": "2015/03/17-18:40:25",
+ "num_cpus": 40,
+ "mhz_per_cpu": 2801,
+ "cpu_scaling_enabled": false,
+ "build_type": "debug"
+ },
+ "benchmarks": [
+ {
+ "name": "BM_SetInsert/1024/1",
+ "iterations": 94877,
+ "real_time": 29275,
+ "cpu_time": 29836,
+ "bytes_per_second": 134066,
+ "items_per_second": 33516
+ },
+ {
+ "name": "BM_SetInsert/1024/8",
+ "iterations": 21609,
+ "real_time": 32317,
+ "cpu_time": 32429,
+ "bytes_per_second": 986770,
+ "items_per_second": 246693
+ },
+ {
+ "name": "BM_SetInsert/1024/10",
+ "iterations": 21393,
+ "real_time": 32724,
+ "cpu_time": 33355,
+ "bytes_per_second": 1199226,
+ "items_per_second": 299807
+ }
+ ]
+}
+```
+
+The CSV format outputs comma-separated values. The `context` is output on stderr
+and the CSV itself on stdout. Example CSV output looks like:
+```
+name,iterations,real_time,cpu_time,bytes_per_second,items_per_second,label
+"BM_SetInsert/1024/1",65465,17890.7,8407.45,475768,118942,
+"BM_SetInsert/1024/8",116606,18810.1,9766.64,3.27646e+06,819115,
+"BM_SetInsert/1024/10",106365,17238.4,8421.53,4.74973e+06,1.18743e+06,
+```
+
+## Output Files
+The library supports writing the output of the benchmark to a file specified
+by `--benchmark_out=`. The format of the output can be specified
+using `--benchmark_out_format={json|console|csv}`. Specifying
+`--benchmark_out` does not suppress the console output.
+
+## Debug vs Release
+By default, benchmark builds as a debug library. You will see a warning in the output when this is the case. To build it as a release library instead, use:
+
+```
+cmake -DCMAKE_BUILD_TYPE=Release
+```
+
+To enable link-time optimisation, use
+
+```
+cmake -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_LTO=true
+```
+
+## Linking against the library
+When using gcc, it is necessary to link against pthread to avoid runtime exceptions.
+This is due to how gcc implements std::thread.
+See [issue #67](https://github.com/google/benchmark/issues/67) for more details.
+
+## Compiler Support
+
+Google Benchmark uses C++11 when building the library. As such we require
+a modern C++ toolchain, both compiler and standard library.
+
+The following minimum versions are strongly recommended build the library:
+
+* GCC 4.8
+* Clang 3.4
+* Visual Studio 2013
+* Intel 2015 Update 1
+
+Anything older *may* work.
+
+Note: Using the library and its headers in C++03 is supported. C++11 is only
+required to build the library.
+
+# Known Issues
+
+### Windows
+
+* Users must manually link `shlwapi.lib`. Failure to do so may result
+in unresolved symbols.
+
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/appveyor.yml b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/appveyor.yml
new file mode 100644
index 00000000000..e084f386b77
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/appveyor.yml
@@ -0,0 +1,56 @@
+version: '{build}'
+
+image: Visual Studio 2017
+
+configuration:
+ - Debug
+ - Release
+
+environment:
+ matrix:
+ - compiler: msvc-15-seh
+ generator: "Visual Studio 15 2017"
+
+ - compiler: msvc-15-seh
+ generator: "Visual Studio 15 2017 Win64"
+
+ - compiler: msvc-14-seh
+ generator: "Visual Studio 14 2015"
+
+ - compiler: msvc-14-seh
+ generator: "Visual Studio 14 2015 Win64"
+
+ - compiler: msvc-12-seh
+ generator: "Visual Studio 12 2013"
+
+ - compiler: msvc-12-seh
+ generator: "Visual Studio 12 2013 Win64"
+
+ - compiler: gcc-5.3.0-posix
+ generator: "MinGW Makefiles"
+ cxx_path: 'C:\mingw-w64\i686-5.3.0-posix-dwarf-rt_v4-rev0\mingw32\bin'
+ APPVEYOR_BUILD_WORKER_IMAGE: Visual Studio 2015
+
+matrix:
+ fast_finish: true
+
+install:
+ # git bash conflicts with MinGW makefiles
+ - if "%generator%"=="MinGW Makefiles" (set "PATH=%PATH:C:\Program Files\Git\usr\bin;=%")
+ - if not "%cxx_path%"=="" (set "PATH=%PATH%;%cxx_path%")
+
+build_script:
+ - md _build -Force
+ - cd _build
+ - echo %configuration%
+ - cmake -G "%generator%" "-DCMAKE_BUILD_TYPE=%configuration%" ..
+ - cmake --build . --config %configuration%
+
+test_script:
+ - ctest -c %configuration% --timeout 300 --output-on-failure
+
+artifacts:
+ - path: '_build/CMakeFiles/*.log'
+ name: logs
+ - path: '_build/Testing/**/*.xml'
+ name: test_results
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/AddCXXCompilerFlag.cmake b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/AddCXXCompilerFlag.cmake
new file mode 100644
index 00000000000..0b176ba27f1
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/AddCXXCompilerFlag.cmake
@@ -0,0 +1,64 @@
+# - Adds a compiler flag if it is supported by the compiler
+#
+# This function checks that the supplied compiler flag is supported and then
+# adds it to the corresponding compiler flags
+#
+# add_cxx_compiler_flag( [])
+#
+# - Example
+#
+# include(AddCXXCompilerFlag)
+# add_cxx_compiler_flag(-Wall)
+# add_cxx_compiler_flag(-no-strict-aliasing RELEASE)
+# Requires CMake 2.6+
+
+if(__add_cxx_compiler_flag)
+ return()
+endif()
+set(__add_cxx_compiler_flag INCLUDED)
+
+include(CheckCXXCompilerFlag)
+
+function(mangle_compiler_flag FLAG OUTPUT)
+ string(TOUPPER "HAVE_CXX_FLAG_${FLAG}" SANITIZED_FLAG)
+ string(REPLACE "+" "X" SANITIZED_FLAG ${SANITIZED_FLAG})
+ string(REGEX REPLACE "[^A-Za-z_0-9]" "_" SANITIZED_FLAG ${SANITIZED_FLAG})
+ string(REGEX REPLACE "_+" "_" SANITIZED_FLAG ${SANITIZED_FLAG})
+ set(${OUTPUT} "${SANITIZED_FLAG}" PARENT_SCOPE)
+endfunction(mangle_compiler_flag)
+
+function(add_cxx_compiler_flag FLAG)
+ mangle_compiler_flag("${FLAG}" MANGLED_FLAG)
+ set(OLD_CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS}")
+ set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} ${FLAG}")
+ check_cxx_compiler_flag("${FLAG}" ${MANGLED_FLAG})
+ set(CMAKE_REQUIRED_FLAGS "${OLD_CMAKE_REQUIRED_FLAGS}")
+ if(${MANGLED_FLAG})
+ set(VARIANT ${ARGV1})
+ if(ARGV1)
+ string(TOUPPER "_${VARIANT}" VARIANT)
+ endif()
+ set(CMAKE_CXX_FLAGS${VARIANT} "${CMAKE_CXX_FLAGS${VARIANT}} ${FLAG}" PARENT_SCOPE)
+ endif()
+endfunction()
+
+function(add_required_cxx_compiler_flag FLAG)
+ mangle_compiler_flag("${FLAG}" MANGLED_FLAG)
+ set(OLD_CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS}")
+ set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} ${FLAG}")
+ check_cxx_compiler_flag("${FLAG}" ${MANGLED_FLAG})
+ set(CMAKE_REQUIRED_FLAGS "${OLD_CMAKE_REQUIRED_FLAGS}")
+ if(${MANGLED_FLAG})
+ set(VARIANT ${ARGV1})
+ if(ARGV1)
+ string(TOUPPER "_${VARIANT}" VARIANT)
+ endif()
+ set(CMAKE_CXX_FLAGS${VARIANT} "${CMAKE_CXX_FLAGS${VARIANT}} ${FLAG}" PARENT_SCOPE)
+ set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${FLAG}" PARENT_SCOPE)
+ set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} ${FLAG}" PARENT_SCOPE)
+ set(CMAKE_MODULE_LINKER_FLAGS "${CMAKE_MODULE_LINKER_FLAGS} ${FLAG}" PARENT_SCOPE)
+ set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} ${FLAG}" PARENT_SCOPE)
+ else()
+ message(FATAL_ERROR "Required flag '${FLAG}' is not supported by the compiler")
+ endif()
+endfunction()
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/CXXFeatureCheck.cmake b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/CXXFeatureCheck.cmake
new file mode 100644
index 00000000000..2c4460f0e30
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/CXXFeatureCheck.cmake
@@ -0,0 +1,46 @@
+# - Compile and run code to check for C++ features
+#
+# This functions compiles a source file under the `cmake` folder
+# and adds the corresponding `HAVE_[FILENAME]` flag to the CMake
+# environment
+#
+# cxx_feature_check( [])
+#
+# - Example
+#
+# include(CXXFeatureCheck)
+# cxx_feature_check(STD_REGEX)
+# Requires CMake 2.8.12+
+
+if(__cxx_feature_check)
+ return()
+endif()
+set(__cxx_feature_check INCLUDED)
+
+function(cxx_feature_check FILE)
+ string(TOLOWER ${FILE} FILE)
+ string(TOUPPER ${FILE} VAR)
+ string(TOUPPER "HAVE_${VAR}" FEATURE)
+ if (DEFINED HAVE_${VAR})
+ set(HAVE_${VAR} 1 CACHE INTERNAL "Feature test for ${FILE}" PARENT_SCOPE)
+ add_definitions(-DHAVE_${VAR})
+ return()
+ endif()
+ message("-- Performing Test ${FEATURE}")
+ try_run(RUN_${FEATURE} COMPILE_${FEATURE}
+ ${CMAKE_BINARY_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/cmake/${FILE}.cpp
+ CMAKE_FLAGS ${BENCHMARK_CXX_LINKER_FLAGS}
+ LINK_LIBRARIES ${BENCHMARK_CXX_LIBRARIES})
+ if(RUN_${FEATURE} EQUAL 0)
+ message("-- Performing Test ${FEATURE} -- success")
+ set(HAVE_${VAR} 1 CACHE INTERNAL "Feature test for ${FILE}" PARENT_SCOPE)
+ add_definitions(-DHAVE_${VAR})
+ else()
+ if(NOT COMPILE_${FEATURE})
+ message("-- Performing Test ${FEATURE} -- failed to compile")
+ else()
+ message("-- Performing Test ${FEATURE} -- compiled but failed to run")
+ endif()
+ endif()
+endfunction()
+
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/Config.cmake.in b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/Config.cmake.in
new file mode 100644
index 00000000000..6e9256eea8a
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/Config.cmake.in
@@ -0,0 +1 @@
+include("${CMAKE_CURRENT_LIST_DIR}/@targets_export_name@.cmake")
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/GetGitVersion.cmake b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/GetGitVersion.cmake
new file mode 100644
index 00000000000..8dd94800459
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/GetGitVersion.cmake
@@ -0,0 +1,51 @@
+# - Returns a version string from Git tags
+#
+# This function inspects the annotated git tags for the project and returns a string
+# into a CMake variable
+#
+# get_git_version()
+#
+# - Example
+#
+# include(GetGitVersion)
+# get_git_version(GIT_VERSION)
+#
+# Requires CMake 2.8.11+
+find_package(Git)
+
+if(__get_git_version)
+ return()
+endif()
+set(__get_git_version INCLUDED)
+
+function(get_git_version var)
+ if(GIT_EXECUTABLE)
+ execute_process(COMMAND ${GIT_EXECUTABLE} describe --match "v[0-9]*.[0-9]*.[0-9]*" --abbrev=8
+ RESULT_VARIABLE status
+ OUTPUT_VARIABLE GIT_VERSION
+ ERROR_QUIET)
+ if(${status})
+ set(GIT_VERSION "v0.0.0")
+ else()
+ string(STRIP ${GIT_VERSION} GIT_VERSION)
+ string(REGEX REPLACE "-[0-9]+-g" "-" GIT_VERSION ${GIT_VERSION})
+ endif()
+
+ # Work out if the repository is dirty
+ execute_process(COMMAND ${GIT_EXECUTABLE} update-index -q --refresh
+ OUTPUT_QUIET
+ ERROR_QUIET)
+ execute_process(COMMAND ${GIT_EXECUTABLE} diff-index --name-only HEAD --
+ OUTPUT_VARIABLE GIT_DIFF_INDEX
+ ERROR_QUIET)
+ string(COMPARE NOTEQUAL "${GIT_DIFF_INDEX}" "" GIT_DIRTY)
+ if (${GIT_DIRTY})
+ set(GIT_VERSION "${GIT_VERSION}-dirty")
+ endif()
+ else()
+ set(GIT_VERSION "v0.0.0")
+ endif()
+
+ message("-- git Version: ${GIT_VERSION}")
+ set(${var} ${GIT_VERSION} PARENT_SCOPE)
+endfunction()
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/gnu_posix_regex.cpp b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/gnu_posix_regex.cpp
new file mode 100644
index 00000000000..b5b91cdab7c
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/gnu_posix_regex.cpp
@@ -0,0 +1,12 @@
+#include
+#include
+int main() {
+ std::string str = "test0159";
+ regex_t re;
+ int ec = regcomp(&re, "^[a-z]+[0-9]+$", REG_EXTENDED | REG_NOSUB);
+ if (ec != 0) {
+ return ec;
+ }
+ return regexec(&re, str.c_str(), 0, nullptr, 0) ? -1 : 0;
+}
+
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/posix_regex.cpp b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/posix_regex.cpp
new file mode 100644
index 00000000000..466dc62560a
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/posix_regex.cpp
@@ -0,0 +1,14 @@
+#include
+#include
+int main() {
+ std::string str = "test0159";
+ regex_t re;
+ int ec = regcomp(&re, "^[a-z]+[0-9]+$", REG_EXTENDED | REG_NOSUB);
+ if (ec != 0) {
+ return ec;
+ }
+ int ret = regexec(&re, str.c_str(), 0, nullptr, 0) ? -1 : 0;
+ regfree(&re);
+ return ret;
+}
+
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/std_regex.cpp b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/std_regex.cpp
new file mode 100644
index 00000000000..696f2a26bce
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/std_regex.cpp
@@ -0,0 +1,10 @@
+#include
+#include
+int main() {
+ const std::string str = "test0159";
+ std::regex re;
+ re = std::regex("^[a-z]+[0-9]+$",
+ std::regex_constants::extended | std::regex_constants::nosubs);
+ return std::regex_search(str, re) ? 0 : -1;
+}
+
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/steady_clock.cpp b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/steady_clock.cpp
new file mode 100644
index 00000000000..66d50d17e9e
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/steady_clock.cpp
@@ -0,0 +1,7 @@
+#include
+
+int main() {
+ typedef std::chrono::steady_clock Clock;
+ Clock::time_point tp = Clock::now();
+ ((void)tp);
+}
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/thread_safety_attributes.cpp b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/thread_safety_attributes.cpp
new file mode 100644
index 00000000000..46161babdb1
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/cmake/thread_safety_attributes.cpp
@@ -0,0 +1,4 @@
+#define HAVE_THREAD_SAFETY_ATTRIBUTES
+#include "../src/mutex.h"
+
+int main() {}
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/docs/tools.md b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/docs/tools.md
new file mode 100644
index 00000000000..f176f74a48f
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/docs/tools.md
@@ -0,0 +1,59 @@
+# Benchmark Tools
+
+## compare_bench.py
+
+The `compare_bench.py` utility which can be used to compare the result of benchmarks.
+The program is invoked like:
+
+``` bash
+$ compare_bench.py [benchmark options]...
+```
+
+Where `` and `` either specify a benchmark executable file, or a JSON output file. The type of the input file is automatically detected. If a benchmark executable is specified then the benchmark is run to obtain the results. Otherwise the results are simply loaded from the output file.
+
+The sample output using the JSON test files under `Inputs/` gives:
+
+``` bash
+$ ./compare_bench.py ./gbench/Inputs/test1_run1.json ./gbench/Inputs/test1_run2.json
+Comparing ./gbench/Inputs/test1_run1.json to ./gbench/Inputs/test1_run2.json
+Benchmark Time CPU
+----------------------------------------------
+BM_SameTimes +0.00 +0.00
+BM_2xFaster -0.50 -0.50
+BM_2xSlower +1.00 +1.00
+BM_10PercentFaster -0.10 -0.10
+BM_10PercentSlower +0.10 +0.10
+```
+
+When a benchmark executable is run, the raw output from the benchmark is printed in real time to stdout. The sample output using `benchmark/basic_test` for both arguments looks like:
+
+```
+./compare_bench.py test/basic_test test/basic_test --benchmark_filter=BM_empty.*
+RUNNING: test/basic_test --benchmark_filter=BM_empty.*
+Run on (4 X 4228.32 MHz CPU s)
+2016-08-02 19:21:33
+Benchmark Time CPU Iterations
+--------------------------------------------------------------------
+BM_empty 9 ns 9 ns 79545455
+BM_empty/threads:4 4 ns 9 ns 75268816
+BM_empty_stop_start 8 ns 8 ns 83333333
+BM_empty_stop_start/threads:4 3 ns 8 ns 83333332
+RUNNING: test/basic_test --benchmark_filter=BM_empty.*
+Run on (4 X 4228.32 MHz CPU s)
+2016-08-02 19:21:35
+Benchmark Time CPU Iterations
+--------------------------------------------------------------------
+BM_empty 9 ns 9 ns 76086957
+BM_empty/threads:4 4 ns 9 ns 76086956
+BM_empty_stop_start 8 ns 8 ns 87500000
+BM_empty_stop_start/threads:4 3 ns 8 ns 88607596
+Comparing test/basic_test to test/basic_test
+Benchmark Time CPU
+---------------------------------------------------------
+BM_empty +0.00 +0.00
+BM_empty/threads:4 +0.00 +0.00
+BM_empty_stop_start +0.00 +0.00
+BM_empty_stop_start/threads:4 +0.00 +0.00
+```
+
+Obviously this example doesn't give any useful output, but it's intended to show the output format when 'compare_bench.py' needs to run benchmarks.
diff --git a/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/include/benchmark/benchmark.h b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/include/benchmark/benchmark.h
new file mode 100644
index 00000000000..bd3b0ffb4cb
--- /dev/null
+++ b/extension/llm/custom_ops/spinquant/third-party/FFHT/external/benchmark/include/benchmark/benchmark.h
@@ -0,0 +1,1210 @@
+// Copyright 2015 Google Inc. All rights reserved.
+//
+// Licensed under the Apache License, Version 2.0 (the "License");
+// you may not use this file except in compliance with the License.
+// You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Support for registering benchmarks for functions.
+
+/* Example usage:
+// Define a function that executes the code to be measured a
+// specified number of times:
+static void BM_StringCreation(benchmark::State& state) {
+ while (state.KeepRunning())
+ std::string empty_string;
+}
+
+// Register the function as a benchmark
+BENCHMARK(BM_StringCreation);
+
+// Define another benchmark
+static void BM_StringCopy(benchmark::State& state) {
+ std::string x = "hello";
+ while (state.KeepRunning())
+ std::string copy(x);
+}
+BENCHMARK(BM_StringCopy);
+
+// Augment the main() program to invoke benchmarks if specified
+// via the --benchmarks command line flag. E.g.,
+// my_unittest --benchmark_filter=all
+// my_unittest --benchmark_filter=BM_StringCreation
+// my_unittest --benchmark_filter=String
+// my_unittest --benchmark_filter='Copy|Creation'
+int main(int argc, char** argv) {
+ benchmark::Initialize(&argc, argv);
+ benchmark::RunSpecifiedBenchmarks();
+ return 0;
+}
+
+// Sometimes a family of microbenchmarks can be implemented with
+// just one routine that takes an extra argument to specify which
+// one of the family of benchmarks to run. For example, the following
+// code defines a family of microbenchmarks for measuring the speed
+// of memcpy() calls of different lengths:
+
+static void BM_memcpy(benchmark::State& state) {
+ char* src = new char[state.range(0)]; char* dst = new char[state.range(0)];
+ memset(src, 'x', state.range(0));
+ while (state.KeepRunning())
+ memcpy(dst, src, state.range(0));
+ state.SetBytesProcessed(int64_t(state.iterations()) *
+ int64_t(state.range(0)));
+ delete[] src; delete[] dst;
+}
+BENCHMARK(BM_memcpy)->Arg(8)->Arg(64)->Arg(512)->Arg(1<<10)->Arg(8<<10);
+
+// The preceding code is quite repetitive, and can be replaced with the
+// following short-hand. The following invocation will pick a few
+// appropriate arguments in the specified range and will generate a
+// microbenchmark for each such argument.
+BENCHMARK(BM_memcpy)->Range(8, 8<<10);
+
+// You might have a microbenchmark that depends on two inputs. For
+// example, the following code defines a family of microbenchmarks for
+// measuring the speed of set insertion.
+static void BM_SetInsert(benchmark::State& state) {
+ while (state.KeepRunning()) {
+ state.PauseTiming();
+ set data = ConstructRandomSet(state.range(0));
+ state.ResumeTiming();
+ for (int j = 0; j < state.range(1); ++j)
+ data.insert(RandomNumber());
+ }
+}
+BENCHMARK(BM_SetInsert)
+ ->Args({1<<10, 1})
+ ->Args({1<<10, 8})
+ ->Args({1<<10, 64})
+ ->Args({1<<10, 512})
+ ->Args({8<<10, 1})
+ ->Args({8<<10, 8})
+ ->Args({8<<10, 64})
+ ->Args({8<<10, 512});
+
+// The preceding code is quite repetitive, and can be replaced with
+// the following short-hand. The following macro will pick a few
+// appropriate arguments in the product of the two specified ranges
+// and will generate a microbenchmark for each such pair.
+BENCHMARK(BM_SetInsert)->Ranges({{1<<10, 8<<10}, {1, 512}});
+
+// For more complex patterns of inputs, passing a custom function
+// to Apply allows programmatic specification of an
+// arbitrary set of arguments to run the microbenchmark on.
+// The following example enumerates a dense range on
+// one parameter, and a sparse range on the second.
+static void CustomArguments(benchmark::internal::Benchmark* b) {
+ for (int i = 0; i <= 10; ++i)
+ for (int j = 32; j <= 1024*1024; j *= 8)
+ b->Args({i, j});
+}
+BENCHMARK(BM_SetInsert)->Apply(CustomArguments);
+
+// Templated microbenchmarks work the same way:
+// Produce then consume 'size' messages 'iters' times
+// Measures throughput in the absence of multiprogramming.
+template int BM_Sequential(benchmark::State& state) {
+ Q q;
+ typename Q::value_type v;
+ while (state.KeepRunning()) {
+ for (int i = state.range(0); i--; )
+ q.push(v);
+ for (int e = state.range(0); e--; )
+ q.Wait(&v);
+ }
+ // actually messages, not bytes:
+ state.SetBytesProcessed(
+ static_cast(state.iterations())*state.range(0));
+}
+BENCHMARK_TEMPLATE(BM_Sequential, WaitQueue)->Range(1<<0, 1<<10);
+
+Use `Benchmark::MinTime(double t)` to set the minimum time used to run the
+benchmark. This option overrides the `benchmark_min_time` flag.
+
+void BM_test(benchmark::State& state) {
+ ... body ...
+}
+BENCHMARK(BM_test)->MinTime(2.0); // Run for at least 2 seconds.
+
+In a multithreaded test, it is guaranteed that none of the threads will start
+until all have called KeepRunning, and all will have finished before KeepRunning
+returns false. As such, any global setup or teardown you want to do can be
+wrapped in a check against the thread index:
+
+static void BM_MultiThreaded(benchmark::State& state) {
+ if (state.thread_index == 0) {
+ // Setup code here.
+ }
+ while (state.KeepRunning()) {
+ // Run the test as normal.
+ }
+ if (state.thread_index == 0) {
+ // Teardown code here.
+ }
+}
+BENCHMARK(BM_MultiThreaded)->Threads(4);
+
+
+If a benchmark runs a few milliseconds it may be hard to visually compare the
+measured times, since the output data is given in nanoseconds per default. In
+order to manually set the time unit, you can specify it manually:
+
+BENCHMARK(BM_test)->Unit(benchmark::kMillisecond);
+*/
+
+#ifndef BENCHMARK_BENCHMARK_H_
+#define BENCHMARK_BENCHMARK_H_
+
+
+#if __cplusplus >= 201103L
+#define BENCHMARK_HAS_CXX11
+#endif
+
+#include
+
+#include
+#include
+#include
+#include
+#include
+#include