<h6><b>RSR 🧮</b>: Efficient Matrix Multiplication for Accelerating Inference in Binary and Ternary Neural Networks</h6>
This project aims to provide a fast and efficient approach to low-bit matrix multiplication. The repository implements Redundant Segment Reduction (<i>RSR</i>), a fast matrix multiplication algorithm designed for the weight matrices of binary and ternary neural networks. The RSR method reduces computation by a log(n) factor, making it particularly useful for low-bit deep learning and efficient inference.

The codebase provides ready-to-use <b>NumPy</b>-based and native <b>C++</b> implementations of the <i>RSR</i> and <i>RSR++</i> algorithms, as well as <b>PyTorch</b> implementations with both <b>CPU</b> and <b>GPU</b> support, enabling scalable and optimized matrix operations in deep learning environments. It includes sample experiments on `1.58bit` models, including <a href="https://huggingface.co/HF1BitLLM/Llama3-8B-1.58-100B-tokens" target="_blank">[`Llama3-8B-1.58bit`]</a>, <a href="https://huggingface.co/tiiuae/Falcon3-10B-Instruct-1.58bit" target="_blank">[`Falcon3-10B-1.58bit`]</a>, and <a href="https://huggingface.co/tiiuae/Falcon3-3B-Instruct-1.58bit" target="_blank">[`Falcon3-3B-1.58bit`]</a>. ✨
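To illustrate the redundancy idea that segment-reduction methods exploit, here is a minimal NumPy sketch: when the columns of a binary weight matrix are split into segments of k bits, only 2^k distinct bit patterns can occur per segment, so each pattern's partial dot product can be computed once and reused across all rows. This is a toy illustration under that assumption only, not the repo's actual RSR/RSR++ implementation; the function name and parameters are hypothetical.

```python
import numpy as np

def binary_matmul_segments(W, x, k=4):
    """Toy sketch of redundancy-based matmul for a binary {0,1} matrix W (m x n)
    and vector x (n,): split columns into segments of k, precompute the dot
    product of every possible k-bit pattern with each segment, then look up
    each row's pattern instead of recomputing it.  Illustrative only; the
    real RSR/RSR++ algorithms in this repo are more refined.
    """
    m, n = W.shape
    assert n % k == 0, "toy version requires n divisible by k"
    out = np.zeros(m)
    for s in range(0, n, k):
        seg = x[s:s + k]
        # All 2^k bit patterns, bit j of pattern p = (p >> j) & 1.
        patterns = (np.arange(2**k)[:, None] >> np.arange(k)) & 1   # (2^k, k)
        table = patterns @ seg                                      # (2^k,)
        # Encode each row's segment bits as an index into the table.
        idx = (W[:, s:s + k] * (1 << np.arange(k))).sum(axis=1)
        out += table[idx]
    return out
```

A ternary {-1, 0, 1} matrix can be handled the same way by splitting it as W = W⁺ − W⁻ with two binary matrices and applying the routine to each.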