
Commit a1263be

Add FP1.3.0 quantization results and images
1 parent: 81ec53e

6 files changed: 25 additions, 7 deletions

BitNetMCU.py

Lines changed: 3 additions & 2 deletions
@@ -52,6 +52,7 @@ class BitLinear(nn.Linear):
     - BinaryBalanced : 1 bit, weights are balanced around zero
     - 2bitsym : 2 bit symmetric
     - 4bitsym : 4 bit symmetric
+    - FP130 : 4 bit shift encoding
     - 8bit : 8 bit

     Normalization Types:

@@ -157,7 +158,7 @@ def weight_quant(self, w):
         elif self.QuantType == '4bitsym':
             scale = 2.0 / mag # 2.0 for tensor, 6.5 for output
             u = ((w * scale - 0.5).round().clamp_(-8, 7) + 0.5) / scale
-        elif self.QuantType == '4bitshift': # encoding (F1.3.0) : S * ( 2^E3 + 1) -> min 2^0 = 1, max 2^7 = 127
+        elif self.QuantType == 'FP130': # encoding (F1.3.0) : S * ( 2^E3 + 1) -> min 2^0 = 1, max 2^7 = 127
             scale = 16.0 / mag
             e = ((w * scale).abs()).log2().floor().clamp_(0, 7)
             u = w.sign()*(e.exp2()) / scale

@@ -266,7 +267,7 @@ def quantize(self,model):
             scale = 2.0 / mag # 2.0 for tensor, 6.5 for output
             u = ((w * scale - 0.5).round().clamp_(-8, 7) + 0.5)
             bpw = 4
-        elif QuantType == '4bitshift':
+        elif QuantType == 'FP130':
             scale = 16.0 / mag
             e = ((w * scale ).abs()).log2().floor().clamp_(0, 7)
             u = w.sign()*(e.exp2() )
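For readers skimming the diff: the renamed `FP130` branch snaps every weight to a signed power of two. Below is a minimal, self-contained sketch of the same computation; note that deriving `mag` as the mean absolute weight is an assumption made for this example, mirroring how the other quantization modes in `BitNetMCU.py` normalize their scale.

```python
import torch

def fp130_quantize(w: torch.Tensor) -> torch.Tensor:
    # Sketch of the FP130 branch above: each weight becomes sign(w) * 2^e / scale,
    # with a 3-bit exponent e clamped to 0..7.
    # Assumption: `mag` is the mean absolute weight, as for the other quant modes.
    mag = w.abs().mean().clamp(min=1e-5)
    scale = 16.0 / mag
    e = (w * scale).abs().log2().floor().clamp_(0, 7)
    return w.sign() * e.exp2() / scale

w = torch.randn(8, 8) * 0.1
wq = fp130_quantize(w)
print(wq)  # every nonzero entry has magnitude 2^e / scale with e in 0..7
```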

docs/documentation.md

Lines changed: 12 additions & 0 deletions
@@ -538,6 +538,18 @@ By simplifying the model architecture and using a full-custom implementation, I

 While this project focused on MNIST inference as a test case, I plan to apply this approach to other applications in the future.

+# Addendum: FP1.3.0 Quantization
+
+<div align="center">
+    <img src="first_layer_weights_fp130.png" width="60%">
+</div>
+
+<div align="center">
+    <img src="fp130_export.png" width="80%">
+</div>
+
+TODO
+
 # References

 References and further reading:

docs/first_layer_weights_fp130.png

47.4 KB

docs/fp130_export.png

77.2 KB

exportquant.py

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ def export_to_hfile(quantized_model, filename, runname):
         elif quantization_type == '4bitsym':
             encoded_weights = ((weights < 0).astype(int) << 3) | (np.floor(np.abs(weights))).astype(int) # use bitwise operations to encode the weights
             QuantID = 4
-        elif quantization_type == '4bitshift': # FP1.3.0 encoding (sign * 2^exp)
+        elif quantization_type == 'FP130': # FP1.3.0 encoding (sign * 2^exp)
             encoded_weights = ((weights < 0).astype(int) << 3) | (np.floor(np.log2(np.abs(weights)))).astype(int)
             QuantID = 16 + 4
         else:
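The export line packs each FP1.3.0 weight into one nibble: the sign in bit 3 and the 3-bit exponent in bits 0..2, so a stored nibble decodes back to ±2^exp. A small round-trip sketch follows; the decoder here is illustrative and not taken from the project's sources.

```python
import numpy as np

def decode_fp130_nibble(nibble: int) -> int:
    # Bit 3 holds the sign, bits 0..2 the exponent; value = +/- 2^exp.
    sign = -1 if (nibble >> 3) & 1 else 1
    return sign * (1 << (nibble & 0x7))

weights = np.array([4.0, -2.0, 1.0, -8.0])  # already quantized to +/- 2^exp
encoded = ((weights < 0).astype(int) << 3) | np.floor(np.log2(np.abs(weights))).astype(int)
print([decode_fp130_nibble(int(n)) for n in encoded])  # [4, -2, 1, -8]
```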

readme.md

Lines changed: 9 additions & 4 deletions
@@ -1,6 +1,6 @@
 # BitNetMCU: High Accuracy Low-Bit Quantized Neural Networks on a low-end Microcontroller

-**BitNetMCU** is a project focused on the training and inference of low-bit quantized neural networks, specifically designed to run efficiently on low-end RISC-V microcontrollers like the CH32V003. Quantization aware training (QAT) and finetuning of model structure and inference code allowed *surpassing 99% Test accuracy on a 16x16 MNIST dataset without using multiplication instructions and in only 2kb of RAM and 16kb of Flash*.
+**BitNetMCU** is a project focused on the training and inference of low-bit quantized neural networks, specifically designed to run efficiently on low-end microcontrollers like the CH32V003. Quantization aware training (QAT) and fine-tuning of model structure and inference code allowed *surpassing 99% Test accuracy on a 16x16 MNIST dataset without using multiplication instructions and in only 2kb of RAM and 16kb of Flash*.

 The training pipeline is based on PyTorch and should run anywhere. The inference engine is implemented in Ansi-C and can be easily ported to any Microcontroller.

@@ -37,13 +37,18 @@ The data pipeline is split into several Python scripts for flexibility:

 1. **Configuration**: Modify `trainingparameters.yaml` to set all hyperparameters for training the model.

-2. **Training the Model**: The `training.py` script is used to train the model and store it as a `.pth` file in the `modeldata/` folder. The model weights are still in float format at this stage, as they are quantized on-the-fly during training.
+2. **Training the Model**: The `training.py` script is used to train the model and store the weights as a `.pth` file in the `modeldata/` folder. The model weights are still in float format at this stage, as they are quantized on-the-fly during training.

-2. **Exporting the Quantized Model**: The `exportquant.py` script is used to convert the model into a quantized format. The quantized model is exported to the C header file `BitNetMCU_model.h`.
+2. **Exporting the Quantized Model**: The `exportquant.py` script is used to convert the model into a quantized format. The quantized model weights are exported to the C header file `BitNetMCU_model.h`.

 3. **Optional: Testing the C-Model**: Compile and execute `BitNetMCU_MNIST_test.c` to test inference of ten digits. The model data is included from `BitNetMCU_MNIST_test_data.h`, and the test data is included from the `BitNetMCU_MNIST_test_data.h` file.

 4. **Optional: Verification C vs Python Model on full dataset**: The inference code, along with the model data, is compiled into a DLL. The `test-inference.py` script calls the DLL and compares the results with the original Python model. This allows for an accurate comparison to the entire MNIST test data set of 10,000 images.

-5. **Optional: Testing inference on the MCU**: follow the instructions in `mcu/readme.md`. Porting to architectures other than CH32V003 is straighfoward and the files in the `mcu` directory can serve as a reference
+5. **Optional: Testing inference on the MCU**: follow the instructions in `mcu/readme.md`. Porting to architectures other than CH32V003 is straightforward and the files in the `mcu` directory can serve as a reference.

+## Updates
+
+- 24th April 2024 - First release with Binary, Ternary, 2 bit, 4 bit and 8 bit quantization.
+- 2nd May 2024 - [tagged version 0.1a](https://github.com/cpldcpu/BitNetMCU/tree/0.1a)
+- 8th May 2024 - Added FP1.3.0 Quantization to allow fully multiplication-free inference with 98.9% accuracy.
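The 8th May entry is the point of this commit: since every FP1.3.0 weight is ±2^exp, a multiply-accumulate collapses into a shift plus an add. A rough Python sketch of that idea is below; it is an illustration only, not the project's inference engine, which is implemented in Ansi-C.

```python
def fp130_dot(activations, nibbles):
    # Each packed weight nibble holds sign (bit 3) and exponent (bits 0..2),
    # so activation * weight reduces to a left shift and a conditional negate.
    acc = 0
    for a, n in zip(activations, nibbles):
        shifted = a << (n & 0x7)                      # multiply by 2^exp via shift
        acc += -shifted if (n >> 3) & 1 else shifted  # apply the sign bit
    return acc

print(fp130_dot([3, 1, 2], [0b0010, 0b1001, 0b0000]))  # 3*4 - 1*2 + 2*1 = 12
```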
