
Commit a1263be

Add FP1.3.0 quantization results and images
1 parent: 81ec53e

6 files changed: 25 additions, 7 deletions

BitNetMCU.py

Lines changed: 3 additions & 2 deletions
@@ -52,6 +52,7 @@ class BitLinear(nn.Linear):
     - BinaryBalanced : 1 bit, weights are balanced around zero
     - 2bitsym : 2 bit symmetric
     - 4bitsym : 4 bit symmetric
+    - FP130 : 4 bit shift encoding
     - 8bit : 8 bit

     Normalization Types:

@@ -157,7 +158,7 @@ def weight_quant(self, w):
         elif self.QuantType == '4bitsym':
             scale = 2.0 / mag # 2.0 for tensor, 6.5 for output
             u = ((w * scale - 0.5).round().clamp_(-8, 7) + 0.5) / scale
-        elif self.QuantType == '4bitshift': # encoding (F1.3.0) : S * ( 2^E3 + 1) -> min 2^0 = 1, max 2^7 = 127
+        elif self.QuantType == 'FP130': # encoding (F1.3.0) : S * ( 2^E3 + 1) -> min 2^0 = 1, max 2^7 = 127
             scale = 16.0 / mag
             e = ((w * scale).abs()).log2().floor().clamp_(0, 7)
             u = w.sign()*(e.exp2()) / scale

@@ -266,7 +267,7 @@ def quantize(self,model):
             scale = 2.0 / mag # 2.0 for tensor, 6.5 for output
             u = ((w * scale - 0.5).round().clamp_(-8, 7) + 0.5)
             bpw = 4
-        elif QuantType == '4bitshift':
+        elif QuantType == 'FP130':
             scale = 16.0 / mag
             e = ((w * scale ).abs()).log2().floor().clamp_(0, 7)
             u = w.sign()*(e.exp2() )
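For readers skimming the diff: the renamed `FP130` branch snaps every weight to a signed power of two. Below is a minimal, self-contained sketch of the same computation; note that deriving `mag` as the mean absolute weight is an assumption made for this example, mirroring how the other quantization modes in `BitNetMCU.py` normalize their scale.

```python
import torch

def fp130_quantize(w: torch.Tensor) -> torch.Tensor:
    # Sketch of the FP130 branch above: each weight becomes sign(w) * 2^e / scale,
    # with a 3-bit exponent e clamped to 0..7.
    # Assumption: `mag` is the mean absolute weight, as for the other quant modes.
    mag = w.abs().mean().clamp(min=1e-5)
    scale = 16.0 / mag
    e = (w * scale).abs().log2().floor().clamp_(0, 7)
    return w.sign() * e.exp2() / scale

w = torch.randn(8, 8) * 0.1
wq = fp130_quantize(w)
print(wq)  # every nonzero entry has magnitude 2^e / scale with e in 0..7
```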

docs/documentation.md

Lines changed: 12 additions & 0 deletions
@@ -538,6 +538,18 @@ By simplifying the model architecture and using a full-custom implementation, I

 While this project focused on MNIST inference as a test case, I plan to apply this approach to other applications in the future.

+# Addendum: FP1.3.0 Quantization
+
+<div align="center">
+    <img src="first_layer_weights_fp130.png" width="60%">
+</div>
+
+<div align="center">
+    <img src="fp130_export.png" width="80%">
+</div>
+
+TODO
+
 # References

 References and further reading:

docs/first_layer_weights_fp130.png

47.4 KB

docs/fp130_export.png

77.2 KB

exportquant.py

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ def export_to_hfile(quantized_model, filename, runname):
         elif quantization_type == '4bitsym':
             encoded_weights = ((weights < 0).astype(int) << 3) | (np.floor(np.abs(weights))).astype(int) # use bitwise operations to encode the weights
             QuantID = 4
-        elif quantization_type == '4bitshift': # FP1.3.0 encoding (sign * 2^exp)
+        elif quantization_type == 'FP130': # FP1.3.0 encoding (sign * 2^exp)
             encoded_weights = ((weights < 0).astype(int) << 3) | (np.floor(np.log2(np.abs(weights)))).astype(int)
             QuantID = 16 + 4
         else:
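The export line packs each FP1.3.0 weight into one nibble: the sign in bit 3 and the 3-bit exponent in bits 0..2, so a stored nibble decodes back to ±2^exp. A small round-trip sketch follows; the decoder here is illustrative and not taken from the project's sources.

```python
import numpy as np

def decode_fp130_nibble(nibble: int) -> int:
    # Bit 3 holds the sign, bits 0..2 the exponent; value = +/- 2^exp.
    sign = -1 if (nibble >> 3) & 1 else 1
    return sign * (1 << (nibble & 0x7))

weights = np.array([4.0, -2.0, 1.0, -8.0])  # already quantized to +/- 2^exp
encoded = ((weights < 0).astype(int) << 3) | np.floor(np.log2(np.abs(weights))).astype(int)
print([decode_fp130_nibble(int(n)) for n in encoded])  # [4, -2, 1, -8]
```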

readme.md

Lines changed: 9 additions & 4 deletions
@@ -1,6 +1,6 @@
 # BitNetMCU: High Accuracy Low-Bit Quantized Neural Networks on a low-end Microcontroller

-**BitNetMCU** is a project focused on the training and inference of low-bit quantized neural networks, specifically designed to run efficiently on low-end RISC-V microcontrollers like the CH32V003. Quantization aware training (QAT) and finetuning of model structure and inference code allowed *surpassing 99% Test accuracy on a 16x16 MNIST dataset without using multiplication instructions and in only 2kb of RAM and 16kb of Flash*.
+**BitNetMCU** is a project focused on the training and inference of low-bit quantized neural networks, specifically designed to run efficiently on low-end microcontrollers like the CH32V003. Quantization aware training (QAT) and fine-tuning of model structure and inference code allowed *surpassing 99% Test accuracy on a 16x16 MNIST dataset without using multiplication instructions and in only 2kb of RAM and 16kb of Flash*.

 The training pipeline is based on PyTorch and should run anywhere. The inference engine is implemented in Ansi-C and can be easily ported to any Microcontroller.

@@ -37,13 +37,18 @@ The data pipeline is split into several Python scripts for flexibility:

 1. **Configuration**: Modify `trainingparameters.yaml` to set all hyperparameters for training the model.

-2. **Training the Model**: The `training.py` script is used to train the model and store it as a `.pth` file in the `modeldata/` folder. The model weights are still in float format at this stage, as they are quantized on-the-fly during training.
+2. **Training the Model**: The `training.py` script is used to train the model and store the weights as a `.pth` file in the `modeldata/` folder. The model weights are still in float format at this stage, as they are quantized on-the-fly during training.

-2. **Exporting the Quantized Model**: The `exportquant.py` script is used to convert the model into a quantized format. The quantized model is exported to the C header file `BitNetMCU_model.h`.
+2. **Exporting the Quantized Model**: The `exportquant.py` script is used to convert the model into a quantized format. The quantized model weights are exported to the C header file `BitNetMCU_model.h`.

 3. **Optional: Testing the C-Model**: Compile and execute `BitNetMCU_MNIST_test.c` to test inference of ten digits. The model data is included from `BitNetMCU_MNIST_test_data.h`, and the test data is included from the `BitNetMCU_MNIST_test_data.h` file.

 4. **Optional: Verification C vs Python Model on full dataset**: The inference code, along with the model data, is compiled into a DLL. The `test-inference.py` script calls the DLL and compares the results with the original Python model. This allows for an accurate comparison to the entire MNIST test data set of 10,000 images.

-5. **Optional: Testing inference on the MCU**: follow the instructions in `mcu/readme.md`. Porting to architectures other than CH32V003 is straighfoward and the files in the `mcu` directory can serve as a reference
+5. **Optional: Testing inference on the MCU**: follow the instructions in `mcu/readme.md`. Porting to architectures other than CH32V003 is straightforward and the files in the `mcu` directory can serve as a reference.

+## Updates
+
+- 24th April 2024 - First release with Binary, Ternary, 2 bit, 4 bit and 8 bit quantization.
+- 2nd May 2024 - [tagged version 0.1a](https://github.com/cpldcpu/BitNetMCU/tree/0.1a)
+- 8th May 2024 - Added FP1.3.0 Quantization to allow fully multiplication-free inference with 98.9% accuracy.
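The 8th May entry is the point of this commit: since every FP1.3.0 weight is ±2^exp, a multiply-accumulate collapses into a shift plus an add. A rough Python sketch of that idea is below; it is an illustration only, not the project's inference engine, which is implemented in Ansi-C.

```python
def fp130_dot(activations, nibbles):
    # Each packed weight nibble holds sign (bit 3) and exponent (bits 0..2),
    # so activation * weight reduces to a left shift and a conditional negate.
    acc = 0
    for a, n in zip(activations, nibbles):
        shifted = a << (n & 0x7)                      # multiply by 2^exp via shift
        acc += -shifted if (n >> 3) & 1 else shifted  # apply the sign bit
    return acc

print(fp130_dot([3, 1, 2], [0b0010, 0b1001, 0b0000]))  # 3*4 - 1*2 + 2*1 = 12
```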
