
Commit 650c4f4

Merge pull request #4 from MartaAndronic/assemble
neuralut-assemble
2 parents 610c8a2 + c4dcdec commit 650c4f4

File tree: 23 files changed, +5091 −435 lines changed

README.md

Lines changed: 43 additions & 20 deletions
````diff
@@ -1,18 +1,33 @@
-# NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions
+# NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference
 
-[![DOI](https://img.shields.io/badge/DOI-10.1109/FPL64840.2024.00028-orange)](https://doi.org/10.1109/FPL64840.2024.00028)
-[![arXiv](https://img.shields.io/badge/arXiv-2403.00849-b31b1b.svg?style=flat)](https://arxiv.org/abs/2403.00849)
+[![DOI](https://img.shields.io/badge/DOI-10.1109/FCCM62733.2025.00077-orange)](https://doi.org/10.1109/FCCM62733.2025.00077)
+[![arXiv](https://img.shields.io/badge/arXiv-2504.00592-b31b1b.svg?style=flat)](https://arxiv.org/abs/2504.00592)
 
 <p align="left">
   <img src="logo.png" width="500" alt="NeuraLUT Logo">
 </p>
 
-NeuraLUT is the first quantized neural network training methodology that maps dense and full-precision sub-networks with skip-connections to LUTs to leverage the underlying structure of the FPGA architecture.
-> _Built on top of [LogicNets](https://github.com/Xilinx/logicnets), NeuraLUT introduces new architecture designs, optimized training flows, and innovative sparsity handling._
+NeuraLUT-Assemble (FCCM'25) extends our prior work by assembling multiple NeuraLUT neurons into tree structures with larger fan-in.
+- The hardware-aware assembling strategy groups connections at the input of these tree structures, guided by our hardware-aware pruning method.
+- This design achieves better trade-offs in LUT utilization, latency, and accuracy than the original NeuraLUT framework.
+
+## This project builds on two earlier works
+
+| NeuraLUT — [release v1.0.0](https://github.com/MartaAndronic/NeuraLUT/releases/tag/v1.0.0) | PolyLUT — Hardware-Aware Structured Pruning |
+| --- | --- |
+| [![DOI](https://img.shields.io/badge/DOI-10.1109/FPL64840.2024.00028-orange)](https://doi.org/10.1109/FPL64840.2024.00028) | [![DOI](https://img.shields.io/badge/DOI-10.1109/TC.2025.3586311-orange)](https://doi.org/10.1109/TC.2025.3586311) |
+
 ---
 
 #### ✨ New! ReducedLUT branch available for advanced compression using don't-cares (see below).
 
+---
+#### 📓 New! Demo Notebooks
+
+We include demo notebooks in each subfolder of the `datasets/` directory to help you get started quickly and to serve as an exercise.
+
+**Pretrained checkpoints** are also provided in the `test_demo/` folder so you can skip training.
+> These checkpoints are not the exact ones used in the paper but are provided for convenience and practice.
 ---
 
 ## 🚀 Features
````
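The sections added above revolve around turning trained, quantized sub-networks into truth tables that synthesize directly to LUTs. That enumeration idea can be sketched minimally as follows; the `subnetwork` function, its weights, and the 2-bit quantization are invented for illustration and are not taken from the repository's code:

```python
import itertools

def subnetwork(x0, x1, x2):
    """Toy dense sub-network with a skip connection over 2-bit inputs.
    Stands in for one NeuraLUT neuron; the weights are arbitrary."""
    h = max(0, 2 * x0 - x1 + x2)  # hidden unit with ReLU
    y = h + x0                    # skip connection
    return 1 if y >= 3 else 0     # quantize the output to 1 bit

BITS = 2  # inputs quantized to 2 bits, i.e. values 0..3

# Enumerate every possible input once; the resulting table IS the LUT.
lut = {
    inputs: subnetwork(*inputs)
    for inputs in itertools.product(range(2 ** BITS), repeat=3)
}

print(len(lut))        # 64 rows: (2^2)^3 input combinations
print(lut[(3, 0, 1)])  # inference becomes a single table lookup
```

Because every input is quantized to a few bits, exhaustive enumeration stays cheap, and the table contains all the information needed to emit a LUT; the sub-network's internal density disappears into the table.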
````diff
@@ -95,24 +110,21 @@ We released a dedicated [ReducedLUT branch](https://github.com/MartaAndronic/Neu
 
 ---
 
-## 🧬 What's New in NeuraLUT vs LogicNets?
-
-| Feature | LogicNets | NeuraLUT |
-|--------|-----------|-----------|
-| **Dataset Support** | Jet Substructure | Jet Substructure, MNIST |
-| **Training Flow** | Weight mask for sparsity | FeatureMask for input channel control |
-| **Forward Function** | Basic FC layers | Multiple FCs + Skip Connections |
-| **Experiment Logging** | TensorBoard | Weights & Biases |
-| **GPU Integration** |||
-| **Neuron Enumeration** | Basic LUT inference | Batched truth table gen |
-| **Architecture Customization** | Limited | Novel model designs described in paper |
-
----
-
 ## 📚 Citation
 
-#### If this repo contributes to your research or FPGA design, please cite our NeuraLUT paper:
+#### If this repo contributes to your research or FPGA design, please cite our papers:
 
+```bibtex
+@inproceedings{andronic2025neuralut-assemble,
+  author = "Andronic, Marta and Constantinides, George A.",
+  title = "{NeuraLUT-Assemble: Hardware-Aware Assembling of Sub-Neural Networks for Efficient LUT Inference}",
+  booktitle = "{2025 IEEE 33rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)}",
+  pages = "208-216",
+  publisher = "IEEE",
+  year = 2025,
+  note = "doi: 10.1109/FCCM62733.2025.00077"
+}
+```
 ```bibtex
 @inproceedings{andronic2024neuralut,
   author = "Andronic, Marta and Constantinides, George A.",
@@ -124,6 +136,17 @@ We released a dedicated [ReducedLUT branch](https://github.com/MartaAndronic/Neu
   note = "doi: 10.1109/FPL64840.2024.00028"
 }
 ```
+```bibtex
+@article{andronic2025polylut,
+  author = "Andronic, Marta and Constantinides, George A.",
+  title = "{PolyLUT: Ultra-Low Latency Polynomial Inference With Hardware-Aware Structured Pruning}",
+  journal = "{IEEE Transactions on Computers}",
+  pages = "3181-3194",
+  publisher = "IEEE",
+  year = 2025,
+  note = "doi: 10.1109/TC.2025.3586311"
+}
+```
 #### If ReducedLUT contributes to your research please also cite:
 ```bibtex
 @inproceedings{reducedlut,
````
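As described at the top of the updated README, NeuraLUT-Assemble gains fan-in by arranging neurons into trees. A toy sketch of why composition enlarges fan-in; the AND/OR/XOR child and parent functions are arbitrary placeholders, not the trained functions the framework produces:

```python
import itertools

# Each "LUT" is just a dict from an input bit-tuple to one output bit.
# Two fan-in-2 children feed a fan-in-2 parent, so the assembled tree
# realizes a 4-input function using only 2-input lookup tables.
child_a = {b: b[0] & b[1] for b in itertools.product((0, 1), repeat=2)}
child_b = {b: b[0] | b[1] for b in itertools.product((0, 1), repeat=2)}
parent = {b: b[0] ^ b[1] for b in itertools.product((0, 1), repeat=2)}

def tree(x0, x1, x2, x3):
    """Evaluate the tree: child outputs become the parent's inputs."""
    return parent[(child_a[(x0, x1)], child_b[(x2, x3)])]

# Flattening the tree shows it behaves like one 4-input LUT (16 rows),
# even though no individual table ever sees more than 2 inputs.
flat = {b: tree(*b) for b in itertools.product((0, 1), repeat=4)}
print(len(flat))  # 16
```

In the real framework each node is itself a NeuraLUT sub-network rather than a fixed gate, but the structural point is the same: the tree's total fan-in grows with depth while each physical LUT stays small.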
Lines changed: 65 additions & 9 deletions
````diff
@@ -1,27 +1,72 @@
-## NeuraLUT on the jet substructure tagging dataset
+## NeuraLUT-Assemble on the jet substructure tagging dataset (CERNBox)
 
-To reproduce the results in our paper follow the steps below. Subsequently, compile the Verilog files using the following settings (utilize Vivado 2020.1, target the xcvu9p-flgb2104-2-i FPGA part, use the Vivado Flow_PerfOptimized_high settings, and perform synthesis in the Out-of-Context (OOC) mode).
+This folder provides the code and resources to reproduce our NeuraLUT-Assemble results on the CERNBox jet substructure tagging dataset.
 
-## Download dataset
+We also include a pretrained checkpoint in the `test_demo` folder so you can skip training and go straight to evaluation and hardware generation.
+> These checkpoints are not the exact ones used in the paper but are provided for convenience and practice.
+
+## Download JSC dataset from CERNBox
 Navigate to the jet_substructure directory.
 ```
 mkdir -p data
 wget https://cernbox.cern.ch/index.php/s/jvFd5MoWhGs1l5v/download -O data/processed-pythia82-lhc13-all-pt1-50k-r1_h022_e0175_t220_nonu_truth.z
 ```
 
+### 📓 Demo Notebook
+For a quick and interactive overview, check out `demo.ipynb`.
+
+This notebook:
+
+* Loads the pretrained checkpoint
+* Verifies the test accuracy
+* Generates the truth tables
+* Runs a software simulation on the truth tables to validate accuracy
+* Generates Verilog files (⚠️ Note: only software simulation is performed in the notebook)
+
+For full hardware simulation and Verilog compilation, please use `neq2lut.py` as shown below.
+
+### 🚀 Quickstart
+
+To reproduce the full results, including hardware simulation with Verilator, follow these steps:
+
+1. Train the model (optional)
 ```
-python train.py --arch jsc-2l --log_dir jsc-2l --cuda
-python neq2lut.py --arch jsc-2l --checkpoint ./test_jsc-2l/best_accuracy.pth --log-dir ./test_jsc-2l/verilog/ --add-registers --seed 8766 --device 1 --cuda
+python train.py --arch jsc-cernbox --log_dir demo --cuda --device 1
 ```
+2. Convert to Verilog, simulate, and evaluate
+
+This script:
+* Loads the trained checkpoint
+* Verifies test accuracy
+* Generates truth tables
+* Runs both software simulation and hardware simulation using Verilator
+* Compiles Verilog files for FPGA inference
+
 ```
-python train.py --arch jsc-5l --log_dir jsc-5l --cuda
-python neq2lut.py --arch jsc-5l --checkpoint ./test_jsc-5l/best_accuracy.pth --log-dir ./test_jsc-5l/verilog/ --add-registers --seed 312846 --device 1 --cuda
+python neq2lut.py --arch jsc-cernbox \
+    --checkpoint ./test_demo/best_accuracy.pth \
+    --log-dir ./test_demo/verilog/ \
+    --add-registers \
+    --device 1 \
+    --imask ./test_demo/imask.pth \
+    --cuda
 ```
 
 
-## Citation
-Should you find this work valuable, we kindly request that you consider referencing our paper as below:
+## 📖 Citation
+Should you find this work valuable, we kindly request that you consider referencing our papers as below:
+```bibtex
+@inproceedings{andronic2025neuralut-assemble,
+  author = "Andronic, Marta and Constantinides, George A.",
+  title = "{NeuraLUT-Assemble: Hardware-Aware Assembling of Sub-Neural Networks for Efficient LUT Inference}",
+  booktitle = "{2025 IEEE 33rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)}",
+  pages = "208-216",
+  publisher = "IEEE",
+  year = 2025,
+  note = "doi: 10.1109/FCCM62733.2025.00077"
+}
 ```
+```bibtex
 @inproceedings{andronic2024neuralut,
   author = "Andronic, Marta and Constantinides, George A.",
   title = "{NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions}",
````
````diff
@@ -31,4 +76,15 @@ Should you find this work valuable, we kindly request that you consider referenc
   year = 2024,
   note = "doi: 10.1109/FPL64840.2024.00028"
 }
+```
+```bibtex
+@article{andronic2025polylut,
+  author = "Andronic, Marta and Constantinides, George A.",
+  title = "{PolyLUT: Ultra-Low Latency Polynomial Inference With Hardware-Aware Structured Pruning}",
+  journal = "{IEEE Transactions on Computers}",
+  pages = "3181-3194",
+  publisher = "IEEE",
+  year = 2025,
+  note = "doi: 10.1109/TC.2025.3586311"
+}
 ```
````
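The `--imask` argument in the quickstart above loads a saved input mask produced by the hardware-aware pruning step. Conceptually, pruning caps each neuron's fan-in at what one physical LUT can absorb; a rough sketch under that assumption follows (the magnitude-ranking criterion and the `prune_to_fan_in` helper are illustrative, not the repository's actual `imask.pth` format):

```python
# Hypothetical sketch of hardware-aware structured pruning: each LUT
# neuron may keep at most FAN_IN inputs, so we retain the largest-
# magnitude weights and zero the rest. The kept indices form the mask.

FAN_IN = 3  # max inputs per neuron, bounded by the physical LUT size

def prune_to_fan_in(weights, fan_in=FAN_IN):
    """Return (kept_indices, pruned_weights) for one neuron."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = sorted(ranked[:fan_in])
    pruned = [w if i in keep else 0.0 for i, w in enumerate(weights)]
    return keep, pruned

mask, pruned = prune_to_fan_in([0.1, -2.0, 0.05, 1.5, -0.3])
print(mask)    # [1, 3, 4] -> only these inputs get wired into the LUT
print(pruned)  # [0.0, -2.0, 0.0, 1.5, -0.3]
```

Capping fan-in this way is what keeps truth-table enumeration tractable: a neuron with `FAN_IN` quantized inputs of `b` bits yields a table of only `2^(FAN_IN * b)` rows.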
