
Commit 650c4f4

Merge pull request #4 from MartaAndronic/assemble
neuralut-assemble
2 parents 610c8a2 + c4dcdec commit 650c4f4

File tree: 23 files changed, +5091 −435 lines changed

README.md

Lines changed: 43 additions & 20 deletions
````diff
@@ -1,18 +1,33 @@
-# NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions
+# NeuraLUT-Assemble: Hardware-aware Assembling of Sub-Neural Networks for Efficient LUT Inference
 
-[![DOI](https://img.shields.io/badge/DOI-10.1109/FPL64840.2024.00028-orange)](https://doi.org/10.1109/FPL64840.2024.00028)
-[![arXiv](https://img.shields.io/badge/arXiv-2403.00849-b31b1b.svg?style=flat)](https://arxiv.org/abs/2403.00849)
+[![DOI](https://img.shields.io/badge/DOI-10.1109/FCCM62733.2025.00077-orange)](https://doi.org/10.1109/FCCM62733.2025.00077)
+[![arXiv](https://img.shields.io/badge/arXiv-2504.00592-b31b1b.svg?style=flat)](https://arxiv.org/abs/2504.00592)
 
 <p align="left">
   <img src="logo.png" width="500" alt="NeuraLUT Logo">
 </p>
 
-NeuraLUT is the first quantized neural network training methodology that maps dense and full-precision sub-networks with skip-connections to LUTs to leverage the underlying structure of the FPGA architecture.
-> _Built on top of [LogicNets](https://github.com/Xilinx/logicnets), NeuraLUT introduces new architecture designs, optimized training flows, and innovative sparsity handling._
+NeuraLUT-Assemble (FCCM'25) extends our prior work by assembling multiple NeuraLUT neurons into tree structures with larger fan-in.
+- The hardware-aware assembling strategy groups connections at the input of these tree structures, guided by our hardware-aware pruning method.
+- This design achieves better trade-offs in LUT utilization, latency, and accuracy than the original NeuraLUT framework.
+
+## This project builds on two earlier works
+
+| NeuraLUT — [release v1.0.0](https://github.com/MartaAndronic/NeuraLUT/releases/tag/v1.0.0) | PolyLUT — Hardware-Aware Structured Pruning |
+| --- | --- |
+| [![DOI](https://img.shields.io/badge/DOI-10.1109/FPL64840.2024.00028-orange)](https://doi.org/10.1109/FPL64840.2024.00028) | [![DOI](https://img.shields.io/badge/DOI-10.1109/TC.2025.3586311-orange)](https://doi.org/10.1109/TC.2025.3586311) |
+
 ---
 
 #### ✨ New! ReducedLUT branch available for advanced compression using don't-cares (see below).
 
+---
+#### 📓 New! Demo Notebooks
+
+We include demo notebooks in each subfolder of the `datasets/` directory to help you get started quickly and to serve as an exercise.
+
+**Pretrained checkpoints** are also provided in the `test_demo/` folder so you can skip training.
+> These checkpoints are not the exact ones used in the paper but are provided for convenience and practice.
 ---
 
 ## 🚀 Features
````
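The sections added above revolve around turning trained, quantized sub-networks into truth tables that synthesize directly to LUTs. That enumeration idea can be sketched minimally as follows; the `subnetwork` function, its weights, and the 2-bit quantization are invented for illustration and are not taken from the repository's code:

```python
import itertools

def subnetwork(x0, x1, x2):
    """Toy dense sub-network with a skip connection over 2-bit inputs.
    Stands in for one NeuraLUT neuron; the weights are arbitrary."""
    h = max(0, 2 * x0 - x1 + x2)  # hidden unit with ReLU
    y = h + x0                    # skip connection
    return 1 if y >= 3 else 0     # quantize the output to 1 bit

BITS = 2  # inputs quantized to 2 bits, i.e. values 0..3

# Enumerate every possible input once; the resulting table IS the LUT.
lut = {
    inputs: subnetwork(*inputs)
    for inputs in itertools.product(range(2 ** BITS), repeat=3)
}

print(len(lut))        # 64 rows: (2^2)^3 input combinations
print(lut[(3, 0, 1)])  # inference becomes a single table lookup
```

Because every input is quantized to a few bits, exhaustive enumeration stays cheap, and the table contains all the information needed to emit a LUT; the sub-network's internal density disappears into the table.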
````diff
@@ -95,24 +110,21 @@ We released a dedicated [ReducedLUT branch](https://github.com/MartaAndronic/Neu
 
 ---
 
-## 🧬 What's New in NeuraLUT vs LogicNets?
-
-| Feature | LogicNets | NeuraLUT |
-|--------|-----------|-----------|
-| **Dataset Support** | Jet Substructure | Jet Substructure, MNIST |
-| **Training Flow** | Weight mask for sparsity | FeatureMask for input channel control |
-| **Forward Function** | Basic FC layers | Multiple FCs + Skip Connections |
-| **Experiment Logging** | TensorBoard | Weights & Biases |
-| **GPU Integration** |||
-| **Neuron Enumeration** | Basic LUT inference | Batched truth table gen |
-| **Architecture Customization** | Limited | Novel model designs described in paper |
-
----
-
 ## 📚 Citation
 
-#### If this repo contributes to your research or FPGA design, please cite our NeuraLUT paper:
+#### If this repo contributes to your research or FPGA design, please cite our papers:
 
+```bibtex
+@inproceedings{andronic2025neuralut-assemble,
+  author = "Andronic, Marta and Constantinides, George A.",
+  title = "{NeuraLUT-Assemble: Hardware-Aware Assembling of Sub-Neural Networks for Efficient LUT Inference}",
+  booktitle = "{2025 IEEE 33rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)}",
+  pages = "208-216",
+  publisher = "IEEE",
+  year = 2025,
+  note = "doi: 10.1109/FCCM62733.2025.00077"
+}
+```
 ```bibtex
 @inproceedings{andronic2024neuralut,
   author = "Andronic, Marta and Constantinides, George A.",
@@ -124,6 +136,17 @@ We released a dedicated [ReducedLUT branch](https://github.com/MartaAndronic/Neu
   note = "doi: 10.1109/FPL64840.2024.00028"
 }
 ```
+```bibtex
+@article{andronic2025polylut,
+  author = "Andronic, Marta and Constantinides, George A.",
+  title = "{PolyLUT: Ultra-Low Latency Polynomial Inference With Hardware-Aware Structured Pruning}",
+  journal = "{IEEE Transactions on Computers}",
+  pages = "3181-3194",
+  publisher = "IEEE",
+  year = 2025,
+  note = "doi: 10.1109/TC.2025.3586311"
+}
+```
 #### If ReducedLUT contributes to your research please also cite:
 ```bibtex
 @inproceedings{reducedlut,
````
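As described at the top of the updated README, NeuraLUT-Assemble gains fan-in by arranging neurons into trees. A toy sketch of why composition enlarges fan-in; the AND/OR/XOR child and parent functions are arbitrary placeholders, not the trained functions the framework produces:

```python
import itertools

# Each "LUT" is just a dict from an input bit-tuple to one output bit.
# Two fan-in-2 children feed a fan-in-2 parent, so the assembled tree
# realizes a 4-input function using only 2-input lookup tables.
child_a = {b: b[0] & b[1] for b in itertools.product((0, 1), repeat=2)}
child_b = {b: b[0] | b[1] for b in itertools.product((0, 1), repeat=2)}
parent = {b: b[0] ^ b[1] for b in itertools.product((0, 1), repeat=2)}

def tree(x0, x1, x2, x3):
    """Evaluate the tree: child outputs become the parent's inputs."""
    return parent[(child_a[(x0, x1)], child_b[(x2, x3)])]

# Flattening the tree shows it behaves like one 4-input LUT (16 rows),
# even though no individual table ever sees more than 2 inputs.
flat = {b: tree(*b) for b in itertools.product((0, 1), repeat=4)}
print(len(flat))  # 16
```

In the real framework each node is itself a NeuraLUT sub-network rather than a fixed gate, but the structural point is the same: the tree's total fan-in grows with depth while each physical LUT stays small.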
Lines changed: 65 additions & 9 deletions
````diff
@@ -1,27 +1,72 @@
-## NeuraLUT on the jet substructure tagging dataset
+## NeuraLUT-Assemble on the jet substructure tagging dataset (CERNBox)
 
-To reproduce the results in our paper follow the steps below. Subsequently, compile the Verilog files using the following settings (utilize Vivado 2020.1, target the xcvu9p-flgb2104-2-i FPGA part, use the Vivado Flow_PerfOptimized_high settings, and perform synthesis in the Out-of-Context (OOC) mode).
+This folder provides the code and resources to reproduce our NeuraLUT-Assemble results on the CERNBox jet substructure tagging dataset.
 
-## Download dataset
+We also include a pretrained checkpoint in the `test_demo` folder so you can skip training and go straight to evaluation and hardware generation.
+> These checkpoints are not the exact ones used in the paper but are provided for convenience and practice.
+
+## Download JSC dataset from CERNBox
 Navigate to the jet_substructure directory.
 ```
 mkdir -p data
 wget https://cernbox.cern.ch/index.php/s/jvFd5MoWhGs1l5v/download -O data/processed-pythia82-lhc13-all-pt1-50k-r1_h022_e0175_t220_nonu_truth.z
 ```
 
+### 📓 Demo Notebook
+For a quick and interactive overview, check out `demo.ipynb`.
+
+This notebook:
+
+* Loads the pretrained checkpoint
+* Verifies the test accuracy
+* Generates the truth tables
+* Runs a software simulation on the truth tables to validate accuracy
+* Generates Verilog files (⚠️ Note: only software simulation is performed in the notebook)
+
+For full hardware simulation and Verilog compilation, please use `neq2lut.py` as shown below.
+
+### 🚀 Quickstart
+
+To reproduce the full results, including hardware simulation with Verilator, follow these steps:
+
+1. Train the model (optional)
 ```
-python train.py --arch jsc-2l --log_dir jsc-2l --cuda
-python neq2lut.py --arch jsc-2l --checkpoint ./test_jsc-2l/best_accuracy.pth --log-dir ./test_jsc-2l/verilog/ --add-registers --seed 8766 --device 1 --cuda
+python train.py --arch jsc-cernbox --log_dir demo --cuda --device 1
 ```
+2. Convert to Verilog, simulate, and evaluate
+
+This script:
+* Loads the trained checkpoint
+* Verifies test accuracy
+* Generates truth tables
+* Runs both software simulation and hardware simulation using Verilator
+* Compiles Verilog files for FPGA inference
+
 ```
-python train.py --arch jsc-5l --log_dir jsc-5l --cuda
-python neq2lut.py --arch jsc-5l --checkpoint ./test_jsc-5l/best_accuracy.pth --log-dir ./test_jsc-5l/verilog/ --add-registers --seed 312846 --device 1 --cuda
+python neq2lut.py --arch jsc-cernbox \
+    --checkpoint ./test_demo/best_accuracy.pth \
+    --log-dir ./test_demo/verilog/ \
+    --add-registers \
+    --device 1 \
+    --imask ./test_demo/imask.pth \
+    --cuda
 ```
 
 
-## Citation
-Should you find this work valuable, we kindly request that you consider referencing our paper as below:
+## 📖 Citation
+Should you find this work valuable, we kindly request that you consider referencing our papers as below:
+```bibtex
+@inproceedings{andronic2025neuralut-assemble,
+  author = "Andronic, Marta and Constantinides, George A.",
+  title = "{NeuraLUT-Assemble: Hardware-Aware Assembling of Sub-Neural Networks for Efficient LUT Inference}",
+  booktitle = "{2025 IEEE 33rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)}",
+  pages = "208-216",
+  publisher = "IEEE",
+  year = 2025,
+  note = "doi: 10.1109/FCCM62733.2025.00077"
+}
 ```
+```bibtex
 @inproceedings{andronic2024neuralut,
   author = "Andronic, Marta and Constantinides, George A.",
   title = "{NeuraLUT: Hiding Neural Network Density in Boolean Synthesizable Functions}",
````
````diff
@@ -31,4 +76,15 @@ Should you find this work valuable, we kindly request that you consider referenc
   year = 2024,
   note = "doi: 10.1109/FPL64840.2024.00028"
 }
+```
+```bibtex
+@article{andronic2025polylut,
+  author = "Andronic, Marta and Constantinides, George A.",
+  title = "{PolyLUT: Ultra-Low Latency Polynomial Inference With Hardware-Aware Structured Pruning}",
+  journal = "{IEEE Transactions on Computers}",
+  pages = "3181-3194",
+  publisher = "IEEE",
+  year = 2025,
+  note = "doi: 10.1109/TC.2025.3586311"
+}
 ```
````
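The `--imask` argument in the quickstart above loads a saved input mask produced by the hardware-aware pruning step. Conceptually, pruning caps each neuron's fan-in at what one physical LUT can absorb; a rough sketch under that assumption follows (the magnitude-ranking criterion and the `prune_to_fan_in` helper are illustrative, not the repository's actual `imask.pth` format):

```python
# Hypothetical sketch of hardware-aware structured pruning: each LUT
# neuron may keep at most FAN_IN inputs, so we retain the largest-
# magnitude weights and zero the rest. The kept indices form the mask.

FAN_IN = 3  # max inputs per neuron, bounded by the physical LUT size

def prune_to_fan_in(weights, fan_in=FAN_IN):
    """Return (kept_indices, pruned_weights) for one neuron."""
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = sorted(ranked[:fan_in])
    pruned = [w if i in keep else 0.0 for i, w in enumerate(weights)]
    return keep, pruned

mask, pruned = prune_to_fan_in([0.1, -2.0, 0.05, 1.5, -0.3])
print(mask)    # [1, 3, 4] -> only these inputs get wired into the LUT
print(pruned)  # [0.0, -2.0, 0.0, 1.5, -0.3]
```

Capping fan-in this way is what keeps truth-table enumeration tractable: a neuron with `FAN_IN` quantized inputs of `b` bits yields a table of only `2^(FAN_IN * b)` rows.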
