Prosperity PPU: Product Sparsity Accelerator for SNNs

⚠️ Prototype / Research Implementation
This is an experimental prototype for exploring product sparsity in SNN accelerators. It is intended for research, simulation, and FPGA prototyping—not production deployment.

This repository implements the Prosperity PPU—a hardware accelerator for spiking neural networks (SNNs) that exploits product sparsity to dramatically reduce computation by reusing shared spike patterns across matrix rows.

Architecture Overview

  • Pipeline: Detector → Pruner → Dispatcher → Processor → Neuron Array
  • Key Features:
    • Product Sparsity: Identifies and reuses identical or subset spike patterns (prefixes) to avoid redundant MACs.
    • TCAM-based Detector: Fast, parallel detection of prefix relationships.
    • Pruner: Selects the best prefix for each row and computes the suffix mask.
    • Dispatcher: Sorts and issues rows in dependency-safe order (prefix before suffix).
    • Ping-Pong Task Buffers: Double-buffered task banks decouple analysis from compute for phase overlap.
    • 128-PE Processor: Parallel MAC array using signed INT8 weights and INT16 accumulators.
    • Standalone Neuron Array: Dedicated LIF backend decoupled from processor MAC execution.
    • Single-port RAM Interface: For loading spike tiles from the host.
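As a concrete illustration of the product-sparsity feature above: when one row's spike pattern contains another row's pattern as a subset, the larger row can reuse the smaller row's partial sum and compute only the remaining bits. A minimal Python sketch (a software model only, not the RTL):

```python
# Illustrative sketch of product sparsity: row_b's spike pattern is a
# superset of row_a's, so row_b reuses row_a's partial sum.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.integers(-128, 128, size=16, dtype=np.int16)  # signed INT8 range

row_a = 0b0000_0000_1010_0010  # prefix pattern
row_b = 0b0000_0001_1010_0010  # contains row_a as a subset

def dot(spikes, w):
    """Dense dot product of a spike bitmask with a weight vector."""
    return sum(int(w[i]) for i in range(len(w)) if spikes >> i & 1)

# Reuse: compute row_a once, then add only the suffix bits for row_b.
prefix_sum = dot(row_a, weights)
suffix = row_b & ~row_a                  # bits unique to row_b
reused = prefix_sum + dot(suffix, weights)

assert reused == dot(row_b, weights)     # same result, fewer MACs
```

Here the full dot product for row_b would touch four spike bits, but reuse reduces it to one new MAC plus an add.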

File Structure

  • ppu/top.v — Top-level PPU module (pipeline controller)
  • ppu/detector.v — TCAM-based prefix detector
  • ppu/pruner.v — Prefix selection and suffix mask computation
  • ppu/dispatcher.v — Sorting and dispatch logic
  • ppu/processor.v — 128-PE MAC array for matrix computation
  • ppu/neuron_array.v — Dedicated LIF neuron array backend
  • ppu/tcam/hdl/ — TCAM hardware modules
  • tb/ — Python cocotb testbenches

How It Works

  1. Tile Load: Host loads a tile of spike patterns into the PPU's RAM.
  2. Detection: For each row, the detector finds all possible prefixes (subsets).
  3. Pruning: The pruner selects the best prefix (max overlap, lowest index) and computes the suffix mask (bits to compute).
  4. Dispatch: The dispatcher sorts all rows by popcount and row index, ensuring all prefixes are processed before their suffixes.
  5. Processing: The 128-PE processor array performs the matrix computation, reusing each prefix's partial result and computing only the suffix bits. Each PE multiplies signed 8-bit weights into 16-bit accumulators.
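The dependency-safe ordering in step 4 can be sketched in Python; the helper name and row values here are illustrative, not taken from the RTL:

```python
# Sketch of the dispatch order described above: sorting by (popcount, row
# index) guarantees any prefix row (a subset, hence lower popcount) is
# issued before the rows that reuse it.
rows = {
    0: 0b1111,   # dense row
    1: 0b0011,   # prefix of rows 0 and 2
    2: 0b1011,
    3: 0b0001,   # prefix of row 1
}

def dispatch_order(rows):
    # A strict subset always has a strictly smaller popcount, so this
    # total order respects every prefix -> suffix dependency.
    return sorted(rows, key=lambda r: (bin(rows[r]).count("1"), r))

print(dispatch_order(rows))  # [3, 1, 2, 0]
```

Ties in popcount are broken by row index, matching the pruner's lowest-index prefix preference.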

Running Tests

Prerequisites

Run inside a Python virtual environment with dependencies installed from requirements.txt (pytest, cocotb). cocotb additionally requires a supported Verilog simulator on your PATH.

Full Pipeline Test

To run a full random pipeline test:

pytest tb/test_top.py

Testing Suite

Run the suite directly with pytest (cocotb drives the RTL simulation):

# Run the full test suite
pytest -q

# Run all cocotb tests (tb folder)
pytest tb/ -v

# Run a single test module
pytest tb/test_top.py -v

# Run a single test function
pytest tb/test_processor.py::runCocotbTests -v

Notes:

  • This repository no longer includes helper shell scripts; use pytest directly to run cocotb tests.
  • Recommended: run inside a Python virtual environment and install requirements from requirements.txt.
  • To view simulator/cocotb output, run pytest with -s to disable capture (e.g., pytest -s tb/test_top.py).

Evaluation Benchmarks

# End-to-end sparse vs dense (no prefix reuse) ablation + metrics export
pytest -s tb/bench_top_ablation.py

# Stage microbenchmarks (detector/pruner/dispatcher/processor)
python tb/bench_pipeline.py all

# Train/export hardware-compatible MNIST workload (INT8 weights + 16-bit spikes)
python tb/workloads/train_mnist_hw_model.py --download --output tb/workloads/mnist_hw_eval.npz

# Run hardware-vs-software MNIST accuracy benchmark
python tb/bench_mnist_accuracy.py --workload tb/workloads/mnist_hw_eval.npz --samples 256

tb/bench_top_ablation.py defaults to an overlap-heavy workload profile (WORKLOAD_PROFILE=overlap_chain) and supports overrides via environment variables (e.g., ACTIVE_ROWS=64). It runs the metrics-only cocotb testcase (bench_end_to_end_metrics) so cycle/sparsity characterization is decoupled from strict numerical golden assertions.

Benchmark outputs are written to tb/bench_results/:

  • end_to_end_tile_metrics.csv
  • end_to_end_ablation_summary.json
  • detector_throughput.csv, pruner_reuse.csv, dispatcher_overhead.csv, processor_throughput.csv
  • snn_accuracy_metrics.csv, snn_accuracy_summary.json

MNIST accuracy mode runs bench_mnist_snn_accuracy in tb/test_top.py. It compares processor writeback INT16 scores against a software golden model generated from the same exported workload and reports both hardware and software classification accuracy.
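A software-side sketch of that parity comparison (the function and array names are hypothetical, not the benchmark's actual API):

```python
# Illustrative parity check: compare hardware INT16 writeback scores
# against the software golden model's scores, sample by sample.
import numpy as np

def parity_report(hw_scores, sw_scores, labels):
    """hw_scores/sw_scores: (N, classes) INT16 scores; labels: (N,) ints."""
    hw_pred = hw_scores.argmax(axis=1)
    sw_pred = sw_scores.argmax(axis=1)
    return {
        "hw_accuracy": float((hw_pred == labels).mean()),
        "sw_accuracy": float((sw_pred == labels).mean()),
        "prediction_match": float((hw_pred == sw_pred).mean()),
    }

# Toy example: hardware and software scores agree exactly,
# so all three metrics come out to 1.0.
scores = np.array([[10, 3], [2, 9]], dtype=np.int16)
labels = np.array([0, 1])
print(parity_report(scores, scores.copy(), labels))
```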

Customization

  • Change ROWS, SPIKES, and NO_WIDTH parameters in the testbenches or top module for different tile sizes.
  • Adjust PE_COUNT, WEIGHT_WIDTH, and ACC_WIDTH parameters for different processor configurations.
  • Edit the testbenches in tb/ to create custom spike patterns or test new scenarios.
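When adjusting these parameters, it is worth checking that the accumulator width still covers the worst-case partial sum. A hedged sizing sketch, assuming SPIKES spike inputs per row and signed WEIGHT_WIDTH-bit weights:

```python
# Sizing check: with SPIKES possible spike inputs per row and signed
# WEIGHT_WIDTH-bit weights, the worst-case partial-sum magnitude is
# SPIKES * 2**(WEIGHT_WIDTH - 1). The signed accumulator must hold it.
def acc_width_ok(spikes, weight_width, acc_width):
    worst_case = spikes * (1 << (weight_width - 1))   # e.g. 16 * 128 = 2048
    return worst_case <= (1 << (acc_width - 1))       # fits signed ACC_WIDTH?

assert acc_width_ok(spikes=16, weight_width=8, acc_width=16)       # 2048 <= 32768
assert not acc_width_ok(spikes=512, weight_width=8, acc_width=16)  # 65536 > 32768
```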

RAW Submission Scope

This artifact targets a single-tile Prosperity-style PPU implementation.

In scope:

  • Detector, pruner, dispatcher, processor, neuron array, timestep control, injector/collector
  • AXI4-Lite host control and weight DMA path
  • RTL simulation, stage microbenchmarks, end-to-end ablation, and MNIST HW-vs-SW parity flow

Out of scope for this submission:

  • SFU path
  • Multi-tile NoC/inter-tile routing

RAW Submission Status

| Item | Status | Evidence |
| --- | --- | --- |
| Core pipeline RTL integration | | ppu/top.v, tb/test_top.py |
| Block-level verification | | tb/test_detector.py, tb/test_pruner.py, tb/test_dispatcher.py, tb/test_processor.py, tb/test_lif.py, tb/test_neuron_array.py, tb/test_spike_injector.py, tb/test_spike_collector.py, tb/test_timestep_ctrl.py |
| Host/control path verification | | tb/test_axi_lite_bridge.py, tb/test_csr.py, tb/test_weight_mem_ctrl.py |
| End-to-end sparse vs dense ablation | | tb/bench_results/end_to_end_ablation_summary.json, tb/bench_results/end_to_end_tile_metrics.csv |
| Stage-level benchmark summaries | | tb/bench_results/detector_summary.json, tb/bench_results/pruner_summary.json, tb/bench_results/dispatcher_summary.json, tb/bench_results/processor_summary.json |
| MNIST HW-vs-SW parity benchmark | | tb/bench_results/snn_accuracy_summary.json, tb/bench_results/snn_accuracy_metrics.csv |
| FPGA implementation metrics (LUT/FF/BRAM/DSP/Fmax/power) | | Run vivado -mode batch -source tools/fpga/synth.tcl and archive generated reports |

Provisional FPGA Resource Envelope (Estimate)

Use this estimate table until tool-generated synthesis reports are archived:

| Resource | Low Estimate | High Estimate |
| --- | --- | --- |
| LUTs | 50,000 | 120,000 |
| FFs | 30,000 | 80,000 |
| BRAM18K | 10 | 40 |

RAW Artifact Commands

# 1) Unit + integration tests
pytest -q

# 2) Stage microbenchmarks
python tb/bench_pipeline.py all

# 3) End-to-end sparse vs dense ablation
pytest -s tb/bench_top_ablation.py

# 4) Train/export workload and run MNIST parity benchmark
python tb/workloads/train_mnist_hw_model.py --download --output tb/workloads/mnist_hw_eval.npz
python tb/bench_mnist_accuracy.py --workload tb/workloads/mnist_hw_eval.npz --samples 256

# 5) FPGA implementation reports (if Vivado is available)
vivado -mode batch -source tools/fpga/synth.tcl

RAW Reporting Notes

  • Keep claims explicitly single-tile.
  • Report cycle/MAC/accuracy metrics from tb/bench_results/.
  • If energy is estimated from cycle/MAC reduction, label it as model-based estimation unless measured board/silicon power is available.

License

(c) 2025. See individual source files for license details.