This repository provides a reference design of a stereo 16-bit Finite Impulse Response (FIR) filter implemented in Verilog, integrated with AXI-Stream, AXI-Lite control, and AXI DMA, and validated using RTL simulation and a bare-metal Vitis application.
Target platform: AMD Kria KV260
Focus: clean architecture, deterministic behavior, and hardware–software co-design
This design operates in a fully real-time, sample-by-sample streaming architecture with deterministic latency.
- Stereo FIR filter (Q1.15 fixed-point)
- Parameterizable tap count (default: 129 taps)
- Transposed-form architecture
- AXI-Stream data interface (32-bit interleaved stereo)
- AXI-Lite control and coefficient memory
- Deterministic 1-cycle processing latency
- Runtime coefficient reconfiguration
- Bare-metal C reference application using AXI DMA
- Independent testbenches for DSP core and AXI wrapper
- Default pass-through initialization (failsafe audio streaming before AXI-Lite programming)
+----------------------+
| ARM (Bare-metal) |
| - AXI DMA |
| - AXI-Lite Ctrl |
+----------+-----------+
|
+----------v-----------+
| Stereo FIR AXI IP |
| - AXI-Stream IN |
| - FIR Core (L/R) |
| - AXI-Stream OUT |
+----------------------+
- Left and Right channels are processed synchronously
- Each channel has an independent FIR core
- Filter coefficients are shared between channels
- Runtime coefficient updates are supported
- AXI-Stream width: 32-bit
- Channel mapping:
[31:16]→ Left channel (signed Q1.15)[15:0]→ Right channel (signed Q1.15)
- Core Latency: 1 clock cycle
- Throughput: 1 sample / cycle (fully pipelined)
- Architecture: Fully parallel transposed FIR (200 taps)
Note: All tap computations are performed combinationally in a single cycle, with a registered output stage.
| Offset | Register | Description |
|---|---|---|
| 0x00 | CTRL | bit0 = enable, bit1 = clear_state |
| 0x10 | COEF[0] | FIR coefficient tap 0 (Q1.15) |
| 0x14 | COEF[1] | FIR coefficient tap 1 (Q1.15) |
| ... | ... | ... |
| 0x10 + 4·(N−1) | COEF[N−1] | FIR coefficient tap N−1 |
See docs/address_map.md for detailed documentation.
RTL verification is performed using dedicated testbenches:
-
tb_fir_core.v
Verifies FIR DSP behavior:- Impulse response
- Step and sine response
- Saturation behavior
- Dynamic coefficient switching
-
tb_fir_axis.v
Verifies system-level integration:- AXI-Lite register access
- Runtime coefficient updates
- AXI-Stream handshake and backpressure
- Stereo interleaved data processing
Simulation outputs are stored in the result/ directory.
A bare-metal Vitis application is included to demonstrate:
- FIR driver initialization
- AXI-Lite coefficient programming
- Runtime filter reconfiguration (hot-swap)
- AXI DMA stereo buffer transfer
- Basic register read-back verification
Linux / PetaLinux integration is intentionally out of scope.
- Package FIR RTL as custom AXI IP in Vivado
- Integrate IP with AXI DMA in block design
- Generate bitstream and export XSA
- Build bare-metal application in Vitis
- Run DMA-based validation
See docs/build_overview.md for details.
This repository is intended as a clean, minimal FIR reference design.
Advanced features such as:
- Multiband FIR
- Polyphase / QMF structures
- Mid-side processing
- Adaptive or time-varying filters
are intentionally out of scope.
Target Device: AMD Kria KV260 (ZU5EV)
Clock Constraint: 100 MHz
- Worst Negative Slack (WNS): +1.276 ns
- Timing Status: Met
- LUTs: 31,142
- Flip-Flops (FF): 36,416
- DSP Slices: 400
- BRAM: 2
- Total On-Chip Power: 2.842 W
A minimal hardware validation was performed using a PYNQ-based DMA test.
- Platform: AMD Kria KV260
- Interface: AXI DMA (MM2S / S2MM)
- Data Format: 32-bit interleaved stereo (Q1.15)
- Bitstream loaded on target
- Test buffer (stereo ramp signal) generated in software
- Data streamed to FPGA via AXI DMA
- Output captured and compared against input
[INFO] Loading bitstream...
[INFO] Allocating buffers...
[INFO] Generating test signal...
[INFO] Starting DMA transfer...
[INFO] Transfer complete.
[INFO] Max difference: 0
[PASS] Output matches input (pass-through / identity response)
[INFO] Done.
- Default configuration behaves as identity (pass-through).
- Non-zero difference is expected after coefficient update.
- Validation confirms correct AXI DMA and streaming datapath operation.
-
The default configuration uses 129 FIR taps as a practical and representative example.
- An odd tap count is commonly used for linear-phase FIR designs.
- The FIR core itself is fully parameterizable and not limited to 129 taps.
-
FIR coefficients are mapped through an AXI-Lite memory interface.
- Each tap occupies one 32-bit AXI-Lite word.
- The current address width safely supports up to 256 taps.
- Designs requiring more than 256 taps only need a wider AXI-Lite address bus; no architectural changes are required.
-
This repository intentionally does not include coefficient generation tools (Python / MATLAB).
- The focus is on hardware architecture and AXI integration.
- FIR coefficient design is application-specific and better handled by external DSP tools.
- Users are free to generate and load their own coefficients at runtime.
This repository is provided as a reference implementation.
- The design is considered feature-complete and stable
- No active development roadmap is planned
- Issues and pull requests are not expected
Future updates may occur only for major revisions.
This module is part of a small RTL-focused DSP building block series.
For a reference implementation of a Quadrature Mirror Filter (QMF) analysis/synthesis filter bank
using AXI-Stream and fixed-point arithmetic, see:
🔗 https://github.com/vrm-lab/Quadrature-Mirror-Filter-FPGA
The QMF repository focuses on:
- subband analysis and reconstruction behavior
- fixed-point DSP discipline
- AXI-Stream integration correctness
It is provided as a reference RTL design, not as a complete system.
Licensed under the MIT License.
Provided as-is, without warranty.