Real-time SIMD-optimized analog circuit simulation for audio DSP.
libmksim models vacuum tubes, diodes, transistors, op-amps, RLC networks, potentiometers, and switches using Wave Digital Filters for linear networks and local Newton-Raphson solvers for nonlinear devices. Written in Rust with zero runtime allocation, it targets audio plugin and DAW integration via a stable C FFI.
Cargo.toml crate-type = ["lib", "cdylib", "staticlib"]
Dependencies zero (core), criterion (bench only)
License MIT
- Architecture Overview
- Module Map
- Theory
- Component Reference
- Circuit Graph
- Amp Stages
- C API
- Rust Usage
- Building
- Testing & Benchmarks
- Performance Targets
- References
Most analog audio circuits are composed of small nonlinear sections (tubes, diodes) surrounded by mostly linear passive networks. libmksim exploits this by separating the problem:
+------------------+
| Circuit Graph |
| (topo-sorted) |
+--------+---------+
|
+-----------------+-----------------+
| |
+-------v--------+ +---------v--------+
| Linear Networks| | Nonlinear Devices|
| (WDF adaptors) | | (Newton-Raphson) |
+----------------+ +------------------+
Linear networks (resistors, capacitors, inductors, tone stacks) are evaluated as Wave Digital Filter trees -- O(1) per component, no matrix solves, guaranteed passive stability.
Nonlinear devices (tubes, diodes, transistors, op-amps) are solved locally with Newton-Raphson iteration, typically converging in 2-4 steps.
The full processing pipeline:
Audio Input
|
v
Oversampling (upsample)
|
v
Circuit Graph Evaluation
|
v
Local Nonlinear Solvers
|
v
DC Blocking
|
v
Downsampling
|
v
Audio Output
src/
lib.rs Top-level crate root
simd/
traits.rs SimdFloat trait (generic SIMD abstraction)
scalar.rs ScalarFloat(f32) — WIDTH=1 fallback
exp.rs simd_exp() — range reduction + degree-5 minimax
log.rs simd_log(), simd_log1pexp()
sigmoid.rs simd_sigmoid()
reciprocal.rs simd_recip(), simd_rsqrt()
avx2.rs F32x8(__m256) — AVX2+FMA backend
avx512.rs AVX-512 backend (placeholder)
neon.rs F32x4(float32x4_t) — ARM NEON backend
dispatch.rs Runtime CPU detection
core/
node.rs NodeId, AlignedBuffer<N>, SignalBuffer
solver.rs NewtonSolver (generic Newton-Raphson)
circuit_graph.rs CircuitGraph, CompiledGraph, ZdfCluster
buffer_pool.rs Preallocated buffer pool
components/
mod.rs CircuitComponent trait
tubes.rs TriodeStage, PentodeStage (Koren equations)
diode.rs DiodeClipper, AntiParallelDiodeClipper
transistor.rs BjtStage, MosfetStage
opamp.rs OpAmpStage (gain + slew rate + rail clamp)
potentiometers.rs Potentiometer (linear/log/reverse-log tapers)
switches.rs Switch (smoothed conductance crossfade)
passive.rs Resistor, Capacitor, Inductor wrappers
rlc.rs RLC filter topologies via WDF
wdf/
components.rs WdfComponent trait, WdfResistor/Capacitor/Inductor
adaptors.rs SeriesAdaptor, ParallelAdaptor
ports.rs NonlinearPort trait, DiodePort, DiodePairPort
dsp/
parameter.rs Parameter (one-pole smoothing)
dc_block.rs DcBlocker
oversampling.rs Oversampler (2x/4x/8x/16x FIR)
anti_alias.rs AntiAliasFilter (Blackman-Harris windowed sinc)
stages/
preamp.rs PreampStage (two cascaded triodes)
tone_stack.rs ToneStack (treble/mid/bass)
power_amp.rs PowerAmpStage (phase splitter + pentode pair)
api/
processor.rs TubeDspEngine (high-level interface)
ffi.rs C FFI (extern "C" functions)
include/
libmksim.h C header for plugin integration
benches/
simd_math.rs Criterion benchmarks for exp/log/sigmoid
components.rs Criterion benchmarks for triode/amp chain
Wave Digital Filters (WDF) transform voltage-current relationships into incident/reflected wave pairs, enabling efficient evaluation of passive networks without matrix solves.
Wave variables. Given port resistance R:
a = incident wave (arriving at the port)
b = reflected wave (leaving the port)
V = a + b (voltage)
I = (a - b) / (2R) (current)
One-port elements:
| Element | Port Resistance | Reflected Wave |
|---|---|---|
| Resistor | R | b = 0 (absorbs energy) |
| Capacitor | R = 1 / (2 C f_s) | b = state (trapezoidal) |
| Inductor | R = 2 L f_s | b = -state |
Adaptors connect two child ports into series or parallel topologies. The scattering equations use impedance/conductance ratios to compute how waves reflect and transmit at the junction:
- Series adaptor: R_port = R_a + R_b, scattering ratio gamma = R_a / (R_a + R_b)
- Parallel adaptor: R_port = R_a R_b / (R_a + R_b), scattering ratio gamma = R_b / (R_a + R_b)
Entire passive circuits (RC filters, RLC networks, tone stacks) are represented as binary adaptor trees. Processing is a two-pass traversal: reflected waves propagate bottom-up, then incident waves propagate top-down.
WDF guarantees passive circuits remain passive in the discrete domain, preventing instability and oscillation even under rapid parameter changes.
References: Fettweis (1986), Werner et al. (2015).
Nonlinear devices cannot be represented as WDF one-ports directly. Instead, they are placed at the root of the WDF tree as nonlinear ports that receive the incident wave from the tree, solve the device's I-V equation, and return the reflected wave.
The generic solver uses Newton-Raphson iteration:
Given F(x) = 0 with derivative F'(x):
x_{n+1} = x_n - F(x_n) / F'(x_n)
Stability measures:
- Voltage clamping: x is clamped to [-500, +500] V
- Step limiting: Newton steps are clamped to +/- 50 V
- Warm start: previous-sample solution as initial guess
- Derivative guard: F'(x) is bounded away from zero
Typical convergence: 2-4 iterations per sample.
Tube models use the Koren equations, which approximate the Child-Langmuir three-halves power law with smooth behavior near cutoff.
Triode plate current:
I_p = k * [ln(1 + exp(E))]^2
where E = (V_g + V_p / mu) / V_k
| Symbol | Meaning | 12AX7 Value |
|---|---|---|
| mu | Amplification factor | 100 |
| k | Scaling constant | 2.1e-6 |
| V_k | Curvature constant | 1.3 |
| R_p | Plate resistance | 100 kOhm |
| B+ | Supply voltage | 250 V |
The circuit equation B+ - R_p * I_p(V_g, V_p) - V_p = 0 is solved with
Newton-Raphson. The derivative uses the identity:
dI_p/dV_p = (2 k L) / (mu V_k) * sigmoid(E)
where L = ln(1 + exp(E))
sigmoid(E) = exp(E) / (1 + exp(E))
The ln(1 + exp(E)) function is computed using the numerically stable
softplus formulation max(E, 0) + ln(1 + exp(-|E|)) to avoid overflow.
Pentode plate current decouples plate voltage from grid control:
I_p = k * [ln(1 + exp(V_g / V_k))]^2 * (1 + V_p / V_a)
References: Koren (1996), Yeh (2006), Pakarinen & Yeh (2009).
Diode -- Shockley equation:
I = I_s * (exp(V / (n V_t)) - 1)
where I_s = saturation current
n = ideality factor
V_t = thermal voltage (~25.85 mV at room temperature)
Presets: silicon (I_s=1e-12, n=1.9), germanium (I_s=1e-6, n=1.2), LED (I_s=1e-20, n=2.0).
Anti-parallel diode pair (symmetric clipper):
I = 2 I_s sinh(V / (n V_t))
BJT -- simplified Ebers-Moll:
I_c = beta * I_s * (exp(V_be / V_t) - 1)
MOSFET -- square-law model with region handling:
cutoff: I_d = 0 (V_gs < V_th)
saturation: I_d = (k/2) * (V_gs - V_th)^2 (V_ds >= V_gs - V_th)
triode: I_d = k * ((V_gs-V_th)V_ds - V_ds^2/2)
Op-amp:
V_out = clamp(A * (V+ - V-), V_neg, V_pos)
with slew rate limiting: |dV/dt| <= SR.
All math in the library is written generically over the SimdFloat trait:
pub trait SimdFloat: Copy + Clone + Sized {
const WIDTH: usize;
fn splat(v: f32) -> Self;
fn add(self, rhs: Self) -> Self;
fn mul(self, rhs: Self) -> Self;
fn fma(self, b: Self, c: Self) -> Self;
// ... load, store, sub, div, max, min, abs, neg, cmp_ge, blend
}Backends:
| Backend | Type | WIDTH | Feature Gate |
|---|---|---|---|
| Scalar | ScalarFloat |
1 | (default) |
| AVX2 + FMA | F32x8 |
8 | avx2 |
| AVX-512 | F32x16 |
16 | avx512 |
| ARM NEON | F32x4 |
4 | neon |
Fast math functions (all generic over S: SimdFloat):
| Function | Algorithm | Max Error |
|---|---|---|
simd_exp(x) |
Range reduction to [-ln2/2, ln2/2], degree-5 Horner polynomial, 2^k via IEEE 754 bit manipulation | < 1e-5 |
simd_log(x) |
Mantissa extraction, normalization to [sqrt(2)/2, sqrt(2)], atanh series (6 terms) | < 1e-5 |
simd_log1pexp(x) |
Stable softplus: `max(x,0) + ln(1 + exp(- | x |
simd_sigmoid(x) |
`exp(- | x |
simd_recip(x) |
Newton refinement: r = r * (2 - x*r) |
< 1e-6 |
simd_rsqrt(x) |
Newton refinement on inverse square root | < 1e-5 |
Nonlinear processing generates harmonics that alias back into the audio band.
The Oversampler provides configurable 2x/4x/8x/16x oversampling:
- Upsample: zero-stuff by factor, then apply FIR lowpass
- Process: run the nonlinear component at the higher sample rate
- Downsample: apply FIR lowpass, then decimate by factor
FIR filters use a windowed sinc design (Blackman window) with order scaling by factor (15 for 2x, 23 for 4x, 31 for 8x, 47 for 16x).
Every circuit component implements CircuitComponent:
pub trait CircuitComponent {
fn prepare(&mut self, sample_rate: f32);
fn process_block(&mut self, input: &[f32], output: &mut [f32]);
fn update_parameters(&mut self);
}| Component | Model | Newton Iters |
|---|---|---|
TriodeStage |
Koren equation (12AX7, 12AT7 presets) | 2-4 |
PentodeStage |
Koren pentode (EL34 preset) | 2-4 |
DiodeClipper |
Shockley (silicon, germanium, LED) | 2-4 |
AntiParallelDiodeClipper |
sinh symmetric clipper | 2-4 |
BjtStage |
Ebers-Moll (2N3904 preset) | 2-4 |
MosfetStage |
Square-law with triode/sat (2N7000) | 2-4 |
OpAmpStage |
Finite gain + slew rate (TL072 preset) | 1 |
| Component | WDF Port Resistance |
|---|---|
WdfResistor |
R |
WdfCapacitor |
1 / (2 C f_s) |
WdfInductor |
2 L f_s |
SeriesAdaptor |
R_a + R_b |
ParallelAdaptor |
R_a R_b / (R_a + R_b) |
| Component | Behavior |
|---|---|
Potentiometer |
Linear, log, or reverse-log taper; smoothed position |
Switch |
R_on=0.1 Ohm, R_off=50 MOhm; smoothed conductance crossfade |
| Component | Description |
|---|---|
Parameter |
One-pole smooth: p[n] = p[n-1] + a(target - p[n-1]) |
DcBlocker |
y[n] = x[n] - x[n-1] + 0.995 * y[n-1] |
Oversampler |
2x/4x/8x/16x with FIR anti-alias |
AntiAliasFilter |
Polyphase FIR, Blackman-Harris window |
The CircuitGraph connects components into a directed processing graph:
let mut graph = CircuitGraph::new();
let triode = graph.add_node(Box::new(TriodeStage::new(PARAMS_12AX7.clone())));
let cap = graph.add_node(Box::new(Capacitor::new(0.022e-6, 44100.0)));
graph.connect(triode, cap);
let compiled = graph.compile()?; // topo sort + ZDF detection
graph.process_block(&compiled, 128);Compilation performs:
- Kahn's algorithm topological sort
- DFS-based cycle detection
- Feedback loops grouped into ZDF clusters (solved with Newton iteration)
- Buffer assignment from preallocated pool
Graph overhead target: < 5% of DSP time.
Circuits with instantaneous feedback (op-amp feedback, cathode feedback)
create cycles that break topological sorting. These nodes are grouped into
ZdfClusters and solved iteratively:
y = f(x + g(y)) // solved with Newton iteration per sample
libmksim includes prebuilt guitar amplifier stages:
| Stage | Components |
|---|---|
PreampStage |
Input triode -> DC blocker -> second triode |
ToneStack |
Treble/mid/bass one-pole filters with pots |
PowerAmpStage |
Phase splitter + push-pull pentode pair + transformer model |
These are composed by TubeDspEngine into a complete signal chain:
Input -> PreampStage -> ToneStack -> PowerAmpStage -> DcBlocker -> Output
The library exposes an opaque-pointer C API for plugin integration.
See include/libmksim.h.
#include "libmksim.h"
// Create engine
TubeDspEngine* engine = tube_dsp_create(44100, 128);
// Set parameters
tube_dsp_set_parameter(engine, TUBE_DSP_PARAM_TREBLE, 0.7f);
tube_dsp_set_parameter(engine, TUBE_DSP_PARAM_BASS, 0.5f);
tube_dsp_set_parameter(engine, TUBE_DSP_PARAM_INPUT_GAIN, 2.0f);
// Process audio
float input[128], output[128];
int err = tube_dsp_process(engine, input, output, 128);
// Cleanup
tube_dsp_destroy(engine);Error codes:
| Code | Value |
|---|---|
TUBE_DSP_OK |
0 |
TUBE_DSP_ERROR |
-1 |
TUBE_DSP_NULL_PTR |
-2 |
TUBE_DSP_INVALID_PARAM |
-3 |
Parameter IDs:
| ID | Name | Range |
|---|---|---|
| 0 | Input Gain | 0.0 - 10.0 |
| 1 | Output Gain | 0.0 - 2.0 |
| 2 | Treble | 0.0 - 1.0 |
| 3 | Mid | 0.0 - 1.0 |
| 4 | Bass | 0.0 - 1.0 |
| 5 | Drive | 0.0 - 1.0 |
| 6 | Presence | 0.0 - 1.0 |
| 7 | Master Volume | 0.0 - 1.0 |
Thread safety: call tube_dsp_process from the audio thread and
tube_dsp_set_parameter from the control thread. No heap allocation
occurs in the audio path.
use libmksim::api::processor::TubeDspEngine;
let mut engine = TubeDspEngine::new(44100, 128);
engine.set_parameter(2, 0.7); // treble
engine.set_parameter(4, 0.5); // bass
let input = [0.0f32; 128];
let mut output = [0.0f32; 128];
engine.process(&input, &mut output);Using individual components:
use libmksim::components::tubes::{TriodeStage, PARAMS_12AX7};
use libmksim::components::CircuitComponent;
let mut triode = TriodeStage::new(PARAMS_12AX7.clone());
triode.prepare(44100.0);
let input = [0.5f32; 64];
let mut output = [0.0f32; 64];
triode.process_block(&input, &mut output);Using WDF directly:
use libmksim::wdf::components::{WdfResistor, WdfCapacitor, WdfComponent};
use libmksim::wdf::adaptors::SeriesAdaptor;
let resistor = WdfResistor::new(1000.0);
let capacitor = WdfCapacitor::new(100e-9, 44100.0);
let mut rc = SeriesAdaptor::new(resistor, capacitor);
// Process one sample
let b = rc.reflected(); // bottom-up: get reflected wave
let a = input_voltage - b; // compute incident from source
rc.incident(a); // top-down: propagate incident# Build (default scalar backend)
cargo build --release
# Build with AVX2 SIMD backend (x86_64)
cargo build --release --features avx2
# Build with NEON backend (aarch64)
cargo build --release --features neon
# Build C dynamic library
cargo build --release
# Output: target/release/libmksim.dylib (macOS)
# target/release/libmksim.so (Linux)
# target/release/mksim.dll (Windows)# Run all 119 tests
cargo test
# Run benchmarks
cargo benchTests cover:
- SIMD math accuracy (exp, log, sigmoid vs stdlib over 1M samples)
- Transfer curves for all nonlinear devices
- Newton solver convergence
- WDF energy conservation and frequency response
- Diode clipper waveform behavior
- Anti-parallel clipper symmetry
- Potentiometer taper curves
- Switch conductance transitions
- Circuit graph topological sort and cycle detection
- Full engine integration (no NaN/Inf, bounded output)
| Component | Target (cycles/sample) |
|---|---|
| Passive WDF network | < 10 |
| Op-amp stage | < 15 |
| Diode clipper | < 20 |
| BJT stage | < 40 |
| Triode stage | < 150 |
| Full amp chain | < 500 |
| Platform | Throughput Target |
|---|---|
| AVX2 | > 200M samples/sec |
| AVX-512 | > 400M samples/sec |
| ARM NEON | > 150M samples/sec |
-
A. Fettweis, "Wave digital filters: Theory and practice," Proceedings of the IEEE, vol. 74, no. 2, pp. 270-327, 1986.
-
K. Werner, V. Nangia, J. O. Smith III, and J. S. Abel, "A general and explicit formulation for wave digital filters with multiple/multiport nonlinearities and complicated topologies," Proc. IEEE WASPAA, 2015.
-
N. Koren, "Improved vacuum tube models for SPICE simulations," Glass Audio, vol. 8, no. 5, 1996.
-
D. T. Yeh, J. S. Abel, and J. O. Smith III, "Simulation of the diode limiter in guitar distortion circuits by numerical solution of ordinary differential equations," Proc. DAFx, 2007.
-
J. Pakarinen and D. T. Yeh, "A review of digital techniques for modeling vacuum-tube guitar amplifiers," Computer Music Journal, vol. 33, no. 2, pp. 85-100, 2009.
-
D. T. Yeh, "Digital implementation of musical distortion circuits by analysis and simulation," PhD dissertation, Stanford University, 2009.
-
R. C. D. de Paiva, S. D'Angelo, J. Pakarinen, and V. Valimaki, "Emulation of operational amplifier circuits using wave digital filters," IEEE Trans. Circuits and Systems II, 2012.
-
W. Shockley, "The theory of p-n junctions in semiconductors and p-n junction transistors," Bell System Technical Journal, vol. 28, no. 3, pp. 435-489, 1949.
MIT