Figure 1: Original video, 2X magnified, and 5X magnified.
Phase-based motion magnification amplifies subtle motions invisible to the naked eye. Unlike Eulerian (color-based) methods that amplify pixel intensity changes, phase-based magnification operates on the phase of complex wavelet coefficients — which directly encode local position — enabling 10–100x amplification with fewer artifacts. This is a Python implementation based on Wadhwa et al. (SIGGRAPH 2013) using the 2D Dual-Tree Complex Wavelet Transform.
v2.0.0 adds GPU acceleration via PyTorch, delivering ~5x end-to-end speedup on CUDA-capable GPUs.
There are two main approaches to video motion magnification:
- Eulerian (Wu et al., SIGGRAPH 2012) — amplifies temporal pixel intensity changes at fixed spatial locations. Works well for revealing color variations (e.g., blood flow under skin) but produces artifacts when amplifying motion beyond small factors, because the first-order Taylor approximation breaks down.
- Phase-based (Wadhwa et al., SIGGRAPH 2013) — operates on the phase of complex wavelet/pyramid coefficients. Phase directly encodes local spatial position, so phase changes over time directly represent motion. This supports much larger amplification factors (10–100x) with fewer artifacts because it manipulates motion information directly rather than relying on an intensity-to-motion approximation.
For a band-pass filtered signal at spatial frequency omega, a small displacement delta(t) appears as a phase shift phi(t) = omega * delta(t) in the complex coefficients. By amplifying the temporal phase changes by a factor k, the reconstructed signal is displaced by k * delta(t) — the motion itself is magnified.
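This phase-to-displacement relationship can be checked numerically. A toy 1-D sketch (illustrative only, not the project's code): a complex band-pass signal shifted by delta shows a phase change of omega * delta, and scaling that phase change by k moves the signal by k * delta.

```python
import numpy as np

omega, delta, k = 0.3, 0.5, 10.0
x = np.arange(256)
orig = np.exp(1j * omega * x)               # complex band-pass signal
shifted = np.exp(1j * omega * (x - delta))  # same signal displaced by delta

dphi = np.angle(shifted / orig)             # per-sample phase change = -omega*delta
magnified = orig * np.exp(1j * k * dphi)    # amplify the phase change by k

# magnified equals the original signal displaced by k*delta
target = np.exp(1j * omega * (x - k * delta))
assert np.allclose(magnified, target)
```

The same identity underlies the full 2-D pipeline, applied per sub-band and per scale.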
The original phase-based method uses complex steerable pyramids, which are accurate but computationally expensive (~21x overcomplete). The Dual-Tree Complex Wavelet Transform (DTCWT), developed by Kingsbury (Cambridge, late 1990s), provides a faster alternative.
The standard Discrete Wavelet Transform (DWT) has two problems for phase-based processing: it is not shift-invariant (shifting input by 1 pixel completely changes coefficients), and it has poor directional selectivity (only 3 sub-bands). The DTCWT solves both by running two parallel filter banks whose wavelets are related by the Hilbert transform, producing complex-valued coefficients with clean amplitude and phase information.
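The shift-invariance benefit of complex (Hilbert-pair) coefficients can be illustrated without the DTCWT itself. In this sketch (a stand-in using scipy's analytic signal, not the DTCWT), real-valued samples change substantially under a 1-sample shift while the complex magnitude barely moves — the property the two filter-bank trees provide:

```python
import numpy as np
from scipy.signal import hilbert

x = np.arange(128)
sig = np.cos(0.4 * x) * np.exp(-((x - 64) ** 2) / 200.0)  # modulated pulse
sig_shift = np.roll(sig, 1)                               # shift by one sample

real_change = np.abs(sig - sig_shift).max()               # real values oscillate
mag = np.abs(hilbert(sig))                                # complex magnitude =
mag_s = np.abs(hilbert(sig_shift))                        # smooth envelope
complex_change = np.abs(mag - mag_s).max()

assert complex_change < real_change  # magnitude is far more shift-stable
```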
In 2D, the DTCWT produces 6 complex sub-bands per scale, oriented at approximately ±15°, ±45°, and ±75°, at only ~4x redundancy:
| Property | DWT | DTCWT | Steerable Pyramid |
|---|---|---|---|
| Shift invariant | No | Approximately | Yes |
| Directional | No (3 bands) | Yes (6 bands/scale) | Yes (configurable) |
| Overcomplete | 1x | ~4x | ~21x |
| Speed | Fast | Fast | Slow |
The DTCWT is ~5x faster than complex steerable pyramids while still providing reliable phase information for motion estimation.
The algorithm has five stages:
```
Input Video
     |
     v
[1. Forward 2D DTCWT] ────> Complex coefficients C = A * e^(i*phi)
     |                      (nlevels scales x 6 orientations per frame)
     v
[2. Phase Extraction] ────> Cumulative phase phi(t) via frame-to-frame
     |                      complex division + cumsum
     v
[3. Temporal Filter] ─────> Separate base motion phi_0 (slow)
     |                      from detail motion (phi - phi_0)
     v
[4. Phase Modification] ──> Amplify detail: phi_0 + (phi - phi_0) * k
     |                      + smoothing pass (width=2)
     v
[5. Inverse DTCWT] ───────> Reconstruct with |C| * e^(i*phi_modified)
     |
     v
Output Video (magnified motions)
```
Each color channel (R, G, B) is processed independently through the full pipeline, then recombined for the output video.
1. Forward 2D DTCWT
Each video frame is decomposed into nlevels scales × 6 orientations, producing complex coefficients C = A * e^(i*phi) with amplitude A and phase phi.
2. Phase Extraction
Cumulative phase is computed via frame-to-frame complex division: dividing each coefficient in frame t by the corresponding coefficient in frame t-1 yields a unit complex number whose angle is the phase change. Applying angle() and then cumsum() along the time axis produces the cumulative phase phi(t) relative to frame 0.
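A minimal numpy sketch of this step (the function name is illustrative, not the project's API), for a stack of complex coefficients with shape (T, ...):

```python
import numpy as np

def cumulative_phase(coeffs):
    # normalize to unit magnitude, guarding against division by zero
    unit = coeffs / np.maximum(np.abs(coeffs), 1e-12)
    ratio = unit[1:] / unit[:-1]          # frame-to-frame phase ratio
    dphi = np.angle(ratio)                # per-frame phase change
    # cumulative phase relative to frame 0 (frame 0 itself has phase 0)
    return np.concatenate([np.zeros_like(dphi[:1]),
                           np.cumsum(dphi, axis=0)])

# a coefficient rotating by 0.1 rad/frame yields phi[t] = 0.1 * t
t = np.arange(5)
c = np.exp(1j * 0.1 * t).reshape(5, 1)
assert np.allclose(cumulative_phase(c)[:, 0], 0.1 * t)
```

Working with per-frame phase differences keeps each angle() call well inside (-pi, pi), avoiding wrap-around for small motions.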
3. Temporal Filtering
A flat-top window low-pass filter separates the phase into base motion phi_0 (the slow baseline) and detail motion phi - phi_0 (the fast component to be amplified).
4. Phase Modification
The detail motion is amplified by factor k: phi_modified = phi_0 + (phi - phi_0) * k.
An additional smoothing pass (width=2) removes high-frequency phase noise introduced by amplification.
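Steps 3 and 4 together can be sketched in numpy as follows (illustrative name and kernel handling, assuming the flat-top kernel is normalized to unit sum; the width / 0.2327 sizing rule matches the description later in this README):

```python
import numpy as np
from scipy.signal.windows import flattop

def magnify_phase(phi, k=3.0, width=80):
    # flat-top low-pass kernel sized by the window's noise bandwidth
    n = max(3, int(round(width / 0.2327)))
    win = flattop(n)
    win /= win.sum()                      # unit-sum so DC passes unchanged
    # reflect-pad along time, then smooth each time series
    pad = np.pad(phi, [(n // 2, n - 1 - n // 2)] + [(0, 0)] * (phi.ndim - 1),
                 mode="reflect")
    phi0 = np.apply_along_axis(lambda s: np.convolve(s, win, "valid"), 0, pad)
    # keep the baseline, amplify only the detail motion
    return phi0 + k * (phi - phi0)
```

A constant phase (no motion) passes through unchanged for any k, since the detail term phi - phi0 vanishes.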
5. Inverse DTCWT
Coefficients are reconstructed with the original amplitude and modified phase: C_new = |C| * e^(i*phi_modified).
| Application | Magnification (k) | Filter Width | What It Reveals |
|---|---|---|---|
| Pulse / breathing | 3–10 | 80–120 | Chest movement, skin motion from heartbeat |
| Structural vibration | 5–20 | 40–80 | Building sway, bridge oscillations |
| Mechanical vibration | 10–50 | 20–60 | Machine vibrations, resonance modes |
| Coronal seismology | 3–10 | 50–100 | Solar coronal loop oscillations |
- Higher k → more noise/artifacts — amplification also amplifies phase noise, producing spatial artifacts at high magnification factors.
- Memory intensive — all frame pyramids must remain in memory simultaneously for temporal filtering. Long videos or high resolutions may require significant RAM.
- Slow on CPU — DTCWT is computed on every frame × 3 color channels. Processing time scales linearly with frame count. Use `--gpu` for ~5x speedup.
- Large motions violate assumptions — the phase-to-motion relationship is linear only for small displacements. Large motions produce phase wrapping artifacts.
The normalize_phase() function normalizes complex coefficients to unit magnitude (C / |C|), isolating the phase while guarding against division by zero for zero-magnitude coefficients.
extract_temporal_phases() computes frame-to-frame phase changes via complex division — dividing the current frame's normalized coefficients by the previous frame's gives the phase ratio. Taking np.angle() converts to angles, and np.cumsum() along the time axis produces the absolute phase evolution relative to frame 0.
flattop_filter_1d() applies a flat-top window (from scipy.signal.windows.flattop) as a low-pass smoothing kernel along the time axis. The window size is width / 0.2327, where 0.2327 is the flat-top window's equivalent noise bandwidth in bins. This filter separates the slow baseline motion from the fast detail motion we want to amplify.
For windows larger than 32 samples, the filter switches to FFT-based convolution (scipy.signal.fftconvolve) for a ~4x speedup.
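The FFT path is a pure speed optimization — for any kernel, `fftconvolve` returns the same values as direct convolution up to floating-point error, which a quick check confirms:

```python
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
sig = rng.standard_normal(512)   # stand-in time series
ker = rng.standard_normal(64)    # stand-in kernel, larger than 32 taps

# FFT-based and direct convolution agree to numerical precision
assert np.allclose(np.convolve(sig, ker, "same"),
                   fftconvolve(sig, ker, "same"))
```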
After filtering, the baseline phase phi_0 is kept unmodified while the detail motion phi - phi_0 is multiplied by the magnification factor k: phi_modified = phi_0 + (phi - phi_0) * k.
An additional smoothing pass with width=2 removes high-frequency phase noise that would appear as spatial flickering. The final coefficients are reconstructed by preserving the original amplitude and applying the modified phase: C_new = |C| * e^(i*phi_modified).
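The reconstruction step is a one-liner in numpy (illustrative values): the amplitude of the new coefficient is exactly the original amplitude, and its angle is exactly the modified phase.

```python
import numpy as np

amp = np.array([2.0, 0.5])        # original amplitudes |C|
phi_new = np.array([0.3, -1.2])   # modified phases

coeffs_new = amp * np.exp(1j * phi_new)

assert np.allclose(np.abs(coeffs_new), amp)        # amplitude preserved
assert np.allclose(np.angle(coeffs_new), phi_new)  # phase replaced
```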
The --gpu flag enables GPU-accelerated processing via PyTorch and pytorch_wavelets. The GPU path replaces both the DTCWT transforms and temporal filtering with CUDA-accelerated equivalents while keeping the same algorithmic pipeline.
Storing all DTCWT coefficients (amplitudes + phases) for every frame would require ~718 MB of CPU RAM per channel. The GPU path avoids this with a two-pass design:
Pass 1 — Forward DTCWT + Phase Extraction:
- Frames are sent to the GPU in batches (batch size auto-tuned to ~70% of available VRAM)
- Forward DTCWT produces complex coefficients; only the phase is extracted and stored on CPU
- Amplitudes and lowpass coefficients (Yl) are discarded — they will be recomputed in Pass 2
- Cross-batch boundary handling carries the last frame's normalized coefficients to the next batch, ensuring bitwise-identical results to single-batch processing
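The cross-batch carry can be sketched in numpy (illustrative function, not the project's code): prepending the previous batch's last normalized coefficient makes the chained per-batch phase differences identical to processing everything in one batch.

```python
import numpy as np

def batch_dphi(unit, prev=None):
    # frame-to-frame phase changes within a batch; `prev` is the carried
    # last normalized coefficient of the previous batch
    ref = unit if prev is None else np.concatenate([np.atleast_1d(prev), unit])
    return np.angle(ref[1:] / ref[:-1])

t = np.arange(10)
unit = np.exp(1j * 0.2 * t)                     # normalized coefficients over time

whole = batch_dphi(unit)                        # single-batch result
carried = np.concatenate([batch_dphi(unit[:5]),
                          batch_dphi(unit[5:], prev=unit[4])])
assert np.allclose(whole, carried)              # batching changes nothing
```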
Temporal Filtering on GPU (between passes):
- Phase arrays are filtered using cuFFT (see below)
Pass 2 — Reconstruction + Inverse DTCWT:
- Forward DTCWT is re-run on the original frames to recover amplitudes and Yl (deterministic — verified 0.00 diff between runs)
- Modified phases from the filtered output are combined with recovered amplitudes: `real = amp * cos(phase)`, `imag = amp * sin(phase)`
- Inverse DTCWT produces the output frames
The temporal filter is the pipeline bottleneck (54% of CPU runtime). On GPU, it uses torch.fft (backed by cuFFT):
- Phase arrays are chunked along the coefficient dimension to fit in VRAM — cuFFT requires ~20x the array size in working memory, so a 538 MB phase array would need ~3.8 GB for a whole-array FFT
- Each chunk is transferred to GPU, FFT'd along the time axis, multiplied by the pre-computed FFT of the flat-top window, then inverse FFT'd
- The magnification and smoothing passes are applied on-GPU before transferring back
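Chunking is safe because the temporal FFT treats each coefficient's time series independently. This numpy stand-in (illustrative shapes and kernel, mirroring the GPU logic without torch) shows per-chunk FFT filtering matching a whole-array FFT exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
phase = rng.standard_normal((64, 1000))  # (time, coefficients)
win = np.hanning(9)                      # stand-in kernel, not the flat-top
win /= win.sum()
n = phase.shape[0] + len(win) - 1        # zero-padded length for linear conv
wf = np.fft.rfft(win, n)                 # pre-computed window FFT

def fft_filter(chunk):
    # zero-padded FFT convolution along time, cropped to "same" length
    full = np.fft.irfft(np.fft.rfft(chunk, n, axis=0) * wf[:, None], n, axis=0)
    return full[len(win) // 2 : len(win) // 2 + chunk.shape[0]]

whole = fft_filter(phase)
chunked = np.concatenate([fft_filter(c) for c in np.split(phase, 4, axis=1)],
                         axis=1)
assert np.allclose(whole, chunked)       # chunking along coefficients is exact
```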
Chunk size auto-tuning: queries torch.cuda.mem_get_info() and uses 70% of free VRAM as the limit, adapting to any GPU without user configuration.
Boundary handling: The GPU path uses zero-padding (not reflect-padding like CPU) for FFT convolution. This produces ~1.3% relative error at the first and last few frames, which at 65+ dB PSNR is visually imperceptible.
| Decision | Choice | Why |
|---|---|---|
| Two-pass vs store amplitudes | Two-pass (recompute) | Saves ~718 MB RAM/channel; forward DTCWT is deterministic, adds <1s on GPU |
| C=1 sequential vs C=3 batched | 3x C=1 sequential | C=3 only speeds DTCWT (22% of pipeline) by 2.2x but costs 2.5x VRAM; 1.2x total speedup not worth the complexity |
| Float32 vs float64 | Float32 everywhere | PyTorch/CUDA standard; cumsum error max 2.4e-4 rad at 900 frames, 1000x below visibility threshold |
| Whole-array vs chunked cuFFT | Chunked | Whole-array OOMs on consumer GPUs (>100 frames at 528x592); chunking is still 3x faster than CPU FFT |
| Zero-pad vs reflect-pad (GPU FFT) | Zero-pad | Reflect-padding would increase memory; 1.3% boundary error at 65+ dB PSNR is visually imperceptible |
See docs/design/gpu-acceleration.md for the full design document including alternatives considered and tradeoff analysis.
Benchmarked on face.mp4 (301 frames, 528x592, k=3) with an RTX 4050 (6 GB VRAM):
| Metric | CPU | GPU |
|---|---|---|
| Per-channel speedup | — | ~5-17x |
| End-to-end time | ~2 min | ~24 sec |
| Precision | float64 | float32 |
| Peak RAM | ~1.2 GB | ~800 MB |
| Peak VRAM | — | ~2-3 GB |
Hardware requirements (GPU path):
- NVIDIA GPU with CUDA 12.1+ support
- Minimum ~4 GB VRAM recommended (auto-tuning adapts batch/chunk sizes)
- `nvidia-container-toolkit` for Docker GPU support
Pre-flight memory check: Before processing, the tool estimates peak CPU RAM and VRAM usage and warns if it may exceed available resources, with suggestions to reduce --nlevels, resolution, or switch to CPU mode.
Note: CPU and GPU paths produce different outputs — they use different DTCWT implementations (dtcwt vs pytorch_wavelets) at different precisions (float64 vs float32). Both produce valid motion magnification results; they are not cross-comparable.
The easiest way to try the notebook — click the badge at the top of this README. No installation needed.
CLI tool (recommended for processing your own videos):
```
git clone https://github.com/joeljose/Motion-Magnification-Using-2D-DTCWT.git
cd Motion-Magnification-Using-2D-DTCWT
pip install -r requirements.txt
python motion_mag.py -i face.mp4
```

Notebook (for interactive exploration and learning):

```
pip install -r requirements.txt requests
jupyter notebook MotionMagDtcwt.ipynb
```

Requirements: Python 3.8+
```
# Build
./docker-build.sh

# Run
docker run --rm -it \
  -v "$(pwd)":/app/data \
  motion-mag-dtcwt:latest \
  -i /app/data/input.mp4 -o /app/data/output.avi
```

Requires nvidia-container-toolkit.
```
# Build
./docker-build-gpu.sh

# Run
docker run --rm -it --gpus all \
  -v "$(pwd)":/app/data \
  motion-mag-dtcwt-gpu:latest \
  -i /app/data/input.mp4 -o /app/data/output.avi
```

The GPU Docker image is based on pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime and includes PyTorch, pytorch_wavelets, and all dependencies. The `--gpu` flag is the default entrypoint behavior in the GPU image.
```
# CPU (default)
python motion_mag.py -i face.mp4
python motion_mag.py -i face.mp4 -o magnified.avi -k 5
python motion_mag.py -i face.mp4 -k 3 -w 80 --nlevels 6

# GPU
python motion_mag.py -i face.mp4 --gpu
python motion_mag.py -i face.mp4 --gpu --device 1 -k 10
python motion_mag.py -i face.mp4 --gpu -k 5 --biort near_sym_a --qshift qshift_a
```

| Flag | Default | Description |
|---|---|---|
| `-i` / `--input` | (required) | Input video path |
| `-o` / `--output` | `<input>_magnified.avi` | Output video path |
| `-k` / `--magnification` | 3 | Magnification factor |
| `-w` / `--width` | 80 | Temporal filter width (frames) |
| `--nlevels` | 8 | DTCWT decomposition levels |
| `--gpu` | off | Enable GPU acceleration (requires PyTorch + pytorch_wavelets) |
| `--device` | 0 | CUDA device index (for multi-GPU systems) |
| `--biort` | `near_sym_b` | Biorthogonal wavelet filter for DTCWT level 1 |
| `--qshift` | `qshift_b` | Quarter-shift wavelet filter for DTCWT levels 2+ |
| `--version` | — | Show program version and exit |
Available wavelet filters:
- `--biort`: `antonini`, `legall`, `near_sym_a`, `near_sym_b`
- `--qshift`: `qshift_06`, `qshift_a`, `qshift_b`, `qshift_c`, `qshift_d`
Open the notebook and run all cells. By default, it downloads a sample face video from the original paper and magnifies it. To use your own video, change the filename variable.
- Start with low magnification (k=3) and increase gradually.
- Larger filter width → smoother temporal filtering, better for slow motions (breathing, pulse).
- Fewer `--nlevels` → faster processing but less spatial detail captured.
- R, G, B channels are processed independently — color artifacts indicate magnification is too high.
- Use `--gpu` for ~5x faster processing if you have an NVIDIA GPU.
- Default wavelet filters changed from `near_sym_a`/`qshift_a` to `near_sym_b`/`qshift_b`. The longer `near_sym_b` filters produce fewer block artifacts at higher magnification factors (k=5+). To restore v1.x behavior: `python motion_mag.py -i input.mp4 --biort near_sym_a --qshift qshift_a`
- Output differs from v1.x even on CPU due to the filter change. Pin filters explicitly if reproducibility with older versions is needed.
See CHANGELOG.md for full release history.
All tests run inside Docker — no local Python dependencies needed:
```
# CPU: lint + unit tests (builds image automatically if not found)
./test.sh

# GPU: lint + unit tests including CUDA tests (requires nvidia-container-toolkit)
./test.sh gpu

# Force rebuild before testing
./test.sh --build
./test.sh gpu --build
```

CPU tests (tests/test_motion_mag.py) cover:
- Phase normalization (unit magnitude, zero safety)
- Flat-top temporal filter (DC passthrough, smoothing, edge cases)
- Temporal phase extraction (constant phase, output shape)
- `magnify_motions` smoke tests (shape, dtype, finite values)
- `load_video` buffer safety
- All CLI input validation error paths
GPU tests (tests/test_motion_mag_gpu.py) cover:
- GPU forward/inverse DTCWT roundtrip
- Phase extraction (finite values, correct shapes)
- Batched vs single-batch consistency (cross-batch boundary verification)
- cuFFT temporal filter (shape preservation, DC signal handling)
- Full GPU pipeline smoke test (finite output, correct dimensions)
- Memory estimation arithmetic
- All GPU tests skip automatically on systems without CUDA
Dev workflow:
- Make your changes
- Run `./test.sh` (and `./test.sh gpu` if touching GPU code)
- If all tests pass, commit and open a PR
- CI runs lint + smoke tests automatically
Version is tracked in a VERSION file at the project root. motion_mag.py has __version__ baked into the source (updated at release time).
To cut a release:
- Update `VERSION` with the new version number
- Update `__version__` in `motion_mag.py`
- Update `CHANGELOG.md` — move items from `[Unreleased]` to `[X.Y.Z] - YYYY-MM-DD`
- Commit: `Release vX.Y.Z`
- Tag: `git tag -a vX.Y.Z -m "Release vX.Y.Z"`
- Push: `git push && git push origin vX.Y.Z`
- Rebuild Docker images: `./docker-build.sh && ./docker-build-gpu.sh`
```
motion_mag.py              # CLI tool (CPU + GPU paths)
MotionMagDtcwt.ipynb       # Jupyter notebook
Dockerfile                 # CPU Docker image (python:3.11-slim)
Dockerfile.gpu             # GPU Docker image (pytorch:2.1.2-cuda12.1)
docker-build.sh            # Build + tag CPU image
docker-build-gpu.sh        # Build + tag GPU image
test.sh                    # Run lint + tests (Docker, supports cpu/gpu mode)
requirements.txt           # CPU runtime dependencies
requirements-gpu.txt       # GPU runtime dependencies
requirements-dev.txt       # Dev dependencies (pytest, ruff)
tests/
  test_motion_mag.py       # CPU unit tests
  test_motion_mag_gpu.py   # GPU unit tests (CUDA-only, skip on CPU)
docs/design/               # Architecture decision records
  gpu-acceleration.md      # GPU design doc
  dtcwt-hardening.md       # Hardening design doc
VERSION                    # Single source of truth for version
CHANGELOG.md               # Release history
CONTRIBUTING.md            # Contribution guidelines
```
- Wadhwa, N., Rubinstein, M., Durand, F., & Freeman, W.T. (2013). Phase-Based Video Motion Processing. ACM Transactions on Graphics (SIGGRAPH), 32(4).
- Wadhwa, N., Rubinstein, M., Durand, F., & Freeman, W.T. (2014). Riesz Pyramids for Fast Phase-Based Video Magnification. IEEE International Conference on Computational Photography (ICCP).
- Anfinogentov, S. & Nakariakov, V.M. (2016). Motion Magnification in Coronal Seismology. Solar Physics, 291(11), 3251–3267. GitHub.
- Wu, H-Y., Rubinstein, M., Shih, E., Guttag, J., Durand, F., & Freeman, W.T. (2012). Eulerian Video Magnification for Revealing Subtle Changes in the World. ACM Transactions on Graphics (SIGGRAPH), 31(4).
- Kingsbury, N.G. (1998). The Dual-Tree Complex Wavelet Transform: A New Technique for Shift Invariance and Directional Filters. IEEE DSP Workshop.


