MANS is a high-performance compression framework designed to make Asymmetric Numeral Systems (ANS) efficient and portable for multi-byte integer data across CPUs, NVIDIA GPUs, and AMD GPUs.
At the core of MANS is ADM (Adaptive Data Mapping), a lightweight, distribution-aware transformation that maps 16/32-bit integers into a compact 8-bit domain, significantly improving both compression ratio and throughput while maintaining full numerical fidelity.
This framework is tailored for high-volume scientific and HPC workloads such as photon science, large-scale simulations, and sensor-array pipelines, where multi-byte integer data is ubiquitous and processed in small slices.
(C) 2025 by Institute of Computing Technology, Chinese Academy of Sciences.
Developers: Wenjing Huang (Lead Developer, Designer of ADM), Jinwu Yang (CPU and GPU Implementations/Optimizations of Parallel ANS Encoders), Shengquan Yin (HIP Version of ANS Encoder)
Contributors: Dingwen Tao (Supervisor), Guangming Tan
- High-Efficiency Compression
  Achieves up to 1.24× higher compression ratio than standard ANS and 2.37× higher than 16-bit Huffman.
- Cross-Platform High-Performance
  Optimized for:
  - NVIDIA GPUs (CUDA)
  - AMD GPUs (HIP/ROCm)
  - Multi-core CPUs with OpenMP + SIMD
- Adaptive Data Mapping (ADM)
  Converts multi-byte integers into effective 8-bit symbols with <1% overhead.
- Flexible CPU Modes
  - `-p` mode: portable, GPU-consistent ANS implementation
  - `-r` mode: maximum compression ratio using FSE-ANS
- Lightweight & Easy Integration
  Minimal dependencies, C++17 compatible, and CMake-based build system.
MANS consistently delivers strong compression ratios across real-world scientific datasets:
- 2.37× higher compression ratio vs. FSE-ANS
- 1.32× higher compression ratio vs. 16-bit Huffman
- Up to 2.09× improvement on quantization-based datasets
- More stable performance on small slices compared to 16-bit Huffman
- 1.24× higher compression ratio than FSE-ANS
- Slightly lower than `-r` mode due to parallel ANS design
- Maintains consistency with GPU behavior
MANS provides strong, consistent performance across diverse platforms:
- 1.92× faster compression vs. FSE-ANS
- 2.04× faster decompression vs. FSE-ANS
- 45.14× faster compression vs. nvCOMP Huffman
- Up to 288.45× faster decompression compared to CPU portable mode
- Up to 90.42× faster compression and 135.86× faster decompression vs. CPU
- 0.52× CUDA compression throughput; 0.47× CUDA decompression throughput
- CMake ≥ 3.15
- C++17 compiler
- OpenMP (for CPU parallelization)
- CUDA 12.6 (for NVIDIA GPUs)
- ROCm (for AMD GPUs)
- Git
- Recommended OS: Ubuntu 22.04+
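
The requirements above can be checked before configuring the build. The following is a minimal sketch, not part of the MANS repository: the function names `version_ge` and `check_mans_prereqs` are hypothetical, `g++` is assumed as the C++17 compiler, and the version comparison relies on GNU `sort -V`, which is available on the recommended Ubuntu releases.

```shell
# version_ge A B: succeed when dotted version A is >= B.
# Assumption: GNU coreutils `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# check_mans_prereqs: hypothetical helper that reports missing build tools.
# Runs in a subshell so its variables do not leak into the caller.
check_mans_prereqs() (
    status=0
    # g++ is an assumption; any C++17 compiler satisfies the requirement.
    for tool in git cmake g++; do
        command -v "$tool" >/dev/null 2>&1 || { echo "missing: $tool" >&2; status=1; }
    done
    if command -v cmake >/dev/null 2>&1; then
        # `cmake --version` prints "cmake version X.Y.Z" on its first line.
        found=$(cmake --version | head -n 1 | awk '{print $3}')
        version_ge "$found" 3.15 || { echo "cmake $found is older than 3.15" >&2; status=1; }
    fi
    exit "$status"
)
```

Calling `check_mans_prereqs && echo "ready to build"` prints the confirmation only when every check passes. CUDA, ROCm, and OpenMP are not probed here because the right check depends on the chosen target platform.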
git clone https://github.com/ewTomato/MANS.git
cd MANS; mkdir build; cd build;
cmake -DTARGET_PLATFORM=cpu_nv -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ .. && make -j # cpu + nvidia

> Other platform options include:
> - `cpu`: CPU-only build
> - `nv`: NVIDIA-only build
> - `amd`: AMD-only build
> - `cpu_amd`: CPU + AMD build
> - `all`: NVIDIA + AMD + CPU build
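
When the build is scripted, a mistyped platform name is easy to miss. The sketch below validates the value against the platform options listed above before it reaches CMake. The helper name `mans_cmake_args` is hypothetical, and the compiler settings simply repeat the `gcc`/`g++` choice from the build command shown earlier.

```shell
# mans_cmake_args PLATFORM: print the cmake arguments for one of the
# documented TARGET_PLATFORM values, or fail for anything else.
# Hypothetical helper; not shipped with MANS.
mans_cmake_args() {
    case "$1" in
        cpu|nv|amd|cpu_nv|cpu_amd|all)
            printf '%s\n' "-DTARGET_PLATFORM=$1 -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++"
            ;;
        *)
            echo "unknown platform: $1" >&2
            return 1
            ;;
    esac
}
```

From inside the `build` directory, one possible use is `args=$(mans_cmake_args cpu) && cmake $args .. && make -j`, where `$args` is left unquoted on purpose so the shell splits it into separate options.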
Below is a minimal workflow based on current binaries in this repository.
`cpu_mans_autotune` now generates synthetic u16 datasets internally and sweeps all three mappings (dims=1/2/3) in one run.
- data-size list is configurable by `--data-size-mb-list`
- output CSV contains a `dims` column
- Run autotune and generate thread CSV:

./build/bin/cpu/cpu_mans_autotune \
  --data-size-mb-list 0.00390625,0.0078125,1,4 \
  --csv ./build/thread_sweep.csv --out ./build/best_threads.csv

Optional: control synthetic block-type ratios (smooth/spike/constant/random):

./build/bin/cpu/cpu_mans_autotune \
  --ratio-smooth 1.0 --ratio-spike 0.0 --ratio-constant 0.0 --ratio-random 0.0

- Run bench without `--threads`; it auto-loads CSV and selects nearest thread config by input size:

cd build
./bin/cpu/cpu_mans_bench -u2 /path/to/input_u16.bin --mode r --dims 1 134217728 --csv bench.csv

Auto-load order:
- `MANS_THREAD_CSV` env var (if set)
- otherwise `best_threads.csv` in current working directory
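
To see which thread CSV a bench run would pick up, the auto-load order above can be mirrored in a small shell helper. This is a sketch under stated assumptions rather than the logic inside `cpu_mans_bench`: the function name `resolve_thread_csv` is hypothetical, and an empty `MANS_THREAD_CSV` is treated the same as an unset one.

```shell
# resolve_thread_csv: print the CSV path implied by the documented
# auto-load order. Hypothetical helper; it only mirrors the README text.
resolve_thread_csv() {
    # Assumption: an empty MANS_THREAD_CSV counts as "not set".
    if [ -n "${MANS_THREAD_CSV:-}" ]; then
        printf '%s\n' "$MANS_THREAD_CSV"
    else
        # Fallback: best_threads.csv in the current working directory.
        printf '%s\n' "./best_threads.csv"
    fi
}
```

A quick pre-flight check before benchmarking could be `csv=$(resolve_thread_csv); [ -f "$csv" ] || echo "no thread CSV at $csv" >&2`, which warns when neither source provides a file.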
Debug warning for ADM bypass:

MANS_WARN_IF_NO_ADM=1 ./bin/cpu/cpu_mans_bench -u2 /path/to/input_u16.bin --mode r --dims 1 134217728

./build/bin/nv/nv_mapping_uint16 input_file output_file_adm
./build/bin/nv/cudaans_compress output_file_adm output_file_mans
./build/bin/nv/cudaans_decompress output_file_mans output_file_adm_restore

./build/bin/amd/amd_mapping_uint16 input_file output_file_adm
./build/bin/hipans_compress output_file_adm output_file_mans
./build/bin/hipans_decompress output_file_mans output_file_adm_restore

See tools/H5Z-MANS/README.md for detailed instructions on building and using the HDF5 filter plugin for MANS.
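
After running the GPU compress and decompress commands above, the restored file should be byte-identical to the ADM-mapped file that was compressed. A minimal way to confirm this, assuming the file names used in those commands, is a wrapper around the standard `cmp` utility. The function name `verify_roundtrip` is hypothetical.

```shell
# verify_roundtrip ORIGINAL RESTORED: succeed only when both files are
# byte-identical. Hypothetical helper built on the standard `cmp` utility.
verify_roundtrip() {
    if cmp -s "$1" "$2"; then
        echo "round trip OK: $2 matches $1"
    else
        echo "round trip FAILED: $2 differs from $1" >&2
        return 1
    fi
}
```

For the commands above this would be called as `verify_roundtrip output_file_adm output_file_adm_restore`. The check covers only the ANS stage; it does not confirm that the ADM mapping of `input_file` can be inverted.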
MANS/
├── amd/ # ADM, ANS, GPU kernels (AMD version)
├── build/ # CMake build directory (generated by user)
├── cpu/ # ADM, PANS (CPU version)
├── nv/ # ADM, ANS, GPU kernels (NVIDIA version)
├── testdata/
├── tools/ # test scripts and tools (HDF5 filter)
├── README.md
...
MANS is released under the MIT License.
Please see the LICENSE file for full details.
If you use MANS in your research or software, please cite our work:
@inproceedings{huang2025mans,
title={MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs},
author={Huang, Wenjing and Yang, Jinwu and Yin, Shengquan and Li, Haoxu and Gu, Yida and Liu, Zedong and Jing, Xing and Wei, Zheng and Fu, Shiyuan and Hu, Hao and others},
booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
pages={1299--1314},
year={2025},
DOI={10.1145/3712285.3759825}
}
More details and related materials will be coming soon.

