MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs

MANS is a high-performance compression framework designed to make Asymmetric Numeral Systems (ANS) efficient and portable for multi-byte integer data across CPUs, NVIDIA GPUs, and AMD GPUs.

At the core of MANS is ADM (Adaptive Data Mapping) — a lightweight, distribution-aware transformation that maps 16/32-bit integers into a compact 8-bit domain, significantly improving both compression ratio and throughput while maintaining full numerical fidelity. ⚡

This framework is tailored for high-volume scientific and HPC workloads such as photon science, large-scale simulations, and sensor-array pipelines, where multi-byte integer data is ubiquitous and processed in small slices. 🔬🚀

Developers: Wenjing Huang (Lead Developer, Designer of ADM), Jinwu Yang (CPU and GPU Implementations/Optimizations of Parallel ANS Encoders), Shengquan Yin (HIP Version of ANS Encoder)

Contributors: Dingwen Tao (Supervisor), Guangming Tan

✨ Key Features

🚀 High-Efficiency Compression
Achieves up to 1.24× higher compression ratio than standard ANS and 2.37× higher than 16-bit Huffman.
⚡ Cross-Platform High-Performance
Optimized for:
- NVIDIA GPUs (CUDA)
- AMD GPUs (HIP/ROCm)
- Multi-core CPUs with OpenMP + SIMD
🧩 Adaptive Data Mapping (ADM)
Converts multi-byte integers into effective 8-bit symbols with <1% overhead.
🔀 Flexible CPU Modes
- -p mode: portable, GPU-consistent ANS implementation
- -r mode: maximum compression ratio using FSE-ANS
📦 Lightweight & Easy Integration
Minimal dependencies, C++17 compatible, and CMake-based build system.

📈 Compression Ratio Performance

MANS consistently delivers strong compression ratios across real-world scientific datasets:

CPU (-r mode)

2.37× higher compression ratio vs. FSE-ANS
1.32× higher compression ratio vs. 16-bit Huffman
Up to 2.09× improvement on quantization-based datasets
More stable performance on small slices compared to 16-bit Huffman

CPU (-p mode)

1.24× higher compression ratio than FSE-ANS
Slightly lower than -r mode due to parallel ANS design
Maintains consistency with GPU behavior

📊 Throughput Performance

MANS provides strong, consistent performance across diverse platforms:

Intel(R) Xeon(R) Gold 5220S

1.92× faster compression vs. FSE-ANS
2.04× faster decompression vs. FSE-ANS

NVIDIA A100 GPU

45.14× faster compression vs. nvCOMP Huffman
Up to 288.45× faster decompression compared to CPU portable mode

AMD MI210 GPU

Up to 90.42× faster compression and 135.86× faster decompression vs. CPU
0.52× CUDA compression throughput; 0.47× CUDA decompression throughput

⚙️ Requirements

CMake ≥ 3.15
C++17 compiler
OpenMP (for CPU parallelization)
CUDA 12.6 (for NVIDIA GPUs)
ROCm (for AMD GPUs)
Git
Recommended OS: Ubuntu 22.04+

🔧 Building

1️⃣ Clone the Repository

git clone https://github.com/ewTomato/MANS.git

2️⃣ Configure & Build

cd MANS; mkdir build; cd build;
cmake -DTARGET_PLATFORM=cpu_nv -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ .. && make -j   # cpu + nvidia

> Other platform options include:  
> - `cpu` — CPU-only build   
> - `nv` — NVIDIA-only build  
> - `amd` — AMD-only build
> - `cpu_amd` — CPU + AMD build 
> - `all` — NVIDIA + AMD + CPU build

🚀 Usage

Below is a minimal workflow based on current binaries in this repository.

CPU: autotune first, then auto-pick threads

cpu_mans_autotune now generates synthetic u16 datasets internally and sweeps all three mappings (dims=1/2/3) in one run.

data-size list is configurable by --data-size-mb-list
output CSV contains a dims column

Run autotune and generate thread CSV:

./build/bin/cpu/cpu_mans_autotune \
  --data-size-mb-list 0.00390625,0.0078125,1,4 \
  --csv ./build/thread_sweep.csv --out ./build/best_threads.csv

Optional: control synthetic block-type ratios (smooth/spike/constant/random):

./build/bin/cpu/cpu_mans_autotune \
  --ratio-smooth 1.0 --ratio-spike 0.0 --ratio-constant 0.0 --ratio-random 0.0

Run bench without --threads; it auto-loads CSV and selects nearest thread config by input size:

cd build
./bin/cpu/cpu_mans_bench -u2 /path/to/input_u16.bin --mode r --dims 1 134217728 --csv bench.csv

Auto-load order:

MANS_THREAD_CSV env var (if set)
otherwise best_threads.csv in current working directory

Debug warning for ADM bypass:

MANS_WARN_IF_NO_ADM=1 ./bin/cpu/cpu_mans_bench -u2 /path/to/input_u16.bin --mode r --dims 1 134217728

NVIDIA GPU

./build/bin/nv/nv_mapping_uint16 input_file output_file_adm
./build/bin/nv/cudaans_compress output_file_adm output_file_mans
./build/bin/nv/cudaans_decompress output_file_mans output_file_adm_restore

AMD GPU

./build/bin/amd/amd_mapping_uint16 input_file output_file_adm
./build/bin/hipans_compress output_file_adm output_file_mans
./build/bin/hipans_decompress output_file_mans output_file_adm_restore

HDF5 Filter Plugin: H5Z-MANS

see tools/H5Z-MANS/README.md for detailed instructions on building and using the HDF5 filter plugin for MANS.

📁 Project Structure

MANS/
 ├── amd/              # ADM, ANS, GPU kernels(AMD version)
 ├── build/            # CMake build directory (generated by user)
 ├── cpu/              # ADM, PANS(CPU version)
 ├── nv/               # ADM, ANS, GPU kernels(NVIDIA version)
 ├── testdata/ 
 ├── tools/            # test scripts and tools(hdf5 filter)
 └── README.md
 ...

📜 License

MANS is released under the MIT License.
Please see the LICENSE file for full details.

📚 Citation

If you use MANS in your research or software, please cite our work:

@inproceedings{huang2025mans,
  title={MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs},
  author={Huang, Wenjing and Yang, Jinwu and Yin, Shengquan and Li, Haoxu and Gu, Yida and Liu, Zedong and Jing, Xing and Wei, Zheng and Fu, Shiyuan and Hu, Hao and others},
  booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  pages={1299--1314},
  year={2025},
  DOI={10.1145/3712285.3759825}
}

More details and related materials will coming soon.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs

✨ Key Features

📈 Compression Ratio Performance

CPU (-r mode)

CPU (-p mode)

📊 Throughput Performance

Intel(R) Xeon(R) Gold 5220S

NVIDIA A100 GPU

AMD MI210 GPU

⚙️ Requirements

🔧 Building

1️⃣ Clone the Repository

2️⃣ Configure & Build

🚀 Usage

CPU: autotune first, then auto-pick threads

NVIDIA GPU

AMD GPU

HDF5 Filter Plugin: H5Z-MANS

see tools/H5Z-MANS/README.md for detailed instructions on building and using the HDF5 filter plugin for MANS.

📁 Project Structure

📜 License

📚 Citation

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs

✨ Key Features

📈 Compression Ratio Performance

CPU (-r mode)

CPU (-p mode)

📊 Throughput Performance

Intel(R) Xeon(R) Gold 5220S

NVIDIA A100 GPU

AMD MI210 GPU

⚙️ Requirements

🔧 Building

1️⃣ Clone the Repository

2️⃣ Configure & Build

🚀 Usage

CPU: autotune first, then auto-pick threads

NVIDIA GPU

AMD GPU

HDF5 Filter Plugin: H5Z-MANS

see tools/H5Z-MANS/README.md for detailed instructions on building and using the HDF5 filter plugin for MANS.

📁 Project Structure

📜 License

📚 Citation