MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs

MANS is a high-performance compression framework designed to make Asymmetric Numeral Systems (ANS) efficient and portable for multi-byte integer data across CPUs, NVIDIA GPUs, and AMD GPUs.

At the core of MANS is ADM (Adaptive Data Mapping), a lightweight, distribution-aware transformation that maps 16/32-bit integers into a compact 8-bit domain, significantly improving both compression ratio and throughput while maintaining full numerical fidelity. ⚡

This framework is tailored for high-volume scientific and HPC workloads such as photon science, large-scale simulations, and sensor-array pipelines, where multi-byte integer data is ubiquitous and processed in small slices. 🔬🚀

(C) 2025 by Institute of Computing Technology, Chinese Academy of Sciences.

Developers: Wenjing Huang (Lead Developer, Designer of ADM), Jinwu Yang (CPU and GPU Implementations/Optimizations of Parallel ANS Encoders), Shengquan Yin (HIP Version of ANS Encoder)

Contributors: Dingwen Tao (Supervisor), Guangming Tan


✨ Key Features

  • 🚀 High-Efficiency Compression
    Achieves up to 1.24× higher compression ratio than standard ANS and 2.37× higher than 16-bit Huffman.

  • ⚡ Cross-Platform High-Performance
    Optimized for:

    • NVIDIA GPUs (CUDA)
    • AMD GPUs (HIP/ROCm)
    • Multi-core CPUs with OpenMP + SIMD

  • 🧩 Adaptive Data Mapping (ADM)
    Converts multi-byte integers into effective 8-bit symbols with <1% overhead.

  • 🔀 Flexible CPU Modes

    • -p mode: portable, GPU-consistent ANS implementation
    • -r mode: maximum compression ratio using FSE-ANS

  • 📦 Lightweight & Easy Integration
    Minimal dependencies, C++17 compatible, and CMake-based build system.


📈 Compression Ratio Performance

MANS consistently delivers strong compression ratios across real-world scientific datasets:

CPU (-r mode)

  • 2.37× higher compression ratio vs. FSE-ANS
  • 1.32× higher compression ratio vs. 16-bit Huffman
  • Up to 2.09× improvement on quantization-based datasets
  • More stable performance on small slices compared to 16-bit Huffman

CPU (-p mode)

  • 1.24× higher compression ratio than FSE-ANS
  • Slightly lower than -r mode due to parallel ANS design
  • Maintains consistency with GPU behavior

📊 Throughput Performance

MANS provides strong, consistent performance across diverse platforms:

[Figure: performance plot]

Intel(R) Xeon(R) Gold 5220S

  • 1.92× faster compression vs. FSE-ANS
  • 2.04× faster decompression vs. FSE-ANS

NVIDIA A100 GPU

  • 45.14× faster compression vs. nvCOMP Huffman
  • Up to 288.45× faster decompression compared to CPU portable mode

AMD MI210 GPU

  • Up to 90.42× faster compression and 135.86× faster decompression vs. CPU
  • Reaches 0.52× the compression throughput and 0.47× the decompression throughput of the CUDA version

⚙️ Requirements

  • CMake ≥ 3.15
  • C++17 compiler
  • OpenMP (for CPU parallelization)
  • CUDA 12.6 (for NVIDIA GPUs)
  • ROCm (for AMD GPUs)
  • Git
  • Recommended OS: Ubuntu 22.04+

🔧 Building

1️⃣ Clone the Repository

git clone https://github.com/ewTomato/MANS.git

2️⃣ Configure & Build

cd MANS; mkdir build; cd build;
cmake -DTARGET_PLATFORM=cpu_nv -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ .. && make -j   # cpu + nvidia

Other platform options include:

  • cpu: CPU-only build
  • nv: NVIDIA-only build
  • amd: AMD-only build
  • cpu_amd: CPU + AMD build
  • all: NVIDIA + AMD + CPU build
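
Because a mistyped TARGET_PLATFORM value is easy to miss, a small wrapper can validate the choice before configuring. The following is a minimal sketch: mans_configure_cmd is a hypothetical helper name (not part of MANS), and the compiler settings are simply copied from the cpu_nv example above, so they may need adjusting for other platforms.

```shell
# Hypothetical helper: print the CMake configure command for a platform,
# accepting only the TARGET_PLATFORM values listed in this README.
mans_configure_cmd() {
  case "$1" in
    cpu|nv|amd|cpu_nv|cpu_amd|all)
      # Compiler flags are copied from the cpu_nv example (an assumption
      # for the other platforms).
      printf 'cmake -DTARGET_PLATFORM=%s -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ ..\n' "$1"
      ;;
    *)
      printf 'unknown TARGET_PLATFORM: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

From inside the build directory, running the command printed by mans_configure_cmd cpu followed by make -j would then produce a CPU-only build.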

🚀 Usage

Below is a minimal workflow based on current binaries in this repository.

CPU: autotune first, then auto-pick threads

cpu_mans_autotune now generates synthetic u16 datasets internally and sweeps all three mappings (dims=1/2/3) in one run.

  • The data-size list is configurable via --data-size-mb-list.
  • The output CSV contains a dims column.
  1. Run autotune and generate thread CSV:
./build/bin/cpu/cpu_mans_autotune \
  --data-size-mb-list 0.00390625,0.0078125,1,4 \
  --csv ./build/thread_sweep.csv --out ./build/best_threads.csv

Optional: control synthetic block-type ratios (smooth/spike/constant/random):

./build/bin/cpu/cpu_mans_autotune \
  --ratio-smooth 1.0 --ratio-spike 0.0 --ratio-constant 0.0 --ratio-random 0.0
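
The fractional values passed to --data-size-mb-list in step 1 are easier to produce from byte counts than to type by hand. A minimal sketch, assuming 1 MB here means 1048576 bytes (this is an assumption: it makes 0.00390625 correspond to exactly 4096 bytes, but the tool's own definition should be checked); bytes_to_mb_list is a hypothetical helper name:

```shell
# Hypothetical helper: convert byte counts into a comma-separated MB list.
# Assumes 1 MB = 1048576 bytes (not confirmed by the MANS documentation).
bytes_to_mb_list() {
  for n in "$@"; do
    awk -v n="$n" 'BEGIN { printf "%.10g\n", n / 1048576 }'
  done | paste -sd, -
}

# 4 KiB, 8 KiB, 1 MiB, 4 MiB
bytes_to_mb_list 4096 8192 1048576 4194304
# prints 0.00390625,0.0078125,1,4
```

The output can be passed directly, for example --data-size-mb-list "$(bytes_to_mb_list 4096 8192)".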
  2. Run bench without --threads; it auto-loads the CSV and selects the nearest thread configuration by input size:
cd build
./bin/cpu/cpu_mans_bench -u2 /path/to/input_u16.bin --mode r --dims 1 134217728 --csv bench.csv

Auto-load order:

  • MANS_THREAD_CSV env var (if set)
  • otherwise best_threads.csv in current working directory
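
The lookup order above can be mirrored in a few lines of shell, which is handy for checking which CSV a run is likely to pick up. This is only a sketch of the documented order, assuming MANS_THREAD_CSV holds a path to the CSV file; resolve_thread_csv is a hypothetical helper, and the binary's actual logic (for example, how it handles a missing file) may differ.

```shell
# Hypothetical helper mirroring the documented auto-load order:
# 1) MANS_THREAD_CSV if set and non-empty (assumed to be a file path),
# 2) otherwise best_threads.csv in the current working directory.
resolve_thread_csv() {
  if [ -n "${MANS_THREAD_CSV:-}" ]; then
    printf '%s\n' "$MANS_THREAD_CSV"
  else
    printf '%s\n' "$PWD/best_threads.csv"
  fi
}
```

To point a bench run at a CSV outside the working directory, the variable can be set for a single command, for example by prefixing the cpu_mans_bench invocation with MANS_THREAD_CSV=/path/to/best_threads.csv.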

Debug warning for ADM bypass:

MANS_WARN_IF_NO_ADM=1 ./bin/cpu/cpu_mans_bench -u2 /path/to/input_u16.bin --mode r --dims 1 134217728

NVIDIA GPU

./build/bin/nv/nv_mapping_uint16 input_file output_file_adm
./build/bin/nv/cudaans_compress output_file_adm output_file_mans
./build/bin/nv/cudaans_decompress output_file_mans output_file_adm_restore

AMD GPU

./build/bin/amd/amd_mapping_uint16 input_file output_file_adm
./build/bin/hipans_compress output_file_adm output_file_mans
./build/bin/hipans_decompress output_file_mans output_file_adm_restore
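
Neither GPU listing above includes a verification step. A minimal sketch for checking the ANS stage, assuming the decompressor is expected to reproduce the ADM-mapped file byte for byte; check_roundtrip is a hypothetical helper name, not a MANS tool:

```shell
# Hypothetical helper: byte-compare the ADM-mapped file with the file
# restored by the decompressor. Uses only the standard cmp utility.
check_roundtrip() {
  if cmp -s "$1" "$2"; then
    echo "round trip OK"
  else
    echo "round trip MISMATCH" >&2
    return 1
  fi
}
```

With the file names used above, the call would be check_roundtrip output_file_adm output_file_adm_restore. This only covers compression and decompression of the mapped data; it does not check the mapping step against the original input_file.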

HDF5 Filter Plugin: H5Z-MANS

See tools/H5Z-MANS/README.md for detailed instructions on building and using the HDF5 filter plugin for MANS.

📁 Project Structure

MANS/
 ├── amd/              # ADM, ANS, GPU kernels (AMD version)
 ├── build/            # CMake build directory (generated by user)
 ├── cpu/              # ADM, PANS (CPU version)
 ├── nv/               # ADM, ANS, GPU kernels (NVIDIA version)
 ├── testdata/
 ├── tools/            # test scripts and tools (HDF5 filter)
 └── README.md
 ...

📜 License

MANS is released under the MIT License.
Please see the LICENSE file for full details.


📚 Citation

If you use MANS in your research or software, please cite our work:

@inproceedings{huang2025mans,
  title={MANS: Efficient and Portable ANS Encoding for Multi-Byte Integer Data on CPUs and GPUs},
  author={Huang, Wenjing and Yang, Jinwu and Yin, Shengquan and Li, Haoxu and Gu, Yida and Liu, Zedong and Jing, Xing and Wei, Zheng and Fu, Shiyuan and Hu, Hao and others},
  booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  pages={1299--1314},
  year={2025},
  DOI={10.1145/3712285.3759825}
}

More details and related materials are coming soon.
