ESM encoder/decoder for Tamper - a compact format for bulk categorical datasets.
This repository contains an ESM-native implementation of the Tamper encoding format, originally developed at the New York Times, plus strict parity tooling to ensure identical output to the frozen legacy implementation.
This project is an independent ESM implementation of the Tamper format. It does not define a new format and is not affiliated with the original NYT repository.
Tamper is a column-oriented packer for tabular categorical data (low-cardinality enums, booleans, bucketed integers) where JSON + compression becomes inefficient.
- Original NYT Tamper repository
- NYT Tamper project documentation
- Original NYT announcement / background
- unsetbit/tamp community encoder implementation
```sh
npm install @aoede/tamper
# or
yarn add @aoede/tamper
# or
pnpm add @aoede/tamper
```
Tamper is a good fit when your data is:
- Tabular (many rows with the same attributes)
- Categorical-heavy (enums, booleans, small integers)
- Bulk (transferred or stored as snapshots)
- Read-mostly / immutable
- Required to match legacy Tamper output exactly
Use cases:
- Analytics extracts for dashboards
- Lookup / reference tables
- ML-style categorical feature matrices shipped to JS or WASM
Do not use Tamper for:
- Nested or hierarchical objects
- General APIs or CRUD payloads
- Arbitrary graphs
- Free-form documents or HTML
If your data is not mostly categorical and tabular, JSON + Brotli/Zstd or a schema-based format (e.g. Protobuf, Arrow) will likely be a better fit.
Tamper is a data serialisation protocol originally developed at the New York Times to efficiently transfer large categorical datasets from server to browser.
This repository provides a modern ESM implementation of the original CommonJS codebase, with:
- identical encoded output
- identical decoded results
- strict, automated parity checks against the frozen legacy implementation
Tamper packs categorical columns using bitwise encodings, automatically selecting the most efficient strategy per attribute:
- Integer packing - sparse or bounded integer values
- Bitmap packing - dense categorical values
- Existence packing - tracks presence using run-length encoding
These strategies are chosen automatically by the encoder based on observed data characteristics.
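As a rough illustration of that selection, the choice can be modeled as a function of observed density and cardinality. The thresholds and function below (`pickStrategy`) are hypothetical, not the encoder's actual heuristics:

```typescript
// Hypothetical sketch of per-attribute strategy selection.
// The real encoder's heuristics may differ; this only illustrates the idea.
type Strategy = 'integer' | 'bitmap' | 'existence';

function pickStrategy(values: Array<number | null>): Strategy {
  const present = values.filter((v): v is number => v !== null);
  const density = present.length / values.length;

  // Mostly-absent attribute: track presence with run-length encoding.
  if (density < 0.1) return 'existence';

  // Dense, low-cardinality values: one bitmap per category is compact.
  const cardinality = new Set(present).size;
  if (cardinality <= 8 && density > 0.5) return 'bitmap';

  // Otherwise pack the bounded integers directly.
  return 'integer';
}
```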
Tamper achieves significant compression for categorical tabular data:
- Sparse datasets: 10-15x compression (e.g., 500 events across 10K IDs)
- Dense multi-value attributes: 20-30x compression (bitmap encoding)
- Very sparse datasets: 4-5x compression at scale (existence encoding with RLE)
The compression ratio improves with dataset size due to fixed header overhead. See real examples with the size comparison script:
```sh
npm run example
```

This script demonstrates four scenarios showing Tamper vs plain JSON size, compression ratios, and the impact of:
- Existence encoding for sparse data
- Integer encoding for categorical values
- Bitmap encoding for multi-value attributes
- Fixed overhead on small vs large datasets
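The run-length idea behind existence encoding can be sketched in a few lines. This is illustrative only; Tamper's actual wire layout is defined by the format, not by this helper:

```typescript
// Run-length encode a boolean presence vector as [value, runLength, value, runLength, ...].
// Illustrates the existence-packing idea; Tamper's real layout may differ.
function rlePresence(present: boolean[]): number[] {
  const runs: number[] = [];
  for (const bit of present) {
    const value = bit ? 1 : 0;
    if (runs.length >= 2 && runs[runs.length - 2] === value) {
      runs[runs.length - 1] += 1; // extend the current run
    } else {
      runs.push(value, 1); // start a new run
    }
  }
  return runs;
}
```

For very sparse data, long runs of absence collapse to a single pair, which is where the RLE gains at scale come from.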
Note: These compression ratios are before any transport-level compression. Tamper packs can be further compressed with gzip/brotli for additional gains, often achieving better overall compression than gzip/brotli on plain JSON (due to Tamper's elimination of field name repetition and use of bit-packed encodings).
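To see where the savings come from, consider the back-of-envelope arithmetic: a categorical attribute with k distinct values needs only ceil(log2(k)) bits per row once field names and string values are eliminated. `packedBytes` below is a hypothetical estimator, ignoring the headers and metadata a real pack carries:

```typescript
// Back-of-envelope size estimate for one bit-packed categorical column.
// Illustrative only; real Tamper packs add header and existence metadata.
function packedBytes(rows: number, cardinality: number): number {
  const bitsPerRow = Math.ceil(Math.log2(Math.max(cardinality, 2)));
  return Math.ceil((rows * bitsPerRow) / 8);
}

// 10,000 rows of a 4-value enum need 2 bits each, i.e. 2,500 bytes,
// versus tens of kilobytes for the equivalent JSON with repeated keys.
```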
```
├── clients/js/src/   # ESM decoder (browser-side)
├── encoders/js/
│   ├── core/         # Environment-agnostic encoder logic
│   └── env/          # Node.js & browser adapters
├── legacy/           # Frozen legacy implementation (reference only)
├── vendor/bitsy/     # Vendored bitset library (no npm deps)
├── scripts/          # Parity verification tools
└── test/             # Test datasets & canonical outputs
```
- Node.js (ESM-capable; tested with current LTS)
- npm (for installing dev tooling)
- Encoder runtime uses a local `vendor/bitsy` shim (no network installs)
Install dev dependencies for TSX-driven scripts:
```sh
npm install
```

Exports:

- `createTamper()` - decoder factory
- `Tamper` - decoder methods
- default export - alias of `createTamper`
```ts
import createTamper from './clients/js/src/tamper.ts';
import fs from 'node:fs/promises';

const tamper = createTamper();
const pack = JSON.parse(await fs.readFile('pack.json', 'utf8'));
const items = tamper.unpackData(pack);
```

Entry points:
- Node / standard ESM: `encoders/js/index.ts`
- Browser / edge: compose core + environment adapter
Exports:

- `createPackSet`, `PackSet`, `Pack`, `IntegerPack`, `BitmapPack`, `ExistencePack`
```ts
import { createPackSet } from './encoders/js/index.ts';

const tamp = createPackSet();
// configure attributes + pack data...
const json = tamp.toJSON();
```

Browser / edge example:
```ts
import createEncoder from './encoders/js/core/createEncoder.ts';
import browserEnv from './encoders/js/env/browser.ts';

const { createPackSet } = createEncoder(browserEnv);
const tamp = createPackSet();
// configure attributes + pack data...
const json = tamp.toJSON();
```

Decoder parity compares decoded output from the legacy and ESM implementations:
```sh
tsx scripts/compare-decoders.ts
```

Encoder parity builds packs from test datasets and compares full JSON output against canonical fixtures:
```sh
tsx scripts/compare-encoders.ts
```

The ESM implementation's parity is verified by ensuring all canonical fixtures match byte-for-byte.
- Encoder output is tuned to exactly match canonical JSON fixtures (including legacy fields such as `max_guid` and existence metadata).
- The legacy implementation is retained only for parity verification and reference; it is not used at runtime.
- The browser encoder uses `Uint8Array` and `DataView` and does not depend on Node.js `Buffer`.
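The Node-free byte handling mentioned above needs only standard typed arrays. A minimal sketch of the pattern (hypothetical helpers, not taken from the encoder's internals):

```typescript
// Write and read a 32-bit big-endian word without Node's Buffer,
// using only Uint8Array and DataView (available in browsers and edge runtimes).
function writeU32BE(value: number): Uint8Array {
  const bytes = new Uint8Array(4);
  new DataView(bytes.buffer).setUint32(0, value, false); // false = big-endian
  return bytes;
}

function readU32BE(bytes: Uint8Array): number {
  return new DataView(bytes.buffer, bytes.byteOffset, 4).getUint32(0, false);
}
```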
```
PASS large.json
PASS run.json
PASS run2.json
PASS small.json
PASS small2.json
PASS sparse.json
PASS spstart.json
All 7 file(s) passed parity checks.
```

```
...
All 7 file(s) passed encoder parity checks.
```