Skip to content

Latest commit

 

History

History
258 lines (170 loc) · 12.6 KB

File metadata and controls

258 lines (170 loc) · 12.6 KB

Boltzmann Entropy Research Reference

This document summarizes the foundational research behind our entropy implementation (H5) and provides a comprehensive reference for Bitcoin transaction privacy analysis.

The Boltzmann Framework

Origin: LaurentMT (~2015), creator of the OXT.me Boltzmann tool.

The Boltzmann framework defines transaction entropy as:

E = log2(N)

where N is the number of valid interpretations (input-to-output mappings) of a transaction. Higher entropy means more ambiguity for an adversary attempting to trace fund flows.

Key concepts:

  • Intrinsic entropy: The theoretical maximum entropy based on transaction structure alone
  • Actual entropy: The effective entropy after accounting for deterministic links (where an output can only be funded by one specific input)
  • Link probability matrix: For each (input, output) pair, the probability that the input funded that output, computed by summing over all valid interpretations

LaurentMT's original implementation enumerated all valid sub-mappings, computed the probability matrix, and derived Shannon entropy from it. This was computationally expensive but exact.

Source: LaurentMT, "Bitcoin Transactions & Privacy" (Parts 1-3), gist.github.com/LaurentMT


The Partition Formula

For equal-value CoinJoin transactions (where all n outputs have the same value), the number of valid interpretations can be computed exactly using integer partitions:

N = sum over all integer partitions (s1, s2, ..., sk) of n:
    n!^2 / (prod(si!^2) * prod(mj!))

where:

  • (s1, s2, ..., sk) is a partition of n (the parts sum to n)
  • mj = multiplicity of each distinct part size

Worked Example: n=5 (Whirlpool)

The 7 integer partitions of 5:

Partition prod(si!^2) Multiplicities prod(mj!) Term = 14400 / (prod * mult)
[5] (5!)^2 = 14400 {5:1} 1! = 1 1
[4,1] (4!)^2 * (1!)^2 = 576 {4:1, 1:1} 1 25
[3,2] (3!)^2 * (2!)^2 = 144 {3:1, 2:1} 1 100
[3,1,1] (3!)^2 * (1!)^4 = 36 {3:1, 1:2} 2! = 2 200
[2,2,1] (2!)^4 * (1!)^2 = 16 {2:2, 1:1} 2! = 2 450
[2,1,1,1] (2!)^2 * (1!)^6 = 4 {2:1, 1:3} 3! = 6 600
[1,1,1,1,1] (1!)^10 = 1 {1:5} 5! = 120 120

n!^2 = (5!)^2 = 14400.

For partition [1,1,1,1,1] (5 parts of size 1):

  • prod(si!^2) = (1!)^10 = 1
  • Multiplicities: {1: 5}, so prod(mj!) = 5! = 120
  • Term = 14400 / (1 * 120) = 120

For partition [2,1,1,1] (one part of 2, three parts of 1):

  • prod(si!^2) = (2!)^2 * (1!)^6 = 4
  • Multiplicities: {2: 1, 1: 3}, so prod(mj!) = 1! * 3! = 6
  • Term = 14400 / (4 * 6) = 600

For partition [2,2,1]:

  • prod(si!^2) = (2!)^4 * (1!)^2 = 16
  • Multiplicities: {2: 2, 1: 1}, so prod(mj!) = 2! * 1! = 2
  • Term = 14400 / (16 * 2) = 450

For partition [3,1,1]:

  • prod(si!^2) = (3!)^2 * (1!)^4 = 36
  • Multiplicities: {3: 1, 1: 2}, so prod(mj!) = 1! * 2! = 2
  • Term = 14400 / (36 * 2) = 200

For partition [3,2]:

  • prod(si!^2) = (3!)^2 * (2!)^2 = 144
  • Multiplicities: {3: 1, 2: 1}, so prod(mj!) = 1
  • Term = 14400 / 144 = 100

For partition [4,1]:

  • prod(si!^2) = (4!)^2 * (1!)^2 = 576
  • Multiplicities: {4: 1, 1: 1}, so prod(mj!) = 1
  • Term = 14400 / 576 = 25

For partition [5]:

  • prod(si!^2) = (5!)^2 = 14400
  • Multiplicities: {5: 1}, so prod(mj!) = 1
  • Term = 14400 / 14400 = 1

Total N = 120 + 600 + 450 + 200 + 100 + 25 + 1 = 1,496


Reference Values Table

n Interpretations (N) Entropy E = log2(N)
2 3 1.58 bits
3 16 4.00 bits
4 131 7.03 bits
5 1,496 10.55 bits
6 22,482 14.46 bits
7 426,833 18.70 bits
8 9,934,563 23.24 bits
9 ~277,006,192 ~28.05 bits

Note: The partition formula accounts for many-to-many mappings (one input funding multiple outputs). The classic permutation model (n!) only counts one-to-one mappings and significantly undercounts: e.g. for n=5, permutations give 120 vs. 1,496 from the partition formula.


Implementation Notes

Our implementation in entropy.ts uses boltzmannEqualOutputs(n) which generates all integer partitions of n and applies the formula above. This is exact for n <= 50 (at most ~204,226 partitions for n=50).

For n > 50, we fall back to an asymptotic approximation.

For mixed-value transactions (non-CoinJoin), we use assignment-based enumeration which is a lower bound but reasonable for typical transactions.


Multi-Denomination CoinJoin Entropy (WabiSabi)

The Problem

The Boltzmann partition formula assumes ALL outputs share one value. WabiSabi CoinJoins have 30+ denomination tiers (e.g., 20 outputs at 2M sats, 19 at 33M sats, 15 at 5M sats, etc.). Applying Boltzmann to just the largest tier ignores the other 29 tiers entirely.

Community feedback from Arkad (March 2026) correctly identified this flaw: per-UTXO entropy for WabiSabi was reported as lower than JoinMarket because only 20 of 279 outputs contributed to the entropy calculation.

Why Exact Computation is Infeasible

The exact multi-denomination entropy computation is NP-hard (constrained subset sum problem). The definitive work is Gavenda et al., "Analysis of Input-Output Mappings in CoinJoin Transactions with Arbitrary Values" (ESORICS 2025, arXiv 2510.17284):

  • Their approach uses constrained subset sum with "numeric mappings" (grouping by denomination class)
  • Execution times on real Wasabi CoinJoins: 0.07 seconds to 4.4 hours
  • Even their parallelized C++ implementation cannot handle median-sized CoinJoins
  • Found 10-50% anonymity set reduction vs. naive independence assumptions

Client-side computation in a browser is out of the question.

Why boltzmannEqualOutputs(k) Cannot Be Used Per-Tier

A tempting approach: compute boltzmannEqualOutputs(k) for each denomination tier independently and sum the entropies. This is mathematically wrong for two reasons:

  1. boltzmannEqualOutputs(k) assumes all k inputs have the same value as the outputs. In a multi-tier transaction, inputs have heterogeneous values. Most many-to-many flows that inflate the Boltzmann count beyond k! are physically impossible (an input of 50,000 sats cannot split to partially fund two 2,097,152-sat outputs).

  2. Entropy subadditivity: H(X,Y) <= H(X) + H(Y). Treating tiers as independent and summing their entropies gives an upper bound, not a lower bound. Cross-tier input sharing creates constraints that reduce joint entropy.

The overcount ratio of boltzmannEqualOutputs(k) vs k! grows explosively:

  • k=5: 1,496 vs 120 (12.5x)
  • k=10: ~9 billion vs ~3.6 million (2,504x)
  • k=20: ~9.3 * 10^27 vs ~2.4 * 10^18 (~3.8 billion x)

Our Approach: Weighted-Average Per-Tier Permutation Entropy

The key insight: entropy should measure per-participant privacy, not total transaction ambiguity. For Whirlpool (single tier), Boltzmann already gives per-participant entropy because all participants share one pool. For WabiSabi (multiple tiers), we need an aggregate that answers the same question: "how much entropy does a typical participant get?"

Per-tier computation: Each denomination tier of k equal-valued outputs is treated as a mini-CoinJoin. Within a tier, k! permutations are valid (swapping equal-valued columns preserves the flow matrix), giving log2(k!) bits of per-tier entropy.

Eligible-input constraint: Only inputs with value >= the denomination can fund a tier one-to-one. The effective k is min(output_count, eligible_inputs).

Weighted average: The per-tier entropies are averaged, weighted by tier size (number of outputs). Larger tiers contain more participants, so they contribute proportionally more to the expected participant entropy.

For each tier i with k_i >= 2 equal outputs:
  eligible_i = count of inputs with value >= denomination_i
  effective_k_i = min(k_i, eligible_i)
  tier_entropy_i = log2(effective_k_i!)

weighted_entropy = sum(k_i * tier_entropy_i) / sum(k_i)

Why not sum? Summing across tiers answers "how hard is it to deanonymize ALL participants simultaneously?" - a useless question. No single participant benefits from all 30 tiers. An adversary targets one participant, not the whole transaction.

Why not min? Taking the minimum (weakest tier) is too conservative. Small 2-output tiers may be coordinator change or edge cases. The weighted average represents expected privacy for a randomly chosen participant.

This is labeled "multi-tier permutation estimate" in the output.

Alternative Considered: Wasabi's Anonymity Score

Wasabi 2.0 replaced Boltzmann with a per-UTXO "anonymity score" based on k-anonymity counting per denomination tier. This requires client knowledge of which inputs/outputs belong to "you" - an external observer cannot compute it. Since am-i.exposed is an observer tool (not a wallet), this approach is not applicable as a primary metric. The anonymity-set heuristic already reports per-tier k-anonymity separately.

References

  • Gavenda et al., "Analysis of Input-Output Mappings in CoinJoin Transactions with Arbitrary Values" (ESORICS 2025, arXiv 2510.17284)
  • Wasabi Wallet Blog, "Anonymity Set vs Anonymity Score"
  • Maurer et al., "Anonymous CoinJoin Transactions with Arbitrary Values" (IEEE TrustCom 2017)
  • Ficsor et al., "WabiSabi: Centrally Coordinated CoinJoins with Variable Amounts" (2021, ePrint 2021/206)

Deterministic Links

A deterministic link exists when an output can only be funded by one specific input (or set of inputs). CoinJoin Sudoku (Kristov Atlas) showed that even in CoinJoin transactions, some input-output links may have probability 1.0, meaning the CoinJoin provides zero privacy for those specific participants.

Detection: If the link probability matrix has any entry LP = 1.0, that link is deterministic. Our current implementation does not compute the full link probability matrix, but the entropy calculation implicitly accounts for deterministic links (they reduce the number of valid interpretations).


Steganographic Transactions

LaurentMT's essay "Hell is Other People" introduced the concept of steganographic transactions - transactions designed to look like something they're not:

  • PayJoin (BIP78): Looks like a normal payment but the receiver contributes an input, breaking CIOH
  • Stonewall: Simulated 2-party CoinJoin from a single wallet (4 outputs: 2 equal + 2 change)
  • STONEWALLx2: Real collaborative CoinJoin with the Stonewall structure
  • Ricochet: Adds intermediate hops to increase the distance between CoinJoin and exchange deposit

These techniques exploit the fact that chain analysis relies on heuristics. If a transaction is designed to violate the assumptions of those heuristics, the analysis produces false results.


The Asymmetric Game

Privacy-Enhancing Technologies (PETs) face a harder challenge than chain analysts:

  • Analysts benefit from Bayesian updating: each new heuristic narrows the possibility space
  • Privacy tools must defeat all heuristics simultaneously - one slip undoes everything
  • A single participant in a CoinJoin with poor OPSEC can compromise other participants' privacy
  • Temporal analysis, network-level surveillance, and exchange KYC provide independent correlation channels

This asymmetry is why entropy alone is insufficient as a privacy metric. High entropy is necessary but not sufficient for privacy.


Wallet Fingerprinting History

Chain analysis evolved from naive CIOH clustering (2011-2013) through increasingly sophisticated techniques:

  1. 2011-2013: Reid & Harriman, Ron & Shamir - first academic address clustering via CIOH
  2. 2013: Maxwell proposes CoinJoin as a countermeasure
  3. 2015: LaurentMT defines the Boltzmann framework; Meiklejohn et al. scale clustering
  4. 2017: Wallet fingerprinting via nLockTime, nSequence emerges (Bitcoin Core's anti-fee-sniping)
  5. 2019: Low-R signature grinding (Bitcoin Core 0.17+) becomes a reliable fingerprint
  6. 2020s: ML-based clustering, BIP69 detection, Taproot adoption begins neutralizing fingerprints

Sources:

  • Spiral BTC, "The Scroll #3: A Brief History of Wallet Clustering"
  • OXT Research / ErgoBTC, "Understanding Bitcoin Privacy with OXT" (Parts 1-4)
  • 0xB10C, wallet fingerprinting empirical analysis

Credits

This implementation and research compilation would not be possible without:

  • LaurentMT - Creator of the Boltzmann entropy framework
  • Greg Maxwell - Inventor of CoinJoin (2013)
  • OXT Research / ErgoBTC - Educational series on Bitcoin privacy
  • Kristov Atlas - CoinJoin Sudoku research
  • Spiral BTC - Historical survey of chain analysis
  • privacidadbitcoin.com - Spanish-language Bitcoin privacy education

This document is part of the am-i.exposed project. See also: privacy-engine.md for the full heuristic reference, archive/README.md for archived research articles.