This document summarizes the foundational research behind our entropy implementation (H5) and provides a comprehensive reference for Bitcoin transaction privacy analysis.
Origin: LaurentMT (~2015), creator of the OXT.me Boltzmann tool.
The Boltzmann framework defines transaction entropy as:
E = log2(N)
where N is the number of valid interpretations (input-to-output mappings) of a transaction. Higher entropy means more ambiguity for an adversary attempting to trace fund flows.
Key concepts:
- Intrinsic entropy: The theoretical maximum entropy based on transaction structure alone
- Actual entropy: The effective entropy after accounting for deterministic links (where an output can only be funded by one specific input)
- Link probability matrix: For each (input, output) pair, the probability that the input funded that output, computed by summing over all valid interpretations
LaurentMT's original implementation enumerated all valid sub-mappings, computed the probability matrix, and derived Shannon entropy from it. This was computationally expensive but exact.
Source: LaurentMT, "Bitcoin Transactions & Privacy" (Parts 1-3), gist.github.com/LaurentMT
For equal-value CoinJoin transactions (where all n outputs have the same value), the number of valid interpretations can be computed exactly using integer partitions:
N = sum over all integer partitions (s1, s2, ..., sk) of n:
n!^2 / (prod(si!^2) * prod(mj!))
where:
(s1, s2, ..., sk)is a partition of n (the parts sum to n)mj= multiplicity of each distinct part size
The 7 integer partitions of 5:
| Partition | prod(si!^2) | Multiplicities | prod(mj!) | Term = 14400 / (prod * mult) |
|---|---|---|---|---|
| [5] | (5!)^2 = 14400 | {5:1} | 1! = 1 | 1 |
| [4,1] | (4!)^2 * (1!)^2 = 576 | {4:1, 1:1} | 1 | 25 |
| [3,2] | (3!)^2 * (2!)^2 = 144 | {3:1, 2:1} | 1 | 100 |
| [3,1,1] | (3!)^2 * (1!)^4 = 36 | {3:1, 1:2} | 2! = 2 | 200 |
| [2,2,1] | (2!)^4 * (1!)^2 = 16 | {2:2, 1:1} | 2! = 2 | 450 |
| [2,1,1,1] | (2!)^2 * (1!)^6 = 4 | {2:1, 1:3} | 3! = 6 | 600 |
| [1,1,1,1,1] | (1!)^10 = 1 | {1:5} | 5! = 120 | 120 |
n!^2 = (5!)^2 = 14400.
For partition [1,1,1,1,1] (5 parts of size 1):
- prod(si!^2) = (1!)^10 = 1
- Multiplicities: {1: 5}, so prod(mj!) = 5! = 120
- Term = 14400 / (1 * 120) = 120
For partition [2,1,1,1] (one part of 2, three parts of 1):
- prod(si!^2) = (2!)^2 * (1!)^6 = 4
- Multiplicities: {2: 1, 1: 3}, so prod(mj!) = 1! * 3! = 6
- Term = 14400 / (4 * 6) = 600
For partition [2,2,1]:
- prod(si!^2) = (2!)^4 * (1!)^2 = 16
- Multiplicities: {2: 2, 1: 1}, so prod(mj!) = 2! * 1! = 2
- Term = 14400 / (16 * 2) = 450
For partition [3,1,1]:
- prod(si!^2) = (3!)^2 * (1!)^4 = 36
- Multiplicities: {3: 1, 1: 2}, so prod(mj!) = 1! * 2! = 2
- Term = 14400 / (36 * 2) = 200
For partition [3,2]:
- prod(si!^2) = (3!)^2 * (2!)^2 = 144
- Multiplicities: {3: 1, 2: 1}, so prod(mj!) = 1
- Term = 14400 / 144 = 100
For partition [4,1]:
- prod(si!^2) = (4!)^2 * (1!)^2 = 576
- Multiplicities: {4: 1, 1: 1}, so prod(mj!) = 1
- Term = 14400 / 576 = 25
For partition [5]:
- prod(si!^2) = (5!)^2 = 14400
- Multiplicities: {5: 1}, so prod(mj!) = 1
- Term = 14400 / 14400 = 1
Total N = 120 + 600 + 450 + 200 + 100 + 25 + 1 = 1,496
| n | Interpretations (N) | Entropy E = log2(N) |
|---|---|---|
| 2 | 3 | 1.58 bits |
| 3 | 16 | 4.00 bits |
| 4 | 131 | 7.03 bits |
| 5 | 1,496 | 10.55 bits |
| 6 | 22,482 | 14.46 bits |
| 7 | 426,833 | 18.70 bits |
| 8 | 9,934,563 | 23.24 bits |
| 9 | ~277,006,192 | ~28.05 bits |
Note: The partition formula accounts for many-to-many mappings (one input funding multiple outputs). The classic permutation model (n!) only counts one-to-one mappings and significantly undercounts: e.g. for n=5, permutations give 120 vs. 1,496 from the partition formula.
Our implementation in entropy.ts uses boltzmannEqualOutputs(n) which generates all integer partitions of n and applies the formula above. This is exact for n <= 50 (at most ~204,226 partitions for n=50).
For n > 50, we fall back to an asymptotic approximation.
For mixed-value transactions (non-CoinJoin), we use assignment-based enumeration which is a lower bound but reasonable for typical transactions.
The Boltzmann partition formula assumes ALL outputs share one value. WabiSabi CoinJoins have 30+ denomination tiers (e.g., 20 outputs at 2M sats, 19 at 33M sats, 15 at 5M sats, etc.). Applying Boltzmann to just the largest tier ignores the other 29 tiers entirely.
Community feedback from Arkad (March 2026) correctly identified this flaw: per-UTXO entropy for WabiSabi was reported as lower than JoinMarket because only 20 of 279 outputs contributed to the entropy calculation.
The exact multi-denomination entropy computation is NP-hard (constrained subset sum problem). The definitive work is Gavenda et al., "Analysis of Input-Output Mappings in CoinJoin Transactions with Arbitrary Values" (ESORICS 2025, arXiv 2510.17284):
- Their approach uses constrained subset sum with "numeric mappings" (grouping by denomination class)
- Execution times on real Wasabi CoinJoins: 0.07 seconds to 4.4 hours
- Even their parallelized C++ implementation cannot handle median-sized CoinJoins
- Found 10-50% anonymity set reduction vs. naive independence assumptions
Client-side computation in a browser is out of the question.
A tempting approach: compute boltzmannEqualOutputs(k) for each denomination tier independently and sum the entropies. This is mathematically wrong for two reasons:
-
boltzmannEqualOutputs(k)assumes all k inputs have the same value as the outputs. In a multi-tier transaction, inputs have heterogeneous values. Most many-to-many flows that inflate the Boltzmann count beyond k! are physically impossible (an input of 50,000 sats cannot split to partially fund two 2,097,152-sat outputs). -
Entropy subadditivity: H(X,Y) <= H(X) + H(Y). Treating tiers as independent and summing their entropies gives an upper bound, not a lower bound. Cross-tier input sharing creates constraints that reduce joint entropy.
The overcount ratio of boltzmannEqualOutputs(k) vs k! grows explosively:
- k=5: 1,496 vs 120 (12.5x)
- k=10: ~9 billion vs ~3.6 million (2,504x)
- k=20: ~9.3 * 10^27 vs ~2.4 * 10^18 (~3.8 billion x)
The key insight: entropy should measure per-participant privacy, not total transaction ambiguity. For Whirlpool (single tier), Boltzmann already gives per-participant entropy because all participants share one pool. For WabiSabi (multiple tiers), we need an aggregate that answers the same question: "how much entropy does a typical participant get?"
Per-tier computation: Each denomination tier of k equal-valued outputs is treated as a mini-CoinJoin. Within a tier, k! permutations are valid (swapping equal-valued columns preserves the flow matrix), giving log2(k!) bits of per-tier entropy.
Eligible-input constraint: Only inputs with value >= the denomination can fund a tier one-to-one. The effective k is min(output_count, eligible_inputs).
Weighted average: The per-tier entropies are averaged, weighted by tier size (number of outputs). Larger tiers contain more participants, so they contribute proportionally more to the expected participant entropy.
For each tier i with k_i >= 2 equal outputs:
eligible_i = count of inputs with value >= denomination_i
effective_k_i = min(k_i, eligible_i)
tier_entropy_i = log2(effective_k_i!)
weighted_entropy = sum(k_i * tier_entropy_i) / sum(k_i)
Why not sum? Summing across tiers answers "how hard is it to deanonymize ALL participants simultaneously?" - a useless question. No single participant benefits from all 30 tiers. An adversary targets one participant, not the whole transaction.
Why not min? Taking the minimum (weakest tier) is too conservative. Small 2-output tiers may be coordinator change or edge cases. The weighted average represents expected privacy for a randomly chosen participant.
This is labeled "multi-tier permutation estimate" in the output.
Wasabi 2.0 replaced Boltzmann with a per-UTXO "anonymity score" based on k-anonymity counting per denomination tier. This requires client knowledge of which inputs/outputs belong to "you" - an external observer cannot compute it. Since am-i.exposed is an observer tool (not a wallet), this approach is not applicable as a primary metric. The anonymity-set heuristic already reports per-tier k-anonymity separately.
- Gavenda et al., "Analysis of Input-Output Mappings in CoinJoin Transactions with Arbitrary Values" (ESORICS 2025, arXiv 2510.17284)
- Wasabi Wallet Blog, "Anonymity Set vs Anonymity Score"
- Maurer et al., "Anonymous CoinJoin Transactions with Arbitrary Values" (IEEE TrustCom 2017)
- Ficsor et al., "WabiSabi: Centrally Coordinated CoinJoins with Variable Amounts" (2021, ePrint 2021/206)
A deterministic link exists when an output can only be funded by one specific input (or set of inputs). CoinJoin Sudoku (Kristov Atlas) showed that even in CoinJoin transactions, some input-output links may have probability 1.0, meaning the CoinJoin provides zero privacy for those specific participants.
Detection: If the link probability matrix has any entry LP = 1.0, that link is deterministic. Our current implementation does not compute the full link probability matrix, but the entropy calculation implicitly accounts for deterministic links (they reduce the number of valid interpretations).
LaurentMT's essay "Hell is Other People" introduced the concept of steganographic transactions - transactions designed to look like something they're not:
- PayJoin (BIP78): Looks like a normal payment but the receiver contributes an input, breaking CIOH
- Stonewall: Simulated 2-party CoinJoin from a single wallet (4 outputs: 2 equal + 2 change)
- STONEWALLx2: Real collaborative CoinJoin with the Stonewall structure
- Ricochet: Adds intermediate hops to increase the distance between CoinJoin and exchange deposit
These techniques exploit the fact that chain analysis relies on heuristics. If a transaction is designed to violate the assumptions of those heuristics, the analysis produces false results.
Privacy-Enhancing Technologies (PETs) face a harder challenge than chain analysts:
- Analysts benefit from Bayesian updating: each new heuristic narrows the possibility space
- Privacy tools must defeat all heuristics simultaneously - one slip undoes everything
- A single participant in a CoinJoin with poor OPSEC can compromise other participants' privacy
- Temporal analysis, network-level surveillance, and exchange KYC provide independent correlation channels
This asymmetry is why entropy alone is insufficient as a privacy metric. High entropy is necessary but not sufficient for privacy.
Chain analysis evolved from naive CIOH clustering (2011-2013) through increasingly sophisticated techniques:
- 2011-2013: Reid & Harriman, Ron & Shamir - first academic address clustering via CIOH
- 2013: Maxwell proposes CoinJoin as a countermeasure
- 2015: LaurentMT defines the Boltzmann framework; Meiklejohn et al. scale clustering
- 2017: Wallet fingerprinting via nLockTime, nSequence emerges (Bitcoin Core's anti-fee-sniping)
- 2019: Low-R signature grinding (Bitcoin Core 0.17+) becomes a reliable fingerprint
- 2020s: ML-based clustering, BIP69 detection, Taproot adoption begins neutralizing fingerprints
Sources:
- Spiral BTC, "The Scroll #3: A Brief History of Wallet Clustering"
- OXT Research / ErgoBTC, "Understanding Bitcoin Privacy with OXT" (Parts 1-4)
- 0xB10C, wallet fingerprinting empirical analysis
This implementation and research compilation would not be possible without:
- LaurentMT - Creator of the Boltzmann entropy framework
- Greg Maxwell - Inventor of CoinJoin (2013)
- OXT Research / ErgoBTC - Educational series on Bitcoin privacy
- Kristov Atlas - CoinJoin Sudoku research
- Spiral BTC - Historical survey of chain analysis
- privacidadbitcoin.com - Spanish-language Bitcoin privacy education
This document is part of the am-i.exposed project. See also: privacy-engine.md for the full heuristic reference, archive/README.md for archived research articles.