Commit b93d9ea

committed
Add bip-0173
1 parent 9cafe77 commit b93d9ea

2 files changed

+369
-0
lines changed

README.mediawiki

Lines changed: 7 additions & 0 deletions
@@ -694,6 +694,13 @@ Those proposing changes should consider that ultimately consent may rest with th
694694
| Standard
695695
| Draft
696696
|-
697+
| [[bip-0173.mediawiki|173]]
698+
| Applications
699+
| Base32 address format for native v0-16 witness outputs
700+
| Pieter Wuille, Greg Maxwell
701+
| Informational
702+
| Draft
703+
|-
697704
| [[bip-0180.mediawiki|180]]
698705
| Peer Services
699706
| Block size/weight fraud proof

bip-0173.mediawiki

Lines changed: 362 additions & 0 deletions
@@ -0,0 +1,362 @@
1+
<pre>
2+
BIP: 173
3+
Layer: Applications
4+
Title: Base32 address format for native v0-16 witness outputs
5+
Author: Pieter Wuille <[email protected]>
6+
Greg Maxwell <[email protected]>
7+
Comments-Summary: No comments yet.
8+
Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-0173
9+
Status: Draft
10+
Type: Informational
11+
Created: 2016-03-20
12+
License: BSD-2-Clause
13+
Replaces: 142
14+
</pre>
15+
16+
==Introduction==
17+
18+
===Abstract===
19+
20+
This document proposes a checksummed base32 format, "Bech32", and a standard for native segregated witness output addresses using it.
21+
22+
===Copyright===
23+
24+
This BIP is licensed under the 2-clause BSD license.
25+
26+
===Motivation===
27+
28+
For most of its history, Bitcoin has relied on base58 addresses with a
29+
truncated double-SHA256 checksum. They were part of the original
30+
software and their scope was extended in
31+
[https://github.com/bitcoin/bips/blob/master/bip-0013.mediawiki BIP13]
32+
for Pay-to-script-hash
33+
([https://github.com/bitcoin/bips/blob/master/bip-0016.mediawiki P2SH]).
34+
However, both the character set and the checksum algorithm have limitations:
35+
* Base58 needs a lot of space in QR codes, as it cannot use the ''alphanumeric mode''.
36+
* The mixed case in base58 makes it inconvenient to reliably write down, type on mobile keyboards, or read out loud.
37+
* The double SHA256 checksum is slow and has no error-detection guarantees.
38+
* Most of the research on error-detecting codes only applies to character-set sizes that are a [https://en.wikipedia.org/wiki/Prime_power prime power], which 58 is not.
39+
* Base58 decoding is complicated and relatively slow.
40+
41+
Included in the Segregated Witness proposal are a new class of outputs
42+
(witness programs, see
43+
[https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]),
44+
and two instances of it ("P2WPKH" and "P2WSH", see
45+
[https://github.com/bitcoin/bips/blob/master/bip-0143.mediawiki BIP143]).
46+
Their functionality is available indirectly to older clients by embedding in P2SH
47+
outputs, but for optimal efficiency and security it is best to use them
48+
directly. In this document we propose a new address format for native
49+
witness outputs (current and future versions).
50+
51+
This replaces
52+
[https://github.com/bitcoin/bips/blob/master/bip-0142.mediawiki BIP142],
53+
and was previously discussed
54+
[https://bitcoincore.org/logs/2016-05-zurich-meeting-notes.html#base32 here] (summarized
55+
[https://bitcoincore.org/en/meetings/2016/05/20/#error-correcting-codes-for-future-address-types here]).
56+
57+
===Examples===
58+
59+
All examples use public key
60+
<tt>0279BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798</tt>.
61+
The P2WSH examples use <tt>key OP_CHECKSIG</tt> as script.
62+
63+
* Mainnet P2WPKH: <tt>bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4</tt>
64+
* Testnet P2WPKH: <tt>tb1qw508d6qejxtdg4y5r3zarvary0c5xw7kxpjzsx</tt>
65+
* Mainnet P2WSH: <tt>bc1qrp33g0q5c5txsp9arysrx4k6zdkfs4nce4xj0gdcccefvpysxf3qccfmv3</tt>
66+
* Testnet P2WSH: <tt>tb1qrp33g0q5c5txsp9arysrx4k6zdkfs4nce4xj0gdcccefvpysxf3q0sl5k7</tt>
67+
68+
==Specification==
69+
70+
We first describe the general checksummed base32<ref>'''Why use base32 at all?''' The lack of mixed case makes it more
71+
efficient to read out loud or to put into QR codes. It does come with a 15% length
72+
increase, but that does not matter when copy-pasting addresses.</ref> format called
73+
''Bech32'' and then define Segregated Witness addresses using it.
74+
75+
===Bech32===
76+
77+
A Bech32<ref>'''Why call it Bech32?''' "Bech" contains the characters BCH (the error
78+
detection algorithm used) and sounds a bit like "base".</ref> string is at most 90 characters long and consists of:
79+
* The '''human-readable part''', which is intended to convey the type of data or anything else that is relevant for the reader. Its validity (including the used set of characters) is application specific, but restricted to ASCII characters with values in the range 33-126.
80+
* The '''separator''', which is always "1". In case "1" is allowed inside the human-readable part, the last one in the string is the separator<ref>'''Why include a separator in addresses?''' That way the human-readable
81+
part is unambiguously separated from the data part, avoiding potential
82+
collisions with other human-readable parts that share a prefix. It also
83+
allows us to avoid having character-set restrictions on the human-readable part. The
84+
separator is ''1'' because using a non-alphanumeric character would
85+
complicate copy-pasting of addresses (with no double-click selection in
86+
several applications). Therefore an alphanumeric character outside the normal character set
87+
was chosen.</ref>.
88+
* The '''data part''', which is at least 6 characters long and only consists of alphanumeric characters excluding "1", "b", "i", and "o"<ref>'''Why not use an existing character set like [http://www.faqs.org/rfcs/rfc3548.html RFC3548] or [https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt z-base-32]'''?
89+
The character set is chosen to minimize ambiguity according to
90+
[https://hissa.nist.gov/~black/GTLD/ this] visual similarity data, and
91+
the ordering is chosen to minimize the number of pairs of similar
92+
characters (according to the same data) that differ in more than 1 bit.
93+
As the checksum is chosen to maximize detection capabilities for low
94+
numbers of bit errors, this choice improves its performance under some
95+
error models.</ref>.
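The three-part structure above can be sketched as a small splitting routine. This is an illustrative sketch only: the name <tt>split_bech32</tt> is hypothetical, the requirement of a non-empty human-readable part is an assumption, and only the structural rules listed above are checked (not the checksum or the data-part character set).

```python
def split_bech32(s):
    """Split a Bech32 string into (hrp, data_part), or return None.

    Hypothetical helper: checks structure only, not the checksum.
    """
    if len(s) > 90:
        return None  # a Bech32 string is at most 90 characters long
    pos = s.rfind("1")  # the last "1" in the string is the separator
    if pos < 1:
        return None  # no separator, or (by assumption) an empty HRP
    hrp, data_part = s[:pos], s[pos + 1:]
    if len(data_part) < 6:
        return None  # too short to hold the six checksum characters
    if any(ord(c) < 33 or ord(c) > 126 for c in hrp):
        return None  # HRP is restricted to ASCII values 33-126
    return hrp, data_part
```

Using <tt>rfind</tt> rather than <tt>find</tt> is what allows "1" to appear inside the human-readable part.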
96+
97+
98+
{| class="wikitable"
99+
|-
100+
!
101+
!0
102+
!1
103+
!2
104+
!3
105+
!4
106+
!5
107+
!6
108+
!7
109+
|-
110+
!+0
111+
|q||p||z||r||y||9||x||8
112+
|-
113+
!+8
114+
|g||f||2||t||v||d||w||0
115+
|-
116+
!+16
117+
|s||3||j||n||5||4||k||h
118+
|-
119+
!+24
120+
|c||e||6||m||u||a||7||l
121+
|}
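Read row by row, the table above yields a 32-character string in which a character's position is its value. A minimal sketch of the character-to-value conversion follows; the names <tt>CHARSET</tt> and <tt>data_to_values</tt> are illustrative and not defined by this document, and the mixed-case check described later is deliberately left out.

```python
# Characters for values 0..31, read row by row from the table above.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def data_to_values(data_part):
    """Map data-part characters to 5-bit values, or return None if any is invalid.

    Illustrative helper: lowercases its input and does not reject mixed case.
    """
    values = []
    for c in data_part.lower():
        v = CHARSET.find(c)
        if v == -1:
            return None  # "1", "b", "i", "o" and non-alphanumerics are excluded
        values.append(v)
    return values
```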
122+
123+
124+
'''Checksum'''
125+
126+
The last six characters of the data part form a checksum and contain no
127+
information. Valid strings MUST pass the criteria for validity specified
128+
by the Python3 code snippet below. The function
129+
<tt>bech32_verify_checksum</tt> must return true when its arguments are:
130+
* <tt>hrp</tt>: the human-readable part as a string
131+
* <tt>data</tt>: the data part as a list of integers representing the characters after conversion using the table above
132+
133+
<pre>
def bech32_polymod(values):
    GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        b = (chk >> 25)
        chk = (chk & 0x1ffffff) << 5 ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk

def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]

def bech32_verify_checksum(hrp, data):
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1
149+
</pre>
150+
151+
This implements a [https://en.wikipedia.org/wiki/BCH_code BCH code] that
152+
guarantees detection of '''any error affecting at most 4 characters'''
153+
and has less than a 1 in 10<sup>9</sup> chance of failing to detect more
154+
errors. More details about the properties can be found in the
155+
Checksum Design appendix. The human-readable part is processed by first
156+
feeding the higher bits of each character's ASCII value into the
157+
checksum calculation followed by a zero and then the lower bits of each<ref>'''Why are the high bits of the human-readable part processed first?'''
158+
This results in the actually checksummed data being ''[high hrp] 0 [low hrp] [data]''. This means that under the assumption that errors to the
159+
human readable part only change the low 5 bits (like changing an alphabetical character into another), errors are restricted to the ''[low hrp] [data]''
160+
part, which is at most 89 characters, and thus all error detection properties (see appendix) remain applicable.</ref>.
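As a small worked example of the expansion just described, the human-readable parts "bc" and "tb" expand as follows. The function body repeats <tt>bech32_hrp_expand</tt> so that the snippet runs on its own.

```python
def bech32_hrp_expand(s):
    # High bits of each character, then a zero, then the low 5 bits of each.
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]

# "b" is 0x62 = 0b011_00010 and "c" is 0x63 = 0b011_00011.
expanded_bc = bech32_hrp_expand("bc")
# "t" is 0x74 = 0b011_10100.
expanded_tb = bech32_hrp_expand("tb")
print(expanded_bc)  # [3, 3, 0, 2, 3]
print(expanded_tb)  # [3, 3, 0, 20, 2]
```

Both parts share the same high-bit prefix, so a typo that swaps one lowercase letter for another only changes the low-bit half of the expansion.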
161+
162+
To construct a valid checksum given the human-readable part and (non-checksum) values of the data-part characters, the code below can be used:
163+
164+
<pre>
def bech32_create_checksum(hrp, data):
    values = bech32_hrp_expand(hrp) + data
    polymod = bech32_polymod(values + [0,0,0,0,0,0]) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]
169+
</pre>
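To see checksum creation and verification working together, the sketch below assembles a complete string and checks it against two of the test vectors from the appendix. The helper <tt>bech32_encode</tt> and the module-level names <tt>CHARSET</tt> and <tt>GEN</tt> are illustrative and not part of this specification; the checksum functions are repeated only so that the snippet is self-contained.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]

def bech32_polymod(values):
    chk = 1
    for v in values:
        b = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk

def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]

def bech32_create_checksum(hrp, data):
    values = bech32_hrp_expand(hrp) + data
    polymod = bech32_polymod(values + [0, 0, 0, 0, 0, 0]) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]

def bech32_encode(hrp, data):
    """Illustrative helper: HRP, separator, then data and checksum characters."""
    combined = data + bech32_create_checksum(hrp, data)
    return hrp + "1" + "".join(CHARSET[d] for d in combined)

# An empty data payload: the data part then consists of the checksum alone.
encoded_empty = bech32_encode("a", [])
# All 32 values in order, as in the "abcdef" test vector.
encoded_all = bech32_encode("abcdef", list(range(32)))
print(encoded_empty)
print(encoded_all)
```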
170+
171+
'''Error correction'''
172+
173+
One of the properties of these BCH codes is that they can be used for
174+
error correction. An unfortunate side effect of error correction is that
175+
it erodes error detection: correction changes invalid inputs into valid
176+
inputs, but if more than a few errors were made then the valid input may
177+
not be the correct input. Use of an incorrect but valid input can cause
178+
funds to be lost irrecoverably. Because of this, implementations SHOULD
179+
NOT implement correction beyond potentially suggesting to the user where
180+
in the string an error might be found, without suggesting the correction
181+
to make.
182+
183+
'''Uppercase/lowercase'''
184+
185+
Decoders MUST accept both uppercase and lowercase strings, but
186+
not mixed case. The lowercase form is used when determining a character's
187+
value for checksum purposes. For presentation, lowercase is usually
188+
preferable, but inside QR codes uppercase SHOULD be used, as those permit
189+
the use of
190+
''[http://www.thonky.com/qr-code-tutorial/alphanumeric-mode-encoding alphanumeric mode]'', which is 45% more compact than the normal
191+
''[http://www.thonky.com/qr-code-tutorial/byte-mode-encoding byte mode]''.
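The case rule can be sketched as a pre-processing step that runs before any character lookup. The function names below are hypothetical, and the QR helper assumes that every character of the string is available in alphanumeric mode once uppercased (true for letters and digits, not for every permitted human-readable-part character).

```python
def normalize_case(s):
    """Return the lowercase form of s, or None if s mixes upper and lower case."""
    if s != s.lower() and s != s.upper():
        return None  # mixed case must be rejected
    return s.lower()  # lowercase is the form used for checksum purposes

def for_qr_code(s):
    """Uppercase form for embedding in a QR code (hypothetical helper)."""
    return s.upper()
```

Digits are unchanged by both conversions, so a string consisting only of digits and single-case letters always passes.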
192+
193+
===Segwit address format===
194+
195+
A segwit address<ref>'''Why not make an address format that is generic for all scriptPubKeys?'''
196+
That would lead to confusion about addresses for
197+
existing scriptPubKey types. Furthermore, if addresses that do not have a one-to-one mapping with scriptPubKeys (such as ECDH-based
198+
addresses) are ever introduced, having a fully generic old address type available would
199+
permit reinterpreting the resulting scriptPubKeys using the old address
200+
format, with lost funds as a result if bitcoins are sent to them.</ref> is a Bech32 encoding of:
201+
202+
* The human-readable part "bc"<ref>'''Why use 'bc' as human-readable part and not 'btc'?''' 'bc' is shorter.</ref> for mainnet, and "tb"<ref>'''Why use 'tb' as human-readable part for testnet?''' It was chosen to
203+
be of the same length as the mainnet counterpart (to simplify
204+
implementations' assumptions about lengths), but still be visually
205+
distinct.</ref> for testnet.
206+
* The data-part values:
207+
** 1 value: the witness version
208+
** A conversion of the 2-to-40-byte witness program (as defined by [https://github.com/bitcoin/bips/blob/master/bip-0141.mediawiki BIP141]) to base32:
209+
*** Start with the bits of the witness program, most significant bit per byte first.
210+
*** Re-arrange those bits into groups of 5, and pad with zeroes at the end if needed.
211+
*** Translate those bits to characters using the table above.
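The 8-bit to 5-bit regrouping above can be sketched as follows. The helper name <tt>program_to_base32</tt> is hypothetical; the example program is the 20-byte hash from the P2WPKH example address, and the checksum step is omitted here.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def program_to_base32(program):
    """Regroup witness-program bytes into 5-bit values (hypothetical helper)."""
    acc = 0    # bit accumulator, most significant bit first
    bits = 0   # number of not-yet-emitted bits held in acc
    values = []
    for byte in program:
        acc = (acc << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            values.append((acc >> bits) & 31)
    if bits:
        # Pad the final incomplete group with zero bits at the end.
        values.append((acc << (5 - bits)) & 31)
    return values

program = bytes.fromhex("751e76e8199196d454941c45d1b3a323f1433bd6")
data = [0] + program_to_base32(program)  # witness version 0 comes first
chars = "".join(CHARSET[v] for v in data)
print(chars)  # qw508d6qejxtdg4y5r3zarvary0c5xw7k
```

The printed characters are the data part of the mainnet P2WPKH example, without its final six checksum characters.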
212+
213+
'''Decoding'''
214+
215+
Software interpreting a segwit address:
216+
* MUST verify that the human-readable part is "bc" for mainnet and "tb" for testnet.
217+
* MUST verify that the first decoded data value (the witness version) is between 0 and 16, inclusive.
218+
* Convert the rest of the data to bytes:
219+
** Translate the values to 5 bits, most significant bit first.
220+
** Re-arrange those bits into groups of 8 bits. Any incomplete group at the end MUST be 4 bits or less, MUST be all zeroes, and is discarded.
221+
** There MUST be between 2 and 40 groups, which are interpreted as the bytes of the witness program.
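The decoding steps above, together with the Bech32 rules, can be combined into one routine. This is a hedged sketch rather than a reference decoder: the name <tt>decode_segwit_address</tt> and its <tt>expected_hrp</tt> parameter are assumptions, every failure is reported as <tt>None</tt> without a reason, and the version 0 length rule quoted from BIP141 is applied as well.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]

def bech32_polymod(values):
    chk = 1
    for v in values:
        b = chk >> 25
        chk = (chk & 0x1ffffff) << 5 ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk

def bech32_hrp_expand(s):
    return [ord(x) >> 5 for x in s] + [0] + [ord(x) & 31 for x in s]

def decode_segwit_address(expected_hrp, addr):
    """Return (witness_version, program_bytes), or None if addr is invalid."""
    if len(addr) > 90 or (addr != addr.lower() and addr != addr.upper()):
        return None  # too long, or mixed case
    addr = addr.lower()
    pos = addr.rfind("1")
    if pos < 1 or len(addr) - pos - 1 < 6:
        return None
    hrp, data_part = addr[:pos], addr[pos + 1:]
    if hrp != expected_hrp or any(c not in CHARSET for c in data_part):
        return None
    values = [CHARSET.index(c) for c in data_part]
    if bech32_polymod(bech32_hrp_expand(hrp) + values) != 1:
        return None  # checksum failure
    payload = values[:-6]  # drop the six checksum values
    if not payload or payload[0] > 16:
        return None  # witness version must be 0..16
    acc, bits, program = 0, 0, bytearray()
    for v in payload[1:]:
        acc = ((acc << 5) | v) & 0xfff
        bits += 5
        if bits >= 8:
            bits -= 8
            program.append((acc >> bits) & 0xff)
    # The incomplete final group must be at most 4 bits and all zeroes.
    if bits >= 5 or (acc & ((1 << bits) - 1)) != 0:
        return None
    if not 2 <= len(program) <= 40:
        return None
    if payload[0] == 0 and len(program) not in (20, 32):
        return None  # BIP141 length rule for version 0
    return payload[0], bytes(program)
```

A caller would pass "bc" or "tb" as <tt>expected_hrp</tt> depending on the network it operates on.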
222+
223+
Decoders SHOULD enforce known-length restrictions on witness programs.
224+
For example, BIP141 specifies ''If the version byte is 0, but the witness
225+
program is neither 20 nor 32 bytes, the script must fail.''
226+
227+
As a result of the previous rules, addresses are always between 14 and 74 characters long, and their length modulo 8 cannot be 0, 3, or 5.
228+
Version 0 witness addresses are always 42 or 62 characters, but implementations MUST allow the use of any version.
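The length claims above follow from simple counting, which the short check below reproduces. The function name <tt>address_length</tt> is illustrative, and a two-character human-readable part ("bc" or "tb") is assumed.

```python
def address_length(program_bytes, hrp_len=2):
    # HRP + separator + witness version + base32 program + 6 checksum characters.
    program_chars = (program_bytes * 8 + 4) // 5  # ceil(8n / 5) in integer arithmetic
    return hrp_len + 1 + 1 + program_chars + 6

lengths = [address_length(n) for n in range(2, 41)]
residues = sorted({length % 8 for length in lengths})
print(min(lengths), max(lengths))               # 14 74
print(residues)                                 # [1, 2, 4, 6, 7]
print(address_length(20), address_length(32))   # 42 62
```

The residues 0, 3, and 5 never occur because a program of n bytes needs ceil(8n/5) characters, which modulo 8 only takes the values 0, 2, 4, 5, and 7.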
229+
230+
===Compatibility===
231+
232+
Only new software will be able to use these addresses, and only for
233+
receivers with segwit-enabled new software. In all other cases, P2SH or
234+
P2PKH addresses can be used.
235+
236+
==Rationale==
237+
238+
<references />
239+
240+
==Reference implementations==
241+
242+
* Reference encoder and decoder:
243+
** [https://github.com/sipa/bech32/tree/master/ref/c For C]
244+
** [https://github.com/sipa/bech32/tree/master/ref/javascript For JavaScript]
245+
** [https://github.com/sipa/bech32/tree/master/ref/python For Python]
246+
247+
* Fancy decoder that localizes errors:
248+
** [https://github.com/sipa/bech32/tree/master/ecc/javascript For JavaScript] ([http://bitcoin.sipa.be/bech32/demo/demo.html demo website])
249+
250+
==Appendices==
251+
252+
===Test vectors===
253+
254+
The following strings have a valid Bech32 checksum.
255+
* <tt>A12UEL5L</tt>
256+
* <tt>an83characterlonghumanreadablepartthatcontainsthenumber1andtheexcludedcharactersbio1tt5tgs</tt>
257+
* <tt>abcdef1qpzry9x8gf2tvdw0s3jn54khce6mua7lmqqqxw</tt>
258+
* <tt>11qqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqqc8247j</tt>
259+
* <tt>split1checkupstagehandshakeupstreamerranterredcaperred2y9e3w</tt>
260+
261+
The following list gives valid segwit addresses and the scriptPubKey that they
262+
translate to in hex.
263+
* <tt>BC1QW508D6QEJXTDG4Y5R3ZARVARY0C5XW7KV8F3T4</tt>: <tt>0014751e76e8199196d454941c45d1b3a323f1433bd6</tt>
264+
* <tt>tb1qrp33g0q5c5txsp9arysrx4k6zdkfs4nce4xj0gdcccefvpysxf3q0sl5k7</tt>: <tt>00201863143c14c5166804bd19203356da136c985678cd4d27a1b8c6329604903262</tt>
265+
* <tt>bc1pw508d6qejxtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7k7grplx</tt>: <tt>5128751e76e8199196d454941c45d1b3a323f1433bd6751e76e8199196d454941c45d1b3a323f1433bd6</tt>
266+
* <tt>BC1SW50QA3JX3S</tt>: <tt>6002751e</tt>
267+
* <tt>bc1zw508d6qejxtdg4y5r3zarvaryvg6kdaj</tt>: <tt>5210751e76e8199196d454941c45d1b3a323</tt>
268+
* <tt>tb1qqqqqp399et2xygdj5xreqhjjvcmzhxw4aywxecjdzew6hylgvsesrxh6hy</tt>: <tt>0020000000c4a5cad46221b2a187905e5266362b99d5e91c6ce24d165dab93e86433</tt>
269+
270+
The following list gives invalid segwit addresses and the reason for
271+
their invalidity.
272+
* <tt>tc1qw508d6qejxtdg4y5r3zarvary0c5xw7kg3g4ty</tt>: Invalid human-readable part
273+
* <tt>bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t5</tt>: Invalid checksum
274+
* <tt>BC13W508D6QEJXTDG4Y5R3ZARVARY0C5XW7KN40WF2</tt>: Invalid witness version
275+
* <tt>bc1rw5uspcuh</tt>: Invalid program length
276+
* <tt>bc10w508d6qejxtdg4y5r3zarvary0c5xw7kw508d6qejxtdg4y5r3zarvary0c5xw7kw5rljs90</tt>: Invalid program length
277+
* <tt>BC1QR508D6QEJXTDG4Y5R3ZARVARYV98GJ9P</tt>: Invalid program length for witness version 0 (per BIP141)
278+
* <tt>tb1qrp33g0q5c5txsp9arysrx4k6zdkfs4nce4xj0gdcccefvpysxf3q0sL5k7</tt>: Mixed case
279+
* <tt>tb1pw508d6qejxtdg4y5r3zarqfsj6c3</tt>: Zero padding of more than 4 bits
280+
* <tt>tb1qrp33g0q5c5txsp9arysrx4k6zdkfs4nce4xj0gdcccefvpysxf3pjxtptv</tt>: Non-zero padding in 8-to-5 conversion
281+
282+
===Checksum design===
283+
284+
'''Design choices'''
285+
286+
BCH codes can be constructed over any prime-power alphabet and can be chosen to have a good trade-off between
287+
size and error-detection capabilities. While most work around BCH codes uses a binary alphabet, that is not a requirement.
288+
This makes them more appropriate for our use case than [https://en.wikipedia.org/wiki/Cyclic_redundancy_check CRC codes]. Unlike
289+
[https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction Reed-Solomon codes],
290+
they are not restricted in length to one less than the alphabet size. While they also support efficient error correction,
291+
the implementation of just error detection is very simple.
292+
293+
We pick 6 checksum characters as a trade-off between length of the addresses and the error-detection capabilities, as 6
294+
characters is the lowest number sufficient for a random failure chance below 1 per billion. For the length of data
295+
we're interested in protecting (up to 71 characters for a potential future 40-byte witness
296+
program), BCH codes can be constructed that guarantee detecting up to 4 errors.
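The choice of 6 characters can be checked with a little arithmetic, under the assumption that a corrupted string which is not covered by a detection guarantee passes an n-character base32 checksum with probability 32<sup>-n</sup>.

```python
def failure_per_billion(n):
    # Chance of a random match with an n-character base32 checksum, per 10**9.
    return 10**9 / 32**n

# Smallest checksum length whose random failure chance is below 1 per billion.
shortest = next(n for n in range(1, 10) if failure_per_billion(n) < 1)
print(shortest)                           # 6
print(round(failure_per_billion(5), 3))   # 29.802
print(round(failure_per_billion(6), 3))   # 0.931
```

Five characters would leave roughly a 30 per billion chance, while six characters give 1 in 2<sup>30</sup>, about 0.931 per billion.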
297+
298+
'''Selected properties'''
299+
300+
Many of these codes perform badly when dealing with more errors than they are designed to detect, but not all.
301+
For that reason, we consider codes that are designed to detect only 3 errors as well as 4 errors,
302+
and analyse how well they perform in practice.
303+
304+
The specific code chosen here is the result
305+
of:
306+
* Starting with an exhaustive list of 159605 BCH codes designed to detect 3 or 4 errors up to length 93, 151, 165, 341, 1023, and 1057.
307+
* From those, requiring the detection of 4 errors up to length 71, resulting in 28825 remaining codes.
308+
* From those, choosing the codes with the best worst-case window for 5-character errors, resulting in 310 remaining codes.
309+
* From those, picking the code with the lowest chance for not detecting small numbers of ''bit'' errors.
310+
311+
As a naive search would require over 6.5 * 10<sup>19</sup> checksum evaluations, a collision-search approach was used for
312+
analysis. The code can be found [https://github.com/sipa/ezbase32/ here].
313+
314+
'''Properties'''
315+
316+
The following table summarizes the chances for detection failure (as
317+
multiples of 1 in 10<sup>9</sup>).
318+
319+
{| class="wikitable"
320+
|-
321+
!colspan="2" | Window length
322+
!colspan="6" | Number of wrong characters
323+
|-
324+
!Length
325+
!Description
326+
!≤4
327+
!5
328+
!6
329+
!7
330+
!8
331+
!≥9
332+
|-
333+
| 8 || Longest detecting 6 errors || colspan="3" | 0 || 1.127 || 0.909 || n/a
334+
|-
335+
| 18 || Longest detecting 5 errors || colspan="2" | 0 || 0.965 || 0.929 || 0.932 || 0.931
336+
|-
337+
| 19 || Worst case for 6 errors || 0 || 0.093 || 0.972 || 0.928 || colspan="2" | 0.931
338+
|-
339+
| 39 || Length for a P2WPKH address || 0 || 0.756 || 0.935 || 0.932 || colspan="2" | 0.931
340+
|-
341+
| 59 || Length for a P2WSH address || 0 || 0.805 || 0.933 || colspan="3" | 0.931
342+
|-
343+
| 71 || Length for a 40-byte program address || 0 || 0.830 || 0.934 || colspan="3" | 0.931
344+
|-
345+
| 89 || Longest detecting 4 errors || 0 || 0.867 || 0.933 || colspan="3" | 0.931
346+
|}
347+
This means that when 5 changed characters occur randomly distributed in
348+
the 39 characters of a P2WPKH address, there is a chance of
349+
''0.756 per billion'' that it will go undetected. When those 5 changes
350+
occur randomly within a 19-character window, that chance goes down to
351+
''0.093 per billion''. As the number of errors goes up, the chance
352+
converges towards ''1 in 2<sup>30</sup>'' = ''0.931 per billion''.
353+
354+
Even though the chosen code performs reasonably well up to 1023 characters,
355+
other designs are preferable for lengths above 89 characters (excluding the
356+
separator).
357+
358+
==Acknowledgements==
359+
360+
This document is inspired by the [https://rusty.ozlabs.org/?p=578 address proposal] by Rusty Russell, the
361+
[https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2014-February/004402.html base32] proposal by Mark Friedenbach, and had input from Luke Dashjr,
362+
Johnson Lau, Eric Lombrozo, Peter Todd, and various other reviewers.
