Commit 1cafbd1

Merge pull request bitcoin#1143 from achow101/descriptors
[BIPs 380-386] Output Script Descriptors
2 parents 8a050ec + 761ef12 commit 1cafbd1

8 files changed

+763
-0
lines changed

README.mediawiki

Lines changed: 49 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1050,6 +1050,55 @@ Those proposing changes should consider that ultimately consent may rest with th
1050 1050
| Andrew Chow
1051 1051
| Standard
1052 1052
| Draft
1053+
|-
1054+
| [[bip-0380.mediawiki|380]]
1055+
| Applications
1056+
| Output Script Descriptors General Operation
1057+
| Pieter Wuille, Andrew Chow
1058+
| Informational
1059+
| Draft
1060+
|-
1061+
| [[bip-0381.mediawiki|381]]
1062+
| Applications
1063+
| Non-Segwit Output Script Descriptors
1064+
| Pieter Wuille, Andrew Chow
1065+
| Informational
1066+
| Draft
1067+
|-
1068+
| [[bip-0382.mediawiki|382]]
1069+
| Applications
1070+
| Segwit Output Script Descriptors
1071+
| Pieter Wuille, Andrew Chow
1072+
| Informational
1073+
| Draft
1074+
|-
1075+
| [[bip-0383.mediawiki|383]]
1076+
| Applications
1077+
| Multisig Output Script Descriptors
1078+
| Pieter Wuille, Andrew Chow
1079+
| Informational
1080+
| Draft
1081+
|-
1082+
| [[bip-0384.mediawiki|384]]
1083+
| Applications
1084+
| combo() Output Script Descriptors
1085+
| Pieter Wuille, Andrew Chow
1086+
| Informational
1087+
| Draft
1088+
|-
1089+
| [[bip-0385.mediawiki|385]]
1090+
| Applications
1091+
| raw() and addr() Output Script Descriptors
1092+
| Pieter Wuille, Andrew Chow
1093+
| Informational
1094+
| Draft
1095+
|-
1096+
| [[bip-0386.mediawiki|386]]
1097+
| Applications
1098+
| tr() Output Script Descriptors
1099+
| Pieter Wuille, Andrew Chow
1100+
| Informational
1101+
| Draft
1053 1102
|}
1054 1103

1055 1104
<!-- IMPORTANT! See the instructions at the top of this page, do NOT JUST add BIPs here! -->

bip-0380.mediawiki

Lines changed: 277 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,277 @@
1+
<pre>
2+
BIP: 380
3+
Layer: Applications
4+
Title: Output Script Descriptors General Operation
5+
Author: Pieter Wuille <[email protected]>
6+
Andrew Chow <[email protected]>
7+
Comments-Summary: No comments yet.
8+
Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-0380
9+
Status: Draft
10+
Type: Informational
11+
Created: 2021-06-27
12+
License: BSD-2-Clause
13+
</pre>
14+
15+
==Abstract==
16+
17+
Output Script Descriptors are a simple language which can be used to describe collections of output scripts.
18+
There can be many different descriptor fragments and functions.
19+
This document describes the general syntax for descriptors, descriptor checksums, and common expressions.
20+
21+
==Copyright==
22+
23+
This BIP is licensed under the BSD 2-clause license.
24+
25+
==Motivation==
26+
27+
Bitcoin wallets traditionally have stored a set of keys which are later serialized and mutated to produce the output scripts that the wallet watches and the addresses it provides to users.
28+
Typically backups have consisted of solely the private keys, nowadays primarily in the form of BIP 39 mnemonics.
29+
However, this backup solution is insufficient, especially since the introduction of Segregated Witness, which added new output types.
30+
Given just the private keys, it is not possible for restored wallets to know which kinds of output scripts and addresses to produce.
31+
This has led to incompatibilities between wallets when restoring a backup or exporting data for a watch-only wallet.
32+
33+
Further complicating matters are BIP 32 derivation paths.
34+
Although BIPs 44, 49, and 84 have specified standard BIP 32 derivation paths for different output scripts and addresses, not all wallets support or use those derivation paths.
35+
The lack of derivation path information in these backups and exports leads to further incompatibilities between wallets.
36+
37+
Current solutions to these issues have not been generic and can be viewed as being layer violations.
38+
Solutions such as introducing different version bytes for extended key serialization are both a layer violation (key derivation should be separate from script type meaning) and specific to only a particular derivation path and script type.
39+
40+
Output Script Descriptors introduce a generic solution to these issues.
41+
Script types are specified explicitly through the use of Script Expressions.
42+
Key derivation paths are specified explicitly in Key Expressions.
43+
These allow for creating wallet backups and exports which specify the exact scripts, subscripts (redeemScript, witnessScript, etc.), and keys to produce.
44+
With the general structure specified in this BIP, new Script Expressions can be introduced as new script types are added.
45+
Lastly, the use of common terminology and existing standards allows Output Script Descriptors to be engineer-readable so that the results can be understood at a glance.
46+
47+
==Specification==
48+
49+
Descriptors consist of several types of expressions.
50+
The top level expression is a <tt>SCRIPT</tt>.
51+
This expression may be followed by <tt>#CHECKSUM</tt>, where <tt>CHECKSUM</tt> is an 8 character alphanumeric descriptor checksum.
52+
53+
===Script Expressions===
54+
55+
Script Expressions (denoted <tt>SCRIPT</tt>) are expressions which correspond directly with a Bitcoin script.
56+
These expressions are written as functions and take arguments.
57+
Such expressions have a script template which is filled with the arguments correspondingly.
58+
Expressions are written as a human-readable identifier string followed by the arguments enclosed in parentheses.
59+
The identifier string should be alphanumeric and may include underscores.
60+
61+
The arguments to a script expression are defined by that expression itself.
62+
They could be a script expression, a key expression, or some other expression entirely.
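The function-call shape described above can be illustrated with a minimal sketch. The helper name <tt>split_script_expression</tt> and the placeholder arguments <tt>KEY_A</tt> and <tt>KEY_B</tt> are hypothetical and not part of this BIP. The sketch only separates the identifier from its top-level arguments. It assumes arguments are separated by commas, as in the expressions listed in Appendix B, and it does not validate the arguments, which each script expression defines for itself.

```python
import re

# Identifier rule from the text: alphanumeric, underscores allowed.
IDENTIFIER = re.compile(r"[A-Za-z0-9_]+")

def split_script_expression(expr):
    """Split 'NAME(ARG,...)' into (NAME, [ARG, ...]) at the top nesting level.

    Hypothetical helper for illustration; it does not validate the arguments.
    """
    open_idx = expr.find("(")
    if open_idx <= 0 or not expr.endswith(")"):
        raise ValueError("expected NAME(ARGS)")
    name = expr[:open_idx]
    if not IDENTIFIER.fullmatch(name):
        raise ValueError("invalid identifier: " + name)
    body = expr[open_idx + 1:-1]
    args, depth, start = [], 0, 0
    for i, c in enumerate(body):
        if c in "([{":
            depth += 1
        elif c in ")]}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced brackets")
        elif c == "," and depth == 0:
            # A comma outside any nested bracket separates two arguments.
            args.append(body[start:i])
            start = i + 1
    if depth != 0:
        raise ValueError("unbalanced brackets")
    args.append(body[start:])
    return name, args

# KEY_A and KEY_B are placeholders, not real key expressions.
print(split_script_expression("wsh(multi(2,KEY_A,KEY_B))"))
print(split_script_expression("multi(2,KEY_A,KEY_B)"))
```

Nested script expressions stay intact as a single argument, so a caller could apply the same helper recursively to the inner expression.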
63+
64+
===Key Expressions===
65+
66+
Key expressions (denoted <tt>KEY</tt>) are a common type of expression used as an argument to script expressions.
67+
These represent a public or private key and, optionally, information about the origin of that key.
68+
Key expressions can only be used as arguments to script expressions.
69+
70+
Key expressions consist of:
71+
* Optionally, key origin information, consisting of:
72+
** An open bracket <tt>[</tt>
73+
** Exactly 8 hex characters for the fingerprint of the key where the derivation starts (see BIP 32 for details)
74+
** Followed by zero or more <tt>/NUM</tt> or <tt>/NUMh</tt> path elements to indicate the unhardened or hardened derivation steps between the fingerprint and the key that follows.
75+
** A closing bracket <tt>]</tt>
76+
* Followed by the actual key, which is either:
77+
** A hex encoded public key, which depending on the script expression, may be either:
78+
*** 66 hex character string beginning with <tt>02</tt> or <tt>03</tt> representing a compressed public key
79+
*** 130 hex character string beginning with <tt>04</tt> representing an uncompressed public key
80+
** A [[https://en.bitcoin.it/wiki/Wallet_import_format|WIF]] encoded private key
81+
** <tt>xpub</tt> encoded extended public key or <tt>xprv</tt> encoded extended private key (as defined in BIP 32)
82+
*** Followed by zero or more <tt>/NUM</tt> or <tt>/NUMh</tt> path elements indicating BIP 32 derivation steps to be taken after the given extended key.
83+
*** Optionally followed by a single <tt>/*</tt> or <tt>/*h</tt> final step to denote all direct unhardened or hardened children.
84+
85+
If the <tt>KEY</tt> is a BIP 32 extended key, before output scripts can be created, child keys must be derived using the derivation information that follows the extended key.
86+
When the final step is <tt>/*</tt> or <tt>/*h</tt>, an output script will be produced for every child key index.
87+
The derived key must not be serialized as an uncompressed public key.
88+
Script Expressions may have further requirements on how derived public keys are serialized for script creation.
89+
90+
In the above specification, the hardened indicator <tt>h</tt> may be replaced with alternative hardened indicators of <tt>H</tt> or <tt>'</tt>.
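As a rough illustration of the key expression grammar above, the following sketch splits a key expression into its origin fingerprint, origin path, key, derivation path, and wildcard. The function name <tt>parse_key_expression</tt>, the returned dictionary layout, and the placeholder string <tt>xpubPLACEHOLDER</tt> are assumptions made for this example. The sketch does not decode or validate the key itself, and it does not enforce that derivation steps only follow extended keys.

```python
import re

HARDENED = "hH'"  # accepted hardened indicators

KEY_RE = re.compile(
    r"^(?:\[(?P<fpr>[0-9a-fA-F]{8})(?P<opath>(?:/[0-9]+[hH']?)*)\])?"  # optional key origin
    r"(?P<key>[^/\[\]]+)"                                              # the key (not validated)
    r"(?P<dpath>(?:/[0-9]+[hH']?)*)"                                   # derivation steps
    r"(?P<wild>/\*[hH']?)?$"                                           # optional final wildcard
)

def _steps(path):
    """Turn '/44h/0/1' into [(44, True), (0, False), (1, False)]."""
    steps = []
    for elem in path.split("/")[1:]:
        hardened = elem[-1] in HARDENED
        steps.append((int(elem[:-1] if hardened else elem), hardened))
    return steps

def parse_key_expression(expr):
    """Hypothetical helper: split a KEY expression into its parts."""
    m = KEY_RE.match(expr)
    if m is None:
        raise ValueError("not a key expression: " + expr)
    wild = m.group("wild")
    if wild is None:
        wildcard = None
    else:
        wildcard = "hardened" if wild[-1] in HARDENED else "unhardened"
    return {
        "fingerprint": m.group("fpr"),
        "origin_path": _steps(m.group("opath") or ""),
        "key": m.group("key"),
        "path": _steps(m.group("dpath")),
        "wildcard": wildcard,
    }

# xpubPLACEHOLDER stands in for a real BIP 32 extended public key.
print(parse_key_expression("[d34db33f/44h/0'/0H]xpubPLACEHOLDER/1/*"))
```

All three hardened indicators are mapped to the same boolean, so the parsed form does not depend on which indicator the descriptor used.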
91+
92+
====Normalization of Key Expressions with Hardened Derivation====
93+
94+
When a descriptor is exported without private keys, it is necessary to do additional derivation to remove any intermediate hardened derivation steps for the exported descriptor to be useful.
95+
The exporter should derive the extended public key at the last hardened derivation step and use that extended public key as the key in the descriptor.
96+
The derivation steps that were taken to get to that key must be added to the previous key origin information.
97+
If there is no key origin information, then one must be added for the newly derived extended public key.
98+
If the final derivation is hardened, then it is not necessary to do additional derivation.
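The path bookkeeping in this normalization can be sketched as follows. The helper <tt>normalize_paths</tt> and the representation of paths as lists of <tt>(index, hardened)</tt> tuples are assumptions for illustration. The sketch only decides which steps move into the key origin information; deriving the extended public key at that point requires a BIP 32 implementation and is not shown. Treating a hardened final wildcard as "nothing to normalize" is one reading of the last sentence above.

```python
def normalize_paths(origin_path, deriv_path, wildcard_hardened=False):
    """Return (new_origin_path, remaining_path) for a public export.

    Paths are lists of (index, hardened) tuples. Hypothetical helper: it only
    does the path bookkeeping, not the BIP 32 key derivation itself.
    """
    if wildcard_hardened:
        # Reading of the text: a hardened final step leaves nothing to normalize.
        return list(origin_path), list(deriv_path)
    hardened_positions = [i for i, (_, hardened) in enumerate(deriv_path) if hardened]
    # Everything up to and including the last hardened step moves into the origin.
    cut = hardened_positions[-1] + 1 if hardened_positions else 0
    return list(origin_path) + list(deriv_path[:cut]), list(deriv_path[cut:])

# Example shape: [fingerprint/44h]xprv.../0h/0h/1/*
origin = [(44, True)]
derivation = [(0, True), (0, True), (1, False)]
print(normalize_paths(origin, derivation))
```

For the example shape, the exported descriptor would carry the origin path <tt>/44h/0h/0h</tt>, the extended public key derived at that point, and the remaining path <tt>/1/*</tt>.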
99+
100+
===Character Set===
101+
102+
The expressions used in descriptors must only contain characters within this character set so that the descriptor checksum will work.
103+
104+
The allowed characters are:
105+
<pre>
106+
0123456789()[],'/*abcdefgh@:$%{}
107+
IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~
108+
ijklmnopqrstuvwxyzABCDEFGH`#"\<space>
109+
</pre>
110+
Note that <tt><space></tt> on the last line is a space character.
111+
112+
This character set is written as 3 groups of 32 characters in this specific order so that the checksum below can identify more errors.
113+
The first group contains the most common "unprotected" characters (i.e. things such as hex and keypaths that do not already have their own checksums).
114+
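The remaining layout is chosen so that case errors cause an offset that is a multiple of 32 (changing only the group, not the position within it), and so that as many alphabetic characters as possible share a group while respecting the previous restrictions.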
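The grouping property can be checked directly. The short sketch below uses the character set string as given in this document; the names <tt>GROUPS</tt> and <tt>case_offset</tt> are introduced here for illustration only. It confirms that the upper- and lowercase forms of every letter sit at the same position within their groups, and that all hex digits fall in the first group.

```python
import string

INPUT_CHARSET = "0123456789()[],'/*abcdefgh@:$%{}IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~ijklmnopqrstuvwxyzABCDEFGH`#\"\\ "

# Three groups of up to 32 characters each (the last group has 31).
GROUPS = [INPUT_CHARSET[i:i + 32] for i in range(0, len(INPUT_CHARSET), 32)]

def case_offset(letter):
    """Distance in the character set between the upper- and lowercase form."""
    return INPUT_CHARSET.find(letter.upper()) - INPUT_CHARSET.find(letter.lower())

# Every case error moves a character by a whole number of groups.
assert all(case_offset(c) % 32 == 0 for c in string.ascii_lowercase)

# Hex digits all live in the first group.
assert set("0123456789abcdef") <= set(GROUPS[0])

print([len(g) for g in GROUPS])
print(case_offset("a"), case_offset("i"))
```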
115+
116+
===Checksum===
117+
118+
Following the top level script expression is a single octothorpe (<tt>#</tt>) followed by the 8 character checksum.
119+
The checksum is an error correcting checksum similar to bech32.
120+
121+
The checksum has the following properties:
122+
* Mistakes in a descriptor string are measured in "symbol errors". The higher the number of symbol errors, the harder it is to detect:
123+
** An error substituting a character from <tt>0123456789()[],'/*abcdefgh@:$%{}</tt> for another in that set always counts as 1 symbol error.
124+
*** Note that hex encoded keys are covered by these characters. Extended keys (<tt>xpub</tt> and <tt>xprv</tt>) use other characters too, but also have their own checksum mechanism.
125+
*** <tt>SCRIPT</tt> expression function names use other characters, but mistakes in these would generally result in an unparsable descriptor.
126+
** A case error always counts as 1 symbol error.
127+
** Any other 1 character substitution error counts as 1 or 2 symbol errors.
128+
* Any 1 symbol error is always detected.
129+
* Any 2 or 3 symbol error in a descriptor of up to 49154 characters is always detected.
130+
* Any 4 symbol error in a descriptor of up to 507 characters is always detected.
131+
* Any 5 symbol error in a descriptor of up to 77 characters is always detected.
132+
* The checksum is optimized to minimize the chance that a 5 symbol error in a descriptor of up to 387 characters goes undetected.
133+
* Random errors have a chance of 1 in 2<sup>40</sup> of being undetected.
134+
135+
The checksum itself uses the same character set as bech32: <tt>qpzry9x8gf2tvdw0s3jn54khce6mua7l</tt>
136+
137+
Valid descriptor strings with a checksum must pass the criteria for validity specified by the Python3 code snippet below.
138+
The function <tt>descsum_check</tt> must return true when its argument <tt>s</tt> is a descriptor of the form <tt>SCRIPT#CHECKSUM</tt>.
139+
140+
<pre>
INPUT_CHARSET = "0123456789()[],'/*abcdefgh@:$%{}IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~ijklmnopqrstuvwxyzABCDEFGH`#\"\\ "
CHECKSUM_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GENERATOR = [0xf5dee51989, 0xa9fdca3312, 0x1bab10e32d, 0x3706b1677a, 0x644d626ffd]

def descsum_polymod(symbols):
    """Internal function that computes the descriptor checksum."""
    chk = 1
    for value in symbols:
        # Shift in the next 5-bit symbol, keeping chk within 40 bits.
        top = chk >> 35
        chk = (chk & 0x7ffffffff) << 5 ^ value
        for i in range(5):
            chk ^= GENERATOR[i] if ((top >> i) & 1) else 0
    return chk

def descsum_expand(s):
    """Internal function that does the character to symbol expansion"""
    groups = []
    symbols = []
    for c in s:
        if not c in INPUT_CHARSET:
            return None
        v = INPUT_CHARSET.find(c)
        # Low 5 bits: position within the group. High bits: group number.
        symbols.append(v & 31)
        groups.append(v >> 5)
        if len(groups) == 3:
            # After every 3 characters, emit one symbol combining their groups.
            symbols.append(groups[0] * 9 + groups[1] * 3 + groups[2])
            groups = []
    if len(groups) == 1:
        symbols.append(groups[0])
    elif len(groups) == 2:
        symbols.append(groups[0] * 3 + groups[1])
    return symbols

def descsum_check(s):
    """Verify that the checksum is correct in a descriptor"""
    # Strings too short to hold '#' plus 8 checksum characters are invalid.
    if len(s) < 9 or s[-9] != '#':
        return False
    if not all(x in CHECKSUM_CHARSET for x in s[-8:]):
        return False
    expanded = descsum_expand(s[:-9])
    # descsum_expand returns None for characters outside INPUT_CHARSET.
    if expanded is None:
        return False
    symbols = expanded + [CHECKSUM_CHARSET.find(x) for x in s[-8:]]
    return descsum_polymod(symbols) == 1
</pre>
183+
184+
This implements a BCH code that has the properties described above.
185+
The entire descriptor string is first processed into an array of symbols.
186+
The symbol for each character is its position within its group.
187+
After every 3rd symbol, a 4th symbol is inserted which represents the group numbers combined together.
188+
This means that a change that only affects the position within a group, or only a group number change, will only affect a single symbol.
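A small worked example may make the expansion concrete. The sketch below repeats <tt>descsum_expand</tt> so that it runs standalone, then expands the three-character prefix <tt>pk(</tt> together with two corrupted variants; the variants are chosen here purely for illustration.

```python
INPUT_CHARSET = "0123456789()[],'/*abcdefgh@:$%{}IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~ijklmnopqrstuvwxyzABCDEFGH`#\"\\ "

def descsum_expand(s):
    """Character to symbol expansion, as in the reference code above."""
    groups = []
    symbols = []
    for c in s:
        if c not in INPUT_CHARSET:
            return None
        v = INPUT_CHARSET.find(c)
        symbols.append(v & 31)   # position within the group
        groups.append(v >> 5)    # group number (0, 1 or 2)
        if len(groups) == 3:
            symbols.append(groups[0] * 9 + groups[1] * 3 + groups[2])
            groups = []
    if len(groups) == 1:
        symbols.append(groups[0])
    elif len(groups) == 2:
        symbols.append(groups[0] * 3 + groups[1])
    return symbols

print(descsum_expand("pk("))  # [7, 2, 10, 24]
print(descsum_expand("pK("))  # [7, 2, 10, 21]: case error, only the group symbol changes
print(descsum_expand("pk["))  # [7, 2, 12, 24]: same-group substitution, only one position symbol changes
```

In <tt>pk(</tt>, the characters <tt>p</tt> and <tt>k</tt> are in the third group (group number 2) and <tt>(</tt> is in the first (group number 0), so the inserted 4th symbol is 2·9 + 2·3 + 0 = 24. Each corrupted variant differs from the original in exactly one symbol.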
189+
190+
To construct a valid checksum given a script expression, the code below can be used:
191+
192+
<pre>
def descsum_create(s):
    """Add a checksum to a descriptor without one"""
    symbols = descsum_expand(s) + [0, 0, 0, 0, 0, 0, 0, 0]
    checksum = descsum_polymod(symbols) ^ 1
    return s + '#' + ''.join(CHECKSUM_CHARSET[(checksum >> (5 * (7 - i))) & 31] for i in range(8))
</pre>
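The following sketch shows how creation and verification fit together. It repeats the reference functions so that it runs standalone, and its <tt>descsum_check</tt> adds two guards (for short input and for characters outside the input character set) that are an assumption of this example. The descriptor <tt>raw(deadbeef)</tt> is used only as a short sample input.

```python
INPUT_CHARSET = "0123456789()[],'/*abcdefgh@:$%{}IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~ijklmnopqrstuvwxyzABCDEFGH`#\"\\ "
CHECKSUM_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GENERATOR = [0xf5dee51989, 0xa9fdca3312, 0x1bab10e32d, 0x3706b1677a, 0x644d626ffd]

def descsum_polymod(symbols):
    chk = 1
    for value in symbols:
        top = chk >> 35
        chk = (chk & 0x7ffffffff) << 5 ^ value
        for i in range(5):
            chk ^= GENERATOR[i] if ((top >> i) & 1) else 0
    return chk

def descsum_expand(s):
    groups, symbols = [], []
    for c in s:
        if c not in INPUT_CHARSET:
            return None
        v = INPUT_CHARSET.find(c)
        symbols.append(v & 31)
        groups.append(v >> 5)
        if len(groups) == 3:
            symbols.append(groups[0] * 9 + groups[1] * 3 + groups[2])
            groups = []
    if len(groups) == 1:
        symbols.append(groups[0])
    elif len(groups) == 2:
        symbols.append(groups[0] * 3 + groups[1])
    return symbols

def descsum_create(s):
    symbols = descsum_expand(s) + [0, 0, 0, 0, 0, 0, 0, 0]
    checksum = descsum_polymod(symbols) ^ 1
    return s + '#' + ''.join(CHECKSUM_CHARSET[(checksum >> (5 * (7 - i))) & 31] for i in range(8))

def descsum_check(s):
    # The length and None guards are additions made for this example.
    if len(s) < 9 or s[-9] != '#':
        return False
    if not all(x in CHECKSUM_CHARSET for x in s[-8:]):
        return False
    expanded = descsum_expand(s[:-9])
    if expanded is None:
        return False
    return descsum_polymod(expanded + [CHECKSUM_CHARSET.find(x) for x in s[-8:]]) == 1

descriptor = descsum_create("raw(deadbeef)")
print(descriptor)
print(descsum_check(descriptor))                           # True: round trip
print(descsum_check(descriptor.replace("dead", "deed")))   # False: a single symbol error
```

Replacing <tt>a</tt> with <tt>e</tt> stays within the first character group, so it is a 1 symbol error, which the properties listed earlier say is always detected.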
200+
201+
==Backwards Compatibility==
202+
203+
Output script descriptors are an entirely new language which is not compatible with any existing software.
204+
However many components of the expressions reuse encodings and serializations defined by previous BIPs.
205+
206+
Output script descriptors are designed for future extension with further fragment types and new script expressions.
207+
These will be specified in additional BIPs.
208+
209+
==Reference Implementation==
210+
211+
Descriptors have been implemented in Bitcoin Core since version 0.17.
212+
213+
==Appendix A: Index of Expressions==
214+
215+
Future BIPs may specify additional types of expressions.
216+
All available expression types are listed in this table.
217+
218+
{|
219+
! Name
220+
! Denoted As
221+
! BIP
222+
|-
223+
| Script
224+
| <tt>SCRIPT</tt>
225+
| 380
226+
|-
227+
| Key
228+
| <tt>KEY</tt>
229+
| 380
230+
|-
231+
| Tree
232+
| <tt>TREE</tt>
233+
| [[bip-0386.mediawiki|386]]
234+
|}
235+
236+
==Appendix B: Index of Script Expressions==
237+
238+
Script expressions will be specified in additional BIPs.
239+
This table lists all available script expressions and the BIPs specifying them.
240+
241+
{|
242+
! Expression
243+
! BIP
244+
|-
245+
| <tt>pk(KEY)</tt>
246+
| [[bip-0381.mediawiki|381]]
247+
|-
248+
| <tt>pkh(KEY)</tt>
249+
| [[bip-0381.mediawiki|381]]
250+
|-
251+
| <tt>sh(SCRIPT)</tt>
252+
| [[bip-0381.mediawiki|381]]
253+
|-
254+
| <tt>wpkh(KEY)</tt>
255+
| [[bip-0382.mediawiki|382]]
256+
|-
257+
| <tt>wsh(SCRIPT)</tt>
258+
| [[bip-0382.mediawiki|382]]
259+
|-
260+
| <tt>multi(NUM, KEY, ..., KEY)</tt>
261+
| [[bip-0383.mediawiki|383]]
262+
|-
263+
| <tt>sortedmulti(NUM, KEY, ..., KEY)</tt>
264+
| [[bip-0383.mediawiki|383]]
265+
|-
266+
| <tt>combo(KEY)</tt>
267+
| [[bip-0384.mediawiki|384]]
268+
|-
269+
| <tt>raw(HEX)</tt>
270+
| [[bip-0385.mediawiki|385]]
271+
|-
272+
| <tt>addr(ADDR)</tt>
273+
| [[bip-0385.mediawiki|385]]
274+
|-
275+
| <tt>tr(KEY)</tt>, <tt>tr(KEY, TREE)</tt>
276+
| [[bip-0386.mediawiki|386]]
277+
|}
