|
| 1 | +<pre> |
| 2 | + BIP: 380 |
| 3 | + Layer: Applications |
| 4 | + Title: Output Script Descriptors General Operation |
| 5 | + Author: Pieter Wuille < [email protected]> |
| 6 | + |
| 7 | + Comments-Summary: No comments yet. |
| 8 | + Comments-URI: https://github.com/bitcoin/bips/wiki/Comments:BIP-0380 |
| 9 | + Status: Draft |
| 10 | + Type: Informational |
| 11 | + Created: 2021-06-27 |
| 12 | + License: BSD-2-Clause |
| 13 | +</pre> |
| 14 | + |
| 15 | +==Abstract== |
| 16 | + |
| 17 | +Output Script Descriptors are a simple language which can be used to describe collections of output scripts. |
| 18 | +There can be many different descriptor fragments and functions. |
| 19 | +This document describes the general syntax for descriptors, descriptor checksums, and common expressions. |
| 20 | + |
| 21 | +==Copyright== |
| 22 | + |
| 23 | +This BIP is licensed under the BSD 2-clause license. |
| 24 | + |
| 25 | +==Motivation== |
| 26 | + |
| 27 | +Bitcoin wallets have traditionally stored a set of keys which are later serialized and mutated to produce the output scripts that the wallet watches and the addresses it provides to users. |
| 28 | +Typically, backups have consisted solely of the private keys, nowadays primarily in the form of BIP 39 mnemonics. |
| 29 | +However, this backup solution is insufficient, especially since the introduction of Segregated Witness, which added new output types. |
| 30 | +Given just the private keys, it is not possible for restored wallets to know which kinds of output scripts and addresses to produce. |
| 31 | +This has led to incompatibilities between wallets when restoring a backup or exporting data for a watch-only wallet. |
| 32 | + |
| 33 | +Further complicating matters are BIP 32 derivation paths. |
| 34 | +Although BIPs 44, 49, and 84 have specified standard BIP 32 derivation paths for different output scripts and addresses, not all wallets support or use these derivation paths. |
| 35 | +The lack of derivation path information in these backups and exports leads to further incompatibilities between wallets. |
| 36 | + |
| 37 | +Current solutions to these issues have not been generic and can be viewed as layer violations. |
| 38 | +Solutions such as introducing different version bytes for extended key serialization are both a layer violation (key derivation should be separate from script type meaning) and specific to a particular derivation path and script type. |
| 39 | + |
| 40 | +Output Script Descriptors introduce a generic solution to these issues. |
| 41 | +Script types are specified explicitly through the use of Script Expressions. |
| 42 | +Key derivation paths are specified explicitly in Key Expressions. |
| 43 | +These allow for creating wallet backups and exports which specify the exact scripts, subscripts (redeemScript, witnessScript, etc.), and keys to produce. |
| 44 | +With the general structure specified in this BIP, new Script Expressions can be introduced as new script types are added. |
| 45 | +Lastly, the use of common terminology and existing standards allows Output Script Descriptors to be engineer-readable, so that the results can be understood at a glance. |
| 46 | + |
| 47 | +==Specification== |
| 48 | + |
| 49 | +Descriptors consist of several types of expressions. |
| 50 | +The top level expression is a <tt>SCRIPT</tt>. |
| 51 | +This expression may be followed by <tt>#CHECKSUM</tt>, where <tt>CHECKSUM</tt> is an 8 character alphanumeric descriptor checksum. |
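As a rough illustration of this top-level structure, the Python sketch below splits a descriptor into its <tt>SCRIPT</tt> part and an optional <tt>CHECKSUM</tt> part. The helper name <tt>split_descriptor</tt> is hypothetical, and it only checks the form of the checksum (8 characters from the checksum character set); verifying the checksum value is the job of <tt>descsum_check</tt> in the Checksum section.

```python
from typing import Optional, Tuple

# The checksum is written with the bech32 character set.
CHECKSUM_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def split_descriptor(desc: str) -> Tuple[str, Optional[str]]:
    """Split 'SCRIPT#CHECKSUM' or a bare 'SCRIPT' into (script, checksum).

    Hypothetical helper: only the form of the checksum is checked, not its value.
    """
    script, sep, checksum = desc.rpartition("#")
    if not sep:
        # No '#' at all: the whole string is the SCRIPT expression.
        return desc, None
    if len(checksum) != 8 or any(c not in CHECKSUM_CHARSET for c in checksum):
        raise ValueError("CHECKSUM must be 8 characters from the checksum character set")
    return script, checksum
```

A bare <tt>SCRIPT</tt> yields <tt>None</tt> for the checksum, so callers can decide separately whether a missing checksum is acceptable.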
| 52 | + |
| 53 | +===Script Expressions=== |
| 54 | + |
| 55 | +Script Expressions (denoted <tt>SCRIPT</tt>) are expressions which correspond directly with a Bitcoin script. |
| 56 | +These expressions are written as functions and take arguments. |
| 57 | +Each such expression has a script template which is filled in with its arguments. |
| 58 | +Expressions are written as a human-readable identifier string followed by the arguments enclosed in parentheses. |
| 59 | +The identifier string should be alphanumeric and may include underscores. |
| 60 | + |
| 61 | +The arguments to a script expression are defined by that expression itself. |
| 62 | +They could be a script expression, a key expression, or some other expression entirely. |
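Because arguments can themselves be nested expressions, a parser has to split the argument list only at top-level commas. The sketch below shows one way to do that; <tt>parse_call</tt> is a hypothetical name, and treating every comma outside <tt>()</tt>, <tt>[]</tt> and <tt>{}</tt> as an argument separator is an assumption made for illustration, since each script expression defines its own argument grammar.

```python
import re
from typing import List, Tuple

# Identifier (alphanumeric plus underscores) followed by a parenthesized body.
_CALL = re.compile(r"([A-Za-z0-9_]+)\((.*)\)")


def parse_call(expr: str) -> Tuple[str, List[str]]:
    """Split 'name(arg1,arg2,...)' into ('name', ['arg1', 'arg2', ...]).

    Hypothetical helper: commas nested inside (), [] or {} do not split arguments.
    """
    m = _CALL.fullmatch(expr)
    if m is None:
        raise ValueError(f"not a script expression: {expr!r}")
    name, body = m.group(1), m.group(2)
    args, depth, start = [], 0, 0
    for i, c in enumerate(body):
        if c in "([{":
            depth += 1
        elif c in ")]}":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced brackets")
        elif c == "," and depth == 0:
            args.append(body[start:i])
            start = i + 1
    if depth != 0:
        raise ValueError("unbalanced brackets")
    if body:
        args.append(body[start:])
    return name, args
```

Each returned argument is left as a string, so a script expression's own parser can decide whether to treat it as a nested <tt>SCRIPT</tt>, a <tt>KEY</tt>, or something else.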
| 63 | + |
| 64 | +===Key Expressions=== |
| 65 | + |
| 66 | +A common expression used as an argument to script expressions is the key expression (denoted <tt>KEY</tt>). |
| 67 | +These represent a public or private key and, optionally, information about the origin of that key. |
| 68 | +Key expressions can only be used as arguments to script expressions. |
| 69 | + |
| 70 | +Key expressions consist of: |
| 71 | +* Optionally, key origin information, consisting of: |
| 72 | +** An open bracket <tt>[</tt> |
| 73 | +** Exactly 8 hex characters for the fingerprint of the key where the derivation starts (see BIP 32 for details) |
| 74 | +** Followed by zero or more <tt>/NUM</tt> or <tt>/NUMh</tt> path elements to indicate the unhardened or hardened derivation steps between the fingerprint and the key that follows. |
| 75 | +** A closing bracket <tt>]</tt> |
| 76 | +* Followed by the actual key, which is either: |
| 77 | +** A hex encoded public key, which, depending on the script expression, may be either: |
| 78 | +*** 66 hex character string beginning with <tt>02</tt> or <tt>03</tt> representing a compressed public key |
| 79 | +*** 130 hex character string beginning with <tt>04</tt> representing an uncompressed public key |
| 80 | +** A [[https://en.bitcoin.it/wiki/Wallet_import_format|WIF]] encoded private key |
| 81 | +** <tt>xpub</tt> encoded extended public key or <tt>xprv</tt> encoded extended private key (as defined in BIP 32) |
| 82 | +*** Followed by zero or more <tt>/NUM</tt> or <tt>/NUMh</tt> path elements indicating BIP 32 derivation steps to be taken after the given extended key. |
| 83 | +*** Optionally followed by a single <tt>/*</tt> or <tt>/*h</tt> final step to denote all direct unhardened or hardened children. |
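The grammar above can be sketched as a single regular expression. The following Python example decomposes a <tt>KEY</tt> into its origin fingerprint, origin path, key, trailing derivation steps and optional wildcard. The names <tt>KeyExpr</tt> and <tt>parse_key_expr</tt> are hypothetical, and the sketch does not validate WIF or <tt>xpub</tt>/<tt>xprv</tt> encodings (Base58Check) or check that a public key is a valid curve point; as a simplifying assumption, any all-hex key is treated as a hex encoded public key.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One derivation step: /NUM followed by an optional hardened marker (h, H or ').
_STEP = r"/[0-9]+[hH']?"
_KEY_EXPR = re.compile(
    r"(?:\[(?P<fpr>[0-9a-fA-F]{8})(?P<origin>(?:" + _STEP + r")*)\])?"  # origin info
    r"(?P<key>[^/\[\]]+)"                                              # the key itself
    r"(?P<path>(?:" + _STEP + r")*)"                                   # steps after the key
    r"(?P<wild>/\*[hH']?)?"                                            # optional wildcard
)
_HEX = re.compile(r"[0-9a-fA-F]+")


@dataclass
class KeyExpr:
    fingerprint: Optional[str]  # origin fingerprint, if key origin info is present
    origin_path: List[str]      # steps between the fingerprint and the key
    key: str                    # hex pubkey, WIF, xpub or xprv (not validated here)
    path: List[str]             # steps taken after an extended key
    wildcard: Optional[str]     # "*", "*h", "*H", "*'" or None


def _steps(p: Optional[str]) -> List[str]:
    # "/44h/0h" -> ["44h", "0h"]; None or "" -> []
    return p.split("/")[1:] if p else []


def parse_key_expr(s: str) -> KeyExpr:
    m = _KEY_EXPR.fullmatch(s)
    if m is None:
        raise ValueError(f"malformed key expression: {s!r}")
    key = m["key"]
    if (m["path"] or m["wild"]) and not key.startswith(("xpub", "xprv")):
        raise ValueError("derivation steps may only follow an extended key")
    # Assumption: an all-hex key is a hex encoded public key.
    if _HEX.fullmatch(key):
        compressed = len(key) == 66 and key[:2] in ("02", "03")
        uncompressed = len(key) == 130 and key[:2] == "04"
        if not (compressed or uncompressed):
            raise ValueError("hex public key has the wrong length or prefix")
    wild = m["wild"][1:] if m["wild"] else None
    return KeyExpr(m["fpr"], _steps(m["origin"]), key, _steps(m["path"]), wild)
```

For example, <tt>[d34db33f/44h/0'/0H]xpubABC/1/*</tt> (with a placeholder instead of a real <tt>xpub</tt>) yields fingerprint <tt>d34db33f</tt>, origin path <tt>44h</tt>, <tt>0'</tt>, <tt>0H</tt>, derivation path <tt>1</tt> and wildcard <tt>*</tt>.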
| 84 | +
|
| 85 | +If the <tt>KEY</tt> is a BIP 32 extended key, before output scripts can be created, child keys must be derived using the derivation information that follows the extended key. |
| 86 | +When the final step is <tt>/*</tt> or <tt>/*'</tt>, an output script will be produced for every child key index. |
| 87 | +The derived key must not be serialized as an uncompressed public key. |
| 88 | +Script Expressions may have further requirements on how derived public keys are serialized for script creation. |
| 89 | + |
| 90 | +In the above specification, the hardened indicator <tt>h</tt> may be replaced with alternative hardened indicators of <tt>H</tt> or <tt>'</tt>. |
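To make the path notation concrete, the sketch below maps each <tt>NUM</tt> or <tt>NUMh</tt> element (also accepting <tt>H</tt> and <tt>'</tt>) to a BIP 32 child index, where hardened children are offset by 2<sup>31</sup>, and expands a final wildcard into concrete index paths. The function names are hypothetical, and because a wildcard stands for every child index, the <tt>start</tt>/<tt>count</tt> range is an assumption standing in for whatever range a wallet chooses to scan; the key derivation itself is not shown.

```python
import re
from typing import List

HARDENED = 0x80000000  # BIP 32: indices >= 2**31 denote hardened children
_STEP = re.compile(r"([0-9]+)([hH']?)")


def step_to_index(step: str) -> int:
    """Map "NUM", "NUMh", "NUMH" or "NUM'" to a BIP 32 child index."""
    m = _STEP.fullmatch(step)
    if m is None:
        raise ValueError(f"bad path element: {step!r}")
    num = int(m.group(1))
    if num >= HARDENED:
        raise ValueError("path element out of range")
    return num + HARDENED if m.group(2) else num


def expand_wildcard(steps: List[str], wildcard: str, start: int, count: int) -> List[List[int]]:
    """Index paths for children start .. start+count-1 of a final /* or /*h step.

    The range is chosen by the caller (hypothetical parameters).
    """
    if wildcard not in ("*", "*h", "*H", "*'"):
        raise ValueError(f"bad wildcard: {wildcard!r}")
    if start < 0 or start + count > HARDENED:
        raise ValueError("child range out of bounds")
    base = [step_to_index(s) for s in steps]
    offset = HARDENED if wildcard != "*" else 0
    return [base + [offset + i] for i in range(start, start + count)]
```

Each resulting index path would then be derived from the extended key to produce one output script per child.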
| 91 | + |
| 92 | +====Normalization of Key Expressions with Hardened Derivation==== |
| 93 | + |
| 94 | +When a descriptor is exported without private keys, it is necessary to do additional derivation to remove any intermediate hardened derivation steps for the exported descriptor to be useful. |
| 95 | +The exporter should derive the extended public key at the last hardened derivation step and use that extended public key as the key in the descriptor. |
| 96 | +The derivation steps that were taken to get to that key must be added to the previous key origin information. |
| 97 | +If there is no key origin information, then one must be added for the newly derived extended public key. |
| 98 | +If the final derivation is hardened, then it is not necessary to do additional derivation. |
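The bookkeeping part of this normalization can be sketched as follows. The example only rewrites paths: it reports which steps the exporter must derive with the private key and how the key origin information changes, while the actual BIP 32 public key derivation is out of scope. The names <tt>normalize</tt> and <tt>Normalized</tt> are hypothetical, and the caller is assumed to supply the fingerprint of the original extended key for the case where no origin information exists yet.

```python
from typing import List, NamedTuple, Optional


def is_hardened(step: str) -> bool:
    return step[-1:] in ("h", "H", "'")


class Normalized(NamedTuple):
    fingerprint: Optional[str]  # origin fingerprint for the exported key
    origin_path: List[str]      # key origin path after normalization
    derive_now: List[str]       # steps the exporter derives with the private key
    path: List[str]             # steps left after the exported extended public key


def normalize(origin_fpr: Optional[str], origin_path: List[str], key_fpr: Optional[str],
              steps: List[str], wildcard: Optional[str]) -> Normalized:
    all_steps = steps + ([wildcard] if wildcard else [])
    if not all_steps or is_hardened(all_steps[-1]):
        # Nothing after the key, or the final derivation is hardened: no change.
        return Normalized(origin_fpr, origin_path, [], steps)
    hardened_at = [i for i, s in enumerate(steps) if is_hardened(s)]
    if not hardened_at:
        # Only unhardened steps remain; the public key alone can derive them.
        return Normalized(origin_fpr, origin_path, [], steps)
    cut = hardened_at[-1] + 1  # split right after the last hardened step
    if origin_fpr is None:
        # No origin info yet: start one at the original extended key.
        if key_fpr is None:
            raise ValueError("the original key's fingerprint is needed to add origin info")
        origin_fpr, origin_path = key_fpr, []
    return Normalized(origin_fpr, origin_path + steps[:cut], steps[:cut], steps[cut:])
```

Under these assumptions, <tt>[d34db33f/44h/0h]XPRV/0h/1/*</tt> would be exported as <tt>[d34db33f/44h/0h/0h]XPUB/1/*</tt>, where <tt>XPRV</tt> stands for the original extended private key and <tt>XPUB</tt> for the extended public key derived at the <tt>0h</tt> step.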
| 99 | + |
| 100 | +===Character Set=== |
| 101 | + |
| 102 | +The expressions used in descriptors must only contain characters within this character set so that the descriptor checksum will work. |
| 103 | + |
| 104 | +The allowed characters are: |
| 105 | +<pre> |
| 106 | +0123456789()[],'/*abcdefgh@:$%{} |
| 107 | +IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~ |
| 108 | +ijklmnopqrstuvwxyzABCDEFGH`#"\<space> |
| 109 | +</pre> |
| 110 | +Note that <tt><space></tt> on the last line is a space character. |
| 111 | + |
| 112 | +This character set is written as 3 groups (of 32, 32, and 31 characters) in this specific order so that the checksum below can identify more errors. |
| 113 | +The first group contains the most common "unprotected" characters (i.e. things such as hex and keypaths that do not already have their own checksums). |
| 114 | +Case errors cause an offset that is a multiple of 32, and as many alphabetic characters as possible are placed in the same group, subject to the previous restrictions. |
| 115 | + |
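The group layout can be checked directly. In the sketch below, each allowed character's index in the set is split into a group number (index divided by 32) and a position within the group (index modulo 32); the helper names <tt>group_and_position</tt> and <tt>is_allowed</tt> are hypothetical.

```python
INPUT_CHARSET = (
    "0123456789()[],'/*abcdefgh@:$%{}"   # group 0
    "IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~"   # group 1
    "ijklmnopqrstuvwxyzABCDEFGH`#\"\\ "  # group 2 (31 characters)
)


def group_and_position(c: str):
    """Return (group number, position within group) for one allowed character."""
    if len(c) != 1 or c not in INPUT_CHARSET:
        raise ValueError(f"character not allowed in descriptors: {c!r}")
    v = INPUT_CHARSET.index(c)
    return v >> 5, v & 31


def is_allowed(desc: str) -> bool:
    """True if every character of desc is in the descriptor character set."""
    return all(c in INPUT_CHARSET for c in desc)
```

Every letter has the same position in its lowercase and uppercase forms (for example, <tt>a</tt> is group 0, position 18 and <tt>A</tt> is group 2, position 18), so a case error changes only the group number.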
| 116 | +===Checksum=== |
| 117 | + |
| 118 | +Following the top level script expression is a single octothorpe (<tt>#</tt>) followed by the 8 character checksum. |
| 119 | +The checksum is an error correcting checksum similar to bech32. |
| 120 | + |
| 121 | +The checksum has the following properties: |
| 122 | +* Mistakes in a descriptor string are measured in "symbol errors". The higher the number of symbol errors, the harder it is to detect: |
| 123 | +** An error substituting a character from <tt>0123456789()[],'/*abcdefgh@:$%{}</tt> for another in that set always counts as 1 symbol error. |
| 124 | +*** Note that hex encoded keys are covered by these characters. Extended keys (<tt>xpub</tt> and <tt>xprv</tt>) use other characters too, but also have their own checksum mechanism. |
| 125 | +*** <tt>SCRIPT</tt> expression function names use other characters, but mistakes in these would generally result in an unparsable descriptor. |
| 126 | +** A case error always counts as 1 symbol error. |
| 127 | +** Any other 1 character substitution error counts as 1 or 2 symbol errors. |
| 128 | +* Any 1 symbol error is always detected. |
| 129 | +* Any 2 or 3 symbol error in a descriptor of up to 49154 characters is always detected. |
| 130 | +* Any 4 symbol error in a descriptor of up to 507 characters is always detected. |
| 131 | +* Any 5 symbol error in a descriptor of up to 77 characters is always detected. |
| 132 | +* It is optimized to minimize the chance that a 5 symbol error in a descriptor of up to 387 characters goes undetected. |
| 133 | +* Random errors have a chance of 1 in 2<sup>40</sup> of being undetected. |
| 134 | +
|
| 135 | +The checksum itself uses the same character set as bech32: <tt>qpzry9x8gf2tvdw0s3jn54khce6mua7l</tt> |
| 136 | + |
| 137 | +Valid descriptor strings with a checksum must pass the criteria for validity specified by the Python3 code snippet below. |
| 138 | +The function <tt>descsum_check</tt> must return true when its argument <tt>s</tt> is a descriptor of the form <tt>SCRIPT#CHECKSUM</tt>. |
| 139 | + |
| 140 | +<pre> |
| 141 | +INPUT_CHARSET = "0123456789()[],'/*abcdefgh@:$%{}IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~ijklmnopqrstuvwxyzABCDEFGH`#\"\\ " |
| 142 | +CHECKSUM_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l" |
| 143 | +GENERATOR = [0xf5dee51989, 0xa9fdca3312, 0x1bab10e32d, 0x3706b1677a, 0x644d626ffd] |
| 144 | + |
| 145 | +def descsum_polymod(symbols): |
| 146 | +    """Internal function that computes the descriptor checksum.""" |
| 147 | +    chk = 1 |
| 148 | +    for value in symbols: |
| 149 | +        top = chk >> 35 |
| 150 | +        chk = (chk & 0x7ffffffff) << 5 ^ value |
| 151 | +        for i in range(5): |
| 152 | +            chk ^= GENERATOR[i] if ((top >> i) & 1) else 0 |
| 153 | +    return chk |
| 154 | + |
| 155 | +def descsum_expand(s): |
| 156 | +    """Internal function that does the character to symbol expansion""" |
| 157 | +    groups = [] |
| 158 | +    symbols = [] |
| 159 | +    for c in s: |
| 160 | +        if c not in INPUT_CHARSET: |
| 161 | +            return None |
| 162 | +        v = INPUT_CHARSET.find(c) |
| 163 | +        symbols.append(v & 31) |
| 164 | +        groups.append(v >> 5) |
| 165 | +        if len(groups) == 3: |
| 166 | +            symbols.append(groups[0] * 9 + groups[1] * 3 + groups[2]) |
| 167 | +            groups = [] |
| 168 | +    if len(groups) == 1: |
| 169 | +        symbols.append(groups[0]) |
| 170 | +    elif len(groups) == 2: |
| 171 | +        symbols.append(groups[0] * 3 + groups[1]) |
| 172 | +    return symbols |
| 173 | + |
| 174 | +def descsum_check(s): |
| 175 | +    """Verify that the checksum is correct in a descriptor""" |
| 176 | +    if len(s) < 9 or s[-9] != '#': |
| 177 | +        return False |
| 178 | +    if not all(x in CHECKSUM_CHARSET for x in s[-8:]) or descsum_expand(s[:-9]) is None: |
| 179 | +        return False |
| 180 | +    symbols = descsum_expand(s[:-9]) + [CHECKSUM_CHARSET.find(x) for x in s[-8:]] |
| 181 | +    return descsum_polymod(symbols) == 1 |
| 182 | +</pre> |
| 183 | + |
| 184 | +This implements a BCH code that has the properties described above. |
| 185 | +The entire descriptor string is first processed into an array of symbols. |
| 186 | +The symbol for each character is its position within its group. |
| 187 | +After every 3rd symbol, a 4th symbol is inserted which represents the group numbers combined together. |
| 188 | +This means that a change that only affects the position within a group, or only a group number change, will only affect a single symbol. |
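To see how this expansion yields the symbol error counts listed earlier, the sketch below expands two equal-length strings and counts the symbols that differ. The <tt>expand</tt> function follows <tt>descsum_expand</tt> but raises <tt>ValueError</tt> for disallowed characters instead of returning <tt>None</tt>; both <tt>expand</tt> and <tt>symbol_errors</tt> are hypothetical names, and only substitution errors are considered.

```python
INPUT_CHARSET = (
    "0123456789()[],'/*abcdefgh@:$%{}"
    "IJKLMNOPQRSTUVWXYZ&+-.;<=>?!^_|~"
    "ijklmnopqrstuvwxyzABCDEFGH`#\"\\ "
)


def expand(s: str) -> list:
    """Character-to-symbol expansion following descsum_expand (raises on bad input)."""
    symbols, groups = [], []
    for c in s:
        v = INPUT_CHARSET.index(c)  # ValueError if c is not an allowed character
        symbols.append(v & 31)      # position within the group
        groups.append(v >> 5)       # group number: 0, 1 or 2
        if len(groups) == 3:
            # After every 3rd character, one symbol encodes the three group numbers.
            symbols.append(groups[0] * 9 + groups[1] * 3 + groups[2])
            groups = []
    if len(groups) == 1:
        symbols.append(groups[0])
    elif len(groups) == 2:
        symbols.append(groups[0] * 3 + groups[1])
    return symbols


def symbol_errors(a: str, b: str) -> int:
    """Number of symbols that differ between two equal-length descriptor strings."""
    if len(a) != len(b):
        raise ValueError("only substitution errors (equal lengths) are compared")
    return sum(x != y for x, y in zip(expand(a), expand(b)))
```

Changing <tt>pk(a)</tt> to <tt>pk(A)</tt> (a case error) or to <tt>pk(b)</tt> (same group) gives 1 symbol error, while changing it to <tt>pk(I)</tt> alters both the position and the group number and gives 2.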
| 189 | + |
| 190 | +To construct a valid checksum given a script expression, the code below can be used: |
| 191 | + |
| 192 | +<pre> |
| 193 | +def descsum_create(s): |
| 194 | +    """Add a checksum to a descriptor without one""" |
| 195 | +    symbols = descsum_expand(s) + [0, 0, 0, 0, 0, 0, 0, 0] |
| 196 | +    checksum = descsum_polymod(symbols) ^ 1 |
| 197 | +    return s + '#' + ''.join(CHECKSUM_CHARSET[(checksum >> (5 * (7 - i))) & 31] for i in range(8)) |
| 198 | +
|
| 199 | +</pre> |
| 200 | +
|
| 201 | +==Backwards Compatibility== |
| 202 | +
|
| 203 | +Output script descriptors are an entirely new language which is not compatible with any existing software. |
| 204 | +However, many components of the expressions reuse encodings and serializations defined by previous BIPs. |
| 205 | +
|
| 206 | +Output script descriptors are designed for future extension with further fragment types and new script expressions. |
| 207 | +These will be specified in additional BIPs. |
| 208 | +
|
| 209 | +==Reference Implementation== |
| 210 | +
|
| 211 | +Descriptors have been implemented in Bitcoin Core since version 0.17. |
| 212 | +
|
| 213 | +==Appendix A: Index of Expressions== |
| 214 | +
|
| 215 | +Future BIPs may specify additional types of expressions. |
| 216 | +All available expression types are listed in this table. |
| 217 | +
|
| 218 | +{| |
| 219 | +! Name |
| 220 | +! Denoted As |
| 221 | +! BIP |
| 222 | +|- |
| 223 | +| Script |
| 224 | +| <tt>SCRIPT</tt> |
| 225 | +| 380 |
| 226 | +|- |
| 227 | +| Key |
| 228 | +| <tt>KEY</tt> |
| 229 | +| 380 |
| 230 | +|- |
| 231 | +| Tree |
| 232 | +| <tt>TREE</tt> |
| 233 | +| [[bip-0386.mediawiki|386]] |
| 234 | +|} |
| 235 | +
|
| 236 | +==Appendix B: Index of Script Expressions== |
| 237 | +
|
| 238 | +Script expressions will be specified in additional BIPs. |
| 239 | +This table lists all available script expressions and the BIPs specifying them. |
| 240 | +
|
| 241 | +{| |
| 242 | +! Expression |
| 243 | +! BIP |
| 244 | +|- |
| 245 | +| <tt>pk(KEY)</tt> |
| 246 | +| [[bip-0381.mediawiki|381]] |
| 247 | +|- |
| 248 | +| <tt>pkh(KEY)</tt> |
| 249 | +| [[bip-0381.mediawiki|381]] |
| 250 | +|- |
| 251 | +| <tt>sh(SCRIPT)</tt> |
| 252 | +| [[bip-0381.mediawiki|381]] |
| 253 | +|- |
| 254 | +| <tt>wpkh(KEY)</tt> |
| 255 | +| [[bip-0382.mediawiki|382]] |
| 256 | +|- |
| 257 | +| <tt>wsh(SCRIPT)</tt> |
| 258 | +| [[bip-0382.mediawiki|382]] |
| 259 | +|- |
| 260 | +| <tt>multi(NUM, KEY, ..., KEY)</tt> |
| 261 | +| [[bip-0383.mediawiki|383]] |
| 262 | +|- |
| 263 | +| <tt>sortedmulti(NUM, KEY, ..., KEY)</tt> |
| 264 | +| [[bip-0383.mediawiki|383]] |
| 265 | +|- |
| 266 | +| <tt>combo(KEY)</tt> |
| 267 | +| [[bip-0384.mediawiki|384]] |
| 268 | +|- |
| 269 | +| <tt>raw(HEX)</tt> |
| 270 | +| [[bip-0385.mediawiki|385]] |
| 271 | +|- |
| 272 | +| <tt>addr(ADDR)</tt> |
| 273 | +| [[bip-0385.mediawiki|385]] |
| 274 | +|- |
| 275 | +| <tt>tr(KEY)</tt>, <tt>tr(KEY, TREE)</tt> |
| 276 | +| [[bip-0386.mediawiki|386]] |
| 277 | +|} |