Commit 43e71f7

Merge bitcoin/bitcoin#27432: contrib: add tool to convert compact-serialized UTXO set to SQLite database
4080b66 test: add test for utxo-to-sqlite conversion script (Sebastian Falbesoner)
ec99ed7 contrib: add tool to convert compact-serialized UTXO set to SQLite database (Sebastian Falbesoner)

Pull request description:

## Problem description

There is demand from users to get the UTXO set in the form of a SQLite database (#24628). Bitcoin Core currently only supports dumping the UTXO set in a binary _compact-serialized_ format, which was crafted specifically for AssumeUTXO snapshots (see PR #16899), with the primary goal of being as compact as possible.

Previous PRs tried to extend the `dumptxoutset` RPC with new formats, either in human-readable form (e.g. #18689, #24202) or, most recently, directly as a SQLite database (#24952). Neither approach is optimal: because of the huge size of the ever-growing UTXO set, with already more than 80 million entries on mainnet, human-readable formats are practically useless, and very likely one of the first steps would be to put them in some form of database anyway. Directly adding SQLite3 dumping support, on the other hand, introduces an additional dependency to the non-wallet part of bitcoind and the risk of increased maintenance burden (see e.g. bitcoin/bitcoin#24952 (comment), bitcoin/bitcoin#24628 (comment)).

## Proposed solution

This PR follows the "external tooling" route by adding a simple Python script that achieves the same goal in a two-step process (first create the compact-serialized UTXO set via `dumptxoutset`, then convert it to SQLite via the new script).

Executive summary:
- single file, no extra dependencies (sqlite3 is included in Python's standard library [1])
- ~150 LOC, mostly deserialization/decompression routines ported from the Core codebase and (probably the most difficult part) a little elliptic curve / finite field math to decompress pubkeys (essentially solving the secp256k1 curve equation y^2 = x^3 + 7 for y given x, respecting the proper parity as indicated by the compression tag)
- creates a database with only one table `utxos` with the following schema: ```(txid TEXT, vout INT, value INT, coinbase INT, height INT, scriptpubkey TEXT)```
- the resulting file has roughly 2x the size of the compact-serialized UTXO set (this is mostly due to encoding txids and scriptpubkeys as hex strings rather than bytes)

[1] note that there are some rare cases of operating systems like FreeBSD, though, where the sqlite3 module has to be installed explicitly (see #26819)

A functional test is also added that creates UTXO set entries with various output script types (standard and also non-standard, e.g. large scripts) and verifies that the UTXO sets of both formats match by comparing the corresponding MuHashes. One MuHash is supplied by the bitcoind instance via `gettxoutsetinfo muhash`, the other is calculated in the test by reading back the created SQLite database entries and hashing them with the test framework's `MuHash3072` module.

## Manual test instructions

I'd suggest doing manual tests by comparing MuHashes as well. For that, I wrote a Go tool some time ago that calculates the MuHash of a SQLite database in the created format (I tried to write a similar tool in Python, but it's painfully slow).

```
$ [run bitcoind instance with -coinstatsindex]
$ ./src/bitcoin-cli dumptxoutset ~/utxos.dat
$ ./src/bitcoin-cli gettxoutsetinfo muhash <block height returned in previous call>
(outputs MuHash calculated from node)

$ ./contrib/utxo-tools/utxo_to_sqlite.py ~/utxos.dat ~/utxos.sqlite
$ git clone https://github.com/theStack/utxo_dump_tools
$ cd utxo_dump_tools/calc_utxo_hash
$ go run calc_utxo_hash.go ~/utxos.sqlite
(outputs MuHash calculated from the SQLite UTXO set)

=> verify that both MuHashes are equal
```

For a demonstration of what can be done with the resulting database, see bitcoin/bitcoin#24952 (review) for some example queries.

Thanks go to LarryRuane, who gave me the idea of rewriting this script in Python and adding it to `contrib`.

ACKs for top commit:
  ajtowns: ACK 4080b66 - light review
  achow101: ACK 4080b66
  romanz: tACK 4080b66 on signet (using [calc_utxo_hash](https://github.com/theStack/utxo_dump_tools/blob/8981aa3e85efac046f0f3b6a1f99d3f4a273cdd1/calc_utxo_hash/calc_utxo_hash.go)):
  tdb3: ACK 4080b66

Tree-SHA512: be8aa0369a28c8421a3ccdf1402e106563dd07c082269707311ca584d1c4c8c7b97d48c4fcd344696a36e7ab8cdb64a1d0ef9a192a15cff6d470baf21e46ee7b
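The pubkey decompression step mentioned in the executive summary can be illustrated in a few lines. This is a minimal sketch rather than the script's own code: the helper name `lift_x` is an assumption made for this example, and the secp256k1 generator point is used only because its coordinates are public constants that give a known answer to check against.

```python
# Sketch: recover the y coordinate of a compressed secp256k1 public key.
# `lift_x` is a hypothetical helper name chosen for this illustration.
P = 2**256 - 2**32 - 977  # secp256k1 field prime

# Coordinates of the secp256k1 generator point G (public curve constants).
GX = 0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798
GY = 0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8


def lift_x(x, want_odd):
    """Return y with y^2 = x^3 + 7 (mod P) and the requested parity."""
    rhs = (pow(x, 3, P) + 7) % P
    # Because P % 4 == 3, a square root (if one exists) is rhs^((P+1)/4) mod P.
    y = pow(rhs, (P + 1) // 4, P)
    if pow(y, 2, P) != rhs:
        raise ValueError("x is not the x coordinate of a curve point")
    if (y & 1) != int(want_odd):
        y = P - y  # the other square root has the opposite parity
    return y


# GY is even, so the compressed encoding of G carries the tag byte 0x02.
y_even = lift_x(GX, want_odd=False)
y_odd = lift_x(GX, want_odd=True)
print(hex(y_even))
```

The two roots `y` and `P - y` always differ in parity because `P` is odd, which is why a single tag byte (2 for even, 3 for odd) is enough to select the right one.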
2 parents e53310c + 4080b66 commit 43e71f7

4 files changed: +321 -0 lines changed
contrib/README.md

Lines changed: 8 additions & 0 deletions
@@ -43,3 +43,11 @@ Command Line Tools
 
 ### [Completions](/contrib/completions) ###
 Shell completions for bash and fish.
+
+UTXO Set Tools
+--------------
+
+### [UTXO-to-SQLite](/contrib/utxo-tools/utxo_to_sqlite.py) ###
+This script converts a compact-serialized UTXO set (as generated by Bitcoin Core with `dumptxoutset`)
+to a SQLite3 database. For more details like e.g. the created table name and schema, refer to the
+module docstring on top of the script, which is also contained in the command's `--help` output.
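
Once a database has been created, it can be queried with any SQLite client. The sketch below is only an illustration: it builds an in-memory table with the schema quoted in the script's docstring and fills it with a few made-up rows (the txids, amounts and scripts are placeholders, not real chain data). With a real conversion result, the `":memory:"` argument would be replaced by the path of the generated file.

```python
import sqlite3

# In-memory stand-in for a database produced by the conversion script.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE utxos(txid TEXT, vout INT, value INT, "
            "coinbase INT, height INT, scriptpubkey TEXT)")

# Hypothetical sample rows; `value` is an amount in satoshis.
rows = [
    ("aa" * 32, 0, 5_000_000_000, 1, 100, "51"),
    ("bb" * 32, 0, 70_000, 0, 150, "0014" + "11" * 20),
    ("bb" * 32, 1, 30_000, 0, 150, "76a914" + "22" * 20 + "88ac"),
]
con.executemany("INSERT INTO utxos VALUES(?, ?, ?, ?, ?, ?)", rows)

# Total amount held in the set.
(total_sats,) = con.execute("SELECT SUM(value) FROM utxos").fetchone()

# Number of unspent coinbase outputs.
(coinbase_count,) = con.execute(
    "SELECT COUNT(*) FROM utxos WHERE coinbase = 1").fetchone()

# P2WPKH outputs: OP_0 plus a 20-byte push, i.e. 22 bytes or 44 hex characters.
p2wpkh = con.execute(
    "SELECT txid, vout FROM utxos "
    "WHERE scriptpubkey LIKE '0014%' AND length(scriptpubkey) = 44").fetchall()

# Number of UTXOs created per block height.
per_height = con.execute(
    "SELECT height, COUNT(*) FROM utxos GROUP BY height ORDER BY height").fetchall()

con.close()
print(total_sats, coinbase_count, p2wpkh, per_height)
```

Because txids and scripts are stored as hex strings, prefix matches with `LIKE` are a simple way to filter by script type, at the cost of the larger file size noted in the pull request description.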

contrib/utxo-tools/utxo_to_sqlite.py

Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
#!/usr/bin/env python3
# Copyright (c) 2024-present The Bitcoin Core developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.
"""Tool to convert a compact-serialized UTXO set to a SQLite3 database.

The input UTXO set can be generated by Bitcoin Core with the `dumptxoutset` RPC:
$ bitcoin-cli dumptxoutset ~/utxos.dat

The created database contains a table `utxos` with the following schema:
(txid TEXT, vout INT, value INT, coinbase INT, height INT, scriptpubkey TEXT)
"""
import argparse
import os
import sqlite3
import sys
import time


UTXO_DUMP_MAGIC = b'utxo\xff'
UTXO_DUMP_VERSION = 2
NET_MAGIC_BYTES = {
    b"\xf9\xbe\xb4\xd9": "Mainnet",
    b"\x0a\x03\xcf\x40": "Signet",
    b"\x0b\x11\x09\x07": "Testnet3",
    b"\x1c\x16\x3f\x28": "Testnet4",
    b"\xfa\xbf\xb5\xda": "Regtest",
}


def read_varint(f):
    """Equivalent of `ReadVarInt()` (see serialization module)."""
    n = 0
    while True:
        dat = f.read(1)[0]
        n = (n << 7) | (dat & 0x7f)
        if (dat & 0x80) > 0:
            n += 1
        else:
            return n


def read_compactsize(f):
    """Equivalent of `ReadCompactSize()` (see serialization module)."""
    n = f.read(1)[0]
    if n == 253:
        n = int.from_bytes(f.read(2), "little")
    elif n == 254:
        n = int.from_bytes(f.read(4), "little")
    elif n == 255:
        n = int.from_bytes(f.read(8), "little")
    return n


def decompress_amount(x):
    """Equivalent of `DecompressAmount()` (see compressor module)."""
    if x == 0:
        return 0
    x -= 1
    e = x % 10
    x //= 10
    n = 0
    if e < 9:
        d = (x % 9) + 1
        x //= 9
        n = x * 10 + d
    else:
        n = x + 1
    while e > 0:
        n *= 10
        e -= 1
    return n


def decompress_script(f):
    """Equivalent of `DecompressScript()` (see compressor module)."""
    size = read_varint(f)  # sizes 0-5 encode compressed script types
    if size == 0:  # P2PKH
        return bytes([0x76, 0xa9, 20]) + f.read(20) + bytes([0x88, 0xac])
    elif size == 1:  # P2SH
        return bytes([0xa9, 20]) + f.read(20) + bytes([0x87])
    elif size in (2, 3):  # P2PK (compressed)
        return bytes([33, size]) + f.read(32) + bytes([0xac])
    elif size in (4, 5):  # P2PK (uncompressed)
        compressed_pubkey = bytes([size - 2]) + f.read(32)
        return bytes([65]) + decompress_pubkey(compressed_pubkey) + bytes([0xac])
    else:  # others (bare multisig, segwit etc.)
        size -= 6
        assert size <= 10000, f"too long script with size {size}"
        return f.read(size)


def decompress_pubkey(compressed_pubkey):
    """Decompress pubkey by calculating y = sqrt(x^3 + 7) % p
       (see functions `secp256k1_eckey_pubkey_parse` and `secp256k1_ge_set_xo_var`).
    """
    P = 2**256 - 2**32 - 977  # secp256k1 field size
    assert len(compressed_pubkey) == 33 and compressed_pubkey[0] in (2, 3)
    x = int.from_bytes(compressed_pubkey[1:], 'big')
    rhs = (x**3 + 7) % P
    y = pow(rhs, (P + 1)//4, P)  # get sqrt using Tonelli-Shanks algorithm (for p % 4 = 3)
    assert pow(y, 2, P) == rhs, f"pubkey is not on curve ({compressed_pubkey.hex()})"
    tag_is_odd = compressed_pubkey[0] == 3
    y_is_odd = (y & 1) == 1
    if tag_is_odd != y_is_odd:  # fix parity (even/odd) if necessary
        y = P - y
    return bytes([4]) + x.to_bytes(32, 'big') + y.to_bytes(32, 'big')


def main():
    parser = argparse.ArgumentParser(description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter)
    parser.add_argument('infile', help='filename of compact-serialized UTXO set (input)')
    parser.add_argument('outfile', help='filename of created SQLite3 database (output)')
    parser.add_argument('-v', '--verbose', action='store_true', help='show details about each UTXO')
    args = parser.parse_args()

    if not os.path.exists(args.infile):
        print(f"Error: provided input file '{args.infile}' doesn't exist.")
        sys.exit(1)

    if os.path.exists(args.outfile):
        print(f"Error: provided output file '{args.outfile}' already exists.")
        sys.exit(1)

    # create database table
    con = sqlite3.connect(args.outfile)
    con.execute("CREATE TABLE utxos(txid TEXT, vout INT, value INT, coinbase INT, height INT, scriptpubkey TEXT)")

    # read metadata (magic bytes, version, network magic, block height, block hash, UTXO count)
    f = open(args.infile, 'rb')
    magic_bytes = f.read(5)
    version = int.from_bytes(f.read(2), 'little')
    network_magic = f.read(4)
    block_hash = f.read(32)
    num_utxos = int.from_bytes(f.read(8), 'little')
    if magic_bytes != UTXO_DUMP_MAGIC:
        print(f"Error: provided input file '{args.infile}' is not an UTXO dump.")
        sys.exit(1)
    if version != UTXO_DUMP_VERSION:
        print(f"Error: provided input file '{args.infile}' has unknown UTXO dump version {version} "
              f"(only version {UTXO_DUMP_VERSION} supported)")
        sys.exit(1)
    network_string = NET_MAGIC_BYTES.get(network_magic, f"unknown network ({network_magic.hex()})")
    print(f"UTXO Snapshot for {network_string} at block hash "
          f"{block_hash[::-1].hex()[:32]}..., contains {num_utxos} coins")

    start_time = time.time()
    write_batch = []
    coins_per_hash_left = 0
    prevout_hash = None
    max_height = 0

    for coin_idx in range(1, num_utxos+1):
        # read key (COutPoint)
        if coins_per_hash_left == 0:  # read next prevout hash
            prevout_hash = f.read(32)[::-1].hex()
            coins_per_hash_left = read_compactsize(f)
        prevout_index = read_compactsize(f)
        # read value (Coin)
        code = read_varint(f)
        height = code >> 1
        is_coinbase = code & 1
        amount = decompress_amount(read_varint(f))
        scriptpubkey = decompress_script(f).hex()
        write_batch.append((prevout_hash, prevout_index, amount, is_coinbase, height, scriptpubkey))
        if height > max_height:
            max_height = height
        coins_per_hash_left -= 1

        if args.verbose:
            print(f"Coin {coin_idx}/{num_utxos}:")
            print(f" prevout = {prevout_hash}:{prevout_index}")
            print(f" amount = {amount}, height = {height}, coinbase = {is_coinbase}")
            print(f" scriptPubKey = {scriptpubkey}\n")

        if coin_idx % (16*1024) == 0 or coin_idx == num_utxos:
            # write utxo batch to database
            con.executemany("INSERT INTO utxos VALUES(?, ?, ?, ?, ?, ?)", write_batch)
            con.commit()
            write_batch.clear()

        if coin_idx % (1024*1024) == 0:
            elapsed = time.time() - start_time
            print(f"{coin_idx} coins converted [{coin_idx/num_utxos*100:.2f}%], " +
                  f"{elapsed:.3f}s passed since start")
    con.close()

    print(f"TOTAL: {num_utxos} coins written to {args.outfile}, snapshot height is {max_height}.")
    if f.read(1) != b'':  # EOF should be reached by now
        print(f"WARNING: input file {args.infile} has not reached EOF yet!")
        sys.exit(1)


if __name__ == '__main__':
    main()
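
The two decoding helpers that every coin record passes through, `read_varint` and `decompress_amount`, are easier to follow with concrete values. The sketch below repeats both functions from the script above so that it runs on its own, then feeds them a handful of hand-picked encodings. The sample byte strings and the use of `io.BytesIO` as a stand-in for the dump file are choices made for this illustration.

```python
import io


def read_varint(f):
    """Decode Bitcoin Core's base-128 VarInt (distinct from CompactSize)."""
    n = 0
    while True:
        dat = f.read(1)[0]
        n = (n << 7) | (dat & 0x7f)
        if dat & 0x80:
            n += 1  # continuation byte; the +1 rules out redundant encodings
        else:
            return n


def decompress_amount(x):
    """Undo the amount compression: strip the exponent, then rescale."""
    if x == 0:
        return 0
    x -= 1
    e = x % 10  # number of trailing decimal zeros (capped at 9)
    x //= 10
    if e < 9:
        d = (x % 9) + 1  # last non-zero digit, 1..9
        x //= 9
        n = x * 10 + d
    else:
        n = x + 1
    return n * 10**e


# VarInt encodings and the integers they decode to.
varint_examples = {
    bytes([0x00]): 0,
    bytes([0x7f]): 127,
    bytes([0x80, 0x00]): 128,
    bytes([0x80, 0x7f]): 255,
    bytes([0x81, 0x00]): 256,
    bytes([0xfe, 0x7f]): 16383,
    bytes([0xff, 0x00]): 16384,
}
decoded_varints = {raw: read_varint(io.BytesIO(raw)) for raw in varint_examples}

# Compressed amounts and the satoshi values they stand for.
amount_examples = {0: 0, 1: 1, 7: 1_000_000, 9: 100_000_000, 50: 5_000_000_000}
decoded_amounts = {x: decompress_amount(x) for x in amount_examples}

# A coin's "code" VarInt packs the height and the coinbase flag together.
code = read_varint(io.BytesIO(bytes([0x80, 0x49])))
height, is_coinbase = code >> 1, code & 1
print(decoded_amounts, height, is_coinbase)
```

Round amounts compress especially well: 1 BTC (100,000,000 satoshis) is stored as the single value 9, and 50 BTC as 50, each of which fits in one VarInt byte.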

test/functional/test_runner.py

Lines changed: 1 addition & 0 deletions
@@ -289,6 +289,7 @@
     'mempool_package_onemore.py',
     'mempool_package_limits.py',
     'mempool_package_rbf.py',
+    'tool_utxo_to_sqlite.py',
     'feature_versionbits_warning.py',
     'feature_blocksxor.py',
     'rpc_preciousblock.py',

test/functional/tool_utxo_to_sqlite.py

Lines changed: 117 additions & 0 deletions
@@ -0,0 +1,117 @@
#!/usr/bin/env python3
# Copyright (c) 2024-present The Bitcoin Core developers
# Distributed under the MIT software license, see the accompanying
# file COPYING or http://www.opensource.org/licenses/mit-license.php.
"""Test utxo-to-sqlite conversion tool"""
import os.path
try:
    import sqlite3
except ImportError:
    pass
import subprocess
import sys

from test_framework.key import ECKey
from test_framework.messages import (
    COutPoint,
    CTxOut,
)
from test_framework.crypto.muhash import MuHash3072
from test_framework.script import (
    CScript,
    CScriptOp,
)
from test_framework.script_util import (
    PAY_TO_ANCHOR,
    key_to_p2pk_script,
    key_to_p2pkh_script,
    key_to_p2wpkh_script,
    keys_to_multisig_script,
    output_key_to_p2tr_script,
    script_to_p2sh_script,
    script_to_p2wsh_script,
)
from test_framework.test_framework import BitcoinTestFramework
from test_framework.util import (
    assert_equal,
)
from test_framework.wallet import MiniWallet


def calculate_muhash_from_sqlite_utxos(filename):
    muhash = MuHash3072()
    con = sqlite3.connect(filename)
    cur = con.cursor()
    for (txid_hex, vout, value, coinbase, height, spk_hex) in cur.execute("SELECT * FROM utxos"):
        # serialize UTXO for MuHash (see function `TxOutSer` in the coinstats module)
        utxo_ser = COutPoint(int(txid_hex, 16), vout).serialize()
        utxo_ser += (height * 2 + coinbase).to_bytes(4, 'little')
        utxo_ser += CTxOut(value, bytes.fromhex(spk_hex)).serialize()
        muhash.insert(utxo_ser)
    con.close()
    return muhash.digest()[::-1].hex()


class UtxoToSqliteTest(BitcoinTestFramework):
    def set_test_params(self):
        self.num_nodes = 1
        # we want to create some UTXOs with non-standard output scripts
        self.extra_args = [['-acceptnonstdtxn=1']]

    def skip_test_if_missing_module(self):
        self.skip_if_no_py_sqlite3()

    def run_test(self):
        node = self.nodes[0]
        wallet = MiniWallet(node)
        key = ECKey()

        self.log.info('Create UTXOs with various output script types')
        for i in range(1, 10+1):
            key.generate(compressed=False)
            uncompressed_pubkey = key.get_pubkey().get_bytes()
            key.generate(compressed=True)
            pubkey = key.get_pubkey().get_bytes()

            # add output scripts for compressed script type 0 (P2PKH), type 1 (P2SH),
            # types 2-3 (P2PK compressed), types 4-5 (P2PK uncompressed) and
            # for uncompressed scripts (bare multisig, segwit, etc.)
            output_scripts = (
                key_to_p2pkh_script(pubkey),
                script_to_p2sh_script(key_to_p2pkh_script(pubkey)),
                key_to_p2pk_script(pubkey),
                key_to_p2pk_script(uncompressed_pubkey),

                keys_to_multisig_script([pubkey]*i),
                keys_to_multisig_script([uncompressed_pubkey]*i),
                key_to_p2wpkh_script(pubkey),
                script_to_p2wsh_script(key_to_p2pkh_script(pubkey)),
                output_key_to_p2tr_script(pubkey[1:]),
                PAY_TO_ANCHOR,
                CScript([CScriptOp.encode_op_n(i)]*(1000*i)),  # large script (up to 10000 bytes)
            )

            # create outputs and mine them in a block
            for output_script in output_scripts:
                wallet.send_to(from_node=node, scriptPubKey=output_script, amount=i, fee=20000)
            self.generate(wallet, 1)

        self.log.info('Dump UTXO set via `dumptxoutset` RPC')
        input_filename = os.path.join(self.options.tmpdir, "utxos.dat")
        node.dumptxoutset(input_filename, "latest")

        self.log.info('Convert UTXO set from compact-serialized format to sqlite format')
        output_filename = os.path.join(self.options.tmpdir, "utxos.sqlite")
        base_dir = self.config["environment"]["SRCDIR"]
        utxo_to_sqlite_path = os.path.join(base_dir, "contrib", "utxo-tools", "utxo_to_sqlite.py")
        subprocess.run([sys.executable, utxo_to_sqlite_path, input_filename, output_filename],
                       check=True, stderr=subprocess.STDOUT)

        self.log.info('Verify that both UTXO sets match by comparing their MuHash')
        muhash_sqlite = calculate_muhash_from_sqlite_utxos(output_filename)
        muhash_compact_serialized = node.gettxoutsetinfo('muhash')['muhash']
        assert_equal(muhash_sqlite, muhash_compact_serialized)


if __name__ == "__main__":
    UtxoToSqliteTest(__file__).main()
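
The per-UTXO byte string that `calculate_muhash_from_sqlite_utxos` feeds into `MuHash3072.insert` can be reproduced without the test framework. The following sketch rebuilds that serialization with the standard library only. The function names `ser_compact_size` and `serialize_utxo_for_muhash` are assumptions made for this example, and the MuHash computation itself is deliberately left out.

```python
import struct


def ser_compact_size(n):
    """Encode a length as a Bitcoin CompactSize integer."""
    if n < 253:
        return bytes([n])
    if n <= 0xffff:
        return b"\xfd" + struct.pack("<H", n)
    if n <= 0xffffffff:
        return b"\xfe" + struct.pack("<I", n)
    return b"\xff" + struct.pack("<Q", n)


def serialize_utxo_for_muhash(txid_hex, vout, value, coinbase, height, spk_hex):
    """Serialize one `utxos` row as outpoint, height/coinbase code, then txout."""
    spk = bytes.fromhex(spk_hex)
    ser = bytes.fromhex(txid_hex)[::-1]              # txid in internal byte order
    ser += struct.pack("<I", vout)                   # output index
    ser += struct.pack("<I", height * 2 + coinbase)  # height and coinbase flag
    ser += struct.pack("<q", value)                  # amount in satoshis
    ser += ser_compact_size(len(spk)) + spk          # length-prefixed script
    return ser


# Hypothetical row: output 1 of a made-up txid, 1000 sats, coinbase, height 5.
row = ("00" * 31 + "01", 1, 1000, 1, 5, "51")
utxo_ser = serialize_utxo_for_muhash(*row)
print(utxo_ser.hex())
```

The txid is reversed because the database stores it in the usual display order, while serialized outpoints carry the hash in internal byte order.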
