
Conversation

@nymius
Contributor

@nymius nymius commented Nov 29, 2024

Description

As expressed by @LLFourn, we want to not depend on serde_json. If we keep it around for serializing anchors, we won't be able to remove it in the future because it will always be needed to do migrations.
Currently there is only one widely used anchor, ConfirmationBlockTime.

The decision was to constrain support to a single anchor type, ConfirmationBlockTime. The anchor table will represent all fields of ConfirmationBlockTime, each one in its own column.

The reasons:

  • No one is using rusqlite with any other anchor type, and if they are, they can do something custom anyway.
  • The anchor representation may change in the future, and supporting multiple Anchor types here would cause more problems for migration later on.

Resolves #1695

Notes to the reviewers

Why is the type of the confirmation_time column INTEGER?

From the sqlite3 docs:

Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:
...
INTEGER. The value is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.

(Remember, confirmation time is a u64, so values above 2**63 - 1 do not fit in a signed 8-byte integer.)

REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
...
SQLite uses a more general dynamic type system. In SQLite, the datatype of a value is associated with the value itself, not with its container.
...
In order to maximize compatibility between SQLite and other database engines, ..., SQLite supports the concept of "type affinity" on columns. The type affinity of a column is the recommended type for data stored in that column. The important idea here is that the type is recommended, not required. Any column can still store any type of data. It is just that some columns, given the choice, will prefer to use one storage class over another. The preferred storage class for a column is called its "affinity".
...
A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, the storage class of the text is converted to INTEGER or REAL (in order of preference) if the text is a well-formed integer or real literal, respectively.
A column that uses INTEGER affinity behaves the same as a column with NUMERIC affinity.

But the anchor table was defined using the STRICT keyword; again, from the sqlite docs:

... The STRICT keyword causes the following differences:
...
SQLite attempts to coerce the data into the appropriate type using the usual affinity rules, just as PostgreSQL, MySQL, SQL Server, and Oracle all do. If the value cannot be losslessly converted into the specified datatype, then an SQLITE_CONSTRAINT_DATATYPE error is raised.

So, the TLDR, with some help from this blog post, is:

  • SQLite INTEGER supported values range from -2**63 to (2**63-1).
  • If the number is bigger, SQLite will treat it as REAL.
  • As the SQLite table is defined with the STRICT keyword, it should enforce a lossless conversion or fail with SQLITE_CONSTRAINT_DATATYPE (see the sketch below).
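
To illustrate, here is a minimal runnable sketch (not from the PR; table name and values are made up) of a STRICT table with an INTEGER column accepting a well-formed integer and rejecting text that cannot be losslessly converted:

use rusqlite::Connection;

fn main() -> rusqlite::Result<()> {
    let conn = Connection::open_in_memory()?;
    // STRICT (SQLite >= 3.37) turns the declared type from an affinity hint
    // into a requirement.
    conn.execute_batch(
        "CREATE TABLE anchor_demo (txid TEXT NOT NULL, confirmation_time INTEGER NOT NULL) STRICT;",
    )?;
    // A value that fits in a signed 64-bit integer is stored losslessly.
    conn.execute(
        "INSERT INTO anchor_demo (txid, confirmation_time) VALUES (?1, ?2)",
        rusqlite::params!["aa", i64::MAX],
    )?;
    // TEXT that is not a well-formed integer literal cannot be losslessly
    // coerced, so SQLite raises SQLITE_CONSTRAINT_DATATYPE.
    let err = conn
        .execute(
            "INSERT INTO anchor_demo (txid, confirmation_time) VALUES (?1, ?2)",
            rusqlite::params!["bb", "not a number"],
        )
        .unwrap_err();
    println!("rejected by STRICT: {err}");
    Ok(())
}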
Why not set confirmation_time as BLOB or NUMERIC?

I don't have a strong opinion on this. INTEGER was the first numeric type I found; later I found NUMERIC, and they seemed to behave in the same way, so I didn't change the type.

I discarded BLOB and ANY at first because they didn't provide a clear idea of what was being stored in the column, but in retrospect they seem to remove all the considerations I had to make above, so maybe they are better suited for the type.

Why add a default value to the confirmation_time column if the anchor column is constrained to be NOT NULL, so all copied values will be filled?

confirmation_time should have the same constraint as anchor, NOT NULL, but as UPDATE statements might be executed on already filled tables, you must provide a default value for all the rows the new column is added to. As the extraction of confirmation_time from the JSON blob in anchor cannot be performed in the same step, I had to add this default value.

This relaxes the schema of the tables and extends their bug surface, but I'm assuming the application layer will always enforce the addition of a valid confirmation_time.
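
Concretely, the pattern described above could look like the following sketch (table name and statements are illustrative, not the PR's exact schema_v1 script):

fn schema_v1_sketch() -> String {
    [
        // The new column needs a default because existing rows are already
        // populated, and NOT NULL would otherwise be violated on ADD COLUMN.
        "ALTER TABLE bdk_anchors ADD COLUMN confirmation_time INTEGER NOT NULL DEFAULT -1;",
        // Backfill from the legacy JSON blob in a separate statement, since
        // the extraction cannot happen in the same step as ADD COLUMN.
        "UPDATE bdk_anchors SET confirmation_time = json_extract(anchor, '$.confirmation_time');",
        // Finally drop the serde_json-era column.
        "ALTER TABLE bdk_anchors DROP COLUMN anchor;",
    ]
    .join(" ")
}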

Why is the default value of the confirmation_time column -1?

Considering the other alternatives were the max value, the min value, or zero, and confirmation time should always be positive, I chose -1 as being computer- and human-readable enough to make it clear something must be wrong if the ConfirmationBlockTime retrieved on wallet load has this value set as the confirmation time.

Why not be STRICT with each statement?

It is a constraint applicable only to tables, at creation time.

Why not create a whole new table without the anchor column and with the confirmation_time column, copy the content from one to the other, and drop the former table?

Computation cost. I didn't benchmark it, and I don't know how efficient the SQLite engine is under the hood, but at first sight copying a single column seems better than copying four.

Changelog notice

  • Reduce the rusqlite implementation to ChangeSet<ConfirmationBlockTime> only.
  • Replace the anchor column in sqlite with confirmation_time.
  • Modify the type of migrate_schema's versioned_scripts parameter to &[&str].
  • Transform existing schemas into methods to allow their individual application.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

@evanlinjin
Member

Great work!

It would be cool if we could test backwards compatibility with an old db, but to do this we need some modifications.

We need to be able to get the schema scripts externally. I think we can create a method for each schema version (instead of initializing the schema_0 and schema_1 inline in init_sqlite_tables). The signature of migrate_schema may need to be changed to take in versioned_scripts: &[&[String]] instead.

@ValuedMammal ValuedMammal added this to the 1.0.0-beta milestone Dec 1, 2024
@nymius
Contributor Author

nymius commented Dec 2, 2024

@evanlinjin

(instead of initializing the schema_0 and schema_1 inline in init_sqlite_tables).

I should move the schemas to their own methods, but I should still keep the inline initialization in init_sqlite_tables to apply the schemas all together, right?

pub fn init_sqlite_tables(db_tx: &rusqlite::Transaction) -> rusqlite::Result<()> {
    migrate_schema(
        db_tx,
        Self::SCHEMA_NAME,
        &[Self::schema_v0(), Self::schema_v1()],
    )
}
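
For instance, the extracted methods could take a shape like this (names and script contents are illustrative, not the PR's exact schemas; the v1 script could follow the migration pattern sketched under the default-value question above):

fn schema_v0() -> String {
    // Initial schema: at v0 the anchor is still stored as serialized JSON.
    "CREATE TABLE bdk_anchors ( \
         txid TEXT NOT NULL, \
         block_height INTEGER NOT NULL, \
         block_hash TEXT NOT NULL, \
         anchor TEXT NOT NULL \
     ) STRICT;"
        .to_string()
}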

@evanlinjin
Copy link
Member

@nymius yes, that looks good to me!

…gesets

We want to not depend on serde_json. If we keep it around for
serializing anchors we won't be able to remove it in the future because
it will always be needed to do migrations.
Currently there is only one widely used anchor, ConfirmationBlockTime.

The decision was to constrain support to just be for a single anchor
type, ConfirmationBlockTime. The anchor table will represent all
fields of ConfirmationBlockTime, each one in its own column.

The reasons:

- No one is using rusqlite with any other anchor type, and if they are,
  they can do something custom anyway.
- The anchor representation may change in the future; supporting
  multiple Anchor types here would cause more problems for migration
  later on.
AnchorImpl was a wrapper created to allow implementing foreign traits
(e.g. serde_json serialization) for external, unknown structs
implementing the Anchor trait.

As the Anchor generic in the rusqlite implementation for anchored
ChangeSets was the only place where this AnchorImpl was used, and it
has been fixed to the concrete anchor ConfirmationBlockTime, there is
no more reason to keep this wrapper around.
…t<BlockId>

The only struct implementing rusqlite is
ChangeSet<ConfirmationBlockTime> from
c49ea85 on.
We would like to test backward compatibility of new schemas.
To do so, we should be able to apply schemas independently.

Why change `rusqlite::execute` to `rusqlite::execute_batch`?
- we are going to need to return the schema statements as Strings,
  because we are now returning them from methods; it seemed redundant
  to keep taking references to data we already own, i.e., to keep
  moving around &String instead of String (consider that
  `rusqlite::execute_batch` takes multiple statements as a single
  String)
- we were calling `rusqlite::execute` with empty params, so we weren't
  constrained by the method's signature
- `rusqlite::execute_batch` does the same as what we were doing when
  applying statements sequentially in a loop (see the before/after
  sketch below)
- the code doesn't lose error expressivity: `rusqlite::execute_batch`
  fails with the same errors `rusqlite::execute` does

BREAKING CHANGE: changes public method `migrate_schema` signature.
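
A minimal before/after sketch of this change (function and variable names are illustrative):

use rusqlite::Transaction;

// Before: each schema version was a list of statements applied one by one,
// always with empty params.
fn apply_schema_old(db_tx: &Transaction, schema_statements: &[String]) -> rusqlite::Result<()> {
    for stmt in schema_statements {
        db_tx.execute(stmt, [])?;
    }
    Ok(())
}

// After: each schema version is a single String of statements applied in one
// call; execute_batch surfaces the same rusqlite errors execute does.
fn apply_schema_new(db_tx: &Transaction, schema_script: &str) -> rusqlite::Result<()> {
    db_tx.execute_batch(schema_script)
}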
Why just a v0-to-v1 test and not a general backward compatibility test?

It is harder to craft a general compatibility test without prior
knowledge of how future schemas will look. Also, the creation of a
backward compatibility test for each new schema change will allow the
execution of incremental backward compatibility tests with better
granularity.
@nymius nymius force-pushed the refactor/i-1695-remove-serde-json-dependency branch from 8c70584 to de28bcd on December 4, 2024 17:22
@nymius
Contributor Author

nymius commented Dec 4, 2024

Questions for reviewers:

  • Should we refactor migrate_schema to avoid adding "dummy" schemas just to ensure the correct schema is applied to already initialized databases? I.e.:

Calling migrate_schema(..., &[schema_v0, schema_v1]) to only apply schema_v1 to a version 0 database?

To apply the v1 schema to a database already using the v0 schema, we need the v0 schema inside the versioned scripts array, because if versioned_scripts == &[schema_v1], exec_from will be 1 (the current schema version is v0, so we start right after it) and versioned_scripts[0] will be skipped, effectively doing nothing.
To avoid the issue I used all available schemas in the versioned_scripts array, i.e. &[schema_v0, schema_v1, ..., schema_vN], but I want to know if another, "cleaner" idea should be considered (see the sketch after the code reference below).

Code reference
pub fn migrate_schema(
    db_tx: &Transaction,
    schema_name: &str,
    versioned_scripts: &[String],
) -> rusqlite::Result<()> {
    init_schemas_table(db_tx)?;
    let current_version = schema_version(db_tx, schema_name)?;
    // Fresh database: execute from index 0; otherwise start right after the
    // currently recorded version.
    let exec_from = current_version.map_or(0_usize, |v| v as usize + 1);
    // A script's index in the slice is its schema version.
    let scripts_to_exec = versioned_scripts.iter().enumerate().skip(exec_from);
    for (version, script) in scripts_to_exec {
        set_schema_version(db_tx, schema_name, version as u32)?;
        db_tx.execute_batch(script)?;
    }
    Ok(())
}
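
The selection logic can be seen in isolation in this small runnable sketch, which mirrors migrate_schema without the database side effects (helper name is made up):

fn scripts_to_exec(current_version: Option<u32>, scripts: &[&str]) -> Vec<(usize, String)> {
    // Same skip computation as migrate_schema above.
    let exec_from = current_version.map_or(0_usize, |v| v as usize + 1);
    scripts
        .iter()
        .enumerate()
        .skip(exec_from)
        .map(|(version, script)| (version, script.to_string()))
        .collect()
}

fn main() {
    // Database already at schema version 0: passing only the new schema
    // selects nothing, so the migration would be a no-op...
    assert!(scripts_to_exec(Some(0), &["v1 script"]).is_empty());
    // ...while including the (already applied) v0 entry makes skip(1) land
    // on the v1 script at index 1.
    assert_eq!(
        scripts_to_exec(Some(0), &["v0 script", "v1 script"]),
        vec![(1, "v1 script".to_string())]
    );
}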

Changes rationale:

  • Why did I use raw SQLite statements in the backward compatibility test to introduce changes to the database? I.e.:
REPLACE INTO {} (txid, block_height, block_hash, anchor) VALUES( ... )

Because I'm using the new code to introduce changes that would have been crafted by the old code, I don't have the methods needed to do it "programmatically" with what is available in the current implementation, so I seed the data with raw statements, as sketched below.
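
For example, seeding the v0 database could look like this sketch (table name and values are made up; the JSON shape mirrors a serialized ConfirmationBlockTime):

use rusqlite::Transaction;

// Hypothetical helper: write a row the way the old serde_json-based code
// would have, so the v0 -> v1 migration has legacy data to convert.
fn seed_v0_anchor(db_tx: &Transaction) -> rusqlite::Result<()> {
    db_tx.execute_batch(
        "REPLACE INTO bdk_anchors (txid, block_height, block_hash, anchor) \
         VALUES ('a1b2', 100, 'c3d4', \
         '{\"block_id\":{\"height\":100,\"hash\":\"c3d4\"},\"confirmation_time\":1733000000}')",
    )
}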

…ate_schema

`&str` documents clearly that `migrate_schema` should only read
`versioned_scripts`.

Also, as the `schema_vN` methods will return `String`, Rust will
always automatically deref `&String` to `&str`.

BREAKING CHANGE: changes the versioned_scripts parameter of the public
function migrate_schema from type &[String] to &[&str].
@nymius nymius force-pushed the refactor/i-1695-remove-serde-json-dependency branch from de28bcd to 1c81cd5 on December 4, 2024 17:43
@evanlinjin
Member

Should we refactor migrate_schema to avoid adding "dummy" schemas just to ensure the correct schema is applied to already initialized databases? I.e.:

@nymius Could you elaborate more on this?

Do you mean, instead of applying schemas sequentially, we want to be able to initialize in one go for empty databases?

Member

@evanlinjin evanlinjin left a comment


ACK 1c81cd5

@nymius
Contributor Author

nymius commented Dec 6, 2024

Should we refactor migrate_schema to avoid adding "dummy" schemas just to ensure the correct schema is applied to already initialized databases? I.e.:

@nymius Could you elaborate more on this?

Do you mean, instead of applying schemas sequentially, we want to be able to initialize in one go for empty databases?

Yes. In the added test I wanted to:

  1. apply schema v0.
  2. add changes to the database compatible with schema v0.
  3. apply schema v1.
  4. check database data was updated to schema v1.

migrate_schema skips the first exec_from items of the versioned_scripts param (the schemas), where exec_from is the current schema version plus one. That means, if I want to independently apply a new schema k+1 to a database already using schema k, I have to pass a list with entries for versions 0 through k before it (where a "script" could be any string, even empty), as migrate_schema will skip those entries anyway. If I don't do that, then by the documentation of skip: "In particular, if the original iterator is too short, then the returned iterator is empty." So the iterator is going to be empty and migrate_schema will do nothing.

TLDR: if you don't add the dummies for versions 0 through k before the k+1 schema that you want to apply, your migration is going to do nothing. Does that look right? Should I refactor it too? Should we leave that change for another PR?

@notmandatory
Member

TLDR: if you don't add the dummies for versions 0 through k before the k+1 schema that you want to apply, your migration is going to do nothing. Does that look right? Should I refactor it too? Should we leave that change for another PR?

If you're saying there is a bug in the schema migration logic, please put that in a new PR; if it's a blocker for this PR then we'll need to review and merge it first. But it should be easier to review as a stand-alone change.

@nymius
Contributor Author

nymius commented Dec 6, 2024

@notmandatory this isn't a blocker, but I'm not sure this merits another PR; I'll continue the discussion in another issue if that's OK.

@notmandatory notmandatory added the module-database and dependencies labels Dec 9, 2024
Member

@notmandatory notmandatory left a comment


ACK 1c81cd5

@notmandatory notmandatory merged commit ab08b8c into bitcoindevkit:master Dec 9, 2024
20 checks passed
@nymius nymius deleted the refactor/i-1695-remove-serde-json-dependency branch December 9, 2024 19:11
@notmandatory notmandatory mentioned this pull request Dec 12, 2024