
Conversation

@nymius
Contributor

@nymius nymius commented Nov 29, 2024

Description

As expressed by @LLFourn, we want to not depend on serde_json. If we keep it around for serializing anchors, we won't be able to remove it in the future because it will always be needed to do migrations.
Currently there is only one widely used anchor, ConfirmationBlockTime.

The decision was to constrain support to a single anchor type, ConfirmationBlockTime. The anchor table will represent all fields of ConfirmationBlockTime, each one in its own column.

The reasons:

  • No one is using rusqlite with any other anchor type, and if they are, they can do something custom anyway.
  • The anchor representation may change in the future, and supporting multiple Anchor types here would cause more problems for migration later on.

Resolves #1695

Notes to the reviewers

Why is the type of the confirmation_time column INTEGER?

From the sqlite3 docs:

Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:
...
INTEGER. The value is a signed integer, stored in 0, 1, 2, 3, 4, 6, or 8 bytes depending on the magnitude of the value.

(Remember, confirmation time is a u64, so values above 2**63 - 1 do not fit in a signed 8-byte integer.)

REAL. The value is a floating point value, stored as an 8-byte IEEE floating point number.
...
SQLite uses a more general dynamic type system. In SQLite, the datatype of a value is associated with the value itself, not with its container.
...
In order to maximize compatibility between SQLite and other database engines, ..., SQLite supports the concept of "type affinity" on columns. The type affinity of a column is the recommended type for data stored in that column. The important idea here is that the type is recommended, not required. Any column can still store any type of data. It is just that some columns, given the choice, will prefer to use one storage class over another. The preferred storage class for a column is called its "affinity".
...
A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, the storage class of the text is converted to INTEGER or REAL (in order of preference) if the text is a well-formed integer or real literal, respectively.
A column that uses INTEGER affinity behaves the same as a column with NUMERIC affinity.

But the anchor table was defined using the STRICT keyword; again, from the sqlite docs:

... The STRICT keyword causes the following differences:
...
SQLite attempts to coerce the data into the appropriate type using the usual affinity rules, just as PostgreSQL, MySQL, SQL Server, and Oracle all do. If the value cannot be losslessly converted into the specified datatype, then an SQLITE_CONSTRAINT_DATATYPE error is raised.

So, the TLDR, with some help from this blog post, is:

  • SQLite INTEGER supported values range from -2**63 to (2**63-1).
  • If the number is bigger, SQLite will treat it as REAL.
  • As the SQLite table is defined with the STRICT keyword, it should enforce a lossless conversion or fail with SQLITE_CONSTRAINT_DATATYPE (see the sketch below).
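
To illustrate, here is a minimal runnable sketch (not from the PR; table name and values are made up) of a STRICT table with an INTEGER column accepting a well-formed integer and rejecting text that cannot be losslessly converted:

use rusqlite::Connection;

fn main() -> rusqlite::Result<()> {
    let conn = Connection::open_in_memory()?;
    // STRICT (SQLite >= 3.37) turns the declared type from an affinity hint
    // into a requirement.
    conn.execute_batch(
        "CREATE TABLE anchor_demo (txid TEXT NOT NULL, confirmation_time INTEGER NOT NULL) STRICT;",
    )?;
    // A value that fits in a signed 64-bit integer is stored losslessly.
    conn.execute(
        "INSERT INTO anchor_demo (txid, confirmation_time) VALUES (?1, ?2)",
        rusqlite::params!["aa", i64::MAX],
    )?;
    // TEXT that is not a well-formed integer literal cannot be losslessly
    // coerced, so SQLite raises SQLITE_CONSTRAINT_DATATYPE.
    let err = conn
        .execute(
            "INSERT INTO anchor_demo (txid, confirmation_time) VALUES (?1, ?2)",
            rusqlite::params!["bb", "not a number"],
        )
        .unwrap_err();
    println!("rejected by STRICT: {err}");
    Ok(())
}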
Why not set confirmation_time as BLOB or NUMERIC?

I don't have a strong opinion on this. INTEGER was the first numeric type I found; later I found NUMERIC, and they seemed to behave in the same way, so I didn't change the type.

I discarded BLOB and ANY at first because they didn't provide a clear idea of what was being stored in the column, but in retrospect they seem to remove all the considerations I had to make above, so maybe they are better suited for the type.

Why add a default value to the confirmation_time column if the anchor column is constrained to be NOT NULL, so all copied values will be filled?

confirmation_time should have the same constraint as anchor, NOT NULL, but as UPDATE statements might be executed on already filled tables, you must provide a default value for all the rows the new column is added to. As the extraction of confirmation_time from the JSON blob in anchor cannot be performed in the same step, I had to add this default value.

This relaxes the schema of the tables and extends their bug surface, but I'm assuming the application layer will always enforce the addition of a valid confirmation_time.
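
Concretely, the pattern described above could look like the following sketch (table name and statements are illustrative, not the PR's exact schema_v1 script):

fn schema_v1_sketch() -> String {
    [
        // The new column needs a default because existing rows are already
        // populated, and NOT NULL would otherwise be violated on ADD COLUMN.
        "ALTER TABLE bdk_anchors ADD COLUMN confirmation_time INTEGER NOT NULL DEFAULT -1;",
        // Backfill from the legacy JSON blob in a separate statement, since
        // the extraction cannot happen in the same step as ADD COLUMN.
        "UPDATE bdk_anchors SET confirmation_time = json_extract(anchor, '$.confirmation_time');",
        // Finally drop the serde_json-era column.
        "ALTER TABLE bdk_anchors DROP COLUMN anchor;",
    ]
    .join(" ")
}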

Why is the default value of the confirmation_time column -1?

Considering the other alternatives were the max value, the min value, or zero, and confirmation time should always be positive, I chose -1 as being computer- and human-readable enough to make it clear something must be wrong if the ConfirmationBlockTime retrieved on wallet load has this value set as the confirmation time.

Why not be STRICT with each statement?

It is a constraint applicable only to tables, at creation time.

Why not create a whole new table without the anchor column and with the confirmation_time column, copy the content from one to the other, and drop the former table?

Computation cost. I didn't benchmark it, and I don't know how efficient the SQLite engine is under the hood, but at first sight copying a single column seems better than copying four.

Changelog notice

  • Reduce the rusqlite implementation to ChangeSet<ConfirmationBlockTime> only.
  • Replace the anchor column in sqlite with confirmation_time.
  • Modify the type of migrate_schema's versioned_scripts parameter to &[&str].
  • Transform existing schemas into methods to allow their individual application.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

New Features:

  • I've added tests for the new feature
  • I've added docs for the new feature

Bugfixes:

  • This pull request breaks the existing API
  • I've added tests to reproduce the issue which are now passing
  • I'm linking the issue being fixed by this PR

@evanlinjin
Member

Great work!

It would be cool if we could test backwards compatibility with an old db, but to do this we need some modifications.

We need to be able to get the schema scripts externally. I think we can create a method for each schema version (instead of initializing the schema_0 and schema_1 inline in init_sqlite_tables). The signature of migrate_schema may need to be changed to take in versioned_scripts: &[&[String]] instead.

@ValuedMammal ValuedMammal added this to the 1.0.0-beta milestone Dec 1, 2024
@nymius
Contributor Author

nymius commented Dec 2, 2024

@evanlinjin

(instead of initializing the schema_0 and schema_1 inline in init_sqlite_tables).

I should move the schemas to their own methods, but I should still keep the inline initialization in init_sqlite_tables to apply the schemas all together, right?

pub fn init_sqlite_tables(db_tx: &rusqlite::Transaction) -> rusqlite::Result<()> {
    migrate_schema(
        db_tx,
        Self::SCHEMA_NAME,
        &[Self::schema_v0(), Self::schema_v1()],
    )
}
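
For instance, the extracted methods could take a shape like this (names and script contents are illustrative, not the PR's exact schemas; the v1 script could follow the migration pattern sketched under the default-value question above):

fn schema_v0() -> String {
    // Initial schema: at v0 the anchor is still stored as serialized JSON.
    "CREATE TABLE bdk_anchors ( \
         txid TEXT NOT NULL, \
         block_height INTEGER NOT NULL, \
         block_hash TEXT NOT NULL, \
         anchor TEXT NOT NULL \
     ) STRICT;"
        .to_string()
}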

@evanlinjin
Copy link
Member

@nymius yes, that looks good to me!

…gesets

We want to not depend on serde_json. If we keep it around for
serializing anchors we won't be able to remove it in the future because
it will always be needed to do migrations.
Currently there is only one widely used anchor, ConfirmationBlockTime.

The decision was to constrain support to just be for a single anchor
type, ConfirmationBlockTime. The anchor table will represent all
fields of ConfirmationBlockTime, each one in its own column.

The reasons:

- No one is using rusqlite with any other anchor type, and if they are,
  they can do something custom anyway.
- The anchor representation may change in the future; supporting
  multiple Anchor types here would cause more problems for migration
  later on.
AnchorImpl was a wrapper created to allow implementing foreign traits
(e.g. serde_json serialization) for external, unknown structs
implementing the Anchor trait.

As the Anchor generic in the rusqlite implementation for anchored
ChangeSets was the only place where this AnchorImpl was used, and it
has been fixed to the concrete anchor ConfirmationBlockTime, there is
no more reason to keep this wrapper around.
…t<BlockId>

The only struct implementing rusqlite is
ChangeSet<ConfirmationBlockTime> from
c49ea85 on.
We would like to test backward compatibility of new schemas.
To do so, we should be able to apply schemas independently.

Why change `rusqlite::execute` to `rusqlite::execute_batch`?
- we are going to need to return the schema statements as Strings,
  because we are now returning them from methods; it seemed redundant
  to keep taking references to data we already own, i.e., to keep
  moving around &String instead of String (consider that
  `rusqlite::execute_batch` takes multiple statements as a single
  String)
- we were calling `rusqlite::execute` with empty params, so we weren't
  constrained by the method's signature
- `rusqlite::execute_batch` does the same as what we were doing when
  applying statements sequentially in a loop (see the before/after
  sketch below)
- the code doesn't lose error expressivity: `rusqlite::execute_batch`
  fails with the same errors `rusqlite::execute` does

BREAKING CHANGE: changes public method `migrate_schema` signature.
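
A minimal before/after sketch of this change (function and variable names are illustrative):

use rusqlite::Transaction;

// Before: each schema version was a list of statements applied one by one,
// always with empty params.
fn apply_schema_old(db_tx: &Transaction, schema_statements: &[String]) -> rusqlite::Result<()> {
    for stmt in schema_statements {
        db_tx.execute(stmt, [])?;
    }
    Ok(())
}

// After: each schema version is a single String of statements applied in one
// call; execute_batch surfaces the same rusqlite errors execute does.
fn apply_schema_new(db_tx: &Transaction, schema_script: &str) -> rusqlite::Result<()> {
    db_tx.execute_batch(schema_script)
}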
Why just a v0-to-v1 test and not a general backward compatibility test?

It is harder to craft a general compatibility test without prior
knowledge of how future schemas will look. Also, the creation of a
backward compatibility test for each new schema change will allow the
execution of incremental backward compatibility tests with better
granularity.
@nymius nymius force-pushed the refactor/i-1695-remove-serde-json-dependency branch from 8c70584 to de28bcd on December 4, 2024 17:22
@nymius
Contributor Author

nymius commented Dec 4, 2024

Questions for reviewers:

  • Should we refactor migrate_schema to avoid adding "dummy" schemas just to ensure the correct schema is applied to already initialized databases? I.e.:

Calling migrate_schema(..., &[schema_v0, schema_v1]) to only apply schema_v1 to a version 0 database?

To apply the v1 schema to a database already using the v0 schema, we need the v0 schema inside the versioned scripts array, because if versioned_scripts == &[schema_v1], exec_from will be 1 (the current schema version is v0, so we start right after it) and versioned_scripts[0] will be skipped, effectively doing nothing.
To avoid the issue I used all available schemas in the versioned_scripts array, i.e. &[schema_v0, schema_v1, ..., schema_vN], but I want to know if another, "cleaner" idea should be considered (see the sketch after the code reference below).

Code reference
pub fn migrate_schema(
    db_tx: &Transaction,
    schema_name: &str,
    versioned_scripts: &[String],
) -> rusqlite::Result<()> {
    init_schemas_table(db_tx)?;
    let current_version = schema_version(db_tx, schema_name)?;
    // Fresh database: execute from index 0; otherwise start right after the
    // currently recorded version.
    let exec_from = current_version.map_or(0_usize, |v| v as usize + 1);
    // A script's index in the slice is its schema version.
    let scripts_to_exec = versioned_scripts.iter().enumerate().skip(exec_from);
    for (version, script) in scripts_to_exec {
        set_schema_version(db_tx, schema_name, version as u32)?;
        db_tx.execute_batch(script)?;
    }
    Ok(())
}
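
The selection logic can be seen in isolation in this small runnable sketch, which mirrors migrate_schema without the database side effects (helper name is made up):

fn scripts_to_exec(current_version: Option<u32>, scripts: &[&str]) -> Vec<(usize, String)> {
    // Same skip computation as migrate_schema above.
    let exec_from = current_version.map_or(0_usize, |v| v as usize + 1);
    scripts
        .iter()
        .enumerate()
        .skip(exec_from)
        .map(|(version, script)| (version, script.to_string()))
        .collect()
}

fn main() {
    // Database already at schema version 0: passing only the new schema
    // selects nothing, so the migration would be a no-op...
    assert!(scripts_to_exec(Some(0), &["v1 script"]).is_empty());
    // ...while including the (already applied) v0 entry makes skip(1) land
    // on the v1 script at index 1.
    assert_eq!(
        scripts_to_exec(Some(0), &["v0 script", "v1 script"]),
        vec![(1, "v1 script".to_string())]
    );
}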

Changes rationale:

  • Why did I use raw SQLite statements in the backward compatibility test to introduce changes to the database? I.e.:
REPLACE INTO {} (txid, block_height, block_hash, anchor) VALUES( ... )

Because I'm using the new code to introduce changes that would have been crafted by the old code, I don't have the methods needed to do it "programmatically" with what is available in the current implementation, so I seed the data with raw statements, as sketched below.
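
For example, seeding the v0 database could look like this sketch (table name and values are made up; the JSON shape mirrors a serialized ConfirmationBlockTime):

use rusqlite::Transaction;

// Hypothetical helper: write a row the way the old serde_json-based code
// would have, so the v0 -> v1 migration has legacy data to convert.
fn seed_v0_anchor(db_tx: &Transaction) -> rusqlite::Result<()> {
    db_tx.execute_batch(
        "REPLACE INTO bdk_anchors (txid, block_height, block_hash, anchor) \
         VALUES ('a1b2', 100, 'c3d4', \
         '{\"block_id\":{\"height\":100,\"hash\":\"c3d4\"},\"confirmation_time\":1733000000}')",
    )
}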

…ate_schema

`&str` documents clearly that `migrate_schema` should only read
`versioned_scripts`.

Also, as the `schema_vN` methods will return `String`, Rust will
always automatically deref `&String` to `&str`.

BREAKING CHANGE: changes the versioned_scripts parameter of the public
function migrate_schema from type &[String] to &[&str].
@nymius nymius force-pushed the refactor/i-1695-remove-serde-json-dependency branch from de28bcd to 1c81cd5 on December 4, 2024 17:43
@evanlinjin
Member

Should we refactor migrate_schema to avoid adding "dummy" schemas just to ensure the correct schema is applied to already initialized databases? I.e.:

@nymius Could you elaborate more on this?

Do you mean, instead of applying schemas sequentially, we want to be able to initialize in one go for empty databases?

Member

@evanlinjin evanlinjin left a comment


ACK 1c81cd5

@nymius
Contributor Author

nymius commented Dec 6, 2024

Should we refactor migrate_schema to avoid adding "dummy" schemas just to ensure the correct schema is applied to already initialized databases? I.e.:

@nymius Could you elaborate more on this?

Do you mean, instead of applying schemas sequentially, we want to be able to initialize in one go for empty databases?

Yes. In the added test I wanted to:

  1. apply schema v0.
  2. add changes to the database compatible with schema v0.
  3. apply schema v1.
  4. check database data was updated to schema v1.

migrate_schema skips the first exec_from items of the versioned_scripts param (the schemas), where exec_from is the current schema version plus one. That means, if I want to independently apply a new schema k+1 to a database already using schema k, I have to pass a list with entries for versions 0 through k before it (where a "script" could be any string, even empty), as migrate_schema will skip those entries anyway. If I don't do that, then by the documentation of skip: "In particular, if the original iterator is too short, then the returned iterator is empty." So the iterator is going to be empty and migrate_schema will do nothing.

TLDR: if you don't add the dummies for versions 0 through k before the k+1 schema that you want to apply, your migration is going to do nothing. Does that look right? Should I refactor it too? Should we leave that change for another PR?

@notmandatory
Member

TLDR: if you don't add the dummies for versions 0 through k before the k+1 schema that you want to apply, your migration is going to do nothing. Does that look right? Should I refactor it too? Should we leave that change for another PR?

If you're saying there is a bug in the schema migration logic, please put that in a new PR; if it's a blocker for this PR then we'll need to review and merge it first. But it should be easier to review as a stand-alone change.

@nymius
Contributor Author

nymius commented Dec 6, 2024

@notmandatory this isn't a blocker, but I'm not sure this merits another PR; I'll continue the discussion in another issue if that's OK.

@notmandatory notmandatory added the module-database and dependencies labels Dec 9, 2024
Member

@notmandatory notmandatory left a comment


ACK 1c81cd5

@notmandatory notmandatory merged commit ab08b8c into bitcoindevkit:master Dec 9, 2024
20 checks passed
@nymius nymius deleted the refactor/i-1695-remove-serde-json-dependency branch December 9, 2024 19:11
@notmandatory notmandatory mentioned this pull request Dec 12, 2024