Refactor `json_serialization_test.py` to use schema-based hierarchical tests by rwxayheee · Pull Request #338 · forlilab/Meeko

rwxayheee · 2025-04-11T07:03:19Z

This is a rewrite of json_serialization_test.py to achieve the three major objectives:

Code efficiency
This PR follows the hierarchical spirit of the original JSON tests, in a schema-first way with minimal redundant codes. Hopefully this is easier for long-term maintainance and automated test coverage assessment.
Ensure exhaustive attribute checking
This PR exhaustively checks all attributes of all JSON-parsable classes, without hardcoding the list of attributes to check. As requested in added function stitch from PR188, with options residues_to_add and bonds_to_use #306 (comment) by @diogomart, the JSON-interchange logics of new attributes must be checked. This PR demands all attributes to be evaluated in original and decoded object, unless explicitly ignored. Otherwise, pytest will throw errors.
Deep equality check and bug fixes
This PR implements a deep recursive equality check, instead of hardcoding how each attribute should be evaluated. Because of this, some bugs are exposed.

(Bug fixes will be added in separate commits, in case we want to fix actual problems without major changes of the tests)

rwxayheee · 2025-04-11T16:40:53Z

Below is a little more technical details for future reference:

Design explained

This PR rewrites the tests using a schema-first principle to exhaustively validate (i) the serialization and deserialization process of JSON parsable classes, and (ii) the deep equality of original and decoded instances, based on the internal structure of the original object. The main components in the updated tests are:

JSONParsable classes subject to serialization tests
Additionally, it is possible to specify attributes to skip by EQUALITY_SKIP_FIELDS
Input data for testing and fixtures
These construct the highest-level objects, two Polymer instances and one default ResidueChemTemplates
Hierarchical tests
These tests (see an example below) have two parts. First, @pytest.mark.parametrize gives the list of JSON-parsable classes to be iterated for seralization/deserialization tests. Then, the test_json_roundtrip test performs deep equality tests for all attributes in all corresponding instances made from the subobject_factory function. The recursion tree, which describes the parent/child relationship between different classes and how one can be part of another is specified in subobject_factory.

# iterate over nested classes in the Polymer hierarchy
@pytest.mark.parametrize("cls", [
    Polymer,
    Monomer,
    RDKitMoleculeSetup,
])
# check for seralization/deserialization and deep equality
def test_json_roundtrip(cls, populated_polymer):
    """Tests starting from a populated polymer object"""
    for obj in subobject_factory(cls, populated_polymer):
        json_str = obj.to_json()
        decoded = cls.from_json(json_str)
        assert isinstance(decoded, cls)
        deep_assert_equal(decoded, obj)

Other tests
There are just a few more tests that were kept as is. Overall, the updated version is supposed to be a strict superset, with a greater and deeper coverage compared to the original version.

rwxayheee · 2025-04-11T16:46:00Z

Bugs discovered

Running the tests points out some minor issues:

FAILED test/json_serialization_test.py::test_json_roundtrip[Polymer] - AssertionError: [root.residue_chem_templates.padders.3-prime.adjacent_smartsmol] Mol SMILES mismatch
FAILED test/json_serialization_test.py::test_json_roundtrip[Monomer] - AssertionError: [root.molsetup.rmsd_symmetry_indices] Type mismatch: <class 'list'> != <class 'tuple'>
FAILED test/json_serialization_test.py::test_json_roundtrip[RDKitMoleculeSetup] - AssertionError: [root.rmsd_symmetry_indices] Type mismatch: <class 'list'> != <class 'tuple'>
FAILED test/json_serialization_test.py::test_json_roundtrip_missing[Polymer] - AssertionError: [root.residue_chem_templates.padders.3-prime.adjacent_smartsmol] Mol SMILES mismatch
FAILED test/json_serialization_test.py::test_json_rct[ResidueChemTemplates] - AssertionError: [root.padders.3-prime.adjacent_smartsmol] Mol SMILES mismatch
FAILED test/json_serialization_test.py::test_json_rct[ResiduePadder] - AssertionError: [root.adjacent_smartsmol] Mol SMILES mismatch
FAILED test/json_serialization_test.py::test_json_molsetup[RDKitMoleculeSetup] - AssertionError: [root.rmsd_symmetry_indices] Type mismatch: <class 'list'> != <class 'tuple'>
=========================================================== 7 failed, 79 passed, 1 skipped, 2 warnings in 3.79s ===========================================================

These will be fixed in additional, separate commits to address each issue.

because roundtrip MolToSmarts-MolFromSmarts not preserving all atom specs for class ResiduePadder attribute adjacent_smartsmol

…msd_symmetry_indices in decoding logic

rwxayheee · 2025-04-11T23:37:00Z

Issues and solutions

(1) The MolToSmarts-MolFromSmarts roundtrip not preserving all atom specs (3d6858c)

FAILED test/json_serialization_test.py::test_json_roundtrip[Polymer] - AssertionError: [root.residue_chem_templates.padders.3-prime.adjacent_smartsmol] Mol SMILES mismatch

This problem was discussed in rdkit/rdkit#3193. Here it occurred because we are serializing/deserializing SMARTS via a molecule attribute: class ResiduePadder attribute adjacent_smartsmol. There is possible information loss in the roundtrip (see example below). A potential a workaround would be to serialize the SMARTS directly without going through the roundtrip.

from rdkit import Chem

smarts_in = "[P:11]([O-:12])(=[O:13])[O:14][C:15]"
mol_1 = Chem.MolFromSmarts(smarts_in)
smarts_2 = Chem.MolToSmarts(mol_1)
print(f"Original SMARTS: {smarts_in}")
print(f"Converted SMARTS: {smarts_2}")
mol_2 = Chem.MolFromSmarts(smarts_2)

# Compare charges
for a1, a2 in zip(mol_1.GetAtoms(), mol_2.GetAtoms()):
    print(f"Atom {a1.GetSymbol()} — original: {a1.GetFormalCharge()}, decoded: {a2.GetFormalCharge()}")

Output:

Original SMARTS: [P:11]([O-:12])(=[O:13])[O:14][C:15]
Converted SMARTS: [P:11]([O&-:12])(=[O:13])[O:14][C:15]
Atom P — original: 0, decoded: 0
Atom O — original: -1, decoded: 0
Atom O — original: 0, decoded: 0
Atom O — original: 0, decoded: 0
Atom C — original: 0, decoded: 0

(2) Small type fix for class RDKitMoleculeSetup attribute rmsd_symmetry_indices in encoding logic (63fb250)

FAILED test/json_serialization_test.py::test_json_roundtrip[Monomer] - AssertionError: [root.molsetup.rmsd_symmetry_indices] Type mismatch: <class 'list'> != <class 'tuple'>

(3) Some bug fixes within class ResidueChemTemplatese (e8d734b)
There're some minor issues with variable names and scope of data_path.

(4) Remove unused attribute atom_true_count from class RDKitMoleculeSetup (2c94571)

(5) Similar to (2), restores tuple (converting list to tuple in decoding logic) in nested lists under class Polymer attr log (a46f885)

rwxayheee · 2025-04-12T01:23:58Z

(970b1e9) are improvements to attribute detection and comparison in the deep equality test. Now, the tests are able to capture the following problems:

What if...?

(1) What if we add an attribute in the __init__ of the class, without revision of the serialization/deserialization logic for it?

For example, as discussed in conversation #306 (comment), adding bonds to Polymer:

        _bonds = {}
        for key, bond_list in bonds.items():
            monomer1 = self.monomers[key[0]]
            monomer2 = self.monomers[key[1]]
            if monomer1.rdkit_mol is None or monomer2.rdkit_mol is None:
                continue
            invmap1 = {j: i for i, j in monomer1.mapidx_to_raw.items()}
            invmap2 = {j: i for i, j in monomer2.mapidx_to_raw.items()}
            _bonds[key] = [(invmap1[b[0]], invmap2[b[1]]) for b in bond_list]
        bonds = _bonds
        self.bonds = bonds

Expected pytest output

============================================================================== short test summary info ==============================================================================
FAILED test/json_serialization_test.py::test_json_roundtrip[Polymer] - AssertionError: [root.bonds] Dict keys mismatch

(2) What if the original instance for some reason got additional attribute that wasn't passed to the decoded object?

Example output

============================================================================== short test summary info ==============================================================================
FAILED test/json_serialization_test.py::test_json_roundtrip[Polymer] - AssertionError: [root] Missing attribute: additional_attr
FAILED test/json_serialization_test.py::test_json_roundtrip_missing[Polymer] - AssertionError: [root] Missing attribute: additional_attr

(3) What if the decoded instance for some reason got additional attribute that wasn't present in original object?

Example output

============================================================================== short test summary info ==============================================================================
FAILED test/json_serialization_test.py::test_json_roundtrip_missing[Polymer] - AssertionError: [root] Extra attributes in decoded object: {'excess_attr'}
FAILED test/json_serialization_test.py::test_json_roundtrip_missing[Monomer] - AssertionError: [root] Extra attributes in decoded object: {'excess_attr'}

rwxayheee · 2025-04-12T03:07:14Z

The updated test script currently includes four separable (mutually independent) test groups in #region Hierarchical Tests. It's very doable to add a new test group for a full hierachical test. Here're the general steps:

Developer Usage: Steps to write a new, independent hierarchical test

Here's a practical example:

step 1: Import the necessary data and create a fixture as needed

pkgdir = pathlib.Path(meeko.__file__).parents[1]
ahhy_v061_json = pkgdir / "test/polymer_data/AHHY-v0.6.1.json"
@pytest.fixture
def populated_polymer_v061():
    """fixture for a populated polymer object, from a v0.6.1 JSON file"""
    with open(ahhy_v061_json) as f:
        json_str = f.read()
    polymer = Polymer.from_json(json_str)
    if len(polymer.monomers) == 0:
        raise ValueError("Polymer creation failed")
    return polymer

step 2: Define an object factory

def lazy_factory(cls, root): 
    """Factory function to create a single object based on the class and root object."""
    if isinstance(root, Polymer):
        if cls is Polymer:
            return root
        monomers = list(root.monomers.values())
        if not monomers:
            raise ValueError("No monomers found in Polymer")
        last_monomer = monomers[-1]
        if cls is Monomer:
            return last_monomer
        if cls is RDKitMoleculeSetup:
            return last_monomer.molsetup
    raise ValueError(f"Unexpected class or root: {cls}, {type(root)}")

step 3: Define the class to be iterated and include nested JSON-parsable classes, if any

@pytest.mark.parametrize("cls", [
    Polymer,
    Monomer,
    RDKitMoleculeSetup,
])

step 4: Write the test_ function, in which serialization-deserialization and deep equality checks take place in order

def test_json_roundtrip_lazy(cls, populated_polymer_v061):
    """Lazy tests starting from a populated polymer object"""
    obj = lazy_factory(cls, populated_polymer_v061)
    if obj is None:
        warnings.warn(f"Subobject of type {cls.__name__} is None — skipping.", stacklevel=1)
        pass
    json_str = obj.to_json()
    decoded = cls.from_json(json_str)
    assert isinstance(decoded, cls)
    deep_assert_equal(decoded, obj)

The fixture (test case), object factory (scope) & classes to be iterated (depth of testing), and the testing logics are all flexible.

(1) incorrect var name in padder parsing from data (2) scope issue of default data_path

multiple polymer fixtures; let pass when original obj is None; remove functionally redundant test

rwxayheee · 2025-04-12T08:32:14Z

(Just realized that EQUALITY_SKIP_FIELDS is currently non-empty but should be empty if there are no remaining issues. Currently all tests are passing because it suppresses potential failures

EQUALITY_SKIP_FIELDS = { 
    RDKitMoleculeSetup: {"atom_true_count" },
    Monomer: {"template", "link_labels"},
}

(working on two small fixes before finalizing.)

rwxayheee · 2025-04-14T22:22:30Z

The test script is finalized, but the tests are not passing without exceptions (EQUALITY_SKIP_FIELDS) when there is unfinished attribute like Monomer: {"template", "link_labels"},. This was #293 and has yet to be solved. Therefore, I will take a look

rwxayheee · 2025-04-15T08:04:46Z

This should be ready. Overall, this PR has a rewritten version of the JSON roundtrip tests, and some minor fixes. I will write some more tests for residue templates in another PR.

refactor json tests

5faff37

rwxayheee marked this pull request as draft April 11, 2025 07:05

rwxayheee added 2 commits April 11, 2025 08:56

add doc string and beautify

27dc066

small correction

b8190cb

rwxayheee added 2 commits April 11, 2025 16:24

fix 1: serialize the padder's adjacent res SMARTS directly

3d6858c

because roundtrip MolToSmarts-MolFromSmarts not preserving all atom specs for class ResiduePadder attribute adjacent_smartsmol

fix 2: small type correction for class RDKitMoleculeSetup attribute r…

63fb250

…msd_symmetry_indices in decoding logic

improve attr detection in deep equality test

970b1e9

rwxayheee mentioned this pull request Apr 12, 2025

Remove attribute MoleculeSetup.is_sidechain #339

Merged

rwxayheee added 5 commits April 11, 2025 21:22

Merge remote-tracking branch 'origin/develop' into better_json_tests

4d24b1c

fix 3 in class ResidueChemTemplates

e8d734b

(1) incorrect var name in padder parsing from data (2) scope issue of default data_path

added a full hierarchical test for ahhy_v061_json

79ce856

improvements with v061 file: (1) use pytest.mark.parametrize for

102beaf

multiple polymer fixtures; let pass when original obj is None; remove functionally redundant test

small fixes; print warnings when original obj is None

cabc50f

rwxayheee marked this pull request as ready for review April 12, 2025 07:44

rwxayheee requested review from diogomart and removed request for diogomart April 12, 2025 07:53

rwxayheee marked this pull request as draft April 12, 2025 08:09

fix 4: remove attr atom_true_count from class RDKitMoleculeSetup

8879c81

fix 4: remove unused attr atom_true_count from class RDKitMoleculeSetup

2c94571

rwxayheee mentioned this pull request Apr 15, 2025

Templates stored within Polymer #293

Open

rwxayheee added 3 commits April 14, 2025 23:14

add logging that prints object path, class and attr name being evaluated

ad7bcd1

Merge remote-tracking branch 'origin/develop' into better_json_tests

6cb4450

fix 5: ensure JSON roundtrip restores tuple

a46f885

add JSON roundtrip test for a polymer from pqr

b4c69de

rwxayheee marked this pull request as ready for review April 15, 2025 07:34

rwxayheee requested a review from diogomart April 15, 2025 08:04

This was referenced Apr 23, 2025

remove unused Atom.interaction_vectors #346

Merged

Add docs on how to add templates for 9MXW #355

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor `json_serialization_test.py` to use schema-based hierarchical tests#338

Refactor `json_serialization_test.py` to use schema-based hierarchical tests#338
rwxayheee wants to merge 17 commits intoforlilab:developfrom
rwxayheee:better_json_tests

rwxayheee commented Apr 11, 2025 •

edited

Loading

Uh oh!

rwxayheee commented Apr 11, 2025 •

edited

Loading

Uh oh!

rwxayheee commented Apr 11, 2025 •

edited

Loading

Uh oh!

rwxayheee commented Apr 11, 2025 •

edited

Loading

Uh oh!

rwxayheee commented Apr 12, 2025 •

edited

Loading

Uh oh!

rwxayheee commented Apr 12, 2025 •

edited

Loading

Uh oh!

rwxayheee commented Apr 12, 2025

Uh oh!

rwxayheee commented Apr 14, 2025

Uh oh!

rwxayheee commented Apr 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rwxayheee commented Apr 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rwxayheee commented Apr 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Design explained

Uh oh!

rwxayheee commented Apr 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bugs discovered

Uh oh!

rwxayheee commented Apr 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Issues and solutions

Uh oh!

rwxayheee commented Apr 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What if...?

Uh oh!

rwxayheee commented Apr 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Developer Usage: Steps to write a new, independent hierarchical test

Uh oh!

rwxayheee commented Apr 12, 2025

Uh oh!

rwxayheee commented Apr 14, 2025

Uh oh!

rwxayheee commented Apr 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

rwxayheee commented Apr 11, 2025 •

edited

Loading

rwxayheee commented Apr 11, 2025 •

edited

Loading

rwxayheee commented Apr 11, 2025 •

edited

Loading

rwxayheee commented Apr 11, 2025 •

edited

Loading

rwxayheee commented Apr 12, 2025 •

edited

Loading

rwxayheee commented Apr 12, 2025 •

edited

Loading