-
Notifications
You must be signed in to change notification settings - Fork 54
Open
Description
Molecule mobley_3323117 (sulfolane) is written with the non-standard SMILES C1CC[S+2](C1)([O-])[O-], rather than the more standard C1CCS(=O)(=O)C1.
Despite being equivalent in total charge, these forms are inequivalent due to the provided formal charges (+2 for S, -1 for O) vs the standard SMILES (all atoms have 0 formal charge), which are rendered inequivalent in molecular representations in the OpenFF toolkit (with the OpenEye backend):
>>> from openff.toolkit.topology import Molecule
>>> freesolv_molecule = Molecule.from_smiles('C1CC[S+2](C1)([O-])[O-]')
>>> standard_molecule = Molecule.from_smiles('C1CCS(=O)(=O)C1')
>>> freesolv_molecule.generate_unique_atom_names()
>>> standard_molecule.generate_unique_atom_names()
>>> [(atom.name, atom.formal_charge.m) for atom in freesolv_molecule.atoms]
[('C1x', 0), ('C2x', 0), ('C3x', 0), ('S1x', 2), ('C4x', 0), ('O1x', -1), ('O2x', -1), ('H1x', 0), ('H2x', 0), ('H3x', 0), ('H4x', 0), ('H5x', 0), ('H6x', 0), ('H7x', 0), ('H8x', 0)]
>>> [(atom.name, atom.formal_charge.m) for atom in standard_molecule.atoms]
[('C1x', 0), ('C2x', 0), ('C3x', 0), ('S1x', 0), ('O1x', 0), ('O2x', 0), ('C4x', 0), ('H1x', 0), ('H2x', 0), ('H3x', 0), ('H4x', 0), ('H5x', 0), ('H6x', 0), ('H7x', 0), ('H8x', 0)]Would it be reasonable to correct the non-standard SMILES string and re-generate the database?
Or are there ways to automatically standardize the formal charges?
Metadata
Metadata
Assignees
Labels
No labels