Smiles Errors

Of course. I have analyzed the provided table and incorporated the new, distinct error types into my original list. I've also refined some of the existing error names and descriptions for better clarity and merged concepts that were fundamentally the same.

Here is the updated and more comprehensive table of parser errors for the OpenSMILES specification.

Comprehensive Parser Errors for OpenSMILES

Syntax Errors

| Error Name | Error Description | Error Category | Snippet from Specification |
| --- | --- | --- | --- |
| InvalidElementSymbol | An atom specification contains an invalid element symbol (e.g., "Xx"). | Syntax | `atom ::= bracket_atom …` |
| MismatchedBrackets | An opening bracket `[` is not matched with a closing bracket `]`. | Syntax | `bracket_atom ::= '[' isotope? symbol chiral? hcount? charge? class? ']'` |
| UnmatchedBranchParentheses | An opening parenthesis `(` for a branch is not matched with a closing parenthesis `)`. | Syntax | `branch ::= '(' chain ')' …` |
| EmptyBranch | A branch `()` contains no atoms or bonds. | Syntax | `chain ::= extended_atom …` |
| MisplacedBond | A bond symbol appears at the very start of a SMILES string, or two bond symbols are adjacent. | Syntax | `chain ::= extended_atom …` |
| InvalidBondSymbol | A character used for a bond is not a valid bond symbol. | Syntax | `bond ::= '-' …` |
| InvalidIsotopePlacement | An isotope number is specified outside of an atom's brackets. | Syntax | `bracket_atom ::= '[' isotope? symbol ... ']'`. The isotope must be inside the brackets. |
| InvalidIsotopeFormat | An isotope specification is not a valid integer. | Syntax | `isotope ::= NUMBER` |
| InvalidChargeFormat | The charge specification is not valid (e.g., `+-1`). Repeated signs (e.g., `++`, `--`) are deprecated forms that a strict parser may also reject. | Syntax | `charge ::= '-' …` |
| InvalidExplicitHydrogen | The hydrogen count (`H`) is followed by an invalid count (anything other than an optional single digit), or is applied to a hydrogen atom itself (`[HH1]`). | Syntax | `hcount ::= 'H' …` |
| MisplacedChirality | A chirality symbol is not directly associated with an atom inside brackets. | Syntax | `bracket_atom ::= '[' ... chiral? ... ']'`. The chiral specification is part of a `bracket_atom`. |
| InvalidChiralitySymbol | The chirality specification is not one of the allowed symbols (`@`, `@@`, `@TH1`, etc.). | Syntax | `chiral ::= '@' …` |
| RingDigitOutOfRange | A ring closure number is greater than 9 but is not preceded by a `%` sign. | Syntax | `ringbond ::= bond? DIGIT …` |
| UnmatchedRingNumber | A ring closure number does not have a matching partner. | Syntax | `ringbond ::= ...`. Semantically, every opened ring number must be closed by a matching partner (a number may be reused once it has been closed). |
| DotBeforeRingClosure | A dot (`.`) character immediately precedes a ring closure digit. | Syntax | The grammar allows dots to separate chains (`smiles ::= chain ('.' chain)*`), but not to precede a ringbond on an atom. |
| InvalidAtomClass | An atom class is not specified as a colon (`:`) followed by a number. | Syntax | `class ::= ':' NUMBER` |
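
Several of the syntax errors above are purely structural and can be detected with a single left-to-right scan. The following is a minimal sketch in Python, not a full OpenSMILES parser: the function name `check_structure` is hypothetical, the contents of bracket atoms are skipped rather than validated, and `%nn` ring labels are paired as opaque strings without checking their format. It reports only `MismatchedBrackets`, `UnmatchedBranchParentheses`, `EmptyBranch`, `MisplacedBond`, and `UnmatchedRingNumber`.

```python
# Bond symbols of the SMILES language (outside bracket atoms).
BOND_SYMBOLS = set("-=#$:/\\")


def check_structure(smiles: str) -> list[str]:
    """Return the names of structural syntax errors found in a SMILES string.

    Hypothetical helper for illustration; error names follow the table above.
    """
    errors = []
    paren_depth = 0
    open_rings = set()   # ring labels seen an odd number of times so far
    prev = None          # previous character outside a bracket atom
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            close = smiles.find("]", i)
            if close == -1:
                errors.append("MismatchedBrackets")
                break
            i = close    # skip the bracket atom's contents entirely
            ch = "]"
        elif ch == "]":
            # A closing bracket with no opening bracket before it.
            errors.append("MismatchedBrackets")
        elif ch == "(":
            paren_depth += 1
            if smiles[i + 1:i + 2] == ")":
                errors.append("EmptyBranch")
        elif ch == ")":
            if paren_depth == 0:
                errors.append("UnmatchedBranchParentheses")
            else:
                paren_depth -= 1
        elif ch in BOND_SYMBOLS:
            # A bond at the very start, or directly after another bond.
            if prev is None or prev in BOND_SYMBOLS:
                errors.append("MisplacedBond")
        elif ch == "%":
            # Two-digit ring label "%nn": toggle it as one unit.
            open_rings ^= {smiles[i:i + 3]}
            i += 2
        elif ch.isdigit():
            # Single-digit ring label: first sight opens, second closes.
            open_rings ^= {ch}
        prev = ch
        i += 1
    if paren_depth > 0:
        errors.append("UnmatchedBranchParentheses")
    if open_rings:
        errors.append("UnmatchedRingNumber")
    return errors
```

For example, `check_structure("C(C")` returns `["UnmatchedBranchParentheses"]`, while `check_structure("C1CCCCC1")` returns an empty list. Toggling labels in a set mirrors the rule that a ring number can be reused after it has been closed.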
Semantic Errors

| Error Name | Error Description | Error Category | Snippet from Specification |
| --- | --- | --- | --- |
| ValenceExceeded | An atom has more bonds (including implicit hydrogens) than its allowed valence. | Semantic | An atom from the organic subset that is not in brackets has "implicit hydrogens" added such that the valence...is in the lowest normal state. This calculated valence cannot be exceeded. |
| InvalidAtomCombination | The combination of element, explicit hydrogens, and charge is chemically impossible. | Semantic | A common example is `[NH4-]`, where the components are syntactically valid but semantically nonsensical. |
| InvalidRingClosure | A ring is closed on the same atom (`C11`) or creates a parallel bond to an already connected atom (`C1-C1`). | Semantic | `C11` is described as "illegal, atom bonded to itself". `C12CCCCC12` is "illegal, two bonds between one pair of atoms". |
| InconsistentRingBonds | The bond types specified at the two ends of a ring closure do not match (e.g., `C-1...C=1`). | Semantic | `C=1CCCCC=1` is consistent; a conflicting pair such as `C=1CCCCC-1` is not. The specification implies that bonds for a given ring number must be consistent. |
| InvalidChiralSpecification | A chiral specification is applied to an atom with an insufficient or incorrect number of neighbors to be a valid chiral center. | Semantic | The specifications for tetrahedral (`@TH..`), square-planar (`@SP..`), etc., each imply a specific number of connections. |
| InvalidDoubleBondConfig | Directional bonds (`/`, `\`) are used around a bond that is not a double bond. | Semantic | The specification for cis/trans isomerism is based on the bonds around a double bond. |
| InvalidIsotopeValue | The specified isotope number is not a valid, positive integer for that element. | Semantic | `isotope ::= NUMBER`. While syntactically a number, a semantic check should ensure it is a positive integer. |
| LowercaseNonAromatic | A lowercase element symbol is used for an atom that is not in the allowed aromatic set (`b`, `c`, `n`, `o`, `p`, `s`, `se`, `as`). | Semantic | `aromatic_organic ::= 'b' …` |
| InvalidAromaticBondUse | An aromatic bond (`:`) is used in a context where aromaticity cannot be established (e.g., connecting two aliphatic atoms). | Semantic | Aromatic bonds are used to connect atoms within an aromatic system. Their use outside this context is an error in chemical meaning. |
| DisconnectedStructure | A SMILES string represents multiple disconnected structures but fails to use the dot (`.`) separator. | Semantic | `smiles ::= chain ('.' chain)*`. This implies that separate chemical graphs must be separated by a dot. |
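
The two ring-closure rows (`InvalidRingClosure` and `InconsistentRingBonds`) can be checked once the parser knows which atom each ring label belongs to. The sketch below is one way to do that in Python under simplifying assumptions: the name `check_ring_closures` and the `TOKEN` regular expression are hypothetical, the input is assumed to have already passed the syntax checks, atoms are only counted (elements, charges, and valence are ignored), and a ring bond is treated as inconsistent only when both ends carry an explicit symbol and the symbols differ.

```python
import re

# Hypothetical tokenizer: bracket atoms, two-letter organic atoms, "%nn" ring
# labels, one-letter atoms, bond symbols, single-digit ring labels, and
# branch/dot punctuation. Unknown characters are silently skipped.
TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|%\d\d|[A-Za-z*]|[-=#$:/\\]|\d|[().]")
BONDS = set("-=#$:/\\")


def check_ring_closures(smiles: str) -> list[str]:
    """Return ring-closure errors, named as in the semantic table above."""
    errors = []
    atom_count = 0
    current = None       # index of the atom the next token attaches to
    stack = []           # branch points saved at "("
    pending_bond = None  # explicit bond symbol waiting for its target
    open_rings = {}      # ring label -> (atom index, bond symbol or None)
    bonded = set()       # pairs of atom indices that already share a bond

    for tok in TOKEN.findall(smiles):
        if tok == "(":
            stack.append(current)
        elif tok == ")":
            if stack:
                current = stack.pop()
        elif tok == ".":
            current = None           # next atom starts a new chain
        elif tok in BONDS:
            pending_bond = tok
        elif tok[0] == "%" or tok.isdigit():
            if tok in open_rings:
                other, other_bond = open_rings.pop(tok)
                pair = frozenset((other, current))
                if other == current or pair in bonded:
                    # Self-bond (C11) or second bond to a neighbor (C12CCCCC12).
                    errors.append("InvalidRingClosure")
                elif pending_bond and other_bond and pending_bond != other_bond:
                    errors.append("InconsistentRingBonds")
                bonded.add(pair)
            else:
                open_rings[tok] = (current, pending_bond)
            pending_bond = None
        else:
            # Any remaining token is an atom.
            index = atom_count
            atom_count += 1
            if current is not None:
                bonded.add(frozenset((current, index)))
            current = index
            pending_bond = None
    return errors
```

With this sketch, `check_ring_closures("C11")` and `check_ring_closures("C12CCCCC12")` both return `["InvalidRingClosure"]`, `check_ring_closures("C=1CCCCC-1")` returns `["InconsistentRingBonds"]`, and `check_ring_closures("C=1CCCCC=1")` returns an empty list.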