Of course. I have analyzed the provided table and incorporated the new, distinct error types into my original list. I've also refined some of the existing error names and descriptions for better clarity and merged concepts that were fundamentally the same.
Here is the updated and more comprehensive table of parser errors for the OpenSMILES specification.
| Error Name | Error Description | Error Category | Snippet from Specification |
|---|---|---|---|
| **Syntax Errors** | | | |
| InvalidElementSymbol | An atom specification contains an invalid element symbol (e.g., `Xx`). | Syntax | `atom ::= bracket_atom \| ...` |
| MismatchedBrackets | An opening bracket `[` is not matched with a closing bracket `]`. | Syntax | `bracket_atom ::= '[' isotope? symbol chiral? hcount? charge? class? ']'` |
| UnmatchedBranchParentheses | An opening parenthesis `(` for a branch is not matched with a closing parenthesis `)`. | Syntax | `branch ::= '(' chain ')' \| ...` |
| EmptyBranch | A branch `()` contains no atoms or bonds. | Syntax | `chain ::= extended_atom \| ...` |
| MisplacedBond | A bond symbol appears at the very start of a SMILES string, or two bond symbols are adjacent. | Syntax | `chain ::= extended_atom \| ...` |
| InvalidBondSymbol | A character used for a bond is not a valid bond symbol. | Syntax | `bond ::= '-' \| ...` |
| InvalidIsotopePlacement | An isotope number is specified outside of an atom's brackets. | Syntax | `bracket_atom ::= '[' isotope? symbol ... ']'`. The isotope must be inside the brackets. |
| InvalidIsotopeFormat | An isotope specification is not a valid integer. | Syntax | `isotope ::= NUMBER` |
| InvalidChargeFormat | The charge specification is not valid (e.g., `+-1`). Repeated signs beyond the deprecated `++` and `--` forms (e.g., `+++`) are also disallowed. | Syntax | `charge ::= '-' \| ...` |
| InvalidExplicitHydrogen | The explicit hydrogen count (`H`) is not followed by a valid digit, or is applied to a hydrogen atom itself (`[HH1]`). | Syntax | `hcount ::= 'H' \| ...` |
| MisplacedChirality | A chirality symbol is not directly associated with an atom inside brackets. | Syntax | `bracket_atom ::= '[' ... chiral? ... ']'`. The chiral specification is part of a `bracket_atom`. |
| InvalidChiralitySymbol | The chirality specification is not one of the allowed symbols (`@`, `@@`, `@TH1`, etc.). | Syntax | `chiral ::= '@' \| ...` |
| RingDigitOutOfRange | A ring closure number is greater than 9 but is not preceded by a `%` sign. | Syntax | `ringbond ::= bond? DIGIT \| ...` |
| UnmatchedRingNumber | A ring closure number does not have a matching partner. | Syntax | `ringbond ::= ...`. Each opened ring number must be closed by a matching occurrence before the string ends. |
| DotBeforeRingClosure | A dot (`.`) character immediately precedes a ring closure digit. | Syntax | The grammar allows dots to separate chains, `smiles ::= chain ('.' chain)*`, but not to precede a `ringbond` on an atom. |
| InvalidAtomClass | An atom class is not specified as a colon (`:`) followed by a number. | Syntax | `class ::= ':' NUMBER` |
| **Semantic Errors** | | | |
| ValenceExceeded | An atom has more bonds (including implicit hydrogens) than its allowed valence. | Semantic | An atom from the organic subset that is not in brackets has "implicit hydrogens" added such that the valence...is in the lowest normal state. This calculated valence cannot be exceeded. |
| InvalidAtomCombination | The combination of element, explicit hydrogens, and charge is chemically impossible. | Semantic | A common example is `[NH4-]`, where the components are syntactically valid but semantically nonsensical. |
| InvalidRingClosure | A ring is closed on the same atom (`C11`) or creates a parallel bond to an already connected atom (`C1-C1`). | Semantic | `C11` is described as "illegal, atom bonded to itself". `C12CCCCC12` is "illegal, two bonds between one pair of atoms". |
| InconsistentRingBonds | The bond types specified at the two ends of a ring closure do not match (e.g., `C-1...C=1`). | Semantic | `C=1CCCCC=1` is consistent. The specification implies that the bonds given for a ring number must agree at both ends. |
| InvalidChiralSpecification | A chiral specification is applied to an atom with an insufficient or incorrect number of neighbors to be a valid chiral center. | Semantic | The specifications for tetrahedral (`@TH..`), square-planar (`@SP..`), etc., each imply a specific number of connections. |
| InvalidDoubleBondConfig | Directional bonds (`/`, `\`) are used around a bond that is not a double bond. | Semantic | The specification for cis/trans isomerism is based on the bonds around a double bond. |
| InvalidIsotopeValue | The specified isotope number is not a valid, positive integer for that element. | Semantic | `isotope ::= NUMBER`. While syntactically a number, a semantic check should ensure it is a positive integer. |
| LowercaseNonAromatic | A lowercase element symbol is used for an atom that is not in the allowed aromatic set (`b`, `c`, `n`, `o`, `p`, `s`, `se`, `as`). | Semantic | `aromatic_organic ::= 'b' \| ...` |
| InvalidAromaticBondUse | An aromatic bond (`:`) is used in a context where aromaticity cannot be established (e.g., connecting two aliphatic atoms). | Semantic | Aromatic bonds are used to connect atoms within an aromatic system. Their use outside this context is an error in chemical meaning. |
| DisconnectedStructure | A SMILES string represents multiple disconnected structures but fails to use the dot (`.`) separator. | Semantic | `smiles ::= chain ('.' chain)*`. This implies that separate chemical graphs must be separated by a dot. |
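
To make a few of the syntax rows above more concrete, here is a minimal Python sketch of how MismatchedBrackets, UnmatchedBranchParentheses, EmptyBranch, and UnmatchedRingNumber might be detected in a single pass. The function name `find_structural_errors` is hypothetical, and the scanner is deliberately simplified: it does not validate atoms or bonds, it skips everything inside `[...]`, and it assumes that `%nn` labels are matched by their integer value. Treat it as an illustration under those assumptions, not as a conforming OpenSMILES parser.

```python
DIGITS = "0123456789"


def find_structural_errors(smiles: str) -> list[str]:
    """Return error names (from the table above) found by a simplified scan."""
    errors: list[str] = []
    open_rings: set[int] = set()  # ring numbers opened but not yet closed
    paren_depth = 0               # number of currently open branches
    i, n = 0, len(smiles)

    def toggle(ring: int) -> None:
        # First occurrence opens the ring, second occurrence closes it.
        if ring in open_rings:
            open_rings.remove(ring)
        else:
            open_rings.add(ring)

    while i < n:
        ch = smiles[i]
        if ch == "[":
            close = smiles.find("]", i + 1)
            if close == -1:
                errors.append("MismatchedBrackets")
                break
            # Skip the bracket atom so isotope/charge digits are not
            # mistaken for ring closures.
            i = close + 1
            continue
        if ch == "]":
            errors.append("MismatchedBrackets")
        elif ch == "(":
            if i + 1 < n and smiles[i + 1] == ")":
                errors.append("EmptyBranch")
            paren_depth += 1
        elif ch == ")":
            if paren_depth == 0:
                errors.append("UnmatchedBranchParentheses")
            else:
                paren_depth -= 1
        elif ch == "%":
            label = smiles[i + 1:i + 3]
            if len(label) == 2 and all(c in DIGITS for c in label):
                toggle(int(label))  # assumption: match %nn by integer value
                i += 3
                continue
            # A malformed %-label is ignored by this sketch.
        elif ch in DIGITS:
            toggle(int(ch))
        i += 1

    if paren_depth > 0:
        errors.append("UnmatchedBranchParentheses")
    if open_rings:
        errors.append("UnmatchedRingNumber")
    return errors
```

For example, `find_structural_errors("C1CCCCC1")` returns an empty list, while `find_structural_errors("C1CCC")` reports `UnmatchedRingNumber`. The semantic rows (valence, chirality, aromaticity) need a molecular graph and are out of scope for a character-level scan like this one.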