Port HtmlParser over from amphtml #272

@schlessera

The validator cannot be built on top of the Dom\Document that we use for the sanitizer and optimizer, because the validator needs precise line/column/length coordinates to pinpoint validation issues in the source files.

The NodeJS validator uses a SAX parser to traverse the HTML. The actual validation engine is a handler that is triggered by the SAX events, such as startTag(), endTag(), ...
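
To illustrate the handler pattern described above, here is a minimal sketch in Python. It uses the standard-library html.parser module, which is also event-driven and can report a line/column position for each event. This is not the amphtml parser and not the planned PHP port. The class name RecordingHandler, the helper collect_events() and the "startTag"/"endTag" event labels are assumptions made only for this example.

```python
from html.parser import HTMLParser


class RecordingHandler(HTMLParser):
    """Hypothetical handler: records (event, tag, line, column) per parser event."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (line, offset): line is 1-based, offset is 0-based.
        line, col = self.getpos()
        self.events.append(("startTag", tag, line, col))

    def handle_endtag(self, tag):
        line, col = self.getpos()
        self.events.append(("endTag", tag, line, col))


def collect_events(html):
    """Feed the markup through the parser and return the recorded events."""
    handler = RecordingHandler()
    handler.feed(html)
    handler.close()
    return handler.events


source = "<html>\n  <body>\n    <p>Hi</p>\n  </body>\n</html>"
for event in collect_events(source):
    print(event)
```

A validation engine built this way never sees a DOM tree. It only reacts to the stream of events, and each event arrives with the source position needed to report an issue. In this sketch the `<p>` start tag is recorded at line 3, column 4.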

After looking at existing HTML SAX parsers in PHP, my conclusion is to port over the parser implementation from NodeJS instead of reusing an existing PHP HTML SAX parser, for the following reasons:

  • none of the existing implementations has been maintained recently;
  • third-party dependencies should be avoided in the toolbox whenever we can;
  • a lot of the parser's hard-coded logic already exists in the toolbox because we needed parts of it for other tools...
  • ...therefore it can be ported with only modest effort.
