Skip to content

Multiple denylist patterns are not applied to CDATA validation #4319

@westonruter

Description

@westonruter

Bug Description

It turns out that CDATA can have multiple blacklisted_cdata_regex constraints, for example:

    blacklisted_cdata_regex: {
      regex: "<!--"
      error_message: "html comments"
    }
    # These regex blacklists are temporary hacks to validate essential CSS
    # rules. They will be replaced later with more principled solutions.
    blacklisted_cdata_regex: {
      regex: "(^|\\W)i-amphtml-"
      error_message: "CSS i-amphtml- name prefix"
    }
    blacklisted_cdata_regex: {
      regex: "!\\s*important"
      error_message: "CSS !important"
    }

However, the Python spec parser is only capturing one:

'cdata' => array(
'blacklisted_cdata_regex' => array(
'error_message' => 'CSS !important',
'regex' => '!\\s*important',
),
'css_spec' => array(

And the sanitizer is only expecting one:

if ( isset( $cdata_spec['blacklisted_cdata_regex'] ) ) {
if ( preg_match( '@' . $cdata_spec['blacklisted_cdata_regex']['regex'] . '@u', $element->textContent ) ) {
if ( isset( $cdata_spec['blacklisted_cdata_regex']['error_message'] ) ) {
// There are only a few error messages, so map them to error codes.
switch ( $cdata_spec['blacklisted_cdata_regex']['error_message'] ) {
case 'CSS !important':
return [ 'code' => self::INVALID_CDATA_CSS_IMPORTANT ];
case 'contents':
return [ 'code' => self::INVALID_CDATA_CONTENTS ];
case 'html comments':
return [ 'code' => self::INVALID_CDATA_HTML_COMMENTS ];
}
}
// Note: This fallback case is not currently reachable because all error messages are accounted for in the switch statement.
return [ 'code' => self::CDATA_VIOLATES_BLACKLIST ];
}

This means at present CSS selectors that contain i-amphtml-* will not get caught by the sanitizer, even though they are invalid. Related: #771.

We need to fix how the spec is parsed for blacklisted_cdata_regex. We should also check to see if this same thing is happening for other properties. At first I thought blacklisted_value_regex would be a candidate, but it seems to already be changed to concatenate all denied patterns into a single regex.

Expected Behaviour

Multiple blacklisted_cdata_regex constraints should be captured and applied during sanitization/validation.

Steps to reproduce

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Screenshots

Additional context

  • WordPress version:
  • Plugin version:
  • Gutenberg plugin version (if applicable):
  • AMP plugin template mode:
  • PHP version:
  • OS:
  • Browser: [e.g. chrome, safari]
  • Device: [e.g. iPhone6]

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

Implementation brief

QA testing instructions

Demo

Changelog entry

Metadata

Metadata

Assignees

Labels

BugSomething isn't workingCSSChangeloggedWhether the issue/PR has been added to release notes.P2Low priorityValidationWS:CoreWork stream for Plugin core

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions