PHRASING_ELEMS missing <u>, <s>, and <bdi> elements causes paragraph splitting #997

@gregko

Problem

The PHRASING_ELEMS array is missing several HTML5 phrasing content elements: U, S, and BDI.

This causes Readability to incorrectly treat these inline elements as block-level, splitting paragraphs around them.

Example

Input HTML:

<p>Seigneur, mon Die<u>u</u> et mon salut</p>

Output (incorrect):

<p>Seigneur, mon Die</p><u>u</u><p> et mon salut</p>

The <u> tag is being treated as a block element, breaking the paragraph into three parts.

Solution

Add the missing phrasing content elements to PHRASING_ELEMS:

PHRASING_ELEMS: [
  // ... existing elements ...
  "BDI", // Bidirectional text isolation
  "S",   // Strikethrough
  "U",   // Underline
  // ...
]

Per the HTML spec, these are all valid phrasing content elements, so they should stay inline within paragraphs.
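
To illustrate why the list membership matters, here is a minimal sketch of the splitting behaviour. It is not Readability's actual code: `splitAtNonPhrasing`, the two list constants, and the plain-object node model are hypothetical stand-ins, and the "before" list is abbreviated to a few tags. It assumes that any child whose tag is not in the list ends the current paragraph.

```javascript
// Abbreviated stand-in for the real list (assumption: only a few tags shown).
const PHRASING_ELEMS_BEFORE = ["B", "EM", "I", "SPAN", "STRONG"];
// The same list with the three tags proposed in this issue.
const PHRASING_ELEMS_AFTER = [...PHRASING_ELEMS_BEFORE, "BDI", "S", "U"];

// Hypothetical helper: groups consecutive phrasing children into one <p>
// and emits any non-phrasing child on its own, closing the current run.
// Text nodes are modelled as strings, elements as { tag, children }.
function splitAtNonPhrasing(children, phrasingElems) {
  const out = [];
  let run = [];
  const flush = () => {
    if (run.length > 0) {
      out.push({ tag: "P", children: run });
      run = [];
    }
  };
  for (const child of children) {
    const isText = typeof child === "string";
    if (isText || phrasingElems.includes(child.tag)) {
      run.push(child);
    } else {
      flush();
      out.push(child);
    }
  }
  flush();
  return out;
}

// The example paragraph from this issue.
const children = [
  "Seigneur, mon Die",
  { tag: "U", children: ["u"] },
  " et mon salut",
];

const before = splitAtNonPhrasing(children, PHRASING_ELEMS_BEFORE);
const after = splitAtNonPhrasing(children, PHRASING_ELEMS_AFTER);

console.log(before.map((n) => n.tag)); // [ 'P', 'U', 'P' ]
console.log(after.map((n) => n.tag)); // [ 'P' ]
```

With "U" absent from the list, the sketch reproduces the three-part output shown above; with it present, the paragraph stays whole.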
