Incorrect processing of HTML attributes containing '/' character #83

@agr

When the input Markdown contains HTML tags whose attribute values contain a / character (URLs being the most obvious case), the library fails to parse them properly.

Example input:

<iframe width='400' height='300' src='https://github.com'></iframe>

The output:

<p>&lt;iframe width='400' height='300' src='https://github.com'&gt;</iframe></p>

Expected output: HTML should pass through more or less untouched:

<iframe width='400' height='300' src='https://github.com'></iframe>

The issue here is that HtmlTag.ParseHelper does not correctly handle the / character in attribute values. My guess is that it takes the / for the end of the tag, then decides the HTML is malformed and treats it like any other text.

The fix that worked for me is to replace this line:

while (!p.eof && !char.IsWhiteSpace(p.current) && p.current != '>' && p.current != '/')

with:

while (!p.eof && !char.IsWhiteSpace(p.current) && p.current != '>' && !p.DoesMatch("/>"))

But I am not sure it won't break something else.
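
To make the difference between the two conditions easier to check, here is a minimal standalone sketch. It is not the library's code: ScanOriginal and ScanProposed are hypothetical helpers that reproduce only the two loop conditions over a plain string and index, and I am assuming the value is scanned as an unquoted run of characters starting right after the =.

```csharp
using System;

// Hypothetical stand-in for the original loop condition:
// stop at end of input, whitespace, '>' or ANY '/'.
static string ScanOriginal(string input, int start)
{
    int pos = start;
    while (pos < input.Length && !char.IsWhiteSpace(input[pos])
           && input[pos] != '>' && input[pos] != '/')
        pos++;
    return input.Substring(start, pos - start);
}

// Hypothetical stand-in for the proposed loop condition:
// stop at end of input, whitespace, '>' or the two-character sequence "/>".
static string ScanProposed(string input, int start)
{
    int pos = start;
    while (pos < input.Length && !char.IsWhiteSpace(input[pos])
           && input[pos] != '>'
           && string.CompareOrdinal(input, pos, "/>", 0, 2) != 0)
        pos++;
    return input.Substring(start, pos - start);
}

// The input from this issue; scanning starts right after "src=".
string iframe = "<iframe width='400' height='300' src='https://github.com'></iframe>";
int iframeValue = iframe.IndexOf("src=") + 4;
Console.WriteLine(ScanOriginal(iframe, iframeValue)); // 'https:
Console.WriteLine(ScanProposed(iframe, iframeValue)); // 'https://github.com'

// A self-closing tag with an unquoted value that contains '/'.
string img = "<img src=a/b.png/>";
int imgValue = img.IndexOf("src=") + 4;
Console.WriteLine(ScanOriginal(img, imgValue)); // a
Console.WriteLine(ScanProposed(img, imgValue)); // a/b.png
```

One case that might still behave differently with the proposed condition is an unquoted value that ends in / directly before the closing >, for example href=http://example.com/>. There the trailing / would be read as part of a self-closing "/>" rather than as part of the URL.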
