Description
I am using this to parse emails, but my emails are raw HTML (I plan to display them on a webpage later and want to preserve their formatting). The problem is that this package matches quote headers with `/^\s*(On(?:(?!.*On\b|\bwrote:)[\s\S])+wrote:)$/m`. Unless I strip out all the tags (which I don't want to do), EmailReplyParser fails to detect the quote headers, because technically they are not at the beginning (or end) of a line: the surrounding tags sit between the header and the line boundaries.
Here's a simple example:
`<p>On Wednesday, March 22, 2023, 3:25 PM, XXX <XXXX.com> wrote:</p>`
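The mismatch can be reproduced in a few lines. This sketch assumes a JavaScript runtime such as Node.js; the regex is copied verbatim from above, and the sample header is the example from this issue.

```javascript
// Quote-header pattern as quoted in this issue: anchored to line start/end, multiline flag.
const anchored = /^\s*(On(?:(?!.*On\b|\bwrote:)[\s\S])+wrote:)$/m;

const plain = "On Wednesday, March 22, 2023, 3:25 PM, XXX <XXXX.com> wrote:";
const html = "<p>" + plain + "</p>";

// Plain text: the header fills the whole line, so ^ and $ are both satisfied.
console.log(anchored.test(plain)); // true
// HTML: "<p>" sits between ^ and "On", and "</p>" sits between "wrote:" and $.
console.log(anchored.test(html)); // false
```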
To get around this, I removed the `^` and `$` anchors from the regular expression, which fixed the problem, but I'm wondering whether there was an original motivation for having them there in the first place. I don't want to remove something on my end that will break something for me down the line.
Is there a reason for the `^` and `$` (start- and end-of-line anchors, given the `m` flag)?
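One plausible reason (a guess, not something the maintainers have confirmed) is avoiding false positives: without the anchors, an "On ... wrote:" phrase in the middle of ordinary reply text also matches. The reply sentence below is a hypothetical example; the two regexes are the original and the anchor-free version described above.

```javascript
// Pattern from the issue, and the same pattern with ^ and $ removed.
const anchored = /^\s*(On(?:(?!.*On\b|\bwrote:)[\s\S])+wrote:)$/m;
const unanchored = /\s*(On(?:(?!.*On\b|\bwrote:)[\s\S])+wrote:)/m;

// Hypothetical reply text: "On ... wrote:" appears mid-sentence, not as a quote header.
const body = "Agreed. On Friday, Bob wrote: we should ship it.";

console.log(anchored.test(body));   // false: "On" is not at a line start
console.log(unanchored.test(body)); // true: matched as if it were a quote header

// The anchor-free version does find the header inside HTML tags.
console.log(unanchored.exec("<p>On Monday, Alice wrote:</p>")[1]); // On Monday, Alice wrote:
```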
If so, I suppose another workaround would be to treat the closing `>` of the preceding HTML tag as the "beginning" of the line.
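A minimal sketch of that idea, assuming the header sits directly inside a tag: a preceding `>` is accepted as well as a line start, and a following `<` as well as a line end. `tagAware` and `quoteHeader` are hypothetical names for illustration, not part of EmailReplyParser's API.

```javascript
// Hypothetical variant: "(?:^|>)" lets the end of an HTML tag stand in for a line start,
// and "(?=<|$)" lets the start of the next tag stand in for a line end.
const tagAware = /(?:^|>)\s*(On(?:(?!.*On\b|\bwrote:)[\s\S])+wrote:)\s*(?=<|$)/m;

// Returns the captured header text, or null when no header is found.
function quoteHeader(text) {
  const match = tagAware.exec(text);
  return match ? match[1] : null;
}

// HTML-wrapped header from this issue: matched.
console.log(quoteHeader("<p>On Wednesday, March 22, 2023, 3:25 PM, XXX <XXXX.com> wrote:</p>"));
// Mid-sentence phrase inside a tag: still rejected.
console.log(quoteHeader("<p>Agreed. On Friday, Bob wrote: we should ship it.</p>")); // null
// Plain-text header on its own line: still matched through the ^ branch.
console.log(quoteHeader("On Monday, Alice wrote:")); // On Monday, Alice wrote:
```

One caveat: the `(?!.*On\b...)` lookahead scans to the end of the current line, so if an entire HTML email is on a single line and contains a later capitalized "On", the first header is skipped. For example, `quoteHeader("<p>On Monday, Alice wrote:</p><p>On Tuesday, Bob wrote:</p>")` returns the second header rather than the first.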