This repository was archived by the owner on Feb 28, 2023. It is now read-only.
Text added to cards may be incomplete #31
Open
When a link is shared and the user adds additional text, the added text may not be included in the log.
In the following generated sample, "This is a test." is not included.
<p class="TweetTextSize js-tweet-text tweet-text" lang="" data-aria-label-part="0">How I lost my 25-year battle against corporate claptrap <a href="https://t.co/gIrbtXuRSv" rel="nofollow noopener" dir="ltr" data-expanded-url="https://www.ft.com/lucycolumn" class="twitter-timeline-link" target="_blank" title="https://www.ft.com/lucycolumn" >
<span class="tco-ellipsis"/>
<span class="invisible">https://www.</span>
<span class="js-display-url">ft.com/lucycolumn</span>
<span class="invisible"/>
<span class="tco-ellipsis">
<span class="invisible"> </span>
</span>
</a> This is a test.</p>
This is because cssselect extracts only the text node before the <a> element. A workaround could be to use text_content():
    def _parse_dm_text(self, element):
        dm_text = ''
        text_tweet = element.cssselect("p.tweet-text")[0]
        # text_content() returns the text of the element and all of its descendants
        dm_text = text_tweet.text_content()
        return DirectMessageText(dm_text)
The output would be:
[2017-08-16 13:37:49] <Julien Ehrhart> [Card-summary_large_image] https://www.ft.com/lucycolumn How I lost my 25-year battle against corporate claptrap https://www.ft.com/lucycolumn This is a test.
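To illustrate why the trailing text goes missing, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree rather than lxml so it runs without extra dependencies. Both share the same .text/.tail model, and "".join(p.itertext()) stands in here for lxml's text_content(). The markup is a simplified version of the sample above, and the variable names are only for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified version of the tweet markup from the sample in this issue.
SNIPPET = (
    '<p class="tweet-text">How I lost my 25-year battle against corporate claptrap '
    '<a href="https://t.co/gIrbtXuRSv">'
    '<span class="js-display-url">ft.com/lucycolumn</span>'
    '</a> This is a test.</p>'
)

p = ET.fromstring(SNIPPET)
a = p.find("a")

# .text only holds the text node before the first child element.
before_link = p.text

# The text typed after the link is stored as the tail of <a>, not on <p>.
after_link = a.tail

# Joining every text node in document order recovers the added text.
full_text = "".join(p.itertext())

print(repr(before_link))
print(repr(after_link))
print(repr(full_text))
```

Reading only p.text drops " This is a test.", while collecting all text nodes keeps it (along with the link's display text).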
Two issues here:
- The link appears twice (once during the parsing of the card, once during the parsing of the text) -> Acceptable
- The emojis are not part of the text nodes, so they are stripped from the output -> Not acceptable
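One possible direction for the emoji problem, as a rough sketch only: it assumes emojis are rendered as <img> elements carrying the character in their alt attribute, which this issue does not confirm. The helper name text_with_img_alt and the sample markup are hypothetical, and the standard library's ElementTree is used in place of lxml (the .text/.tail handling is the same).

```python
import xml.etree.ElementTree as ET


def text_with_img_alt(element):
    """Join text nodes in document order, replacing each <img> with its alt text.

    Hypothetical helper: assumes emojis appear as <img alt="..."> elements.
    """
    if element.tag == "img":
        return element.get("alt", "")
    parts = [element.text or ""]
    for child in element:
        parts.append(text_with_img_alt(child))
        # The tail belongs to the child but sits in the parent's text flow.
        parts.append(child.tail or "")
    return "".join(parts)


# Made-up sample markup; the class name and src are placeholders.
SAMPLE = '<p class="tweet-text">Nice <img class="Emoji" alt="\U0001F600" src="x.png"/> day</p>'
p = ET.fromstring(SAMPLE)

plain = "".join(p.itertext())      # the emoji is lost here
with_emoji = text_with_img_alt(p)  # the emoji is kept via the alt attribute

print(repr(plain))
print(repr(with_emoji))
```

If the assumption about the markup holds, a walk like this could replace text_content() in the workaround above; the duplicated link would still appear.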