Handle "clusters" on paper extraction #85

When extracting publications (papers) from emails, we skip a class of papers whose links in the email look like this:
 * `https://scholar.google.com/scholar?cluster=14905208172666766997&hl=en&oi=scholaralrt&hist=KBiQzPUAAAAJ:3103465405719670724:AAGBfm3tO_7Uk2dTXZseJcyJq0Kjaug97Q&html=&folt=rel`

These papers are skipped (14 out of 2k+) because, at the moment, we use a regex to extract the PDF URL from such links, and it fails to match them.
Instead of the usual `/scholar_url?url=<url-to-the.pdf>` pattern, these links look like `/scholar?cluster=14905208172666766997&...`, and it is not obvious how to get the URL of an individual PDF (any one from the cluster).
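To make the difference between the two link shapes concrete, here is a minimal sketch that classifies a link by parsing its path and query string with Python's `urllib.parse` rather than with a regex. The function name `classify_scholar_link` is hypothetical, and the sketch assumes that PDF links carry the target in a `url` query parameter and cluster links carry the ID in a `cluster` parameter, as in the examples in this issue.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit


def classify_scholar_link(href: str) -> Tuple[str, str]:
    """Classify a Google Scholar alert link (hypothetical helper).

    Returns ("pdf", <target url>) for /scholar_url?url=... links,
    ("cluster", <cluster id>) for /scholar?cluster=... links,
    and ("unknown", href) for anything else.
    """
    # urlsplit works for both absolute and relative (path-only) links.
    parts = urlsplit(href)
    # parse_qs percent-decodes values and drops blank ones such as "html=".
    query = parse_qs(parts.query)
    if parts.path == "/scholar_url" and "url" in query:
        return "pdf", query["url"][0]
    if parts.path == "/scholar" and "cluster" in query:
        return "cluster", query["cluster"][0]
    return "unknown", href
```

For the link quoted at the top of this issue, the function returns `("cluster", "14905208172666766997")`: the cluster ID is recoverable, but no PDF URL is present in the link itself.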

One option is to keep those links as-is, so that the user chooses the PDF from the Scholar page themselves.
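A minimal sketch of that fallback, assuming the extractor returns a small record per paper instead of skipping unmatched links. `PaperLink`, `needs_manual_pick`, and `extract_paper_link` are hypothetical names, not existing code in this project; cluster links are kept unchanged and flagged so the user knows they must pick a PDF from the Scholar page.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlsplit


@dataclass
class PaperLink:
    url: str
    # True when `url` is a Scholar cluster page and the user must pick a PDF.
    needs_manual_pick: bool = False


def extract_paper_link(href: str) -> Optional[PaperLink]:
    """Return a PaperLink for a Scholar alert link, or None if unrecognized."""
    parts = urlsplit(href)
    query = parse_qs(parts.query)
    if parts.path == "/scholar_url" and "url" in query:
        # Usual case: the direct PDF URL is embedded in the `url` parameter.
        return PaperLink(url=query["url"][0])
    if parts.path == "/scholar" and "cluster" in query:
        # Cluster case: keep the Scholar link as-is instead of skipping the paper.
        return PaperLink(url=href, needs_manual_pick=True)
    return None
```

With this approach, the 14 cluster papers would no longer be dropped; they would simply point to a Scholar page instead of a PDF.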
