Commit 3261fe8

Merge pull request #74 from fmacpro/resurrection-update
1.0.0 Resurrection update
2 parents e53e187 + 25c8eaa commit 3261fe8

15 files changed: +26597 -11597 lines changed

.eslintrc.json

Lines changed: 0 additions & 12 deletions
This file was deleted.

APIDOC.md

Lines changed: 26 additions & 3 deletions
@@ -263,7 +263,7 @@ Remove the style attribute on every e and under.
 
 | Param | Type |
 | --- | --- |
-| element | <code>jQuery</code> |
+| element | <code>jQuery</code> |
 
 <a name="killBreaks"></a>
 
@@ -404,9 +404,32 @@ Cleans the article content
 Initialize a node with the readability object. Also checks the
 className/id for special names to add to its score.
 
-**Kind**: global function
+**Kind**: global function
 
 | Param | Type |
 | --- | --- |
-| element | <code>jQuery</code> |
+| element | <code>jQuery</code> |
+
+## Dependencies
+
+- [Puppeteer](https://github.com/GoogleChrome/puppeteer/)
+- [puppeteer-extra](https://github.com/berstend/puppeteer-extra)
+- [puppeteer-extra-plugin-stealth](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth)
+- [puppeteer-extra-plugin-user-data-dir](overrides/puppeteer-extra-plugin-user-data-dir)
+- [lighthouse](https://github.com/GoogleChrome/lighthouse)
+- [compromise](https://ghub.io/compromise)
+- [retext](https://ghub.io/retext)
+- [retext-pos](https://github.com/retextjs/retext-pos)
+- [retext-keywords](https://ghub.io/retext-keywords)
+- [retext-spell](https://ghub.io/retext-spell)
+- [sentiment](https://ghub.io/sentiment)
+- [jquery](https://ghub.io/jquery)
+- [jsdom](https://ghub.io/jsdom)
+- [lodash](https://ghub.io/lodash)
+- [absolutify](https://ghub.io/absolutify)
+- [clean-html](https://ghub.io/clean-html)
+- [dictionary-en-gb](https://ghub.io/dictionary-en-gb)
+- [html-to-text](https://ghub.io/html-to-text)
+- [nlcst-to-string](https://ghub.io/nlcst-to-string)
+- [vfile-reporter-json](https://ghub.io/vfile-reporter-json)

README.md

Lines changed: 13 additions & 12 deletions
@@ -26,14 +26,14 @@ npm install horseman-article-parser --save
 ### Usage Example
 
 ```
-var parser = require('horseman-article-parser');
+import { parseArticle } from 'horseman-article-parser';
 
-var options = {
+const options = {
   url: "https://www.theguardian.com/politics/2018/sep/24/theresa-may-calls-for-immigration-based-on-skills-and-wealth",
   enabled: ['lighthouse', 'screenshot', 'links', 'sentiment', 'entities', 'spelling', 'keywords']
 }
 
-parser.parseArticle(options)
+parseArticle(options)
   .then(function (article) {
 
     var response = {
@@ -260,33 +260,34 @@ npm run docs
 ## Dependencies
 
 - [Puppeteer](https://github.com/GoogleChrome/puppeteer/): High-level API to control Chrome or Chromium over the DevTools Protocol
-- [compromise](https://ghub.io/compromise): natural language processing in the browser
+- [puppeteer-extra](https://github.com/berstend/puppeteer-extra): Framework for puppeteer plugins
+- [puppeteer-extra-plugin-stealth](https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra-plugin-stealth): Plugin to evade detection
+- [puppeteer-extra-plugin-user-data-dir](overrides/puppeteer-extra-plugin-user-data-dir): Persist and reuse Chromium user data
+- [lighthouse](https://github.com/GoogleChrome/lighthouse): Automated auditing, performance metrics, and best practices
+- [compromise](https://ghub.io/compromise): Natural language processing in the browser
 - [retext](https://ghub.io/retext): Natural language processor powered by plugins
 - [retext-pos](https://github.com/retextjs/retext-pos): Plugin to add part-of-speech (POS) tags
 - [retext-keywords](https://ghub.io/retext-keywords): Keyword extraction with Retext
 - [retext-spell](https://ghub.io/retext-spell): Spelling checker for retext
 - [sentiment](https://ghub.io/sentiment): AFINN-based sentiment analysis for Node.js
 - [jquery](https://ghub.io/jquery): JavaScript library for DOM operations
 - [jsdom](https://ghub.io/jsdom): A JavaScript implementation of many web standards
-- [lodash](https://ghub.io/lodash): Lodash modular utilities.
+- [lodash](https://ghub.io/lodash): Lodash modular utilities
 - [absolutify](https://ghub.io/absolutify): Relative to Absolute URL Replacer
 - [clean-html](https://ghub.io/clean-html): HTML cleaner and beautifier
 - [dictionary-en-gb](https://ghub.io/dictionary-en-gb): English (United Kingdom) spelling dictionary in UTF-8
-- [html-to-text](https://ghub.io/html-to-text): Advanced html to plain text converter
+- [html-to-text](https://ghub.io/html-to-text): Advanced HTML to plain text converter
 - [nlcst-to-string](https://ghub.io/nlcst-to-string): Stringify NLCST
 - [vfile-reporter-json](https://ghub.io/vfile-reporter-json): JSON reporter for virtual files
 
 
 ## Dev Dependencies
 
-- [eslint](https://ghub.io/eslint): An AST-based pattern checker for JavaScript.
-- [eslint-config-standard](https://ghub.io/eslint-config-standard): JavaScript Standard Style - ESLint Shareable Config
-- [eslint-plugin-import](https://ghub.io/eslint-plugin-import): Import with sanity.
+- [eslint](https://ghub.io/eslint): An AST-based pattern checker for JavaScript
+- [eslint-plugin-import](https://ghub.io/eslint-plugin-import): Import with sanity
 - [eslint-plugin-json](https://ghub.io/eslint-plugin-json): Lint JSON files
-- [eslint-plugin-node](https://ghub.io/eslint-plugin-node): Additional ESLint&#39;s rules for Node.js
+- [eslint-plugin-n](https://ghub.io/eslint-plugin-n): Additional ESLint rules for Node.js
 - [eslint-plugin-promise](https://ghub.io/eslint-plugin-promise): Enforce best practices for JavaScript promises
-- [eslint-plugin-standard](https://ghub.io/eslint-plugin-standard): ESlint Plugin for the Standard Linter
-- [jsdoc-to-markdown](https://github.com/jsdoc2md/jsdoc-to-markdown): Generates markdown API documentation from jsdoc annotated source code.
 
 
 ## License

controllers/keywordParser.js

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+import retext from 'retext'
+import nlcstToString from 'nlcst-to-string'
+import pos from 'retext-pos'
+import keywords from 'retext-keywords'
+import _ from 'lodash'
+
+export default async function keywordParser (html, options = { maximum: 10 }) {
+  const file = await retext().use(pos).use(keywords, options).process(html)
+
+  const keywordsArr = file.data.keywords.map(keyword => ({
+    keyword: nlcstToString(keyword.matches[0].node),
+    score: keyword.score
+  }))
+
+  const keyphrases = file.data.keyphrases.map(phrase => {
+    const nodes = phrase.matches[0].nodes
+    const tree = _.map(nodes)
+    return {
+      keyphrase: nlcstToString(tree, ''),
+      score: phrase.score,
+      weight: phrase.weight
+    }
+  }).sort((a, b) => (a.score > b.score) ? -1 : 1)
+
+  return { keywords: keywordsArr, keyphrases }
+}
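
Review note on the keyphrase sort: the comparator in this file returns -1 or 1 and never 0, so equal scores are reported as "b first". A minimal sketch of a plain descending numeric comparator follows, as one possible alternative rather than what the commit does. The sample data and the name `byScoreDescending` are hypothetical.

```javascript
// Hypothetical sample data; the scores are made up for illustration.
const keyphrases = [
  { keyphrase: 'immigration policy', score: 0.4 },
  { keyphrase: 'skilled workers', score: 0.9 },
  { keyphrase: 'prime minister', score: 0.4 }
]

// Descending numeric comparator: negative when a should come first,
// positive when b should come first, and 0 when the scores are equal.
function byScoreDescending (a, b) {
  return b.score - a.score
}

// Copy before sorting so the input array is left untouched.
const sorted = [...keyphrases].sort(byScoreDescending)

console.log(sorted.map(p => p.keyphrase))
```

Returning 0 for ties lets the engine's sort keep equal-score entries in their original relative order where the sort is stable.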

controllers/lighthouse.js

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+import lighthouseImport from 'lighthouse'
+const lighthouse = lighthouseImport.default || lighthouseImport
+
+export default async function lighthouseAnalysis (browser, options, socket) {
+  socket.emit('parse:status', 'Starting Lighthouse')
+
+  const results = await lighthouse(options.url, {
+    port: (new URL(browser.wsEndpoint())).port,
+    output: 'json'
+  })
+
+  socket.emit('parse:status', 'Lighthouse Analysis Complete')
+  return results.lhr
+}
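
Review note on the `port` option: the file reads the debugging port out of the browser's WebSocket endpoint with the WHATWG `URL` class. The sketch below shows that parsing step in isolation. The endpoint string is a hypothetical value in the general shape of a DevTools WebSocket URL, not output captured from this project.

```javascript
// Hypothetical endpoint; a real one would come from browser.wsEndpoint().
const wsEndpoint = 'ws://127.0.0.1:9222/devtools/browser/0000-example'

// URL parsing splits the endpoint into its components.
const { hostname, port } = new URL(wsEndpoint)

// URL.port is always a string, so convert explicitly if a number is needed.
const portNumber = Number(port)

console.log(hostname, port, portNumber)
```

Because `port` is a string here, any consumer that expects a numeric port would need the explicit conversion shown above.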

controllers/spellCheck.js

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+import retext from 'retext'
+import spell from 'retext-spell'
+import dictionary from 'dictionary-en-gb'
+import report from 'vfile-reporter-json'
+
+export default function spellCheck (text, options) {
+  text = text.replace(/[0-9]{1,}[a-zA-Z]{1,}/gi, '')
+
+  return new Promise(function (resolve, reject) {
+    if (typeof options === 'undefined') {
+      options = {
+        dictionary: dictionary
+      }
+    }
+
+    if (typeof options.dictionary === 'undefined') {
+      options.dictionary = dictionary
+    }
+
+    retext()
+      .use(spell, options)
+      .process(text, function (error, file) {
+        if (error) {
+          return reject(error) // stop here; file is not usable after a failure
+        }
+
+        let results = JSON.parse(report(file))
+        results = results[0].messages
+        resolve(results)
+      })
+  })
+}
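
Review note on the pre-filter: before spell checking, the file strips every run of digits that is immediately followed by letters (ordinals, times, units), presumably so they are not flagged as misspellings. The sketch below applies the same pattern to a hypothetical sample sentence to show what is removed and what is kept.

```javascript
// Same pattern as in spellCheck.js: one or more digits followed by one or more letters.
const digitLetterToken = /[0-9]{1,}[a-zA-Z]{1,}/gi

// Hypothetical sample sentence for illustration.
const sample = 'Arrived on the 24th at 3pm with 10 bags'

// Tokens such as "24th" and "3pm" are removed; the standalone number "10" is kept.
const cleaned = sample.replace(digitLetterToken, '')

console.log(JSON.stringify(cleaned))
```

Note that the surrounding spaces are left in place, so removed tokens leave double spaces behind. Since the character class already covers both cases, the `i` flag is redundant but harmless.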

eslint.config.mjs

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+import { FlatCompat } from '@eslint/eslintrc';
+import js from '@eslint/js';
+import jsonPlugin from 'eslint-plugin-json';
+import path from 'path';
+import { fileURLToPath } from 'url';
+
+const __filename = fileURLToPath(import.meta.url);
+const __dirname = path.dirname(__filename);
+
+const compat = new FlatCompat({
+  baseDirectory: __dirname,
+  recommendedConfig: js.configs.recommended
+});
+
+export default [
+  {
+    ignores: ['eslint.config.mjs', 'overrides/**']
+  },
+  ...compat.extends(
+    'eslint:recommended',
+    'plugin:import/recommended',
+    'plugin:n/recommended',
+    'plugin:promise/recommended'
+  ),
+  jsonPlugin.configs.recommended,
+  {
+    languageOptions: {
+      globals: {
+        jQuery: 'readonly',
+        window: 'readonly'
+      }
+    },
+    rules: {
+      'no-prototype-builtins': 'off'
+    }
+  }
+];
