Commit c1d42bc
Merge extractus/main
* chore: update dist with latest build
* chore: update param documentation
* chore: update dependencies (extractus#257)
* v6.0.0
- Change to ES6 Module format
- Change and update dependencies
- Also update core logic
Related pr: extractus#219, extractus#220, extractus#222, extractus#224, extractus#227, extractus#228, extractus#232, extractus#238, extractus#240, extractus#241, extractus#243, extractus#245
* v6.0.1
- Change code analysis to GitHub CodeQuality
- Update dependencies
* fix: can't fetch html from document on browser
waiting for WebReflection/linkedom#146 for better solution
* v6.0.2
- Merge pr extractus#265 by @SettingDust (related issue extractus#264)
- Update dependencies
* v6.0.2
- Rebuild
* chore: update `urlpattern-polyfill`
fix extractus#266
* v6.0.3
- Merge pr extractus#269 by @SettingDust (issue extractus#266)
- Fix coding style
* v6.0.3 - Rebuild
* v6.0.4
- Update more parser config
- Improve README & fix expired API key for example
* v6.0.4
- Improve README
* v6.0.4
- Add more test
- Improve README
* v6.0.4
- Improve README
* Update README
- Fix link to default rules
* Update README
* v6.0.5
- Use `test` to match url with patterns (instead of `exec`)
- Add more test
- Update README
* v6.0.5
- Update README
* v6.0.6
- Fix potential problem with query rules
- Apply multi transformation from all matched query rules
- Add more guide about query rules
* v7.0.0rc1
- Update processing logic
- Replace `queryRule` with `transformation`
- Re-organize source code structure
* Update README.md
* v7.0.0rc2
* v7.0.0rc3
- Add default `Accept-Encoding` to request options
- Update default sanitizeHtml options
- Update dependencies
* v7.0.0rc3
- rebuild
* Change method to deal with `source` and `description`
- Use `tldts` to get domain, used this value as `source`
- Increase `description` length, tend to take summary from content, remove unnecessary parts
* v7.0.0
- Official release v7 with new concept `transformation`
- Simplify error throwing from axios
* v7.0.1
- Fix function to get description
- Update dependencies
* Update README
* v7.0.2
- Update dependencies
- Add button "Deploy to Deta"
* v7.0.2
- Update dependencies
- Add button "Deploy to Deta"
- Use Deta service for example faas
- Copy types definition to cjs dist (extractus#287)
* v7.0.3
- Update dependencies
- Remove depending on `tldts`
- Use [conditional exports](https://nodejs.org/api/packages.html#conditional-exports)
- Improve pre-defined options
* v7.1.0 - To work with `bun` and `deno`
- Replace `axios` with `cross-fetch`
- Remove 4 API methods relating to axios and htmlcrush
* Update types definition
* v7.1.1
- Fix problem with cross-fetch on deno
* v7.1.1
- Conditional urlpattern
* v7.2.0rc1
- Stop depending on `urlpattern-polyfill` for running on deno/bun
- Replace URLPattern syntax with regular RegExp
* Update README refer links
* v7.2.0rc2 - Rebuild
* Update README
* v7.2.0rc3
- Update type definition
* v7.2.0rc4
- Replace `string-comparison` with `string-similarity` to fix `bun` error
* v7.2.0rc5
- Use internal string-similarity file to bypass bun.js resolve error
* Add examples with node, deno, bun, tsnode
* Remove bun.lockb
* Rebuild
* v7.2.0
- Refactor some parts to run on deno, bun and tsnode
- Add some examples for each platform
- Remove some rarely used configuration methods
* Update examples
* v7.2.1-rc1
- Try to use external `string-similarity` again
- Update build script
- Improve fetch control
- Fix typo in example package names
* v7.2.1
- Rebuild
* v7.2.2-rc1
- Replace global config with on-request `parserOptions`
- Add new param `fetchOptions` to extract()
- Allow to pass request to proxy
- Fix problem while building esm version for browser
- Add example for browser usage
* Update dependencies
* Update README
* v7.2.2-rc2
- Remove dependency `html-crush`
* v7.2.2
- Add options to extract method
- Remove unnecessary dependencies to reduce bundle size
- Add more examples
* v7.2.3
- Optimize performance by removing html validation
* Update README
* Add option to keep/remove line breaks
- Update README
* v7.2.4
- Improve space/newline processing
- No longer remove all line breaks, but multiple empty lines are stripped
- Add folder for evaluation
- Update README
* v7.2.5
- Update dependencies
* Update README
* Add more specs for meta data extraction
Related issues: extractus#311
* Add security policy
* Add ci test with node 19.x
* Update security policy.
* Update security contact
* Add contributing guide
- Update ci settings
* Update README
- Move Deta block to Usage section
* Update SECURITY.md
* v7.2.6 - Migrate to extractus org
- Update links and docs (extractus#322)
* Update README
- Fix badge link
* Update coveralls github action
* v7.2.7
- Update dependencies
- Update docs
- Update CI settings
* Update CI settings
* Update CI config
* Fix CI settings
* Update CI settings
* Update README
* Add image to docs
* Update README
- Change badges link
* v7.2.8
- Expose new API method `extractFromHtml()`
- Update dependencies
- Change coding style (remove standardjs)
Related issues: extractus#321, extractus#326
* Update README
* v7.2.9
- Fix issue extractus#329
- Update dependencies
- Improve unit test
* v7.2.10
- Fix issue extractus#331
- Update dependencies
- Remove unnecessary watermark
* Add null to response types
* v7.2.11
- Merge pr extractus#333
- Update dependencies
* v7.2.12
- Set default user-agent
- Avoid error if parserOptions is null
- Update dependencies
* Update ci config
* v7.2.13rc1
- Fix issue on Deno platform
* v7.2.13
- Fix some issue while fetching data on Deno platform
* Rebuild v7.2.13
* v7.2.14
- Add support for Parsely meta tags
- Update dependencies
* Change string array to dictionary
* v7.2.15
- Fix unsupported package `string-similarity`
- Update deps
* v7.2.15
- Merge with changes from pr extractus#341
* v7.2.16
- Fix issue extractus#347
- Update dependencies
* Add favicon to meta data
* v7.2.17
- Merge pr extractus#350 by @LarchLiu
- Add `agent` to `fetchOptions`
- Update CI to test with Node 20
- Update dependencies
- Update README
* v7.2.17
* v7.2.17
* v7.2.17
* v7.2.18
- Add test for proxy `agent`
- Update dependencies
* v7.3.0
- Add support for `signal`
- Stop supporting Node < 15
- Stop supporting commonjs version
- Remove build script
- Update examples code
- Update dependencies
* Update README
* v8.0.0 - Bump version
- Add deno.json & import sections
- Update deps
- Improve README
* Update README
* Update README
* v8.0.1
- Update dependencies
- Update imports section
* Update dependencies
* Use `childNodes` instead of `children`
To make it work the same as Deno DOM
* Update README
* Fix ParserOptions typing
* v8.0.3
- Update deno example (extractus#368)
* Stop ci test with node < 16 because EOL
* Feat: extract pagetype from og:type or ld+json
* v8.0.8
- Merge pr extractus#374 by @andremacola (issue extractus#373)
- Update dependencies
- Update CI config
- Fix function call in eval.js
* Update examples
* v8.0.5
- Fix error while parsing ldjson
- Update dependencies
Related issues: extractus#378, extractus#374, extractus#373
* Fix CI issue with coverall
* Fix CI issue
* Fix CI problem
* Change ci event
* Update CI event
* Fix CI problem
* Fix CI issue
* Fix CI coverall
* v8.0.6
- Update dependencies
- Update security email
* v8.0.7
- Update dependencies
Related issue: extractus#382
* v8.0.8
- Decode content using detected charset
- Update dependencies
- Update eslint config
Related issues: extractus#386, extractus#320
* Add node 22 to ci
* Update examples & test with puppeteer
* v8.0.9
- Stop using purified HTML to extract content (extractus#388)
* v8.0.10
- Fix importing issue
* chore: Improvements in handling LD+JSON data
* v8.0.11
- Merge pr extractus#400 by @andremacola
- Replace jest with native node test runner
- Update dependencies
* Add test coverage
* fix: Cannot read properties of undefined in ld+json
* fix: more tests on ld+json
* v8.0.12
- Merge pr extractus#403 by @andremacola
* Improvements to find dates
* v8.0.13
- Merge pr extractus#405 by @andremacola
* v8.0.14
- Fix inconsistent output (extractus#407)
- Modify some stuff at LdJson extraction (extractus#405)
- Only use value from LdJson if missed from meta tags
- Only accept string value from LdJson
- Stop converting LdJson value to lowercase
* fix: adjustment of poorly formatted ldjson error
* v8.0.15
- Merge pr extractus#410 by @andremacola
* v8.0.16
- Fix issue extractus#412
- Update dependencies
* v8.0.17
- Update dependencies
* Update eval script
* v8.0.18
- Update dependencies
- Update CI config
- Update README
* Update README
* Update README
* v8.0.19
- Fix image loss when ldjson overwrites meta data
- Update dependencies
* Add test with node 24
* v8.0.20 - Update dependencies
* Remove examples
- To stop dependencies outdated warning
* v8.0.20 - Update packages
* chore: package rename to @arbitral/article-parser and metadata update
* chore: package.json only change name to @arbitral/article-parser (keep upstream author/homepage/repo)
* chore: regenerate package-lock.json after package name change
* fix: satisfy eslint comma-dangle in build.js and configs
---------
Co-authored-by: Dave Schumaker <[email protected]>
Co-authored-by: SettingDust <[email protected]>
Co-authored-by: Dong Nguyen <[email protected]>
Co-authored-by: Will Washburn <[email protected]>
Co-authored-by: mphill <[email protected]>
Co-authored-by: Alex.Liu <[email protected]>
Co-authored-by: Ranmocy <[email protected]>
Co-authored-by: andremacola <[email protected]>
1 parent 4aca0a7 commit c1d42bc
70 files changed (+4261, -1383 lines)
File tree
- .github/workflows
- src
- browser
- deno
- utils
- test-data