| 1 | +--- |
| 2 | +title: Node |
| 3 | +weight: 1 |
| 4 | +--- |
| 5 | + |
| 6 | +# Node.js API [Beta] |
| 7 | + |
| 8 | +When installed as a Node module dependency, the engine exposes a JavaScript API that you can call from your own code. The following modules are available. |
| 9 | + |
| 10 | +## `fetch` |
| 11 | + |
| 12 | +The `fetch` module gets the MIME type and content of a document from its URL. |
| 13 | + |
| 14 | +```js |
| 15 | +import fetch from '@opentermsarchive/engine/fetch'; |
| 16 | +``` |
| 17 | + |
| 18 | +Documentation on how to use `fetch` is provided [as JSDoc](/jsdoc/index.html). |
| 19 | + |
| 20 | +### Headless browser management |
| 21 | + |
| 22 | +If you pass the `executeClientScripts` option to `fetch`, a headless browser is used to load the page and execute its scripts before serialising its DOM. For performance reasons, `fetch` does not instantiate a browser on each call: starting and stopping the browser is your responsibility. Here is an example of how to use this feature: |
| 23 | + |
| 24 | +```js |
| 25 | +import fetch, { launchHeadlessBrowser, stopHeadlessBrowser } from '@opentermsarchive/engine/fetch'; |
| 26 | + |
| 27 | +await launchHeadlessBrowser(); |
| 28 | +await fetch({ executeClientScripts: true, ... }); |
| 29 | +await fetch({ executeClientScripts: true, ... }); |
| 30 | +await fetch({ executeClientScripts: true, ... }); |
| 31 | +await stopHeadlessBrowser(); |
| 32 | +``` |
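Because the browser lifecycle is left to the caller, a rejected `fetch` could leave the browser running if `stopHeadlessBrowser` is never reached. The following is a minimal sketch of one way to guard against that, not part of the engine's API: `withHeadlessBrowser` is a hypothetical helper, and its `launch`, `stop` and `task` parameters are assumptions introduced for illustration so that the sketch does not depend on the engine's actual exports.

```js
// Hypothetical helper (not provided by the engine): runs `task` between
// `launch` and `stop`, and stops the browser even if `task` rejects.
async function withHeadlessBrowser(launch, stop, task) {
  await launch(); // start the shared browser once
  try {
    return await task(); // run any number of fetches inside the task
  } finally {
    await stop(); // always executed, whether the task succeeded or threw
  }
}
```

Under these assumptions, you would pass `launchHeadlessBrowser` and `stopHeadlessBrowser` as the first two arguments, and an async function grouping your `fetch` calls as the third. The browser is still started only once for the whole batch.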
| 33 | + |
| 34 | +The `fetch` module options are defined as a [`node-config` submodule](https://github.com/node-config/node-config/wiki/Sub-Module-Configuration). The default `fetcher` configuration can be overridden by adding a `fetcher` object to the local configuration file. |
| 35 | + |
| 36 | +## `extract` |
| 37 | + |
| 38 | +The `extract` module transforms HTML or PDF content into a Markdown string according to a declaration. |
| 39 | + |
| 40 | +```js |
| 41 | +import extract from '@opentermsarchive/engine/extract'; |
| 42 | +``` |
| 43 | + |
| 44 | +The `extract` function documentation is available [as JSDoc](/jsdoc/index.html). |
| 45 | + |
| 46 | +## `SourceDocument` |
| 47 | + |
| 48 | +The `SourceDocument` class encapsulates information about a source document of terms tracked by Open Terms Archive. |
| 49 | + |
| 50 | +```js |
| 51 | +import SourceDocument from '@opentermsarchive/engine/sourceDocument'; |
| 52 | +``` |
| 53 | + |
| 54 | +The `SourceDocument` format is defined [in source code](https://github.com/OpenTermsArchive/engine/tree/main/src/archivist/services/sourceDocument.js). |