```sh
npm install scrappy --save
```
## Usage
**Scrappy** uses a simple two-step process to extract metadata from any URL or file. First, it runs the input through pluggable `scrapeStream` middleware to extract metadata about the file itself. The result is then passed to a pluggable `extract` pipeline that formats the metadata for presentation and extracts additional metadata about related entities.
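The two-step flow can be pictured with a toy model like the one below. All of the names and shapes here are illustrative assumptions, not Scrappy's actual API:

```ts
// Toy model of the two-step pipeline; types and names are assumptions
// for illustration, not Scrappy's real interfaces.
interface ScrapeResult {
  url: string
  meta: Record<string, string>
}

type ScrapePlugin = (result: ScrapeResult) => ScrapeResult

// Step 1: run pluggable scraping middleware over the raw input.
function runScrapePlugins(url: string, plugins: ScrapePlugin[]): ScrapeResult {
  const initial: ScrapeResult = { url, meta: {} }
  return plugins.reduce((acc, plugin) => plugin(acc), initial)
}

// Step 2: format the scraped metadata for presentation.
function toSnippet(result: ScrapeResult): { url: string; title?: string } {
  return { url: result.url, title: result.meta["title"] }
}

const scraped = runScrapePlugins("http://example.com", [
  (r) => ({ ...r, meta: { ...r.meta, title: "Example Domain" } }),
])
const snippet = toSnippet(scraped)
// snippet => { url: "http://example.com", title: "Example Domain" }
```

Each plugin sees the accumulated result of the plugins before it, so independent scrapers (e.g. one per metadata standard) can be layered without knowing about one another.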
### Scraping
#### `scrapeUrl`
```ts
function scrapeUrl(url: string, plugin?: Plugin): Promise<ScrapeResult>
```
#### `scrapeStream`

Accepts a readable stream and an input scrape result (at a minimum it should have `url`, but it could add other known metadata, e.g. from HTTP headers), and returns the scrape result after running through the plugin function. It also accepts an `abort` function, which can be used to close the stream early.
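A sketch of that contract might look like the following. The names follow the description above, but the body is an illustrative stand-in, not Scrappy's implementation:

```ts
import { Readable } from "stream"

// Illustrative stand-in only: a scraper that reads the stream, fills in the
// scrape result (which must at least carry `url`), and calls `abort` to
// close the stream once it has what it needs.
interface ScrapeResult {
  url: string
  title?: string
}

async function scrapeStream(
  stream: Readable,
  input: ScrapeResult,
  abort: () => void
): Promise<ScrapeResult> {
  let html = ""
  for await (const chunk of stream) {
    html += chunk.toString()
    const match = /<title>([^<]*)<\/title>/.exec(html)
    if (match) {
      abort() // no need to read the rest of the response
      return { ...input, title: match[1] }
    }
  }
  return input
}

async function main(): Promise<ScrapeResult> {
  const stream = Readable.from(["<html><head><title>Example</title>", "</head></html>"])
  return scrapeStream(stream, { url: "http://example.com" }, () => stream.destroy())
}

main().then((result) => console.log(result.title)) // "Example"
```

Passing `abort` separately keeps the scraper decoupled from the transport: the caller decides what "close early" means for its stream (here, `stream.destroy()`).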
### Extraction

Extraction is based on a single function, `extract`. It accepts the scrape result and an optional array of helpers. The default extraction maps the scrape result into a proprietary format useful for applications to visualize. After the extraction is done, it iterates over each of the helper functions to transform the extracted snippet.
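The helper pass can be sketched like this (the shapes are assumptions for illustration; `extract`'s real signature is defined by Scrappy):

```ts
// Sketch: extract maps a scrape result into a snippet, then lets each helper
// transform that snippet in turn. All shapes here are assumptions.
interface RawResult {
  url: string
  meta: Record<string, string>
}

interface Snippet {
  url: string
  title?: string
}

type Helper = (snippet: Snippet, result: RawResult) => Snippet

function extractSnippet(result: RawResult, helpers: Helper[] = []): Snippet {
  // Default extraction: map the scrape result into the snippet format.
  let snippet: Snippet = {
    url: result.url,
    title: result.meta["og:title"] ?? result.meta["title"],
  }
  // Then iterate over each helper to transform the extracted snippet.
  for (const helper of helpers) {
    snippet = helper(snippet, result)
  }
  return snippet
}

const cleaned = extractSnippet(
  { url: "http://example.com", meta: { title: "  Example  " } },
  [(s) => ({ ...s, title: s.title?.trim() })]
)
// cleaned.title === "Example"
```

Because helpers receive the snippet produced so far, they compose in order, which is useful for small post-processing steps like trimming or normalizing fields.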
This example uses [`scrapeAndExtract`](src/index.ts) (a simple wrapper around `scrapeUrl` and `extract`) to retrieve metadata from a webpage. In your own application, you may want to write your own `makeRequest` function or override other parts of the pipeline (e.g. to enable caching or customize the user agent).
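A minimal sketch of the kind of wrapper that paragraph describes might look like this. The request and extraction bodies are stand-ins, not Scrappy's code (the real `scrapeAndExtract` lives in `src/index.ts` and issues an actual HTTP request):

```ts
// Stand-in sketch of composing scrapeUrl and extract; not Scrappy's source.
interface Snippet {
  url: string
  title?: string
}

async function scrapeUrl(url: string): Promise<Snippet> {
  // A real implementation would issue an HTTP request and stream the
  // response through the scraper; this stub just returns fixed metadata.
  return { url, title: "  Example Domain  " }
}

function extract(result: Snippet): Snippet {
  // Stand-in for the default extraction/formatting step.
  return { ...result, title: result.title?.trim() }
}

async function scrapeAndExtract(url: string): Promise<Snippet> {
  return extract(await scrapeUrl(url))
}

scrapeAndExtract("http://example.com").then((snippet) => {
  console.log(snippet.title) // "Example Domain"
})
```

Keeping the wrapper this thin is what makes the pipeline overridable: swap in your own request function or extraction step and the composition stays the same.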