Commit 67e65c0

Rewrite package to emit simplified structure
1 parent ab5bf1a commit 67e65c0

603 files changed: +187688 −71172 lines changed


.eslintrc.js

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+module.exports = {
+  parser: "@typescript-eslint/parser",
+  extends: [
+    // "eslint:recommended",
+    // "plugin:@typescript-eslint/eslint-recommended",
+    "plugin:@typescript-eslint/recommended-requiring-type-checking",
+    "prettier/@typescript-eslint",
+    "plugin:prettier/recommended"
+  ],
+  parserOptions: {
+    ecmaVersion: 2018,
+    sourceType: "module",
+    project: "tsconfig.json"
+  }
+};
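The notable choice in this config is `plugin:@typescript-eslint/recommended-requiring-type-checking`: with `parserOptions.project` pointing at `tsconfig.json`, ESLint gains full type information and can enforce type-aware rules. A minimal sketch (not part of the commit; the function names are invented) of the kind of bug such rules catch, e.g. `@typescript-eslint/no-floating-promises`:

```javascript
// Hypothetical illustration of a type-aware rule: no-floating-promises
// flags async calls whose returned promise is silently dropped.
async function publish(name) {
  return `published ${name}`;
}

// publish("scrappy");              // would be flagged: promise is dropped
const pending = publish("scrappy"); // handled below, so the rule is satisfied

pending.then((message) => console.log(message)); // logs the resolved value
```

Rules like this only work when the linter can see the declared return type, which is why the config wires `project` into `parserOptions`.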

.gitignore

Lines changed: 0 additions & 1 deletion
@@ -4,4 +4,3 @@ coverage/
 node_modules/
 npm-debug.log
 dist/
-typings/

.travis.yml

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ notifications:
   on_failure: change
 
 node_js:
+  - "10"
   - "stable"
 
 after_script: "npm install coveralls@2 && cat ./coverage/lcov.info | coveralls"

README.md

Lines changed: 20 additions & 37 deletions
@@ -4,66 +4,49 @@
 [![NPM downloads](https://img.shields.io/npm/dm/scrappy.svg?style=flat)](https://npmjs.org/package/scrappy)
 [![Build status](https://img.shields.io/travis/blakeembrey/node-scrappy.svg?style=flat)](https://travis-ci.org/blakeembrey/node-scrappy)
 [![Test coverage](https://img.shields.io/coveralls/blakeembrey/node-scrappy.svg?style=flat)](https://coveralls.io/r/blakeembrey/node-scrappy?branch=master)
-[![Greenkeeper badge](https://badges.greenkeeper.io/blakeembrey/node-scrappy.svg)](https://greenkeeper.io/)
 
 > Extract rich metadata from URLs.
 
 [Try it using Runkit!](https://runkit.com/blakeembrey/scrappy)
 
 ## Installation
 
-```sh
+```
 npm install scrappy --save
 ```
 
 ## Usage
 
-**Scrappy** uses a simple two step process to extract the metadata from any URL or file. First, it runs through plugin-able `scrapeStream` middleware to extract metadata about the file itself. With the result in hand, it gets passed on to a plugin-able `extract` pipeline to format the metadata for presentation and extract additional metadata about related entities.
-
-### Scraping
-
-#### `scrapeUrl`
+**Scrappy** attempts to parse and extract rich structured metadata from URLs.
 
-```ts
-function scrapeUrl(url: string, plugin?: Plugin): Promise<ScrapeResult>
+```js
+import { scraper, urlScraper } from "scrappy";
 ```
 
-Makes the HTTP request and passes the response into `scrapeResponse`.
-
-#### `scrapeResponse`
-
-```ts
-function scrapeResponse (res: Response, plugin?: Plugin): Promise<ScrapeResult>
-```
+### Scraper
 
-Accepts a HTTP response object and transforms it into `scrapeStream`.
+Accepts a `request` function and optional `plugins` array. The request is expected to return a "page" object, which is the same shape as the input to `scrape(page)`.
 
-#### `scrapeStream`
+```js
+const scrape = scraper({ request });
+const res = await fetch("http://example.com"); // E.g. `popsicle`.
 
-```ts
-function scrapeStream (stream: Readable, input: ScrapeResult, abort?: () => void, plugin = DEFAULT_SCRAPER): Promise<ScrapeResult>
+await scrape({
+  url: res.url,
+  status: res.status,
+  headers: res.headers.asObject(),
+  body: res.stream() // Must stream the request instead of buffering to support large responses.
+});
 ```
 
-Accepts a readable stream and input scrape result (at a minimum should have `url`, but could add other known metadata - e.g. from HTTP headers), and returns the scrape result after running through the plugin function. It also accepts an `abort` function, which can be used to close the stream early.
-
-The default plugins are in the [`plugins/` directory](src/scrape/plugins) and combined into a single pipeline using `compose` (based on `throwback`, but calls `next(stream)` to pass a stream forward).
-
-### Extraction
-
-Extraction is based on a single function, `extract`. It accepts the scrape result, and an optional array of helpers. The default extraction maps the scrape result into a proprietary format useful for applications to visualize. After the extraction is done, it iterates over each of the helper functions to transform the extracted snippet.
-
-Some built-in extraction helpers are available in the [`helpers/` directory](src/extract/helpers), including a default favicon selector and image dimension extraction.
-
-### Example
-
-This example uses [`scrapeAndExtract`](src/index.ts) (a simple wrapper around `scrapeUrl` and `extract`) to retrieve metadata from a webpage. In your own application, you may want to write your own `makeRequest` function or override other parts of the pipeline (e.g. to enable caching or customize the user-agent, etc).
+### URL Scraper
 
-```ts
-import { scrapeAndExtract } from 'scrappy'
+Simpler wrapper around `scraper` that automatically makes a `request(url)` for the page.
 
-const url = 'https://medium.com/slack-developer-blog/everything-you-ever-wanted-to-know-about-unfurling-but-were-afraid-to-ask-or-how-to-make-your-e64b4bb9254#.a0wjf4ltt'
+```js
+const scrape = urlScraper({ request });
 
-scrapeAndExtract(url).then(console.log.bind(console))
+await scrape("http://example.com");
 ```
 
 ## License

fixtures/http!cloudinary.com!pricing/body

Lines changed: 10 additions & 0 deletions
Large diffs are not rendered by default.

fixtures/http!cloudinary.com!pricing/meta.json

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+{
+  "url": "https://cloudinary.com/pricing",
+  "headers": {
+    ":status": "200",
+    "date": "Sun, 02 Feb 2020 00:36:52 GMT",
+    "content-type": "text/html",
+    "etag": "W/\"5e361202-693c\"",
+    "last-modified": "Sun, 02 Feb 2020 00:04:18 GMT",
+    "strict-transport-security": "max-age=86400",
+    "cf-cache-status": "DYNAMIC",
+    "expect-ct": "max-age=604800, report-uri=\"https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct\"",
+    "alt-svc": "h3-24=\":443\"; ma=86400, h3-23=\":443\"; ma=86400",
+    "server": "cloudflare",
+    "cf-ray": "55e817e66f3ded87-SJC",
+    "content-encoding": "br"
+  },
+  "status": 200
+}

fixtures/http!cnn.com/body

Lines changed: 189 additions & 0 deletions
Large diffs are not rendered by default.

fixtures/http!cnn.com/meta.json

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+{
+  "url": "https://www.cnn.com/",
+  "headers": {
+    ":status": "200",
+    "content-type": "text/html; charset=utf-8",
+    "x-servedbyhost": "::ffff:127.0.0.1",
+    "access-control-allow-origin": "*",
+    "cache-control": "max-age=60",
+    "content-security-policy": "default-src 'self' blob: https://*.cnn.com:* http://*.cnn.com:* *.cnn.io:* *.cnn.net:* *.turner.com:* *.turner.io:* *.ugdturner.com:* courageousstudio.com *.vgtf.net:*; script-src 'unsafe-eval' 'unsafe-inline' 'self' *; style-src 'unsafe-inline' 'self' blob: *; child-src 'self' blob: *; frame-src 'self' *; object-src 'self' *; img-src 'self' data: blob: *; media-src 'self' data: blob: *; font-src 'self' data: *; connect-src 'self' *; frame-ancestors 'self' https://*.cnn.com:* http://*.cnn.com https://*.cnn.io:* http://*.cnn.io:* *.turner.com:* courageousstudio.com;",
+    "x-content-type-options": "nosniff",
+    "x-xss-protection": "1; mode=block",
+    "content-encoding": "gzip",
+    "via": "1.1 varnish, 1.1 varnish",
+    "accept-ranges": "bytes",
+    "date": "Sun, 02 Feb 2020 00:38:12 GMT",
+    "age": "135",
+    "set-cookie": [
+      "countryCode=US; Domain=.cnn.com; Path=/; SameSite=Lax",
+      "geoData=san francisco|CA|94103|US|NA|-800|broadband; Domain=.cnn.com; Path=/; SameSite=Lax",
+      "FastAB=0=5517,1=3181,2=7341,3=4965,4=4299,5=0796,6=9752,7=1669,8=6378,9=3267; Domain=.cnn.com; Path=/; Expires=Thu Jul 01 2021 00:00:00 GMT; SameSite=Lax",
+      "tryThing01=9516; Domain=.cnn.com; Path=/; Expires=Sun Mar 01 2020 00:00:00 GMT; SameSite=Lax",
+      "tryThing02=5782; Domain=.cnn.com; Path=/; Expires=Wed Jan 01 2020 00:00:00 GMT; SameSite=Lax"
+    ],
+    "x-served-by": "cache-iad2127-IAD, cache-sea4437-SEA",
+    "x-cache": "HIT, HIT",
+    "x-cache-hits": "5, 37",
+    "x-timer": "S1580603892.073164,VS0,VE0",
+    "vary": "Accept-Encoding",
+    "content-length": "154194"
+  },
+  "status": 200
+}

fixtures/http!d.pr!a!q3z9/body

Lines changed: 54 additions & 0 deletions
Large diffs are not rendered by default.

fixtures/http!d.pr!a!q3z9/meta.json

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+{
+  "url": "https://d.pr/a/q3z9",
+  "headers": {
+    ":status": "200",
+    "date": "Sun, 02 Feb 2020 00:36:36 GMT",
+    "content-type": "text/html; charset=utf-8",
+    "content-length": "26392",
+    "set-cookie": [
+      "AWSALB=flDMeldhwHQSmDOX9UeHGfURtsUc3F8JPwOUWgc4ijHNPMOxCrHpSJEoJpW/2fFTjEOVvSZwRkHuH/Sk0fPhAjoka9tKsqr289S99UNUPljkQRhaqt2iPK7GDIQL; Expires=Sun, 09 Feb 2020 00:36:36 GMT; Path=/",
+      "AWSALBCORS=flDMeldhwHQSmDOX9UeHGfURtsUc3F8JPwOUWgc4ijHNPMOxCrHpSJEoJpW/2fFTjEOVvSZwRkHuH/Sk0fPhAjoka9tKsqr289S99UNUPljkQRhaqt2iPK7GDIQL; Expires=Sun, 09 Feb 2020 00:36:36 GMT; Path=/; SameSite=None; Secure"
+    ],
+    "server": "nginx/1.15.7",
+    "content-security-policy": "frame-ancestors d.pr http://d.pr https://d.pr",
+    "etag": "W/\"6718-Wb1t0BmfkgcqUNKOJLrxMXkkH+M\""
+  },
+  "status": 200
+}
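Each fixture pairs a raw `body` file with a `meta.json` recording the captured `url`, `headers`, and `status`. The recorded headers keep the HTTP/2 `:status` pseudo-header, which is not a real header field. A hedged sketch (the helper name is invented, not from the repository) of turning such a record into the non-body part of a page object:

```javascript
// Hypothetical helper: build the { url, status, headers } part of a page
// from a fixture's meta.json, dropping HTTP/2 pseudo-headers (":status").
function pageFromMeta(meta) {
  const headers = {};
  for (const [name, value] of Object.entries(meta.headers)) {
    if (!name.startsWith(":")) headers[name] = value; // skip pseudo-headers
  }
  return { url: meta.url, status: meta.status, headers };
}

const page = pageFromMeta({
  url: "https://d.pr/a/q3z9",
  status: 200,
  headers: { ":status": "200", "content-type": "text/html; charset=utf-8" }
});
// page.headers now contains only real header fields.
```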
