Commit f0b8146

Update documentation in README to latest
1 parent e1383e3 commit f0b8146
1 file changed: README.md (+41 −33 lines)
## Usage

**Scrappy** uses a simple two-step process to extract metadata from any URL or file. First, it runs the input through pluggable `scrapeStream` middleware to extract metadata about the file itself. The result is then passed to a pluggable `extract` pipeline, which formats the metadata for presentation and extracts additional metadata about related entities.
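The two-step shape can be sketched with placeholder types (the names and signatures here are illustrative, not scrappy's actual API):

```ts
// Illustrative only: placeholder types standing in for scrappy's real ones.
interface ScrapeResult { url: string }
interface Snippet { contentUrl: string }

// Step 1 produces metadata about the file; step 2 formats it for presentation.
async function twoStepSketch (
  url: string,
  scrape: (url: string) => Promise<ScrapeResult>,
  extract: (result: ScrapeResult) => Snippet
): Promise<Snippet> {
  const result = await scrape(url) // scrape middleware runs here
  return extract(result)           // extract pipeline runs here
}
```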
### Scraping

#### `scrapeUrl`

```ts
function scrapeUrl (url: string, plugin?: Plugin): Promise<ScrapeResult>
```

Makes the HTTP request and passes the response into `scrapeResponse`.
#### `scrapeResponse`

```ts
function scrapeResponse (res: Response, plugin?: Plugin): Promise<ScrapeResult>
```

Accepts an HTTP response object and passes it on to `scrapeStream`.
#### `scrapeStream`

```ts
function scrapeStream (stream: Readable, input: ScrapeResult, abort?: () => void, plugin = DEFAULT_SCRAPER): Promise<ScrapeResult>
```

Accepts a readable stream and an input scrape result (at a minimum it should have `url`, but other known metadata, e.g. from HTTP headers, can be included), and returns the scrape result after running it through the plugin function. It also accepts an `abort` function, which can be used to close the stream early.
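As a rough sketch of how a plugin might use `abort` (the types and the `sniffFirstChunk` plugin below are hypothetical stand-ins, not part of scrappy):

```ts
import { Readable } from 'stream'

// Hypothetical stand-ins for scrappy's types, for illustration only.
interface ScrapeResult { url: string; encodingFormat?: string }
type Plugin = (stream: Readable, result: ScrapeResult, abort: () => void) => Promise<ScrapeResult>

// Mirrors the documented parameter order: the caller supplies `abort`
// (e.g. to tear down the underlying HTTP request) and the plugin calls
// it once it has read enough of the stream.
function scrapeStreamSketch (
  stream: Readable,
  input: ScrapeResult,
  abort: () => void,
  plugin: Plugin
): Promise<ScrapeResult> {
  return plugin(stream, input, abort)
}

// Hypothetical plugin: sniff only the first chunk, then close the stream early.
const sniffFirstChunk: Plugin = (stream, result, abort) =>
  new Promise((resolve) => {
    stream.once('data', (chunk) => {
      if (String(chunk).trimStart().startsWith('<')) result.encodingFormat = 'html'
      abort() // we have what we need; stop reading
      resolve(result)
    })
  })
```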
The default plugins are in the [`plugins/` directory](src/scrape/plugins) and are combined into a single pipeline using `compose` (based on `throwback`, but calling `next(stream)` to pass a stream forward).
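A minimal sketch of a `throwback`-style `compose`, assuming middleware of the shape `(value, next) => result`; the names are illustrative and this is not scrappy's actual implementation:

```ts
// Illustrative only: each middleware may transform the value before
// calling next(), so a transformed stream can be passed forward.
type Middleware<T, R> = (value: T, next: (value: T) => Promise<R>) => Promise<R>

function composeSketch<T, R> (middleware: Middleware<T, R>[]): Middleware<T, R> {
  return (value, done) => {
    const dispatch = (i: number, v: T): Promise<R> =>
      i < middleware.length
        ? middleware[i](v, (next) => dispatch(i + 1, next))
        : done(v)
    return dispatch(0, value)
  }
}
```

For example, composing two string middleware `(v, next) => next(v + '-a')` and `(v, next) => next(v + '-b')` and running the pipeline on `'x'` yields `'x-a-b'`.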
### Extraction

Extraction is based on a single function, `extract`. It accepts the scrape result and an optional array of helpers. The default extraction maps the scrape result into a proprietary format that applications can use for display. After extraction, each helper function is applied in turn to transform the extracted snippet.
Some built-in extraction helpers are available in the [`helpers/` directory](src/extract/helpers), including a default favicon selector and image dimension extraction.
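One way to picture the extract-then-helpers flow (the `Snippet` shape and helper signature below are assumptions for illustration; scrappy's real types live in the package):

```ts
// Illustrative only: a sketch of how `extract` might fold helpers over
// the snippet built from the scrape result.
interface Snippet { contentUrl: string; headline?: string }
type Helper = (snippet: Snippet) => Snippet

function extractSketch (result: { url: string; title?: string }, helpers: Helper[] = []): Snippet {
  // Map the scrape result into the presentation format...
  let snippet: Snippet = { contentUrl: result.url, headline: result.title }
  // ...then let each helper transform the snippet in turn.
  for (const helper of helpers) snippet = helper(snippet)
  return snippet
}
```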
### Example

This example uses [`scrapeAndExtract`](src/index.ts) (a simple wrapper around `scrapeUrl` and `extract`) to retrieve metadata from a webpage. In your own application, you may want to write your own `makeRequest` function or override other parts of the pipeline (e.g. to enable caching or customize the user agent).
```ts
import { scrapeAndExtract } from 'scrappy'

const url = 'https://medium.com/slack-developer-blog/everything-you-ever-wanted-to-know-about-unfurling-but-were-afraid-to-ask-or-how-to-make-your-e64b4bb9254#.a0wjf4ltt'

scrapeAndExtract(url).then(console.log.bind(console))
```

## Development
