Scraper-Reference.md
6 additions & 3 deletions
@@ -124,7 +124,7 @@ More information about how filters work is available on the [Filter Reference](h
- `:container` [String or Proc]
A CSS selector of the container element. Everything outside of it will be removed and become unavailable to the other filters. If more than one element matches the selector, the first one in the DOM is used. If no elements match the selector, an error is raised.
-
If the value is a Proc, it is called repeatedly for each page, with the filter instance as argument, and should return a selector or `nil`.
+
If the value is a Proc, it is called for each page with the filter instance as argument, and should return a selector or `nil`.
The default container is the `<body>` element.
_Note: links outside of the container element will not be followed by the scraper. To remove links that should be followed, use a [`CleanHtml`](https://github.com/Thibaut/devdocs/wiki/Filter-Reference#cleanhtmlfilter) filter later in the stack._
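As a rough illustration of the String-or-Proc behaviour described for `:container`, here is a minimal Ruby sketch. `resolve_container`, `FakeFilter`, and `DEFAULT_CONTAINER` are hypothetical names invented for this example, not part of the scraper; the sketch also assumes that a `nil` result falls back to the default `<body>` container.

```ruby
# Hypothetical sketch, not the scraper's own code.
# Assumption: a nil selector means "use the default container".
DEFAULT_CONTAINER = 'body'

# Stand-in for a filter instance; exposes only the page's sub-path.
FakeFilter = Struct.new(:subpath)

# Returns the CSS selector to use for the given page.
def resolve_container(option, filter)
  # A Proc is called with the filter instance; a String is used as-is.
  selector = option.respond_to?(:call) ? option.call(filter) : option
  selector || DEFAULT_CONTAINER
end

# Example Proc: a different container for API pages, default elsewhere.
per_page = ->(filter) { filter.subpath.start_with?('api/') ? '#api-content' : nil }

resolve_container('#content', FakeFilter.new('guide'))
resolve_container(per_page, FakeFilter.new('api/array'))
resolve_container(per_page, FakeFilter.new('guide'))
```

A Proc is useful when only some pages wrap their content in a distinct element; returning `nil` for the others keeps the fallback in one place.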
@@ -146,11 +146,14 @@ More information about how filters work is available on the [Filter Reference](h
Internal URLs are the ones _inside_ the scraper's `base_url` ("inside" more or less means "starting with", except that `/docs` is outside `/doc`). They will be scraped unless excluded by one of the following rules. All internal URLs are converted to relative URLs inside the pages.
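The "inside" rule above can be sketched as a prefix test that respects path-segment boundaries. `internal_url?` is a hypothetical helper written for this illustration; the scraper's actual check may differ (for example in how it treats schemes, ports, or query strings).

```ruby
require 'uri'

# Hypothetical helper: true when `url` lies inside `base_url`.
# Assumption: scheme and host must match exactly.
def internal_url?(url, base_url)
  url  = URI.parse(url)
  base = URI.parse(base_url)
  return false unless url.scheme == base.scheme && url.host == base.host

  # Compare on segment boundaries so that /docs is not treated as inside /doc.
  base_path = base.path.chomp('/')
  url.path == base_path || url.path.start_with?("#{base_path}/")
end

internal_url?('http://example.com/doc/intro', 'http://example.com/doc')
internal_url?('http://example.com/docs', 'http://example.com/doc')
```

With a base URL of `http://example.com/doc`, the first call is inside and the second is outside, matching the `/docs` versus `/doc` remark.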
+
- `:skip_links` [Boolean or Proc]
+
If `true`, does not convert or follow any internal URL (creating a single-page documentation).
+
If the value is a Proc, it is called for each page with the filter instance as argument.
+
- `:follow_links` [Proc]
+
Called for each page with the filter instance as argument. If the returned value is `false`, does not add internal URLs to the queue.
- `:trailing_slash` [Boolean]
If `true`, adds a trailing slash to all internal URLs. If `false`, removes it.
This is another option used to remove duplicate pages.
-
- `:skip_links` [Proc]
-
Called with each `<a>` node. If the returned value is `true`, the link is skipped and its URL ignored.
- `:skip` [Array]
Ignores internal URLs whose sub-paths (path from the `base_url`) are in the Array (case-insensitive).
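To make the `:follow_links`, `:trailing_slash`, and `:skip` descriptions concrete, here is a minimal Ruby sketch of how such options might be applied to a page's sub-path. The helper names (`follow_links?`, `normalize_subpath`, `skipped?`), the `PageFilter` struct, and the sample values are all hypothetical, and treating an unset option as "leave unchanged" is an assumption of this sketch rather than documented behaviour.

```ruby
# Hypothetical options hash using the option names documented above.
options = {
  follow_links: ->(filter) { filter.subpath != 'changelog' },
  trailing_slash: false,
  skip: %w(about/ License.html)
}

# Stand-in for a filter instance; exposes only the page's sub-path.
PageFilter = Struct.new(:subpath)

# Internal URLs are queued unless the Proc explicitly returns false.
def follow_links?(option, filter)
  option.nil? || option.call(filter) != false
end

# true adds a trailing slash, false removes it, nil leaves the path alone.
def normalize_subpath(subpath, trailing_slash)
  case trailing_slash
  when true  then subpath.end_with?('/') ? subpath : "#{subpath}/"
  when false then subpath.chomp('/')
  else subpath
  end
end

# Case-insensitive membership test against the :skip Array.
def skipped?(subpath, skip)
  skip.map(&:downcase).include?(subpath.downcase)
end

follow_links?(options[:follow_links], PageFilter.new('changelog'))
normalize_subpath('guide/', options[:trailing_slash])
skipped?('license.html', options[:skip])
```

Normalizing the trailing slash means `guide` and `guide/` resolve to a single sub-path, which is how the option helps remove duplicate pages.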