specs/crawler/common/schemas/configuration.yml (3 additions, 9 deletions)
@@ -39,19 +39,15 @@ Configuration:
       items:
         type: string
       description: |
-        URLs to exclude from crawling.
-
-        Uses [micromatch](https://github.com/micromatch/micromatch) for negation, wildcards, and more.
+        Use [micromatch](https://github.com/micromatch/micromatch) for negation, wildcards, and more.
     externalData:
       type: array
       description: |
         References to external data sources for enriching the extracted records.
-
-        For more information, see [Enrich extracted records with external data](https://www.algolia.com/doc/tools/crawler/guides/enriching-extraction-with-external-data/).
       maxItems: 10
       items:
         type: string
-        description: Reference to an external data source you configured in the Crawler dashboard.
+        description: For more information, see [Enrich extracted records with external data](https://www.algolia.com/doc/tools/crawler/guides/enriching-extraction-with-external-data/).
         example: testCSV
     extraUrls:
       type: array
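For orientation, here is a minimal sketch (not part of the diff) of how the options touched in this hunk might appear in a crawler configuration. The `exclusionPatterns` key name and the URLs are assumptions, since the hunk starts below the enclosing property name; `externalData` and `testCSV` come from the schema above.

```yaml
# Hypothetical configuration snippet; the exclusionPatterns key name and URLs are assumed.
exclusionPatterns:
  - "https://www.example.com/private/**"    # micromatch globstar: skip everything under /private/
  - "!https://www.example.com/private/faq"  # micromatch negation: still crawl this one page
externalData:
  - testCSV # reference to an external data source configured in the Crawler dashboard
```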
@@ -94,15 +90,13 @@ Configuration:
 
         All URLs with the matching query parameters are treated as identical.
         This prevents indexing URLs that just differ by their query parameters.
-
-        Use wildcards to match multiple query parameters.
       maxItems: 9999
       example:
         - ref
         - utm_*
       items:
         type: string
-        description: Query parameter to ignore. You can include wildcards to match a range of similar query parameters.
+        description: Use wildcards to match multiple query parameters.
     ignoreRobotsTxtRules:
       type: boolean
       description: Whether to ignore rules defined in your `robots.txt` file.
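Similarly, a hedged sketch of the query-parameter deduplication described in this hunk; the enclosing key (assumed here to be `ignoreQueryParams`) is not visible in the diff, and the `ref`/`utm_*` values mirror the schema's own example.

```yaml
# Hypothetical snippet; the ignoreQueryParams key name is assumed.
ignoreQueryParams:
  - ref   # exact name: URLs differing only by ?ref=... are treated as identical
  - utm_* # wildcard: matches utm_source, utm_medium, utm_campaign, and so on
ignoreRobotsTxtRules: false # keep honoring robots.txt rules
```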