@@ -95,10 +95,10 @@ security:
9595 - BasicAuth : []
9696tags :
9797 - name : actions
98- x-displayName : Actions
98+ x-displayName : State
9999 description : >
100- Actions change the state of crawlers, such as pausing and unpausing
101- schedules or testing the crawler with specific URLs.
100+ Change the state of crawlers, such as pausing crawl schedules or testing
101+ the crawler with specific URLs.
102102 - name : config
103103 x-displayName : Configuration
104104 description : >
@@ -117,7 +117,7 @@ tags:
117117 The editor has autocomplete and built-in validation so you can try your
118118 configuration changes before committing them.
119119 - name : crawlers
120- x-displayName : Crawler
120+ x-displayName : Manage
121121 description : |
122122 A crawler is an object with a name and a [configuration](#tag/config).
123123 Use these endpoints to create, rename, and delete crawlers.
@@ -817,7 +817,7 @@ components:
817817
818818
819819 For more information, see the [`cache`
820- documentation](https://www.algolia.com/doc/tools/crawler/apis/configuration/cache/).
820+ documentation](https://www.algolia.com/doc/tools/crawler/apis/cache/).
821821 properties :
822822 enabled :
823823 type : boolean
@@ -861,7 +861,7 @@ components:
861861
862862
863863 For more information, see the [`hostnameAliases`
864- documentation](https://www.algolia.com/doc/tools/crawler/apis/configuration/hostname-aliases/).
864+ documentation](https://www.algolia.com/doc/tools/crawler/apis/hostnamealiases/).
865865 additionalProperties :
866866 type : string
867867 description : Hostname that should be added in the records.
@@ -919,11 +919,11 @@ components:
919919 discoveryPatterns :
920920 type : array
921921 description : >
922- Indicates additional pages that the crawler should visit.
922+ Indicates _intermediary_ pages that the crawler should visit.
923923
924924
925925 For more information, see the [`discoveryPatterns`
926- documentation](https://www.algolia.com/doc/tools/crawler/apis/configuration/discovery-patterns/).
926+ documentation](https://www.algolia.com/doc/tools/crawler/apis/discoverypatterns/).
927927 items :
928928 $ref : ' #/components/schemas/urlPattern'
929929 fileTypesToMatch :
@@ -986,7 +986,7 @@ components:
986986
987987
988988 For details, consult the [`recordExtractor`
989- documentation](https://www.algolia.com/doc/tools/crawler/apis/configuration/actions/#parameter-param-recordextractor).
989+ documentation](https://www.algolia.com/doc/tools/crawler/apis/recordextractor/).
990990 properties :
991991 __type :
992992 $ref : ' #/components/schemas/configurationRecordExtractorType'
@@ -1017,10 +1017,19 @@ components:
10171017 ignoreCanonicalTo :
10181018 oneOf :
10191019 - type : boolean
1020- description : |
1021- Whether to ignore canonical redirects.
1020+ description : >
1021+ Determines if the crawler should extract records from a page with a
1022+ [canonical
1023+ URL](https://www.algolia.com/doc/tools/crawler/getting-started/crawler-configuration/#canonical-urls-and-crawler-behaviorr).
1024+
1025+
1026+ If ignoreCanonicalTo is set to:
1027+
10221028
1023- If true, canonical URLs for pages are ignored.
1029+ - `true` all canonical URLs are ignored.
1030+
1031+ - One or more URL patterns, the crawler will ignore the canonical
1032+ URL if it matches a pattern.
10241033 - type : array
10251034 description : |
10261035 Canonical URLs or URL patterns to ignore.
@@ -2702,10 +2711,12 @@ components:
27022711 type : number
27032712 default : 0
27042713 description : Minimum waiting time in milliseconds.
2714+ example : 7000
27052715 max :
27062716 type : number
27072717 default : 20000
27082718 description : Maximum waiting time in milliseconds.
2719+ example : 15000
27092720 browserRequest :
27102721 type : object
27112722 description : |
@@ -2807,11 +2818,15 @@ components:
28072818 - $ref : ' #/components/schemas/oauthRequest'
28082819 renderJavaScript :
28092820 description : >
2810- Crawl JavaScript-rendered pages with a headless browser.
2821+ If `true`, use a Chrome headless browser to crawl pages.
2822+
28112823
2824+ Because crawling JavaScript-based web pages is slower than crawling
2825+ regular HTML pages, you can apply this setting to a specific list of
2826+ pages.
28122827
2813- For more information, see the [`renderJavaScript`
2814- documentation](https://www.algolia.com/doc/tools/crawler/apis/configuration/render-java-script/).
2828+ Use [micromatch](https://github.com/micromatch/micromatch) to define URL
2829+ patterns, including negations and wildcards.
28152830 oneOf :
28162831 - type : boolean
28172832 description : Whether to render all pages.
@@ -2820,25 +2835,30 @@ components:
28202835 items :
28212836 type : string
28222837 description : URL or URL pattern to render.
2823- example : https://www.example.com
2838+ example :
2839+ - http://www.mysite.com/dynamic-pages/**
28242840 - title : headlessBrowserConfig
28252841 type : object
28262842 description : Configuration for rendering HTML.
28272843 properties :
28282844 enabled :
28292845 type : boolean
2830- description : Whether to render matching URLs.
2846+ description : Whether to enable JavaScript rendering.
2847+ example : true
28312848 patterns :
28322849 type : array
28332850 description : URLs or URL patterns to render.
28342851 items :
28352852 type : string
2853+ example :
2854+ - http://www.mysite.com/dynamic-pages/**
28362855 adBlock :
28372856 type : boolean
2857+ default : false
28382858 description : >
2839- Whether to turn on the built-in adblocker.
2859+ Whether to use the Crawler's ad blocker.
28402860
2841- This blocks most ads and tracking scripts but can break some
2861+ It blocks most ads and tracking scripts but can break some
28422862 sites.
28432863 waitTime :
28442864 $ref : ' #/components/schemas/waitTime'
@@ -2847,7 +2867,7 @@ components:
28472867 - patterns
28482868 requestOptions :
28492869 type : object
2850- description : Options to add to all HTTP requests made by the crawler.
2870+ description : Lets you add options to HTTP requests made by the crawler.
28512871 properties :
28522872 proxy :
28532873 type : string
@@ -2898,7 +2918,7 @@ components:
28982918
28992919
29002920 For more information, see the [`schedule`
2901- documentation](https://www.algolia.com/doc/tools/crawler/apis/configuration/schedule/).
2921+ documentation](https://www.algolia.com/doc/tools/crawler/apis/schedule/).
29022922 example : every weekday at 12:00 pm
29032923 Configuration :
29042924 type : object
@@ -2922,7 +2942,7 @@ components:
29222942
29232943
29242944 For more information, see the [`apiKey`
2925- documentation](https://www.algolia.com/doc/tools/crawler/apis/configuration/api-key/).
2945+ documentation](https://www.algolia.com/doc/tools/crawler/apis/apikey/).
29262946 appId :
29272947 $ref : ' #/components/schemas/applicationID'
29282948 exclusionPatterns :
@@ -2961,11 +2981,11 @@ components:
29612981 type : array
29622982 maxItems : 9999
29632983 description : >
2964- URLs from where to start crawling.
2965-
2984+ The Crawler treats `extraUrls` the same as `startUrls`.
29662985
2967- For more information, see the [`extraUrls`
2968- documentation](https://www.algolia.com/doc/tools/crawler/apis/configuration/extra-urls/).
2986+ Specify `extraUrls` if you want to differentiate between URLs you
2987+ manually added to fix site crawling from those you initially
2988+ specified in `startUrls`.
29692989 items :
29702990 type : string
29712991 ignoreCanonicalTo :
@@ -2977,7 +2997,7 @@ components:
29772997
29782998
29792999 For more information, see the [`ignoreNoFollowTo`
2980- documentation](https://www.algolia.com/doc/tools/crawler/apis/configuration/ignore-no-follow-to/).
3000+ documentation](https://www.algolia.com/doc/tools/crawler/apis/ignorenofollowto/).
29813001 ignoreNoIndex :
29823002 type : boolean
29833003 description : |
@@ -3022,8 +3042,13 @@ components:
30223042 Crawler index settings.
30233043
30243044
3025- For more information, see the [`initialIndexSettings`
3026- documentation](https://www.algolia.com/doc/tools/crawler/apis/configuration/initial-index-settings/).
3045+ These index settings are only applied during the first crawl of an
3046+ index.
3047+
3048+ Any subsequent changes won't be applied to the index.
3049+
3050+ Instead, make changes to your index settings in the [Algolia
3051+ dashboard](https://dashboard.algolia.com/explorer/configuration/).
30273052 additionalProperties :
30283053 $ref : ' #/components/schemas/indexSettings'
30293054 x-additionalPropertiesName : indexName
@@ -3035,7 +3060,7 @@ components:
30353060
30363061
30373062 For more information, see the [`linkExtractor`
3038- documentation](https://www.algolia.com/doc/tools/crawler/apis/configuration/link-extractor/).
3063+ documentation](https://www.algolia.com/doc/tools/crawler/apis/linkextractor/).
30393064 properties :
30403065 __type :
30413066 $ref : ' #/components/schemas/configurationRecordExtractorType'
@@ -3067,11 +3092,18 @@ components:
30673092 maximum : 100
30683093 maxUrls :
30693094 type : number
3070- description : |
3071- Maximum number of crawled URLs.
3095+ description : >
3096+ Limits the number of URLs your crawler processes.
3097+
3098+
3099+ Change it to a low value, such as 100, for quick crawling tests.
3100+
3101+ Change it to a higher explicit value for full crawls to prevent it
3102+ from getting "lost" in complex site structures.
3103+
30723104
3073- Setting `maxUrls` doesn't guarantee consistency between crawls
3074- because the crawler processes URLs in parallel.
3105+ Because the Crawler works on many pages simultaneously, `maxUrls`
3106+ doesn't guarantee finding the same pages each time it runs.
30753107 minimum : 1
30763108 maximum : 15000000
30773109 rateLimit :
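
For orientation, here is a rough sketch of how the fields touched by this diff might fit together in a single crawler configuration. It is written in YAML to match the spec and only mirrors the property names and constraints visible in the hunks above (`extraUrls`, `maxUrls`, `ignoreCanonicalTo`, `renderJavaScript` with `enabled`, `patterns`, `adBlock`, and `waitTime`). Every URL and value is a placeholder chosen for illustration, not taken from a real configuration, and the exact nesting should be checked against the full spec.

```yaml
# Hypothetical configuration fragment; all URLs and values are placeholders.
startUrls:
  - https://www.example.com
# Treated like startUrls; kept separate to mark URLs added later by hand.
extraUrls:
  - https://www.example.com/orphan-page
# Low value for a quick test crawl (the schema allows 1 to 15000000).
maxUrls: 100
# Array form: ignore the canonical URL only when it matches a pattern.
ignoreCanonicalTo:
  - https://www.example.com/canonical/**
# Object form (headlessBrowserConfig): render only the matching pages.
renderJavaScript:
  enabled: true
  patterns:
    - https://www.example.com/dynamic-pages/**
  adBlock: false
  waitTime:
    min: 7000   # milliseconds
    max: 15000  # milliseconds
```

Restricting `renderJavaScript` to a pattern list rather than setting it to `true` follows the note in the new description that JavaScript rendering is slower than crawling plain HTML.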