Commit d00bcc1

Add docs for CRAWLER_LICENSEE_PARALLELISM
1 parent 1448155 commit d00bcc1

1 file changed: +8 -0 lines changed
service_config/crawler.md

Lines changed: 8 additions & 0 deletions
@@ -6,6 +6,7 @@
 - [CRAWLER\_GITHUB\_TOKEN](#crawler_github_token)
 - [CRAWLER\_HOST](#crawler_host)
 - [CRAWLER\_INSIGHTS\_KEY](#crawler_insights_key)
+- [CRAWLER\_LICENSEE\_PARALLELISM](#crawler_licensee_parallelism)
 - [CRAWLER\_NAME](#crawler_name)
 - [CRAWLER\_QUEUE\_PREFIX](#crawler_queue_prefix)
 - [CRAWLER\_QUEUE\_PROVIDER](#crawler_queue_provider)
@@ -34,6 +35,7 @@ The environmental variables for the cdcrawler-dev App Service include:
 * CRAWLER_GITHUB_TOKEN
 * CRAWLER_HOST
 * CRAWLER_INSIGHTS_KEY
+* CRAWLER_LICENSEE_PARALLELISM
 * CRAWLER_NAME
 * CRAWLER_QUEUE_AZURE_CONNECTION_STRING
 * CRAWLER_QUEUE_PREFIX
@@ -87,6 +89,12 @@ Note that we only use this in the development environment, not in the production
 
 We use [Azure Application Insights](https://docs.microsoft.com/en-us/azure/azure-monitor/app/app-insights-overview) to monitor the crawler application. This requires a key and this is where it is kept.
 
+### CRAWLER_LICENSEE_PARALLELISM
+
+This is the maximum number of `licensee` processes to run in parallel. `licensee` is a tool to collect license
+information. The default value is `10` and setting it to a smaller value can reduce CPU spikes and lead to the crawler
+having a more uniform CPU usage.
+
 ### CRAWLER_NAME
 
 This is a name to refer to the crawler with. Note that we set it in the App Service in the development environment and in [the Docker file](https://github.com/clearlydefined/crawler/blob/32a0d6b59edfda5d3226c50680e4a8338af395cd/Dockerfile) for the Prod environment.
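
The `CRAWLER_LICENSEE_PARALLELISM` section added in this commit describes a cap on concurrent `licensee` runs. As a rough illustration of how such a cap behaves, here is a minimal Python sketch. It is not the crawler's actual code: `read_parallelism`, `run_bounded`, and `demo` are hypothetical names, the fallback to the default for invalid or non-positive values is an assumption, and a short sleep stands in for a real `licensee` process.

```python
import asyncio
import os

DEFAULT_PARALLELISM = 10  # default value stated in the documentation above


def read_parallelism(env=None):
    """Return the configured cap, falling back to the documented default."""
    env = os.environ if env is None else env
    raw = env.get("CRAWLER_LICENSEE_PARALLELISM", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric: use the default (assumed behaviour).
        return DEFAULT_PARALLELISM
    # Assumption: non-positive values are treated as invalid.
    return value if value > 0 else DEFAULT_PARALLELISM


async def run_bounded(jobs, limit):
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:  # waits here once `limit` jobs are running
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo(limit, count=25):
    """Run `count` stand-in jobs and report the peak number running at once."""
    running = 0
    peak = 0

    async def fake_licensee_run():
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)  # stand-in for a real licensee process
        running -= 1

    await run_bounded([fake_licensee_run] * count, limit)
    return peak


if __name__ == "__main__":
    limit = read_parallelism()
    print("limit:", limit, "peak concurrency:", asyncio.run(demo(limit)))
```

With a lower limit the same amount of work is spread over more time, which is the trade-off the documentation describes: fewer simultaneous processes and a flatter CPU profile, at the cost of a longer total run.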
