@@ -131,7 +133,7 @@ It's unclear where this environmental variable is used within the crawler.
We use multiple services to store the crawler's harvests of license information.
- If you look at the value of this environmental variable, you will see that it is **"cdDispatch+cd(azblob)+webhook"**
+ If you look at the value of this environmental variable, you will see that it is **"cdDispatch+cd(azblob)+webhook"**. In the production crawler Dockerfile, it is configured as **"cdDispatch+cd(azblob)+azqueue"**.
These are used by [the crawler configuration code](https://github.com/clearlydefined/crawler/blob/32a0d6b59edfda5d3226c50680e4a8338af395cd/config/cdConfig.js).
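
The provider value reads as a `+`-separated list of component names. As a rough illustration only (this is not the crawler's actual parsing code, and `parseStoreProvider` is a hypothetical helper), such a string could be split like this:

```javascript
// Illustrative sketch: split a provider string such as
// "cdDispatch+cd(azblob)+azqueue" into its component names.
// parseStoreProvider is a hypothetical helper, not part of the crawler.
function parseStoreProvider(value) {
  return value
    .split('+')
    .map(name => name.trim())
    .filter(name => name.length > 0)
}

// The two values quoted above: the first one, and the production Dockerfile one.
const devProviders = parseStoreProvider('cdDispatch+cd(azblob)+webhook')
const prodProviders = parseStoreProvider('cdDispatch+cd(azblob)+azqueue')

console.log(devProviders)
console.log(prodProviders)
```

The two values differ only in the last component: `webhook` in the first and `azqueue` in the production Dockerfile.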
@@ -151,6 +153,10 @@ We use a few different "dispatchers" - which are used to fetch GitHub repos or P
cdDispatch refers to the generic base file that handles calls to the various dispatchers.

+ **azqueue**
+
+ This refers to an Azure Storage Queue used by the crawler to notify a service upon the completion of a tool’s processing. The default queue name is `harvests`. More details on the configuration can be found in the [cdConfig.js file](https://github.com/clearlydefined/crawler/blob/32a0d6b59edfda5d3226c50680e4a8338af395cd/config/cdConfig.js#L95).
+
### CRAWLER_WEBHOOK
These environmental variables define the ClearlyDefined service's webhook URL (this is what the crawler calls after it completes a harvest).
@@ -159,6 +165,12 @@ In Dev the webhook url is "https://dev-api.clearlydefined.io/webhook".
The token is what we use to authenticate to the API (so that only the crawler can call that part of the ClearlyDefined Service API).
+ This environment variable is optional and specifically applies to the `azqueue` crawler store provider. It sets the visibility timeout, which determines how long messages remain hidden after being pushed onto the queue.
+
+ The default value is `0`. For production crawlers, this value is explicitly set to `300 seconds` (5 minutes).
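
The fallback to `0` and the production override of 300 could be resolved along the following lines. This is a minimal sketch, not the crawler's code: the variable name `EXAMPLE_AZQUEUE_VISIBILITY_TIMEOUT_SECONDS` and the helper `resolveVisibilityTimeout` are hypothetical placeholders, and the real setting is defined in cdConfig.js.

```javascript
// Hypothetical variable name used for illustration only;
// the real name is defined in the crawler's cdConfig.js.
const TIMEOUT_VAR = 'EXAMPLE_AZQUEUE_VISIBILITY_TIMEOUT_SECONDS'

// Resolve the visibility timeout (in seconds) from an env-like object.
function resolveVisibilityTimeout(env) {
  const parsed = Number.parseInt(env[TIMEOUT_VAR], 10)
  // Fall back to the documented default of 0 when the variable is
  // unset or is not a non-negative integer.
  return Number.isNaN(parsed) || parsed < 0 ? 0 : parsed
}

// Unset: the documented default applies.
const unsetTimeout = resolveVisibilityTimeout({})
// Production-style setting: 300 seconds (5 minutes).
const productionTimeout = resolveVisibilityTimeout({ [TIMEOUT_VAR]: '300' })

console.log(unsetTimeout, productionTimeout)
```

In a real process the argument would typically be `process.env`. Passing a plain object keeps the sketch easy to test.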
+
## Docker environmental variables
The Docker environmental variables define what container image is used for the Crawler, as well as what registry that image is kept in, and authentication info for the registry.