
Commit 8bb4568

update documentation

Signed-off-by: Nell Shamrell <[email protected]>

1 parent: b177672

2 files changed: +23 −2 lines changed

README.md

Lines changed: 17 additions & 1 deletion
@@ -185,8 +185,24 @@ $ curl http://localhost:4000/harvest/maven/mavencentral/org.flywaydb/flyway-mave

 ### Clearly Defined Crawler

-TODO
+The Crawler is what "crawls" package registries, GitHub, and more to scan and collect license information.
+
+It runs within its own container. Queues used by the crawler are currently kept in the container's memory.
+
+As noted above, any Clearly Defined environment needs a place to store raw harvest information. In this development environment, we use the same file storage as the service (harvest information is stored in a volume that is mounted to both containers).
+
+To see this in action, request a package that has not yet been harvested, through either the UI or the service API.
+
+To request it through the UI, navigate to http://localhost:3000/definitions/npm/npmjs/-/npm/7.3.0 in your browser.
+
+To request it through the API, run:
+
+```bash
+$ curl localhost:4000/definitions/npm/npmjs/-/npm/7.3.0
+```
+
+At first the response will show that no definition exists. Check back a few minutes after running these commands and you should see newly harvested data.

 ### Clearly Defined Mongo DB

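The request-then-check-back flow described in the README changes above can be sketched in shell. This is a minimal sketch, not part of the commit: the `definition_url` helper is hypothetical, and it assumes the service is reachable at `localhost:4000` as the README text states.

```shell
# Hypothetical helper: build a definitions API URL from package
# coordinates (type/provider/namespace/name/revision).
definition_url() {
  local type="$1" provider="$2" namespace="$3" name="$4" revision="$5"
  echo "http://localhost:4000/definitions/${type}/${provider}/${namespace}/${name}/${revision}"
}

url="$(definition_url npm npmjs - npm 7.3.0)"
echo "Requesting ${url}"

# Against a live dev environment, the first request triggers harvesting;
# a second request a few minutes later should return the definition
# (uncomment to run):
# curl -s "$url"
# sleep 300
# curl -s "$url"
```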
sample_env

Lines changed: 6 additions & 1 deletion
@@ -27,4 +27,9 @@ HARVEST_STORE_PROVIDER="file"
 FILE_STORE_LOCATION="/tmp/harvested_data"

 # Crawler Info
-CRAWLER_GITHUB_TOKEN="<your GitHub token>"
+CRAWLER_API_URL="http://crawler:5000"
+CRAWLER_GITHUB_TOKEN="<your GitHub token>"
+CRAWLER_DEADLETTER_PROVIDER=cd(file)
+CRAWLER_NAME=cdcrawlerlocal
+CRAWLER_QUEUE_PROVIDER=memory
+CRAWLER_STORE_PROVIDER=cd(file)
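The sample_env additions above are plain KEY=value settings. One common way to pick them up in a local shell (e.g. before running the crawler outside of Compose) is to auto-export while sourcing the file. A sketch under stated assumptions: the `/tmp` path and the throwaway file are for illustration only; in practice you would copy sample_env and fill in a real token. Note that values containing parentheses, such as `cd(file)`, must be quoted when shell-sourced, even though a Compose `env_file` accepts them unquoted.

```shell
# Create a throwaway env file for illustration (hypothetical path).
cat > /tmp/cd_crawler.env <<'EOF'
CRAWLER_NAME=cdcrawlerlocal
CRAWLER_QUEUE_PROVIDER=memory
CRAWLER_STORE_PROVIDER="cd(file)"
EOF

set -a                 # auto-export every variable assigned while sourcing
. /tmp/cd_crawler.env
set +a

echo "$CRAWLER_QUEUE_PROVIDER"   # memory
```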
