README.md: 17 additions & 6 deletions
@@ -6,23 +6,34 @@ A service that crawls projects and packages for information relevant to ClearlyD
The quickest way to get a fully functional local ClearlyDefined setup (including the crawler) is to use the [Dockerized ClearlyDefined environment setup](https://github.com/clearlydefined/docker_dev_env_experiment). This runs all services locally and does not require access to the ClearlyDefined Azure account.

+**THIS IS THE SUGGESTED DEV WORKFLOW, AS LOCAL INSTALL/SETUP IS VERY ANGRY RIGHT NOW.**
+
## Alternative Setup

-Some parts of this set up may require access to the ClearlyDefined Azure Account.
+Some parts of this setup may require access to the ClearlyDefined Azure account or a properly configured [Azurite](https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azurite) instance for local storage management.

1. Clone this repo
-1. `cd` to the repo dir and run `npm install`
-1. Copy the `template.env.json` file to the **parent** directory of the repo and rename it to `env.json`. Ideally this repo is colocated with the other ClearlyDefined repos. You can share the `env.json` file. Just merge the two files. Some properties are meant to be shared.
-1. After copying/merging, update the file to have the property values for your system. See the [Configuration](#configuration) section for more details.
-1. Install [ScanCode](https://github.com/nexB/scancode-toolkit) if desired (see below).
-1. Run `npm start`
+2. `cd` to the repo dir and run `npm install`
+3. Copy the `template.env.json` file to the **parent** directory of the repo and rename it to `env.json`. Ideally this repo is colocated with the other ClearlyDefined repos so that they can share one `env.json` file. If an `env.json` already exists there, merge the two files; some properties are meant to be shared.
+4. After copying/merging, update the file with the property values for your system. See the [Configuration](#configuration) section for more details.
+5. Install [ScanCode](https://github.com/nexB/scancode-toolkit) if desired (see below).
+6. Run `npm start`

That results in the ClearlyDefined crawler starting up and listening for POSTs on port 5000. See the [Configuration](#configuration) section for info on how to change the port.
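The merge mentioned in the `env.json` step can be pictured with a minimal sketch. It assumes `env.json` is a flat JSON object and that values already present in a shared file should win over the template's placeholders; neither assumption is confirmed by this README, and the helper name `mergeEnv` is hypothetical.

```javascript
// Sketch only: assumes env.json is a flat key/value object.
// Values already set in `existing` are kept; keys that appear only in
// `template` are added so every expected property is present.
function mergeEnv(existing, template) {
  return { ...template, ...existing };
}
```

The result still needs to be written back to `env.json` and edited by hand, as described in the step about updating property values for your system.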
### ScanCode install notes

Due to an issue with ScanCode's install configuration on Windows, you may need to **replace** the `bin` folder (actually a "junction") with the contents of the `Scripts` folder. That is, delete `bin` and copy `Scripts` to `bin`. See https://github.com/nexB/scancode-toolkit/issues/1129 for more details.

+## Setup for running tests
+
+If you only want to run the crawler's tests, you need Node v18.20.8 on your local system for `npm install` and the tests to run reliably. This will be fixed as we upgrade the dependencies, but for now this is the best solution. If you use [nvm](https://github.com/nvm-sh/nvm), follow the steps below, which use the lowest working Node version, to get set up to run the tests.
+
+1. Clone this repo
+2. `cd` into the `crawler` directory
+3. Run `nvm install v18.20.8; nvm use v18.20.8` to install the correct Node version
+4. Run `npm run test` and profit
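If it helps to confirm the active Node version before running the tests, a small check along these lines could be used. It is only a sketch: the version number comes from the text above, while the helper name `isRequiredNode` is hypothetical and not part of the crawler.

```javascript
// Sketch only: the helper is hypothetical; the version is the one named above.
const REQUIRED_NODE = '18.20.8';

// Return true when a version string such as "v18.20.8" matches the required version.
function isRequiredNode(version, required = REQUIRED_NODE) {
  return version.replace(/^v/, '') === required;
}

// process.version is Node's own version string, prefixed with "v".
if (typeof process !== 'undefined' && !isRequiredNode(process.version)) {
  console.warn(`Expected Node v${REQUIRED_NODE} for the tests, found ${process.version}`);
}
```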
+
## Queuing work with the crawler

The crawler takes _requests_ to rummage around and find relevant information about projects. For example, to crawl an npm package or a GitHub repo, POST one of the following JSON bodies to `http://localhost:5000/requests`. Note that you can queue either a single JSON request object or an array of them in one POST. For example,
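As a rough illustration of the POST itself, the sketch below sends one request object, or an array of them, to a locally running crawler. The endpoint and port come from the text above; the helper names and the `Content-Type` header are assumptions, and the shape of the request body is deliberately left to the caller because it is not shown here. It relies on the global `fetch` available in Node 18 and later.

```javascript
// Sketch only: helper names are hypothetical, and the request body
// fields must come from the crawler's documented request format.
const CRAWLER_URL = 'http://localhost:5000/requests'; // default port from the text

// Build the fetch options for queuing one request object or an array of them.
function buildQueueRequest(requests) {
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' }, // assumed, since the body is JSON
    body: JSON.stringify(requests),
  };
}

// Send the request(s) and fail loudly on a non-2xx response.
async function queueRequests(requests, url = CRAWLER_URL) {
  const response = await fetch(url, buildQueueRequest(requests));
  if (!response.ok) {
    throw new Error(`Crawler responded with HTTP ${response.status}`);
  }
  return response.status;
}
```

Calling `queueRequests(body)` with a single object or an array queues the work, provided the crawler started by `npm start` is running.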