services/horizon/docker/verify-range/README.md
Lines changed: 70 additions & 18 deletions
@@ -2,6 +2,11 @@
 
 This docker image allows running multiple instances of `horizon ingest verify-command` on a single machine or running it in [AWS Batch](https://aws.amazon.com/batch/).
 
+## Data directory
+By default, the image stores all output of the db, captive core, and buffered storage processes
+under the `/data` directory in the container at runtime. It is therefore strongly recommended
+to mount an external volume of at least 300-500GB at `/data`. Specify this at run time with `docker run -v /host/volume:/data`.
+
 ## Env variables
 
 ### Running locally
@@ -14,16 +19,51 @@ This docker image allows running multiple instances of `horizon ingest verify-co
|`AWS_BATCH_JOB_ARRAY_INDEX`| The zero based index of a single batch Job. |
+|`BATCH_START_LEDGER`| The `FROM` ledger of the requested ledger range to verify. |
+|`BATCH_SIZE`| Size of the batch, must be a multiple of 64. |
+
+
+### Datastore and GCP Credentials Usage
+
+This image supports reading ledger data from GCS buckets instead of captive core. To use this feature, configure the container with these additional settings:
 
-#### Example
+#### GCP Credentials
+- Purpose: The image needs GCP credentials to access GCS buckets.
+- There are two ways to provide them to the container:
+  - As an environment variable:
+    - Pass the GCP JSON credentials as a string in the `GCP_CREDS` environment variable:
+      ```sh
+      docker run -e GCP_CREDS='{...}' ...
+      ```
+  - As a volume mount:
+    - Mount the GCP JSON credentials file from the host into the container, e.g.:
+      ```sh
+      docker run -v /host/path/credentials.json:/tmp/credentials.json -e GOOGLE_APPLICATION_CREDENTIALS=/tmp/credentials.json ...
+      ```
 
-When you start 10 jobs with `BATCH_START_LEDGER=63` and `BATCH_SIZE=64`
-it will run the following ranges:
+#### GCS Datastore settings
+- Purpose: Define the GCS bucket name and the ledger partitioning used in the bucket. These settings are supplied as a single TOML file at runtime. Here is an example: [datastore_config.toml](../../../galexie/config.example.toml)
+- There are two ways to provide them to the container:
+  - As an environment variable:
+    - Pass the datastore TOML config as a string (including line breaks and tabs) in the `DATASTORE_CONFIG_PLAIN` environment variable:
+      ```sh
+      docker run -e DATASTORE_CONFIG_PLAIN='[buffered_storage_backend_config]\nbuffer_size = 5\n ...'
+      ```
+  - As a volume mount:
+    - Mount the datastore TOML config file from the host into the container, e.g.:
+      ```sh
+      docker run -v /host/path/datastore-config.toml:/tmp/datastore-config.toml -e DATASTORE_CONFIG=/tmp/datastore-config.toml
+      ```
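The `DATASTORE_CONFIG_PLAIN` option passes the whole TOML document as one string. In a POSIX shell, `\n` inside single quotes stays a literal backslash followed by `n`, and the README does not say whether the container expands such escapes. A minimal sketch, assuming a hypothetical config file created in a temporary directory, that reads the file with command substitution so the variable carries real line breaks:

```sh
# Hypothetical config file, written to a temp dir only so the sketch is self-contained.
tmpdir="$(mktemp -d)"
cat > "$tmpdir/datastore-config.toml" <<'EOF'
[buffered_storage_backend_config]
buffer_size = 5
EOF

# Command substitution keeps interior line breaks (only trailing newlines are stripped).
DATASTORE_CONFIG_PLAIN="$(cat "$tmpdir/datastore-config.toml")"

# Inspect the value before handing it to the container.
printf '%s\n' "$DATASTORE_CONFIG_PLAIN"

# It could then be passed through unchanged, e.g.:
# docker run -e DATASTORE_CONFIG_PLAIN="$DATASTORE_CONFIG_PLAIN" ...
```

Quoting the variable on the `docker run` line matters here: unquoted, the shell would split the value on whitespace and line breaks.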
+
+### Examples of running container
+
+#### Batch jobs example
+When run from AWS Batch with `BATCH_START_LEDGER=63` and `BATCH_SIZE=64`,
+AWS Batch generates runner jobs and gives each one an `AWS_BATCH_JOB_ARRAY_INDEX`.
+The verify-range container then derives the ledger range for each job:
 
 |`AWS_BATCH_JOB_ARRAY_INDEX`|`FROM`|`TO`|
 |-----------------------------|--------|------|
@@ -32,13 +72,25 @@ it will run the following ranges:
 | 2 | 191 | 255 |
 | 3 | 255 | 319 |
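The visible rows are consistent with `FROM = BATCH_START_LEDGER + AWS_BATCH_JOB_ARRAY_INDEX * BATCH_SIZE` and `TO = FROM + BATCH_SIZE`. A minimal sketch of that arithmetic follows. It is inferred from the table rather than taken from the image's scripts, and the function name `ledger_range` is hypothetical:

```sh
# ledger_range START SIZE INDEX
# Prints "FROM TO" for one array job; the formula is inferred from the table above.
ledger_range() {
  start=$1
  size=$2
  index=$3
  from=$((start + index * size))
  to=$((from + size))
  echo "$from $to"
}

# Print the range for the first four array indexes.
for i in 0 1 2 3; do
  echo "$i: $(ledger_range 63 64 "$i")"
done
```

For indexes 2 and 3 this yields `191 255` and `255 319`, matching the rows shown.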
 
-## Tips when using AWS Batch
-
-* In "Job definition" set vCPUs to 2 and Memory to 4096. This represents the `c5.large` instances Horizon should be using.
-* In "Compute environments":
-  * Set instance type to "c5.large".
-  * Set "Maximum vCPUs" to 2x the number of instances you want to start (because "c5.large" has 2 vCPUs). Ex. 10 vCPUs = 5 x "c5.large" instances.
-  * Use spot instances! It's much cheaper and speed of testing will be the same in 99% of cases.
-* You need to publish the image if there are any changes in `Dockerfile` or one of the scripts.
-* When batch processing is over check if instances have been terminated. Sometimes AWS doesn't terminate them.
-* Make sure the job timeout is set to a larger value if you verify larger ranges. Default is just 100 seconds.
+#### `docker run` example
+Running the verify-range image as a local container with `docker run`:
+* Run verify-range and use captive core to get ledgers from pubnet:
+  ```
+  docker run -e FROM=63 \
+    -e TO=127 \
+    -e BRANCH=<targetversion> \
+    -e BASE_BRANCH=master \
+    verify-range:latest
+  ```
+* Run verify-range with the GCS datastore to use precomputed ledger metadata from buckets; captive core is not used: