Commit 06286b2

Update documentation in line with the chunk cache additions
1 parent d394384 commit 06286b2

4 files changed (+64, -3 lines)


docs/architecture.md

Lines changed: 9 additions & 0 deletions
@@ -56,6 +56,15 @@ This is implemented using the `S3ClientMap` in `src/s3_client.rs` and benchmarke
Downloaded storage chunk data is returned to the request handler as a [Bytes](https://docs.rs/bytes/latest/bytes/struct.Bytes.html) object, which is a cheaply cloneable, reference-counted buffer of `u8` (byte) data.

## S3 object caching

A cache can optionally be enabled to store downloaded S3 objects on disk. This allows the Reductionist to repeat operations on already downloaded data objects using faster disk I/O rather than network I/O.
Authentication is passed through to the S3 object store, and users other than the original requester may access cached data if S3 authentication permits it. For a further speedup in trusted environments, S3 authentication of cached data can optionally be bypassed.
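
The access rule above can be pictured as a check made on each cache hit. The following is a minimal sketch, not the Reductionist's code. `ChunkCache`, `Credentials`, `CacheRead` and the `s3_authorises` callback are all hypothetical names. The cache is an in-memory `HashMap` standing in for the disk cache, and the sketch does not attempt to model how the real service asks S3 whether a requester may read an object. A cache miss needs no check, because the download itself goes to S3 with the requester's credentials.

```rust
use std::collections::HashMap;

/// Hypothetical S3 credentials supplied with a request.
#[derive(Debug, Clone, PartialEq)]
pub struct Credentials {
    pub access_key: String,
    pub secret_key: String,
}

/// Result of looking an object up in the (hypothetical) cache.
#[derive(Debug, PartialEq)]
pub enum CacheRead {
    /// Cached bytes that the requester may read.
    Hit(Vec<u8>),
    /// Not cached: the object would be downloaded from S3 with the
    /// requester's own credentials, so S3 enforces access itself.
    Miss,
    /// Cached, but S3 would not let this requester read it.
    Denied,
}

/// Hypothetical in-memory stand-in for the disk cache, keyed by object location.
pub struct ChunkCache {
    entries: HashMap<String, Vec<u8>>,
    /// Mirrors REDUCTIONIST_CHUNK_CACHE_BYPASS_AUTH.
    bypass_auth: bool,
}

impl ChunkCache {
    pub fn new(bypass_auth: bool) -> Self {
        ChunkCache { entries: HashMap::new(), bypass_auth }
    }

    pub fn insert(&mut self, key: &str, data: Vec<u8>) {
        self.entries.insert(key.to_string(), data);
    }

    /// `s3_authorises` is a hypothetical hook that asks the object store
    /// whether `creds` may read `key`; it is skipped when bypassing auth.
    pub fn get<F>(&self, key: &str, creds: &Credentials, s3_authorises: F) -> CacheRead
    where
        F: Fn(&str, &Credentials) -> bool,
    {
        let data = match self.entries.get(key) {
            Some(data) => data,
            None => return CacheRead::Miss,
        };
        if self.bypass_auth || s3_authorises(key, creds) {
            CacheRead::Hit(data.clone())
        } else {
            CacheRead::Denied
        }
    }
}
```

With `bypass_auth` set, the callback is never invoked. Presumably, that skipped round trip to the object store on each hit is where the extra speedup in trusted environments comes from.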

A [Tokio MPSC channel](https://docs.rs/tokio/latest/tokio/sync/mpsc/index.html) bridges the asynchronous request handlers of the [Axum](https://docs.rs/axum) web framework and the synchronous writes to the disk cache. This lets requests to the Reductionist continue unblocked along their operation pipeline while their downloaded objects are queued for cache storage.
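
A minimal sketch of this producer/consumer pattern follows. To stay dependency-free, it swaps Tokio's channel for the standard library's bounded `std::sync::mpsc::sync_channel` and uses plain threads as stand-ins for request handlers. `CacheWrite`, `spawn_cache_writer` and `handle_download` are hypothetical names, and the writer stores objects in a `HashMap` rather than on disk. One difference matters here. `SyncSender::send` blocks the sending thread when the queue is full, whereas Tokio's `Sender::send` is awaited and only suspends the task. With Tokio, the synchronous consumer could use `Receiver::blocking_recv`.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::thread::{self, JoinHandle};

/// Hypothetical shape of a downloaded object queued for cache storage.
pub struct CacheWrite {
    pub key: String,
    pub data: Vec<u8>,
}

/// Spawns the single synchronous cache writer and returns the sending half.
/// `queue_size` plays the role of REDUCTIONIST_CHUNK_CACHE_QUEUE_SIZE.
/// std's bounded channel is a stand-in for the Tokio MPSC channel.
pub fn spawn_cache_writer(
    queue_size: usize,
) -> (SyncSender<CacheWrite>, JoinHandle<HashMap<String, Vec<u8>>>) {
    let (tx, rx) = sync_channel::<CacheWrite>(queue_size);
    let writer = thread::spawn(move || {
        // Stand-in for the disk cache; a real writer would persist to files.
        let mut store = HashMap::new();
        // Receiving blocks only this writer thread; the loop ends once
        // every sender has been dropped.
        for item in rx {
            store.insert(item.key, item.data);
        }
        store
    });
    (tx, writer)
}

/// Hypothetical handler step after a download: queue a copy for the cache,
/// then carry on processing its own copy (here it just returns the length).
pub fn handle_download(tx: &SyncSender<CacheWrite>, key: &str, data: Vec<u8>) -> usize {
    // An error only means the writer has gone away; caching is best effort.
    let _ = tx.send(CacheWrite { key: key.to_string(), data: data.clone() });
    data.len()
}
```

Because a single thread owns all writes, the store needs no locking. The queue bound caps how many downloaded objects can wait in memory before they are written.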

The disk cache can be managed by its total size and by a time to live (TTL) on individual data objects, with automatic pruning removing expired objects. Cache state is maintained on disk, allowing the cache to be reused across restarts of the Reductionist.
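
Size and TTL management can be sketched as a prune pass over cache metadata. This illustrates the idea under stated assumptions and is not the Reductionist's implementation. `CacheIndex` is a hypothetical in-memory index, whereas the real cache keeps its state on disk. Timestamps are plain seconds supplied by the caller. Evicting the oldest objects first when the size limit is exceeded is an assumed policy that the text above does not specify.

```rust
use std::collections::BTreeMap;

/// Metadata for one cached object (hypothetical).
struct Entry {
    inserted_at: u64, // seconds, from any fixed epoch
    size: u64,        // bytes
}

/// Hypothetical in-memory index of cached objects; the Reductionist keeps
/// its cache state on disk so that it survives restarts.
pub struct CacheIndex {
    ttl_secs: u64,   // cf. REDUCTIONIST_CHUNK_CACHE_AGE
    size_limit: u64, // cf. REDUCTIONIST_CHUNK_CACHE_SIZE_LIMIT, in bytes
    entries: BTreeMap<String, Entry>,
}

impl CacheIndex {
    pub fn new(ttl_secs: u64, size_limit: u64) -> Self {
        CacheIndex { ttl_secs, size_limit, entries: BTreeMap::new() }
    }

    pub fn insert(&mut self, key: &str, size: u64, now: u64) {
        self.entries.insert(key.to_string(), Entry { inserted_at: now, size });
    }

    pub fn contains(&self, key: &str) -> bool {
        self.entries.contains_key(key)
    }

    pub fn total_size(&self) -> u64 {
        self.entries.values().map(|e| e.size).sum()
    }

    /// Removes expired objects, then evicts the oldest remaining objects
    /// (an assumed policy) until the total size is within the limit.
    /// Returns the keys that were removed.
    pub fn prune(&mut self, now: u64) -> Vec<String> {
        let ttl = self.ttl_secs;
        let mut removed: Vec<String> = self
            .entries
            .iter()
            .filter(|(_, e)| now.saturating_sub(e.inserted_at) >= ttl)
            .map(|(k, _)| k.clone())
            .collect();
        for key in &removed {
            self.entries.remove(key);
        }

        // Oldest first: sort remaining entries by insertion time.
        let mut by_age: Vec<(u64, String)> = self
            .entries
            .iter()
            .map(|(k, e)| (e.inserted_at, k.clone()))
            .collect();
        by_age.sort();
        let mut total = self.total_size();
        for (_, key) in by_age {
            if total <= self.size_limit {
                break;
            }
            if let Some(entry) = self.entries.remove(&key) {
                total -= entry.size;
                removed.push(key);
            }
        }
        removed
    }
}
```

A periodic task could call `prune` once every prune interval. Passing `now` in as a parameter keeps the logic deterministic and easy to test.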

## Filters and compression

When a variable in a netCDF, HDF5 or Zarr dataset is created, it may be compressed to reduce storage requirements.

docs/deployment.md

Lines changed: 52 additions & 1 deletion
@@ -193,13 +193,64 @@ Note, this is the default.
Create a `certs` directory under the home directory of the non-privileged deployment user. This is done automatically, and the following files are added, if Step is deployed.
If using third-party certificates, the following files must be added manually using the file names shown:

| Filename | Description |
| -------- | ----------- |
| certs/key.pem | Private key file |
| certs/cert.pem | Certificate file including any intermediates |

Certificates can be added after the Reductionist has been deployed, but the Reductionist's container must then be restarted.

## Reductionist configuration

In addition to the `certs` configuration above, the file `deployment/group_vars/all` covers the following Ansible parameters:

| Ansible parameter | Description |
| ----------------- | ----------- |
| reductionist_build_image | Whether to build the Reductionist container image locally |
| reductionist_src_url | Source URL for the Reductionist repository |
| reductionist_src_version | Repository branch to use for local builds |
| reductionist_repo_location | Where to clone the Reductionist repository |
| reductionist_clone_repo | Controls repository cloning; cloning overwrites local changes by default, and this parameter can disable it |
| reductionist_name | Name of the Reductionist container |
| reductionist_image | Container image URL, used when downloading rather than building locally |
| reductionist_tag | Container image tag |
| reductionist_networks | List of container networks |
| reductionist_env | Configures the Reductionist environment, see the table of environment variables below |
| reductionist_remote_certs_path | Path to the certificates on the host |
| reductionist_container_certs_path | Path to the certificates within the container |
| reductionist_remote_cache_path | Path to the cache on the host filesystem |
| reductionist_container_cache_path | Path to the cache within the container |
| reductionist_volumes | Volumes to map from the host to the container |
| reductionist_host | Used when deploying HAProxy to test connectivity to the backend Reductionist(s) |
| reductionist_cert_not_after | Certificate validity |

The `reductionist_env` parameter sets the environment variables passed to the Reductionist at runtime:

| Environment variable | Description |
| -------------------- | ----------- |
| REDUCTIONIST_HOST | IP address to listen on, default "0.0.0.0" |
| REDUCTIONIST_PORT | Port to listen on |
| REDUCTIONIST_HTTPS | Whether to enable HTTPS connections |
| REDUCTIONIST_CERT_FILE | Path to the certificate file used for HTTPS |
| REDUCTIONIST_KEY_FILE | Path to the key file used for HTTPS |
| REDUCTIONIST_SHUTDOWN_TIMEOUT | Maximum time in seconds to wait for operations to complete after receiving a Ctrl+C signal |
| REDUCTIONIST_ENABLE_JAEGER | Whether to enable sending traces to Jaeger |
| REDUCTIONIST_USE_RAYON | Whether to use Rayon for execution of CPU-bound tasks |
| REDUCTIONIST_MEMORY_LIMIT | Memory limit in bytes |
| REDUCTIONIST_S3_CONNECTION_LIMIT | S3 connection limit |
| REDUCTIONIST_THREAD_LIMIT | Thread limit for CPU-bound tasks |
| REDUCTIONIST_USE_CHUNK_CACHE | Whether to enable caching of downloaded data objects to disk |
| REDUCTIONIST_CHUNK_CACHE_PATH | Absolute filesystem path used for the cache; defaults to the container cache path, see the Ansible parameters above |
| REDUCTIONIST_CHUNK_CACHE_AGE | Time in seconds a chunk is kept in the cache |
| REDUCTIONIST_CHUNK_CACHE_PRUNE_INTERVAL | Time in seconds between periodic pruning of the cache |
| REDUCTIONIST_CHUNK_CACHE_SIZE_LIMIT | Maximum cache size, e.g. "100GB" |
| REDUCTIONIST_CHUNK_CACHE_QUEUE_SIZE | Tokio MPSC buffer size used to queue downloaded objects between the asynchronous web engine and the synchronous cache |
| REDUCTIONIST_CHUNK_CACHE_BYPASS_AUTH | Whether to bypass S3 authentication when accessing cached data |
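
To illustrate how these variables might map onto typed settings, here is a minimal sketch covering the chunk cache entries. `ChunkCacheSettings`, `load_settings` and `parse_size` are hypothetical. So are the accepted boolean spellings, the `false` defaults for the two flags, and the decimal (SI) reading of units such as `GB`. The Reductionist's own parsing may differ. Lookups go through a closure, so the sketch can be exercised without touching the process environment.

```rust
/// Hypothetical typed view of the chunk cache environment variables.
#[derive(Debug, PartialEq)]
pub struct ChunkCacheSettings {
    pub enabled: bool,
    pub path: Option<String>,
    pub age_secs: Option<u64>,
    pub prune_interval_secs: Option<u64>,
    pub size_limit_bytes: Option<u64>,
    pub queue_size: Option<usize>,
    pub bypass_auth: bool,
}

/// Parses sizes such as "100GB" into bytes, assuming decimal (SI) units.
pub fn parse_size(s: &str) -> Result<u64, String> {
    let s = s.trim();
    // Split at the first non-digit: "100GB" -> ("100", "GB").
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, unit) = s.split_at(split);
    let value: u64 = digits.parse().map_err(|_| format!("invalid size: {:?}", s))?;
    let multiplier: u64 = match unit.trim().to_ascii_uppercase().as_str() {
        "" | "B" => 1,
        "KB" => 1_000,
        "MB" => 1_000_000,
        "GB" => 1_000_000_000,
        "TB" => 1_000_000_000_000,
        other => return Err(format!("unknown size unit: {:?}", other)),
    };
    value
        .checked_mul(multiplier)
        .ok_or_else(|| format!("size overflows u64: {:?}", s))
}

/// Accepts "true"/"false" or "1"/"0" (an assumption for this sketch).
fn parse_bool(name: &str, s: &str) -> Result<bool, String> {
    match s.trim().to_ascii_lowercase().as_str() {
        "true" | "1" => Ok(true),
        "false" | "0" => Ok(false),
        other => Err(format!("{}: invalid boolean {:?}", name, other)),
    }
}

/// `get` looks a variable up, e.g. `|k| std::env::var(k).ok()`.
pub fn load_settings<F>(get: F) -> Result<ChunkCacheSettings, String>
where
    F: Fn(&str) -> Option<String>,
{
    let flag = |name: &str| -> Result<bool, String> {
        match get(name) {
            Some(v) => parse_bool(name, &v),
            None => Ok(false), // assumed default when unset
        }
    };
    let number = |name: &str| -> Result<Option<u64>, String> {
        match get(name) {
            Some(v) => v.trim().parse::<u64>().map(Some).map_err(|e| format!("{}: {}", name, e)),
            None => Ok(None),
        }
    };
    let size_limit_bytes = match get("REDUCTIONIST_CHUNK_CACHE_SIZE_LIMIT") {
        Some(v) => Some(parse_size(&v)?),
        None => None,
    };
    Ok(ChunkCacheSettings {
        enabled: flag("REDUCTIONIST_USE_CHUNK_CACHE")?,
        path: get("REDUCTIONIST_CHUNK_CACHE_PATH"),
        age_secs: number("REDUCTIONIST_CHUNK_CACHE_AGE")?,
        prune_interval_secs: number("REDUCTIONIST_CHUNK_CACHE_PRUNE_INTERVAL")?,
        size_limit_bytes,
        queue_size: number("REDUCTIONIST_CHUNK_CACHE_QUEUE_SIZE")?.map(|n| n as usize),
        bypass_auth: flag("REDUCTIONIST_CHUNK_CACHE_BYPASS_AUTH")?,
    })
}
```

In a running process, the lookup would be `load_settings(|k| std::env::var(k).ok())`.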

Note: after changing any of the above parameters, the Reductionist must be deployed, or redeployed, using the Ansible playbook for the change to take effect.
Because of Ansible's idempotent nature, a running Reductionist container must be removed before redeploying.

## Usage

Once deployed, the Reductionist API is accessible via HAProxy on port 8080. The Prometheus UI is accessible on port 9090 on the host running Prometheus. The Jaeger UI is accessible on port 16686 on the host running Jaeger.
