This guide will help you deploy an Open Terms Archive collection to a server.
8
+
This guide will help you deploy an Open Terms Archive collection to a server. The deployment is automated using [Ansible](https://docs.ansible.com/ansible/latest/index.html) and will set up the Open Terms Archive engine and configure it to track your collection's terms.
9
+
10
+
## Prerequisites
11
+
12
+
Before starting, ensure you have:
13
+
14
+
- A basic understanding of the [deployment architecture]({{< relref "deployment/reference/architecture" >}})
15
+
- A server with admin access
16
+
- All collection repositories created; if they do not exist yet, see the [guide to create repositories]({{< relref "collections/how-to/create-repositories" >}})
17
+
- At least one declaration added to your collection
18
+
- A GitHub user account to automate actions such as committing entries in versions and snapshots repositories, reporting issues when tracking fails, publishing releases…
19
+
- [Ansible](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html) installed on your local machine
9
20
10
21
## 1. Configure the server
11
22
12
23
First, ensure your server provides unsupervised access:
13
24
14
-
1. Check the SSH host key:
25
+
1. Check the SSH host key and get the SSH fingerprint by running the following command on your local machine:
26
+
15
27
```shell
16
-
ssh-keyscan --type=ed25519 <server_address>
28
+
ssh-keyscan -t ed25519 <server_address>
17
29
```
18
-
If no Ed25519 key appears, generate one on the server:
30
+
31
+
If no Ed25519 key appears, generate one by running the following commands on the server:
> **Note**: A server fingerprint is a unique identifier for your server's SSH key. It helps verify that you're connecting to the correct server and not a malicious one. The fingerprint is a hash of the server's public key and is used to prevent man-in-the-middle attacks. You'll need this fingerprint in the next steps for secure deployment.
39
+
40
+
2. Create a dedicated user account specifically for deployment purposes, by running the following commands on the server:
41
+
25
42
```shell
26
-
adduser <user>
27
-
usermod --append --groups=sudo <user>
43
+
adduser <deployment_user>
44
+
usermod --append --groups=sudo <deployment_user>
28
45
```
29
46
30
-
3. Grant passwordless sudo access:
47
+
> **Note**: The `adduser` command might not be installed by default on your system. It can be installed with `sudo apt-get install adduser`.
48
+
49
+
3. Configure passwordless sudo access for this user, by adding the following line to the `/etc/sudoers` file on the server:
50
+
31
51
```shell
32
-
# Add to /etc/sudoers:
33
-
<user> ALL=(ALL) NOPASSWD:ALL
52
+
<deployment_user> ALL=(ALL) NOPASSWD:ALL
34
53
```
35
54
55
+
> **Note**: While passwordless sudo access does reduce security compared to requiring a password, it is required for full automation in deployment workflows with Ansible. The deployment process requires system-level operations (like installing packages and configuring services) that must be executed without manual intervention. To mitigate security risks, this configuration is limited to a dedicated deployment user that should only be used for deployment purposes, and the server must be properly secured with SSH key authentication.
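
For step 1, the check that the server advertises an Ed25519 host key can be scripted. The sketch below assumes the usual `ssh-keyscan` output format (`<host> ssh-ed25519 <key>`); the helper name `has_ed25519_key` is hypothetical and not part of Open Terms Archive.

```shell
# Hypothetical helper: read ssh-keyscan output on stdin and succeed only
# if at least one non-comment line advertises an Ed25519 host key.
has_ed25519_key() {
  grep -q '^[^#][^ ]* ssh-ed25519 '
}

# Usage (run on your local machine):
#   ssh-keyscan -t ed25519 <server_address> 2>/dev/null | has_ed25519_key \
#     || echo "No Ed25519 host key found" >&2
```

If the helper fails, the server most likely has no Ed25519 host key yet and one needs to be generated on the server as described in step 1.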
56
+
36
57
## 2. Set up the deployment configuration
37
58
38
-
1. Clone the collection declarations repository:
59
+
1. Clone the collection declarations repository that you want to deploy and navigate to the collection folder:
- Go to `https://github.com/OpenTermsArchive/<collection_name>-declarations/settings/secrets/actions`
75
+
3. Add the server fingerprint to GitHub, to allow the deployment workflow to uniquely identify the server:
76
+
- Go to `https://github.com/<organization>/<collection_id>-declarations/settings/secrets/actions`
52
77
- Create a new secret named `SERVER_FINGERPRINT` with your Ed25519 fingerprint
53
78
54
79
## 3. Configure SSH deployment keys
55
80
56
-
1. On the server, generate a deployment key:
81
+
1. On the server, generate a deployment key, which will be used by the continuous deployment workflow to connect to the server to deploy the collection:
- Create `SERVER_SSH_KEY` with the private key content
88
+
2. Add the private key to GitHub, to allow the deployment workflow to connect to the server:
89
+
- Go to `https://github.com/<organization>/<collection_id>-declarations/settings/secrets/actions`
90
+
- Create a new secret named `SERVER_SSH_KEY` with the private key content
65
91
66
-
3. Back up the keys:
67
-
- Store both public and private keys in the shared password database
68
-
- Create an entry titled "Deployment SSH key" in the collection folder
92
+
{{< showIfParam "ota" >}}
93
+
3. Back up the keys in the shared password database by creating an entry titled "Deployment SSH Key" in the collection folder and storing both public and private keys in this entry
94
+
{{< /showIfParam >}}
69
95
70
96
## 4. Set up GitHub permissions
71
97
72
-
1. Create a fine-grained GitHub token:
73
-
- Log in as OTA-Bot
98
+
1. Log in as the user account dedicated to bot-related actions in GitHub
99
+
100
+
2. Create a fine-grained GitHub token:
74
101
- Create a new token at github.com/settings/personal-access-tokens/new
75
-
- Set repository access for both declarations and versions repos
102
+
- Set repository access for both declarations and versions repositories
76
103
- Grant "Contents" and "Issues" write permissions
77
104
78
-
2. Back up the token:
79
-
- Store it in the shared password database under "GitHub Token"
105
+
3. If relevant, have an organization admin approve the token request
80
106
81
-
3. Get the token approved:
82
-
- Have an organization admin approve the token request
107
+
4. Keep this token for the next steps
83
108
84
-
## 5. Configure secrets
109
+
{{< showIfParam "ota" >}}
110
+
5. Back up the token in the shared password database by creating an entry titled "GitHub Token" in the collection folder and storing the token in this entry
111
+
{{< /showIfParam >}}
112
+
113
+
## 5. Configure and encrypt secrets
114
+
115
+
This section uses [Ansible Vault](https://docs.ansible.com/ansible/latest/vault_guide/index.html), a feature of Ansible that allows you to encrypt sensitive data like passwords and keys. The encrypted files can be safely committed to version control while keeping the actual secrets secure. The vault key you'll create will be used to encrypt and decrypt these secrets.
85
116
86
117
1. Generate and store a vault key:
87
118
- Generate a secure password without quotes/backticks
88
-
- Store it in the password database
89
-
- Create `deployment/vault.key` with the password
90
-
- Add it as `ANSIBLE_VAULT_KEY` in GitHub secrets
119
+
- Inside the collection folder, create a file named `deployment/vault.key` and paste the generated password into it.
120
+
- Go to `https://github.com/<organization>/<collection_id>-declarations/settings/secrets/actions`
121
+
- Create a new secret named `ANSIBLE_VAULT_KEY` and paste the same password into it.
91
122
92
-
2. Store GitHub token:
93
-
```
94
-
# In deployment/.env:
123
+
> **Note**: The same vault key is used in two places:
124
+
> - Locally as `vault.key` to encrypt/decrypt files during development
125
+
> - In GitHub Actions as `ANSIBLE_VAULT_KEY` to decrypt files during automated deployment
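
One way to generate a vault key that contains no quotes or backticks is to restrict it to letters and digits. This is a minimal sketch, not a method prescribed by the project: the 32-character length and the use of `/dev/urandom` with `base64` are assumptions.

```shell
# Assumption: 48 random bytes give 64 base64 characters; after dropping
# the non-alphanumeric ones, far more than 32 remain in practice.
vault_key="$(head -c 48 /dev/urandom | base64 | tr -dc 'A-Za-z0-9' | head -c 32)"

# Write the key where the guide expects it, readable by its owner only.
mkdir -p deployment
printf '%s\n' "$vault_key" > deployment/vault.key
chmod 600 deployment/vault.key
```

The same value is then pasted into the `ANSIBLE_VAULT_KEY` secret. The key file itself should not be committed.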
126
+
127
+
2. Store the GitHub token, generated in the previous section, in `deployment/.env`:
128
+
129
+
```shell
95
130
OTA_ENGINE_GITHUB_TOKEN=your_token
96
131
```
97
132
98
-
3. Encrypt the `.env` file:
133
+
3. Encrypt the `.env` file by running the following command inside the `deployment` folder of the collection:
134
+
99
135
```shell
100
136
ansible-vault encrypt .env
101
137
```
102
138
139
+
> **Note**: Running the command from the `deployment` folder will ensure that the `vault.key` file is used as vault key, since this folder contains an `ansible.cfg` file that explicitly configures this behavior.
140
+
>
141
+
> To decrypt an encrypted file, use:
142
+
>
143
+
> ```shell
144
+
> ansible-vault decrypt .env
145
+
> ```
146
+
>
147
+
> After making changes, re-encrypt it:
148
+
>
149
+
> ```shell
150
+
> ansible-vault encrypt .env
151
+
> ```
152
+
153
+
4. Commit the changes to the repository
154
+
155
+
{{< showIfParam "ota" >}}
156
+
5. Back up the vault key in the shared password database by creating an entry titled "Vault Key" in the collection folder and storing the vault key in this entry
157
+
{{< /showIfParam >}}
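
Before committing, it can be worth confirming that the secrets file is actually encrypted. Files encrypted with Ansible Vault start with a `$ANSIBLE_VAULT;` header line, so a simple check is possible. The helper name `is_vault_encrypted` below is hypothetical, and the sketch only inspects the first line of the file.

```shell
# Hypothetical helper: succeed only if the file's first line is an
# Ansible Vault header (for example "$ANSIBLE_VAULT;1.1;AES256").
is_vault_encrypted() {
  head -n 1 "$1" 2>/dev/null | grep -q '^\$ANSIBLE_VAULT;'
}

# Usage, from the collection folder:
#   is_vault_encrypted deployment/.env \
#     || echo "deployment/.env is NOT encrypted, do not commit it" >&2
```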
158
+
103
159
## 6. Set up collection-specific SSH key
104
160
105
-
1. Generate a new key:
161
+
1. Generate a new key, which will be used by the Open Terms Archive engine to perform actions on GitHub as the bot user:
2. Store the private key in `deployment/github-bot-private-key`
168
+
169
+
3. Encrypt the private key file by running the following command inside the `deployment` folder of the collection:
170
+
111
171
```shell
112
-
# Copy private key to deployment/github-bot-private-key
113
172
ansible-vault encrypt github-bot-private-key
114
173
```
115
174
116
-
3. Add the public key to OTA-Bot's GitHub account:
175
+
4. Commit the changes to the repository
176
+
177
+
5. Add the public key to the bot user's GitHub account:
117
178
- Go to github.com/settings/ssh/new
118
179
- Add the public key with title "<collection_name> collection"
119
180
181
+
{{< showIfParam "ota" >}}
182
+
6. Back up the key in the shared password database by creating an entry titled "OTA-Bot GitHub SSH key" in the collection folder and storing both public and private keys in this entry
183
+
{{< /showIfParam >}}
184
+
120
185
## 7. Configure email notifications
121
186
122
-
1. Generate SMTP credentials:
123
-
- Create a new SMTP key in Brevo
124
-
- Name it "<collection_name> collection"
187
+
This section describes how to configure the engine to use a specific SMTP server to send email notifications when it encounters errors during the tracking process. This helps you stay informed about issues that need attention and allows you to restart the tracking process if needed.
188
+
189
+
1. Get the SMTP credentials (host, username, password) from your email provider
190
+
191
+
2. Update the collection's SMTP configuration under the `logger` key of `@opentermsarchive/engine` in the `config/production.json` file:
192
+
193
+
```json
194
+
"logger": {
195
+
"smtp": {
196
+
"host": "<smtp_host>",
197
+
"username": "<smtp_username>"
198
+
}
199
+
},
200
+
```
201
+
202
+
3. Store the password in `deployment/.env`:
125
203
126
-
2. Store the credentials:
127
204
```shell
128
-
# In deployment/.env:
129
205
OTA_ENGINE_SMTP_PASSWORD=your_smtp_key
130
206
```
131
207
132
-
3. Encrypt the `.env` file:
208
+
> **Note**: If the `.env` file was already encrypted in a previous step, decrypt it first with `ansible-vault decrypt .env` so that you can add the password.
209
+
210
+
4. Encrypt the `.env` file:
211
+
133
212
```shell
134
213
ansible-vault encrypt .env
135
214
```
136
215
216
+
{{< showIfParam "ota" >}}
217
+
5. Create a new SMTP key in Brevo and name it "<collection_name> collection"
218
+
6. Back up the key in the shared password database by creating an entry titled "SMTP Key" in the collection folder and storing the credentials in this entry
219
+
{{< /showIfParam >}}
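
Since `deployment/.env` ends up holding several secrets (the GitHub token and the SMTP password), a small helper can set a variable without leaving duplicate lines behind. This is a sketch under assumptions: the function name `set_env_var` is hypothetical, and it must be run on the decrypted file, which then has to be encrypted again with `ansible-vault encrypt .env`.

```shell
# Hypothetical helper: set NAME=value in an env file, replacing any
# existing definition of NAME.
set_env_var() {
  # $1: env file, $2: variable name, $3: value
  touch "$1"
  # Keep every line that does not define the variable
  # (grep exits non-zero when nothing is kept, hence "|| true").
  grep -v "^$2=" "$1" > "$1.tmp" || true
  # Append the new definition and swap the file into place.
  printf '%s=%s\n' "$2" "$3" >> "$1.tmp"
  mv "$1.tmp" "$1"
}

# Usage, inside the deployment folder:
#   set_env_var .env OTA_ENGINE_SMTP_PASSWORD your_smtp_key
```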
220
+
137
221
## 8. Test the deployment
138
222
139
223
1. Via GitHub Actions:
140
224
- Check that the `deploy` action completes successfully
This document provides an overview of the key components and elements involved in the deployment process of a collection.
10
+
11
+
## Repository structure
12
+
13
+
A collection is defined by three repositories that work together to manage and track terms.
14
+
15
+
The declarations repository, `<collection_name>-declarations`, serves as the primary workspace for collection maintainers, containing declarations of the terms to track along with engine and deployment configurations.
16
+
17
+
This repository is complemented by two automatically managed repositories:
18
+
19
+
- The versions repository, `<collection_name>-versions`, which maintains a chronological history of terms changes in their readable format
20
+
- The snapshots repository, `<collection_name>-snapshots`, which maintains a chronological history of the original source documents (HTML, PDF…) from which the terms are extracted
21
+
22
+
These two repositories should be treated as databases: the engine updates them automatically whenever it detects changes in the tracked terms.
23
+
24
+
## Infrastructure
25
+
26
+
The server is where the Open Terms Archive engine runs.
27
+
28
+
The server requires administrative access to allow setting up the system in the appropriate state.
29
+
30
+
It has an Ed25519 SSH host key pair, `ssh_host_ed25519_key`, which provides a unique server fingerprint, `<server_ssh_fingerprint>`, for identity verification.
31
+
32
+
There is also a dedicated deployment user account, `<deployment_user>`, with passwordless sudo access to facilitate automated deployment tasks while maintaining security.
33
+
34
+
Process management is handled through [PM2](https://pm2.keymetrics.io/) and ensures the Open Terms Archive engine runs continuously and reliably.
35
+
36
+
The engine itself is the core application that performs the actual term tracking and repository management tasks.
37
+
38
+
## Security elements
39
+
40
+
### Authentication
41
+
42
+
Security is maintained through multiple layers of authentication.
43
+
44
+
The server's SSH host key pair, `ssh_host_ed25519_key`, generates a unique server fingerprint, `<server_ssh_fingerprint>`. This fingerprint verifies server identity and prevents man-in-the-middle attacks during deployment.
45
+
46
+
The deployment process uses a dedicated SSH key pair, `ota-deploy`, for secure server connections during the continuous deployment workflow.
47
+
48
+
A separate collection-specific SSH key pair, `<collection_name>-key`, enables the engine to perform GitHub actions as a bot user.
49
+
50
+
Access to GitHub repositories is controlled through a fine-grained access token, `OTA_ENGINE_GITHUB_TOKEN`, that provides specific permissions for repository management.
51
+
52
+
### Secret management
53
+
54
+
Sensitive information is protected by the [Ansible Vault](https://docs.ansible.com/ansible/latest/vault_guide/index.html) encryption system.
55
+
56
+
The vault system uses a master password, `vault.key`, to encrypt and decrypt sensitive data. This includes the environment configuration file, `.env`, and the GitHub bot's private key, `github-bot-private-key`, ensuring that sensitive credentials remain secure while still being accessible to the deployment process.
57
+
58
+
## Automation tools
59
+
60
+
[GitHub Actions](https://docs.github.com/en/actions) and [Ansible](https://www.ansible.com/) automate the deployment process. GitHub Actions runs the workflow while Ansible configures the server and deploys the engine.
61
+
62
+
A dedicated GitHub user account is used for bot-related actions such as committing entries in versions and snapshots repositories, reporting issues when tracking fails, and publishing releases. This account is configured with specific permissions to perform these automated tasks.
63
+
64
+
The engine sends email notifications to collection administrators when errors or issues occur during the tracking process, enabling prompt intervention when needed.
65
+
66
+
The engine automatically creates issues in the declarations repository to notify collection maintainers when terms can no longer be tracked. These issues provide details about the tracking failure to allow maintainers to investigate and resolve the problem.
67
+
68
+
## Configuration files
69
+
70
+
The system's behavior is controlled through several key configuration files:
71
+
72
+
- `inventory.yml`: Defines the server address and deployment parameters
- `vault.key`: Protects sensitive data through encryption
75
+
76
+
## Maintenance
77
+
78
+
The Open Terms Archive system is designed for continuous operation with minimal intervention.
79
+
80
+
The engine automatically tracks changes in terms, commits updates to the appropriate repositories, reports issues, and sends notifications when problems occur.
81
+
82
+
System health is maintained through PM2's process management capabilities.
83
+
84
+
Regular administrative maintenance involves updating the collection's dependencies, such as the engine and the deployment recipes. It also includes monitoring email notifications and reviewing application logs in case of issues or tracking interruptions.