Commit db43488

Improve how to deploy a collection (#172)
2 parents ec6a059 + dc53468 commit db43488

File tree

15 files changed

+390
-58
lines changed

config/_default/menus.en.yaml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -36,7 +36,7 @@ footer:
3636
- name: About
3737
url: https://opentermsarchive.org/en/about/
3838
weight: 7
39-
- name: Contact-us
39+
- name: Contact us
4040
identifier: mailto
4141
weight: 8
4242
footer_sub:

content/collections/how-to/take-over.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
---
22
title: Take over a collection
3-
weight: 4
3+
weight: 5
44
---
55

66
# How to take over a collection

content/collections/how-to/terminate.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
---
22
title: Terminate a collection
3-
weight: 3
3+
weight: 4
44
---
55

66
# How to terminate a collection

content/deployment/how-to/deploy.md

Lines changed: 138 additions & 53 deletions
@@ -5,141 +5,226 @@ weight: 1
55

66
# How to deploy a collection
77

8-
This guide will help you deploy an Open Terms Archive collection to a server.
8+
This guide will help you deploy an Open Terms Archive collection to a server. The deployment is automated using [Ansible](https://docs.ansible.com/ansible/latest/index.html) and will set up the Open Terms Archive engine and configure it to track your collection's terms.
9+
10+
## Prerequisites
11+
12+
Before starting, ensure you have:
13+
14+
- A basic understanding of the [deployment architecture]({{< relref "deployment/reference/architecture" >}})
15+
- A server with admin access
16+
- All collection repositories created; if not, see the [guide to create repositories]({{< relref "collections/how-to/create-repositories" >}})
17+
- At least one declaration added to your collection
18+
- A GitHub user account to automate actions such as committing entries in versions and snapshots repositories, reporting issues when tracking fails, publishing releases…
19+
- [Ansible](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html) installed on your local machine
920

1021
## 1. Configure the server
1122

1223
First, ensure your server provides unsupervised access:
1324

14-
1. Check the SSH host key:
25+
1. Check the SSH host key and get the SSH fingerprint by running the following command on your local machine:
26+
1527
```shell
16-
ssh-keyscan --type=ed25519 <server_address>
28+
ssh-keyscan -t ed25519 <server_address>
1729
```
18-
If no Ed25519 key appears, generate one on the server:
30+
31+
If no Ed25519 key appears, generate one by running the following commands on the server:
32+
1933
```shell
20-
sudo ssh-keygen --type=ed25519 --file=/etc/ssh/ssh_host_ed25519_key
34+
sudo ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key
2135
sudo systemctl restart ssh
2236
```
2337

24-
2. Create a non-root user if needed:
38+
> **Note**: A server fingerprint is a unique identifier for your server's SSH key. It helps verify that you're connecting to the correct server and not a malicious one. The fingerprint is a hash of the server's public key and is used to prevent man-in-the-middle attacks. You'll need this fingerprint in the next steps for secure deployment.
39+
40+
2. Create a dedicated user account specifically for deployment purposes, by running the following commands on the server:
41+
2542
```shell
26-
adduser <user>
27-
usermod --append --groups=sudo <user>
43+
adduser <deployment_user>
44+
usermod --append --groups=sudo <deployment_user>
2845
```
2946

30-
3. Grant passwordless sudo access:
47+
> **Note**: The `adduser` command might not be installed by default on your system. It can be installed with `sudo apt-get install adduser`.
48+
49+
3. Configure passwordless sudo access for this user, by adding the following line to the `/etc/sudoers` file on the server:
50+
3151
```shell
32-
# Add to /etc/sudoers:
33-
<user> ALL=(ALL) NOPASSWD:ALL
52+
<deployment_user> ALL=(ALL) NOPASSWD:ALL
3453
```
3554

55+
> **Note**: While passwordless sudo access does reduce security compared to requiring a password, it is required for full automation in deployment workflows with Ansible. The deployment process requires system-level operations (like installing packages and configuring services) that must be executed without manual intervention. To mitigate security risks, this configuration is limited to a dedicated deployment user that should only be used for deployment purposes, and the server must be properly secured with SSH key authentication.
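
As an alternative to editing `/etc/sudoers` directly, the rule can be generated and installed as a separate drop-in file. The sketch below only prints the rule; the helper name `sudoers_rule` and the user name `deploy` are illustrative assumptions, not part of the Open Terms Archive tooling.

```shell
# Hypothetical helper: print the passwordless sudo rule for a given user.
sudoers_rule() {
  printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$1"
}

# Example with an assumed user name; replace "deploy" with <deployment_user>.
sudoers_rule deploy
```

On a system whose `/etc/sudoers` includes the `/etc/sudoers.d` directory (the default on Debian and Ubuntu), the output could be written to `/etc/sudoers.d/<deployment_user>` and checked with `sudo visudo -cf /etc/sudoers.d/<deployment_user>` before logging out, so that a syntax error does not lock you out of sudo.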
56+
3657
## 2. Set up the deployment configuration
3758

38-
1. Clone the collection declarations repository:
59+
1. Clone the collection declarations repository that you want to deploy and navigate to the collection folder:
60+
3961
```shell
40-
git clone https://github.com/OpenTermsArchive/<collection_id>-declarations.git
62+
git clone https://github.com/<organization>/<collection_id>-declarations.git
63+
cd <collection_id>-declarations
4164
```
4265

43-
2. Configure the inventory file `deployment/inventory.yml`:
66+
2. Configure the inventory file `deployment/inventory.yml` with your server's IP address, deployment user, server fingerprint and the repository URL:
67+
4468
```yaml
45-
<host>: "your.server.ip"
46-
ansible_user: "your_username"
47-
ed25519_fingerprint: "your_ssh_fingerprint"
69+
<server_ip>:
70+
ansible_user: <deployment_user>
71+
ed25519_fingerprint: <server_ssh_fingerprint>
72+
ota_source_repository: https://github.com/<organization>/<collection_id>-declarations.git
4873
```
4974
50-
3. Add the server fingerprint to GitHub:
51-
- Go to `https://github.com/OpenTermsArchive/<collection_name>-declarations/settings/secrets/actions`
75+
3. Add the server fingerprint to GitHub, to allow the deployment workflow to uniquely identify the server:
76+
- Go to `https://github.com/<organization>/<collection_id>-declarations/settings/secrets/actions`
5277
- Create a new secret named `SERVER_FINGERPRINT` with your Ed25519 fingerprint
5378

5479
## 3. Configure SSH deployment keys
5580

56-
1. On the server, generate a deployment key:
81+
1. On the server, generate a deployment key, which will be used by the continuous deployment workflow to connect to the server to deploy the collection:
82+
5783
```shell
58-
ssh-keygen --type=ed25519 --quiet --passphrase="" --file=~/.ssh/ota-deploy
84+
ssh-keygen -t ed25519 -N "" -f ~/.ssh/ota-deploy
5985
cat ~/.ssh/ota-deploy.pub >> ~/.ssh/authorized_keys
6086
```
6187

62-
2. Add the private key to GitHub:
63-
- Go to the repository secrets
64-
- Create `SERVER_SSH_KEY` with the private key content
88+
2. Add the private key to GitHub, to allow the deployment workflow to connect to the server:
89+
- Go to `https://github.com/<organization>/<collection_id>-declarations/settings/secrets/actions`
90+
- Create a new secret named `SERVER_SSH_KEY` with the private key content
6591

66-
3. Back up the keys:
67-
- Store both public and private keys in the shared password database
68-
- Create an entry titled "Deployment SSH key" in the collection folder
92+
{{< showIfParam "ota" >}}
93+
3. Back up the keys in the shared password database by creating an entry titled "Deployment SSH Key" in the collection folder and storing both public and private keys in this entry
94+
{{< /showIfParam >}}
6995

7096
## 4. Set up GitHub permissions
7197

72-
1. Create a fine-grained GitHub token:
73-
- Log in as OTA-Bot
98+
1. Log in as the user account dedicated to bot-related actions in GitHub
99+
100+
2. Create a fine-grained GitHub token:
74101
- Create a new token at github.com/settings/personal-access-tokens/new
75-
- Set repository access for both declarations and versions repos
102+
- Set repository access for both declarations and versions repositories
76103
- Grant "Contents" and "Issues" write permissions
77104

78-
2. Back up the token:
79-
- Store it in the shared password database under "GitHub Token"
105+
3. If relevant, have an organization admin approve the token request
80106

81-
3. Get the token approved:
82-
- Have an organization admin approve the token request
107+
4. Keep this token for the next steps
83108

84-
## 5. Configure secrets
109+
{{< showIfParam "ota" >}}
110+
5. Back up the token in the shared password database by creating an entry titled "GitHub Token" in the collection folder and storing the token in this entry
111+
{{< /showIfParam >}}
112+
113+
## 5. Configure and encrypt secrets
114+
115+
This section uses [Ansible Vault](https://docs.ansible.com/ansible/latest/vault_guide/index.html), a feature of Ansible that allows you to encrypt sensitive data like passwords and keys. The encrypted files can be safely committed to version control while keeping the actual secrets secure. The vault key you'll create will be used to encrypt and decrypt these secrets.
85116

86117
1. Generate and store a vault key:
87118
- Generate a secure password without quotes/backticks
88-
- Store it in the password database
89-
- Create `deployment/vault.key` with the password
90-
- Add it as `ANSIBLE_VAULT_KEY` in GitHub secrets
119+
- Inside the collection folder, create a file named `deployment/vault.key` and paste the generated password into it.
120+
- Go to `https://github.com/<organization>/<collection_id>-declarations/settings/secrets/actions`
121+
- Create a new secret named `ANSIBLE_VAULT_KEY` and paste the same password into it.
91122

92-
2. Store GitHub token:
93-
```
94-
# In deployment/.env:
123+
> **Note**: The same vault key is used in two places:
124+
> - Locally as `vault.key` to encrypt/decrypt files during development
125+
> - In GitHub Actions as `ANSIBLE_VAULT_KEY` to decrypt files during automated deployment
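
One possible way to generate the vault key is sketched below. It assumes the command is run from the root of the cloned declarations repository and that `head`, `base64` and `tr` are available; the `DEPLOYMENT_DIR` variable is introduced here for illustration only. Base64 output uses only letters, digits, `+`, `/` and `=`, so it contains no quotes or backticks.

```shell
# Assumption: run from the root of the cloned declarations repository.
DEPLOYMENT_DIR="${DEPLOYMENT_DIR:-deployment}"
mkdir -p "$DEPLOYMENT_DIR"

# Make files created below readable by the current user only.
umask 077

# 32 random bytes, base64-encoded, written without a trailing newline.
head -c 32 /dev/urandom | base64 | tr -d '\n' > "$DEPLOYMENT_DIR/vault.key"
```

The content of `deployment/vault.key` is then the value to paste into the `ANSIBLE_VAULT_KEY` secret.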
126+
127+
2. Store the GitHub token, generated in the previous section, in `deployment/.env`:
128+
129+
```shell
95130
OTA_ENGINE_GITHUB_TOKEN=your_token
96131
```
97132

98-
3. Encrypt the `.env` file:
133+
3. Encrypt the `.env` file by running the following command inside the `deployment` folder of the collection:
134+
99135
```shell
100136
ansible-vault encrypt .env
101137
```
102138

139+
> **Note**: Running the command from the `deployment` folder will ensure that the `vault.key` file is used as vault key, since this folder contains an `ansible.cfg` file that explicitly configures this behavior.
140+
>
141+
> To decrypt an encrypted file, use:
142+
>
143+
> ```shell
144+
> ansible-vault decrypt .env
145+
> ```
146+
>
147+
> After making changes, re-encrypt it:
148+
>
149+
> ```shell
150+
> ansible-vault encrypt .env
151+
> ```
152+
153+
4. Commit the changes to the repository
154+
155+
{{< showIfParam "ota" >}}
156+
5. Back up the vault key in the shared password database by creating an entry titled "Vault Key" in the collection folder and storing the vault key in this entry
157+
{{< /showIfParam >}}
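
Before committing, it can be worth confirming that the file really is encrypted. Files encrypted by Ansible Vault start with a `$ANSIBLE_VAULT;` header line, so a minimal check, assuming it is run from the `deployment` folder, could look like this. The helper name `is_vault_encrypted` is hypothetical.

```shell
# Hypothetical helper: succeed only if the file starts with the Ansible Vault header.
is_vault_encrypted() {
  head -n 1 "$1" 2>/dev/null | grep -q '^\$ANSIBLE_VAULT;'
}

# Warn about secret files that exist but are still in plain text.
# `github-bot-private-key` is only created in the next section.
for file in .env github-bot-private-key; do
  if [ -f "$file" ] && ! is_vault_encrypted "$file"; then
    echo "WARNING: $file is not encrypted" >&2
  fi
done
```

If a warning is printed, run `ansible-vault encrypt` on the file before committing.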
158+
103159
## 6. Set up collection-specific SSH key
104160

105-
1. Generate a new key:
161+
1. Generate a new key, which will be used by the Open Terms Archive engine to perform actions on GitHub as the bot user:
162+
106163
```shell
107-
ssh-keygen --type=ed25519 --comment=[email protected] --passphrase="" --file=./<collection_name>-key
164+
ssh-keygen -t ed25519 -C [email protected] -N "" -f ./<collection_name>-key
108165
```
109166

110-
2. Encrypt and store the private key:
167+
2. Store the private key in `deployment/github-bot-private-key`
168+
169+
3. Encrypt the private key file by running the following command inside the `deployment` folder of the collection:
170+
111171
```shell
112-
# Copy private key to deployment/github-bot-private-key
113172
ansible-vault encrypt github-bot-private-key
114173
```
115174

116-
3. Add the public key to OTA-Bot's GitHub account:
175+
4. Commit the changes to the repository
176+
177+
5. Add the public key to the bot user's GitHub account:
117178
- Go to github.com/settings/ssh/new
118179
- Add the public key with title "<collection_name> collection"
119180

181+
{{< showIfParam "ota" >}}
182+
6. Back up the key in the shared password database by creating an entry titled "OTA-Bot GitHub SSH key" in the collection folder and storing both public and private keys in this entry
183+
{{< /showIfParam >}}
184+
120185
## 7. Configure email notifications
121186

122-
1. Generate SMTP credentials:
123-
- Create a new SMTP key in Brevo
124-
- Name it "<collection_name> collection"
187+
This section describes how to configure the engine to use a specific SMTP server to send email notifications when it encounters errors during the tracking process. This helps you stay informed about issues that need attention and allows you to restart the tracking process if needed.
188+
189+
1. Get the SMTP credentials (host, username, password) from your email provider
190+
191+
2. Update the collection's SMTP configuration under the `logger` key of `@opentermsarchive/engine` in the `config/production.json` file:
192+
193+
```json
194+
"logger": {
195+
"smtp": {
196+
"host": "<smtp_host>",
197+
"username": "<smtp_username>"
198+
}
199+
},
200+
```
201+
202+
3. Store the password in `deployment/.env`:
125203

126-
2. Store the credentials:
127204
```shell
128-
# In deployment/.env:
129205
OTA_ENGINE_SMTP_PASSWORD=your_smtp_key
130206
```
131207

132-
3. Encrypt the `.env` file:
208+
> **Note**: If the `.env` file was already encrypted in a previous step, decrypt it first with `ansible-vault decrypt .env` so that you can add the password.
209+
210+
4. Encrypt the `.env` file:
211+
133212
```shell
134213
ansible-vault encrypt .env
135214
```
136215

216+
{{< showIfParam "ota" >}}
217+
5. Create a new SMTP key in Brevo and name it "<collection_name> collection"
218+
6. Back up the key in the shared password database by creating an entry titled "SMTP Key" in the collection folder and storing the credentials in this entry
219+
{{< /showIfParam >}}
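
At this point the `.env` file is expected to hold both `OTA_ENGINE_GITHUB_TOKEN` and `OTA_ENGINE_SMTP_PASSWORD`. A small sketch for checking this without leaving the file decrypted on disk is given below. The helper name `missing_env_keys` is hypothetical; it reads the file content on standard input and prints each expected variable that is absent or empty.

```shell
# Hypothetical helper: read .env content on stdin and print the names of
# expected variables that are missing or have an empty value.
missing_env_keys() {
  content="$(cat)"
  for key in OTA_ENGINE_GITHUB_TOKEN OTA_ENGINE_SMTP_PASSWORD; do
    printf '%s\n' "$content" | grep -q "^${key}=." || echo "$key"
  done
}

# Intended use, from the deployment folder:
#   ansible-vault view .env | missing_env_keys
```

No output means both variables are set; `ansible-vault view` prints the decrypted content without modifying the encrypted file.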
220+
137221
## 8. Test the deployment
138222

139223
1. Via GitHub Actions:
140224
- Check that the `deploy` action completes successfully
141225

142226
2. Via local deployment:
227+
143228
```shell
144229
cd <collection_id>-declarations/deployment
145230
ansible-galaxy collection install --requirements-file requirements.yml
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
1+
---
2+
title: Deployment architecture
3+
linkTitle: Architecture
4+
weight: 1
5+
---
6+
7+
# Deployment architecture
8+
9+
This document provides an overview of the key components and elements involved in the deployment process of a collection.
10+
11+
## Repository structure
12+
13+
A collection is defined by three repositories that work together to manage and track terms.
14+
15+
The declarations repository, `<collection_name>-declarations`, serves as the primary workspace for collection maintainers, containing declarations of the terms to track along with engine and deployment configurations.
16+
17+
This repository is complemented by two automatically managed repositories:
18+
19+
- The versions repository, `<collection_name>-versions`, which maintains a chronological history of terms changes in their readable format
20+
- The snapshots repository, `<collection_name>-snapshots`, which maintains a chronological history of the original source documents (HTML, PDF…) from which the terms are extracted
21+
22+
These repositories must be considered as databases and are automatically updated by the engine whenever changes are detected in the tracked terms.
23+
24+
## Infrastructure
25+
26+
The server is where the Open Terms Archive engine runs.
27+
28+
The server requires administrative access to allow setting up the system in the appropriate state.
29+
30+
It has an Ed25519 SSH host key pair, `ssh_host_ed25519_key`, which provides a unique server fingerprint, `<server_ssh_fingerprint>`, for identity verification.
31+
32+
There is also a dedicated deployment user account, `<deployment_user>`, with passwordless sudo access to facilitate automated deployment tasks while maintaining security.
33+
34+
Process management is handled through [PM2](https://pm2.keymetrics.io/) and ensures the Open Terms Archive engine runs continuously and reliably.
35+
36+
The engine itself is the core application that performs the actual term tracking and repository management tasks.
37+
38+
## Security elements
39+
40+
### Authentication
41+
42+
Security is maintained through multiple layers of authentication.
43+
44+
The server's SSH host key pair, `ssh_host_ed25519_key`, generates a unique server fingerprint, `<server_ssh_fingerprint>`. This fingerprint verifies server identity and prevents man-in-the-middle attacks during deployment.
45+
46+
The deployment process uses a dedicated SSH key pair, `ota-deploy`, for secure server connections during the continuous deployment workflow.
47+
48+
A separate collection-specific SSH key pair, `<collection_name>-key`, enables the engine to perform GitHub actions as a bot user.
49+
50+
Access to GitHub repositories is controlled through a fine-grained access token, `OTA_ENGINE_GITHUB_TOKEN`, that provides specific permissions for repository management.
51+
52+
### Secret management
53+
54+
Sensitive information is protected by the [Ansible Vault](https://docs.ansible.com/ansible/latest/vault_guide/index.html) encryption system.
55+
56+
The vault system uses a master password, `vault.key`, to encrypt and decrypt sensitive data. This includes the environment configuration file, `.env`, and the GitHub bot's private key, `github-bot-private-key`, ensuring that sensitive credentials remain secure while still being accessible to the deployment process.
57+
58+
## Automation tools
59+
60+
[GitHub Actions](https://docs.github.com/en/actions) and [Ansible](https://www.ansible.com/) automate the deployment process. GitHub Actions runs the workflow while Ansible configures the server and deploys the engine.
61+
62+
A dedicated GitHub user account is used for bot-related actions such as committing entries in versions and snapshots repositories, reporting issues when tracking fails, and publishing releases. This account is configured with specific permissions to perform these automated tasks.
63+
64+
The engine sends email notifications to collection administrators when errors or issues occur during the tracking process, enabling prompt intervention when needed.
65+
66+
The engine automatically creates issues in the declarations repository to notify collection maintainers when terms can no longer be tracked. These issues provide details about the tracking failure to allow maintainers to investigate and resolve the problem.
67+
68+
## Configuration files
69+
70+
The system's behavior is controlled through several key configuration files:
71+
72+
- `inventory.yml`: Defines server address and deployment parameters
73+
- `production.json`: Stores application-specific settings
74+
- `vault.key`: Protects sensitive data through encryption
75+
76+
## Maintenance
77+
78+
The Open Terms Archive system is designed for continuous operation with minimal intervention.
79+
80+
The engine automatically tracks changes in terms, commits updates to the appropriate repositories, reports issues and sends notifications when issues occur.
81+
82+
System health is maintained through PM2's process management capabilities.
83+
84+
Regular administrative maintenance involves updating the collection's dependencies, such as the engine and the deployment recipes. It also includes monitoring email notifications and reviewing application logs in case of issues or tracking interruptions.

content/deployment/reference/server-specifications.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
---
22
title: Server specifications
3-
weight: 1
3+
weight: 2
44
---
55

66
# Server specifications
