Merged
Changes from 15 commits
2 changes: 1 addition & 1 deletion config/_default/menus.en.yaml
@@ -36,7 +36,7 @@ footer:
- name: About
url: https://opentermsarchive.org/en/about/
weight: 7
- name: Contact-us
- name: Contact us
identifier: mailto
weight: 8
footer_sub:
2 changes: 1 addition & 1 deletion content/collections/how-to/take-over.md
@@ -1,6 +1,6 @@
---
title: Take over a collection
weight: 4
weight: 5
---

# How to take over a collection
2 changes: 1 addition & 1 deletion content/collections/how-to/terminate.md
@@ -1,6 +1,6 @@
---
title: Terminate a collection
weight: 3
weight: 4
---

# How to terminate a collection
191 changes: 138 additions & 53 deletions content/deployment/how-to/deploy.md
@@ -5,141 +5,226 @@ weight: 1

# How to deploy a collection

This guide will help you deploy an Open Terms Archive collection to a server.
This guide will help you deploy an Open Terms Archive collection to a server. The deployment is automated using [Ansible](https://docs.ansible.com/ansible/latest/index.html) and will set up the Open Terms Archive engine and configure it to track your collection's terms.

## Prerequisites

Before starting, ensure you have:

- A basic understanding of the [deployment architecture]({{< relref "deployment/reference/architecture" >}})
- A server with admin access
- All collection repositories created; if they do not exist yet, see the [guide to create repositories]({{< relref "collections/how-to/create-repositories" >}})
- At least one declaration added to your collection
Member

Suggested change
- At least one declaration added to your collection
- At least one declaration in your collection (if you created your declaration from the Demo template, one is provided by default)

Member Author

When we use the demo template, all declarations are removed by the first-time setup process.

- A GitHub user account to automate actions such as committing entries in versions and snapshots repositories, reporting issues when tracking fails, publishing releases…
- [Ansible](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html) installed on your local machine

## 1. Configure the server

First, ensure your server provides unsupervised access:

1. Check the SSH host key:
1. Check the SSH host key and get the SSH fingerprint by running the following command on your local machine:

```shell
ssh-keyscan --type=ed25519 <server_address>
ssh-keyscan -t ed25519 <server_address>
```
If no Ed25519 key appears, generate one on the server:

If no Ed25519 key appears, generate one by running the following commands on the server:

```shell
sudo ssh-keygen --type=ed25519 --file=/etc/ssh/ssh_host_ed25519_key
sudo ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key
sudo systemctl restart ssh
```

2. Create a non-root user if needed:
> **Note**: A server fingerprint is a unique identifier for your server's SSH key. It helps verify that you're connecting to the correct server and not a malicious one. The fingerprint is a hash of the server's public key and is used to prevent man-in-the-middle attacks. You'll need this fingerprint in the next steps for secure deployment.

2. Create a dedicated user account specifically for deployment purposes, by running the following commands on the server:

```shell
adduser <user>
usermod --append --groups=sudo <user>
adduser <deployment_user>
usermod --append --groups=sudo <deployment_user>
```

3. Grant passwordless sudo access:
> **Note**: The `adduser` command might not be installed by default on your system. It can be installed with `sudo apt-get install adduser`.

3. Configure passwordless sudo access for this user, by adding the following line to the `/etc/sudoers` file on the server:

```shell
# Add to /etc/sudoers:
<user> ALL=(ALL) NOPASSWD:ALL
<deployment_user> ALL=(ALL) NOPASSWD:ALL
```

> **Note**: While passwordless sudo access does reduce security compared to requiring a password, it is required for full automation in deployment workflows with Ansible. The deployment process requires system-level operations (like installing packages and configuring services) that must be executed without manual intervention. To mitigate security risks, this configuration is limited to a dedicated deployment user that should only be used for deployment purposes, and the server must be properly secured with SSH key authentication.
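
As a hedged sketch of step 3: rather than editing `/etc/sudoers` directly, many distributions also read drop-in files from `/etc/sudoers.d/`, and `visudo` can check a file's syntax before it is installed. The helper name `sudoers_rule`, the user name `deploy`, and the drop-in file name are illustrative assumptions, not part of the Open Terms Archive tooling.

```shell
# Print the sudoers rule granting passwordless sudo to one user.
# `sudoers_rule` is a hypothetical helper name used for illustration.
sudoers_rule() {
  printf '%s ALL=(ALL) NOPASSWD:ALL\n' "$1"
}

# Possible usage on the server (assumes the distribution reads /etc/sudoers.d):
#   sudoers_rule deploy > /tmp/ota-deploy.sudoers
#   sudo visudo -c -f /tmp/ota-deploy.sudoers    # syntax check only
#   sudo install -m 0440 -o root -g root /tmp/ota-deploy.sudoers /etc/sudoers.d/ota-deploy
```

Checking the file before installing it avoids locking yourself out of `sudo` with a malformed rule.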

## 2. Set up the deployment configuration

1. Clone the collection declarations repository:
1. Clone the collection declarations repository that you want to deploy and navigate to the collection folder:

```shell
git clone https://github.com/OpenTermsArchive/<collection_id>-declarations.git
git clone https://github.com/<organization>/<collection_id>-declarations.git
cd <collection_id>-declarations
```

2. Configure the inventory file `deployment/inventory.yml`:
2. Configure the inventory file `deployment/inventory.yml` with your server's IP address, deployment user, server fingerprint and the repository URL:

```yaml
<host>: "your.server.ip"
ansible_user: "your_username"
ed25519_fingerprint: "your_ssh_fingerprint"
<server_ip>:
  ansible_user: <deployment_user>
  ed25519_fingerprint: <server_ssh_fingerprint>
  ota_source_repository: https://github.com/<organization>/<collection_id>-declarations.git
```

3. Add the server fingerprint to GitHub:
- Go to `https://github.com/OpenTermsArchive/<collection_name>-declarations/settings/secrets/actions`
3. Add the server fingerprint to GitHub, to allow the deployment workflow to uniquely identify the server:
- Go to `https://github.com/<organization>/<collection_id>-declarations/settings/secrets/actions`
- Create a new secret named `SERVER_FINGERPRINT` with your Ed25519 fingerprint
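
The guide does not spell out the exact form of the value expected in `ed25519_fingerprint` and `SERVER_FINGERPRINT`. As a sketch, the two usual candidates can both be derived from `ssh-keyscan` output: the base64-encoded public key itself, or its SHA256 hash as printed by `ssh-keygen -lf`. The helper name `host_key_blob` is hypothetical; check the deployment recipes to confirm which form they expect.

```shell
# Extract the base64-encoded Ed25519 public key from known_hosts-style lines
# ("<host> ssh-ed25519 <base64 key>") read on standard input.
# `host_key_blob` is a hypothetical helper name.
host_key_blob() {
  awk '$2 == "ssh-ed25519" { print $3 }'
}

# Possible usage from your local machine:
#   ssh-keyscan -t ed25519 <server_address> 2>/dev/null | host_key_blob
# To print the SHA256 hash of the same key instead:
#   ssh-keyscan -t ed25519 <server_address> 2>/dev/null | ssh-keygen -lf -
```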

## 3. Configure SSH deployment keys

1. On the server, generate a deployment key:
1. On the server, generate a deployment key, which will be used by the continuous deployment workflow to connect to the server to deploy the collection:

```shell
ssh-keygen --type=ed25519 --quiet --passphrase="" --file=~/.ssh/ota-deploy
ssh-keygen -t ed25519 -N "" -f ~/.ssh/ota-deploy
cat ~/.ssh/ota-deploy.pub >> ~/.ssh/authorized_keys
```

2. Add the private key to GitHub:
- Go to the repository secrets
- Create `SERVER_SSH_KEY` with the private key content
2. Add the private key to GitHub, to allow the deployment workflow to connect to the server:
- Go to `https://github.com/<organization>/<collection_id>-declarations/settings/secrets/actions`
- Create a new secret named `SERVER_SSH_KEY` with the private key content

3. Back up the keys:
- Store both public and private keys in the shared password database
- Create an entry titled "Deployment SSH key" in the collection folder
{{< showIfParam "ota" >}}
3. Back up the keys in the shared password database by creating an entry titled "Deployment SSH Key" in the collection folder and storing both public and private keys in this entry
{{< /showIfParam >}}
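
A minimal sketch of the `authorized_keys` step above, written so that it can be run more than once without duplicating the entry, and setting the restrictive permissions that OpenSSH checks when `StrictModes` is enabled. The function name `authorize_key` is an illustrative assumption.

```shell
# Append a public key to an authorized_keys file only if it is not already there.
# `authorize_key` is a hypothetical helper; the second argument defaults to the
# current user's authorized_keys file.
authorize_key() {
  pubkey_file=$1
  auth_file=${2:-"$HOME/.ssh/authorized_keys"}
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # -x matches whole lines, -F treats the key as a fixed string
  grep -qxF -e "$(cat "$pubkey_file")" "$auth_file" || cat "$pubkey_file" >> "$auth_file"
  chmod 700 "$(dirname "$auth_file")"
  chmod 600 "$auth_file"
}

# Possible usage on the server, after generating the key pair:
#   authorize_key ~/.ssh/ota-deploy.pub
```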

## 4. Set up GitHub permissions

1. Create a fine-grained GitHub token:
- Log in as OTA-Bot
1. Log in as the user account dedicated to bot-related actions in GitHub

2. Create a fine-grained GitHub token:
- Create a new token at github.com/settings/personal-access-tokens/new
- Set repository access for both declarations and versions repos
- Set repository access for both declarations and versions repositories
- Grant "Contents" and "Issues" write permissions

2. Back up the token:
- Store it in the shared password database under "GitHub Token"
3. If relevant, have an organization admin approve the token request

3. Get the token approved:
- Have an organization admin approve the token request
4. Keep this token for the next steps

## 5. Configure secrets
{{< showIfParam "ota" >}}
5. Back up the token in the shared password database by creating an entry titled "GitHub Token" in the collection folder and storing the token in this entry
{{< /showIfParam >}}

## 5. Configure and encrypt secrets

This section uses [Ansible Vault](https://docs.ansible.com/ansible/latest/vault_guide/index.html), a feature of Ansible that allows you to encrypt sensitive data like passwords and keys. The encrypted files can be safely committed to version control while keeping the actual secrets secure. The vault key you'll create will be used to encrypt and decrypt these secrets.

1. Generate and store a vault key:
- Generate a secure password without quotes/backticks
- Store it in the password database
- Create `deployment/vault.key` with the password
- Add it as `ANSIBLE_VAULT_KEY` in GitHub secrets
- Inside the collection folder, create a file named `deployment/vault.key` and paste the generated password into it.
- Go to `https://github.com/<organization>/<collection_id>-declarations/settings/secrets/actions`
- Create a new secret named `ANSIBLE_VAULT_KEY` and paste the same password into it.

2. Store GitHub token:
```
# In deployment/.env:
> **Note**: The same vault key is used in two places:
> - Locally as `vault.key` to encrypt/decrypt files during development
> - In GitHub Actions as `ANSIBLE_VAULT_KEY` to decrypt files during automated deployment

2. Store the GitHub token, generated in the previous section, in `deployment/.env`:

```shell
OTA_ENGINE_GITHUB_TOKEN=your_token
```

3. Encrypt the `.env` file:
3. Encrypt the `.env` file by running the following command inside the `deployment` folder of the collection:

```shell
ansible-vault encrypt .env
```

> **Note**: Running the command from the `deployment` folder ensures that the `vault.key` file is used as the vault key, since this folder contains an `ansible.cfg` file that explicitly configures this behavior.
>
> To decrypt an encrypted file, run from the same folder:
>
> ```shell
> ansible-vault decrypt .env
> ```
>
> After making changes, re-encrypt it:
>
> ```shell
> ansible-vault encrypt .env
> ```

4. Commit the changes to the repository

{{< showIfParam "ota" >}}
5. Back up the vault key in the shared password database by creating an entry titled "Vault Key" in the collection folder and storing the vault key in this entry
{{< /showIfParam >}}
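
One possible way to generate the vault password from step 1, restricted to letters and digits so that it contains no quotes or backticks. The function name `generate_vault_password` and the 48-character default length are arbitrary choices for this sketch.

```shell
# Print a random password made only of ASCII letters and digits.
# `generate_vault_password` is a hypothetical helper; the length is optional.
generate_vault_password() {
  length=${1:-48}
  # Keep only letters and digits from the kernel's random source
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c "$length"
  echo
}

# Possible usage from the collection folder:
#   generate_vault_password > deployment/vault.key
```

The same value then goes into the `ANSIBLE_VAULT_KEY` secret on GitHub.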

## 6. Set up collection-specific SSH key

1. Generate a new key:
1. Generate a new key, which will be used by the Open Terms Archive engine to perform actions on GitHub as the bot user:

```shell
ssh-keygen --type=ed25519 --comment=<bot_email> --passphrase="" --file=./<collection_name>-key
ssh-keygen -t ed25519 -C "<bot_email>" -N "" -f ./<collection_name>-key
```

2. Encrypt and store the private key:
2. Store the private key in `deployment/github-bot-private-key`

3. Encrypt the private key file by running the following command inside the `deployment` folder of the collection:

```shell
# Copy private key to deployment/github-bot-private-key
ansible-vault encrypt github-bot-private-key
```

3. Add the public key to OTA-Bot's GitHub account:
4. Commit the changes to the repository

5. Add the public key to the bot user's GitHub account:
- Go to github.com/settings/ssh/new
- Add the public key with title "<collection_name> collection"

{{< showIfParam "ota" >}}
6. Back up the key in the shared password database by creating an entry titled "OTA-Bot GitHub SSH key" in the collection folder and storing both public and private keys in this entry
{{< /showIfParam >}}
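
Before committing in step 4, it can help to confirm that the files holding secrets are actually encrypted. Files encrypted by Ansible Vault begin with a `$ANSIBLE_VAULT;` header line, which the sketch below checks for. The function name `is_vault_encrypted` is hypothetical.

```shell
# Succeed only when the first line of the file is an Ansible Vault header.
# `is_vault_encrypted` is a hypothetical helper name.
is_vault_encrypted() {
  head -n 1 "$1" | grep -q '^\$ANSIBLE_VAULT;'
}

# Possible usage from the `deployment` folder before committing:
#   for f in .env github-bot-private-key; do
#     is_vault_encrypted "$f" || echo "WARNING: $f is not encrypted" >&2
#   done
```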

## 7. Configure email notifications

1. Generate SMTP credentials:
- Create a new SMTP key in Brevo
- Name it "<collection_name> collection"
This section describes how to configure the engine to use a specific SMTP server to send email notifications when it encounters errors during the tracking process. This helps you stay informed about issues that need attention and allows you to restart the tracking process if needed.

1. Get the SMTP credentials (host, username, password) from your email provider

2. Update the collection's SMTP configuration under the `logger` key of `@opentermsarchive/engine` in the `config/production.json` file:

```json
"logger": {
  "smtp": {
    "host": "<smtp_host>",
    "username": "<smtp_username>"
  }
},
```

3. Store the password in `deployment/.env`:

2. Store the credentials:
```shell
# In deployment/.env:
OTA_ENGINE_SMTP_PASSWORD=your_smtp_key
```

3. Encrypt the `.env` file:
> **Note**: To decrypt the file encrypted in a previous step in order to add the password, run `ansible-vault decrypt .env`

4. Encrypt the `.env` file:

```shell
ansible-vault encrypt .env
```

{{< showIfParam "ota" >}}
5. Create a new SMTP key in Brevo and name it "<collection_name> collection"
6. Back up the key in the shared password database by creating an entry titled "SMTP Key" in the collection folder and storing the credentials in this entry
{{< /showIfParam >}}
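
Steps 3 and 4 amount to a decrypt, edit, re-encrypt cycle on `deployment/.env`. The sketch below shows one way to do the edit so that an existing entry is replaced rather than duplicated. `set_env_var` is a hypothetical helper, not part of the engine or of Ansible.

```shell
# Set KEY=VALUE in an env file, replacing any existing line for KEY.
# `set_env_var` is a hypothetical helper name.
set_env_var() {
  file=$1 key=$2 value=$3
  touch "$file"
  # Copy every line except the one for this key (grep exits 1 if nothing is left)
  grep -v "^${key}=" "$file" > "${file}.tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "${file}.tmp"
  mv "${file}.tmp" "$file"
}

# Possible usage from the `deployment` folder:
#   ansible-vault decrypt .env
#   set_env_var .env OTA_ENGINE_SMTP_PASSWORD '<smtp_password>'
#   ansible-vault encrypt .env
```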

## 8. Test the deployment

1. Via GitHub Actions:
- Check that the `deploy` action completes successfully

2. Via local deployment:

```shell
cd <collection_id>-declarations/deployment
ansible-galaxy collection install --requirements-file requirements.yml
84 changes: 84 additions & 0 deletions content/deployment/reference/architecture.md
@@ -0,0 +1,84 @@
---
title: Deployment architecture
linkTitle: Architecture
weight: 1
---

# Deployment architecture

This document provides an overview of the key components and elements involved in the deployment process of a collection.

## Repository structure

A collection is defined by three repositories that work together to manage and track terms.

The declarations repository, `<collection_name>-declarations`, serves as the primary workspace for collection maintainers, containing declarations of the terms to track along with engine and deployment configurations.

This repository is complemented by two automatically managed repositories:

- The versions repository, `<collection_name>-versions`, which maintains a chronological history of terms changes in their readable format
- The snapshots repository, `<collection_name>-snapshots`, which maintains a chronological history of the original source documents (HTML, PDF…) from which the terms are extracted

These two repositories should be treated as databases: the engine updates them automatically whenever changes are detected in the tracked terms.

## Infrastructure

The server is where the Open Terms Archive engine runs.

The server requires administrative access so that the deployment can bring the system into the required state.

It has an Ed25519 SSH host key pair, `ssh_host_ed25519_key`, which provides a unique server fingerprint, `<server_ssh_fingerprint>`, for identity verification.

There is also a dedicated deployment user account, `<deployment_user>`, with passwordless sudo access to facilitate automated deployment tasks while maintaining security.

Process management is handled through [PM2](https://pm2.keymetrics.io/) and ensures the Open Terms Archive engine runs continuously and reliably.

The engine itself is the core application that performs the actual term tracking and repository management tasks.

## Security elements

### Authentication

Security is maintained through multiple layers of authentication.

The server's SSH host key pair, `ssh_host_ed25519_key`, generates a unique server fingerprint, `<server_ssh_fingerprint>`. This fingerprint verifies server identity and prevents man-in-the-middle attacks during deployment.

The deployment process uses a dedicated SSH key pair, `ota-deploy`, for secure server connections during the continuous deployment workflow.

A separate collection-specific SSH key pair, `<collection_name>-key`, enables the engine to perform actions on GitHub as the bot user.

Access to GitHub repositories is controlled through a fine-grained access token, `OTA_ENGINE_GITHUB_TOKEN`, that provides specific permissions for repository management.

### Secret management

Sensitive information is protected by the [Ansible Vault](https://docs.ansible.com/ansible/latest/vault_guide/index.html) encryption system.

The vault system uses a master password, `vault.key`, to encrypt and decrypt sensitive data. This includes the environment configuration file, `.env`, and the GitHub bot's private key, `github-bot-private-key`, ensuring that sensitive credentials remain secure while still being accessible to the deployment process.

## Automation tools

[GitHub Actions](https://docs.github.com/en/actions) and [Ansible](https://www.ansible.com/) automate the deployment process. GitHub Actions runs the workflow while Ansible configures the server and deploys the engine.

A dedicated GitHub user account is used for bot-related actions such as committing entries in versions and snapshots repositories, reporting issues when tracking fails, and publishing releases. This account is configured with specific permissions to perform these automated tasks.

The engine sends email notifications to collection administrators when errors or issues occur during the tracking process, enabling prompt intervention when needed.

The engine automatically creates issues in the declarations repository to notify collection maintainers when terms can no longer be tracked. These issues provide details about the tracking failure to allow maintainers to investigate and resolve the problem.

## Configuration files

The system's behavior is controlled through several key configuration files:

- `inventory.yml`: Defines server address and deployment parameters
- `production.json`: Stores application-specific settings
- `vault.key`: Protects sensitive data through encryption
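
As an illustration, a small check that these files are present in a local clone of the declarations repository. The paths are assumptions taken from the layout used in the deployment guide (`deployment/inventory.yml`, `deployment/vault.key`, `config/production.json`), and `check_deployment_files` is a hypothetical helper name.

```shell
# Report which of the expected configuration files are missing from a clone.
# `check_deployment_files` is a hypothetical helper; paths are assumed.
check_deployment_files() {
  repo=${1:-.}
  missing=0
  for f in deployment/inventory.yml config/production.json deployment/vault.key; do
    if [ ! -f "$repo/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Exit status 0 when every file exists, 1 otherwise
  return "$missing"
}

# Possible usage from the root of the clone:
#   check_deployment_files . && echo "all configuration files found"
```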

## Maintenance

The Open Terms Archive system is designed for continuous operation with minimal intervention.

The engine automatically tracks changes in terms, commits updates to the appropriate repositories, reports issues and sends notifications when issues occur.

System health is maintained through PM2's process management capabilities.

Regular administrative maintenance involves updating collection dependencies such as the engine and the deployment recipes. It also includes monitoring email notifications and reviewing application logs in case of issues or tracking interruptions.
2 changes: 1 addition & 1 deletion content/deployment/reference/server-specifications.md
@@ -1,6 +1,6 @@
---
title: Server specifications
weight: 1
weight: 2
---

# Server specifications