Chatops deployment ansible config #214

khalford · 2025-05-16T10:29:48Z

I suggest reviewing commit by commit rather than the entire file diff.

This major deployment of the ChatOps services adds the following:

- Systemd exporter play to configure systemd-exporter for all hosts
- Add cAdvisor for monitoring ChatOps containers rather than the internal Docker metrics endpoint
- Add Elastic stack VM in Terraform
- Add Elasticsearch play
- Add Kibana play
- Add Logstash play
- Add Filebeat to all hosts for all services to send their logs into Logstash
- Rename "loadbalancer" to "haproxy"
- Add SSL to most services so we are using HTTPS internally
- Switch to a persistent volume for Prometheus
- Add IRIS IAM authentication config for Kibana and Elasticsearch (left commented out, as the feature is not available for the basic license)
- Various smaller changes
- Starts work on the README.md

Adding a new VM to the Terraform configuration to host the Elastic "ELK" stack. Also, creates security groups for the correct ports needed by this service and attaches a volume to the VM to be used as persistent database storage.

Downloads the Elasticsearch binaries and copies the systemd unit file to run Elasticsearch as a service.

Adding a play to install and configure Kibana on the Elastic Stack host. This sets up a systemd service. Also, adds kibana.domain.. to the certbot certificate generation and adds a kibana backend to the HAProxy config.

Install and set up logstash on the elastic stack host. Creates a systemd service for logstash and adds the filter that adds extra fields into the chatops logs.

Set up the Elastic, kibana and logstash user passwords to what we have saved as variables. This is required for the first time set up but also allows us to easily rotate the passwords if needed.
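A minimal sketch of how such a password task could look, assuming it calls Elasticsearch's change-password REST endpoint through `ansible.builtin.uri`. The variable names (`elastic_url`, `elastic_password`, `kibana_system_password`) are hypothetical and not taken from this PR, and the actual play may set the passwords differently.

```yaml
# Sketch only: elastic_url, elastic_password and kibana_system_password
# are hypothetical variable names, not the ones used in this repository.
- name: Set the kibana_system user password
  ansible.builtin.uri:
    url: "{{ elastic_url }}/_security/user/kibana_system/_password"
    method: POST
    url_username: elastic
    url_password: "{{ elastic_password }}"
    force_basic_auth: true
    body_format: json
    body:
      password: "{{ kibana_system_password }}"
    status_code: 200
  no_log: true  # keep the passwords out of the task output
```

Because the task only depends on the stored variables, re-running it after changing a variable would rotate that password.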

Adding the elastic stack system passwords to the monitoring vault

Install filebeat to all hosts and copy service specific filebeat config files to them. Filebeat will read the specified log files and push them to logstash
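As an illustration, a service-specific Filebeat config of this kind might look roughly like the following. The log path, input id and Logstash address are placeholders rather than values from this PR.

```yaml
# filebeat.yml sketch; the path, id and host below are placeholders.
filebeat.inputs:
  - type: filestream
    id: chatops-logs            # each filestream input needs a unique id
    paths:
      - /var/log/chatops/*.log

output.logstash:
  hosts: ["logstash.example.org:5044"]
```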

Small lint changes across all files such as triple dash file starting lines, substituting single to double quotes and fixing white space in braces

Lots of linting changes, such as single to double quote conversions, and renaming some tasks to make it clearer what they do. Also using the known_hosts module instead of commands to achieve the same result but with more support
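For reference, a task using `ansible.builtin.known_hosts` in place of a raw command could be sketched as below. The `jump_host` variable is hypothetical, and the role in this PR may obtain the host key in another way.

```yaml
# Sketch only: jump_host is a hypothetical variable holding the host name.
- name: Add the jump host key to known_hosts
  ansible.builtin.known_hosts:
    name: "{{ jump_host }}"
    # ssh-keyscan prints "<host> <key-type> <key>", the format the module expects
    key: "{{ lookup('ansible.builtin.pipe', 'ssh-keyscan -t ed25519 ' ~ jump_host) }}"
    state: present
```

The module is idempotent, so it reports "changed" only when the entry is actually added or updated.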

EGI services are not currently working and attempting to update the apt cache from the EGI repository is failing and blocking the playbook.

Adds the config for enabling elasticsearch and kibana IRIS IAM authentication. The code is left commented out as this feature is not available for the basic license.

Creating a dedicated systemd exporter security group to add to machines. If the deployer doesn't want to utilise the systemd exporter they can remove the security group from the VMs rather than manipulating the other security groups

We must use shell for these commands as they use writing operations such as ">" that are not supported by the command module
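The difference can be sketched with a made-up command. `some-tool` and its paths are placeholders, not commands from this repository.

```yaml
# ansible.builtin.command does not go through a shell, so ">" would be
# passed to the program as a literal argument instead of redirecting output.
- name: Write generated output to a file
  ansible.builtin.shell: "some-tool --print-config > /etc/some-tool/config.yml"
  args:
    creates: /etc/some-tool/config.yml  # skip the task once the file exists
```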

Using ansible lint with --fix to autocorrect minor changes. This includes: triple dash file starting line, double quotes instead of single quotes, single whitespaces in arrays and braces, and more

Force the restart of ChatOps containers when the play is run to make sure they reload their secrets and config files
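A forced restart could be expressed along these lines, assuming the containers are managed with `community.docker.docker_container`. The container name and the `chatops_image` variable are hypothetical.

```yaml
# Sketch only: assumes the community.docker collection; names are hypothetical.
- name: Run the ChatOps container
  community.docker.docker_container:
    name: chatops
    image: "{{ chatops_image }}"
    state: started
    restart: true  # restart even when the container definition is unchanged
```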

Updating ChatOps version to latest and updating secrets

This dev folder will contain variables for the development environment. This is how we will separate prod and dev variables

Other hosts than localhost need the terraform floating ip so we are moving it

Using the openstack plugin to create a dynamic inventory at runtime based on metadata on the VM. To achieve this we add a service and env tag to the VM metadata, allowing them to be put into groups. We can also set up multiple deployments now, meaning we can run production and development at the same time.
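A dynamic inventory of this kind could be sketched as follows. This assumes the `openstack.cloud.openstack` inventory plugin and assumes the VM metadata is exposed to `keyed_groups` as `openstack.metadata.<key>`. The exact host variable layout depends on the collection version, so treat the keys as illustrative.

```yaml
# openstack.yml sketch (the plugin expects the file name to end in
# openstack.yml or openstack.yaml).
plugin: openstack.cloud.openstack
keyed_groups:
  # Assumed layout: a VM with metadata service=grafana joins the "grafana" group
  - key: openstack.metadata.service
    prefix: ""
    separator: ""
  # Assumed layout: a VM with metadata env=dev joins the "dev" group
  - key: openstack.metadata.env
    prefix: ""
    separator: ""
```

Running `ansible-inventory -i openstack.yml --graph` shows the groups the plugin produces.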

Disabling gather facts on roles which don't need to gather facts greatly speeds up running the entire playbook. The Prometheus and Elastic roles still need to gather facts as they have to read facts from local files.
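At play level this looks roughly like the sketch below. The group and role names are assumed from the services named in this PR.

```yaml
# Sketch only: group and role names are assumptions.
- name: Configure HAProxy
  hosts: haproxy
  gather_facts: false  # the role uses no gathered facts, so skip the setup step
  roles:
    - haproxy

- name: Configure Prometheus
  hosts: prometheus
  gather_facts: true   # needed so facts stored in local files are read
  roles:
    - prometheus
```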

Since we have a dynamic inventory we can no longer create the inventory variables as we used to with jinja templating a hosts.ini file. So we need to create a separate facts file and copy it to the prometheus and elastic hosts
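One way to implement such a facts file is Ansible's local facts mechanism (`/etc/ansible/facts.d`). Whether this PR uses exactly that mechanism is an assumption, and the file name and fact key below are hypothetical.

```yaml
# Sketch only: assumes Ansible local facts; chatops.fact and
# grafana_hosts are hypothetical names.
- name: Ensure the local facts directory exists
  ansible.builtin.file:
    path: /etc/ansible/facts.d
    state: directory
    mode: "0755"
  become: true

- name: Write inventory-derived values as a local fact
  ansible.builtin.copy:
    content: "{{ {'grafana_hosts': groups['grafana'] | default([])} | to_nice_json }}"
    dest: /etc/ansible/facts.d/chatops.fact
    mode: "0644"
  become: true

# After the next fact gathering the value is available as
# ansible_local.chatops.grafana_hosts
```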

Now that we have a dynamic inventory, we access the IPs of hosts in a different way, adding these rather complicated maps everywhere to produce the same result. In the dynamic inventory the "inventory_hostname" variable is the hostname of the VM, whereas when we were templating our own hosts.ini file we put in the IP addresses of the VMs. The "inventory_hostname" cannot be changed to the IP address, as the plugin allows it to be only the actual VM hostname or UUID
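The kind of map involved can be sketched as below. It assumes the inventory plugin stores each VM's address in the `ansible_host` host variable. The group name and the resulting variable name are hypothetical.

```yaml
# Sketch only: turns a list of inventory hostnames into a list of IPs by
# looking up ansible_host for each host (assumed to hold the address).
chatops_ips: "{{ groups['chatops'] | map('extract', hostvars, 'ansible_host') | list }}"
```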

Adding variables and secrets for the production deployment.

We need to make this variable accessible to all groups so when we generate files such as the ssh keys and SSL certificates we can name them after the environment. Also, renaming the variable to "env" as it is much shorter and gets referenced a lot

Renaming the dev deployment to "chatops-development" as we might be running two deployments in a single project and need a way to discern the VMs from each other

As we now have dev and prod environments we need to change the Slack app credentials here with the dev Slack app credentials

When using delegate_to, the task uses the local user (fed id) to log into the jump host. Setting remote_user here seems to fix it
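A delegated task with an explicit user might look like this sketch. `jump_host` and `jump_host_user` are hypothetical variable names.

```yaml
# Sketch only: jump_host and jump_host_user are hypothetical variables.
- name: Run a task on the jump host
  ansible.builtin.command: hostname
  delegate_to: "{{ jump_host }}"
  remote_user: "{{ jump_host_user }}"  # avoid falling back to the local user name
  changed_when: false
```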

As we need to generate certificates for distinct environments we must change the name of the SSL directory to an environment-specific name so they don't use the same certificates. If we used the same certificates, then when either deployment is compromised we would have to change certificates on both environments rather than one.

Changing the full Docker image path to be a variable. If the Docker registry or project or image tag change, it can all be changed from this variable rather than the playbook
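For illustration, the variable could be composed like this in group vars. All names and values are placeholders, not the ones used in this PR.

```yaml
# group_vars sketch; registry, project and tag values are placeholders.
chatops_registry: registry.example.org
chatops_project: chatops
chatops_tag: "1.0.0"
chatops_image: "{{ chatops_registry }}/{{ chatops_project }}/chatops:{{ chatops_tag }}"
```

The playbook then only references `chatops_image`, so a registry move or version bump touches the variables file alone.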

Adding blank IRIS IAM credentials for kibana so the template still will parse jinja even though it's commented out

Root access is required to make directories under /var/log and change their permissions. This was missed when it was introduced
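A task of this shape needs `become`, as in the sketch below. The directory, owner and mode are hypothetical.

```yaml
# Sketch only: path, owner, group and mode are placeholders.
- name: Create the service log directory
  ansible.builtin.file:
    path: /var/log/chatops
    state: directory
    owner: chatops
    group: chatops
    mode: "0750"
  become: true  # /var/log is only writable by root
```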

Adds an Ansible requirements file so that anyone setting up their environment to run these playbooks can install the correct collections.
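Such a file typically has the shape sketched below. The collection list is a guess based on the plugins and services mentioned in this PR, not the actual file contents.

```yaml
# requirements.yml sketch; the collection list is assumed, not copied from the PR.
collections:
  - name: openstack.cloud    # dynamic inventory plugin
  - name: community.docker   # container management (assumption)
```

The collections are then installed with `ansible-galaxy collection install -r requirements.yml`.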

When handlers were introduced to Alertmanager and Prometheus they were not indented correctly. Fixing this by de-indenting all notify statements by 2 spaces. Also fixes the Copy module not being able to create the parent to its destination directory. We solve this by creating the service (/opt/service) directory before moving the binaries. We then change the unarchive module to check for the executable binary instead of the folder itself. So it checks for /opt/service/service rather than for /opt/service.
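The indentation issue can be illustrated as follows. `notify` is a task-level keyword, so it sits at the same level as the module name rather than among the module's arguments. File paths and the handler name are hypothetical.

```yaml
# tasks/main.yml sketch; paths and names are placeholders.
- name: Copy the Prometheus configuration
  ansible.builtin.template:
    src: prometheus.yml.j2
    dest: /opt/prometheus/prometheus.yml
  notify: Restart prometheus  # aligned with the module name, not its arguments

# handlers/main.yml sketch
- name: Restart prometheus
  ansible.builtin.systemd:
    name: prometheus
    state: restarted
```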

Since we have multiple SSH keys we need the environments prod and dev to point to the prod and dev ssh keys for the jump host

The Grafana variables are currently in the monitoring group vars. They are not needed by any other hosts in the monitoring group and so should be moved into their own group to limit the scope of where secrets are exposed

khalford self-assigned this May 16, 2025

khalford added the deployment label May 16, 2025

khalford force-pushed the chatops_deployment_ansible_config branch 14 times, most recently from 463eb59 to 789c159 on May 23, 2025 08:35

khalford force-pushed the chatops_deployment_ansible_config branch 4 times, most recently from 21d59eb to b6096ba on June 3, 2025 09:30

khalford added 10 commits June 4, 2025 10:22

ENH: Add an Elastic stack VM to Terraform config

00dff9a

Adding a new VM to the Terraform configuration to host the Elastic "ELK" stack. Also, creates security groups for the correct ports needed by this service and attaches a volume to the VM to be used as persistent database storage.

ENH: Add Elasticsearch config

0ecbf5b

Downloads the Elasticsearch binaries and copies the systemd unit file to run Elasticsearch as a service.

ENH: Add Kibana configuration

8371d25

Adding a play to install and configure Kibana on the Elastic Stack host. This sets up a systemd service. Also, adds kibana.domain.. to the certbot certificate generation and adds a kibana backend to the HAProxy config.

ENH: Add logstash configuration

374c4e1

Install and set up logstash on the elastic stack host. Creates a systemd service for logstash and adds the filter that adds extra fields into the chatops logs.

ENH: Add a play to set up the system user password

40126da

Set up the Elastic, kibana and logstash user passwords to what we have saved as variables. This is required for the first time set up but also allows us to easily rotate the passwords if needed.

MAINT: Update elastic stack passwords in vault

6743cc6

Adding the elastic stack system passwords to the monitoring vault

ENH: Add filebeat to all hosts

6b3a7a3

Install filebeat to all hosts and copy service specific filebeat config files to them. Filebeat will read the specified log files and push them to logstash

LINT: Using ansible-lint --fix to make small lint changes

789c40f

Small lint changes across all files such as triple dash file starting lines, substituting single to double quotes and fixing white space in braces

MAINT: Linting and task changes for ssh_known_hosts role

2b1ced8

Lots of linting changes, such as single to double quote conversions, and renaming some tasks to make it clearer what they do. Also using the known_hosts module instead of commands to achieve the same result but with more support

TMP: Remove EGI repository from all machines

8447331

EGI services are not currently working and attempting to update the apt cache from the EGI repository is failing and blocking the playbook.

khalford added 24 commits June 4, 2025 10:22

ENH: Add IRIS IAM authentication to kibana and elasticsearch

8c01514

Adds the config for enabling elasticsearch and kibana IRIS IAM authentication. The code is left commented out as this feature is not available for the basic license.

MAINT: Create a dedicated systemd-exporter sec group

7625ac4

Creating a dedicated systemd exporter security group to add to machines. If the deployer doesn't want to utilise the systemd exporter they can remove the security group from the VMs rather than manipulating the other security groups

BUG: Switching back to shell

e77923f

We must use shell for these commands as they use writing operations such as ">" that are not supported by the command module

LINT: Run ansible lint --fix for the first time

c499b46

Using ansible lint with --fix to autocorrect minor changes. This includes: triple dash file starting line, double quotes instead of single quotes, single whitespaces in arrays and braces, and more

TMP: DOC README

6d2e86f

MAINT: Force restart of ChatOps containers

baa61a4

Force the restart of ChatOps containers when the play is run to make sure they reload their secrets and config files

MAINT: Update ChatOps version and variables

a894f54

Updating ChatOps version to latest and updating secrets

MAINT: Move all variables into dev folder

7230958

This dev folder will contain variables for the development environment. This is how we will separate prod and dev variables

MAINT: Move terraform floating IP into all vars as other hosts need it

747a7c8

Other hosts than localhost need the terraform floating ip so we are moving it

MAINT: Disable gather facts on plays

3114d1b

Disabling gather facts on roles which don't need to gather facts greatly speeds up running the entire playbook. The Prometheus and Elastic roles still need to gather facts as they have to read facts from local files.

ENH: Create a facts file for the prometheus and elastic VM

4aee652

Since we have a dynamic inventory we can no longer create the inventory variables as we used to with jinja templating a hosts.ini file. So we need to create a separate facts file and copy it to the prometheus and elastic hosts

ENH: Add production environment of group_vars

9beebda

Adding variables and secrets for the production deployment.

MAINT: Rename the dev deployment

04503c5

Renaming the dev deployment to "chatops-development" as we might be running two deployments in a single project and need a way to discern the VMs from each other

MAINT: Update ChatOps secrets

12f4749

As we now have dev and prod environments we need to change the Slack app credentials here with the dev Slack app credentials

BUG: Specify remote user for delegated task

7f405f8

When using delegate_to, the task uses the local user (fed id) to log into the jump host. Setting remote_user here seems to fix it

ENH: Set ChatOps image from variable

c35337f

Changing the full Docker image path to be a variable. If the Docker registry or project or image tag change, it can all be changed from this variable rather than the playbook

BUG: Add blank IRIS IAM credentials

6418f49

Adding blank IRIS IAM credentials for kibana so the template still will parse jinja even though it's commented out

BUG: Become root for making the log directories

6ba8b57

Root access is required to make directories under /var/log and change their permissions. This was missed when it was introduced

DOC: Add an Ansible requirements file

4abc8f8

To ensure any person setting up their environment to run these playbooks will be able to install the correct collections.

TMP: Update docs

a08245e

khalford force-pushed the chatops_deployment_ansible_config branch from b6096ba to a08245e on June 4, 2025 09:22

khalford added 4 commits June 4, 2025 10:30

Update docs

1504bda

BUG: Fix environments pointing to the wrong ssh key

d7a31f8

Since we have multiple SSH keys we need the environments prod and dev to point to the prod and dev ssh keys for the jump host

MAINT: Move Grafana variables into grafana group vars

d19184a

The Grafana variables are currently in the monitoring group vars. They are not needed by any other hosts in the monitoring group and so should be moved into their own group to limit the scope of where secrets are exposed

khalford closed this Jun 4, 2025
