@khalford
Member

I suggest that reviewers go commit by commit rather than reading the entire file diff.

This major deployment of the ChatOps services adds the following:

  • Add a systemd exporter play to configure systemd-exporter on all hosts
  • Add cAdvisor to monitor the ChatOps containers, replacing the internal Docker metrics endpoint
  • Add Elastic stack VM in Terraform
  • Add Elasticsearch play
  • Add Kibana play
  • Add Logstash play
  • Add Filebeat to all hosts so every service sends its logs to Logstash
  • Rename "loadbalancer" to "haproxy"
  • Add SSL to most services so we use HTTPS internally
  • Switch to a persistent volume for Prometheus
  • Add IRIS IAM authentication config for Kibana and Elasticsearch (left commented out, as the feature is not available on the basic license)
  • Various smaller changes
  • Start work on the README.md

@khalford khalford self-assigned this May 16, 2025
@khalford khalford force-pushed the chatops_deployment_ansible_config branch 14 times, most recently from 463eb59 to 789c159 Compare May 23, 2025 08:35
@khalford khalford force-pushed the chatops_deployment_ansible_config branch 4 times, most recently from 21d59eb to b6096ba Compare June 3, 2025 09:30
khalford added 10 commits June 4, 2025 10:22
Adds a new VM to the Terraform configuration to host the Elastic "ELK"
stack. Also creates security groups for the ports this service needs
and attaches a volume to the VM to be used as persistent database
storage.
Downloads the Elasticsearch binaries and copies the systemd unit file to
run Elasticsearch as a service.
Adding a play to install and configure Kibana on the Elastic Stack host.
This sets up a systemd service. Also, adds kibana.domain.. to the
certbot certificate generation and adds a kibana backend to the HAProxy
config.
Install and set up logstash on the elastic stack host. Creates a systemd
service for logstash and adds the filter that adds extra fields into the
chatops logs.
Set the Elastic, Kibana and Logstash user passwords to the values we have
saved as variables. This is required for first-time setup but also
allows us to easily rotate the passwords if needed.
Adding the elastic stack system passwords to the monitoring vault
Install Filebeat on all hosts and copy service-specific Filebeat config files to them. Filebeat reads the specified log files and pushes them to Logstash.
Small lint changes across all files, such as triple-dash file start lines, replacing single quotes with double quotes, and fixing whitespace in braces.
Lots of linting changes, such as single-to-double quote conversions, and
renaming some tasks to make it clearer what they do. Also uses the
known_hosts module instead of commands to achieve the same result with
better support.
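A minimal sketch of what a known_hosts task could look like, replacing a hand-written ssh-keyscan command. The `jump_host` variable and the key type are assumptions for illustration and are not taken from this PR.

```yaml
# Hypothetical task: "jump_host" is an assumed variable, not one from this PR.
- name: Add the jump host key to known_hosts
  ansible.builtin.known_hosts:
    name: "{{ jump_host }}"
    # ssh-keyscan prints "host keytype key", which is the format the module expects
    key: "{{ lookup('ansible.builtin.pipe', 'ssh-keyscan -t ed25519 ' ~ jump_host) }}"
    state: present
```

Unlike a raw command, the module is idempotent: it reports "changed" only when the entry is actually added or updated.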
EGI services are not currently working and attempting to update the apt
cache from the EGI repository is failing and blocking the playbook.
khalford added 24 commits June 4, 2025 10:22
Adds the config for enabling elasticsearch and kibana IRIS IAM authentication. The code is left commented out as this feature is not available for the basic license.
Creating a dedicated systemd exporter security group to add to machines.
If the deployer doesn't want to use the systemd exporter they can remove
the security group from the VMs rather than manipulating the other
security groups.
We must use shell for these commands as they use redirection operators
such as ">" that are not supported by the command module.
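As an illustration of the difference, here is a sketch with a hypothetical command and path. The command module does not pass its arguments through a shell, so ">" would be handed to the program as a literal argument.

```yaml
# Hypothetical command and path, for illustration only.
- name: Write command output to a file (redirection needs a shell)
  ansible.builtin.shell:
    cmd: hostname > /tmp/hostname.txt
    creates: /tmp/hostname.txt   # skip the task if the file already exists
```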
Using ansible-lint with --fix to autocorrect minor changes. This
includes: triple dash file starting line, double quotes instead of
single quotes, single whitespaces in arrays and braces, and more
Force the restart of ChatOps containers when the play is run to make sure they reload their secrets and config files
Updating ChatOps version to latest and updating secrets
This dev folder will contain variables for the development environment. This is how we will separate prod and dev variables
Hosts other than localhost need the Terraform floating IP, so we are moving it
Using the openstack plugin to create a dynamic inventory at runtime based on metadata on the VM. To achieve this we add a service and env tag to the VM metadata, allowing the VMs to be put into groups. We can also set up multiple deployments now, meaning we can run production and development at the same time.
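A sketch of how such an inventory file might look. It assumes the plugin exposes server metadata under the `openstack.metadata` host variable. The file name and the grouping options are illustrative and not copied from this PR.

```yaml
# openstack.yml (assumed file name)
plugin: openstack.cloud.openstack
expand_hostvars: false
fail_on_errors: true
keyed_groups:
  # One group per value of the "service" metadata tag
  - key: openstack.metadata.service
    prefix: ""
    separator: ""
  # One group per value of the "env" metadata tag
  - key: openstack.metadata.env
    prefix: ""
    separator: ""
```

Running `ansible-inventory -i openstack.yml --graph` is a quick way to check which groups the tags produce.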
Disabling gather facts on roles which don't need to gather facts greatly speeds up running the entire playbook. The Prometheus and Elastic roles still need to gather facts as they have to read facts from local files.
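For reference, fact gathering is switched off at the play level. The group and role names below are hypothetical.

```yaml
# Hypothetical play: group and role names are placeholders.
- name: Deploy HAProxy
  hosts: haproxy
  gather_facts: false   # skips the setup step at the start of the play
  roles:
    - haproxy
```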
Since we have a dynamic inventory we can no longer create the inventory variables as we used to with jinja templating a hosts.ini file. So we need to create a separate facts file and copy it to the prometheus and elastic hosts
Now that we have a dynamic inventory, we access the IPs of hosts in a different way. This adds some rather complicated map filters throughout to produce the same result. In the dynamic inventory the "inventory_hostname" variable is the hostname of the VM, whereas when we templated our own hosts.ini file we put in the IP addresses of the VMs. "inventory_hostname" cannot be set to the IP address, as the plugin only allows the actual VM hostname or UUID.
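One common form of this pattern is sketched below. It assumes the inventory sets `ansible_host` to each VM's IP address; the `elastic` group name is a placeholder, and the expressions used in this PR may differ.

```yaml
# Hypothetical group name; assumes ansible_host holds each VM's IP address.
- name: Show the IP addresses of every host in a group
  ansible.builtin.debug:
    msg: "{{ groups['elastic'] | map('extract', hostvars, 'ansible_host') | list }}"
```

The `extract` filter looks up each hostname in `hostvars` and then reads `ansible_host` from the result, turning a list of hostnames into a list of IPs.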
Adding variables and secrets for the production deployment.
We need to make this variable accessible to all groups so when we generate files such as the ssh keys and SSL certificates we can name them after the environment.

Also, renaming the variable to "env" as it is much shorter and gets referenced a lot
Renaming the dev deployment to "chatops-development" as we might be running two deployments in a single project and need a way to discern the VMs from each other
As we now have dev and prod environments we need to change the Slack app credentials here with the dev Slack app credentials
When using delegate_to, the connection to the jump host uses the local user (fed ID) to log in. Setting remote_user here seems to fix it.
As we need to generate certificates for distinct environments, we must change the name of the SSL directory to an environment-specific name so the environments don't share certificates. If they shared certificates, a compromise of either deployment would force us to change certificates on both environments rather than one.
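A sketch of the shape of such a task. The `jump_host` variable and the `ubuntu` user are assumptions for illustration, not values from this PR.

```yaml
# Hypothetical host variable and user name.
- name: Run a command on the jump host
  ansible.builtin.command: hostname
  delegate_to: "{{ jump_host }}"
  remote_user: ubuntu    # log in as this user rather than the local user
  changed_when: false
```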
Changing the full Docker image path to be a variable. If the Docker registry or project or image tag change, it can all be changed from this variable rather than the playbook
Adding blank IRIS IAM credentials for Kibana so the Jinja template still renders even though the config that uses them is commented out.
Root access is required to create directories under /var/log and change their permissions. This was missed when the task was introduced.
To ensure any person setting up their environment to run these playbooks will be able to install the correct collections.
@khalford khalford force-pushed the chatops_deployment_ansible_config branch from b6096ba to a08245e Compare June 4, 2025 09:22
khalford added 4 commits June 4, 2025 10:30
When handlers were introduced to Alertmanager and Prometheus they were
not indented correctly. Fixing this by de-indenting all notify
statements by 2 spaces.

Also fixes the Copy module not being able to create the parent to its
destination directory. We solve this by creating the service
(/opt/service) directory before moving the binaries. We then change the
unarchive module to check for the executable binary instead of the
folder itself. So it checks for /opt/service/service rather than for
/opt/service.
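The fixes described above can be sketched as follows. The archive path and the handler name are hypothetical; /opt/service stands in for the real service directory, as in the commit message.

```yaml
# Hypothetical archive path and handler name.
- name: Create the service directory
  ansible.builtin.file:
    path: /opt/service
    state: directory
    mode: "0755"

- name: Unpack the service binary
  ansible.builtin.unarchive:
    src: /tmp/service.tar.gz
    dest: /opt/service
    remote_src: true
    creates: /opt/service/service   # check for the binary, not the folder
  notify: Restart service           # task-level key, aligned with the module name
```

Because the directory now exists before the unpack step, checking `creates` against the folder would always skip the task, which is why the check targets the binary.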
Since we have multiple SSH keys we need the environments prod and dev to point to the prod and dev ssh keys for the jump host
The Grafana variables are currently in the monitoring group vars. They are not needed by any other hosts in the monitoring group and so should be moved into their own group to limit the scope of where secrets are exposed
@khalford khalford closed this Jun 4, 2025