---
title: "Open Sourcing Workshops-on-Demand part4: Managing the Backend"
date: 2023-03-03T13:27:23.312Z
author: Frederic Passeron
authorimage: /img/frederic-passeron-hpedev-192.jpg
disable: false
---

#### Backend server management:

I will detail how to manage and work with this server on a daily basis, what we usually call day-2 operations. If you take a look at the file structure of the wod-backend directory, you will discover that we did our best to sort things properly depending on whether they relate to the system or to workshops.
##### Content of the backend server:
Simple tree view of the wod-backend directory:
![](/img/wod-blogserie2-tree4.png "Tree view of wod-backend directory")

The `ansible` folder contains all the necessary playbooks and variable files to support the main functions of the backend server. It provides playbooks for the installation of the server or appliances, the setup of the servers (backend, frontend, api-db), of appliances, and of workshops, as well as maintenance tasks.

At the root of this directory, you can find:

`check_*.yml` playbooks: These playbooks are used to perform checks on the different systems. They ensure that this is a compliant WoD system by verifying firewall rules and many other things. We will look at this in more detail a bit later.

`Copy_folder.yml`: This is historically one of the very first playbooks we used, and therefore a very important one. It performs the necessary actions to deploy and personalize (by substituting Ansible variables) the selected notebook into the appropriate student home folder.
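
To make the substitution idea concrete, here is a minimal sketch of the copy-and-personalize step. The placeholder names (`{{ STDID }}`, `{{ PASSSTU }}`) and the use of `sed` are illustrative assumptions; the real playbook relies on Ansible's templating mechanism instead:

```shell
# Sketch of the copy + personalize step (hypothetical placeholder names;
# Copy_folder.yml automates this through Ansible templating).
set -e

STUDENT_ID=121
STUDENT_PASS=werty123
SRC=$(mktemp -d)    # stands in for the workshop's notebook folder
DST=$(mktemp -d)    # stands in for the student's home folder

# A fake notebook containing Ansible-style placeholders
cat > "$SRC/0-ReadMeFirst.ipynb" <<'EOF'
Welcome student {{ STDID }}, your password is {{ PASSSTU }}
EOF

# Copy, then substitute the variables for this student
cp "$SRC"/*.ipynb "$DST/"
sed -i -e "s/{{ STDID }}/$STUDENT_ID/g" \
       -e "s/{{ PASSSTU }}/$STUDENT_PASS/g" "$DST/0-ReadMeFirst.ipynb"

cat "$DST/0-ReadMeFirst.ipynb"
```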

`compile_scripts.yml`: Should you need to hide from the student a simple API call made to some private endpoint with non-shareable data (credentials, for instance), this playbook will compile the relevant script and create an executable file that allows the call to happen.
`distrib.yml`: This playbook retrieves the distribution name and version from the machine it is run on.

`install_*.yml`: These playbooks take care of installing the necessary packages needed by the defined type (frontend, backend, api-db, base-system, or even appliance).

`setup_*.yml`: There are several types of setup playbooks in this directory.

* `setup_WKSHP-*.yml`: These playbooks prepare a base appliance for a given workshop by adding and configuring the necessary packages or services related to the workshop.
* `setup_appliance.yml`: This playbook performs the base setup for a JupyterHub environment server or appliance. It includes the `setup_base_appliance.yml` playbook.
* `setup_base_appliance.yml`: Takes care of setting up the minimal requirements for an appliance. It therefore includes the `install_base_system.yml` playbook. On top of that, it creates and configures the necessary users.
* `setup_docker_based_appliance.yml`: Quite self-explanatory: it performs the setup tasks needed to enable Docker on a given appliance.

It also hosts the `inventory` file, which describes the role of the JupyterHub servers. Place your JupyterHub machine (FQDN) in a group used as the PBKDIR name:

```shellsession
#
# Place your jupyterhub machine (FQDN) in a group used as PBKDIR name
#
[production]
127.0.0.1 ansible_connection=localhost
```

The `conf` folder hosts configuration files in Jinja format. Once expanded, the resulting files will be used by the relevant workshops. I will explain all the steps and requirements to create a workshop in a future article.

As part of the refactoring work to open source the project, we rearranged the locations of the different scripts. We created an install folder to handle the different installation scripts, both from a JupyterHub perspective and from an appliance standpoint.

We separated the workshop-related scripts from the pure system ones. When one creates a workshop, one needs to provide a series of notebooks and, in some cases, scripts to manage the creation and setup of a related appliance, along with additional scripts to manage its lifecycle in the overall Workshops-on-Demand architecture (Create, Cleanup, and Reset scripts run at deployment or cleanup time). These scripts need to be located in the scripts folder. The system scripts, on the other hand, are located in the sys folder.
![](/img/tree-wkshop2.png "Tree view of the sys directory")

This directory hosts important configuration files for the system and for JupyterHub. You can see, for instance, the `fail2ban` configuration files. Some Jinja templates are present here too. These templates will be expanded through the deliver mechanism, allowing the creation of files customized with Ansible variables. All the WoD-related tasks are prefixed with wod for better understanding and ease of use.

Some templates refer to JupyterHub kernel needs, like wod-build-evcxr.sh.j2, which creates a script that installs the Rust kernel. Other templates are related to the system and JupyterHub. `wod-kill-processes.pl.j2` was created after discovering the harsh reality of online mining... In an ideal world, I would not have to explain further, as the script would not be needed. Unfortunately, this is not the case: when one offers free online access to some hardware, one can expect, sooner or later, to see one's original and pure idea hijacked... Let's say that you want to provide some AI/ML 101 type of workshops; you may consider providing servers with some GPUs. Any twisted-minded cryptominer discovering your resources will definitely think he has hit the jackpot! This little anecdote actually happened to us, and not only on GPU-based servers: some regular servers got hit as well. We found that performance on some servers became very poor, and when looking into it, we found some scripts that were not supposed to be there, let alone run there... As a result, we implemented monitors to check the load on our servers and kill any suspicious process before kicking out the misbehaving student.
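
The monitoring idea can be sketched in a few lines of shell. This is not the actual `wod-kill-processes.pl.j2` logic (that template is a Perl script), just an illustration of the principle, run here as a dry run against a canned process listing; the denylist of miner names is an assumption:

```shell
# Dry-run sketch of a "kill suspicious processes" monitor.
# The denylist below is an assumed set of common miner names.
DENYLIST="xmrig|minerd|cpuminer"

# Canned listing; the real monitor would read: ps -eo user,pid,comm
listing="student121 4242 xmrig
student122 4243 bash
student123 4244 python3"

# Report what the real script would kill (it would also flag the student)
flagged=$(printf '%s\n' "$listing" | awk -v deny="$DENYLIST" \
    '$3 ~ deny { printf "would kill %s (pid %s, user %s)\n", $3, $2, $1 }')
printf '%s\n' "$flagged"
```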

`wod-test-action.sh.j2` is another interesting template. It generates a script that we use for testing: the script mimics the procmail API and allows you to test the deployment of a workshop from the shell.

```shellsession
wodadmin@server:/usr/local/bin$ ./wod-test-action.sh
Syntax: wod-test-action.sh <CREATE|CLEANUP|RESET|PURGE|PDF|WORD> WKSHOP [MIN[,MAX]]
ACTION is mandatory
wodadmin@server:/usr/local/bin$
```

It requires the action verb, the workshop's name, and the student id. When using the script, one does not need to provide a participant id. The script is run locally on the JupyterHub server.

```shellsession
wodadmin@server:/usr/local/bin$ ./wod-test-action.sh
Syntax: wod-test-action.sh <CREATE|CLEANUP|RESET|PURGE|PDF|WORD> WKSHOP [MIN[,MAX]]
ACTION is mandatory
wodadmin@server:/usr/local/bin$ ./wod-test-action.sh CREATE WKSHP-API101 121
Action: CREATE
We are working on WKSHP-API101
Student range: 121
Sending a mail to CREATE student 121 for workshop WKSHP-API101
220 server.xyz.com ESMTP Postfix (Ubuntu)
250 2.1.0 Ok
250 2.1.5 Ok
354 End data with <CR><LF>.<CR><LF>
250 2.0.0 Ok: queued as 9749E15403AB
221 2.0.0 Bye
```

In order to retrieve the result of the script, you simply need to run a `tail` command:

```shellsession
wodadmin@server:~$ tail -100f .mail/from
++ date
....
From [email protected] Fri Mar 3 09:08:35 2023
Subject: CREATE 121 0
Folder: /home/wodadmin/wod-backend/scripts/procmail-action.sh CREATE 11
+ source /home/wodadmin/wod-backend/scripts/wod.sh
....
+ echo 'end of procmail-action for student 121 (passwd werty123) with workshop WKSHP-API101 with action CREATE at Fri Mar 3 09:11:39 UTC 2023'
```
The very last line of the trace will provide you with the credentials necessary to test your workshop.
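
Since the credentials live on that closing line, you can pull them out of the trace with a quick `grep`/`sed` pipeline. The sketch below runs against a canned sample of the log rather than the real `.mail/from` file:

```shell
# Extract the student password from the trace's closing line.
# A canned two-line sample stands in for ~/.mail/from here.
trace=$(mktemp)
cat <<'EOF' > "$trace"
+ source /home/wodadmin/wod-backend/scripts/wod.sh
+ echo 'end of procmail-action for student 121 (passwd werty123) with workshop WKSHP-API101 with action CREATE at Fri Mar 3 09:11:39 UTC 2023'
EOF

# Grab the closing line, then keep only what sits inside "(passwd ...)"
passwd=$(grep 'end of procmail-action' "$trace" | sed 's/.*(passwd \([^)]*\)).*/\1/')
echo "student password: $passwd"
```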

There are two types of activities that can occur on the backend server: punctual and regular. A punctual activity is one that is performed every now and then. A regular one is usually set up on the backend server as a cron job; sometimes, however, one of these cron tasks can be forced manually if necessary. One of the main scheduled tasks is the `deliver` task, which I will explain later in this chapter. I will start by explaining an important punctual task: the update of the backend server.
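
A regular task is nothing more than a crontab entry on the backend server. As a purely illustrative example (the schedules and script paths below are assumptions, not the actual WoD configuration):

```shell
# Illustrative crontab entries for regular backend tasks.
# Schedules and paths are hypothetical; a real backend may differ.
crontab_file=$(mktemp)
cat <<'EOF' > "$crontab_file"
# m h dom mon dow  command
0 2 * * *  /home/wodadmin/wod-backend/scripts/deliver.sh    # nightly deliver run (hypothetical path)
0 3 * * 0  /usr/local/bin/cleanup-processes.sh              # weekly cleanup (hypothetical path)
EOF
cat "$crontab_file"
```

On a real server these lines would be installed with `crontab -e` (or dropped under /etc/cron.d) rather than written to a temporary file.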
##### Update of the backend server:

The backend server hosts all the necessary content for delivering workshops: notebooks, plus the scripts and playbooks to deploy and personalize them. It also hosts some services that are needed by the overall architecture (JupyterHub, procmail, and fail2ban, among others).

Services are installed once and for all at installation time, but they may evolve over time. One may need to update the JupyterHub application to fix a bug or get new features. In the same fashion, you may consider bumping from one Python version to a new major one. If you want to update these services or add new ones, you will need to update the relevant installation playbooks in the wod-backend/ansible directory.
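
For instance, bumping the kubectl client pinned in `install_backend.yml` comes down to editing its version variable and re-running the playbook. The sketch below edits a canned copy of the vars block (the new version number is just an example); on a real backend, you would make this change in your fork and go through the merge-request flow:

```shell
# Sketch: bump the kubectl version pinned in install_backend.yml.
# We edit a throwaway copy of the vars block, not the real playbook.
playbook=$(mktemp)
cat <<'EOF' > "$playbook"
  vars:
    IJAVAVER: "1.3.0"
    KUBECTLVER: "1.21.6"
EOF

# Bump the pinned version (1.25.4 is just an example release)
sed -i 's/KUBECTLVER: "1.21.6"/KUBECTLVER: "1.25.4"/' "$playbook"
grep KUBECTLVER "$playbook"
```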

Here is a small extract of the `install_backend.yml` playbook. The full version is available [here](https://github.com/Workshops-on-Demand/wod-backend/blob/main/ansible/install_backend.yml).

```shellsession
vi install_backend
- hosts: all
  gather_facts: true
  vars:
    IJAVAVER: "1.3.0"
    KUBECTLVER: "1.21.6"

  tasks:
    - name: Include variables for the underlying distribution
      include_vars: "{{ ANSIBLEDIR }}/group_vars/{{ ansible_distribution }}-{{ ansible_distribution_major_version }}.yml"

    - name: Base setup for a JupyterHub environment server or appliance
      include_tasks: "{{ ANSIBLEDIR }}/setup_base_appliance.yml"

    - name: Add CentOS SC repository into repo list
      become: yes
      become_user: root
      yum:
        name: centos-release-scl-rh
        state: present
      when:
        - ansible_distribution == "CentOS"
        - ansible_distribution_major_version >= "7"

    - name: Add conda GPG Key to APT
      become: yes
      become_user: root
      apt_key:
        url: https://repo.anaconda.com/pkgs/misc/gpgkeys/anaconda.asc
        state: present
      when:
        - ansible_distribution == "Ubuntu"
        - ansible_distribution_major_version >= "20"

    # TODO: Do it for EPEL if really needed
    - name: Add conda APT repository
      become: yes
      become_user: root
      apt_repository:
        repo: deb [arch=amd64] https://repo.anaconda.com/pkgs/misc/debrepo/conda stable main
        state: present
      when:
        - ansible_distribution == "Ubuntu"
```

Possible use cases:

* Upgrade to a newer version of JupyterHub
* Add a new kernel to JupyterHub
* Add a new Ansible Galaxy collection
* Add a new package needed by a workshop, for example:
  * Kubectl client
  * Terraform client
  * PowerShell module
  * Python library

You will start by moving to your forked public backend repository and applying the necessary changes, before committing and pushing locally.

Then you will open a merge request against the main repository. We plan to integrate a proper CI/CD (continuous integration / continuous delivery) pipeline here to allow a Vagrant-based test deployment: whenever someone opens a merge request on the main repo, the test deployment task kicks in and deploys a virtual backend server on which the new version of the installation process is automatically tested. When successful, the merge request is accepted. Once it is merged, you will need to move to your backend server and perform a `git remote update` and `git rebase` in the wod-backend directory. Once done, you will be able to run the installation process.
##### Regular maintenance of the backend server:

On a daily basis, some tasks are launched to check the integrity of the backend server; some of them relate to the security of the system. The following playbook is at the heart of this verification: **wod-backend/ansible/check_backend.yml**. The full version of the file is available [here](https://github.com/Workshops-on-Demand/wod-backend/blob/main/ansible/check_backend.yml) for review.

It checks quite a long list of items, such as:

* WoD system compliance: is this really a WoD system? This is verified by calling the [check_system.yml](https://github.com/Workshops-on-Demand/wod-backend/blob/main/ansible/check_system.yml) playbook. This first check includes:
  * nproc hard and soft limits
  * nofile hard and soft limits
  * Setup of sysctl params:
    * net.ipv4.tcp_keepalive_time, value: "1800"
    * kernel.threads-max, value: "4096000"
    * kernel.pid_max, value: "200000"
    * vm.max_map_count, value: "600000"
  * Setup of UDP and TCP firewall rules
  * Enabling of services:
    * firewalld
    * ntp
* Student management:
  * Ensure limits are correct for student accounts
  * Copy the skeleton content under /etc/skel
  * Test the .profile file
  * Ensure vim is the default EDITOR
  * Setup logind.conf
  * Manage the /etc/hosts file
  * Install the pkg update script
  * Setup a crontab entry for the daily pkg security update
  * Deliver the create/reset/setup scripts as Ansible templates for variable expansion
  * Install utility scripts
  * Deliver the system scripts (cleanup-processes.sh.j2)
  * Install the cleanup-processes script
  * Setup the weekly cleanup-processes task
  * Enable the WoD service
  * Test the private tasks YAML file
  * Call private tasks if available. We perform the private part before user management to allow interruption of the deliver script during normal operations; waiting until the end of user management could take hours for 2000 users. Potential impact: private scripts run before user creation, so they may miss some parts of the setup.
* User management:
  * Remove existing JupyterHub users
  * Remove Linux users and their home directories
  * Ensure dedicated student groups exist
  * Ensure Linux student users exist with their home directories
  * Ensure JupyterHub student users exist
  * Setup ACLs for students with a JupyterHub account
  * Setup default ACLs for students with a JupyterHub account
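
You can verify some of these checks by hand on a backend server. The read-only sketch below compares the running sysctl values against the expected ones listed above, without changing anything (the helper function is ours, not part of WoD):

```shell
# Read-only comparison of running sysctl values against the expected
# WoD values listed above. Reports OK/MISMATCH, changes nothing.
check() {
    key=$1; want=$2
    have=$(sysctl -n "$key" 2>/dev/null) || { echo "$key: unreadable"; return 0; }
    if [ "$have" = "$want" ]; then
        echo "$key: OK ($have)"
    else
        echo "$key: MISMATCH (have $have, want $want)"
    fi
}

check net.ipv4.tcp_keepalive_time 1800
check kernel.threads-max 4096000
check kernel.pid_max 200000
check vm.max_map_count 600000
```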
