|
10 | 10 | ---
|
11 | 11 | In previous articles of this series dedicated to the [open sourcing of our Workshops-on-Demand project](https://developer.hpe.com/blog/willing-to-build-up-your-own-workshops-on-demand-infrastructure/), I covered the reasons why we open sourced the project and how we did it. I also explained in detail how you could install your own Workshops-on-Demand backend server, and took the time to describe the automation hosted on it. Today, I'll describe how this backend server is managed, which is often referred to as Day 2 operations.
|
12 | 12 |
|
13 |
| -Once up and running, the main purpose of the backend server is to deliver workshops-on-Demand. But to do so, it may require updates, upgrades, and/or new kernels for the Jupyterhub server. If new workshops are created, this means you'll need new jinja templates for related workshops' scripts (i.e `create<WKSHP>.sh`, `cleanup<WKSHP>.sh`, `reset<WKSHP>.sh`, among others). This also means new variable files. And obviously, these templates and variables will need to be taken into account by scripts and notebooks. Some tasks handle all of this. And that's what I'll show now. |
| 13 | +Once up and running, the main purpose of the backend server is to deliver Workshops-on-Demand. But to do so, it may require updates, upgrades, and/or new kernels for the JupyterHub server. If new workshops are created, you'll need new Jinja templates for the related workshops' scripts (e.g., `create<WKSHP>.sh`, `cleanup<WKSHP>.sh`, `reset<WKSHP>.sh`, among others). This also means new variable files. And obviously, these templates and variables will need to be taken into account by scripts and notebooks. Some tasks handle all of this, and that's what I'll show now. |
14 | 14 |
|
15 | 15 | #### Backend server management:
|
16 | 16 |
|
@@ -63,7 +63,7 @@ We separated the workshops' related scripts from the system ones. When one creat
|
63 | 63 |
|
64 | 64 | This directory hosts important configuration files for both the system and JupyterHub. You can see for instance `fail2ban` configuration files. Some Jinja templates are present here, too. These templates will be expanded through the `deliver` mechanism allowing the creation of files customized with Ansible variables. All the wod related tasks are prefixed with wod for better understanding and ease of use.
|
65 | 65 |
|
66 |
| -These Jinja templates can refer to some Jupyterhub kernel needs like `wod-build-evcxr.sh.j2` that aims at creating a script allowing the rust kernel installation. Some other templates are related to the system and JupyterHub. `wod-kill-processes.pl.j2` has been created after discovering the harsh reality of online mining. In a ideal world, I would not have to explain further as the script would not be needed. Unfortunately, this is not the case. When one offers access to some hardware freely online, sooner or later, he can expect to see his original idea to be hyjacked. |
| 66 | +These Jinja templates can refer to some JupyterHub kernel needs, like `wod-build-evcxr.sh.j2`, which creates a script that installs the Rust kernel. Some other templates are related to the system and JupyterHub. `wod-kill-processes.pl.j2` was created after discovering the harsh reality of online mining. In an ideal world, I would not have to explain further, as the script would not be needed. Unfortunately, this is not the case. When you offer free access to hardware online, sooner or later you can expect to see your original idea hijacked. |
67 | 67 |
|
68 | 68 | Let's say that you want to provide some AI/ML 101 type of workshops. As part of it,
|
69 | 69 | you may consider providing servers with some GPUs. Any ill-intentioned cryptominer discovering your resources will definitely think they've hit the jackpot! This little anecdote actually happened to us, and not only on GPU-based servers; some regular servers got hit as well. We noticed that performance on some servers had become very poor and, when looking into it, we found some scripts that were not supposed to run there. As a result, we implemented monitors to check the load on our servers and made sure to kill any suspicious processes before kicking out the misbehaving student.
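
The kind of load check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual `wod-kill-processes.pl.j2` script: the `student` account prefix, the CPU threshold, and the load limit are all assumptions made for the example, and the sketch only reports candidates rather than killing anything.

```python
import os
import subprocess
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    user: str
    cpu: float      # percent CPU as reported by ps
    command: str


def parse_ps(output):
    """Parse the output of `ps -eo pid=,user=,pcpu=,comm=` into Proc records."""
    procs = []
    for line in output.splitlines():
        parts = line.split(None, 3)
        if len(parts) == 4:
            pid, user, cpu, command = parts
            procs.append(Proc(int(pid), user, float(cpu), command))
    return procs


def find_suspicious(procs, cpu_threshold=80.0, student_prefix="student"):
    """Return processes owned by student accounts that use too much CPU.

    Both the threshold and the account prefix are hypothetical values.
    """
    return [
        p for p in procs
        if p.user.startswith(student_prefix) and p.cpu >= cpu_threshold
    ]


def load_is_high(max_load_per_cpu=1.5):
    """Compare the 1-minute load average with the number of CPUs."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1) > max_load_per_cpu


if __name__ == "__main__":
    if load_is_high():
        result = subprocess.run(
            ["ps", "-eo", "pid=,user=,pcpu=,comm="],
            capture_output=True, text=True, check=True,
        )
        for p in find_suspicious(parse_ps(result.stdout)):
            # Report only; deciding what to kill is left to the operator.
            print(f"suspicious: pid={p.pid} user={p.user} cpu={p.cpu} cmd={p.command}")
```

Keeping the detection logic in a pure function (`find_suspicious`) makes it easy to try out thresholds on captured `ps` output before wiring anything to an actual kill command.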
|
@@ -205,29 +205,29 @@ This first check includes:
|
205 | 205 | * kernel.threads-max, value: "4096000"
|
206 | 206 | * kernel.pid_max, value: "200000"
|
207 | 207 | * vm.max_map_count, value: "600000"
|
208 |
| -* Setup UDP and TCP firewall rules |
| 208 | + * Setup UDP and TCP firewall rules |
209 | 209 | * Enable services:
|
210 | 210 |
|
211 | 211 | * Firewalld
|
212 | 212 | * Ntp
|
213 | 213 | * Student Management:
|
214 | 214 |
|
215 | 215 | * Ensure limits are correct for students accounts
|
216 |
| -* Copy the skeleton content under /etc/skel |
217 |
| -* Test `.profile` file |
218 |
| -* Ensure vim is the default EDITOR |
219 |
| -* Setup `logind.conf` |
220 |
| -* Manage `/etc/hosts` file |
221 |
| -* Install the pkg update script |
222 |
| -* Setup `crontab` for daily pkg security update |
223 |
| -* Deliver create/reset/setup scripts as ansible template for variable expansion |
224 |
| -* Install utility scripts |
225 |
| -* Deliver the system scripts (`cleanup-processes.sh.j2`) |
226 |
| -* Installation of the cleanup-processes script |
227 |
| -* Setup weekly cleanup processes task |
228 |
| -* Enable WoD service |
229 |
| -* Test private tasks YAML file |
230 |
| -* Call private tasks if available. It performs the private part before users management to allow interruption of the deliver script during normal operations - waiting till end of users management can take hours for 2000 users. Potential impact: private scripts are run before users creation, so may miss some part of setup. |
| 216 | + * Copy the skeleton content under /etc/skel |
| 217 | + * Test `.profile` file |
| 218 | + * Ensure vim is the default EDITOR |
| 219 | + * Setup `logind.conf` |
| 220 | + * Manage `/etc/hosts` file |
| 221 | + * Install the pkg update script |
| 222 | + * Setup `crontab` for daily pkg security update |
| 223 | + * Deliver create/reset/setup scripts as ansible template for variable expansion |
| 224 | + * Install utility scripts |
| 225 | + * Deliver the system scripts (`cleanup-processes.sh.j2`) |
| 226 | + * Installation of the cleanup-processes script |
| 227 | + * Setup weekly cleanup processes task |
| 228 | + * Enable WoD service |
| 229 | + * Test private tasks YAML file |
| 230 | + * Call private tasks if available. It performs the private part before users management to allow interruption of the deliver script during normal operations - waiting till end of users management can take hours for 2000 users. Potential impact: private scripts are run before users creation, so may miss some part of setup. |
231 | 231 | * User Management:
|
232 | 232 |
|
233 | 233 | * Remove existing JupyterHub users
|
|