
Decision Engine integration test

Vito Di Benedetto edited this page Feb 17, 2022 · 25 revisions

⚠️ (Work in progress) ⚠️

Decision Engine installation

VM setup (FermiCloud)

On FermiCloud, set up your VM and make sure to include the "gwms-ports" Security Group in its setup. This allows GlideinWMS to work properly with your VM. If you already have a VM to repurpose for Decision Engine, the Security Group can also be added to an existing VM.

Decision Engine installation

The general installation guide for Decision Engine is available here.

Install and configure Decision Engine to run the integration test.

The instructions below install and configure Decision Engine to run the integration test.

Decision Engine uses a PostgreSQL database back end, and Redis as its message broker and cache.

First install PostgreSQL and Redis, then install the Decision Engine framework (decisionengine) and add the standard channels (decisionengine_modules).

The following instructions assume a system installation, performed as root. decisionengine will run as the decisionengine user.

Install PostgreSQL

The default PostgreSQL installed on RH7 is 9.2, which is outdated. We suggest removing it and installing version 12 instead:

  1. Remove the old postgresql
yum erase -y 'postgresql*'
  2. Install postgresql 12
yum install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm
yum install -y postgresql12 postgresql12-server
# optional, also: postgresql12-devel
  3. Enable postgresql
systemctl enable postgresql-12
  4. Initialize the database
/usr/pgsql-12/bin/postgresql-12-setup initdb
  5. Edit /var/lib/pgsql/12/data/pg_hba.conf as follows:
[root@fermicloud371 ~]# diff  /var/lib/pgsql/12/data/pg_hba.conf~ /var/lib/pgsql/12/data/pg_hba.conf
80c80
< local   all             all                                     peer
---
> local   all             all                                     trust
82c82
< host    all             all             127.0.0.1/32            ident
---
> host    all             all             127.0.0.1/32            trust
84c84
< host    all             all             ::1/128                 ident
---
> host    all             all             ::1/128                 trust

This sets the authentication method to trust for local socket connections and for IPv4 and IPv6 loopback connections.

  6. Start the database
systemctl start postgresql-12
  7. Create the decisionengine database
createdb -U postgres decisionengine

The schema and the connection will be created and configured during the Decision engine framework installation.
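
The pg_hba.conf edit shown in the diff above can also be scripted instead of made by hand. The following is a minimal sketch, assuming the file still contains the stock `peer` and `ident` entries from the diff; the function name `set_trust_auth` is hypothetical and not part of PostgreSQL or Decision Engine.

```shell
# Sketch: switch the three default local entries of pg_hba.conf to "trust".
# Assumption: the file still contains the stock "peer"/"ident" lines shown in the diff.
set_trust_auth() (
    conf=$1
    # Edit in place and keep a copy of the original next to it (<file>.bak).
    sed -E -i.bak \
        -e 's#^(local[[:space:]]+all[[:space:]]+all[[:space:]]+)peer[[:space:]]*$#\1trust#' \
        -e 's#^(host[[:space:]]+all[[:space:]]+all[[:space:]]+(127\.0\.0\.1/32|::1/128)[[:space:]]+)ident[[:space:]]*$#\1trust#' \
        "$conf"
)

# Example (as root, before starting the database):
#   set_trust_auth /var/lib/pgsql/12/data/pg_hba.conf
```

Only the `all all` entries are rewritten; other lines, such as replication entries, are left untouched.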

To use the database, add the PostgreSQL binaries to your environment:

export PG_VERSION=12
export PATH="/usr/pgsql-${PG_VERSION}/bin:$HOME/.local/bin:$PATH"

Install Decision Engine prerequisites

  1. Prerequisites setup. Make sure that the required yum repositories and some required packages (python3, gcc, ...) are installed and up to date.
yum install -y http://ftp.scientificlinux.org/linux/scientific/7x/repos/x86_64/yum-conf-softwarecollections-2.0-1.el7.noarch.rpm
yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# gcc, swig and make are needed for dependencies (jsonnet)
yum -y install python3 python3-pip python3-setuptools python3-wheel \
    gcc gcc-c++ make \
    python3-devel swig openssl-devel git rpm-build
python3 -m pip install --upgrade --user pip
python3 -m pip install --upgrade --user setuptools wheel setuptools-scm[toml]

# To install the modules you will also need GlideinWMS Frontend, which is in the OSG repository.
# Assuming the use of OSG 3.5 that supports both GSI and tokens, here is a brief summary of the setup:
yum install -y yum-priorities
yum install -y https://repo.opensciencegrid.org/osg/3.5/osg-3.5-el7-release-latest.rpm
# HTCondor 8.9.x or 9.x, required by GlideinWMS, is in the osg-upcoming repository. It should be enabled to find the dependency
# GlideinWMS 3.9.x is in osg-contrib. The repository should be enabled to find the dependency
# In both the following files set: enabled=1
vi /etc/yum.repos.d/osg-upcoming.repo
vi /etc/yum.repos.d/osg-contrib.repo
# Change the EPEL repository priority so that it comes after the OSG repositories, which have priority 98.
# Make sure that the epel section has: priority=99
vi /etc/yum.repos.d/epel.repo
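
The repository edits above can be done non-interactively instead of with vi. Here is a minimal sketch, assuming simple `key=value` lines in the .repo files; the helper name `set_repo_option` is hypothetical, and the section names in the example (`osg-upcoming`, `osg-contrib`, `epel`) are assumptions that should be checked against the actual files.

```shell
# Sketch: set KEY=VALUE inside one [SECTION] of a yum .repo file.
# Assumption: options are written as simple "key=value" lines.
set_repo_option() (
    file=$1 section=$2 key=$3 value=$4
    tmp=$(mktemp) || exit 1
    awk -v section="[$section]" -v key="$key" -v value="$value" '
        # Append the option if the target section ended without defining it.
        function finish() {
            if (in_section && !done) { print key "=" value }
        }
        # A new section header closes the previous section.
        /^\[/ { finish(); in_section = ($0 == section); done = 0 }
        # Replace an existing definition of the key inside the target section.
        in_section && $0 ~ ("^" key "[ \t]*=") {
            if (!done) { print key "=" value }
            done = 1
            next
        }
        { print }
        END { finish() }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Example (as root; section names are assumptions):
#   set_repo_option /etc/yum.repos.d/osg-upcoming.repo osg-upcoming enabled 1
#   set_repo_option /etc/yum.repos.d/osg-contrib.repo  osg-contrib  enabled 1
#   set_repo_option /etc/yum.repos.d/epel.repo         epel         priority 99
```

Because only the named section is changed, any source or debug sections in the same file keep their current settings.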

The complete version of the GlideinWMS installation instructions is available here

A minimal GlideinWMS installation for Decision Engine is the following:

yum --enablerepo=osg-development -y install glideinwms-vofrontend-libs glideinwms-vofrontend-glidein \
    voms-clients-cpp osg-ca-certs fetch-crl vo-client glideinwms-minimal-condor glideinwms-userschedd \
    glideinwms-usercollector glideinwms-vofrontend-core glideinwms-vofrontend-httpd httpd globus-proxy-utils

Install other utilities

yum -y install fermilab-util_kx509

Install Decision Engine and the standard modules

(The following instructions are for the Decision Engine 2.0 pre-release.)

Decision Engine RPMs are made available as release assets in GitHub.

RPMs for Decision Engine 2.0 pre-release:

  • Decision Engine RPM is available here
  • Decision Engine modules RPM is available here

Install Decision Engine RPMs

yum -y install decisionengine-* decisionengine_modules-*

decisionengine user setup

To complete the setup, some Python dependencies must be installed.
To avoid polluting the system Python, install them for the decisionengine user, which runs the service.

ksu decisionengine -e /bin/bash
python3 -m pip install --upgrade pip setuptools wheel --user
python3 -m pip install --user jsonnet==0.17.0
python3 -m pip install --user tabulate toposort structlog
python3 -m pip install --user "bill-calculator-hep>=0.1.4" "boto3>=1.17.10"
python3 -m pip install --user "gcs-oauth2-boto-plugin>=2.7" "google-api-python-client>=1.12.8"
python3 -m pip install --user "google_auth<2dev,>=1.16.0" "urllib3>=1.26.2"
python3 -m pip install --user wheel DBUtils sqlalchemy
python3 -m pip install --user pandas==1.1.5 numpy==1.19.5
python3 -m pip install --user "psycopg2-binary >= 2.8.6; platform_python_implementation == 'CPython'"
python3 -m pip install --user "psycopg2cffi >= 2.9.0; platform_python_implementation == 'PyPy'"
python3 -m pip install --user boto3 google_auth google-api-python-client
python3 -m pip install --user gcs-oauth2-boto-plugin
python3 -m pip install --user "cherrypy>=18.6.0" "kombu[redis]>=5.1.0" "prometheus-client>=0.10.0"
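
After the installs above, a quick import check can confirm that the packages are visible to the decisionengine user. This is a minimal sketch; the function name `check_python_modules` is hypothetical, and the import names in the example are assumptions derived from the package names (for instance, the jsonnet package is imported as `_jsonnet`).

```shell
# Sketch: report which Python modules cannot be imported by the current user.
check_python_modules() (
    missing=0
    for module in "$@"; do
        # Try the import in a fresh interpreter; hide the traceback on failure.
        if ! python3 -c "import $module" >/dev/null 2>&1; then
            echo "missing: $module"
            missing=1
        fi
    done
    exit "$missing"
)

# Example (run as the decisionengine user; import names are assumptions):
#   check_python_modules _jsonnet tabulate toposort structlog boto3 sqlalchemy \
#       pandas numpy psycopg2 cherrypy kombu prometheus_client
```

The function returns a non-zero status if at least one module is missing, so it can be used in a script before starting the service.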

Setup proxies

  • Create the frontend proxy:
pushd /etc/grid-security/
grid-proxy-init -cert hostcert.pem -key hostkey.pem -valid 999:0 -out /var/tmp/fe_proxy
popd
  • Create the user proxy:
export X509_USER_PROXY=/var/tmp/vo_proxy
kinit <user> # this is needed if your Kerberos ticket has not been forwarded
kx509
voms-proxy-init -rfc -dont-verify-ac -noregen -voms fermilab -valid 120:0
  • Set the proper permissions and ownership for the proxies (as the root user):
chmod 600 /var/tmp/vo_proxy /var/tmp/fe_proxy
chown decisionengine: /var/tmp/vo_proxy /var/tmp/fe_proxy
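
To confirm that the chmod and chown above took effect, the mode and owner of each proxy can be checked. This is a minimal sketch, assuming GNU `stat` (as found on RH7); the function name `check_proxy_file` is hypothetical.

```shell
# Sketch: verify that a proxy file has mode 600 and the expected owner.
# Assumption: GNU stat is available (-c is not portable to BSD stat).
check_proxy_file() (
    file=$1 owner=$2
    # %a = octal permissions, %U = owner user name
    actual=$(stat -c '%a %U' "$file") || exit 1
    if [ "$actual" != "600 $owner" ]; then
        echo "$file: expected '600 $owner', found '$actual'" >&2
        exit 1
    fi
)

# Example (as root):
#   check_proxy_file /var/tmp/vo_proxy decisionengine
#   check_proxy_file /var/tmp/fe_proxy decisionengine
```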

Decision Engine configuration

Configuration templates are available in the contrib repository.

Files from the folder decisionengine go in the directory /etc/decisionengine.
There are two files to edit, job_classification.libsonnet and glideinwms.libsonnet. In them, the placeholder @FERMICLOUDNODE@ needs to be replaced with the FermiCloud host name.
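
The placeholder replacement can be done with sed. The following is a minimal sketch; the function name `fill_placeholder` is hypothetical, and using the fully qualified name from `hostname -f` as the host name is an assumption.

```shell
# Sketch: replace the @FERMICLOUDNODE@ placeholder in a configuration file.
fill_placeholder() (
    file=$1 host=$2
    # "|" is used as the sed delimiter because host names never contain it.
    # A copy of the original file is kept as <file>.bak.
    sed -i.bak "s|@FERMICLOUDNODE@|$host|g" "$file"
)

# Example (as root): update every file that still contains the placeholder.
#   grep -rl '@FERMICLOUDNODE@' /etc/decisionengine | while read -r f; do
#       fill_placeholder "$f" "$(hostname -f)"
#   done
```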

The condor_mapfile template from the folder condor goes in /etc/condor/certs/condor_mapfile.
This template also has placeholders that need to be replaced with the FermiCloud host DN and the user DN.
To make sure you have the right DNs, retrieve them with:

  • the host DN
openssl x509 -noout -subject -in  /var/tmp/fe_proxy | cut -c 10- | sed -re 's#/CN=[0-9]{8,10}##g'
  • the user DN
openssl x509 -noout -subject -in  /var/tmp/vo_proxy | cut -c 10- | sed -re 's#/CN=[0-9]{8,10}##g'
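
Both commands above apply the same two-step cleanup to the openssl output. This sketch wraps that cleanup in a function to show what each step does. It assumes the legacy one-line subject format, which begins with `subject= /`; the function name `clean_subject` is hypothetical.

```shell
# Sketch: turn "openssl x509 -subject" output into a plain DN.
# Assumption: legacy one-line format, e.g. "subject= /DC=org/.../CN=name".
clean_subject() {
    # 1. cut drops the 9-character "subject= " prefix.
    # 2. sed removes proxy components: "/CN=" followed by 8 to 10 digits.
    cut -c 10- | sed -E 's#/CN=[0-9]{8,10}##g'
}

# Example:
#   openssl x509 -noout -subject -in /var/tmp/fe_proxy | clean_subject
```

If the printed DN does not start with `/`, the local openssl is probably using a different subject format and the `cut` offset no longer applies.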

Start required services

Decision Engine requires the following services to be running:

systemctl start httpd
systemctl start condor
systemctl start fetch-crl-cron
systemctl start fetch-crl-boot

Setup Redis

Install and start the message broker (Redis) as a Podman container.

yum install -y podman
podman run --name decisionengine-redis -p 127.0.0.1:6379:6379 -d redis:6 --loglevel warning

More details about Redis are available in this redis document.

Create GWMS frontend configuration

For this step we need to run:

chown -R decisionengine: /var/lib/gwms-frontend
systemctl start decisionengine
ksu decisionengine -e /usr/bin/python3 /usr/lib/python3.6/site-packages/decisionengine_modules/glideinwms/configure_gwms_frontend.py

This command will create the file /var/lib/gwms-frontend/vofrontend/de_frontend_config

At this point, stop the decisionengine service and Redis:

systemctl stop decisionengine
podman exec -it decisionengine-redis redis-cli FLUSHALL
podman stop decisionengine-redis

Everything should now be ready to run Decision Engine.

Run Decision Engine integration test

The procedure to run Decision Engine is as follows:

  • Reset decisionengine DB:
dropdb -U postgres decisionengine
createdb -U postgres decisionengine
  • Clean up Redis and start it:
podman start decisionengine-redis
podman exec -it decisionengine-redis redis-cli FLUSHALL
podman restart decisionengine-redis
  • Start decisionengine service and check its status:
systemctl start decisionengine
sleep 5
systemctl status decisionengine
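
The fixed `sleep 5` above can be replaced by polling until the service reports as active. The following is a minimal sketch; `wait_for` is a hypothetical helper, and the 30-second limit in the example is an arbitrary choice.

```shell
# Sketch: run a command once per second until it succeeds or a timeout expires.
wait_for() (
    timeout=$1
    shift
    elapsed=0
    until "$@"; do
        # Give up once the timeout (in seconds) has been reached.
        [ "$elapsed" -ge "$timeout" ] && exit 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
)

# Example (as root):
#   systemctl start decisionengine
#   wait_for 30 systemctl is-active --quiet decisionengine
#   systemctl status decisionengine
```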

Submit a test job

If the decisionengine service is running, we can submit test jobs.

  • Make sure the channel is STEADY
ksu decisionengine -e /bin/bash
de-client --status
# or
de-client --status | grep -i state
  • prepare a Condor submission file mytest.submit with the following content:
#  A test Condor submission file - mytest.submit
executable = /bin/hostname
universe = vanilla
+DESIRED_Sites = "ITB_FC_CE2"
log = test.log
output = test.out.$(Cluster).$(Process)
error = test.err.$(Cluster).$(Process)
queue 1
  • submit the test job
condor_submit mytest.submit
  • check jobs in the queue
condor_q
  • check for available glideins
condor_status

After the test jobs are submitted, it takes a few minutes (usually no more than 10) to get some glideins and for the job to start running.
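
To test against a different site, the submission file above can be generated from a small template. This is a minimal sketch; `write_submit_file` is a hypothetical helper and the site name is passed in by the caller.

```shell
# Sketch: print a test submission file for the given site.
write_submit_file() {
    site=$1
    # \$ keeps $(Cluster) and $(Process) literal so that HTCondor expands them,
    # not the shell.
    cat <<EOF
# A test Condor submission file for site $site
executable = /bin/hostname
universe = vanilla
+DESIRED_Sites = "$site"
log = test.log
output = test.out.\$(Cluster).\$(Process)
error = test.err.\$(Cluster).\$(Process)
queue 1
EOF
}

# Example:
#   write_submit_file ITB_FC_CE2 > mytest.submit
#   condor_submit mytest.submit
```

After submission, `condor_wait test.log` can be used to block until the job recorded in the log file finishes.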
