How to Run Decision Engine
To run decisionengine from the development tree or to install it via pip, see https://github.com/HEPCloud/decisionengine/blob/master/DEVELOPMENT.md
The official release installation documents are on https://hepcloud.github.io/decisionengine/.
Here are instructions for the new Decision Engine 2.0.x installation procedures.
Dependency RPMs are available in the Fermilab YUM repo or can be built with the provided script. These will install Postgres and other requirements.
Decision Engine uses a PostgreSQL database back end, which is installed as a requirement.
These instructions are to be executed by root for a system installation.
- Prerequisites setup. Make sure that the required yum repositories and some required packages (python3, gcc, ...) are installed and up to date.
# Possible OSG versions: 24, 23, 24-upcoming
OSG_VERSION=24
# YUM repo for Decision Engine
GWMS_REPO=osg-development
dnf install -y epel-release yum-utils sed
dnf config-manager --set-enabled crb
/bin/sed -i '/^enabled=1/a priority=99' /etc/yum.repos.d/epel.repo
dnf -y install "https://repo.osg-htc.org/osg/$OSG_VERSION-main/osg-$OSG_VERSION-main-el9-release-latest.rpm"
- Setup the decision engine yum repositories
wget -O /etc/yum.repos.d/ssi-hepcloud.repo http://ssi-rpm.fnal.gov/hep/ssi-hepcloud.repo
wget -O /etc/yum.repos.d/ssi-hepcloud-dev.repo http://ssi-rpm.fnal.gov/hep/ssi-hepcloud-dev.repo
Note that the above repos are only accessible within Fermilab. There is an alternative place on GitHub to get the RPMs if you are off-site.
- Install the decision engine (add --enablerepo=ssi-hepcloud-dev for the latest development version)
DE_REPO=ssi-hepcloud-dev
dnf install -y --enablerepo="$DE_REPO" decisionengine-onenode
# Individual packages are: decisionengine-deps (framework req), decisionengine-modules-deps (modules req), decisionengine-standalone (2 deps+httpd)
- Install the required Python packages (these are taken from setup.py)
decisionengine-install-python
# This shell script (included in decisionengine-deps) installs the Decision Engine Python code.
# You can run it as root or as the decisionengine user
# To see all the options:
# decisionengine-install-python --help
# Double check that pip added $HOME/.local/bin to the PATH of user decisionengine
- Start and enable HTCondor and httpd
systemctl start condor
systemctl enable condor
systemctl start httpd
systemctl enable httpd
- Optionally install these extra packages
# htgettoken - if you need it to generate SciTokens
dnf -y install htgettoken
We will make HEPCloud's Decision Engine independent from the Frontend and only use some GlideinWMS libraries. At the moment, though, the codebases are still intertwined, so some adjustments to the GlideinWMS installation are needed.
Create the condor password file and change the ownership of the frontend directories to decisionengine:
# Create or copy the FRONTEND condor password file
# The password file depends on the GlideinWMS version you use. Before 3.10.11 and in 3.11.0, the password file must be in the
# Frontend directory: /var/lib/gwms-frontend/passwords.d/FRONTEND, replace name and path in the instructions below
# Instructions for Decision Engine version >2.0.6 and GlideinWMS versions >=3.10.11 or >=3.11.1
# You need to know the Decision Engine user name and home directory
DE_USER=decisionengine
DE_HOME=/var/lib/decisionengine
pass_fname=${DE_USER^^}
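# fqdn_hostname is not set by these instructions; define it first, e.g. fqdn_hostname=$(hostname -f)
# You can change the subject to recognize where the Glideins are coming from, e.g. "decisionengine_itb@${fqdn_hostname}"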
openssl rand -base64 64 | /usr/sbin/condor_store_cred -u "decisionengine@${fqdn_hostname}" -f "/etc/condor/passwords.d/${pass_fname}" add > /dev/null 2>&1
/bin/cp /etc/condor/passwords.d/${pass_fname} $DE_HOME/passwords.d/${pass_fname}
chown $DE_USER: $DE_HOME/passwords.d/${pass_fname}
# The permission of $DE_HOME/passwords.d/${pass_fname} should be 0600
# Make sure the decisionengine user ($DE_USER) is in the glidein group
chown -R $DE_USER: /etc/gwms-frontend
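The comments above ask for mode 0600 on the copied password file. A minimal sketch of a check follows, assuming GNU stat (as shipped on EL9). The helper name has_mode is hypothetical and is not part of Decision Engine or HTCondor.

```shell
# Hypothetical helper: succeed only if FILE has exactly the octal mode EXPECTED.
# Assumes GNU coreutils stat, which understands -c '%a'.
has_mode() {
    local file="$1" expected="$2" actual
    actual=$(stat -c '%a' "$file") || return 2   # 2: stat itself failed
    [ "$actual" = "$expected" ]
}
```

It could be used as: has_mode "$DE_HOME/passwords.d/${pass_fname}" 600 || chmod 0600 "$DE_HOME/passwords.d/${pass_fname}"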
Instructions for GlideinWMS versions <=3.10.10 and 3.11.0
# Create or copy the FRONTEND condor password file
openssl rand -base64 64 | /usr/sbin/condor_store_cred -u "frontend@${fqdn_hostname}" -f "/etc/condor/passwords.d/FRONTEND" add > /dev/null 2>&1
/bin/cp /etc/condor/passwords.d/FRONTEND /var/lib/gwms-frontend/passwords.d/FRONTEND
chown -R decisionengine: /var/lib/gwms-frontend # This includes: chown decisionengine: /var/lib/gwms-frontend/passwords.d/FRONTEND
chown -R decisionengine: /etc/gwms-frontend
# The permission of /var/lib/gwms-frontend/passwords.d/FRONTEND should be 0600
PostgreSQL is installed by the requirements RPM (PostgreSQL 13):
- Enable postgresql
systemctl enable postgresql
- Init PostgreSQL db
postgresql-setup --initdb
- Edit /var/lib/pgsql/data/pg_hba.conf to set the authentication method to trust, e.g.:
[root@dehost ~]# diff /var/lib/pgsql/data/pg_hba.conf~ /var/lib/pgsql/data/pg_hba.conf
80c80
< local all all peer
---
> local all all trust
82c82
< host all all 127.0.0.1/32 ident
---
> host all all 127.0.0.1/32 trust
84c84
< host all all ::1/128 ident
---
> host all all ::1/128 trust
(difference of the corrected file from the default one, pg_hba.conf~)
- Fix the PostgreSQL installation. The run directory was missing, for reasons not yet understood, and this caused the startup to fail.
# Without this the systemctl start was failing and the error was in /var/lib/pgsql/data/log/postgresql-*.log
mkdir -p /var/run/postgresql
chown postgres: /var/run/postgresql
- Start postgresql
systemctl start postgresql
- Create the decisionengine database
createdb -U postgres decisionengine
The schema and the connection will be created and configured during the Decision Engine framework initialization.
RHEL also provides other PostgreSQL versions via streams. These may require changes to environment variables like PG_VERSION and PATH to use the database.
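The pg_hba.conf change shown in the diff above can also be scripted. The following is only a sketch: the function name set_pg_hba_trust is hypothetical, and it assumes that the default lines look like those in the diff (peer for local, ident for the two loopback host entries) and that sed supports -E and -i with a suffix, as GNU sed does. Like the diff, it keeps the original file as pg_hba.conf~.

```shell
# Hypothetical helper: switch the three default "all all" entries to trust.
# Only exact matches are rewritten; comments and other entries are left alone.
set_pg_hba_trust() {
    local hba_file="$1"
    sed -E -i~ \
        -e 's#^(local[[:space:]]+all[[:space:]]+all[[:space:]]+)peer[[:space:]]*$#\1trust#' \
        -e 's#^(host[[:space:]]+all[[:space:]]+all[[:space:]]+(127\.0\.0\.1/32|::1/128)[[:space:]]+)ident[[:space:]]*$#\1trust#' \
        "$hba_file"
}
```

For this installation it would be called as set_pg_hba_trust /var/lib/pgsql/data/pg_hba.conf before starting postgresql.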
- Start message broker (redis). See https://github.com/HEPCloud/decisionengine/blob/master/doc/source/redis.rst. In short:
dnf rm iptables-legacy
dnf install iptables-nft
podman run --name decisionengine-redis -p 127.0.0.1:6379:6379 -d redis:6 --loglevel warning
# pick the docker.io/library registry
To run decisionengine you must be the decisionengine user (su -s /bin/bash - decisionengine).
decisionengine --help should print the help message
The default configuration file lives in /etc/decisionengine/decision_engine.jsonnet.
A number of defaults are set for you.
Each datasource has its own unique schema and cannot be used with a different datasource.
The SQLAlchemy Data Source is set up with a config like:
"datasource": {
"module": "decisionengine.framework.dataspace.datasources.sqlalchemy_ds",
"name": "SQLAlchemyDS",
"config": {
"url": "postgresql://{db_user}:{db_password}@{db_host}:{db_port}/{db_dbname}",
}
}
Any extra keywords you can pass to the sqlalchemy.engine.Engine constructor may be set under config.
SQLAlchemy will create any tablespace objects it requires automatically.
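To make the placeholders in the url template concrete, here is a small sketch that assembles the string from its five parts. The function name build_db_url is hypothetical. In the usage note, the postgres user and the decisionengine database come from the createdb step above, while localhost, port 5432 (the PostgreSQL default), and the password value are assumptions. The sketch does no URL-encoding, so a password with special characters would need extra handling.

```shell
# Hypothetical helper: fill the {db_user}:{db_password}@{db_host}:{db_port}/{db_dbname}
# template used in the datasource config. No URL-encoding is applied.
build_db_url() {
    local user="$1" password="$2" host="$3" port="$4" dbname="$5"
    printf 'postgresql://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$dbname"
}
```

For example, build_db_url postgres secret localhost 5432 decisionengine prints postgresql://postgres:secret@localhost:5432/decisionengine.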
Start everything else needed (you may have started some already):
systemctl start httpd
systemctl enable httpd
systemctl start condor
systemctl enable condor
systemctl start postgresql
systemctl enable postgresql
Start the service
systemctl start decisionengine
# or
su -s /bin/bash - decisionengine
# Use $HOME here: a tilde inside double quotes is not expanded by the shell
export PATH="$HOME/.local/bin:$PATH"
decisionengine --no-webserver
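Starting decisionengine by hand relies on the pip-installed scripts in $HOME/.local/bin being on the PATH of the decisionengine user. The sketch below prepends a directory only when it is not already listed, so it is safe to run more than once (for example from a shell profile). The function name prepend_path_once is hypothetical.

```shell
# Hypothetical helper: put DIR at the front of PATH unless it is already there.
prepend_path_once() {
    local dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;               # already present, nothing to do
        *) PATH="$dir:$PATH" ;;      # not present, prepend it
    esac
    export PATH
}
```

It would be called as prepend_path_once "$HOME/.local/bin" before running decisionengine.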
Decision engine decision cycles happen in channels.
You can add channels by adding configuration files in /etc/decisionengine/config.d/ and restarting the decision engine.
Here is a simple test channel configuration. This test channel uses some NOP classes that are currently defined in the unit tests and are not distributed. First, copy these classes from the Git repository:
cd YOUR_decisionengine_REPO
# OR download the files from GitHub
mkdir /tmp/derepo
cd /tmp/derepo
wget https://github.com/HEPCloud/decisionengine/archive/refs/heads/master.zip
unzip master.zip
cd decisionengine-master
# Now copy the files
cp -r src/decisionengine/framework/tests /lib/python3.6/site-packages/decisionengine/framework/
Then, add the channel by placing this in /etc/decisionengine/config.d/test_channel.jsonnet:
{
sources: {
source1: {
module: "decisionengine.framework.tests.SourceNOP",
parameters: {},
schedule: 1,
}
},
transforms: {
transform1: {
module: "decisionengine.framework.tests.TransformNOP",
parameters: {},
schedule: 1
}
},
logicengines: {
le1: {
module: "decisionengine.framework.logicengine.LogicEngine",
parameters: {
facts: {
pass_all: "True"
},
rules: {
r1: {
expression: 'pass_all',
actions: ['publisher1']
}
}
}
}
},
publishers: {
publisher1: {
module: "decisionengine.framework.tests.PublisherNOP",
parameters: {}
}
}
}
Restart decision engine to start the new channel:
systemctl restart decisionengine
de-client --status should show the active test channel.
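If the channel does not show up, a quick first check is whether the file contains the four top-level sections used in the example above (sources, transforms, logicengines, publishers). The sketch below is a crude text-level check, not a Jsonnet parser. The function name missing_channel_sections is hypothetical, and the idea that all four sections are expected is an assumption drawn only from this example.

```shell
# Hypothetical helper: print the name of each section key that does not
# appear at the start of a line (optionally quoted) in the given file.
missing_channel_sections() {
    local cfg="$1" section
    for section in sources transforms logicengines publishers; do
        grep -Eq "^[[:space:]]*\"?${section}\"?[[:space:]]*:" "$cfg" || echo "$section"
    done
}
```

Running missing_channel_sections /etc/decisionengine/config.d/test_channel.jsonnet prints nothing when all four keys are found.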