
How to Run Decision Engine

Vito Di Benedetto edited this page Jul 16, 2025 · 49 revisions

To run decisionengine from the development tree or to install it via pip, see https://github.com/HEPCloud/decisionengine/blob/master/DEVELOPMENT.md

The official release installation documents are on https://hepcloud.github.io/decisionengine/.

The instructions below cover the new Decision Engine 2.0.x installation procedure.

Dependency RPMs are available in the Fermilab YUM repo or can be built with the provided script. These will install Postgres and other requirements.

Decision Engine installation

Decision Engine uses a PostgreSQL database back end, which is installed as a requirement.

Install Decision Engine and the standard modules

These instructions are to be executed by root for a system installation.

  1. Prerequisites setup. Make sure that the required yum repositories and some required packages (python3, gcc, ...) are installed and up to date.
# Possible OSG versions: 24, 23, 24-upcoming
OSG_VERSION=24
# YUM repo for Decision Engine
GWMS_REPO=osg-development
dnf install -y epel-release yum-utils sed
dnf config-manager --set-enabled crb
/bin/sed -i '/^enabled=1/a priority=99' /etc/yum.repos.d/epel.repo
dnf -y install "https://repo.osg-htc.org/osg/$OSG_VERSION-main/osg-$OSG_VERSION-main-el9-release-latest.rpm"
  1. Set up the Decision Engine yum repositories
wget -O /etc/yum.repos.d/ssi-hepcloud.repo http://ssi-rpm.fnal.gov/hep/ssi-hepcloud.repo
wget -O /etc/yum.repos.d/ssi-hepcloud-dev.repo http://ssi-rpm.fnal.gov/hep/ssi-hepcloud-dev.repo

Note that the above repos are accessible only from within Fermilab. If you are off-site, the RPMs are also available from an alternative location on GitHub.

  1. Install the decision engine (add --enablerepo=ssi-hepcloud-dev for the latest development version)
DE_REPO=ssi-hepcloud-dev
dnf install -y --enablerepo="$DE_REPO" decisionengine-onenode
# Individual packages are: decisionengine-deps (framework req) decisionengine-modules-deps (modules req) decisionengine-standalone (2 deps+httpd)
  1. Install the required Python packages (these are taken from setup.py)
decisionengine-install-python
# This shell script (included in decisionengine-deps) installs the Decision Engine Python code.
# You can run it as root or as the decisionengine user
# To see all the options:
# decisionengine-install-python --help 

# Double check that pip added $HOME/.local/bin to the PATH of user decisionengine
  1. Start and enable HTCondor and httpd
systemctl start condor
systemctl enable condor

systemctl start httpd
systemctl enable httpd
  1. Optionally install these extra packages
# htgettoken - if you need it to generate SciTokens
dnf -y install htgettoken
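
The PATH check mentioned in the Python packages step can be sketched as a small shell helper. This is only an illustration and not part of Decision Engine: the function name path_contains_dir is hypothetical, and it only tests whether a directory is listed in a colon-separated PATH string.

```shell
#!/bin/bash
# Return 0 if directory $1 is an entry of the colon-separated list $2.
# Hypothetical helper for illustration; not shipped with Decision Engine.
path_contains_dir() {
    local dir="$1" path_list="$2"
    # Wrap the list in colons so the first and last entries match like any other.
    case ":${path_list}:" in
        *":${dir}:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (run as the decisionengine user):
#   path_contains_dir "$HOME/.local/bin" "$PATH" || echo "\$HOME/.local/bin is not in PATH"
```

If the check fails, adding the directory to PATH in the decisionengine user's shell profile is one way to fix it.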

Fix the GlideinWMS Frontend installation

We will make HEPCloud's Decision Engine independent from the Frontend and only use some GlideinWMS libraries. At the moment though, the codebases are still intertwined, so there are some adjustments needed to the GlideinWMS installation.

Create the condor password file and change the ownership of the Frontend directories to decisionengine:

# Create or copy the FRONTEND condor password file
# The password file depends on the GlideinWMS version you use. Before 3.10.11 and in 3.11.0, the password file must be in the
# Frontend directory: /var/lib/gwms-frontend/passwords.d/FRONTEND, replace name and path in the instructions below
# Instructions for Decision Engine version >2.0.6 and GlideinWMS versions >=3.10.11 or >=3.11.1
# You need to know the Decision Engine user name and home directory
DE_USER=decisionengine
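DE_HOME=/var/lib/decisionengine
# fqdn_hostname is used below and must hold this host's fully qualified name
# (setting it from "hostname -f" is an assumption, adjust if your site differs)
fqdn_hostname=$(hostname -f)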
pass_fname=${DE_USER^^}
# You can change the subject to recognize where the Glideins are coming from, e.g. "decisionengine_itb@${fqdn_hostname}"
openssl rand -base64 64 | /usr/sbin/condor_store_cred -u "decisionengine@${fqdn_hostname}" -f "/etc/condor/passwords.d/${pass_fname}" add > /dev/null 2>&1
/bin/cp /etc/condor/passwords.d/${pass_fname} $DE_HOME/passwords.d/${pass_fname}
chown $DE_USER: $DE_HOME/passwords.d/${pass_fname}
# The permission of $DE_HOME/passwords.d/${pass_fname} should be 0600
# Make sure the decisionengine user ($DE_USER) is in the glidein group
chown -R $DE_USER: /etc/gwms-frontend

Instructions for GlideinWMS versions <=3.10.10 and 3.11.0
# Create or copy the FRONTEND condor password file
# (fqdn_hostname must hold this host's fully qualified name, e.g. the output of "hostname -f")
openssl rand -base64 64 | /usr/sbin/condor_store_cred -u "frontend@${fqdn_hostname}" -f "/etc/condor/passwords.d/FRONTEND" add > /dev/null 2>&1
/bin/cp /etc/condor/passwords.d/FRONTEND /var/lib/gwms-frontend/passwords.d/FRONTEND
chown -R decisionengine: /var/lib/gwms-frontend   # This includes: chown decisionengine: /var/lib/gwms-frontend/passwords.d/FRONTEND
chown -R decisionengine: /etc/gwms-frontend
# The permission of /var/lib/gwms-frontend/passwords.d/FRONTEND should be 0600
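
Both variants above note that the password file should have permission 0600. A minimal sketch of how that could be verified follows. The function names file_mode and is_mode_0600 are hypothetical, and the sketch assumes GNU stat (as found on EL9).

```shell
#!/bin/bash
# Print the octal permission bits of a file (GNU coreutils stat assumed).
file_mode() {
    stat -c '%a' "$1"
}

# Return 0 only if $1 is a regular file whose mode is exactly 600.
# Hypothetical helper for illustration; not shipped with Decision Engine.
is_mode_0600() {
    [ -f "$1" ] && [ "$(file_mode "$1")" = "600" ]
}

# Usage with the paths from the instructions above:
#   is_mode_0600 "$DE_HOME/passwords.d/${pass_fname}" || chmod 0600 "$DE_HOME/passwords.d/${pass_fname}"
#   is_mode_0600 /var/lib/gwms-frontend/passwords.d/FRONTEND || chmod 0600 /var/lib/gwms-frontend/passwords.d/FRONTEND
```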

Set up PostgreSQL and Redis

PostgreSQL (version 13) is installed by the requirements RPM:

  1. Enable postgresql
systemctl enable postgresql
  1. Initialize the PostgreSQL database
postgresql-setup --initdb
  1. Edit /var/lib/pgsql/data/pg_hba.conf to set the authentication method to trust, e.g.:
[root@dehost ~]# diff  /var/lib/pgsql/data/pg_hba.conf~ /var/lib/pgsql/data/pg_hba.conf 
80c80
< local   all             all                                     peer
---
> local   all             all                                     trust
82c82
< host    all             all             127.0.0.1/32            ident
---
> host    all             all             127.0.0.1/32            trust
84c84
< host    all             all             ::1/128                 ident
---
> host    all             all             ::1/128                 trust

(The diff compares the default file, saved as pg_hba.conf~, with the corrected one.)

  1. Fix the PostgreSQL installation. For reasons not yet understood, the run directory was missing, which caused the startup to fail.
# Without this the systemctl start was failing and the error was in /var/lib/pgsql/data/log/postgresql-*.log
mkdir -p /var/run/postgresql
chown postgres: /var/run/postgresql
  1. Start postgresql
systemctl start postgresql
  1. Create the decisionengine database
createdb -U postgres decisionengine

The schema and the connection will be created and configured during the Decision Engine framework initialization.
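
The pg_hba.conf edit in the steps above can be double-checked with a short helper that lists any active local or host rule still using a method other than trust. This is a sketch under assumptions: the function name non_trust_rules is hypothetical, and it treats the last field of each rule as the authentication method, which holds for the rules shown in the diff.

```shell
#!/bin/bash
# Print every active "local" or "host" rule of a pg_hba.conf-style file whose
# last field is not "trust". Commented-out and empty lines are skipped because
# their first field is neither "local" nor "host".
# Hypothetical helper for illustration; not shipped with Decision Engine.
non_trust_rules() {
    awk '($1 == "local" || $1 == "host") && $NF != "trust"' "$1"
}

# Usage: no output means every local/host rule uses trust.
#   non_trust_rules /var/lib/pgsql/data/pg_hba.conf
```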

RHEL also provides other PostgreSQL versions via streams. These may require changes to environment variables like PG_VERSION and PATH to use the database.

  1. Start the message broker (Redis). See https://github.com/HEPCloud/decisionengine/blob/master/doc/source/redis.rst. In short:
dnf rm iptables-legacy
dnf install iptables-nft
podman run --name decisionengine-redis -p 127.0.0.1:6379:6379 -d redis:6 --loglevel warning
# pick the docker.io/library registry

Test

To run decisionengine you must be the decisionengine user (su -s /bin/bash - decisionengine). Running decisionengine --help should print the help message.

Configure decisionengine

The default configuration file lives in /etc/decisionengine/decision_engine.jsonnet.

A number of defaults are set for you.

Each datasource has its own unique schema and cannot be used with a different datasource.

The SQLAlchemy Data Source

The SQLAlchemy Data Source is set up with a config like:

	"datasource": {
	  "module": "decisionengine.framework.dataspace.datasources.sqlalchemy_ds",
	  "name": "SQLAlchemyDS",
	  "config": {
	    "url": "postgresql://{db_user}:{db_password}@{db_host}:{db_port}/{db_dbname}",
	  }
	}

Any extra keywords you can pass to the sqlalchemy.engine.Engine constructor may be set under config.

SQLAlchemy will create any tablespace objects it requires automatically.
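
To make the placeholders in the url value concrete, the following sketch fills the template from five arguments. The function name build_db_url and the example values are illustrative assumptions, not values taken from a real installation.

```shell
#!/bin/bash
# Fill the placeholders of the datasource URL template.
# Arguments: db_user db_password db_host db_port db_dbname
# Hypothetical helper for illustration; not shipped with Decision Engine.
build_db_url() {
    printf 'postgresql://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example with illustrative values (5432 is the PostgreSQL default port):
#   build_db_url postgres secret localhost 5432 decisionengine
```

The sketch does no escaping, so a password containing characters such as @ or / would need to be URL-encoded before it is placed in the URL.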

Start decision engine

Start everything else needed (you may have started some already):

systemctl start httpd
systemctl enable httpd
systemctl start condor
systemctl enable condor
systemctl start postgresql
systemctl enable postgresql

Start the service

systemctl start decisionengine
# or 
su -s /bin/bash - decisionengine
# Use $HOME rather than ~ here: a tilde inside double quotes is not expanded
export PATH="$HOME/.local/bin:$PATH"
decisionengine --no-webserver

Add channels to decision engine

Decision Engine runs its decision cycles in channels. You can add channels by placing configuration files in /etc/decisionengine/config.d/ and restarting the Decision Engine.

Here is a simple test channel configuration. This test channel uses some NOP classes that are currently defined in the unit tests and are not distributed. First, copy these classes from the Git repository, either from your own clone or from an archive downloaded from GitHub:

cd YOUR_decisionengine_REPO
# OR download the files from GitHub
mkdir /tmp/derepo
cd /tmp/derepo
wget https://github.com/HEPCloud/decisionengine/archive/refs/heads/master.zip
unzip master.zip
cd decisionengine-master
# Now copy the files
cp -r src/decisionengine/framework/tests /lib/python3.6/site-packages/decisionengine/framework/
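
The destination path in the cp command depends on the Python version and on where the packages were installed. One way to find the actual directory is to ask python3 where it imports the package from. The helper name python_pkg_dir is hypothetical, and the sketch assumes it is run as the user whose Python environment holds the Decision Engine installation and that the package has an __init__.py.

```shell
#!/bin/bash
# Print the directory of an importable Python package, as seen by python3.
# Hypothetical helper for illustration; not shipped with Decision Engine.
python_pkg_dir() {
    python3 -c 'import importlib, os, sys
module = importlib.import_module(sys.argv[1])
print(os.path.dirname(module.__file__))' "$1"
}

# Usage: copy the test classes next to the installed framework package.
#   cp -r src/decisionengine/framework/tests "$(python_pkg_dir decisionengine.framework)/"
```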

Then, add the channel by placing this in /etc/decisionengine/config.d/test_channel.jsonnet:

{
  sources: {
    source1: {
      module: "decisionengine.framework.tests.SourceNOP",
      parameters: {},
      schedule: 1,
    }
  },
  transforms: {
    transform1: {
      module: "decisionengine.framework.tests.TransformNOP",
      parameters: {},
      schedule: 1
    }
  },
  logicengines: {
    le1: {
      module: "decisionengine.framework.logicengine.LogicEngine",
      parameters: {
        facts: {
          pass_all: "True"
        },
        rules: {
          r1: {
            expression: 'pass_all',
            actions: ['publisher1']
          }
        }
      }
    }
  },
  publishers: {
    publisher1: {
      module: "decisionengine.framework.tests.PublisherNOP",
      parameters: {}
    }
  }
}

Restart the Decision Engine to start the new channel:

systemctl restart decisionengine

de-client --status should show the active test channel.
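
Right after a restart the service may need a moment before it answers. A simple retry loop, sketched below, can poll the status command instead of running it once. The function name retry is hypothetical, and the sketch assumes that de-client --status exits with a non-zero status while the service is not yet reachable, which should be verified on your installation.

```shell
#!/bin/bash
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
# Hypothetical helper for illustration; not shipped with Decision Engine.
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        # Sleep only when another attempt will follow.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Usage: try the status command up to 10 times, 3 seconds apart.
#   retry 10 3 de-client --status
```

The loop only checks the exit status, so the output still has to be inspected to confirm that the test channel is listed.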
