
How to Run Decision Engine

Marco Mambelli edited this page Feb 27, 2025 · 49 revisions

To run decisionengine from the development tree or to install it via pip, see https://github.com/HEPCloud/decisionengine/blob/master/DEVELOPMENT.md

The official release installation documents are at https://hepcloud.github.io/decisionengine/.

The instructions below cover the new Decision Engine 2.0.x installation procedure.

Dependency RPMs are available in the Fermilab YUM repo or can be built with the provided script. These will install PostgreSQL and other requirements.

Decision Engine installation

Decision Engine uses a PostgreSQL database back end, which is installed as a requirement.

Install Decision Engine and the standard modules

These instructions are to be executed by root for a system installation.

  1. Prerequisites setup. Make sure that the required yum repositories and some required packages (python3, gcc, ...) are installed and up to date.
# Possible OSG versions: 24, 23, 24-upcoming
OSG_VERSION=24
# YUM repo for Decision Engine
GWMS_REPO=osg-development
dnf install -y epel-release yum-utils sed
dnf config-manager --set-enabled crb
/bin/sed -i '/^enabled=1/a priority=99' /etc/yum.repos.d/epel.repo
dnf -y install "https://repo.osg-htc.org/osg/$OSG_VERSION-main/osg-$OSG_VERSION-main-el9-release-latest.rpm"
  1. Set up the decision engine yum repositories
wget -O /etc/yum.repos.d/ssi-hepcloud.repo http://ssi-rpm.fnal.gov/hep/ssi-hepcloud.repo
wget -O /etc/yum.repos.d/ssi-hepcloud-dev.repo http://ssi-rpm.fnal.gov/hep/ssi-hepcloud-dev.repo

Note that the above repos are only accessible from within Fermilab. If you are off-site, the RPMs are also available from an alternative location on GitHub.

  1. Install the decision engine (add --enablerepo=ssi-hepcloud-dev for the latest development version)
DE_REPO=ssi-hepcloud-dev
dnf install -y --enablerepo="$DE_REPO" decisionengine-onenode
# Individual packages are: decisionengine-deps (framework req) decisionengine-modules-deps (modules req) decisionengine-standalone (2 deps+httpd)
  1. Install the required Python packages (these are taken from setup.py)
decisionengine-install-python
# This shell script (included in decisionengine-deps) installs the Decision Engine Python code.
# You can run it as root or as the decisionengine user
# To see all the options:
# decisionengine-install-python --help 

# Double check that pip added $HOME/.local/bin to the PATH of user decisionengine

# The script performs the following - NO NEED TO RUN THESE COMMANDS IF YOU USED THE SCRIPT
# Update pip
su -s /bin/bash -c 'pip install --upgrade --prefix=/usr pip' - decisionengine
su -s /bin/bash -c 'pip install --upgrade --prefix=/usr setuptools wheel setuptools-scm[toml] package' - decisionengine
# Install the Python modules via pip
su -s /bin/bash -c 'pip install git+https://github.com/HEPCloud/decisionengine.git' - decisionengine
su -s /bin/bash -c 'pip install git+https://github.com/HEPCloud/decisionengine_modules.git' - decisionengine
  1. Start and enable HTCondor
systemctl start condor
systemctl enable condor
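After the installation steps above, a quick preflight check can confirm that the expected commands are on the PATH. The following is a minimal sketch, not part of the Decision Engine packages: `missing_commands` is a hypothetical helper, and the command names in the example are assumptions about what the RPMs and the pip step provide.

```shell
# Print each command from the argument list that is not found in PATH.
# Returns 0 when every command is present, 1 otherwise.
missing_commands() {
    status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return "$status"
}

# Example (the command list is an assumption, adjust to your installation):
#   missing_commands python3 gcc condor_version decisionengine de-client
```

Run the example as the decisionengine user so that the check sees the same PATH the service will use.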

Fix the GlideinWMS Frontend installation

The goal is for HEPCloud's Decision Engine to use some GlideinWMS libraries while remaining independent of the Frontend. The codebases are still intertwined, though, so the GlideinWMS installation needs some adjustments.

Create the condor password file and change the ownership of the frontend directories to decisionengine:

# Create or copy the FRONTEND condor password file
# If POOL is not there, start condor (systemctl start condor)
pushd /etc/condor/passwords.d/
cp POOL FRONTEND
cp FRONTEND /var/lib/gwms-frontend/passwords.d/
popd
chown -R decisionengine: /var/lib/gwms-frontend
chown -R decisionengine: /etc/gwms-frontend
# The permission of /var/lib/gwms-frontend/passwords.d/FRONTEND should be 0600
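The permission requirement mentioned above can be verified rather than assumed. This is a small sketch using GNU `stat -c` (as shipped with coreutils on EL9); `has_mode` is a hypothetical helper name, not something provided by GlideinWMS or HTCondor.

```shell
# Succeed only if FILE exists and its octal mode equals EXPECTED (e.g. 600).
has_mode() {
    file=$1
    expected=$2
    [ -e "$file" ] || return 1
    [ "$(stat -c '%a' "$file")" = "$expected" ]
}

# Example: tighten the permissions only when they differ.
#   has_mode /var/lib/gwms-frontend/passwords.d/FRONTEND 600 \
#       || chmod 0600 /var/lib/gwms-frontend/passwords.d/FRONTEND
```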

Set up PostgreSQL and Redis

PostgreSQL is installed by the requirements RPM (PostgreSQL 13):

  1. Enable postgresql
systemctl enable postgresql
  1. Initialize the PostgreSQL database
postgresql-setup --initdb
  1. Edit /var/lib/pgsql/data/pg_hba.conf to set the authentication method to trust, e.g.:
[root@dehost ~]# diff  /var/lib/pgsql/data/pg_hba.conf~ /var/lib/pgsql/data/pg_hba.conf 
80c80
< local   all             all                                     peer
---
> local   all             all                                     trust
82c82
< host    all             all             127.0.0.1/32            ident
---
> host    all             all             127.0.0.1/32            trust
84c84
< host    all             all             ::1/128                 ident
---
> host    all             all             ::1/128                 trust

(The diff shows the changes from the default file, saved as pg_hba.conf~, to the corrected one.)

  1. Fix the PostgreSQL installation. For an unknown reason, the run directory was missing, which caused the startup to fail.
# Without this, systemctl start fails and the error is logged in /var/lib/pgsql/data/log/postgresql-*.log
mkdir -p /var/run/postgresql
chown postgres: /var/run/postgresql
  1. Start postgresql
systemctl start postgresql
  1. Create the decisionengine database
createdb -U postgres decisionengine
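The pg_hba.conf edit in the steps above can also be scripted. The sketch below is one possible way to do it with `sed`, assuming the default file layout shown in the diff; `pg_hba_trust` is a hypothetical helper. It only rewrites uncommented `local` and `host` lines whose database column is `all`, which matches the three lines changed in the diff and leaves the replication entries alone.

```shell
# Rewrite the authentication method (last column) from peer/ident to trust
# on uncommented "local all ..." and "host all ..." lines.
# Reads stdin and writes stdout, so the real file is not modified in place.
pg_hba_trust() {
    sed -E '/^(local|host)[[:space:]]+all[[:space:]]/ s/[[:space:]](peer|ident)[[:space:]]*$/ trust/'
}

# Example: write a candidate file and review the changes before installing it.
#   cd /var/lib/pgsql/data
#   pg_hba_trust < pg_hba.conf > pg_hba.conf.new
#   diff pg_hba.conf pg_hba.conf.new
```

Writing to a separate file first keeps the original available for comparison, in the same spirit as the pg_hba.conf~ backup used above.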

The schema and the connection will be created and configured during the Decision Engine framework initialization.

RHEL also provides other PostgreSQL versions via streams. These may require changes to environment variables like PG_VERSION and PATH to use the database.

  1. Start the message broker (Redis). See https://github.com/HEPCloud/decisionengine/blob/master/doc/source/redis.rst. In short:
yum rm iptables-legacy
yum install iptables-nft
podman run --name decisionengine-redis -p 127.0.0.1:6379:6379 -d redis:6 --loglevel warning
# pick the docker.io/library registry
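The container may take a moment to accept connections after `podman run` returns. A minimal sketch of a retry loop follows; `retry` is a hypothetical helper, and the example assumes that `redis-cli` is available inside the redis:6 image and that the container name matches the one used above.

```shell
# Run a command up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example: wait up to about 10 seconds for Redis to answer a ping.
#   retry 10 1 podman exec decisionengine-redis redis-cli ping \
#       || echo "Redis did not respond" >&2
```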

Test

To run decisionengine you must be the decisionengine user (su -s /bin/bash - decisionengine). Running decisionengine --help should print the help message.
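Scripts that wrap the test can guard against being run as the wrong user. This is only a sketch; `running_as` is a hypothetical helper built on the standard `id -un` command.

```shell
# Succeed only when the current effective user name equals the given name.
running_as() {
    [ "$(id -un)" = "$1" ]
}

# Example:
#   if running_as decisionengine; then
#       decisionengine --help
#   else
#       echo "switch user first: su -s /bin/bash - decisionengine" >&2
#   fi
```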

Configure decisionengine

The default configuration file lives in /etc/decisionengine/decision_engine.jsonnet.

A number of defaults are set for you.

Each datasource has its own unique schema, and a schema created for one datasource cannot be used with a different datasource.

The SQLAlchemy Data Source

The SQLAlchemy Data Source is set up with a config like:

	"datasource": {
	  "module": "decisionengine.framework.dataspace.datasources.sqlalchemy_ds",
	  "name": "SQLAlchemyDS",
	  "config": {
	    "url": "postgresql://{db_user}:{db_password}@{db_host}:{db_port}/{db_dbname}",
	  }
	}

Any extra keywords you can pass to the sqlalchemy.engine.Engine constructor may be set under config.

SQLAlchemy will create any tablespace objects it requires automatically.
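To make the placeholders in the `url` setting concrete, the sketch below composes the string from five values. `pg_url` is a hypothetical helper and all argument values in the example are assumptions; substitute the ones for your database.

```shell
# Compose a PostgreSQL URL in the form used by the "url" setting:
#   postgresql://USER:PASSWORD@HOST:PORT/DBNAME
pg_url() {
    printf 'postgresql://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Example (values are placeholders, not defaults shipped with Decision Engine):
#   pg_url postgres secret localhost 5432 decisionengine
```

The helper does no escaping, so a password containing characters such as `@` or `/` would need to be percent-encoded before being placed in the URL.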

Start decision engine

Start the service

systemctl start decisionengine
# or 
su -s /bin/bash - decisionengine
# Use $HOME here: a tilde inside double quotes is not expanded by the shell
export PATH="$HOME/.local/bin:$PATH"
decisionengine --no-webserver
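If the PATH line ends up in a profile script that is sourced more than once, the same directory can be added repeatedly. A minimal sketch of an idempotent variant follows; `prepend_path_entry` is a hypothetical helper name.

```shell
# Print PATHSTRING with DIR prepended, unless DIR is already one of its entries.
prepend_path_entry() {
    dir=$1
    pathstring=$2
    case ":$pathstring:" in
        *":$dir:"*) printf '%s\n' "$pathstring" ;;
        *) printf '%s\n' "$dir:$pathstring" ;;
    esac
}

# Example, run as the decisionengine user:
#   PATH=$(prepend_path_entry "$HOME/.local/bin" "$PATH")
#   export PATH
```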

Add channels to decision engine

Decision engine decision cycles happen in channels. You can add channels by adding configuration files in /etc/decisionengine/config.d/ and restarting the decision engine.
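Before restarting, it can help to see which channel files are present in the configuration directory. The sketch below simply lists the `.jsonnet` files there; `list_channel_configs` is a hypothetical helper, and the assumption that each file corresponds to one channel is not confirmed by this page.

```shell
# Print the base name (without .jsonnet) of every file in DIR that ends in
# .jsonnet. DIR defaults to the channel directory named above.
list_channel_configs() {
    dir=${1:-/etc/decisionengine/config.d}
    for f in "$dir"/*.jsonnet; do
        # When the glob matches nothing it stays literal, so skip it
        [ -e "$f" ] || continue
        basename "$f" .jsonnet
    done
}

# Example:
#   list_channel_configs
#   list_channel_configs /etc/decisionengine/config.d
```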

Here is a simple test channel configuration. It uses some NOP classes that are currently defined in the unit tests and not distributed, so you first need to copy them from the Git repository. You can use an existing clone or download the files from GitHub:

cd YOUR_decisionengine_REPO
# OR download the files from GitHub
mkdir /tmp/derepo
cd /tmp/derepo
wget https://github.com/HEPCloud/decisionengine/archive/refs/heads/master.zip
unzip master.zip
cd decisionengine-master
# Now copy the files
cp -r src/decisionengine/framework/tests /lib/python3.6/site-packages/decisionengine/framework/

Then, add the channel by placing this in /etc/decisionengine/config.d/test_channel.jsonnet:

{
  sources: {
    source1: {
      module: "decisionengine.framework.tests.SourceNOP",
      parameters: {},
      schedule: 1,
    }
  },
  transforms: {
    transform1: {
      module: "decisionengine.framework.tests.TransformNOP",
      parameters: {},
      schedule: 1
    }
  },
  logicengines: {
    le1: {
      module: "decisionengine.framework.logicengine.LogicEngine",
      parameters: {
        facts: {
          pass_all: "True"
        },
        rules: {
          r1: {
            expression: 'pass_all',
            actions: ['publisher1']
          }
        }
      }
    }
  },
  publishers: {
    publisher1: {
      module: "decisionengine.framework.tests.PublisherNOP",
      parameters: {}
    }
  }
}

Restart decision engine to start the new channel:

systemctl restart decisionengine

de-client --status should show the active test channel.
