
Commit 94919bb

Merge pull request #12 from natcap/feature/compute-note-playbook

Compute node setup

2 parents d1c2db9 + c75cc62

20 files changed: +2242 −360 lines

.github/workflows/test.yml

Lines changed: 93 additions & 4 deletions
@@ -45,16 +45,103 @@ jobs:
         with:
           fetch-depth: 0 # Fetch complete history for accurate versioning

-      - uses: koesterlab/setup-slurm-action@v1
+      #############################################################
+      # This section copied from koesterlab/setup-slurm-action@v1
+      # their action does not allow modification of slurm conf vars, and we
+      # need to configure `AccountingStoreFlags: job_comment` so that job
+      # comments are stored in the database.
+      #
+      - name: Download slurm ansible roles
+        shell: bash -e {0}
+        run: |
+          ansible-galaxy role install https://github.com/galaxyproject/ansible-slurm/archive/1.0.1.tar.gz
+
+      - name: Apt prerequisites
+        shell: bash -e {0}
+        run: |
+          sudo apt-get update
+          sudo apt-get install retry
+
+      - name: Define slurm playbook
+        uses: 1arp/create-a-file-action@0.2
+        with:
+          file: slurm-playbook.yml
+          content: |
+            - name: Slurm all in One
+              hosts: localhost
+              roles:
+                - role: 1.0.1
+              become: true
+              vars:
+                slurm_upgrade: true
+                slurm_roles: ['controller', 'exec', 'dbd']
+                slurm_config_dir: /etc/slurm
+                slurm_config:
+                  ClusterName: cluster
+                  SlurmctldLogFile: /var/log/slurm/slurmctld.log
+                  SlurmctldPidFile: /run/slurmctld.pid
+                  SlurmdLogFile: /var/log/slurm/slurmd.log
+                  SlurmdPidFile: /run/slurmd.pid
+                  SlurmdSpoolDir: /tmp/slurmd # the default /var/lib/slurm/slurmd does not work because of noexec mounting in github actions
+                  StateSaveLocation: /var/lib/slurm/slurmctld
+                  AccountingStorageType: accounting_storage/slurmdbd
+                  AccountingStoreFlags: job_comment
+                  SelectType: select/cons_tres
+                slurmdbd_config:
+                  StorageType: accounting_storage/mysql
+                  PidFile: /run/slurmdbd.pid
+                  LogFile: /var/log/slurm/slurmdbd.log
+                  StoragePass: root
+                  StorageUser: root
+                  StorageHost: 127.0.0.1 # see https://stackoverflow.com/questions/58222386/github-actions-using-mysql-service-throws-access-denied-for-user-rootlocalh
+                  StoragePort: 8888
+                  DbdHost: localhost
+                slurm_create_user: yes
+                slurm_nodes:
+                  - name: localhost
+                    State: UNKNOWN
+                    Sockets: 1
+                    CoresPerSocket: 2
+                    RealMemory: 2000
+                slurm_user:
+                  comment: "Slurm Workload Manager"
+                  gid: 1002
+                  group: slurm
+                  home: "/var/lib/slurm"
+                  name: slurm
+                  shell: "/bin/bash"
+                  uid: 1002
+
+      - name: Set XDG_RUNTIME_DIR
+        shell: bash -e {0}
+        run: |
+          mkdir -p /tmp/1002-runtime # work around podman issue (https://github.com/containers/podman/issues/13338)
+          echo XDG_RUNTIME_DIR=/tmp/1002-runtime >> $GITHUB_ENV
+
+      - name: Setup slurm
+        shell: bash -e {0}
+        run: |
+          ansible-playbook slurm-playbook.yml || (journalctl -xe && exit 1)
+
+      - name: Add slurm account
+        shell: bash -e {0}
+        run: |
+          sudo retry --times=24 --delay=5 --until=success -- sacctmgr -i create account "Name=runner"
+          sudo retry --times=24 --delay=5 --until=success -- sacctmgr -i create user "Name=runner" "Account=runner"
+      ############################################################

       - name: Setup conda environment
         uses: mamba-org/setup-micromamba@v2
         with:
           environment-name: env
+          # pin numpy: https://github.com/natcap/invest/issues/2288
           create-args: >-
             python=3.13
             natcap.invest
             pytest
+            numpy<2.4.0
           condarc: |
             channels:
               - conda-forge
@@ -74,7 +161,9 @@ jobs:

       - name: Run tests
         run: |
-          export PYGEOAPI_CONFIG=pygeoapi-config.yml
-          export PYGEOAPI_OPENAPI=openapi.yml
+          which invest
+          invest --version
+          export PYGEOAPI_CONFIG=invest_processes/pygeoapi-config.yml
+          export PYGEOAPI_OPENAPI=invest_processes/openapi.yml
           pygeoapi openapi generate $PYGEOAPI_CONFIG --output-file $PYGEOAPI_OPENAPI
-          pytest --log-cli-level=DEBUG tests/
+          pytest -s --log-cli-level=DEBUG tests/
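The `Add slurm account` step wraps `sacctmgr` in `retry` because `slurmdbd` may not yet be accepting connections when the ansible playbook finishes. The same retry-until-success pattern can be sketched as a small Python helper (the `run_with_retry` function is hypothetical and not part of this repo; it just mirrors `retry --times=24 --delay=5 --until=success`):

```python
import time

def run_with_retry(fn, times=24, delay=5):
    """Call fn() until it succeeds or attempts are exhausted,
    mirroring `retry --times=24 --delay=5 --until=success -- <cmd>`."""
    for attempt in range(times):
        try:
            return fn()
        except Exception:
            if attempt == times - 1:
                raise  # all attempts exhausted; propagate the failure
            time.sleep(delay)
```

In the workflow, `fn` would be a subprocess call to `sacctmgr`; any command that fails while the accounting daemon starts up is simply retried.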

README.md

Lines changed: 0 additions & 19 deletions
@@ -1,21 +1,2 @@
 # invest-compute
 APIs and backend for running invest in the cloud
-
-## pygeoapi server
-
-To launch the server:
-```
-export PYGEOAPI_CONFIG=pygeoapi-config.yml
-export PYGEOAPI_OPENAPI=openapi.yml
-pygeoapi openapi generate $PYGEOAPI_CONFIG --output-file $PYGEOAPI_OPENAPI
-pygeoapi serve
-```
-
-Access the OpenAPI Swagger page in your browser at http://localhost:5000/openapi
-
-### asynchronous requests
-InVEST model execution should run asynchronously because it can take a long time. To use asynchronous mode, include the `'Prefer: respond-async'` header in the request, as required by `pygeoapi` and the OGC Processes specification ([source](https://docs.pygeoapi.io/en/latest/data-publishing/ogcapi-processes.html#asynchronous-support)).
-
-It seems that the async execution request is supposed to return a JSON object with information about the job, including its ID, which you can then use to query the job status and results. However, the request actually returns null, and the only job info is available in the `location` response header. I asked about this here: https://github.com/geopython/pygeoapi/issues/2105
-
-For now, given a `location` header value like `http://localhost:5000/jobs/XXXXXX`, you can check the job status at that URL and retrieve results at `http://localhost:5000/jobs/XXXXXX/results`.

invest_processes/README.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+## pygeoapi server
+
+To launch the server:
+```
+export PYGEOAPI_CONFIG=pygeoapi-config.yml
+export PYGEOAPI_OPENAPI=openapi.yml
+pygeoapi openapi generate $PYGEOAPI_CONFIG --output-file $PYGEOAPI_OPENAPI
+pygeoapi serve
+```
+
+Access the OpenAPI Swagger page in your browser at http://localhost:5000/openapi
+
+### asynchronous requests
+InVEST model execution should run asynchronously because it can take a long time. To use asynchronous mode, include the `'Prefer: respond-async'` header in the request, as required by `pygeoapi` and the OGC Processes specification ([source](https://docs.pygeoapi.io/en/latest/data-publishing/ogcapi-processes.html#asynchronous-support)).
+
+It seems that the async execution request is supposed to return a JSON object with information about the job, including its ID, which you can then use to query the job status and results. However, the request actually returns null, and the only job info is available in the `location` response header. I asked about this here: https://github.com/geopython/pygeoapi/issues/2105
+
+For now, given a `location` header value like `http://localhost:5000/jobs/XXXXXX`, you can check the job status at that URL and retrieve results at `http://localhost:5000/jobs/XXXXXX/results`.
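The `location`-header convention described above can be captured in a tiny Python helper. This is illustrative only (the `job_urls` function and `ASYNC_HEADERS` name are not part of this repo); it encodes the status/results URL pattern shown above:

```python
# Header pygeoapi requires for asynchronous execution requests
ASYNC_HEADERS = {'Prefer': 'respond-async'}

def job_urls(location_header):
    """Given the `location` response header returned by an async
    execute request, derive the (status_url, results_url) pair."""
    status_url = location_header.rstrip('/')
    return status_url, status_url + '/results'

# With the placeholder job URL from the text:
status, results = job_urls('http://localhost:5000/jobs/XXXXXX')
```

A client would POST the execute request with `ASYNC_HEADERS`, read the `location` header from the response, then poll `status` until the job completes and fetch `results`.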
File renamed without changes.
Lines changed: 13 additions & 95 deletions
@@ -1,11 +1,8 @@
-import importlib
 import logging
-import os
-import tempfile
+from pathlib import Path
 import textwrap
-import time

-from natcap.invest import datastack, models, spec, utils
+from invest_processes.utils import download_and_extract_datastack
 from pygeoapi.process.base import BaseProcessor, ProcessorExecuteError

 LOGGER = logging.getLogger(__name__)
@@ -33,9 +30,9 @@
         }
     },
     'outputs': {
-        'workspace_dir': {
-            'title': 'Workspace directory',
-            'description': 'Path to the workspace directory containing all model results',
+        'workspace_url': {
+            'title': 'Workspace URL',
+            'description': 'URL to the workspace containing all model results',
             'schema': {
                 'type': 'string',
                 'contentMediaType': 'application/json'
@@ -49,6 +46,7 @@
     }
 }

+
 class ExecuteProcessor(BaseProcessor):
     """InVEST execute process"""

@@ -65,29 +63,23 @@ def __init__(self, processor_def):

         super().__init__(processor_def, PROCESS_METADATA)

-    def create_slurm_script(self, datastack_path, workspace_dir):
+    def create_slurm_script(self, datastack_url, workspace_dir):
         """Create a script to run with sbatch.

         Args:
-            datastack_path: path to the user provided invest datastack to execute
+            datastack_url: URL to the invest datastack (.tgz) to execute
             workspace_dir: path to the directory that the slurm job will run in

         Returns:
             string contents of the script
         """
-        try:
-            model_id = datastack.extract_parameter_set(datastack_path).model_id
-        except Exception as error:
-            raise ProcessorExecuteError(
-                1, "Error when parsing JSON datastack:\n " + str(error))
-
-        # Create a workspace directory
-        workspace_dir = os.path.join(workspace_dir, f'{model_id}_workspace')
-
+        json_path, model_id = download_and_extract_datastack(
+            datastack_url, Path(workspace_dir) / 'datastack')
+        workspace_dir = Path(workspace_dir) / f'{model_id}_workspace'
         return textwrap.dedent(f"""\
             #!/bin/sh
             #SBATCH --time=10
-            invest run --datastack {datastack_path} --workspace {workspace_dir} {model_id}
+            invest run --datastack {json_path} --workspace {workspace_dir} {model_id}
             """)

     def process_output(self, workspace_dir):
@@ -99,81 +91,7 @@ def process_output(self, workspace_dir):
         Returns:
             empty dict
         """
-        return {}
-
-    def execute(self, data, outputs=None):
-        """Execute the process.
-
-        Args:
-            data: dictionary of data inputs
-            outputs:
-
-        Returns:
-            Tuple of (mimetype, outputs)
-        """
-        # Extract model ID and parameters from the datastack file
-        datastack_path = data.get('datastack_path')
-
-        try:
-            parameter_set = datastack.extract_parameter_set(datastack_path)
-        except Exception as error:
-            raise ProcessorExecuteError(
-                1, "Error when parsing JSON datastack:\n " + str(error))
-
-        # Import the model
-        try:
-            model_module = models.pyname_to_module[
-                models.model_id_to_pyname[parameter_set.model_id]]
-        except KeyError as ex:
-            raise ValueError(f'model ID {parameter_set.model_id} not found')
-
-        # Create a workspace directory
-        workspace_root = os.path.abspath('workspaces')
-        workspace_dir = os.path.join(workspace_root, f'{parameter_set.model_id}_{time.time()}')
-        parameter_set.args['workspace_dir'] = workspace_dir
-
-        for arg_key, val in parameter_set.args.items():
-            try:
-                input_spec = model_module.MODEL_SPEC.get_input(arg_key)
-            except KeyError:
-                continue
-            # Uncomment this for next invest release
-            # if type(input_spec) in {spec.RasterInput, spec.SingleBandRasterInput,
-            #                         spec.VectorInput}:
-            #     parameter_set.args[arg_key] = utils._GDALPath.from_uri(
-            #         val).to_normalized_path()
-
-        with utils.prepare_workspace(workspace_dir,
-                                     model_id=parameter_set.model_id,
-                                     logging_level=logging.DEBUG):
-            LOGGER.log(
-                datastack.ARGS_LOG_LEVEL,
-                'Starting model with parameters: \n' +
-                datastack.format_args_dict(
-                    parameter_set.args,
-                    parameter_set.model_id))
-
-            try:
-                model_module.execute(parameter_set.args)
-            except Exception as ex:
-                LOGGER.error(
-                    f'An error occurred during execution: {ex}', exc_info=ex)
-                raise ProcessorExecuteError(
-                    'An error occurred during execution. See the log file in '
-                    'the workspace for details. \n Workspace: ' + workspace_dir)
-
-        LOGGER.info('Generating metadata for results')
-        try:
-            # If there's an exception from creating metadata
-            # I don't think we want to indicate a model failure
-            spec.generate_metadata_for_outputs(
-                model_module, parameter_set.args)
-        except Exception as ex:
-            LOGGER.warning(
-                'Something went wrong while generating metadata', exc_info=ex)
-
-        outputs = {'workspace_dir': workspace_dir}
-        return 'application/json', outputs
+        pass

     def __repr__(self):
         return f'<InVESTExecuteProcessor> {self.name}'
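The rewritten `create_slurm_script` boils down to string templating, so its output can be exercised without slurm installed. A standalone sketch (the freestanding function and the example paths and model ID are hypothetical; the template mirrors the f-string in the diff):

```python
import textwrap

def create_slurm_script(json_path, workspace_dir, model_id):
    """Build the sbatch script body, mirroring
    ExecuteProcessor.create_slurm_script in the diff above."""
    return textwrap.dedent(f"""\
        #!/bin/sh
        #SBATCH --time=10
        invest run --datastack {json_path} --workspace {workspace_dir} {model_id}
        """)

script = create_slurm_script(
    '/tmp/work/datastack/datastack.json',  # hypothetical extracted JSON path
    '/tmp/work/carbon_workspace',          # hypothetical workspace directory
    'carbon')                              # hypothetical model ID
```

`textwrap.dedent` strips the common leading indentation, so the generated script starts at column zero as sbatch expects.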
