20 changes: 18 additions & 2 deletions .devcontainer/Dockerfile
@@ -10,14 +10,30 @@ ENV USERNAME=${USERNAME}
ENV USER_UID=${USER_UID}
ENV USER_GID=${USER_GID}
USER root

# extra interactive utilities
RUN apt-get update \
&& apt-get -qq install -y --no-install-recommends \
fd-find \
less \
bats \
ripgrep
ripgrep \
lsb-release \
gnupg2

# Python dependencies
ENV PIXI_HOME=/opt/pixi
ENV PATH=/opt/pixi/bin:$PATH
COPY pixi.toml pixi.lock /opt/pixi/
WORKDIR /opt/pixi
RUN /bin/bash -c "curl -fsSL https://pixi.sh/install.sh | bash" && pixi install

# Need to "install" fluxion
RUN git clone --depth 1 https://github.com/flux-framework/flux-sched /tmp/flux-sched
ENV PYTHONPATH=/usr/lib/python3.10/site-packages:/tmp/flux-sched/src/python

# This should be set in the pixi environment
ENV NO_AT_BRIDGE=1

# Add the group and user that match our ids
RUN groupadd -g ${USER_GID} ${USERNAME} && \
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
# SCM syntax highlighting & preventing 3-way merges
pixi.lock merge=binary linguist-language=YAML linguist-generated=true
22 changes: 22 additions & 0 deletions .gitignore
@@ -1 +1,23 @@
.deps
# pixi environments
.pixi/*
!.pixi/config.toml
Makefile
Makefile.in
aclocal.m4
ar-lib
autom4te.cache
compile
config.*
configure
.libs
install-sh
libtool
m4
missing
stamp-h1
depcomp
ltmain.sh
*.la
*.lo
*.o
27 changes: 27 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,27 @@
exclude: "examples"
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.3.0
    hooks:
      - id: check-added-large-files
        args: ["--maxkb=2000"]
      - id: check-case-conflict
      - id: check-docstring-first
      - id: end-of-file-fixer
      - id: trailing-whitespace
      - id: mixed-line-ending

  - repo: local
    hooks:
      - id: black
        name: black
        language: python
        types: [python]
        entry: black

      - id: isort
        name: isort
        args: [--filter-files]
        language: python
        types: [python]
        entry: isort
14 changes: 13 additions & 1 deletion Makefile.am
@@ -1,4 +1,6 @@
# File: Makefile.am (FINAL CORRECTED VERSION with -pthread in LDFLAGS)
###########################################################################
# Build C Stuff Rules
###########################################################################

ACLOCAL_AMFLAGS = -I m4

@@ -12,6 +14,16 @@ delegate_c_bridge_la_LDFLAGS = -module -avoid-version -shared -pthread

delegate_c_bridge_la_LIBADD = @FLUX_CORE_LIBS@ @PYTHON_LIBS@ -ldl

###########################################################################
# Install Script Rules
###########################################################################

# $(libexecdir) is a standard autotools thing (e.g., /usr/libexec).
fluxcmddir = $(libexecdir)/flux/cmd

# Using _SCRIPTS ensures the file is installed with execute permissions.
fluxcmd_SCRIPTS = cmd/flux-remote.py

clean-local:
	@echo "--- Running cleanup ---"
	rm -f configure aclocal.m4 Makefile.in config.h.in ltmain.sh
52 changes: 51 additions & 1 deletion README.md
@@ -2,6 +2,18 @@

This is a delegation plugin in Python, inspired by [Tapasya's](https://github.com/flux-framework/flux-core/pull/6873/files) WIP. I'm using it as a prototype until that one is ready, and I wanted to implement it in Python as a proof of concept (since many people like writing Python) and to make it easy to update quickly without recompiling anything.

## Notes

We need to add a feasibility check, e.g., "Given these requests for software or other cluster resources, is this going to work?" Normally I think feasibility is run with fluxion, so *after* JobTap, but that doesn't make sense here. It doesn't make sense to send a job to another cluster just to have it rejected. So although this isn't a final design, I think feasibility (across many different options) needs to happen first. Then when we actually submit with a flux proxy (by interfacing with the other URI) there is a better chance of acceptance. It's a pre-flight, fail-fast check instead of "try and find out." An issue we might face is that the metadata about clusters and compatibility lives in a user's home, and I'm not sure that the Flux instance / flux-core would have that access.

On the other hand, we don't want to stress flux-core with these expensive checks on every job submit. So maybe what needs to happen is that our library performs a local feasibility check, _then_ formulates the submit request to the local Flux instance, and then the JobTap plugin submits it. To the user, it is one command. For our tool (fractale) we can wrap it so it looks like a single flux command. Something like:

```bash
flux remote submit <same as flux submit>
```

The above is implemented to wrap around submit, and that is the extra command in [cmd](cmd) that can be installed to a Flux root (or just run as a one-off script). We just need to add the fractale logic there; I'll work on that this week.
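
To make the intended flow concrete, here is a rough sketch of what the wrapper does before touching flux-core. The `check_feasibility` helper is a hypothetical placeholder for the fractale logic, not an existing function:

```python
# Sketch of the fail-fast flow: check feasibility against local metadata first,
# and only hand the job to the local Flux instance (where the JobTap plugin
# takes over) if something can actually run it.
import subprocess
import sys


def check_feasibility(requirements):
    """Hypothetical: compare requirements against cluster metadata in ~/.fractale."""
    raise NotImplementedError


def remote_submit(requirements, submit_args):
    matches = check_feasibility(requirements)
    if not matches:
        # Pre-flight failure: nothing is ever sent to flux-core
        sys.exit("No known cluster can satisfy this request")
    # Feasible: submit to the local instance, which delegates via the jobtap plugin
    subprocess.run(["flux", "submit", *submit_args], check=True)
```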

## Development

Open up the [.devcontainer](.devcontainer) in VSCode, then do:
@@ -13,6 +25,12 @@ make
sudo make install
```

Enter the pixi shell:

```bash
pixi shell
```

## Usage

Once you have installed it, you can test by starting a Flux broker in the same directory as [delegate_handler.py](delegate_handler.py).
@@ -27,4 +45,36 @@ At this point you have your development environment. You can tweak the Python sc
```bash
flux submit --verbose --setattr=delegate.local_uri=$FLUX_URI --dependency=delegate:$FLUX_URI hostname
```

I added the local URI as an attribute because it means we can direct the interaction, saying exactly FROM where and TO where. Next I want to think about how this fits into Fractale. We should have a lookup of clusters we know about in the user's home, and then the URIs can come from there after a match is done. I also want to account for node features using NFD.
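
For reference, here is a rough programmatic equivalent of the command above using the flux-core Python bindings. This is a sketch; the attribute and dependency paths mirror the `--setattr` and `--dependency` flags and should be treated as assumptions, not a settled interface:

```python
# Sketch: the same submission done from Python instead of the CLI.
import os

import flux
import flux.job

handle = flux.Flux()
local_uri = os.environ["FLUX_URI"]

jobspec = flux.job.JobspecV1.from_command(["hostname"])
jobspec.environment = dict(os.environ)
jobspec.cwd = os.getcwd()

# FROM where: the URI of this (local) instance
jobspec.setattr("system.delegate.local_uri", local_uri)

# The delegate dependency is what the jobtap plugin keys on (RFC 26 style)
jobspec.setattr("system.dependencies", [{"scheme": "delegate", "value": local_uri}])

jobid = flux.job.submit(handle, jobspec)
print(f"submitted {jobid}")
```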

## Fractale

For integration with Fractale (and flux) we won't require the user to ask for delegation directly. We will provide a command for a remote submit:

```bash
flux remote submit --dry-run -S=requires.software.spack.value=curl curl
flux remote submit --dry-run --solver graph -S=requires.software.spack.name=curl curl

# Development variant
python cmd/flux-remote.py submit --dry-run -S=requires.software.spack.value=curl curl
python cmd/flux-remote.py submit --dry-run --solver graph -S=requires.software.spack.name=curl curl
```

In the above, the user is asking for a remote submit. This means we receive the request in Flux, compare it to the local subsystems and clusters defined by the user in their home, and then select a cluster. The selected cluster goes straight to the JobTap plugin from the `flux remote submit` command without the user needing any subsequent interaction. To support this process, we have a `flux detect` command that can detect (and generate) local subsystem metadata, and even export it.
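
A rough sketch of that selection step, reusing the store and solver interfaces that [cmd/flux-detect.py](cmd) imports. The `match` call and the shape of its results are hypothetical placeholders, not a settled API:

```python
# Sketch of selection: load the user's cluster metadata from ~/.fractale,
# ask a solver for clusters that satisfy the request, and fail fast otherwise.
from fractale.store import FractaleStore
from fractale.subsystem import get_subsystem_solver

store = FractaleStore(None)             # None falls back to ~/.fractale
solver = get_subsystem_solver("graph")  # solver name from --solver; assumed default otherwise

# Requirements as expressed by the -S flags, e.g. requires.software.spack.name=curl
requires = {"software": {"spack": {"name": "curl"}}}

matches = solver.match(store, requires)  # hypothetical call
if not matches:
    raise SystemExit("No known cluster satisfies the request")

# First match wins for now; its URI is what gets handed to the jobtap plugin
selected = matches[0]
print(f"Selected {selected['name']} at {selected['uri']}")
```

The `flux detect` examples below are what generate and refresh that local metadata: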

```bash
# Automatic detection of new clusters and subsystems
python cmd/flux-detect.py
flux detect

# Force detection of existing
python cmd/flux-detect.py --force
flux detect --force

# Detect and export to archive for import (detects all present)
python cmd/flux-detect.py --export
flux detect --export
```
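
Import is not shown above, but since `--export` just makes a zip archive of the config directory, bringing it to another machine can be sketched with the standard library alone (the archive name here is illustrative):

```python
# Sketch: unpack an exported fractale archive into ~/.fractale on another machine.
import os
import shutil

archive = "fractale-export-20250101_120000.zip"  # produced by flux detect --export
shutil.unpack_archive(archive, os.path.expanduser("~/.fractale"), "zip")
```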

See [examples/fractale](examples/fractale) for the full example, and the [cmd](cmd) that is added to Flux as a WIP to orchestrate this.
110 changes: 110 additions & 0 deletions cmd/flux-detect.py
@@ -0,0 +1,110 @@
##############################################################
# Copyright 2023 Lawrence Livermore National Security, LLC
# (c.f. AUTHORS, NOTICE.LLNS, COPYING)
#
# This file is part of the Flux resource manager framework.
# For details, see https://github.com/flux-framework.
#
# SPDX-License-Identifier: LGPL-3.0
##############################################################

import argparse
import logging
import os
import shutil
import sys
import tempfile
from datetime import datetime

import flux
import flux.cli.submit as base
import fractale.defaults as defaults
import fractale.utils as utils
from compspec.plugin.registry import PluginRegistry
from fractale.store import FractaleStore
from fractale.subsystem import get_subsystem_solver

registry = PluginRegistry()
registry.discover()

LOGGER = logging.getLogger("flux-remote")


def open_logfile(fd):
    return open(fd, "w", encoding="utf8", errors="surrogateescape")


class DetectCmd(base.SubmitCmd):
    def main(self, args):
        """
        Detect local subsystems.

        This doesn't technically need to be based on submit.
        """
        # If we are exporting, we are going to run detect in a non-existing directory
        cleanup = False
        if args.export and not args.config_dir:
            args.config_dir = tempfile.mkdtemp(prefix="flux-detect-")
            cleanup = True
        store = FractaleStore(args.config_dir)
        store.detect(force=args.force)

        # Export locally detected subsystems
        if args.export:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            archive_name = f"fractale-export-{timestamp}"
            shutil.make_archive(archive_name, "zip", args.config_dir)

        # Cleanup if we created a temporary context
        if cleanup and os.path.exists(args.config_dir):
            shutil.rmtree(args.config_dir)

    def run_parser(self):
        """
        The detect parser is very simple. It only looks for detection options.
        """
        parser = argparse.ArgumentParser(
            prog="flux detect",
            description="Detect and save local subsystem metadata.",
            usage="flux detect <command> [options...]",
        )
        parser.add_argument(
            "--config-dir",
            dest="config_dir",
            help="Fractale configuration directory to store subsystems. Defaults to ~/.fractale",
        )
        parser.add_argument(
            "--force",
            help="Given existing metadata, force an update.",
            action="store_true",
            default=False,
        )
        parser.add_argument(
            "--export",
            default=False,
            action="store_true",
            help="Export to local archive for later import.",
        )

        # We need to handle this manually since it's off base for argparse
        if "-h" in sys.argv or "--help" in sys.argv:
            parser.print_help()
            sys.exit(0)

        # This just processes our added command (expecting other subcommands eventually)
        args, extra = parser.parse_known_args()
        self.main(args)


@flux.util.CLIMain(LOGGER)
def main():
    sys.stdout = open_logfile(sys.stdout.fileno())
    sys.stderr = open_logfile(sys.stderr.fileno())

    # This is going to be a submit parser with extra bells and whistles
    detect = DetectCmd("flux detect", description="detect local subsystems")
    detect.run_parser()


if __name__ == "__main__":
    main()