87 changes: 86 additions & 1 deletion README.md
@@ -20,7 +20,10 @@ dependencies:
- htcondor
- pip
- pip:
- ncsa-htcdaskgateway>=1.0.2
- ncsa-htcdaskgateway>=1.0.4
- dask==2025.2.0
- distributed==2025.2.0
- tornado==6.4.2
```

From a Jupyter terminal window create the conda environment with:
@@ -30,6 +33,10 @@ conda env create -f conda.yaml
conda activate dask-gateway
```

_Note:_ Depending on your conda setup, the `conda activate` command may not be
available; in that case, activate the environment with
`source activate dask-gateway` instead.

Now you can use the `setup_condor` script to set up the HTCondor tools. This
will request your Illinois password and attempt to log into the HTCondor login
node and execute a command that generates a token file. This token file is used
@@ -46,6 +53,84 @@ with
condor_q
```

## Use in Jupyter Notebook

The first thing you need to do in your Jupyter notebook is activate the conda
environment:

```shell
!source activate dask-gateway
```

Now you can pip install any additional dependencies. Any package whose objects
are sent to the Dask workers or returned from them must be installed at exactly
the same version on both the client and the workers.

```shell
! python -m pip install numpy==2.2.4
```
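Version skew between the notebook and the workers is a common source of serialization errors. As an illustration, a small hypothetical helper (not part of htcdaskgateway) can diff two version mappings, such as the per-package mappings you can extract from `distributed`'s `client.get_versions()`:

```python
# Hypothetical helper: report packages whose versions differ between the
# client environment and a worker environment. The mappings below are
# illustrative; on a live cluster you would build them from
# client.get_versions().

def version_mismatches(client_versions, worker_versions):
    """Return {package: (client_version, worker_version)} for mismatches.

    A package missing on the worker side is reported with None.
    """
    return {
        pkg: (client_versions[pkg], worker_versions.get(pkg))
        for pkg in client_versions
        if client_versions[pkg] != worker_versions.get(pkg)
    }

client_pkgs = {"numpy": "2.2.4", "dask": "2025.2.0"}
worker_pkgs = {"numpy": "2.2.3", "dask": "2025.2.0"}

# Only numpy differs here, so only numpy is reported.
print(version_mismatches(client_pkgs, worker_pkgs))
```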

### Providing Path to Condor Tools

There are some interesting interactions between conda and Jupyter. Conda
installs the condor binaries, but does not update `PATH` in the notebook
kernel. We use an environment variable to tell the htcdaskgateway client where
to find the binaries.

In a terminal window:

```shell
source activate dask-gateway
which condor_q
```

Back in your notebook:

```python
import os

os.environ["CONDOR_BIN_DIR"] = "/home/myhome/.conda/envs/dask-gateway/bin"
```
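Rather than copying the path from the terminal by hand, you can derive it with the standard library's `shutil.which` when the binaries happen to be visible on the kernel's `PATH`, and fall back to an explicit location otherwise. A sketch; the helper name and the fallback path are illustrative, not part of htcdaskgateway:

```python
import os
import shutil

def find_condor_bin_dir(tool="condor_q", fallback=None):
    """Return the directory containing `tool` on PATH, else `fallback`."""
    path = shutil.which(tool)
    return os.path.dirname(path) if path else fallback

# Substitute your own environment location for the fallback path.
os.environ["CONDOR_BIN_DIR"] = find_condor_bin_dir(
    fallback=os.path.expanduser("~/.conda/envs/dask-gateway/bin")
)
```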

### Setting up a dotenv file

It is good practice to keep passwords out of your notebooks. Create a `.env`
file that contains an entry for `DASK_GATEWAY_PASSWORD`.

Add `python-dotenv` to your pip installed dependencies and add this line to your
notebook:

```python
from dotenv import load_dotenv

load_dotenv() # take environment variables from .env.
```
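A minimal `.env` file might look like this (the value is a placeholder; keep the file itself out of version control):

```shell
# .env -- add this file to .gitignore so the password is never committed
DASK_GATEWAY_PASSWORD=your-password-here
```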

### Connecting to the Gateway and Scaling up Cluster

Now we can finally start up a cluster!

```python
from htcdaskgateway import HTCGateway
from dask_gateway.auth import BasicAuth
import os

gateway = HTCGateway(
address="https://dask.software-dev.ncsa.illinois.edu",
proxy_address=8786,
auth=BasicAuth(username=None, password=os.environ["DASK_GATEWAY_PASSWORD"]),
)

cluster = gateway.new_cluster(
image="ncsa/dask-public-health:latest",
container_image="/u/bengal1/condor/PublicHealth.sif",
)
cluster.scale(2)
client = cluster.get_client()
client
```

This will display the URL for accessing the cluster dashboard.
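Once the client is connected, work is submitted in the usual Dask fashion. A minimal sketch of a task function follows; the `client.map`/`client.gather` calls are shown as comments because they assume the live cluster created above, but the task itself runs anywhere:

```python
# A pure-Python task. Anything shipped to the workers must be defined in the
# notebook (or importable there), with its dependencies pinned to matching
# versions on client and workers.
def mean_of_squares(values):
    squares = [v * v for v in values]
    return sum(squares) / len(squares)

# With the `client` from the previous example (requires the live cluster):
#   futures = client.map(mean_of_squares, [range(1, 4), range(4, 7)])
#   results = client.gather(futures)

# Local check of the task logic before shipping it to workers:
print(mean_of_squares([1, 2, 3]))
```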

## How it Works

This is a drop-in replacement for the official Dask Gateway client. It keeps the
4 changes: 2 additions & 2 deletions htcdaskgateway/_version.py
@@ -17,5 +17,5 @@
__version_tuple__: VERSION_TUPLE
version_tuple: VERSION_TUPLE

__version__ = version = '1.0.2.dev0+g6b8821d.d20250415'
__version_tuple__ = version_tuple = (1, 0, 2, 'dev0', 'g6b8821d.d20250415')
__version__ = version = '1.0.2.dev6+g5379e9a.d20250418'
__version_tuple__ = version_tuple = (1, 0, 2, 'dev6', 'g5379e9a.d20250418')
9 changes: 5 additions & 4 deletions htcdaskgateway/cluster.py
@@ -20,6 +20,7 @@ def __init__(self, container_image=None, **kwargs):
self.batchWorkerJobs = []
self.cluster_options = kwargs.get("cluster_options")
self.container_image = container_image
self.condor_bin_dir = os.environ["CONDOR_BIN_DIR"]

super().__init__(**kwargs)

@@ -141,7 +142,7 @@ def scale_batch_workers(self, n):
# We add this to avoid a bug on Farruk's condor_submit wrapper (a fix is in progress)
os.environ["LS_COLORS"] = "ExGxBxDxCxEgEdxbxgxcxd"
# Submit our jdl, print the result and call the cluster widget
cmd = ". ~/.profile && condor_submit htcdask_submitfile.jdl | grep -oP '(?<=cluster )[^ ]*'"
cmd = f". ~/.profile && {self.condor_bin_dir}/condor_submit htcdask_submitfile.jdl | grep -oP '(?<=cluster )[^ ]*'"
logger.info(
" Submitting HTCondor job(s) for %d workers with command: %s", n, cmd
)
@@ -153,7 +154,7 @@ def scale_batch_workers(self, n):
worker_dict["Iwd"] = tmproot
try:
cmd = (
". ~/.profile && condor_q "
f". ~/.profile && {self.condor_bin_dir}/condor_q "
+ clusterid
+ " -af GlobalJobId | awk '{print $1}'| awk -F '#' '{print $1}' | uniq"
)
@@ -176,7 +177,7 @@ def scale_batch_workers(self, n):
def destroy_batch_cluster_id(self, clusterid):
logger.info(" Shutting down HTCondor worker jobs from cluster %s", clusterid)
cmd = (
". ~/.profile && condor_rm "
f". ~/.profile && {self.condor_bin_dir}/condor_rm "
+ self.batchWorkerJobs["ClusterId"]
+ " -name "
+ self.batchWorkerJobs["ScheddName"]
@@ -194,7 +195,7 @@ def destroy_all_batch_clusters(self):
for htc_cluster in self.batchWorkerJobs:
try:
cmd = (
". ~/.profile && condor_rm "
f". ~/.profile && {self.condor_bin_dir}/condor_rm "
+ htc_cluster["ClusterId"]
+ " -name "
+ htc_cluster["ScheddName"]