61 changes: 61 additions & 0 deletions codeflare_helpers/landsat_workflow/Dockerfile

### USAGE ###

#### FIRST: BUILD DOCKER (off HPC) ####
# 1. MUST be x64
# export DOCKER_DEFAULT_PLATFORM=linux/amd64
# 2.
# docker build -t kastanday/landsattrend2 .
# 3.
# docker push kastanday/landsattrend2:latest

#### SECOND: Pull Apptainer (on HPC) ####
# 1. Pull and convert the image to a local .sif apptainer/singularity file.
#    apptainer pull docker://kastanday/landsattrend2:latest
# WARNING: Fast internet recommended. This takes a long time and uses significant disk space.

# use NGC images when ML is needed: FROM nvcr.io/nvidia/tensorflow:19.10-py3
FROM python:3.8

LABEL name="landsattrend2"
LABEL summary="Build a Docker / Apptainer image for landsattrend2."
LABEL URL="https://github.com/initze/landsattrend/blob/dev4Clowder_Ingmar_deployed_delta"
LABEL maintainer="Kastan Day"
LABEL architecture="x86_64"
LABEL version="0.1"
LABEL release-date="2023-06-30"

WORKDIR /app

# necessary for cv2 https://exerror.com/importerror-libgl-so-1-cannot-open-shared-object-file-no-such-file-or-directory/
# and gdal
RUN apt-get update && apt-get install -y \
curl \
wget \
build-essential \
ffmpeg \
libsm6 \
libxext6 \
gdal-bin \
libgdal-dev \
&& apt-get clean

# These env vars are untested and probably unnecessary; keep them if GDAL builds fail to find headers.
# (May need to be ARG instead of ENV.)
ENV CPLUS_INCLUDE_PATH=/usr/include/gdal
ENV C_INCLUDE_PATH=/usr/include/gdal

RUN python3 -m pip install --upgrade pip
RUN pip install \
bottleneck \
fiona==1.8.20 \
gdal==3.3.2 \
geopandas==0.9.0 \
googledrivedownloader==0.4 \
jupyter \
notebook \
pycaret \
rasterio==1.2 \
scikit-image==0.19.1 \
netCDF4 \
pyclowder
6 changes: 6 additions & 0 deletions codeflare_helpers/landsat_workflow/landsat_workflow.md
## Run Landsat Workflow
=== "Run Landsat Workflow"
Run Ingmar Nitze's Landsat workflow, defined here: <https://github.com/initze/landsattrend>
```shell
bash ./CodeFlare-Extractors/codeflare_helpers/landsat_workflow/run_landsat_workflow.sh
```
42 changes: 42 additions & 0 deletions codeflare_helpers/landsat_workflow/run_landsat_workflow.sh
#!/bin/bash

# Build and push docker image
docker build -t kastanday/landsattrend2 --platform linux/amd64 CodeFlare-Extractors/codeflare_helpers/landsat_workflow
docker push kastanday/landsattrend2:latest

# ssh into remote client and execute commands remotely
# ssh [email protected] 'bash -s' << EOF
ssh [email protected] 'zsh -s' << EOF

source ~/.zshrc

apptainer pull docker://kastanday/landsattrend2:latest

# or use conda (instead of apptainer)
# conda activate landsattrend2

### Setup Ingmar's github ###
mkdir -p ~/codeflare_utils/landsat_workflow
cd ~/codeflare_utils/landsat_workflow

if [ ! -d "landsattrend" ]; then
git clone [email protected]:initze/landsattrend.git
fi

cd landsattrend
git checkout dev4Clowder_Ingmar_deployed_delta

### Run code ###
# step 1: Google Earth to Google Cloud Storage.

# step 2: GCS to HPC.

# step 4a - upload results to clowder.
bash ~/codeflare_utils/landsat_workflow/landsattrend/import_export/upload_region_output.sh https://pdg.clowderframework.org/ 981ab4c8-7d22-418d-93a2-b47019c2f583 ALASKA /scratch/bbou/toddn/landsat-delta/landsattrend/process 649232e2e4b00aa1838f0fc2
echo "Completed Step 4a: 'upload_region_output.sh'"

# step 4b - upload results to clowder.
bash ~/codeflare_utils/landsat_workflow/landsattrend/import_export/upload_region.sh https://pdg.clowderframework.org/ 981ab4c8-7d22-418d-93a2-b47019c2f583 ALASKA /scratch/bbou/toddn/landsat-delta/landsattrend/process 649232e2e4b00aa1838f0fc2
echo "Completed Step 4b: 'upload_input_regions'"

EOF
13 changes: 13 additions & 0 deletions codeflare_helpers/landsat_workflow/start_ray.sh
#!/bin/bash

# ssh into remote client and execute commands remotely
ssh [email protected] 'zsh -s' << EOF
source ~/.zshrc
conda activate ray_py38
ray start --head
echo "Ray started, exiting."
EOF

# port forward from local to NCSA.
# ssh -l kastanday \
# -L localhost:8265:cn005.delta.internal.ncsa.edu:8265 dt-login02.delta.ncsa.illinois.edu
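
The commented tunnel above can equivalently live in `~/.ssh/config` (a sketch; the `delta-ray` host alias is an assumption, and the compute-node name must match your allocation):

```
Host delta-ray
    HostName dt-login02.delta.ncsa.illinois.edu
    User kastanday
    # Forward the Ray dashboard (port 8265) from the compute node to localhost
    LocalForward 8265 cn005.delta.internal.ncsa.edu:8265
```

With this in place, `ssh delta-ray` opens the dashboard at http://localhost:8265.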
13 changes: 13 additions & 0 deletions index.md
# Launch script (docker compose up)
bash ./CodeFlare-Extractors/codeflare_helpers/launch_clowder.sh
```
=== "🛰️ Run Landsat Analysis Workflow"
Google Cloud Storage → Clowder → HPC Jobs → Results to Clowder. Created by Ingmar Nitze, defined here: <https://github.com/initze/landsattrend>

```shell
bash ./CodeFlare-Extractors/codeflare_helpers/launch_clowder.sh
```

```shell
bash ./CodeFlare-Extractors/codeflare_helpers/landsat_workflow/run_landsat_workflow.sh
```

<!-- :import{CodeFlare-Extractors/codeflare_helpers/landsat_workflow/landsat_workflow.md} -->


=== "↔️ Move in and out of Clowder"
You can import data from anywhere, and export data to anywhere. Just select a source, then a destination. First select your source files, then your destination location.
23 changes: 23 additions & 0 deletions single-file-huggingface/Dockerfile
## USAGE:
# 1. Build
# docker build -t kastanday/huggingface-single-file-extractor .
# 2. Run (with clowder in full development mode.)
# docker run -t -i --rm --net clowder2_clowder2 \
# -e CLOWDER_URL=http://host.docker.internal:80/ \
# -e "CLOWDER_VERSION=2" \
# -e RABBITMQ_URI="amqp://guest:guest@clowder2-rabbitmq-1:5672/%2F" \
# --shm-size=2.17gb \
# --name huggingface-extractor-aarch64-4 \
# kastanday/huggingface-single-file-extractor

FROM python:3.8

WORKDIR /extractor
COPY requirements.txt ./
RUN apt-get update && apt-get --yes install libsndfile1
RUN pip install -r requirements.txt

# Use ENV so the locale persists into the final image (RUN export does not).
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
COPY single_file_huggingface.py extractor_info.json ./
CMD python3 -u single_file_huggingface.py --max-retry 1 --heartbeat 5 --connector RabbitMQ
78 changes: 78 additions & 0 deletions single-file-huggingface/README.md
# This extractor is best run via CodeFlare; see the top-level README for more.

# Manual Docker (no CodeFlare)

This extractor is ready to run as a Docker container; the only dependency is a running Clowder instance. Simply build and run.

1. Start Clowder. For help starting Clowder, see our [getting started guide](https://github.com/clowder-framework/clowder/blob/develop/doc/src/sphinx/userguide/installing_clowder.rst).

2. First build the extractor Docker container:

```
# from this directory, run:

docker build -t clowder_wordcount .
```

3. Finally run the extractor:

```
docker run -t -i --rm --net clowder_clowder -e "RABBITMQ_URI=amqp://guest:guest@rabbitmq:5672/%2f" --name "wordcount" clowder_wordcount
```

Then open the Clowder web app and run the wordcount extractor on a .txt file (or similar)! Done.
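
For orientation, the metadata the sample wordcount extractor attaches boils down to `wc`-style line, word, and character counts. A minimal sketch of that counting step (the field names here are illustrative, not the extractor's exact schema):

```python
def wordcount(text: str) -> dict:
    """Count lines, words, and characters, wc-style."""
    return {
        "lines": len(text.splitlines()),
        "words": len(text.split()),
        "characters": len(text),
    }

print(wordcount("hello world\nsecond line"))
# → {'lines': 2, 'words': 4, 'characters': 23}
```

The real extractor wraps this kind of computation in a pyclowder process-message handler and posts the result back to Clowder as file metadata.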

### Python and Docker details

You may use any version of Python 3. Simply edit the first line of the `Dockerfile`; by default it uses `FROM python:3.8`.

Docker flags:

- `--net` links the extractor to the Clowder Docker network (run `docker network ls` to identify your own.)
- `-e RABBITMQ_URI=` sets the environment variable that controls which RabbitMQ server and exchange the extractor binds to. Setting `RABBITMQ_EXCHANGE` may also help.
- You can also use `--link` to link the extractor to a RabbitMQ container.
- `--name` assigns the container a name visible in Docker Desktop.

## Troubleshooting

**If you run into _any_ trouble**, please reach out on our Clowder Slack in the [#pyclowder channel](https://clowder-software.slack.com/archives/CNC2UVBCP).

Alternate methods of running extractors are below.

# Commandline Execution

To execute the extractor from the command line you will need to have the required packages installed. It is highly recommended to use a Python virtual environment for this. Create the virtual environment, activate it, then install the required packages.

```
virtualenv /home/clowder/virtualenv/wordcount
. /home/clowder/virtualenv/wordcount/bin/activate
pip install -r /home/clowder/extractors/wordcount/requirements.txt
```

To start the extractor, activate the virtual environment and run the script:

```
. /home/clowder/virtualenv/wordcount/bin/activate
/home/clowder/extractors/wordcount/wordcount.py
```

# Systemd Start

The example service file provided in sample-extractors will start the docker container at system startup. This can be used with CoreOS or RedHat systems to make sure the wordcount extractor starts when the machine comes online. It expects Docker to be installed.

All you need to do is copy clowder-wordcount.service to /etc/systemd/system, edit it to set the RabbitMQ parameters, and run the following commands:

```
systemctl enable clowder-wordcount.service
systemctl start clowder-wordcount.service
```

To see the log you can use:

```
journalctl -f -u clowder-wordcount.service
```

# Upstart

The example conf file provided in sample-extractors will start the extractor on an Ubuntu system. This assumes the system is set up for commandline execution, and makes the wordcount extractor start when the system boots. The extractor can be configured with the same environment variables as used in the docker container. Any console output will go to /var/log/upstart/wordcount.log.
31 changes: 31 additions & 0 deletions single-file-huggingface/extractor_info.json
{
"@context": "http://clowder.ncsa.illinois.edu/contexts/extractors.jsonld",
"name": "ncsa.SINGLEFILE-text-classifier",
"version": "2.1",
"description": "Huggingface extractor for sentiment classification of text files",
"author": "Kastan Day",
"contributors": [
"Luigi Marini"
],
"contexts": [
{
"predictions": "http://example.org"
}
],
"repository": [
{
"repType": "git",
"repUrl": "https://opensource.ncsa.illinois.edu/stash/scm/cats/pyclowder.git"
}
],
"process": {
"file": [
"text/*",
"application/json"
]
},
"max_retry": 1,
"external_services": [],
"dependencies": [],
"bibtex": []
}
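
The `process.file` list above tells Clowder which MIME types trigger this extractor; `text/*` is a glob over the whole type family. A sketch of that matching semantics (Clowder performs this routing server-side; `extractor_accepts` is an illustrative helper, not a pyclowder API):

```python
from fnmatch import fnmatch
import mimetypes

ACCEPTED = ["text/*", "application/json"]  # mirrors "process.file" above

def extractor_accepts(filename: str) -> bool:
    """Guess the file's MIME type and check it against the accepted globs."""
    mime, _ = mimetypes.guess_type(filename)
    return mime is not None and any(fnmatch(mime, pat) for pat in ACCEPTED)

print(extractor_accepts("notes.txt"))   # True  (text/plain matches text/*)
print(extractor_accepts("photo.png"))   # False (image/png matches nothing)
```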
8 changes: 8 additions & 0 deletions single-file-huggingface/requirements.txt
pyclowder==3.0.3
ray[default]==2.7.0
numpy
scipy
pillow
torch
torchvision
transformers