61 changes: 61 additions & 0 deletions codeflare_helpers/landsat_workflow/Dockerfile

### USAGE ###

#### FIRST: BUILD DOCKER (off HPC) ####
# 1. MUST be x64
# export DOCKER_DEFAULT_PLATFORM=linux/amd64
# 2.
# docker build -t kastanday/landsattrend2 .
# 3.
# docker push kastanday/landsattrend2:latest

#### SECOND: Pull Apptainer (on HPC) ####
# 1. Pull and convert the image to a local .sif apptainer/singularity file.
#    apptainer pull docker://kastanday/landsattrend2:latest
# WARNING: Fast internet recommended. This takes a long time and uses significant disk space.

# use NGC images when ML is needed: FROM nvcr.io/nvidia/tensorflow:19.10-py3
FROM python:3.8

LABEL name="landsattrend2"
LABEL summary="Build a Docker / Apptainer image for landsattrend2."
LABEL URL="https://github.com/initze/landsattrend/blob/dev4Clowder_Ingmar_deployed_delta"
LABEL maintainer="Kastan Day"
LABEL architecture="x86_64"
LABEL version="0.1"
LABEL release-date="2023-06-30"

WORKDIR /app

# necessary for cv2 https://exerror.com/importerror-libgl-so-1-cannot-open-shared-object-file-no-such-file-or-directory/
# and gdal
RUN apt-get update && apt-get install -y \
curl \
wget \
build-essential \
ffmpeg \
libsm6 \
libxext6 \
gdal-bin \
libgdal-dev \
&& apt-get clean

# These env vars are untested and probably unnecessary; keep them if GDAL builds fail to find headers.
# (May need to be ARG instead of ENV.)
ENV CPLUS_INCLUDE_PATH=/usr/include/gdal
ENV C_INCLUDE_PATH=/usr/include/gdal

RUN python3 -m pip install --upgrade pip
RUN pip install \
bottleneck \
fiona==1.8.20 \
gdal==3.3.2 \
geopandas==0.9.0 \
googledrivedownloader==0.4 \
jupyter \
notebook \
pycaret \
rasterio==1.2 \
scikit-image==0.19.1 \
netCDF4 \
pyclowder
6 changes: 6 additions & 0 deletions codeflare_helpers/landsat_workflow/landsat_workflow.md
## Run Landsat Workflow
=== "Run Landsat Workflow"
Run Ingmar Nitze's Landsat workflow, defined here: <https://github.com/initze/landsattrend>
```shell
bash ./CodeFlare-Extractors/codeflare_helpers/landsat_workflow/run_landsat_workflow.sh
```
42 changes: 42 additions & 0 deletions codeflare_helpers/landsat_workflow/run_landsat_workflow.sh
#!/bin/bash

# Build and push docker image
docker build -t kastanday/landsattrend2 --platform linux/amd64 CodeFlare-Extractors/codeflare_helpers/landsat_workflow
docker push kastanday/landsattrend2:latest

# ssh into remote client and execute commands remotely
# ssh [email protected] 'bash -s' << EOF
ssh [email protected] 'zsh -s' << EOF

source ~/.zshrc

apptainer pull docker://kastanday/landsattrend2:latest

# or use conda (instead of apptainer)
# conda activate landsattrend2

### Setup Ingmar's github ###
mkdir -p ~/codeflare_utils/landsat_workflow
cd ~/codeflare_utils/landsat_workflow

if [ ! -d "landsattrend" ]; then
git clone [email protected]:initze/landsattrend.git
fi

cd landsattrend
git checkout dev4Clowder_Ingmar_deployed_delta

### Run code ###
# step 1: Google Earth to Google Cloud Storage.

# step 2: GCS to HPC.

# step 4a - upload results to clowder.
bash ~/codeflare_utils/landsat_workflow/landsattrend/import_export/upload_region_output.sh https://pdg.clowderframework.org/ 981ab4c8-7d22-418d-93a2-b47019c2f583 ALASKA /scratch/bbou/toddn/landsat-delta/landsattrend/process 649232e2e4b00aa1838f0fc2
echo "Completed Step 4a: 'upload_region_output.sh'"

# step 4b - upload results to clowder.
bash ~/codeflare_utils/landsat_workflow/landsattrend/import_export/upload_region.sh https://pdg.clowderframework.org/ 981ab4c8-7d22-418d-93a2-b47019c2f583 ALASKA /scratch/bbou/toddn/landsat-delta/landsattrend/process 649232e2e4b00aa1838f0fc2
echo "Completed Step 4b: 'upload_input_regions'"

EOF
13 changes: 13 additions & 0 deletions codeflare_helpers/landsat_workflow/start_ray.sh
#!/bin/bash

# ssh into remote client and execute commands remotely
ssh [email protected] 'zsh -s' << EOF
source ~/.zshrc
conda activate ray_py38
ray start --head
echo "Ray started, exiting."
EOF

# port forward from local to NCSA.
# ssh -l kastanday \
# -L localhost:8265:cn005.delta.internal.ncsa.edu:8265 dt-login02.delta.ncsa.illinois.edu
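
The commented tunnel above can equivalently live in `~/.ssh/config` (a sketch; the `delta-ray` host alias is an assumption, and the compute-node name must match your allocation):

```
Host delta-ray
    HostName dt-login02.delta.ncsa.illinois.edu
    User kastanday
    # Forward the Ray dashboard (port 8265) from the compute node to localhost
    LocalForward 8265 cn005.delta.internal.ncsa.edu:8265
```

With this in place, `ssh delta-ray` opens the dashboard at http://localhost:8265.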
13 changes: 13 additions & 0 deletions index.md
# Launch script (docker compose up)
bash ./CodeFlare-Extractors/codeflare_helpers/launch_clowder.sh
```
=== "🛰️ Run Landsat Analysis Workflow"
Google Cloud Storage → Clowder → HPC Jobs → Results to Clowder. Created by Ingmar Nitze, defined here: <https://github.com/initze/landsattrend>

```shell
bash ./CodeFlare-Extractors/codeflare_helpers/launch_clowder.sh
```

```shell
bash ./CodeFlare-Extractors/codeflare_helpers/landsat_workflow/run_landsat_workflow.sh
```

<!-- :import{CodeFlare-Extractors/codeflare_helpers/landsat_workflow/landsat_workflow.md} -->


=== "↔️ Move in and out of Clowder"
You can import data from anywhere, and export data to anywhere. Just select a source, then a destination. First select your source files, then your destination location.
23 changes: 23 additions & 0 deletions single-file-huggingface/Dockerfile
## USAGE:
# 1. Build
# docker build -t kastanday/huggingface-single-file-extractor .
# 2. Run (with clowder in full development mode.)
# docker run -t -i --rm --net clowder2_clowder2 \
# -e CLOWDER_URL=http://host.docker.internal:80/ \
# -e "CLOWDER_VERSION=2" \
# -e RABBITMQ_URI="amqp://guest:guest@clowder2-rabbitmq-1:5672/%2F" \
# --shm-size=2.17gb \
# --name huggingface-extractor-aarch64-4 \
# kastanday/huggingface-single-file-extractor

FROM python:3.8

WORKDIR /extractor
COPY requirements.txt ./
RUN apt-get update && apt-get --yes install libsndfile1
RUN pip install -r requirements.txt

# Use ENV so the locale persists into the final image (RUN export does not).
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
COPY single_file_huggingface.py extractor_info.json ./
CMD python3 -u single_file_huggingface.py --max-retry 1 --heartbeat 5 --connector RabbitMQ
78 changes: 78 additions & 0 deletions single-file-huggingface/README.md
# This extractor is best run via CodeFlare; see the top-level README for more.

# Manual Docker (no CodeFlare)

This extractor is ready to run as a Docker container; the only dependency is a running Clowder instance. Simply build and run.

1. Start Clowder. For help starting Clowder, see our [getting started guide](https://github.com/clowder-framework/clowder/blob/develop/doc/src/sphinx/userguide/installing_clowder.rst).

2. First build the extractor Docker container:

```
# from this directory, run:

docker build -t clowder_wordcount .
```

3. Finally run the extractor:

```
docker run -t -i --rm --net clowder_clowder -e "RABBITMQ_URI=amqp://guest:guest@rabbitmq:5672/%2f" --name "wordcount" clowder_wordcount
```

Then open the Clowder web app and run the wordcount extractor on a .txt file (or similar)! Done.
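
For orientation, the metadata the sample wordcount extractor attaches boils down to `wc`-style line, word, and character counts. A minimal sketch of that counting step (the field names here are illustrative, not the extractor's exact schema):

```python
def wordcount(text: str) -> dict:
    """Count lines, words, and characters, wc-style."""
    return {
        "lines": len(text.splitlines()),
        "words": len(text.split()),
        "characters": len(text),
    }

print(wordcount("hello world\nsecond line"))
# → {'lines': 2, 'words': 4, 'characters': 23}
```

The real extractor wraps this kind of computation in a pyclowder process-message handler and posts the result back to Clowder as file metadata.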

### Python and Docker details

You may use any version of Python 3. Simply edit the first line of the `Dockerfile`; by default it uses `FROM python:3.8`.

Docker flags:

- `--net` links the extractor to the Clowder Docker network (run `docker network ls` to identify your own.)
- `-e RABBITMQ_URI=` sets the environment variable that controls which RabbitMQ server and exchange the extractor binds to. Setting `RABBITMQ_EXCHANGE` may also help.
- You can also use `--link` to link the extractor to a RabbitMQ container.
- `--name` assigns the container a name visible in Docker Desktop.

## Troubleshooting

**If you run into _any_ trouble**, please reach out on our Clowder Slack in the [#pyclowder channel](https://clowder-software.slack.com/archives/CNC2UVBCP).

Alternate methods of running extractors are below.

# Commandline Execution

To execute the extractor from the command line you will need to have the required packages installed. It is highly recommended to use a Python virtual environment for this. Create the virtual environment, activate it, then install the required packages.

```
virtualenv /home/clowder/virtualenv/wordcount
. /home/clowder/virtualenv/wordcount/bin/activate
pip install -r /home/clowder/extractors/wordcount/requirements.txt
```

To start the extractor, activate the virtual environment and run the script:

```
. /home/clowder/virtualenv/wordcount/bin/activate
/home/clowder/extractors/wordcount/wordcount.py
```

# Systemd Start

The example service file provided in sample-extractors will start the docker container at system startup. This can be used with CoreOS or RedHat systems to make sure the wordcount extractor starts when the machine comes online. It expects Docker to be installed.

All you need to do is copy clowder-wordcount.service to /etc/systemd/system, edit it to set the RabbitMQ parameters, and run the following commands:

```
systemctl enable clowder-wordcount.service
systemctl start clowder-wordcount.service
```

To see the log you can use:

```
journalctl -f -u clowder-wordcount.service
```

# Upstart

The example conf file provided in sample-extractors will start the extractor on an Ubuntu system. This assumes the system is set up for commandline execution, and makes the wordcount extractor start when the system boots. The extractor can be configured with the same environment variables as used in the docker container. Any console output will go to /var/log/upstart/wordcount.log.
31 changes: 31 additions & 0 deletions single-file-huggingface/extractor_info.json
{
"@context": "http://clowder.ncsa.illinois.edu/contexts/extractors.jsonld",
"name": "ncsa.SINGLEFILE-text-classifier",
"version": "2.1",
"description": "Huggingface extractor for sentiment classification of text files",
"author": "Kastan Day",
"contributors": [
"Luigi Marini"
],
"contexts": [
{
"predictions": "http://example.org"
}
],
"repository": [
{
"repType": "git",
"repUrl": "https://opensource.ncsa.illinois.edu/stash/scm/cats/pyclowder.git"
}
],
"process": {
"file": [
"text/*",
"application/json"
]
},
"max_retry": 1,
"external_services": [],
"dependencies": [],
"bibtex": []
}
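
The `process.file` list above tells Clowder which MIME types trigger this extractor; `text/*` is a glob over the whole type family. A sketch of that matching semantics (Clowder performs this routing server-side; `extractor_accepts` is an illustrative helper, not a pyclowder API):

```python
from fnmatch import fnmatch
import mimetypes

ACCEPTED = ["text/*", "application/json"]  # mirrors "process.file" above

def extractor_accepts(filename: str) -> bool:
    """Guess the file's MIME type and check it against the accepted globs."""
    mime, _ = mimetypes.guess_type(filename)
    return mime is not None and any(fnmatch(mime, pat) for pat in ACCEPTED)

print(extractor_accepts("notes.txt"))   # True  (text/plain matches text/*)
print(extractor_accepts("photo.png"))   # False (image/png matches nothing)
```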
8 changes: 8 additions & 0 deletions single-file-huggingface/requirements.txt
pyclowder==3.0.3
ray[default]==2.7.0
numpy
scipy
pillow
torch
torchvision
transformers