Added quickstart example with new, isolated, Dockerfile

KastanDay · KastanDay · commit f9a29aa78061 · 2022-01-27T21:52:05.000-06:00
diff --git a/README.md b/README.md
@@ -25,13 +25,18 @@ git clone https://github.com/clowder-framework/pyclowder.git
 cd pyclowder
 pip install -r requirements.txt
 python setup.py install
-
 ```
+
 or directly from GitHub:
+
 ```
 pip install -r https://raw.githubusercontent.com/clowder-framework/pyclowder/master/requirements.txt git+https://github.com/clowder-framework/pyclowder.git
 ```
 
+## Quickstart example
+
+See the [README](https://github.com/clowder-framework/pyclowder/tree/master/sample-extractors/wordcount#readme) in `sample-extractors/wordcount`. Using Docker, no install is required.
+
 ## Example Extractor
 
 Following is an example of the WordCount extractor. This example will allow the user to specify from the command line
@@ -157,7 +162,7 @@ extractor_info.json, and instead bind only by extractor name. Assuming no other
 extractor instance will then only be triggered via manual or direct messages (i.e. using extractor name), and not by
 upload events in Clowder.
 
-Note however that if any other instances of the extractor are running on the same RabbitMQ queue without --no-bind, 
+Note however that if any other instances of the extractor are running on the same RabbitMQ queue without --no-bind,
 they will still bind by file type as normal regardless of previously existing instances with --no-bind, so use caution
 when running multiple instances of one extractor while using --no-bind.
 
@@ -174,8 +179,8 @@ process_message.
 The RabbitMQ connector connects to a RabbitMQ instance, creates a queue and binds itself to that queue. Any message in
 the queue will be fetched and passed to the check_message and process_message. This connector takes three parameters:
 
-* rabbitmq_uri [REQUIRED] : the uri of the RabbitMQ server
-* rabbitmq_exchange [OPTIONAL] : the exchange to which to bind the queue
+- rabbitmq_uri [REQUIRED] : the uri of the RabbitMQ server
+- rabbitmq_exchange [OPTIONAL] : the exchange to which to bind the queue
 
 ## HPCConnector
 
@@ -184,18 +189,18 @@ Once all pickle files are processed the extractor will stop. The pickle file is
 argument, the logfile that is being monitored to send feedback back to clowder. This connector takes a single argument
 (which can be list):
 
-* picklefile [REQUIRED] : a single file, or list of files that are the pickled messages to be processed.
+- picklefile [REQUIRED] : a single file, or list of files that are the pickled messages to be processed.
 
 ## LocalConnector
 
-The Local connector will execute an extractor as a standalone program. This can be used to process files that are 
-present in a local hard drive. After extracting the metadata, it stores the generated metadata in an output file in the 
+The Local connector will execute an extractor as a standalone program. This can be used to process files that are
+present in a local hard drive. After extracting the metadata, it stores the generated metadata in an output file in the
 local drive. This connector takes two arguments:
 
-* --input-file-path [REQUIRED] : Full path of the local input file that needs to be processed.
-* --output-file-path [OPTIONAL] : Full path of the output file (.json) to store the generated metadata. If no output 
-file path is provided, it will create a new file with the name <input_file_with_extension>.json in the same directory 
-as that of the input file.
+- --input-file-path [REQUIRED] : Full path of the local input file that needs to be processed.
+- --output-file-path [OPTIONAL] : Full path of the output file (.json) to store the generated metadata. If no output
+  file path is provided, it will create a new file with the name <input_file_with_extension>.json in the same directory
+  as that of the input file.
 
 # Clowder API wrappers
 
@@ -250,49 +255,53 @@ COPY <MY.CODE>.py extractor_info.json /home/clowder/
 # Command to be run when container is run
 CMD python3 <MY.CODE>.py
 ```
+
 ## SimpleExtractor
+
 Motivation: the Simple Extractor lets Python developers wrap their existing Python code as a Clowder extractor with little effort and without detailed knowledge of the PyClowder library.
 
 A simple extractor takes a developer-defined main function as its input parameter and runs it to do the extraction. It then parses the function's output, packs it into the metadata structure defined by the Simple Extractor, and submits it back to Clowder.
 
 The user's function must return a `dict` object containing metadata and previews.
+
 ```markdown
 result = {
-  'metadata': {},
-  'previews': [
-      'filename',
-      {'file': 'filename'},
-      {'file': 'filename', 'metadata': {}, 'mimetype': 'image/jpeg'}
-  ]}
+    'metadata': {},
+    'previews': [
+        'filename',
+        {'file': 'filename'},
+        {'file': 'filename', 'metadata': {}, 'mimetype': 'image/jpeg'}
+    ]}
 ```
 
-### Example: 
+### Example:
+
 `wordcount-simpleextractor` is the simplest example to illustrate how to wrap existing Python code as a Simple Extractor.
 
 `wordcount.py` is a regular Python file written by the developer. In it, `wordcount` invokes the `wc` command on the input file to count lines, words, and characters, then packs the counts into a Python dict.
+
 ```markdown
 import subprocess
-  
-def wordcount(input_file):
-    result = subprocess.check_output(['wc', input_file], stderr=subprocess.STDOUT)
-    (lines, words, characters, _) = result.split()
-    metadata = {
-        'lines': lines,
-        'words': words,
-        'characters': characters
-    }
-    result = {
-        'metadata': metadata
-    }
-    return result
+
+def wordcount(input_file):
+    result = subprocess.check_output(['wc', input_file], stderr=subprocess.STDOUT)
+    (lines, words, characters, _) = result.split()
+    metadata = {
+        'lines': lines,
+        'words': words,
+        'characters': characters
+    }
+    result = {
+        'metadata': metadata
+    }
+    return result
 ```
 
 To build wordcount as a Simple Extractor Docker image, users only need to set two environment variables in the Dockerfile shown below. `EXTRACTION_FUNC` names the extraction function, which in wordcount.py is `wordcount`. `EXTRACTION_MODULE` names the module file that defines that function.
+
 ```markdown
 FROM clowder/extractors-simple-extractor:onbuild
 
 ENV EXTRACTION_FUNC="wordcount"
 ENV EXTRACTION_MODULE="wordcount"
 ```
-
-
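The `wordcount` example in the README hunk above stores whatever `result.split()` returns. Under Python 3, `subprocess.check_output` returns `bytes`, so those values would be `bytes` objects rather than numbers. The following is a minimal sketch, not the project's implementation, of one way to decode and convert them; the helper name `parse_wc_output` is hypothetical and introduced only for illustration.

```python
import subprocess


def parse_wc_output(raw):
    """Turn the bytes printed by `wc <file>` into a metadata dict of ints.

    Hypothetical helper, not part of pyclowder.
    """
    # By default `wc` prints: <lines> <words> <bytes> <filename>.
    # Split at most 3 times so a filename containing spaces stays whole.
    lines, words, characters, _filename = raw.decode().split(None, 3)
    return {
        'lines': int(lines),
        'words': int(words),
        'characters': int(characters),
    }


def wordcount(input_file):
    """Return the same result shape as the README example."""
    raw = subprocess.check_output(['wc', input_file], stderr=subprocess.STDOUT)
    return {'metadata': parse_wc_output(raw)}
```

Keeping the parsing separate from the `subprocess` call means the conversion can be checked without a real file or a `wc` binary.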
diff --git a/sample-extractors/wordcount/Dockerfile b/sample-extractors/wordcount/Dockerfile
@@ -1,4 +1,8 @@
-ARG PYCLOWDER_PYTHON=""
-FROM clowder/pyclowder${PYCLOWDER_PYTHON}:onbuild
+FROM python:3.8
 
-ENV MAIN_SCRIPT="wordcount.py"
+WORKDIR /extractor
+COPY requirements.txt ./
+RUN pip install -r requirements.txt
+
+COPY wordcount.py extractor_info.json ./
+CMD python wordcount.py
diff --git a/sample-extractors/wordcount/README.md b/sample-extractors/wordcount/README.md
@@ -2,20 +2,38 @@ A simple extractor that counts the number of characters, words and lines in a te
 
 # Docker
 
-This extractor is ready to be run as a docker container. To build the docker container run:
+This extractor is ready to be run as a Docker container; the only dependency is a running Clowder instance. Simply build and run.
+
+1. Start Clowder. For help starting Clowder, see our [getting started guide](https://github.com/clowder-framework/clowder/blob/develop/doc/src/sphinx/userguide/installing_clowder.rst).
+
+2. Build the extractor Docker image:
 
 ```
+# from this directory, run:
+
 docker build -t clowder_wordcount .
 ```
 
-To run the docker containers use:
+3. Finally run the extractor:
 
 ```
-docker run -t -i --rm -e "RABBITMQ_URI=amqp://rabbitmqserver/clowder" clowder_wordcount
-docker run -t -i --rm --link clowder_rabbitmq_1:rabbitmq clowder_wordcount
+docker run -t -i --rm --net clowder_clowder -e "RABBITMQ_URI=amqp://guest:guest@rabbitmq:5672/%2f" --name "wordcount" clowder_wordcount
 ```
 
-The RABBITMQ_URI and RABBITMQ_EXCHANGE environment variables can be used to control what RabbitMQ server and exchange it will bind itself to, you can also use the --link option to link the extractor to a RabbitMQ container.
+Then open the Clowder web app and run the wordcount extractor on a .txt file (or similar)! Done.
+
+### Details
+
+- `--net` links the extractor to the Clowder Docker network (run `docker network ls` to identify your own).
+- `-e RABBITMQ_URI=` sets the environment variable that controls which RabbitMQ server the extractor connects to. Setting `RABBITMQ_EXCHANGE` controls which exchange it binds to.
+  - You can also use `--link` to link the extractor to a RabbitMQ container.
+- `--name` assigns the container a name visible in Docker Desktop.
+
+## Troubleshooting
+
+**If you run into _any_ trouble**, please reach out on our Clowder Slack in the [#pyclowder channel](https://clowder-software.slack.com/archives/CNC2UVBCP).
+
+Alternate methods of running extractors are below.
 
 # Commandline Execution
 
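The `RABBITMQ_URI` value in the `docker run` command above packs several settings into one string. As a rough illustration using only the Python standard library (not pyclowder's own parsing), it can be taken apart as follows. The function name `describe_amqp_uri` is hypothetical, and the host name `rabbitmq` is assumed to be the RabbitMQ service name on the Clowder Docker network.

```python
from urllib.parse import urlsplit, unquote


def describe_amqp_uri(uri):
    """Split an AMQP URI into its parts (hypothetical helper for illustration)."""
    parts = urlsplit(uri)
    return {
        'user': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
        # The path is "/<vhost>": drop the leading slash, then percent-decode.
        'vhost': unquote(parts.path[1:]),
    }


info = describe_amqp_uri("amqp://guest:guest@rabbitmq:5672/%2f")
print(info)
```

Here `%2f` is the percent-encoded form of `/`, so the URI selects the virtual host named `/`, which is RabbitMQ's default.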
diff --git a/sample-extractors/wordcount/requirements.txt b/sample-extractors/wordcount/requirements.txt
@@ -0,0 +1 @@
+pyclowder==2.4.0