Commit edabd4a

1. Moved simple extractor out of pyclowder as part of sample extractors.
2. Started adding README for simple extractor.
3. Updated docker file for simple extractor.
1 parent 1d79563 commit edabd4a

9 files changed, 49 additions and 5 deletions

docker.sh

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ export DEBUG=${DEBUG:-""}
 ${DEBUG} docker build --tag clowder/pyclowder:latest .
 ${DEBUG} docker build --tag clowder/pyclowder:onbuild --file Dockerfile.onbuild .
 ${DEBUG} docker build --tag clowder/extractors-binary-preview:onbuild sample-extractors/binary-preview
+${DEBUG} docker build --tag clowder/extractors-simple-extractor:latest sample-extractors/simple-extractor
 
 # build sample extractors
 ${DEBUG} docker build --tag clowder/extractors-wordcount:latest sample-extractors/wordcount

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+FROM clowder/pyclowder:onbuild
+
+ENV EXTRACTION_FUNC=""
+ENV EXTRACTION_MODULE=""
+
+CMD python -c "from simple_extractor import SimpleExtractor; from EXTRACTION_MODULE import *; SimpleExtractor(${EXTRACTION_FUNC}).start()"
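
Review note on the `CMD` line above: in shell form, `${EXTRACTION_FUNC}` is expanded by the shell at container start, but `EXTRACTION_MODULE` has no `$` prefix, so Python receives the literal text `from EXTRACTION_MODULE import *`. One possible alternative, sketched here under assumptions, is to resolve both values inside Python with `importlib` rather than through shell interpolation. The helper name `resolve_extraction_func` is hypothetical and not part of pyclowder; only the two environment variable names come from this commit.

```python
import importlib
import os


def resolve_extraction_func(environ=None):
    """Look up the callable named by EXTRACTION_MODULE and EXTRACTION_FUNC.

    Hypothetical helper for illustration only; not part of pyclowder.
    """
    environ = os.environ if environ is None else environ
    module_name = environ.get("EXTRACTION_MODULE", "")
    func_name = environ.get("EXTRACTION_FUNC", "")
    if not module_name or not func_name:
        raise ValueError("EXTRACTION_MODULE and EXTRACTION_FUNC must both be set")

    # Import the module by name, then fetch the function from it.
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError("{}.{} is not callable".format(module_name, func_name))
    return func


if __name__ == "__main__":
    # A standard-library function stands in for user code in this demo.
    func = resolve_extraction_func(
        {"EXTRACTION_MODULE": "os.path", "EXTRACTION_FUNC": "basename"}
    )
    print(func("/tmp/example.txt"))
```

The returned callable could then be handed to `SimpleExtractor(...)` in the same way the `CMD` line passes `${EXTRACTION_FUNC}`. A missing or misspelled variable would fail with a clear error instead of an import failure inside `python -c`.
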
Lines changed: 38 additions & 0 deletions
@@ -0,0 +1,38 @@
+# Simple Extractor
+
+The goal of the simple extractor is to make writing an extractor as easy as possible. It wraps almost all of the
+complexities in itself and exposes only one environment variable called ```EXTRACTION_FUNC```. This environment
+variable needs to contain the name of the method that needs to be called when this extractor receives a message from
+the message broker.
+
+# When to Use This
+
+1. This simple extractor is meant to be used in those situations when there is already some Python code available that
+needs to be wrapped as an extractor as quickly as possible.
+2. This extractor ONLY generates JSON format metadata or a list of preview files. If your extractor generates
+any additional information like generated files, datasets, collections, thumbnails, etc., this method cannot be used and
+you need to write your extractor the normal way using [PyClowder2](https://opensource.ncsa.illinois.edu/bitbucket/projects/CATS/repos/pyclowder2/browse).
+3. [Docker](https://www.docker.com/) is the recommended way of developing / wrapping your code using the Simple Extractor.
+
+## Steps for Writing an Extractor Using the Simple Extractor
+
+To write an extractor using the Simple Extractor, you need to have your Python program available. The main function of
+this Python program needs to accept an input file path as its parameter. It needs to return a Python dictionary that
+can contain either metadata information ("metadata"), details about file previews ("previews") or both. For example:
+
+``` json
+{
+    "metadata": dict(),
+    "previews": array()
+}
+```
+
+1. Let's call your main Python program file ```your_python_program.py``` and the main function ```your_main_function```.
+
+2. Let's create a Dockerfile for your extractor. Its contents need to be:
+
+    FROM clowder/extractors-simple-extractor:latest
+    ENV EXTRACTION_FUNC="your_python_program.your_main_function"
+
+TODO: Complete this.
+
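
Review note: to make the contract described in the README concrete, here is a minimal sketch of a function the Simple Extractor could wrap. It assumes only what the README states: the function accepts an input file path and returns a dictionary with optional "metadata" and "previews" keys. The name `your_main_function` is the README's own placeholder. The fields inside "metadata" (`lines`, `words`, `characters`) are illustrative assumptions and do not reflect the contents of the repository's `wordcount.py`.

```python
import os
import tempfile


def your_main_function(input_file_path):
    """Count lines, words, and characters in a text file.

    Returns a dictionary in the shape the README describes. The field
    names inside "metadata" are assumptions made for this example.
    """
    lines = words = characters = 0
    with open(input_file_path, "r") as handle:
        for line in handle:
            lines += 1
            words += len(line.split())
            characters += len(line)
    return {
        "metadata": {"lines": lines, "words": words, "characters": characters},
        "previews": [],  # no preview files are produced in this sketch
    }


if __name__ == "__main__":
    # Write a small temporary file and run the function on it.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("hello world\nsecond line here\n")
    try:
        print(your_main_function(tmp.name))
    finally:
        os.remove(tmp.name)
```

Saved as `your_python_program.py`, a function like this would match the naming used in step 1 of the README.
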
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+

pyclowder/simpleextractor.py renamed to sample-extractors/simple-extractor/simple_extractor.py

File renamed without changes.
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+FROM clowder/extractors-simple-extractor:latest
+
+ENV EXTRACTION_FUNC="wordcount"

sample-extractors/wordcount-simpleextractor/extractor_info.json renamed to sample-extractors/wordcount-simple-extractor/extractor_info.json

File renamed without changes.

sample-extractors/wordcount-simpleextractor/wordcount.py renamed to sample-extractors/wordcount-simple-extractor/wordcount.py

File renamed without changes.

sample-extractors/wordcount-simpleextractor/Dockerfile

Lines changed: 0 additions & 5 deletions
This file was deleted.
