Motivation: design and implement a Simple Extractor that bridges the gap between Python developers and the PyClowder library, so that developers can wrap their existing Python code as Clowder extractors with little effort.
A Simple Extractor takes a developer-defined main function as its input parameter and calls it to perform the extraction. It then parses the function's output, packs it into the metadata data structure defined by the Simple Extractor, and submits the result back to Clowder.
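The flow described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not PyClowder's implementation: `run_simple_extraction` is a hypothetical name, and the `submit_metadata` / `submit_preview` callbacks are hypothetical stand-ins for whatever actually uploads results to Clowder. It also assumes previews are returned as a list of file paths.

```python
from typing import Any, Callable, Dict


def run_simple_extraction(
    extraction_func: Callable[[str], Dict[str, Any]],
    input_file: str,
    submit_metadata: Callable[[Dict[str, Any]], None],
    submit_preview: Callable[[str], None],
) -> None:
    """Call the developer's function and route its output (hypothetical sketch).

    submit_metadata and submit_preview stand in for whatever actually sends
    results back to Clowder; they are not PyClowder APIs.
    """
    result = extraction_func(input_file)
    if not isinstance(result, dict):
        raise TypeError("extraction function must return a dict")

    # Forward the metadata dict unchanged if the function produced one.
    if "metadata" in result:
        submit_metadata(result["metadata"])

    # Previews are assumed to be a list of file paths, each submitted separately.
    for preview in result.get("previews", []):
        submit_preview(preview)


if __name__ == "__main__":
    # Usage with a toy extraction function and print() as the "submitter".
    def toy_extraction(path: str) -> Dict[str, Any]:
        return {"metadata": {"source": path}, "previews": ["thumb.png"]}

    run_simple_extraction(toy_extraction, "input.txt", print, print)
```

Keeping the upload steps as injected callables keeps the developer's function free of any Clowder-specific code, which is the point of the Simple Extractor.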
The user's function must return a `dict` object containing metadata and previews.
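One way to check that a function honors this contract before packaging it is to validate its return value. The sketch below assumes that `metadata` maps to a `dict`, that `previews` maps to a list of preview file paths, that both keys are optional (the wordcount example returns only `metadata`), and that any other key is an error. The helper name `validate_extraction_result` is hypothetical and not part of PyClowder.

```python
from typing import Any

# Assumed set of keys a Simple Extractor reads from the returned dict.
ALLOWED_KEYS = {"metadata", "previews"}


def validate_extraction_result(result: Any) -> None:
    """Raise ValueError if result does not match the assumed return shape.

    Assumed shape (hypothetical): {"metadata": dict, "previews": list of str},
    with both keys optional.
    """
    if not isinstance(result, dict):
        raise ValueError("result must be a dict, got %s" % type(result).__name__)

    unknown = set(result) - ALLOWED_KEYS
    if unknown:
        raise ValueError("unexpected keys: %s" % ", ".join(sorted(str(k) for k in unknown)))

    if "metadata" in result and not isinstance(result["metadata"], dict):
        raise ValueError("'metadata' must be a dict")

    previews = result.get("previews", [])
    if not isinstance(previews, list) or not all(isinstance(p, str) for p in previews):
        raise ValueError("'previews' must be a list of file paths")
```

Calling it on the function's output in a quick local test catches shape mistakes before they surface inside a running extractor.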
`wordcount-simpleextractor` is the simplest example to illustrate how to wrap existing Python code as a Simple Extractor.
`wordcount.py` is a regular Python file written and provided by the developer. The `wordcount` function invokes the `wc` command on the input file to extract its line, word, and character counts, and packs them as metadata into a Python `dict`.
```python
import subprocess


def wordcount(input_file):
    # Run `wc` on the input file; output looks like "  3  12  80 input.txt".
    # universal_newlines=True returns str instead of bytes on Python 3.
    result = subprocess.check_output(['wc', input_file], stderr=subprocess.STDOUT,
                                     universal_newlines=True)
    # Keep the three counts and drop the trailing file name (which may contain spaces).
    (lines, words, characters) = result.split()[:3]
    metadata = {
        'lines': lines,
        'words': words,
        'characters': characters
    }
    # The Simple Extractor reads this dict; this example returns metadata only.
    result = {
        'metadata': metadata
    }
    return result
```
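Run without options, `wc` prints newline, word, and byte counts, so the `characters` value above is really a byte count, and the function depends on `wc` being available in the image. If that dependency is unwanted, a pure-Python variant can return the same dictionary shape. The sketch below is a hypothetical alternative (`wordcount_pure` is not part of the example); it splits words on ASCII whitespace, which approximates `wc` but may differ for some non-ASCII whitespace.

```python
def wordcount_pure(input_file):
    """Hypothetical pure-Python stand-in for the wc-based wordcount()."""
    # Read raw bytes so the "characters" count matches wc's byte count.
    with open(input_file, "rb") as f:
        data = f.read()
    metadata = {
        # wc counts newline characters.
        "lines": str(data.count(b"\n")),
        # Words are runs of non-whitespace bytes.
        "words": str(len(data.split())),
        # wc's default third column is the number of bytes.
        "characters": str(len(data)),
    }
    return {"metadata": metadata}
```

The counts are returned as strings so the output matches the values the `wc`-based version produces.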
To build wordcount as a Simple Extractor Docker image, users only need to set two environment variables in the Dockerfile shown below. `EXTRACTION_FUNC` must be set to the name of the extraction function; in `wordcount.py`, that function is `wordcount`. `EXTRACTION_MODULE` is the name of the module file that contains the definition of the extraction function.
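One plausible way for the base image to use these two variables is to import the module by name and look up the function on it. The sketch below assumes that behavior; `load_extraction_function` is a hypothetical helper, not a documented part of the Simple Extractor, and it assumes `EXTRACTION_MODULE` holds an importable module name (for example without the `.py` suffix) rather than a file path. It uses only the standard library's `importlib`.

```python
import importlib
import os
from typing import Callable, Mapping, Optional


def load_extraction_function(env: Optional[Mapping[str, str]] = None) -> Callable:
    """Hypothetical loader: resolve EXTRACTION_MODULE / EXTRACTION_FUNC to a callable.

    Assumes EXTRACTION_MODULE is an importable module name on the Python path.
    """
    env = os.environ if env is None else env
    module_name = env.get("EXTRACTION_MODULE")
    func_name = env.get("EXTRACTION_FUNC")
    if not module_name or not func_name:
        raise RuntimeError("EXTRACTION_MODULE and EXTRACTION_FUNC must both be set")

    module = importlib.import_module(module_name)
    func = getattr(module, func_name)  # AttributeError if the name is missing
    if not callable(func):
        raise TypeError("%s.%s is not callable" % (module_name, func_name))
    return func
```

Inside a container, `load_extraction_function()` would read the real environment; passing an explicit mapping makes the lookup easy to test outside Docker.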