Commit 9f7f399

Reverted formatting to original. Only adds small 'quickstart' section
1 parent f9a29aa commit 9f7f399

1 file changed

+32
-38
lines changed

README.md

Lines changed: 32 additions & 38 deletions
Original file line number | Diff line number | Diff line change
@@ -26,9 +26,7 @@ cd pyclowder
2626
pip install -r requirements.txt
2727
python setup.py install
2828
```
29-
3029
or directly from GitHub:
31-
3230
```
3331
pip install -r https://raw.githubusercontent.com/clowder-framework/pyclowder/master/requirements.txt git+https://github.com/clowder-framework/pyclowder.git
3432
```
@@ -162,7 +160,7 @@ extractor_info.json, and instead bind only by extractor name. Assuming no other
162160
extractor instance will then only be triggered via manual or direct messages (i.e. using extractor name), and not by
163161
upload events in Clowder.
164162

165-
Note however that if any other instances of the extractor are running on the same RabbitMQ queue without --no-bind,
163+
Note however that if any other instances of the extractor are running on the same RabbitMQ queue without --no-bind,
166164
they will still bind by file type as normal regardless of previously existing instances with --no-bind, so use caution
167165
when running multiple instances of one extractor while using --no-bind.
168166

@@ -179,8 +177,8 @@ process_message.
179177
The RabbitMQ connector connects to a RabbitMQ instance, creates a queue and binds itself to that queue. Any message in
180178
the queue will be fetched and passed to the check_message and process_message. This connector takes three parameters:
181179

182-
- rabbitmq_uri [REQUIRED] : the uri of the RabbitMQ server
183-
- rabbitmq_exchange [OPTIONAL] : the exchange to which to bind the queue
180+
* rabbitmq_uri [REQUIRED] : the uri of the RabbitMQ server
181+
* rabbitmq_exchange [OPTIONAL] : the exchange to which to bind the queue
184182

185183
## HPCConnector
186184

@@ -189,18 +187,18 @@ Once all pickle files are processed the extractor will stop. The pickle file is
189187
argument, the logfile that is being monitored to send feedback back to clowder. This connector takes a single argument
190188
(which can be a list):
191189

192-
- picklefile [REQUIRED] : a single file, or list of files that are the pickled messages to be processed.
190+
* picklefile [REQUIRED] : a single file, or list of files that are the pickled messages to be processed.
193191

194192
## LocalConnector
195193

196-
The Local connector will execute an extractor as a standalone program. This can be used to process files that are
197-
present in a local hard drive. After extracting the metadata, it stores the generated metadata in an output file in the
194+
The Local connector will execute an extractor as a standalone program. This can be used to process files that are
195+
present in a local hard drive. After extracting the metadata, it stores the generated metadata in an output file in the
198196
local drive. This connector takes two arguments:
199197

200-
- --input-file-path [REQUIRED] : Full path of the local input file that needs to be processed.
201-
- --output-file-path [OPTIONAL] : Full path of the output file (.json) to store the generated metadata. If no output
202-
file path is provided, it will create a new file with the name <input_file_with_extension>.json in the same directory
203-
as that of the input file.
198+
* --input-file-path [REQUIRED] : Full path of the local input file that needs to be processed.
199+
* --output-file-path [OPTIONAL] : Full path of the output file (.json) to store the generated metadata. If no output
200+
file path is provided, it will create a new file with the name <input_file_with_extension>.json in the same directory
201+
as that of the input file.
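
The default naming rule for the output file can be sketched as follows. This is a minimal illustration of the rule described above, not PyClowder's own code; `default_output_path` is a hypothetical helper name.

```python
import os


def default_output_path(input_file_path):
    """Hypothetical helper: derive <input_file_with_extension>.json
    in the same directory as the input file."""
    directory, filename = os.path.split(os.path.abspath(input_file_path))
    # Keep the original extension and append ".json" to it.
    return os.path.join(directory, filename + ".json")
```

For an input of `/data/sample.txt`, this sketch yields `/data/sample.txt.json`.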
204202

205203
# Clowder API wrappers
206204

@@ -255,53 +253,49 @@ COPY <MY.CODE>.py extractor_info.json /home/clowder/
255253
# Command to be run when container is run
256254
CMD python3 <MY.CODE>.py
257255
```
258-
259256
## SimpleExtractor
260-
261257
Motivation: design and implement a simple extractor that lets Python developers wrap their code as a Clowder extractor with little effort and without detailed knowledge of the PyClowder library.
262258

263259
Simple extractors take a developer-defined main function as an input parameter to perform the extraction, then parse and pack that function's output into the metadata data structure defined by the Simple extractor and submit it back to Clowder.
264260

265261
The user's function must return a `dict` object containing metadata and previews.
266-
267262
```markdown
268263
result = {
269-
'metadata': {},
270-
'previews': [
271-
'filename',
272-
{'file': 'filename'},
273-
{'file': 'filename', 'metadata': {}, 'mimetype': 'image/jpeg'}
274-
]}
264+
    'metadata': {},
265+
    'previews': [
266+
        'filename',
267+
        {'file': 'filename'},
268+
        {'file': 'filename', 'metadata': {}, 'mimetype': 'image/jpeg'}
269+
    ]}
275270
```
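
The three preview forms shown above (a bare filename, a dict with only `file`, and a dict with `file`, `metadata`, and `mimetype`) could be brought into one shape before further processing. The sketch below is an assumption about how a consumer might do that; `normalize_previews` is a hypothetical name and is not part of PyClowder.

```python
def normalize_previews(previews):
    """Hypothetical helper: turn every preview entry into a dict with a 'file' key."""
    normalized = []
    for entry in previews:
        if isinstance(entry, str):
            # A bare filename becomes {'file': filename}.
            normalized.append({'file': entry})
        elif isinstance(entry, dict) and 'file' in entry:
            # Copy the dict so the caller's result is not modified.
            normalized.append(dict(entry))
        else:
            raise ValueError('unsupported preview entry: %r' % (entry,))
    return normalized
```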
276271

277-
### Example:
278-
272+
### Example:
279273
`wordcount-simpleextractor` is the simplest example to illustrate how to wrap existing Python code as a Simple Extractor.
280274

281275
wordcount.py is a regular Python file defined and provided by the Python developer. In the code, wordcount invokes the `wc` command on the input file to extract the number of lines, words, and characters, and packs this metadata into a Python dict.
282-
283276
```markdown
284277
import subprocess
285-
286-
def wordcount(input*file):
287-
result = subprocess.check_output(['wc', input_file], stderr=subprocess.STDOUT)
288-
(lines, words, characters, *) = result.split()
289-
metadata = {
290-
'lines': lines,
291-
'words': words,
292-
'characters': characters
293-
}
294-
result = {
295-
'metadata': metadata
296-
}
297-
return result
278+
279+
def wordcount(input_file):
280+
    result = subprocess.check_output(['wc', input_file], stderr=subprocess.STDOUT)
281+
    (lines, words, characters, _) = result.split()
282+
    metadata = {
283+
        'lines': lines,
284+
        'words': words,
285+
        'characters': characters
286+
    }
287+
    result = {
288+
        'metadata': metadata
289+
    }
290+
    return result
298291
```
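
On Python 3, `subprocess.check_output` returns bytes, so the values stored by the function above are byte strings, and a filename containing spaces would break the four-way unpacking. A possible variant, offered only as a sketch, decodes the output and converts the counts to integers; `parse_wc_output` is a hypothetical helper introduced here for illustration.

```python
import subprocess


def parse_wc_output(raw):
    """Hypothetical helper: parse '<lines> <words> <count> <filename>' from wc."""
    # Split at most three times so a filename with spaces stays in one field.
    fields = raw.decode('utf-8', errors='replace').split(None, 3)
    lines, words, characters = (int(value) for value in fields[:3])
    # Without options, wc reports a byte count in the third column.
    return {'lines': lines, 'words': words, 'characters': characters}


def wordcount(input_file):
    raw = subprocess.check_output(['wc', input_file])
    return {'metadata': parse_wc_output(raw)}
```

Keeping the parsing separate from the `wc` call makes it possible to check the parsing logic without touching the file system.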
299292

300293
To build wordcount as a Simple extractor Docker image, users simply assign two environment variables in the Dockerfile shown below. EXTRACTION_FUNC must be set to the name of the extraction function, which in wordcount.py is `wordcount`. EXTRACTION_MODULE is the name of the module file containing the definition of the extraction function.
301-
302294
```markdown
303295
FROM clowder/extractors-simple-extractor:onbuild
304296

305297
ENV EXTRACTION_FUNC="wordcount"
306298
ENV EXTRACTION_MODULE="wordcount"
307299
```
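
How the base image uses these two variables is not shown here. One plausible way to resolve a function from a module name and a function name is sketched below; this is an assumption for illustration, not the image's actual startup code, and `load_extraction_function` is a hypothetical name.

```python
import importlib
import os


def load_extraction_function(environ=os.environ):
    """Hypothetical helper: look up EXTRACTION_FUNC inside EXTRACTION_MODULE."""
    module_name = environ['EXTRACTION_MODULE']
    func_name = environ['EXTRACTION_FUNC']
    # Import the module by name, then fetch the named attribute from it.
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError('%s.%s is not callable' % (module_name, func_name))
    return func
```

With `EXTRACTION_MODULE="wordcount"` and `EXTRACTION_FUNC="wordcount"`, this sketch would import `wordcount.py` from the Python path and return its `wordcount` function.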
300+
301+
