Additionally, the built-in data-processing logic integrates a **data partitioner**.

## Processing data from IBM Cloud Object Storage
This mode is activated when you include the parameter **obj** in the function arguments. The input to the partitioner may be either a list of buckets, a list of buckets with an object prefix, or a list of data objects. If you set the *size of the chunk* or the *number of chunks*, the partitioner is activated inside PyWren and is responsible for splitting the objects into smaller chunks, eventually running one function activation for each generated chunk. If neither the *size of the chunk* nor the *number of chunks* is set, each chunk is an entire object, so one function activation is executed for each individual object.
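The splitting rule described above can be sketched in pure Python (an illustrative sketch, not PyWren's actual partitioner code):

```python
def split_into_ranges(obj_size, chunk_size):
    """Illustrative sketch of the partitioner's chunking rule: derive one
    byte range per chunk, where each range would become one function
    activation. Not PyWren's actual implementation."""
    ranges = []
    start = 0
    while start < obj_size:
        end = min(start + chunk_size, obj_size)
        ranges.append((start, end))
        start = end
    return ranges

# A 10 MB object split into 4 MB chunks yields three activations:
# [(0, 4194304), (4194304, 8388608), (8388608, 10485760)]
print(split_into_ranges(10 * 1024**2, 4 * 1024**2))
```

If no chunking parameter is given, the whole object is a single range, matching the one-activation-per-object behavior above.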
The *obj* parameter is a Python object from which you can access all the information related to the object (or chunk) that the function is processing. For example, consider the following function that shows all the available attributes in *obj*:
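A minimal sketch of such a map function is shown below; the attribute names (`obj.bucket`, `obj.key`, `obj.data_stream`) are illustrative assumptions, so check the full example file for the exact attribute set PyWren exposes:

```python
def my_map_function(obj):
    # NOTE: the attribute names below are assumptions for illustration.
    # `obj` describes the object (or chunk) this activation processes.
    print(obj.bucket)              # bucket that contains the object
    print(obj.key)                 # key (name) of the object inside the bucket
    data = obj.data_stream.read()  # file-like stream over the chunk's bytes
    return len(data)
```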
As stated above, the allowed inputs of the function can be:
- Input data is a bucket or a list of buckets. See an example in [map_reduce_cos_bucket.py](../examples/map_reduce_cos_bucket.py):

```python
iterdata = 'cos://bucket1'
```

- Input data is one or more buckets with an object prefix. See an example in [map_cos_prefix.py](../examples/map_cos_prefix.py):

- Input data is a list of data objects (object keys).
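For instance (bucket, prefix, and object names below are illustrative, not taken from the example files):

```python
# One or more buckets restricted to a key prefix (illustrative names):
iterdata = 'cos://bucket1/images/'

# Or an explicit list of object keys (illustrative names):
iterdata = ['cos://bucket1/object1', 'cos://bucket1/object2']
```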

Notice that *iterdata* must contain only one of the previous 3 types. Intermingled types are not allowed. For example, you cannot mix in the same *iterdata* a bucket and some object keys:

```python
iterdata = ['cos://bucket1', 'cos://bucket1/object2', 'cos://bucket1/object3']  # Not allowed
```

Once *iterdata* is defined, you can execute PyWren as usual, using either the *map()* or *map_reduce()* calls. If you need to split the files into smaller chunks, you can optionally set the *chunk_size* or *chunk_n* parameters.


See a complete example in [map_reduce_url.py](../examples/map_reduce_url.py).

## Reducer granularity
When using the `map_reduce()` API call with `chunk_size` or `chunk_n`, by default there will be only one reducer for all the object chunks from all the objects. Alternatively, you can spawn one reducer for each object by setting the parameter `reducer_one_per_object=True`.
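The effect of this flag can be modeled in pure Python (an illustrative sketch, not PyWren's implementation): each map output carries the key of its parent object, and `reducer_one_per_object=True` groups outputs by that key, spawning one reducer per group.

```python
from itertools import groupby
from operator import itemgetter

def assign_reducers(chunk_results, reducer_one_per_object=False):
    """Sketch of reducer granularity. `chunk_results` is a list of
    (object_key, value) pairs produced by the map phase."""
    if not reducer_one_per_object:
        # Default: a single reducer sees every chunk of every object.
        return {'all': [v for _, v in chunk_results]}
    # One reducer per object: group chunk results by their object key.
    ordered = sorted(chunk_results, key=itemgetter(0))
    return {key: [v for _, v in group]
            for key, group in groupby(ordered, key=itemgetter(0))}
```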