|ibm_cf| endpoint ||yes | IBM Cloud Functions endpoint from [here](https://cloud.ibm.com/docs/openwhisk?topic=cloud-functions-cloudfunctions_regions#cloud-functions-endpoints). Make sure to use the https:// prefix, for example: https://us-east.functions.cloud.ibm.com|
|ibm_cf| namespace ||yes | Value of CURRENT NAMESPACE from [here](https://cloud.ibm.com/functions/namespace-settings)|
|ibm_cf| api_key || no |**Mandatory** if using a Cloud Foundry-based namespace. Value of 'KEY' from [here](https://cloud.ibm.com/functions/namespace-settings)|
|ibm_cf| namespace_id ||no |**Mandatory** if using an IAM-based namespace with an IAM API key. Value of 'GUID' from [here](https://cloud.ibm.com/functions/namespace-settings)|
docs/data-processing.md
Additionally, the built-in data-processing logic integrates a **data partitioner**.
## Processing data from IBM Cloud Object Storage
This mode is activated when you write the parameter **obj** into the function arguments. The input to the partitioner may be either a list of buckets, a list of buckets with an object prefix, or a list of data objects. If you set the *size of the chunk* or the *number of chunks*, the partitioner is activated inside PyWren and it is responsible for splitting the objects into smaller chunks, eventually running one function activation for each generated chunk. If the *size of the chunk* and the *number of chunks* are not set, each chunk is an entire object, so one function activation is executed for each individual object. For example, consider the following function:
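The function referenced here does not appear in this excerpt, so the following is only a minimal sketch of what such a map function can look like. The word-count task and the `FakeObj` stand-in are illustrative assumptions, not part of the PyWren docs; in a real run, the runtime constructs the object passed as `obj`.

```python
import io

def my_map_function(obj):
    # Each activation receives one chunk of an object via obj.data_stream;
    # here we simply count the words in the chunk.
    data = obj.data_stream.read()
    return len(data.split())

# Hypothetical local stand-in for the 'obj' class normally created by PyWren.
class FakeObj:
    def __init__(self, data):
        self.data_stream = io.BytesIO(data)

print(my_map_function(FakeObj(b'some input text')))  # -> 3
```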
#### Partitioner gets a list of objects
The *obj* parameter is a Python class from which you can access all the information related to the object (or chunk) that the function is processing. For example, consider the following function that shows all the available attributes in *obj*:
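The example function mentioned here is elided from this excerpt. Below is a hedged sketch that exercises the three attributes documented for *obj* (*bucket*, *key* and *data_stream*); the `FakeObj` stand-in is hypothetical, since the real class is created by the PyWren runtime.

```python
import io

class FakeObj:
    # Minimal stand-in mirroring the documented attributes of 'obj'.
    def __init__(self, bucket, key, data):
        self.bucket = bucket                 # bucket holding the object
        self.key = key                       # object key within the bucket
        self.data_stream = io.BytesIO(data)  # readable stream of the chunk

def my_map_function(obj):
    print('bucket:', obj.bucket)
    print('key:', obj.key)
    return obj.data_stream.read().decode()

result = my_map_function(FakeObj('my-bucket', 'my-object.txt', b'hello'))
# prints "bucket: my-bucket" and "key: my-object.txt"; result == 'hello'
```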
Notice that *iterdata* must contain only one of the previous 3 types; intermingled types are not allowed. For example, you cannot set in the same *iterdata* list a bucket and some object keys:
|`pw.map_reduce`(`my_map_function`, `iterdata`, `my_reduce_function`, `chunk_size`)|`iterdata` contains a list of objects in the format `bucket_name/object_name`|
|`my_map_function`(`obj`) |`obj` is a Python class that contains the *bucket*, *key* and *data_stream* of the object assigned to the activation|
#### Partitioner gets an entire bucket
Once iterdata is defined, you can execute PyWren as usual, either using *map()* or *map_reduce()* calls. If you need to split the files into smaller chunks, you can optionally set the *chunk_size* or *chunk_n* parameters.
Commonly, a dataset may contain hundreds or thousands of files, so the previous approach, where you have to specify each object one by one, is not well suited in this case. With this `map_reduce()` method you can instead specify the bucket name which contains all the objects of the dataset.
* If `chunk_size=None` then the partitioner's granularity is a single object.
| method | method signature |
|---| ---|
|`pw.map_reduce`(`my_map_function`, `bucket_name`, `my_reduce_function`, `chunk_size`)|`bucket_name` contains the name of the bucket |
|`my_map_function`(`obj`, `ibm_cos`) |`obj` is a Python class that contains the *bucket*, *key* and *data_stream* of the object assigned to the activation. `ibm_cos` is an optional parameter which provides an `ibm_boto3.Client()`|
## Processing data from public URLs
This mode is activated when you write the parameter **url** into the function arguments. The input to the partitioner must be a list of object URLs. As with COS data processing, if you set the *size of the chunk* or the *number of chunks*, the partitioner is activated inside PyWren and it is responsible for splitting the objects into smaller chunks, as long as the remote storage server allows requests in chunks (ranges). If range requests are not allowed by the remote storage server, each URL is treated as a single object. For example, consider the following code that shows all the available attributes in *url*:
|`pw.map_reduce`(`my_map_function`, `iterdata`, `my_reduce_function`, `chunk_size`)|`iterdata` contains a list of objects in the format `http://myurl/myobject.data`|
|`my_map_function`(`url`) |`url` is a Python class that contains the url *path* assigned to the activation (an entry of iterdata) and the *data_stream*|
See a complete example in [map_reduce_url.py](../examples/map_reduce_url.py).
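The "requests in chunks (ranges)" mentioned above rely on standard HTTP `Range` headers. The helper below is a hedged sketch (its name is hypothetical, and PyWren's internal logic is not shown in this excerpt) of the header each chunked activation would send when fetching its byte range of a URL:

```python
def range_headers(total_size, chunk_size):
    """Build the HTTP Range header for each chunk of a remote object.
    Range uses inclusive byte offsets, hence the '- 1' on the end bound."""
    return [{'Range': 'bytes=%d-%d' % (start, min(start + chunk_size, total_size) - 1)}
            for start in range(0, total_size, chunk_size)]

print(range_headers(100, 40))
# -> [{'Range': 'bytes=0-39'}, {'Range': 'bytes=40-79'}, {'Range': 'bytes=80-99'}]
```

If the server does not advertise `Accept-Ranges: bytes`, such requests cannot be used and the whole URL is processed as one object.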
## Reducer granularity
By default there will be one reducer for all the object chunks. If you need one reducer for each object, you must set the parameter `reducer_one_per_object=True` in the *map()* or *map_reduce()* methods.
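The difference between the two reducer granularities can be illustrated with plain Python. This is a conceptual sketch, not PyWren's implementation; the function name and the `(object_key, partial_result)` pairing are assumptions made for the example:

```python
from collections import defaultdict

def reduce_results(map_results, reducer, one_per_object=False):
    """map_results: list of (object_key, partial_result) pairs.
    one_per_object=False -> a single reducer over every partial result;
    one_per_object=True  -> one reducer call per object key."""
    if not one_per_object:
        return reducer([r for _, r in map_results])
    grouped = defaultdict(list)
    for key, r in map_results:
        grouped[key].append(r)
    return {key: reducer(parts) for key, parts in grouped.items()}

results = [('a.txt', 1), ('a.txt', 2), ('b.txt', 3)]
print(reduce_results(results, sum))                       # -> 6
print(reduce_results(results, sum, one_per_object=True))  # -> {'a.txt': 3, 'b.txt': 3}
```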
docs/knative.md
The easiest way to make it work is to create an IBM Kubernetes (IKS) cluster:
- Install Kubernetes v1.15.3
- Select a **single zone** to place the worker nodes
- *Master service endpoint*: Public endpoint only
- Your cluster must have 3 or more worker nodes with at least 4 cores and 16GB RAM.
- No need to encrypt local disk
Once the cluster is running, follow the instructions in the "Access" tab to configure the *kubectl* client on your local machine. Then follow one of these two options to install the PyWren environment:
- Option 1 (IBM IKS):
1. In the Dashboard of your cluster, go to the "Add-ons" tab and install Knative v0.8.0. It automatically installs Istio v1.3.0 and Tekton v0.3.1.
- Option 2 (IBM IKS or any other Kubernetes Cluster):