|**Cassandra Connection Provider**| no || Controller service for connecting to a specific Keyspace engine|
217
-
|**NGSI version**| v2 || Supported NGSI versions are v2 and ld; currently only v2 is supported|
218
-
|**Data Model**| db-by-entity || The data model used to create the Columns when an event is received. You can choose between db-by-service-path and db-by-entity; the default value is db-by-service-path|
219
-
|**Attribute persistence**| row | row, column | The mode for storing the data inside the Column. Allowable values are row and column|
220
-
| Default Service | test || If you don't set the Fiware-Service header in the context broker, this value will be used as Fiware-Service|
221
-
| Default Service path | /path || If you don't set the Fiware-ServicePath header in the context broker, this value will be used as Fiware-ServicePath|
222
-
| Enable encoding | true | true, false | true applies the new encoding, false applies the old encoding.|
223
-
| Enable lowercase | true | true, false | true creates the Schema and Column names in lowercase.|
224
-
|**Batch size**| 10 || The preferred number of FlowFiles to put to the Keyspace in a single transaction|
225
-
| Consistency Level | Serial | Serial, Local_serial | The strategy for how many replicas must respond before results are returned. |
226
-
| Batch Statement Type | Serial | Logged, Unlogged, Counter| Specifies the type of 'Batch Statement' to be used. |
218
+
| Name | Default Value | Allowable Values | Description|
|**Cassandra Connection Provider**| no || Controller service for connecting to a specific Keyspace engine |
221
+
|**NGSI version**| v2 || Supported NGSI versions are v2 and ld; currently only v2 is supported |
222
+
|**Data Model**| db-by-entity || The data model used to create the Columns when an event is received. You can choose between db-by-service-path and db-by-entity; the default value is db-by-service-path |
223
+
|**Attribute persistence**| row | row, column | The mode for storing the data inside the Column. Allowable values are row and column |
224
+
| Default Service | test || If you don't set the Fiware-Service header in the context broker, this value will be used as Fiware-Service |
225
+
| Default Service path | /path || If you don't set the Fiware-ServicePath header in the context broker, this value will be used as Fiware-ServicePath |
226
+
| Enable encoding | true | true, false | true applies the new encoding, false applies the old encoding. |
227
+
| Enable lowercase | true | true, false | true creates the Schema and Column names in lowercase. |
228
+
|**Batch size**| 10 || The preferred number of FlowFiles to put to the Keyspace in a single transaction |
229
+
| Consistency Level | Serial | Serial, Local_serial | The strategy for how many replicas must respond before results are returned.|
230
+
| Batch Statement Type | Serial | Logged, Unlogged, Counter| Specifies the type of 'Batch Statement' to be used.|
227
231
228
232
A configuration example could be:
229
233
@@ -239,8 +243,8 @@ Use `NGSIToCassandra` if you are looking for a Keyspace storage not growing so m
239
243
240
244
The Column type configuration parameter, as seen, is a method for <i>direct</i> aggregation of data: by <i>default</i>
241
245
destination (i.e. all the notifications about the same entity will be stored within the same Cassandra Column) or by
242
-
<i>default</i> service-path (i.e. all the notifications about the same service-path will be stored within the same Cassandra
243
-
Column).
246
+
<i>default</i> service-path (i.e. all the notifications about the same service-path will be stored within the same
247
+
Cassandra Column).
244
248
245
249
#### About the persistence mode
246
250
@@ -263,13 +267,14 @@ deal with the persistence details of such a batch of events in the final backend
263
267
264
268
What is important regarding the batch mechanism is that it largely increases the performance of the sink, because the number
265
269
of writes is dramatically reduced. Let's see an example: assume a batch of 100 `NGSIEvent`s. In the best case, all
266
-
these events concern the same entity, which means all the data within them will be persisted in the same Cassandra Column.
267
-
If processing the events one by one, we would need 100 inserts into Cassandra; nevertheless, in this example only one insert
268
-
is required. Obviously, the events will not always concern the same entity, and many entities may be
269
-
involved within a batch. But that's not a problem, since several sub-batches of events are created within a batch, one
270
-
sub-batch per final destination Cassandra Column. In the worst case, the 100 events will be about 100 different
271
-
entities (100 different Cassandra Columns), but that will not be the usual scenario. Thus, assuming a realistic number of
272
-
10-15 sub-batches per batch, we are replacing the 100 inserts of the event-by-event approach with only 10-15 inserts.
270
+
these events concern the same entity, which means all the data within them will be persisted in the same Cassandra
271
+
Column. If processing the events one by one, we would need 100 inserts into Cassandra; nevertheless, in this example
272
+
only one insert is required. Obviously, the events will not always concern the same entity, and many
273
+
entities may be involved within a batch. But that's not a problem, since several sub-batches of events are created
274
+
within a batch, one sub-batch per final destination Cassandra Column. In the worst case, the 100 events will be
275
+
about 100 different entities (100 different Cassandra Columns), but that will not be the usual scenario. Thus, assuming
276
+
a realistic number of 10-15 sub-batches per batch, we are replacing the 100 inserts of the event-by-event approach with
277
+
only 10-15 inserts.
273
278
274
279
The batch mechanism adds an accumulation timeout to prevent the sink from staying in an endless state of batch building when no
275
280
new data arrives. If such a timeout is reached, then the batch is persisted as it is.
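
The sub-batching and the accumulation timeout described above can be sketched as follows. This is only an illustration, not the processor's actual code: the event dictionaries, the `entity_id` key, and the function names `build_sub_batches` and `should_flush` are hypothetical, and the grouping assumes the `db-by-entity` data model (one destination Column per entity).

```python
from collections import defaultdict


def build_sub_batches(events):
    """Group a batch of events into sub-batches, one per destination Column.

    Assumption: each event is a dict with a hypothetical "entity_id" key
    that identifies its destination.
    """
    sub_batches = defaultdict(list)
    for event in events:
        sub_batches[event["entity_id"]].append(event)
    return dict(sub_batches)


def should_flush(n_events, batch_size, elapsed_s, timeout_s):
    """Persist when the batch is full, or when the accumulation timeout
    has been reached and there is at least one event waiting."""
    return n_events >= batch_size or (n_events > 0 and elapsed_s >= timeout_s)


# A batch of 100 events spread over 12 entities.
batch = [{"entity_id": f"Room{i % 12}", "temperature": 20 + i % 5} for i in range(100)]
sub_batches = build_sub_batches(batch)

inserts_event_by_event = len(batch)   # one insert per event
inserts_batched = len(sub_batches)    # one insert per sub-batch
```

With this input, the event-by-event approach needs 100 inserts while the batched approach needs 12, one per entity.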
@@ -280,17 +285,17 @@ retry intervals can be configured. Such a list defines the first retry interval,
280
285
on; if the TTL is greater than the length of the list, then the last retry interval is repeated as many times as
281
286
necessary.
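
As a minimal sketch of the rule above (the last retry interval is repeated when the TTL exceeds the length of the list), assuming a positive TTL that counts retries. The function name `retry_schedule` and the millisecond values are hypothetical and do not come from the processor's code.

```python
def retry_schedule(intervals_ms, ttl):
    """Return the wait time before each of `ttl` retries.

    Intervals are taken from the list in order; once the list is
    exhausted, its last element is repeated as many times as necessary.
    """
    if not intervals_ms:
        raise ValueError("at least one retry interval is required")
    last = len(intervals_ms) - 1
    return [intervals_ms[min(i, last)] for i in range(ttl)]


schedule = retry_schedule([5000, 10000, 30000], ttl=5)
# schedule == [5000, 10000, 30000, 30000, 30000]
```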
282
287
283
-
By default, `NGSIToCassandra` has a configured batch size and batch accumulation timeout of 1 and 30 seconds, respectively.
284
-
Nevertheless, as explained above, it is highly recommended to increase at least the batch size for performance purposes.
285
-
What are the optimal values? The size of the batch is closely related to the transaction size of the channel the
286
-
events are taken from (it makes no sense for the first to be greater than the second), and it depends on the number of
287
-
estimated sub-batches as well. The accumulation timeout will depend on how often you want to see new data in the final
288
-
storage.
288
+
By default, `NGSIToCassandra` has a configured batch size and batch accumulation timeout of 1 and 30 seconds,
289
+
respectively. Nevertheless, as explained above, it is highly recommended to increase at least the batch size for
290
+
performance purposes. What are the optimal values? The size of the batch is closely related to the transaction size
291
+
of the channel the events are taken from (it makes no sense for the first to be greater than the second), and it depends on
292
+
the number of estimated sub-batches as well. The accumulation timeout will depend on how often you want to see new data
293
+
in the final storage.
289
294
290
295
#### Time zone information
291
296
292
-
Time zone information is not added in Cassandra timestamps since Cassandra stores that information as an environment variable.
293
-
Cassandra timestamps are stored in UTC time.
297
+
Time zone information is not added in Cassandra timestamps since Cassandra stores that information as an environment
298
+
variable. Cassandra timestamps are stored in UTC time.
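
To illustrate what storing timestamps in UTC means for a client, here is a small sketch that converts a timezone-aware local time into milliseconds since the Unix epoch (the representation of a CQL `timestamp`) and back. The helper names `to_utc_millis` and `from_utc_millis` are hypothetical, and the fixed +02:00 offset is only an example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_millis(dt):
    """Convert a timezone-aware datetime to milliseconds since the epoch (UTC)."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_utc_millis(millis):
    """Convert milliseconds since the epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


# 12:00 at a fixed +02:00 offset is the same instant as 10:00 UTC.
local_time = datetime(2020, 6, 1, 12, 0, tzinfo=timezone(timedelta(hours=2)))
stored = to_utc_millis(local_time)
restored = from_utc_millis(stored)
```

The local offset is not kept in the stored value: reading it back yields the UTC instant, and any conversion to a local time zone is left to the reader.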