This repository was archived by the owner on Aug 31, 2021. It is now read-only.
// Load a Dataset[Vegetable]. Notice the @attribute annotation on the case class - we imagine the weight attribute is named with an underscore in DynamoDB.
val avgWeightByColor = vegetableDs.agg($"color", avg($"weightKg")) // The column is called 'weightKg' in the Dataset.
```
## Getting The Dependency
@@ -41,6 +47,10 @@ The library is available from Maven Central. Add the dependency in SBT as ```"co
Spark is used in the library as a "provided" dependency, which means Spark has to be installed separately on the container where the application is running, as is the case on AWS EMR.
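
If your own application is built with SBT, the same "provided" arrangement can be mirrored in its build so that Spark is not bundled into the application jar. The following is a minimal sketch, not taken from this project's build: the `sparkVersion` value is a hypothetical placeholder and should match whatever Spark version is installed on the cluster.

```scala
// build.sbt of the application (sketch, not this library's own build file)

// Hypothetical version: replace with the Spark version installed on your cluster.
val sparkVersion = "3.1.2"

// "provided" keeps Spark on the compile classpath but out of the packaged jar,
// because the cluster (for example AWS EMR) supplies it at runtime.
libraryDependencies += "org.apache.spark" %% "spark-sql" % sparkVersion % "provided"
```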
## Parameters
The following parameters can be set as options on the Spark reader and writer objects before loading/saving.

- `region` sets the region where the DynamoDB table is located. Default is environment specific.

The following parameters can be set as options on the Spark reader object before loading.

- `readPartitions` number of partitions to split the initial RDD into when loading the data into Spark. Corresponds 1-to-1 with the total number of segments in the DynamoDB parallel scan used to load the data. Defaults to `sparkContext.defaultParallelism`.
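
As a rough illustration of how `region` and `readPartitions` might be passed, here is a sketch using Spark's standard `DataFrameReader.option` calls. The `"dynamodb"` format name, the `tableName` option, the table name `VegetableTable`, and the region value are assumptions made for the example, so check them against the library's usage section before relying on them.

```scala
import org.apache.spark.sql.SparkSession

object ReadOptionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("read-options-sketch").getOrCreate()

    val vegetableDf = spark.read
      .format("dynamodb")                    // assumption: data source name registered by the library
      .option("tableName", "VegetableTable") // assumption: option key and table name are illustrative
      .option("region", "eu-west-1")         // region where the DynamoDB table is located (example value)
      .option("readPartitions", "8")         // 8 partitions, i.e. 8 parallel scan segments
      .load()

    // Print a few items; scan order is not guaranteed.
    vegetableDf.show(10)

    spark.stop()
  }
}
```

Since the partition count maps 1-to-1 to parallel scan segments, a larger `readPartitions` value means more concurrent scan requests against the table.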
@@ -53,6 +63,7 @@ The following parameters can be set as options on the Spark writer object before
- `writePartitions` number of partitions to split the given DataFrame into when writing to DynamoDB. Set to `skip` to avoid repartitioning the DataFrame before writing. Defaults to `sparkContext.defaultParallelism`.
- `writeBatchSize` number of items to send per call to DynamoDB BatchWriteItem. Default 25.
- `update` if true, writes will use UpdateItem on keys rather than BatchWriteItem. Default false.
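
The writer options above could be set along the following lines. This is only a sketch under stated assumptions: the `"dynamodb"` format name, the `tableName` option, the table name `SomeOtherTable`, and the sample rows are all illustrative and not confirmed by this section.

```scala
import org.apache.spark.sql.SparkSession

object WriteOptionsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("write-options-sketch").getOrCreate()
    import spark.implicits._

    // Small in-memory DataFrame standing in for real data (hypothetical rows).
    val vegetables = Seq(
      ("carrot", "orange", 0.1),
      ("cucumber", "green", 0.3)
    ).toDF("name", "color", "weightKg")

    vegetables.write
      .format("dynamodb")                    // assumption: data source name registered by the library
      .option("tableName", "SomeOtherTable") // assumption: option key and table name are illustrative
      .option("writePartitions", "skip")     // keep the DataFrame's existing partitioning
      .option("update", "true")              // use UpdateItem on keys instead of BatchWriteItem
      .save()

    spark.stop()
  }
}
```

`writeBatchSize` is left out here because, per its description, it sizes BatchWriteItem calls, which this sketch replaces with UpdateItem by setting `update` to true.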
## Running Unit Tests
The unit tests depend on the AWS DynamoDBLocal client, which in turn depends on [sqlite4java](https://bitbucket.org/almworks/sqlite4java/src/master/). I had some problems running this on OS X, so I had to put the library directly in the /lib folder, as graciously explained in [this Stack Overflow answer](https://stackoverflow.com/questions/34137043/amazon-dynamodb-local-unknown-error-exception-or-failure/35353377#35353377).