The library is available from [Maven Central](https://mvnrepository.com/artifact/com.audienceproject/spark-dynamodb). Add the dependency in SBT as ```"com.audienceproject" %% "spark-dynamodb" % "latest"```
Spark is used in the library as a "provided" dependency, which means Spark has to be installed separately on the container where the application is running, as is the case on AWS EMR.
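
As a minimal sketch, a `build.sbt` could look like the following; the project name and the Scala and Spark versions are assumptions and should be matched to whatever your cluster runs:

```scala
// build.sbt (sketch): versions are assumptions, align them with your cluster.
name := "spark-dynamodb-example" // hypothetical project name

scalaVersion := "2.12.12"

libraryDependencies ++= Seq(
  "com.audienceproject" %% "spark-dynamodb" % "latest",
  // Spark is "provided": needed at compile time, supplied by the runtime (e.g. EMR).
  "org.apache.spark" %% "spark-sql" % "2.4.7" % "provided"
)
```
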
## Parameters
The following parameters can be set as options on the Spark reader and writer object before loading/saving.

- `region` sets the region in which the DynamoDB table resides. Default is environment specific.
- `roleArn` sets an IAM role to assume. This allows for access to a DynamoDB table in a different account than the Spark cluster. Defaults to the standard role configuration.
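
As a sketch of how these common options are passed, assuming a `SparkSession` named `spark` is in scope (the table name and role ARN below are placeholders):

```scala
import com.audienceproject.spark.dynamodb.implicits._

// Read a table that lives in another region/account.
val dynamoDf = spark.read
  .option("region", "eu-west-1")
  .option("roleArn", "arn:aws:iam::123456789012:role/my-dynamodb-role")
  .dynamodb("SomeTableName")

// The same options can be set on the writer before saving.
dynamoDf.write
  .option("region", "eu-west-1")
  .dynamodb("SomeOtherTableName")
```
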
The following parameters can be set as options on the Spark reader object before loading.

- `readPartitions` number of partitions to split the initial RDD when loading the data into Spark. Defaults to the size of the DynamoDB table divided into chunks of `maxPartitionBytes`.
- `maxPartitionBytes` the maximum size of a single input partition. Default 128 MB.
- `targetCapacity` fraction of provisioned read capacity on the table (or index) to consume for reading. Default 1 (i.e. 100% capacity).
- `stronglyConsistentReads` whether or not to use strongly consistent reads. Default false.
- `bytesPerRCU` number of bytes that can be read per second with a single Read Capacity Unit. Default 4000 (4 KB). This value is multiplied by two when `stronglyConsistentReads=false`.
- `filterPushdown` whether or not to use filter pushdown to DynamoDB on scan requests. Default true.
- `throughput` the desired read throughput to use. It overrides any throughput calculation done by the package. It is intended to be used with tables that are on-demand. Defaults to 100 for on-demand.
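
For illustration, a read that tunes some of these knobs might look like the following; the option values and table name are placeholders, not recommendations:

```scala
import com.audienceproject.spark.dynamodb.implicits._

// Illustrative values only; size them to your table and its provisioned capacity.
val df = spark.read
  .option("readPartitions", "16")            // override the maxPartitionBytes-based split
  .option("targetCapacity", "0.5")           // consume at most 50% of provisioned read capacity
  .option("stronglyConsistentReads", "true") // costlier per RCU, see bytesPerRCU above
  .dynamodb("SomeTableName")
```
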
The following parameters can be set as options on the Spark writer object before saving.