This repository was archived by the owner on Aug 31, 2021. It is now read-only.

Commit 9c25309

Merge pull request #31 from anish749/patch-1: "Add quick start in python"

2 parents: 87857df + fc8bec7

File tree

1 file changed: +23 −1 lines


README.md

Lines changed: 23 additions & 1 deletion
````diff
@@ -17,6 +17,7 @@ https://www.audienceproject.com/blog/tech/sparkdynamodb-using-aws-dynamodb-data-
 
 ## Quick Start Guide
 
+### Scala
 ```scala
 import com.audienceproject.spark.dynamodb.implicits._
 
````
````diff
@@ -40,9 +41,30 @@ val vegetableDs = spark.read.dynamodbAs[Vegetable]("VegeTable")
 val avgWeightByColor = vegetableDs.agg($"color", avg($"weightKg")) // The column is called 'weightKg' in the Dataset.
 ```
 
+### Python
+```python
+# Load a DataFrame from a Dynamo table. Only incurs the cost of a single scan for schema inference.
+dynamoDf = spark.read.format("com.audienceproject.spark.dynamodb") \
+                     .option("tableName", "SomeTableName") \
+                     .load()  # <-- DataFrame of Row objects with inferred schema.
+
+# Scan the table for the first 100 items (the order is arbitrary) and print them.
+dynamoDf.show(100)
+
+# Write to some other table, overwriting existing items with the same keys.
+dynamoDf.write.format("com.audienceproject.spark.dynamodb") \
+              .option("tableName", "SomeOtherTable").save()
+```
+
+*Note:* When running from the `pyspark` shell, the library can be added as:
+```bash
+pyspark --packages com.audienceproject:spark-dynamodb_<spark-scala-version>:<version>
+```
+
````
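For completeness, the same `--packages` flag also works when submitting a standalone PySpark script with `spark-submit`; a sketch, keeping the README's version placeholders and using a hypothetical script name `my_job.py`:

```shell
# Submit a PySpark job with the connector resolved from Maven Central.
# The <spark-scala-version> and <version> placeholders are as in the
# pyspark note above; my_job.py is a hypothetical script name.
spark-submit \
  --packages com.audienceproject:spark-dynamodb_<spark-scala-version>:<version> \
  my_job.py
```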
````diff
 ## Getting The Dependency
 
-The library is available from Maven Central. Add the dependency in SBT as ```"com.audienceproject" %% "spark-dynamodb" % "latest"```
+The library is available from [Maven Central](https://mvnrepository.com/artifact/com.audienceproject/spark-dynamodb). Add the dependency in SBT as ```"com.audienceproject" %% "spark-dynamodb" % "latest"```
 
 Spark is used in the library as a "provided" dependency, which means Spark has to be installed separately on the container where the application is running, such as is the case on AWS EMR.
 
````
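The two README sentences in the diff (the SBT coordinate and the "provided" Spark dependency) can be combined into a build definition. A minimal `build.sbt` sketch; the project name, Scala version, and Spark version shown are assumptions and must match the target cluster's Spark installation:

```scala
// build.sbt -- minimal sketch. Project name, scalaVersion, and the Spark
// version (2.4.4 here) are assumptions for illustration only.
name := "my-dynamodb-job"
scalaVersion := "2.11.12" // must match the Scala build of the installed Spark

libraryDependencies ++= Seq(
  // As given in the README; pin a concrete version in a real project.
  "com.audienceproject" %% "spark-dynamodb" % "latest",
  // "provided": Spark is installed on the cluster (e.g. AWS EMR), so it is
  // compiled against but not bundled into the application artifact.
  "org.apache.spark" %% "spark-sql" % "2.4.4" % "provided"
)
```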
4870

0 commit comments
