| 1 | +--- |
| 2 | +title: Query Cosmos DB analytical store with Synapse Spark |
| 3 | +description: How to query the Cosmos DB analytical store with Synapse Spark |
| 4 | +services: synapse-analytics |
| 5 | +author: ArnoMicrosoft |
| 6 | +ms.service: synapse-analytics |
| 7 | +ms.topic: quickstart |
| 8 | +ms.subservice: |
| 9 | +ms.date: 05/06/2020 |
| 10 | +ms.author: acomet |
| 11 | +ms.reviewer: jrasnick |
| 12 | +--- |
| 13 | + |
| 14 | +# Query Cosmos DB analytical store with Synapse Spark |
| 15 | + |
| 16 | +This article gives examples of how to interact with the Azure Cosmos DB analytical store by using Synapse gestures. Gestures appear when you right-click a container. Each gesture generates code that you can adapt to your needs, and it lets you explore data with a single click. |
| 17 | + |
| 18 | +## Load to DataFrame |
| 19 | + |
| 20 | +In this gesture, you read data from the Azure Cosmos DB analytical store into a Spark DataFrame called ***df*** and display its first 10 rows. Once the data is in the DataFrame, you can perform additional analysis. This operation does not impact the transactional store. |
| 21 | + |
| 22 | +```python |
| 23 | +# To select a preferred list of regions in a multi-region Cosmos DB account, add .option("spark.cosmos.preferredRegions", "<Region1>,<Region2>") |
| 24 | + |
| 25 | +df = spark.read.format("cosmos.olap")\ |
| 26 | + .option("spark.synapse.linkedService", "INFERRED")\ |
| 27 | + .option("spark.cosmos.container", "INFERRED")\ |
| 28 | + .load() |
| 29 | + |
| 30 | +df.show(10) |
| 31 | +``` |
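The comment in the snippet above mentions the `spark.cosmos.preferredRegions` option. As a minimal sketch of how that could look, the helper below builds the option map once and applies it with `DataFrameReader.options`. The function names `analytical_read_options` and `load_analytical` are hypothetical, and `spark` is assumed to be the session that a Synapse notebook provides.

```python
from typing import Dict, Optional, Sequence


def analytical_read_options(
    linked_service: str,
    container: str,
    preferred_regions: Optional[Sequence[str]] = None,
) -> Dict[str, str]:
    """Build the option map for a cosmos.olap read (hypothetical helper)."""
    options = {
        "spark.synapse.linkedService": linked_service,
        "spark.cosmos.container": container,
    }
    if preferred_regions:
        # The option takes a comma-separated list: "<Region1>,<Region2>"
        options["spark.cosmos.preferredRegions"] = ",".join(preferred_regions)
    return options


def load_analytical(spark, linked_service, container, preferred_regions=None):
    """Read the analytical store into a DataFrame using the options above."""
    options = analytical_read_options(linked_service, container, preferred_regions)
    return spark.read.format("cosmos.olap").options(**options).load()
```

In a notebook, a call such as `load_analytical(spark, "MyLinkedService", "MyContainer", ["West US 2", "East US"]).show(10)` (placeholder names) is intended to mirror the generated snippet with the region preference added.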
| 32 | + |
| 33 | +## Create Spark table |
| 34 | + |
| 35 | +In this gesture, you create a Spark table that points to the container you selected. The operation does not move any data. If you delete the table, the underlying container (and its analytical store) is not affected. This is convenient for reusing the table from third-party tools and for making the data accessible at run time. |
| 36 | + |
| 37 | +```sql |
| 38 | +%%sql |
| 39 | +-- To select a preferred list of regions in a multi-region Cosmos DB account, add spark.cosmos.preferredRegions '<Region1>,<Region2>' in the config options |
| 40 | + |
| 41 | +create table call_center using cosmos.olap options ( |
| 42 | + spark.synapse.linkedService 'INFERRED', |
| 43 | + spark.cosmos.container 'INFERRED' |
| 44 | +) |
| 45 | +``` |
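Once the table exists, it can be queried like any other Spark table. As a minimal sketch, assuming the `call_center` table from the statement above was created and that `spark` is the notebook's session, the hypothetical helper `preview_table` wraps a preview query with `spark.sql`:

```python
def preview_table(spark, table_name: str = "call_center", limit: int = 10):
    """Return the first rows of a Spark table defined over the analytical store.

    `preview_table` is an illustrative helper, not part of the generated gesture.
    """
    if not isinstance(limit, int) or limit < 0:
        raise ValueError("limit must be a non-negative integer")
    # The query reads from the analytical store through the table definition.
    return spark.sql(f"select * from {table_name} limit {limit}")
```

Calling `preview_table(spark).show()` in a notebook cell would then display up to 10 rows of `call_center`.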
| 46 | + |
| 47 | +## Write DataFrame to container |
| 48 | +In this gesture, you write a DataFrame into a container. This operation goes through the transactional store, so it affects transactional performance and consumes Request Units. The transactional store is the appropriate target for write operations. Replace **YOURDATAFRAME** with the DataFrame that you want to write back. |
| 49 | + |
| 50 | +```python |
| 51 | +# Write a Spark DataFrame into a Cosmos DB container |
| 52 | +# To select a preferred list of regions in a multi-region Cosmos DB account, add .option("spark.cosmos.preferredRegions", "<Region1>,<Region2>") |
| 53 | + |
| 54 | + |
| 55 | +YOURDATAFRAME.write.format("cosmos.oltp")\ |
| 56 | + .option("spark.synapse.linkedService", "INFERRED")\ |
| 57 | + .option("spark.cosmos.container", "INFERRED")\ |
| 58 | + .option("spark.cosmos.write.upsertEnabled", "true")\ |
| 59 | + .mode('append')\ |
| 60 | + .save() |
| 61 | +``` |
| 62 | + |
| 63 | +## Load streaming DataFrame from container |
| 64 | +In this gesture, you use the Spark Streaming capability to load data from a container into a DataFrame. The data is stored in the primary data lake account (and file system) that you connected to the workspace. If the folder /localReadCheckpointFolder does not exist, it is created automatically. This operation affects the transactional performance of Cosmos DB. |
| 65 | + |
| 66 | +```python |
| 67 | +# To select a preferred list of regions in a multi-region Cosmos DB account, add .option("spark.cosmos.preferredRegions", "<Region1>,<Region2>") |
| 68 | + |
| 69 | +dfStream = spark.readStream\ |
| 70 | + .format("cosmos.oltp")\ |
| 71 | + .option("spark.synapse.linkedService", "INFERRED")\ |
| 72 | + .option("spark.cosmos.container", "INFERRED")\ |
| 73 | + .option("spark.cosmos.changeFeed.readEnabled", "true")\ |
| 74 | + .option("spark.cosmos.changeFeed.startFromTheBeginning", "true")\ |
| 75 | + .option("spark.cosmos.changeFeed.checkpointLocation", "/localReadCheckpointFolder")\ |
| 76 | + .option("spark.cosmos.changeFeed.queryName", "streamQuery")\ |
| 77 | + .load() |
| 78 | +``` |
| 79 | + |
| 80 | +## Write streaming DataFrame to container |
| 81 | +In this gesture, you write a streaming DataFrame into the Cosmos DB container you selected. If the folder /localWriteCheckpointFolder does not exist, it is created automatically. This operation affects the transactional performance of Cosmos DB. |
| 82 | + |
| 83 | +```python |
| 84 | +# To select a preferred list of regions in a multi-region Cosmos DB account, add .option("spark.cosmos.preferredRegions", "<Region1>,<Region2>") |
| 85 | + |
| 86 | +streamQuery = dfStream\ |
| 87 | + .writeStream\ |
| 88 | + .format("cosmos.oltp")\ |
| 89 | + .outputMode("append")\ |
| 90 | + .option("checkpointLocation", "/localWriteCheckpointFolder")\ |
| 91 | + .option("spark.synapse.linkedService", "INFERRED")\ |
| 92 | + .option("spark.cosmos.container", "trafficSourceColl_sink")\ |
| 93 | + .option("spark.cosmos.connection.mode", "gateway")\ |
| 94 | + .start() |
| 95 | + |
| 96 | +streamQuery.awaitTermination() |
| 97 | +``` |
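Calling `awaitTermination()` with no argument blocks the notebook cell until the query stops or fails. For a quick test run, a bounded wait can be more practical. The sketch below relies on `StreamingQuery.awaitTermination(timeout)`, which takes a timeout in seconds and returns whether the query terminated, and on `StreamingQuery.stop()`. The helper name `run_for` and the 60-second default are assumptions for illustration.

```python
def run_for(query, seconds: int = 60) -> bool:
    """Let a streaming query run for at most `seconds`, then stop it.

    Returns True if the query ended on its own within the timeout,
    False if it was still running and had to be stopped.
    """
    finished = query.awaitTermination(seconds)
    if not finished:
        # Still active after the timeout: stop it so the cell can return.
        query.stop()
    return finished
```

For example, `run_for(streamQuery, 120)` could be used in place of `streamQuery.awaitTermination()` while experimenting.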