|
| 1 | +--- |
| 2 | +title: Query the Azure Cosmos DB analytical store with Synapse Spark |
| 3 | +description: How to query the Azure Cosmos DB analytical store with Synapse Spark |
| 4 | +services: synapse-analytics |
| 5 | +author: ArnoMicrosoft |
| 6 | +ms.service: synapse-analytics |
| 7 | +ms.topic: quickstart |
| 8 | +ms.subservice: |
| 9 | +ms.date: 05/06/2020 |
| 10 | +ms.author: acomet |
| 11 | +ms.reviewer: jrasnick |
| 12 | +--- |
| 13 | + |
| 14 | +# Query the Azure Cosmos DB analytical store with Synapse Spark |
| 15 | + |
| 16 | +This article gives examples of how to interact with the Azure Cosmos DB analytical store by using Synapse gestures. The gestures are visible when you right-click a container. |
| 17 | + |
| 18 | +When you right-click a container, Synapse infers which linked service, database, and container the gesture refers to. Gestures are a quick way to generate code that you can then adapt to your needs, and they also let you explore data in a single click. |
| 19 | + |
| 20 | +## Load to DataFrame |
| 21 | + |
| 22 | +In this gesture, you read from the Azure Cosmos DB analytical store into a Spark DataFrame named `df` and display 10 rows from it. Once the data is in the DataFrame, you can perform additional analysis. This operation does not impact the transactional store. |
| 23 | + |
| 24 | +```python |
| 25 | +# To select a preferred list of regions in a multi-region Cosmos DB account, add .option("spark.cosmos.preferredRegions", "<Region1>,<Region2>") |
| 26 | + |
| 27 | +df = spark.read.format("cosmos.olap")\ |
| 28 | + .option("spark.synapse.linkedService", "INFERRED")\ |
| 29 | + .option("spark.cosmos.container", "INFERRED")\ |
| 30 | + .load() |
| 31 | + |
| 32 | +df.show(10) |
| 33 | +``` |
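
The comment in the snippet above mentions the optional `spark.cosmos.preferredRegions` setting. As a minimal sketch, the read options could be assembled in a small helper and passed to the reader in one call. The helper name `cosmos_read_options` and its parameters are hypothetical; only the option keys come from this article.

```python
def cosmos_read_options(linked_service, container, preferred_regions=None):
    """Build the option dict for a cosmos.olap read (hypothetical helper)."""
    options = {
        "spark.synapse.linkedService": linked_service,
        "spark.cosmos.container": container,
    }
    if preferred_regions:
        # The setting takes a comma-separated list: "<Region1>,<Region2>"
        options["spark.cosmos.preferredRegions"] = ",".join(preferred_regions)
    return options


# Usage in a Synapse notebook, where `spark` is already defined:
# opts = cosmos_read_options("INFERRED", "INFERRED", ["<Region1>", "<Region2>"])
# df = spark.read.format("cosmos.olap").options(**opts).load()
```

Keeping the options in a plain dict makes it easy to reuse the same settings across several reads.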
| 34 | + |
| 35 | +## Create Spark table |
| 36 | + |
| 37 | +In this gesture, you create a Spark table that points to the container you selected. The operation does not incur any data movement. If you delete the table, the underlying container (and the corresponding analytical store) is not affected. This approach is convenient for reusing tables through third-party tools and for making the data accessible at run time. |
| 38 | + |
| 39 | +```sql |
| 40 | +%%sql |
| 41 | +-- To select a preferred list of regions in a multi-region Cosmos DB account, add spark.cosmos.preferredRegions '<Region1>,<Region2>' in the config options |
| 42 | + |
| 43 | +create table call_center using cosmos.olap options ( |
| 44 | + spark.synapse.linkedService 'INFERRED', |
| 45 | + spark.cosmos.container 'INFERRED' |
| 46 | +) |
| 47 | +``` |
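
Once the table exists, it can be queried like any other Spark table. A minimal sketch, assuming the `call_center` table from the statement above; the function name `preview_table` is hypothetical, and `spark` is the session that a Synapse notebook provides.

```python
def preview_table(spark, table_name, limit=10):
    """Return the first `limit` rows of a Spark table as a DataFrame."""
    # Backticks guard against table names that need quoting in Spark SQL.
    return spark.sql(f"SELECT * FROM `{table_name}` LIMIT {int(limit)}")


# Usage in a Synapse notebook:
# preview_table(spark, "call_center").show()
```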
| 48 | + |
| 49 | +## Write DataFrame to container |
| 50 | +In this gesture, you write a DataFrame back into a container. This operation affects transactional performance and consumes Request Units. The write goes through the Azure Cosmos DB transactional store (`cosmos.oltp`), which is optimized for the speed and reliability of write transactions. Make sure that you replace **YOURDATAFRAME** with the DataFrame that you want to write back. |
| 51 | + |
| 52 | +```python |
| 53 | +# Write a Spark DataFrame into a Cosmos DB container |
| 54 | +# To select a preferred list of regions in a multi-region Cosmos DB account, add .option("spark.cosmos.preferredRegions", "<Region1>,<Region2>") |
| 55 | + |
| 56 | + |
| 57 | +YOURDATAFRAME.write.format("cosmos.oltp")\ |
| 58 | + .option("spark.synapse.linkedService", "INFERRED")\ |
| 59 | + .option("spark.cosmos.container", "INFERRED")\ |
| 60 | + .option("spark.cosmos.write.upsertEnabled", "true")\ |
| 61 | + .mode('append')\ |
| 62 | + .save() |
| 63 | +``` |
| 64 | + |
| 65 | +## Load streaming DataFrame from container |
| 66 | +In this gesture, you use Spark Streaming with change feed support to load data from a container into a DataFrame. The data is stored in the primary data lake account that you connected to the workspace. If the folder `/localReadCheckpointFolder` does not exist, it is created automatically. This operation affects the transactional performance of Cosmos DB. |
| 67 | + |
| 68 | +```python |
| 69 | +# To select a preferred list of regions in a multi-region Cosmos DB account, add .option("spark.cosmos.preferredRegions", "<Region1>,<Region2>") |
| 70 | + |
| 71 | +dfStream = spark.readStream\ |
| 72 | + .format("cosmos.oltp")\ |
| 73 | + .option("spark.synapse.linkedService", "INFERRED")\ |
| 74 | + .option("spark.cosmos.container", "INFERRED")\ |
| 75 | + .option("spark.cosmos.changeFeed.readEnabled", "true")\ |
| 76 | + .option("spark.cosmos.changeFeed.startFromTheBeginning", "true")\ |
| 77 | + .option("spark.cosmos.changeFeed.checkpointLocation", "/localReadCheckpointFolder")\ |
| 78 | + .option("spark.cosmos.changeFeed.queryName", "streamQuery")\ |
| 79 | + .load() |
| 80 | +``` |
| 81 | + |
| 82 | +## Write streaming DataFrame to container |
| 83 | +In this gesture, you write a streaming DataFrame into the Cosmos DB container you selected. If the folder `/localWriteCheckpointFolder` does not exist, it is created automatically. This operation affects the transactional performance of Cosmos DB. |
| 84 | + |
| 85 | +```python |
| 86 | +# To select a preferred list of regions in a multi-region Cosmos DB account, add .option("spark.cosmos.preferredRegions", "<Region1>,<Region2>") |
| 87 | + |
| 88 | +streamQuery = dfStream\ |
| 89 | + .writeStream\ |
| 90 | + .format("cosmos.oltp")\ |
| 91 | + .outputMode("append")\ |
| 92 | + .option("checkpointLocation", "/localWriteCheckpointFolder")\ |
| 93 | + .option("spark.synapse.linkedService", "INFERRED")\ |
| 94 | + .option("spark.cosmos.container", "INFERRED")\ |
| 95 | + .option("spark.cosmos.connection.mode", "gateway")\ |
| 96 | + .start() |
| 97 | + |
| 98 | +streamQuery.awaitTermination() |
| 99 | +``` |
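
Calling `awaitTermination()` without arguments blocks the notebook cell until the query stops or fails. For a trial run, a bounded wait can be easier to work with. The following is a minimal sketch that uses the standard `StreamingQuery` methods `awaitTermination(timeout)` and `stop()`; the helper name `run_for` and the 60-second value are illustrative assumptions, not part of the gesture.

```python
def run_for(query, seconds):
    """Let a streaming query run for at most `seconds`, then stop it."""
    # awaitTermination(timeout) returns True if the query ended within the timeout.
    finished = query.awaitTermination(seconds)
    if not finished:
        query.stop()
    return finished


# Usage in a Synapse notebook, in place of streamQuery.awaitTermination():
# run_for(streamQuery, 60)
```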