
Commit 2758a1d

Merge pull request #66 from Rodrigossz/master
Fixing small issues
2 parents ecff4b1 + 874ee28

File tree

12 files changed: +232 −22425 lines changed


Notebooks/PySpark/Synapse Link for Cosmos DB samples/MongoDB/README.md renamed to Notebooks/PySpark/Synapse Link for Cosmos DB samples/E-Commerce/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 
 # Load, Query, and Schema Updates with Azure Cosmos DB API for MongoDB
 
-In this noteboook, a simple dataset is created to show you how to use MongoDB client to ingest data, and how to use Synapse Link with Cosmos DB API for MongoDB to query this data.
+In this notebook, a simple dataset is created to show you how to use the MongoDB client to ingest data, and how to use Synapse Link with Cosmos DB API for MongoDB to query this data.
 
 Also, we will ingest a second dataset with a schema update and show how it is managed by Synapse Link.
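As context for the ingest-then-query flow this README describes, here is a minimal, hedged sketch of the MongoDB-client ingestion step. The connection string, database, and collection names are hypothetical placeholders; with the pymongo 2.8.1 pin from this sample's requirements.txt, bulk inserts use `Collection.insert` (pymongo 3+ renames it `insert_many`):

```python
import pymongo

# Hypothetical connection string; copy the real one from the Azure portal.
client = pymongo.MongoClient(
    "mongodb://<account>:<primary-key>@<account>.mongo.cosmos.azure.com:10255/?ssl=true"
)
orders = client["ecommerce"]["orders"]  # placeholder database/collection names

# Bulk-insert a small sample dataset (pymongo 2.x API, per requirements.txt)
orders.insert([
    {"item": "keyboard", "price": 49.99, "quantity": 2},
    {"item": "monitor", "price": 119.00, "quantity": 1},
])
```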

Notebooks/PySpark/Synapse Link for Cosmos DB samples/MongoDB/spark-notebooks/pyspark/01-CosmosDBSynapseMongoDB.ipynb renamed to Notebooks/PySpark/Synapse Link for Cosmos DB samples/E-Commerce/spark-notebooks/pyspark/01-CosmosDBSynapseMongoDB.ipynb

Lines changed: 13 additions & 3 deletions
@@ -18,6 +18,16 @@
 "source": [
 "# Getting started with Azure Cosmos DB's API for MongoDB and Synapse Link\n",
 "\n",
+"## Key Information about this notebook\n",
+"\n",
+"* This notebook is part of the Azure Synapse Link for Azure Cosmos DB analytical sample notebooks. For more information, click [here](../../../README.md). \n",
+"\n",
+"* It was built for Azure Cosmos DB API for MongoDB, but you can customize it yourself for Azure Cosmos DB SQL API. Please read about the analytical store schema inference differences between these 2 APIs [here](https://docs.microsoft.com/azure/cosmos-db/analytical-store-introduction#analytical-schema). \n",
+"\n",
+"* This is a Synapse Notebook and it was created to run in Synapse Analytics workspaces. Please make sure that you have followed the pre-reqs of the [README](/README.md) file. After that, please execute the steps below in the same order that they are presented here. \n",
+"\n",
+"* From now on, all operations are case-sensitive. Please be careful with everything you need to type.\n",
+"\n",
 "In this sample we will execute the following tasks:\n",
 "\n",
 "1. Insert a dataset using the traditional MongoDB client.\n",
@@ -26,9 +36,9 @@
 "1. Execute aggregation queries again, consolidating both datasets.\n",
 "\n",
 "## Pre-requisites\n",
-"1. Have you created a MongoDB API account in Azure Cosmos DB? If not, go to [Create an account for Azure Cosmos DB's API for MongoDB]().\n",
-"1. For your Cosmos DB account, have you enabled Synapse Link? If not, go to [Enable Synapse Link for Azure Cosmos DB's API for MongoDB]().\n",
-"1. Have you created a Synapse Workspace? If not, go to [Create Synapse Workspace account](). Please don't forget to add yourself as **Storage Blob Data Contributor** to the primary ADLS G2 account that is linked to the Synapse workspace.\n",
+"1. Have you created a MongoDB API account in Azure Cosmos DB? If not, go to [Create an account for Azure Cosmos DB's API for MongoDB](https://docs.microsoft.com/azure/cosmos-db/mongodb-introduction).\n",
+"1. For your Cosmos DB account, have you enabled Synapse Link? If not, go to [Enable Synapse Link for Azure Cosmos DB's API for MongoDB](https://docs.microsoft.com/azure/cosmos-db/configure-synapse-link).\n",
+"1. Have you created a Synapse Workspace? If not, go to [Create Synapse Workspace account](https://docs.microsoft.com/azure/synapse-analytics/synapse-link/how-to-connect-synapse-link-cosmos-db). Please don't forget to add yourself as **Storage Blob Data Contributor** to the primary ADLS Gen2 account that is linked to the Synapse workspace.\n",
 "\n",
 "## Create a Cosmos DB collection with analytical store enabled\n",
 "\n",

Notebooks/PySpark/Synapse Link for Cosmos DB samples/MongoDB/spark-notebooks/pyspark/requirements.txt renamed to Notebooks/PySpark/Synapse Link for Cosmos DB samples/E-Commerce/spark-notebooks/pyspark/requirements.txt

Lines changed: 2 additions & 2 deletions
@@ -1,3 +1,3 @@
-pymongo==2.8.1
-aenum==2.2.4
+pymongo==2.8.1
+aenum==2.2.4
 bson==0.5.10
Notebooks/PySpark/Synapse Link for Cosmos DB samples/IoT/README.md

Lines changed: 19 additions & 16 deletions
@@ -1,41 +1,44 @@
 
 # IoT Anomaly Detection leveraging Azure Synapse Link for Azure Cosmos DB
+
 The hypothetical scenario is a power plant where signals from steam turbines are analyzed and anomalous signals are detected. You will ingest streaming and batch IoT data into Azure Cosmos DB using Azure Synapse Spark, perform joins and aggregations using Azure Synapse Link, and perform anomaly detection using Azure Cognitive Services on Spark (MMLSpark).
 
-### Environment setup
+## Environment setup
+
 Please make sure that you have followed the pre-reqs of the main [README](../README.md) file. Please execute the steps below in the given order.
-1. Using the Data / Linked tab of your Synapse workspace, create IoTData folder within the root directory of the storage account that is attached to the Synapse workspace. Upload to this folder the "IoTDeviceInfo.csv" file that is placed under the "IoTData" dir of this repo.
-
+
+1. Using the Data / Linked tab of your Synapse workspace, create an IoTData folder within the root directory of the storage account that is attached to the Synapse workspace. Upload to this folder the "IoTDeviceInfo.csv" file that is placed under the "IoTData" dir of this repo.
+
 ![upload_datasets](images/upload_datasets.png)
 
-2. Using the Azure Portal, go to the Access Control (IAM) tab of the storage account associated with Synapse workspace, click on the +Add and Add a role assignment and add yourself to the Data Contributor role. This is needed for any spark metadata operations such as creating databases and tables using the Azure Synapse Spark Pool.
+2. Using the Azure Portal, go to the Access Control (IAM) tab of the storage account associated with the Synapse workspace, click +Add / Add a role assignment, and add yourself to the Data Contributor role. This is needed for any Spark metadata operations such as creating databases and tables using the Azure Synapse Spark pool.
 
-3. Using the Azure Portal, go to Data Explorer of your the Azure Cosmos DB Account and create a database called CosmosDBIoTDemo.
+3. Using the Azure Portal, go to the Data Explorer of your Azure Cosmos DB account and create a database called CosmosDBIoTDemo.
 
-4. In the same Data Explorer, create two Analytical Store enabled containers: IoTSignals and IoTDeviceInfo. In the portal interface, the container-id is the container name. Change the Throughput to Autoscale and set the max limit to 4,000. Please click [here](https://review.docs.microsoft.com/en-us/azure/cosmos-db/configure-synapse-link?branch=release-build-cosmosdb#create-analytical-ttl) for details on how to enable Analytical storage on Cosmos DB containers.
-    * Use /id as the Partition key for both the containers
+4. In the same Data Explorer, create two Analytical Store enabled containers: IoTSignals and IoTDeviceInfo. In the portal interface, the container-id is the container name. Change the Throughput to `autoscale` and set the max limit to 4,000. Please click [here](https://docs.microsoft.com/azure/cosmos-db/configure-synapse-link#azure-portal-2) for details on how to enable Analytical storage on Cosmos DB containers.
+    * Use `/id` as the Partition key for both the containers
     * Please make sure that Analytical store is enabled for both the containers
 
-5. In your Azure Synapse workspace, go to the Manage / Linked Services tab and create a linked service called CosmosDBIoTDemo pointing to the Cosmos DB database that was created in step 3 above. Please click [here](https://review.docs.microsoft.com/en-us/azure/synapse-analytics/synapse-link/how-to-connect-synapse-link-cosmos-db?branch=release-build-synapse#connect-an-azure-cosmos-db-database-to-a-synapse-workspace) for more details on creating Synapse linked service pointing to Cosmos DB.
+5. In your Azure Synapse workspace, go to the Manage / Linked Services tab and create a linked service called CosmosDBIoTDemo pointing to the Cosmos DB database that was created in step 3 above. Please click [here](https://docs.microsoft.com/azure/synapse-analytics/synapse-link/how-to-connect-synapse-link-cosmos-db) for more details on creating a Synapse linked service pointing to Cosmos DB.
+
+## Notebooks Execution
 
-### Notebooks Execution
+Import the below four Synapse Spark notebooks under the `IoT/spark-notebooks/pyspark/` dir on to the Synapse workspace and attach the Spark pool created in the prerequisites to the notebooks.
 
-Import the below four synapse spark notebooks under the "IoT/spark-notebooks/pyspark/" dir on to the Synapse workspace and attach the Spark pool created in the prerequisite to the notebooks.
-1. [01-CosmosDBSynapseStreamIngestion: Ingest streaming data into Azure Cosmos DB collection using Structured Streaming](IoT/spark-notebooks/pyspark/01-CosmosDBSynapseStreamIngestion.ipynb)
+1. [01-CosmosDBSynapseStreamIngestion: Ingest streaming data into Azure Cosmos DB collection using Structured Streaming](./spark-notebooks/pyspark/01-CosmosDBSynapseStreamIngestion.ipynb)
 
 This notebook ingests documents to the "IoTSignals" collection using structured streaming. Please make sure to stop the execution of this notebook after 2 to 5 minutes of streaming, which would bring in enough documents for the anomaly detection in the "04-CosmosDBSynapseML" notebook.
 Once the notebook execution is stopped, go to the Data Explorer in the Azure Cosmos DB account portal and make sure that the data has been loaded into the "IoTSignals" collection.
 
-1. [02-CosmosDBSynapseBatchIngestion: Ingest Batch data into Azure Cosmos DB collection using Azure Synapse Spark](IoT/spark-notebooks/pyspark/02-CosmosDBSynapseBatchIngestion.ipynb)
+1. [02-CosmosDBSynapseBatchIngestion: Ingest Batch data into Azure Cosmos DB collection using Azure Synapse Spark](./spark-notebooks/pyspark/02-CosmosDBSynapseBatchIngestion.ipynb)
 
 This notebook ingests documents from "IoTDeviceInfo.csv" to the "IoTDeviceInfo" collection.
 Once the notebook execution is completed, go to the Data Explorer in the Azure Cosmos DB account portal and make sure that the data has been loaded into the "IoTDeviceInfo" collection.
 
-1. [03-CosmosDBSynapseJoins: Perform Joins and aggregations across Azure Cosmos DB collections using Azure Synapse Link](IoT/spark-notebooks/pyspark/03-CosmosDBSynapseJoins.ipynb)
+1. [03-CosmosDBSynapseJoins: Perform Joins and aggregations across Azure Cosmos DB collections using Azure Synapse Link](./spark-notebooks/pyspark/03-CosmosDBSynapseJoins.ipynb)
 
 This notebook creates Spark tables pointing to Azure Cosmos DB analytical store collections, performs joins, filters, and aggregations across collections, and visualizes the data using plotly.
 
-1. [04-CosmosDBSynapseML: Perform Anomaly Detection using Azure Synapse Link and Azure Cognitive Services on Synapse Spark (MMLSpark)](IoT/spark-notebooks/pyspark/04-CosmosDBSynapseML.ipynb)
-
-This notebook performs anomaly detection using Azure Cognitive Services on Spark and enables to visualize the anomalies using plotly.
+1. [04-CosmosDBSynapseML: Perform Anomaly Detection using Azure Synapse Link and Azure Cognitive Services on Synapse Spark (MMLSpark)](./spark-notebooks/pyspark/04-CosmosDBSynapseML.ipynb)
 
+This notebook performs anomaly detection using Azure Cognitive Services on Spark and enables you to visualize the anomalies using `plotly`.
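If you prefer to script steps 3 and 4 above rather than use the portal, here is a hedged sketch with the azure-cosmos Python SDK. The endpoint and key are placeholders; `analytical_storage_ttl=-1` enables the analytical store with no expiry, and autoscale via `ThroughputProperties` assumes azure-cosmos >= 4.3:

```python
from azure.cosmos import CosmosClient, PartitionKey, ThroughputProperties

# Placeholders: substitute your account URI and primary key.
client = CosmosClient("https://<account>.documents.azure.com:443/",
                      credential="<primary-key>")
db = client.create_database_if_not_exists("CosmosDBIoTDemo")

for name in ("IoTSignals", "IoTDeviceInfo"):
    db.create_container_if_not_exists(
        id=name,
        partition_key=PartitionKey(path="/id"),  # /id, as in step 4
        analytical_storage_ttl=-1,               # analytical store on, data never expires
        offer_throughput=ThroughputProperties(auto_scale_max_throughput=4000),
    )
```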

Notebooks/PySpark/Synapse Link for Cosmos DB samples/IoT/spark-notebooks/pyspark/01-CosmosDBSynapseStreamIngestion.ipynb

Lines changed: 25 additions & 5 deletions
@@ -2,7 +2,17 @@
 "metadata": {
 "saveOutput": false,
 "language_info": {
-"name": "python"
+"name": "python",
+"version": "3.8.5-final"
+},
+"kernelspec": {
+"name": "python3",
+"display_name": "Python 3.8.5 32-bit",
+"metadata": {
+"interpreter": {
+"hash": "bb162378bcd0865be49969664200f47400a735e3f46a0ab8d5d80e6092c9afde"
+}
+}
 }
 },
 "nbformat": 4,
@@ -14,18 +24,28 @@
 "source": [
 "# Streaming ingestion into Azure Cosmos DB collection using Structured Streaming\n",
 "\n",
-"In this notebook, we'll \n",
+"## Key Information about this notebook\n",
+"\n",
+"* This notebook is part of the Azure Synapse Link for Azure Cosmos DB analytical sample notebooks. For more information, click [here](../../../README.md). \n",
+"\n",
+"* It was built for Azure Cosmos DB SQL API, but you can customize it yourself for Azure Cosmos DB API for MongoDB. Please read about the analytical store schema inference differences between these 2 APIs [here](https://docs.microsoft.com/azure/cosmos-db/analytical-store-introduction#analytical-schema). \n",
+"\n",
+"* This is a Synapse Notebook and it was created to run in Synapse Analytics workspaces. Please make sure that you have followed the pre-reqs of the [README](/README.md) file. After that, please execute the steps below in the same order that they are presented here. \n",
+"\n",
+"* From now on, all operations are case-sensitive. Please be careful with everything you need to type.\n",
+"\n",
+"In this notebook, we'll:\n",
 "\n",
 "1. Simulate streaming data generation using Rate streaming source\n",
 "2. Format the stream dataframe as per the IoTSignals schema\n",
 "3. Write the streaming dataframe to the Azure Cosmos DB collection\n",
 "\n",
-">**Did you know?** Azure Cosmos DB is a great fit for IoT predictive maintenance and anomaly detection use cases. [Click here](https://review.docs.microsoft.com/en-us/azure/cosmos-db/synapse-link-use-cases?branch=release-build-cosmosdb#iot-predictive-maintenance) to learn more about an IoT architecture leveraging HTAP capabilities of Azure Synapse Link for Azure Cosmos DB.\n",
+">**Did you know?** Azure Cosmos DB is a great fit for IoT predictive maintenance and anomaly detection use cases. [Click here](https://docs.microsoft.com/azure/cosmos-db/) to learn more about an IoT architecture leveraging HTAP capabilities of Azure Synapse Link for Azure Cosmos DB.\n",
 "\n",
-">**Did you know?** [Azure Synapse Link for Azure Cosmos DB](https://review.docs.microsoft.com/en-us/azure/cosmos-db/synapse-link?branch=release-build-cosmosdb) is a hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data in Azure Cosmos DB.\n",
+">**Did you know?** [Azure Synapse Link for Azure Cosmos DB](https://docs.microsoft.com/azure/cosmos-db/synapse-link) is a hybrid transactional and analytical processing (HTAP) capability that enables you to run near real-time analytics over operational data in Azure Cosmos DB.\n",
 " \n",
 "\n",
-">**Did you know?** [Azure Cosmos DB analytical store](https://review.docs.microsoft.com/en-us/azure/cosmos-db/analytical-store-introduction?branch=release-build-cosmosdb) is a fully isolated column store for enabling large scale analytics against operational data in your Azure Cosmos DB, without any impact to your transactional workloads.\n",
+">**Did you know?** [Azure Cosmos DB analytical store](https://docs.microsoft.com/azure/cosmos-db/analytical-store-introduction) is a fully isolated column store for enabling large scale analytics against operational data in your Azure Cosmos DB, without any impact to your transactional workloads.\n",
 " "
 ],
 "attachments": {}

Notebooks/PySpark/Synapse Link for Cosmos DB samples/IoT/spark-notebooks/pyspark/02-CosmosDBSynapseBatchIngestion.ipynb

Lines changed: 15 additions & 6 deletions
@@ -14,7 +14,17 @@
 "source": [
 "# Batch ingestion into Azure Cosmos DB collection\n",
 "\n",
-"In this notebook, we'll \n",
+"## Key Information about this notebook\n",
+"\n",
+"* This notebook is part of the Azure Synapse Link for Azure Cosmos DB analytical sample notebooks. For more information, click [here](../../../README.md). \n",
+"\n",
+"* It was built for Azure Cosmos DB SQL API, but you can customize it yourself for Azure Cosmos DB API for MongoDB. Please read about the analytical store schema inference differences between these 2 APIs [here](https://docs.microsoft.com/azure/cosmos-db/analytical-store-introduction#analytical-schema). \n",
+"\n",
+"* This is a Synapse Notebook and it was created to run in Synapse Analytics workspaces. Please make sure that you have followed the pre-reqs of the [README](/README.md) file. After that, please execute the steps below in the same order that they are presented here. \n",
+"\n",
+"* From now on, all operations are case-sensitive. Please be careful with everything you need to type.\n",
+"\n",
+"In this notebook, we'll:\n",
 "\n",
 "1. Load the IoTDeviceInfo dataset from ADLS Gen2 to a dataframe\n",
 "2. Write the dataframe to the Azure Cosmos DB collection\n",
@@ -23,8 +33,7 @@
 " \n",
 "\n",
 ">**Did you know?** [Azure Cosmos DB analytical store](https://review.docs.microsoft.com/en-us/azure/cosmos-db/analytical-store-introduction?branch=release-build-cosmosdb) is a fully isolated column store for enabling large scale analytics against operational data in your Azure Cosmos DB, without any impact to your transactional workloads.\n",
-" \n",
-""
+" \n"
 ],
 "attachments": {}
 },
@@ -42,7 +51,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 3,
+"execution_count": null,
 "outputs": [],
 "metadata": {},
 "source": [
@@ -66,7 +75,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 4,
+"execution_count": null,
 "outputs": [],
 "metadata": {},
 "source": [
@@ -82,4 +91,4 @@
 "attachments": {}
 }
 ]
-}
+}
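A hedged sketch of this notebook's two steps: load IoTDeviceInfo.csv from the workspace's primary ADLS Gen2 account and write it to the IoTDeviceInfo container. The filesystem and storage account names are placeholders; `spark.cosmos.write.upsertEnabled` makes re-runs idempotent:

```python
# Placeholders: substitute your workspace's primary ADLS Gen2 filesystem/account.
csv_path = ("abfss://<filesystem>@<storageaccount>.dfs.core.windows.net"
            "/IoTData/IoTDeviceInfo.csv")

# 1. Load the IoTDeviceInfo dataset into a dataframe, reading the header row
dfDeviceInfo = (spark.read
    .option("header", "true")
    .csv(csv_path))

# 2. Write the dataframe to the Azure Cosmos DB collection via the linked service
(dfDeviceInfo.write
    .format("cosmos.oltp")
    .option("spark.synapse.linkedService", "CosmosDBIoTDemo")
    .option("spark.cosmos.container", "IoTDeviceInfo")
    .option("spark.cosmos.write.upsertEnabled", "true")
    .mode("append")
    .save())
```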
