
Commit d3be17a

Merge pull request #115738 from julieMSFT/20200519_mtm
20200519 mtm
2 parents: a9cbd5d + b92fe08

17 files changed: +63 -111 lines changed

articles/synapse-analytics/metadata/database.md

Lines changed: 5 additions & 26 deletions
@@ -6,35 +6,29 @@ author: MikeRys
 ms.service: synapse-analytics
 ms.topic: overview
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/01/2020
 ms.author: mrys
 ms.reviewer: jrasnick
 ---

 # Azure Synapse Analytics shared database

-Azure Synapse Analytics allows the different computational workspace engines to share databases and tables between its Spark pools (preview), SQL on-demand (preview) engine, and SQL pools.
+Azure Synapse Analytics allows the different computational workspace engines to share databases and tables between its Spark pools (preview) and SQL on-demand (preview) engine.

 [!INCLUDE [synapse-analytics-preview-terms](../../../includes/synapse-analytics-preview-terms.md)]

 A database created with a Spark job will become visible with that same name to all current and future Spark pools (preview) in the workspace as well as the SQL on-demand engine.

-If there are SQL pools in the workspace that have metadata synchronization enabled, or if you create a new SQL pool with the metadata synchronization enabled, these Spark created databases are automatically mapped into special schemas in the SQL pool database.
+The Spark default database, called `default`, will also be visible in the SQL on-demand context as a database called `default`.

-Each schema is named after the Spark database name with an additional `$` prefix. Both the external and managed tables in the Spark-generated database are exposed as external tables in the corresponding special schema.
-
-The Spark default database, called `default`, will also be visible in the SQL on-demand context as a database called `default`, and in any of the SQL pool databases with metadata synchronization turned on as the schema `$default`.
-
-Since the databases are synchronized to SQL on-demand and the SQL pools asynchronously, there will be a delay until they appear.
+Since the databases are synchronized to SQL on-demand asynchronously, there will be a delay until they appear.

 ## Manage a Spark created database

 Use Spark to manage Spark created databases. For example, delete it through a Spark pool job, and create tables in it from Spark.

 If you create objects in a Spark created database using SQL on-demand, or try to drop the database, the operation will succeed. But, the original Spark database won't be changed.

-If you try to drop the synchronized schema in a SQL pool, or try to create a table in it, Azure returns an error.
-
 ## Handling of name conflicts

 If the name of a Spark database conflicts with the name of an existing SQL on-demand database, a suffix is appended in SQL on-demand to the Spark database. The suffix in SQL on-demand is `_<workspace name>-ondemand-DefaultSparkConnector`.

@@ -46,7 +40,7 @@ For example, if a Spark database called `mydb` gets created in the Azure Synapse

 ## Security model

-The Spark databases and tables, along with their synchronized representations in the SQL engines will be secured at the underlying storage level.
+The Spark databases and tables, along with their synchronized representations in the SQL engine will be secured at the underlying storage level.

 The security principal who creates a database is considered the owner of that database, and has all the rights to the database and its objects.

@@ -74,22 +68,7 @@ SELECT * FROM sys.databases;

 Verify that `mytestdb` is included in the results.

-### Exposing a Spark database in a SQL pool
-
-With the database created in the previous example, now create a SQL pool in your workspace named `mysqlpool` that enables metadata synchronization.
-
-Run the following statement against the `mysqlpool` SQL pool:
-
-```sql
-SELECT * FROM sys.schema;
-```
-
-Verify the schema for the newly created database in the results.
-
 ## Next steps

 - [Learn more about Azure Synapse Analytics' shared metadata](overview.md)
 - [Learn more about Azure Synapse Analytics' shared metadata Tables](table.md)
-
-<!-- - [Learn more about the Synchronization with SQL on-demand](overview.md)
-- [Learn more about the Synchronization with SQL pools](overview.md)-->
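
A hedged illustration of the name-conflict rule kept in database.md above: assuming a workspace named `myws` in which a Spark database `mydb` collides with an existing SQL on-demand database of the same name, the Spark database surfaces in SQL on-demand under the suffixed name and can be located like this:

```sql
-- Illustrative sketch only; the workspace name (myws) and database name (mydb) are assumptions.
-- The conflicting Spark database is exposed in SQL on-demand with the documented suffix appended.
SELECT name
FROM sys.databases
WHERE name = 'mydb_myws-ondemand-DefaultSparkConnector';
```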

articles/synapse-analytics/metadata/overview.md

Lines changed: 6 additions & 10 deletions
@@ -6,19 +6,17 @@ author: MikeRys
 ms.service: synapse-analytics
 ms.topic: overview
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/01/2020
 ms.author: mrys
 ms.reviewer: jrasnick
 ---

 # Azure Synapse Analytics shared metadata

-Azure Synapse Analytics allows the different workspace computational engines to share databases and tables between its Spark pools (preview), SQL on-demand engine (preview), and SQL pools.
+Azure Synapse Analytics allows the different workspace computational engines to share databases and tables between its Spark pools (preview) and SQL on-demand engine (preview).

 [!INCLUDE [preview](../includes/note-preview.md)]

-
-
 The sharing supports the so-called modern data warehouse pattern and gives the workspace SQL engines access to databases and tables created with Spark. It also allows the SQL engines to create their own objects that aren't being shared with the other engines.

 ## Support the modern data warehouse
@@ -29,9 +27,7 @@ The shared metadata model supports the modern data warehouse pattern in the foll

 2. The Spark created databases and all their tables become visible in any of the Azure Synapse workspace Spark pool instances and can be used from any of the Spark jobs. This capability is subject to the [permissions](#security-model-at-a-glance) since all Spark pools in a workspace share the same underlying catalog meta store.

-3. The Spark created databases and their Parquet-backed tables become visible in the workspace SQL on-demand engine. [Databases](database.md) are created automatically in the SQL on-demand metadata, and both the [external and managed tables](table.md) created by a Spark job are made accessible as external tables in the SQL on-demand metadata in the `dbo` schema of the corresponding database. <!--For more details, see [ADD LINK].-->
-
-4. If there are SQL pool instances in the workspace that have their metadata synchronization enabled <!--[ADD LINK]--> or if a new SQL pool instance is created with the metadata synchronization enabled, the Spark created databases and their Parquet-backed tables will be mapped automatically into the SQL pool database as described in [Azure Synapse Analytics shared database](database.md).
+3. The Spark created databases and their Parquet-backed tables become visible in the workspace SQL on-demand engine. [Databases](database.md) are created automatically in the SQL on-demand metadata, and both the [external and managed tables](table.md) created by a Spark job are made accessible as external tables in the SQL on-demand metadata in the `dbo` schema of the corresponding database.

 <!--[INSERT PICTURE]-->

@@ -41,17 +37,17 @@ Object synchronization occurs asynchronously. Objects will have a slight delay o

 ## Which metadata objects are shared

-Spark allows you to create databases, external tables, managed tables, and views. Since Spark views require a Spark engine to process the defining Spark SQL statement, and cannot be processed by a SQL engine, only databases and their contained external and managed tables that use the Parquet storage format are shared with the workspace SQL engines. Spark views are only shared among the Spark pool instances.
+Spark allows you to create databases, external tables, managed tables, and views. Since Spark views require a Spark engine to process the defining Spark SQL statement, and cannot be processed by a SQL engine, only databases and their contained external and managed tables that use the Parquet storage format are shared with the workspace SQL engine. Spark views are only shared among the Spark pool instances.

 ## Security model at a glance

-The Spark databases and tables, along with their synchronized representations in the SQL engines, are secured at the underlying storage level. When the table is queried by any of the engines that the query submitter has the right to use, the query submitter's security principal is being passed through to the underlying files. Permissions are checked at the file system level.
+The Spark databases and tables, along with their synchronized representations in the SQL engine, are secured at the underlying storage level. When the table is queried by any of the engines that the query submitter has the right to use, the query submitter's security principal is being passed through to the underlying files. Permissions are checked at the file system level.

 For more information, see [Azure Synapse Analytics shared database](database.md).

 ## Change maintenance

-If a metadata object is deleted or changed with Spark, the changes are picked up and propagated to the SQL on-demand engine and the SQL pools that have the objects synchronized. Synchronization is asynchronous and changes are reflected in the SQL engines after a short delay.
+If a metadata object is deleted or changed with Spark, the changes are picked up and propagated to the SQL on-demand engine. Synchronization is asynchronous and changes are reflected in the SQL engine after a short delay.

 ## Next steps

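A minimal sketch of step 3 of the pattern described in overview.md above, assuming a Spark job has already created a database `mytestdb` with a Parquet-backed table `myparquettable` (both names are hypothetical): once synchronization completes, the table can be read from the SQL on-demand engine through the `dbo` schema of the corresponding database.

```sql
-- Run against the SQL on-demand endpoint; database and table names are illustrative.
SELECT TOP 10 *
FROM mytestdb.dbo.myparquettable;
```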

articles/synapse-analytics/metadata/table.md

Lines changed: 6 additions & 31 deletions
@@ -6,7 +6,7 @@ author: MikeRys
 ms.service: synapse-analytics
 ms.topic: overview
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/01/2020
 ms.author: mrys
 ms.reviewer: jrasnick
 ---
@@ -15,24 +15,20 @@ ms.reviewer: jrasnick

 [!INCLUDE [synapse-analytics-preview-terms](../../../includes/synapse-analytics-preview-terms.md)]

-Azure Synapse Analytics allows the different workspace computational engines to share databases and Parquet-backed tables between its Apache Spark pools (preview), SQL on-demand (preview) engine, and SQL pools.
+Azure Synapse Analytics allows the different workspace computational engines to share databases and Parquet-backed tables between its Apache Spark pools (preview) and SQL on-demand (preview) engine.

 Once a database has been created by a Spark job, you can create tables in it with Spark that use Parquet as the storage format. These tables will immediately become available for querying by any of the Azure Synapse workspace Spark pools. They can also be used from any of the Spark jobs subject to permissions.

-The Spark created, managed, and external tables are also made available as external tables with the same name in the corresponding synchronized database in SQL on-demand and in the corresponding `$`-prefixed schemas in the SQL pools that have their metadata synchronization enabled. [Exposing a Spark table in SQL](#exposing-a-spark-table-in-sql) provides more detail on the table synchronization.
+The Spark created, managed, and external tables are also made available as external tables with the same name in the corresponding synchronized database in SQL on-demand. [Exposing a Spark table in SQL](#exposing-a-spark-table-in-sql) provides more detail on the table synchronization.

-Since the tables are synchronized to SQL on-demand and the SQL pools asynchronously, there will be a delay until they appear.
-
-Mapping of tables to external tables, data sources and file formats.
+Since the tables are synchronized to SQL on-demand asynchronously, there will be a delay until they appear.

 ## Manage a Spark created table

 Use Spark to manage Spark created databases. For example, delete it through a Spark pool job, and create tables in it from Spark.

 If you create objects in such a database from SQL on-demand or try to drop the database, the operation will succeed, but the original Spark database will not be changed.

-If you try to drop the synchronized schema in a SQL pool, or try to create a table in it, Azure returns an error.
-
 ## Exposing a Spark table in SQL

 ### Which Spark tables are shared
@@ -51,7 +47,7 @@ Azure Synapse currently only shares managed and external Spark tables that store

 ### How are Spark tables shared

-The shareable managed and external Spark tables exposed in the SQL engines as external tables with the following properties:
+The shareable managed and external Spark tables exposed in the SQL engine as external tables with the following properties:

 - The SQL external table's data source is the data source representing the Spark table's location folder.
 - The SQL external table's file format is Parquet.
@@ -83,7 +79,7 @@ Spark tables provide different data types than the Synapse SQL engines. The foll

 ## Security model

-The Spark databases and tables, as well as their synchronized representations in the SQL engines will be secured at the underlying storage level. Since they do not currently have permissions on the objects themselves, the objects can be seen in the object explorer.
+The Spark databases and tables, as well as their synchronized representations in the SQL engine will be secured at the underlying storage level. Since they do not currently have permissions on the objects themselves, the objects can be seen in the object explorer.

 The security principal who creates a managed table is considered the owner of that table and has all the rights to the table as well as the underlying folders and files. In addition, the owner of the database will automatically become co-owner of the table.

@@ -189,27 +185,6 @@ id | name | birthdate
 1 | Alice | 2010-01-01
 ```

-### Querying Spark tables in a SQL pool
-
-With the tables created in the previous examples, now create a SQL pool in your workspace named `mysqlpool` that enables metadata synchronization (or use the already created pool from [Exposing a Spark database in a SQL pool](database.md#exposing-a-spark-database-in-a-sql-pool).
-
-Run the following statement against the `mysqlpool` SQL pool:
-
-```sql
-SELECT * FROM sys.tables;
-```
-
-Verify that the tables `myParquetTable` and `myExternalParquetTable` are visible in the schema `$mytestdb`.
-
-Now you can read the data from SQL on-demand as follows:
-
-```sql
-SELECT * FROM [$mytestdb].myParquetTable WHERE name = 'Alice';
-SELECT * FROM [$mytestdb].myExternalParquetTable WHERE name = 'Alice';
-```
-
-You should get the same results as with SQL on-demand above.
-
 ## Next steps

 - [Learn more about Azure Synapse Analytics' shared metadata](overview.md)
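
A rough sketch of the mapping listed under "How are Spark tables shared" in table.md above: the synchronization is automatic, so nothing like this is run by hand, and every name, path, and column type below is an assumption, but the synchronized object behaves much like an external table defined over the Spark table's Parquet location folder.

```sql
-- Conceptual illustration only; the service creates the synchronized objects automatically.
-- Storage account, container, folder path, and column types are assumptions.
CREATE EXTERNAL DATA SOURCE myparquettable_location
WITH (LOCATION = 'https://mystorageaccount.dfs.core.windows.net/myfilesystem');

CREATE EXTERNAL FILE FORMAT parquet_format
WITH (FORMAT_TYPE = PARQUET);

CREATE EXTERNAL TABLE dbo.myparquettable
(
    id INT,
    name VARCHAR(50),
    birthdate DATE
)
WITH
(
    LOCATION = '/warehouse/mytestdb.db/myparquettable/',
    DATA_SOURCE = myparquettable_location,
    FILE_FORMAT = parquet_format
);
```

A query such as `SELECT * FROM dbo.myparquettable WHERE name = 'Alice';` against the synchronized database should then return the same rows that Spark sees.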

articles/synapse-analytics/quickstart-connect-synapse-link-cosmos-db.md

Lines changed: 2 additions & 2 deletions
@@ -1,5 +1,5 @@
 ---
-title: Connect to Synapse Link for Azure Cosmos DB
+title: Connect to Azure Synapse Link for Azure Cosmos DB
 description: How to connect an Azure Cosmos DB to a Synapse workspace with Synapse Link
 services: synapse-analytics
 author: ArnoMicrosoft
@@ -11,7 +11,7 @@ ms.author: acomet
 ms.reviewer: jrasnick
 ---

-# Connect to Synapse Link for Azure Cosmos DB
+# Connect to Azure Synapse Link for Azure Cosmos DB

 This article describes how to access an Azure Cosmos DB database from Azure Synapse Analytics studio with Synapse Link.

articles/synapse-analytics/spark/apache-spark-development-using-notebooks.md

Lines changed: 12 additions & 9 deletions
@@ -6,7 +6,7 @@ author: ruixinxu
 ms.service: synapse-analytics
 ms.topic: conceptual
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/01/2020
 ms.author: ruxu
 ms.reviewer:
 ---
@@ -50,12 +50,12 @@ There are multiple ways to add a new cell to your notebook.

 ### Set a primary language

-Azure Synapse Studio notebooks support four spark languages:
+Azure Synapse Studio notebooks support four Apache Spark languages:

-* pyspark (python)
-* spark (Scala)
-* sparkSQL
-* Spark.NET (C#)
+* pySpark (Python)
+* Spark (Scala)
+* SparkSQL
+* .NET for Apache Spark (C#)

 You can set the primary language for new added cells from the dropdown list in the top command bar.

@@ -70,9 +70,9 @@ You can use multiple languages in one notebook by specifying the correct languag
 |%%pyspark| Python | Execute a **Python** query against Spark Context. |
 |%%spark| Scala | Execute a **Scala** query against Spark Context. |
 |%%sql| SparkSQL | Execute a **SparkSQL** query against Spark Context. |
-|%%csharp | Spark.NET C# | Execute a **Spark.NET C#** query against Spark Context. |
+|%%csharp | .NET for Spark C# | Execute a **.NET for Spark C#** query against Spark Context. |

-The following image is an example of how you can write a PySpark query using the **%%pyspark** magic command or a SparkSQL query with the **%%sql** magic command in a **Spark(Scala)** notebook. Notice that the primary language for the notebook is set to Scala.
+The following image is an example of how you can write a PySpark query using the **%%pyspark** magic command or a SparkSQL query with the **%%sql** magic command in a **Spark(Scala)** notebook. Notice that the primary language for the notebook is set to pySpark.

 ![synapse-spark-magics](./media/apache-spark-development-using-notebooks/synapse-spark-magics.png)

@@ -113,7 +113,7 @@ The IntelliSense features are at different levels of maturity for different lang
 |PySpark (Python)|Yes|Yes|Yes|Yes|Yes|Yes|Yes|Yes|
 |Spark (Scala)|Yes|Yes|Yes|Yes|-|-|-|Yes|
 |SparkSQL|Yes|Yes|-|-|-|-|-|-|
-|Spark.NET (C#)|Yes|-|-|-|-|-|-|-|
+|.NET for Spark (C#)|Yes|-|-|-|-|-|-|-|

 ### Format text cell with toolbar buttons

@@ -387,5 +387,8 @@ Using the following keystroke shortcuts, you can more easily navigate and run co

 ## Next steps

+- [Quickstart: Create an Apache Spark pool (preview) in Azure Synapse Analytics using web tools](../quickstart-apache-spark-notebook.md)
+- [What is Apache Spark in Azure Synapse Analytics](apache-spark-overview.md)
+- [Use .NET for Apache Spark with Azure Synapse Analytics](spark-dotnet.md)
 - [.NET for Apache Spark documentation](/dotnet/spark?toc=/azure/synapse-analytics/toc.json&bc=/azure/synapse-analytics/breadcrumb/toc.json)
 - [Azure Synapse Analytics](https://docs.microsoft.com/azure/synapse-analytics)
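
A small, hypothetical example of the magic commands covered in apache-spark-development-using-notebooks.md above: a cell in a notebook whose primary language is pySpark can still run a SparkSQL query by starting the cell with the `%%sql` magic (the database and table names are placeholders reused from the metadata articles).

```sql
%%sql
-- This cell runs as SparkSQL regardless of the notebook's primary language.
-- mytestdb.myparquettable is a placeholder table name.
SELECT *
FROM mytestdb.myparquettable
LIMIT 10
```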

articles/synapse-analytics/spark/apache-spark-version-support.md

Lines changed: 1 addition & 1 deletion
@@ -1248,4 +1248,4 @@ zlib==1.2.8

 - [Azure Synapse Analytics](../overview-what-is.md)
 - [Apache Spark Documentation](https://spark.apache.org/docs/2.4.4/)
-- [Apache Spark Concepts](apache-spark-concepts.md)
+- [Apache Spark Concepts](apache-spark-concepts.md)

0 commit comments
