`articles/synapse-analytics/metadata/database.md` (+5 −26)
````diff
@@ -6,35 +6,29 @@ author: MikeRys
 ms.service: synapse-analytics
 ms.topic: overview
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/01/2020
 ms.author: mrys
 ms.reviewer: jrasnick
 ---
 
 # Azure Synapse Analytics shared database
 
-Azure Synapse Analytics allows the different computational workspace engines to share databases and tables between its Spark pools (preview), SQL on-demand (preview) engine, and SQL pools.
+Azure Synapse Analytics allows the different computational workspace engines to share databases and tables between its Spark pools (preview) and SQL on-demand (preview) engine.
 A database created with a Spark job will become visible with that same name to all current and future Spark pools (preview) in the workspace as well as the SQL on-demand engine.
 
-If there are SQL pools in the workspace that have metadata synchronization enabled, or if you create a new SQL pool with the metadata synchronization enabled, these Spark created databases are automatically mapped into special schemas in the SQL pool database.
+The Spark default database, called `default`, will also be visible in the SQL on-demand context as a database called `default`.
 
-Each schema is named after the Spark database name with an additional `$` prefix. Both the external and managed tables in the Spark-generated database are exposed as external tables in the corresponding special schema.
-
-The Spark default database, called `default`, will also be visible in the SQL on-demand context as a database called `default`, and in any of the SQL pool databases with metadata synchronization turned on as the schema `$default`.
-
-Since the databases are synchronized to SQL on-demand and the SQL pools asynchronously, there will be a delay until they appear.
+Since the databases are synchronized to SQL on-demand asynchronously, there will be a delay until they appear.
 
 ## Manage a Spark created database
 
 Use Spark to manage Spark created databases. For example, delete it through a Spark pool job, and create tables in it from Spark.
 
 If you create objects in a Spark created database using SQL on-demand, or try to drop the database, the operation will succeed. But, the original Spark database won't be changed.
 
-If you try to drop the synchronized schema in a SQL pool, or try to create a table in it, Azure returns an error.
-
 ## Handling of name conflicts
 
 If the name of a Spark database conflicts with the name of an existing SQL on-demand database, a suffix is appended in SQL on-demand to the Spark database. The suffix in SQL on-demand is `_<workspace name>-ondemand-DefaultSparkConnector`.
@@ -46,7 +40,7 @@ For example, if a Spark database called `mydb` gets created in the Azure Synapse
 
 ## Security model
 
-The Spark databases and tables, along with their synchronized representations in the SQL engines will be secured at the underlying storage level.
+The Spark databases and tables, along with their synchronized representations in the SQL engine will be secured at the underlying storage level.
 
 The security principal who creates a database is considered the owner of that database, and has all the rights to the database and its objects.
@@ -74,22 +68,7 @@ SELECT * FROM sys.databases;
 ```
 
 Verify that `mytestdb` is included in the results.
 
-### Exposing a Spark database in a SQL pool
-
-With the database created in the previous example, now create a SQL pool in your workspace named `mysqlpool` that enables metadata synchronization.
-
-Run the following statement against the `mysqlpool` SQL pool:
-
-```sql
-SELECT * FROM sys.schema;
-```
-
-Verify the schema for the newly created database in the results.
-
 ## Next steps
 
 - [Learn more about Azure Synapse Analytics' shared metadata](overview.md)
 - [Learn more about Azure Synapse Analytics' shared metadata Tables](table.md)
-
-<!-- - [Learn more about the Synchronization with SQL on-demand](overview.md)
-- [Learn more about the Synchronization with SQL pools](overview.md)-->
````
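The SQL on-demand flow that survives this diff can be sketched end to end as follows. This is a hypothetical walkthrough, assuming an Azure Synapse workspace with a Spark pool and a SQL on-demand endpoint; `mytestdb` is the name the article itself uses:

```sql
-- Sketch only: assumes a Synapse workspace; 'mytestdb' reuses the article's example name.

-- 1. Run in a Spark pool session (Spark SQL) to create the shared database:
CREATE DATABASE mytestdb;

-- 2. After the asynchronous sync delay, run against the SQL on-demand
--    endpoint to confirm the database was propagated:
SELECT * FROM sys.databases WHERE name = 'mytestdb';
```

Because synchronization is asynchronous, the second query may return no rows for a short while after the Spark job completes.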
`articles/synapse-analytics/metadata/overview.md` (+6 −10)
````diff
@@ -6,19 +6,17 @@ author: MikeRys
 ms.service: synapse-analytics
 ms.topic: overview
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/01/2020
 ms.author: mrys
 ms.reviewer: jrasnick
 ---
 
 # Azure Synapse Analytics shared metadata
 
-Azure Synapse Analytics allows the different workspace computational engines to share databases and tables between its Spark pools (preview), SQL on-demand engine (preview), and SQL pools.
+Azure Synapse Analytics allows the different workspace computational engines to share databases and tables between its Spark pools (preview) and SQL on-demand engine (preview).
 
 [!INCLUDE [preview](../includes/note-preview.md)]
 
-
-
 The sharing supports the so-called modern data warehouse pattern and gives the workspace SQL engines access to databases and tables created with Spark. It also allows the SQL engines to create their own objects that aren't being shared with the other engines.
 
 ## Support the modern data warehouse
@@ -29,9 +27,7 @@ The shared metadata model supports the modern data warehouse pattern in the foll
 2. The Spark created databases and all their tables become visible in any of the Azure Synapse workspace Spark pool instances and can be used from any of the Spark jobs. This capability is subject to the [permissions](#security-model-at-a-glance) since all Spark pools in a workspace share the same underlying catalog meta store.
 
-3. The Spark created databases and their Parquet-backed tables become visible in the workspace SQL on-demand engine. [Databases](database.md) are created automatically in the SQL on-demand metadata, and both the [external and managed tables](table.md) created by a Spark job are made accessible as external tables in the SQL on-demand metadata in the `dbo` schema of the corresponding database. <!--For more details, see [ADD LINK].-->
-
-4. If there are SQL pool instances in the workspace that have their metadata synchronization enabled <!--[ADD LINK]--> or if a new SQL pool instance is created with the metadata synchronization enabled, the Spark created databases and their Parquet-backed tables will be mapped automatically into the SQL pool database as described in [Azure Synapse Analytics shared database](database.md).
+3. The Spark created databases and their Parquet-backed tables become visible in the workspace SQL on-demand engine. [Databases](database.md) are created automatically in the SQL on-demand metadata, and both the [external and managed tables](table.md) created by a Spark job are made accessible as external tables in the SQL on-demand metadata in the `dbo` schema of the corresponding database.
 
 <!--[INSERT PICTURE]-->
@@ -41,17 +37,17 @@ Object synchronization occurs asynchronously. Objects will have a slight delay o
 
 ## Which metadata objects are shared
 
-Spark allows you to create databases, external tables, managed tables, and views. Since Spark views require a Spark engine to process the defining Spark SQL statement, and cannot be processed by a SQL engine, only databases and their contained external and managed tables that use the Parquet storage format are shared with the workspace SQL engines. Spark views are only shared among the Spark pool instances.
+Spark allows you to create databases, external tables, managed tables, and views. Since Spark views require a Spark engine to process the defining Spark SQL statement, and cannot be processed by a SQL engine, only databases and their contained external and managed tables that use the Parquet storage format are shared with the workspace SQL engine. Spark views are only shared among the Spark pool instances.
 
 ## Security model at a glance
 
-The Spark databases and tables, along with their synchronized representations in the SQL engines, are secured at the underlying storage level. When the table is queried by any of the engines that the query submitter has the right to use, the query submitter's security principal is being passed through to the underlying files. Permissions are checked at the file system level.
+The Spark databases and tables, along with their synchronized representations in the SQL engine, are secured at the underlying storage level. When the table is queried by any of the engines that the query submitter has the right to use, the query submitter's security principal is being passed through to the underlying files. Permissions are checked at the file system level.
 
 For more information, see [Azure Synapse Analytics shared database](database.md).
 
 ## Change maintenance
 
-If a metadata object is deleted or changed with Spark, the changes are picked up and propagated to the SQL on-demand engine and the SQL pools that have the objects synchronized. Synchronization is asynchronous and changes are reflected in the SQL engines after a short delay.
+If a metadata object is deleted or changed with Spark, the changes are picked up and propagated to the SQL on-demand engine. Synchronization is asynchronous and changes are reflected in the SQL engine after a short delay.
````
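The Parquet-backed sharing the overview describes can be illustrated with a short sketch. This is hypothetical; the database, table, and column names reuse the examples that appear in the companion articles in this same diff:

```sql
-- Sketch only: assumes a Synapse workspace; names reuse the articles' examples.

-- In a Spark pool session, create a Parquet-backed managed table:
CREATE TABLE mytestdb.myParquetTable (id INT, name STRING, birthdate DATE)
USING Parquet;

-- After the asynchronous sync, SQL on-demand exposes it as an external table
-- in the dbo schema of the synchronized database:
SELECT id, name, birthdate FROM mytestdb.dbo.myParquetTable;
```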
`articles/synapse-analytics/metadata/table.md`

````diff
-Azure Synapse Analytics allows the different workspace computational engines to share databases and Parquet-backed tables between its Apache Spark pools (preview), SQL on-demand (preview) engine, and SQL pools.
+Azure Synapse Analytics allows the different workspace computational engines to share databases and Parquet-backed tables between its Apache Spark pools (preview) and SQL on-demand (preview) engine.
 
 Once a database has been created by a Spark job, you can create tables in it with Spark that use Parquet as the storage format. These tables will immediately become available for querying by any of the Azure Synapse workspace Spark pools. They can also be used from any of the Spark jobs subject to permissions.
 
-The Spark created, managed, and external tables are also made available as external tables with the same name in the corresponding synchronized database in SQL on-demand and in the corresponding `$`-prefixed schemas in the SQL pools that have their metadata synchronization enabled. [Exposing a Spark table in SQL](#exposing-a-spark-table-in-sql) provides more detail on the table synchronization.
+The Spark created, managed, and external tables are also made available as external tables with the same name in the corresponding synchronized database in SQL on-demand. [Exposing a Spark table in SQL](#exposing-a-spark-table-in-sql) provides more detail on the table synchronization.
 
-Since the tables are synchronized to SQL on-demand and the SQL pools asynchronously, there will be a delay until they appear.
-
-Mapping of tables to external tables, data sources and file formats.
+Since the tables are synchronized to SQL on-demand asynchronously, there will be a delay until they appear.
 
 ## Manage a Spark created table
 
 Use Spark to manage Spark created databases. For example, delete it through a Spark pool job, and create tables in it from Spark.
 
 If you create objects in such a database from SQL on-demand or try to drop the database, the operation will succeed, but the original Spark database will not be changed.
 
-If you try to drop the synchronized schema in a SQL pool, or try to create a table in it, Azure returns an error.
-
 ## Exposing a Spark table in SQL
 
 ### Which Spark tables are shared
@@ -51,7 +47,7 @@ Azure Synapse currently only shares managed and external Spark tables that store
 
 ### How are Spark tables shared
 
-The shareable managed and external Spark tables exposed in the SQL engines as external tables with the following properties:
+The shareable managed and external Spark tables exposed in the SQL engine as external tables with the following properties:
 
 - The SQL external table's data source is the data source representing the Spark table's location folder.
 - The SQL external table's file format is Parquet.
@@ -83,7 +79,7 @@ Spark tables provide different data types than the Synapse SQL engines. The foll
 
 ## Security model
 
-The Spark databases and tables, as well as their synchronized representations in the SQL engines will be secured at the underlying storage level. Since they do not currently have permissions on the objects themselves, the objects can be seen in the object explorer.
+The Spark databases and tables, as well as their synchronized representations in the SQL engine will be secured at the underlying storage level. Since they do not currently have permissions on the objects themselves, the objects can be seen in the object explorer.
 
 The security principal who creates a managed table is considered the owner of that table and has all the rights to the table as well as the underlying folders and files. In addition, the owner of the database will automatically become co-owner of the table.
@@ -189,27 +185,6 @@ id | name | birthdate
 1 | Alice | 2010-01-01
 ```
 
-### Querying Spark tables in a SQL pool
-
-With the tables created in the previous examples, now create a SQL pool in your workspace named `mysqlpool` that enables metadata synchronization (or use the already created pool from [Exposing a Spark database in a SQL pool](database.md#exposing-a-spark-database-in-a-sql-pool).
-
-Run the following statement against the `mysqlpool` SQL pool:
-
-```sql
-SELECT * FROM sys.tables;
-```
-
-Verify that the tables `myParquetTable` and `myExternalParquetTable` are visible in the schema `$mytestdb`.
-
-Now you can read the data from SQL on-demand as follows:
-
-```sql
-SELECT * FROM [$mytestdb].myParquetTable WHERE name = 'Alice';
-SELECT * FROM [$mytestdb].myExternalParquetTable WHERE name = 'Alice';
-```
-
-You should get the same results as with SQL on-demand above.
-
 ## Next steps
 
 - [Learn more about Azure Synapse Analytics' shared metadata](overview.md)
````
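With the SQL-pool example deleted above, the equivalent check runs against SQL on-demand instead. A condensed sketch, reusing the article's own database, table, and row values (assumes the Spark tables from the article's earlier examples already exist and have synchronized):

```sql
-- Sketch only: run against the SQL on-demand endpoint of a Synapse workspace.
-- Names and the 'Alice' row reuse the article's own examples.
USE mytestdb;

-- Confirm the synchronized external tables are visible:
SELECT name FROM sys.tables;

-- Query the Spark-created Parquet tables directly:
SELECT * FROM myParquetTable WHERE name = 'Alice';
SELECT * FROM myExternalParquetTable WHERE name = 'Alice';
```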
`articles/synapse-analytics/spark/apache-spark-development-using-notebooks.md` (+12 −9)
````diff
@@ -6,7 +6,7 @@ author: ruixinxu
 ms.service: synapse-analytics
 ms.topic: conceptual
 ms.subservice:
-ms.date: 04/15/2020
+ms.date: 05/01/2020
 ms.author: ruxu
 ms.reviewer:
 ---
@@ -50,12 +50,12 @@ There are multiple ways to add a new cell to your notebook.
 
 ### Set a primary language
 
-Azure Synapse Studio notebooks support four spark languages:
+Azure Synapse Studio notebooks support four Apache Spark languages:
 
-* pyspark (python)
-* spark (Scala)
-* sparkSQL
-* Spark.NET (C#)
+* pySpark (Python)
+* Spark (Scala)
+* SparkSQL
+* .NET for Apache Spark (C#)
 
 You can set the primary language for new added cells from the dropdown list in the top command bar.
@@ -70,9 +70,9 @@ You can use multiple languages in one notebook by specifying the correct languag
 |%%pyspark| Python | Execute a **Python** query against Spark Context. |
 |%%spark| Scala | Execute a **Scala** query against Spark Context. |
 |%%sql| SparkSQL | Execute a **SparkSQL** query against Spark Context. |
-|%%csharp |Spark.NET C# | Execute a **Spark.NET C#** query against Spark Context. |
+|%%csharp | .NET for Spark C# | Execute a **.NET for Spark C#** query against Spark Context. |
 
-The following image is an example of how you can write a PySpark query using the **%%pyspark** magic command or a SparkSQL query with the **%%sql** magic command in a **Spark(Scala)** notebook. Notice that the primary language for the notebook is set to Scala.
+The following image is an example of how you can write a PySpark query using the **%%pyspark** magic command or a SparkSQL query with the **%%sql** magic command in a **Spark(Scala)** notebook. Notice that the primary language for the notebook is set to pySpark.
````
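The magic commands listed in the last hunk can be mixed in one notebook. A minimal sketch of such a cell (hypothetical; the table name reuses the metadata articles' example and is assumed to exist in the workspace):

```sql
%%sql
-- A SparkSQL cell in a notebook whose primary language is something else
-- (for example pySpark); the %%sql magic switches only this cell.
SELECT id, name FROM mytestdb.myParquetTable LIMIT 10
```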