articles/synapse-analytics/guidance/security-white-paper-introduction.md
12 additions & 12 deletions
@@ -20,7 +20,7 @@ ms.date: 01/14/2022
- [Dedicated SQL pool](../sql-data-warehouse/sql-data-warehouse-overview-what-is.md?context=/azure/synapse-analytics/context/context) (formerly SQL DW) for enterprise data warehousing.
- Deep integration with [Power BI](https://powerbi.microsoft.com/), [Azure Cosmos DB](../../cosmos-db/synapse-link.md?context=/azure/synapse-analytics/context/context), and [Azure Machine Learning](../machine-learning/what-is-machine-learning.md).
-
Azure Synapse data security and privacy are non-negotiable. The purpose of this white paper, then, is to provide a comprehensive overview of Azure Synapse security features, which are enterprise-grade and industry-leading. The white paper comprises a series of articles that cover the following five layers of security:
+
Azure Synapse data security and privacy are non-negotiable. The purpose of this white paper is to provide a comprehensive overview of Azure Synapse security features, which are enterprise-grade and industry-leading. The white paper comprises a series of articles that cover the following five layers of security:
- Data protection
- Access control
@@ -53,29 +53,29 @@ Some common security questions include:
The purpose of this white paper is to provide answers to these common security questions, and many others.
-
## Component Architecture
+
## Component architecture
-
Azure Synapse Analytics is a Platform-as-a-service (PaaS) that brings together multiple independent components such as Dedicated SQL pools, Serverless SQL pools, Apache Spark pools and Data Integration Pipelines that work together to provide a seamless analytical platform experience for the customers.
+
Azure Synapse is a Platform-as-a-service (PaaS) analytics service that brings together multiple independent components such as dedicated SQL pools, serverless SQL pools, Apache Spark pools, and data integration pipelines. These components are designed to work together to provide a seamless analytical platform experience.
-
Dedicated SQL pools are provisioned clusters that provide enterprise data warehousing capabilities for SQL workloads. The data is ingested into a managed storage powered by Azure Storage, which is another PaaS service by itself. Compute is isolated from storage which enables customers to scale compute independently of their data. Dedicated SQL pools also provide the ability to query the data files directly over the customermanaged Azure Storage accounts via external tables.
+
[Dedicated SQL pools](../sql/overview-architecture.md) are provisioned clusters that provide enterprise data warehousing capabilities for SQL workloads. Data is ingested into managed storage powered by Azure Storage, which is also a PaaS service. Compute is isolated from storage, enabling customers to scale compute independently of their data. Dedicated SQL pools also provide the ability to query data files directly over customer-managed Azure Storage accounts by using external tables.
-
Serverless SQL pools are on-demand clusters that provide SQL interface to query and analyze data directly over customermanaged Azure Storage accounts. Since they are serverless, there is no managed storage, and the compute nodes are scaled automatically depending on the query workload.
+
[Serverless SQL pools](../sql/on-demand-workspace-overview.md) are on-demand clusters that provide a SQL interface to query and analyze data directly over customer-managed Azure Storage accounts. Since they're serverless, there's no managed storage, and the compute nodes scale automatically in response to the query workload.
-
Apache Spark in Azure Synapse Analytics is one of Microsoft's implementations of open-source Apache Spark in the cloud. Spark instances are provisioned on-demand based on the metadata configurations defined in the Spark pools. Each user gets their own dedicated Spark instance for running their jobs. The data files processed by the Spark instances are managed by the customers in their own Azure storage accounts.
+
[Apache Spark](../spark/apache-spark-overview.md) in Azure Synapse is one of Microsoft's implementations of open-source Apache Spark in the cloud. Spark instances are provisioned on demand based on the metadata configurations defined in the Spark pools. Each user gets their own dedicated Spark instance for running their jobs. The data files processed by the Spark instances are managed by the customer in their own Azure Storage accounts.
-
Pipelines and Data flows provide data integration capabilities. Pipelines are a logical grouping of activities that perform data movement and data transformation at scale. Data flowis a transformation activity in a pipeline that provides low-code user interface to author and execute data transformations at scale. Data flow leverages Apache Spark clusters of Azure Synapse Analytics behind the scenes to execute the generated code. Pipelines and Data flows are computeonly services and they do not have any managed storage associated with them.
+
[Pipelines](../../data-factory/concepts-pipelines-activities.md) are a logical grouping of activities that perform data movement and data transformation at scale. [Data flow](../../data-factory/concepts-data-flow-overview.md) is a transformation activity in a pipeline that's developed by using a low-code user interface. It can execute data transformations at scale. Behind the scenes, data flows use Apache Spark clusters of Azure Synapse to execute automatically generated code. Pipelines and data flows are compute-only services, and they don't have any managed storage associated with them.
-
Pipeline leverages the Integration Runtime as the scalable compute infrastructure for performing data movement activities that executes on the Integration Runtime and for dispatch activities that runs on variety of other compute engines such as Azure SQL Database, Azure HDInsight, Azure Databricks, Apache Spark clusters of Azure Synapse Analytics, etc. Azure Synapse Analytics supports two types of Integration Runtimes – Azure Integration Runtime and Self-hosted Integration Runtime. Azure Integration Runtimes provide a fully managed, scalable, and on-demand compute infrastructure. Self-hosted Integration Runtimes are installed and configured by the users in their own networks, either in on-premises machines or in the Azure cloud virtual machines.
+
Pipelines use the Integration Runtime (IR) as the scalable compute infrastructure for performing data movement and dispatch activities. Data movement activities run on the IR, whereas dispatch activities run on a variety of other compute engines, including Azure SQL Database, Azure HDInsight, Azure Databricks, Apache Spark clusters of Azure Synapse, and others. Azure Synapse supports two types of IR: Azure Integration Runtime and Self-hosted Integration Runtime. The [Azure IR](/azure/data-factory/concepts-integration-runtime.md#azure-integration-runtime) provides a fully managed, scalable, and on-demand compute infrastructure. The [Self-hosted IR](/azure/data-factory/concepts-integration-runtime.md#self-hosted-integration-runtime) is installed and configured by the customer in their own network, either in on-premises machines or in Azure cloud virtual machines.
-
Customers can choose to associate their Synapse workspace with a Managed workspace Virtual Network. When associated with a Managed workspace Virtual Network, Azure Integration Runtimes, and the Apache Spark clusters that are used by the Pipelines, Data flows and the Apache Spark pools are deployed inside the Managed workspace Virtual Network. This ensures network isolation between the workspaces for Pipelines and Apache Spark workloads.
+
Customers can choose to associate their Synapse workspace with a [managed workspace virtual network](../security/synapse-workspace-managed-vnet.md). When associated with a managed workspace virtual network, Azure IRs and Apache Spark clusters that are used by pipelines, data flows, and the Apache Spark pools are deployed inside the managed workspace virtual network. This setup ensures network isolation between the workspaces for pipelines and Apache Spark workloads.
-
The following diagram depicts the various components of Azure Synapse Analytics.
+
The following diagram depicts the various components of Azure Synapse.
-
:::image type="content" source="media/security-white-paper-overview/azure-synapse-components.png" alt-text="Image shows the various components of Azure Synapse Analytics: Dedicated SQL pools, Serverless SQL pools, Apache Spark pools and Pipelines.":::
-
Each individual component of Azure Synapse Analytics described above provides its own security features such as data protection, access control, authentication, network security and threat protection for securing the compute and the associated data that is processed. In addition to that, Azure Storage, being a PaaS service, provides additional security on its own, that is configured and managed by the users in their own storage accounts. This level of component isolation of Azure Synapse Analytics limits and minimizes the exposure in case of a security vulnerability in any one of it's components.
+
Each individual component of Azure Synapse depicted in the diagram provides its own security features. These features provide data protection, access control, authentication, network security, and threat protection for securing the compute and the associated data that's processed. Additionally, Azure Storage, being a PaaS service, provides additional security of its own, which is set up and managed by the customer in their own storage accounts. This level of component isolation limits and minimizes the exposure if there were a security vulnerability in any one of its components.