Skip to content

Commit f743302

Browse files
authored
Merge pull request #197476 from vparasuramanMSFT/synapse-component-architecture
Added Synapse Component Architecture
2 parents ea634c8 + 6a360f2 commit f743302

File tree

2 files changed

+27
-3
lines changed

2 files changed

+27
-3
lines changed
62.2 KB
Loading

articles/synapse-analytics/guidance/security-white-paper-introduction.md

Lines changed: 27 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ ms.date: 01/14/2022
2020
- [Dedicated SQL pool](../sql-data-warehouse/sql-data-warehouse-overview-what-is.md?context=/azure/synapse-analytics/context/context) (formerly SQL DW) for enterprise data warehousing.
2121
- Deep integration with [Power BI](https://powerbi.microsoft.com/), [Azure Cosmos DB](../../cosmos-db/synapse-link.md?context=/azure/synapse-analytics/context/context), and [Azure Machine Learning](../machine-learning/what-is-machine-learning.md).
2222

23-
Azure Synapse data security and privacy are non-negotiable. The purpose of this white paper, then, is to provide a comprehensive overview of Azure Synapse security features, which are enterprise-grade and industry-leading. The white paper comprises a series of articles that cover the following five layers of security:
23+
Azure Synapse data security and privacy are non-negotiable. The purpose of this white paper is to provide a comprehensive overview of Azure Synapse security features, which are enterprise-grade and industry-leading. The white paper comprises a series of articles that cover the following five layers of security:
2424

2525
- Data protection
2626
- Access control
@@ -30,9 +30,9 @@ Azure Synapse data security and privacy are non-negotiable. The purpose of this
3030

3131
This white paper targets all enterprise security stakeholders. They include security administrators, network administrations, Azure administrators, workspace administrators, and database administrators.
3232

33-
**Writers:** Vengatesh Parasuraman, Fretz Nuson, Ron Dunn, Khendr'a Reid, John Hoang, Nithesh Krishnappa, Mykola Kovalenko, Brad Schacht, Pedro Matinez, Mark Pryce-Maher, and Arshad Ali.
33+
**Writers:** Vengatesh Parasuraman, Fretz Nuson, Ron Dunn, Khendr'a Reid, John Hoang, Nithesh Krishnappa, Mykola Kovalenko, Brad Schacht, Pedro Martinez, Mark Pryce-Maher, and Arshad Ali.
3434

35-
**Technical Reviewers:** Nandita Valsan, Rony Thomas, Daniel Crawford, and Tammy Richter Jones.
35+
**Technical Reviewers:** Nandita Valsan, Rony Thomas, Abhishek Narain, Daniel Crawford, and Tammy Richter Jones.
3636

3737
**Applies to:** Azure Synapse Analytics, dedicated SQL pool (formerly SQL DW), serverless SQL pool, and Apache Spark pool.
3838

@@ -53,6 +53,30 @@ Some common security questions include:
5353

5454
The purpose of this white paper is to provide answers to these common security questions, and many others.
5555

56+
## Component architecture
57+
58+
Azure Synapse is a Platform-as-a-service (PaaS) analytics service that brings together multiple independent components such as dedicated SQL pools, serverless SQL pools, Apache Spark pools, and data integration pipelines. These components are designed to work together to provide a seamless analytical platform experience.
59+
60+
[Dedicated SQL pools](../sql/overview-architecture.md) are provisioned clusters that provide enterprise data warehousing capabilities for SQL workloads. Data is ingested into managed storage powered by Azure Storage, which is also a PaaS service. Compute is isolated from storage enabling customers to scale compute independently of their data. Dedicated SQL pools also provide the ability to query data files directly over customer-managed Azure Storage accounts by using external tables.
61+
62+
[Serverless SQL pools](../sql/on-demand-workspace-overview.md) are on-demand clusters that provide a SQL interface to query and analyze data directly over customer-managed Azure Storage accounts. Since they're serverless, there's no managed storage, and the compute nodes scale automatically in response to the query workload.
63+
64+
[Apache Spark](../spark/apache-spark-overview.md) in Azure Synapse is one of Microsoft's implementations of open-source Apache Spark in the cloud. Spark instances are provisioned on-demand based on the metadata configurations defined in the Spark pools. Each user gets their own dedicated Spark instance for running their jobs. The data files processed by the Spark instances are managed by the customer in their own Azure Storage accounts.
65+
66+
[Pipelines](../../data-factory/concepts-pipelines-activities.md) are a logical grouping of activities that perform data movement and data transformation at scale. [Data flow] (../../data-factory/concepts-data-flow-overview.md) is a transformation activity in a pipeline that's developed by using a low-code user interface. It can execute data transformations at scale. Behind the scenes, data flows use Apache Spark clusters of Azure Synapse to execute automatically generated code. Pipelines and data flows are compute-only services, and they don't have any managed storage associated with them.
67+
68+
Pipelines use the Integration Runtime (IR) as the scalable compute infrastructure for performing data movement and dispatch activities. Data movement activities run on the IR whereas the dispatch activities run on variety of other compute engines, including Azure SQL Database, Azure HDInsight, Azure Databricks, Apache Spark clusters of Azure Synapse, and others. Azure Synapse supports two types of IR: Azure Integration Runtime and Self-hosted Integration Runtime. The [Azure IR](/azure/data-factory/concepts-integration-runtime.md#azure-integration-runtime) provides a fully managed, scalable, and on-demand compute infrastructure. The [Self-hosted IR](/azure/data-factory/concepts-integration-runtime.md#self-hosted-integration-runtime) is installed and configured by the customer in their own network, either in on-premises machines or in Azure cloud virtual machines.
69+
70+
Customers can choose to associate their Synapse workspace with a [managed workspace virtual network](../security/synapse-workspace-managed-vnet.md). When associated with a managed workspace virtual network, Azure IRs and Apache Spark clusters that are used by pipelines, data flows, and the Apache Spark pools are deployed inside the managed workspace virtual network. This setup ensures network isolation between the workspaces for pipelines and Apache Spark workloads.
71+
72+
The following diagram depicts the various components of Azure Synapse.
73+
74+
:::image type="content" source="media/security-white-paper-overview/azure-synapse-components.png" border="false" alt-text="Diagram of Azure Synapse components showing dedicated SQL pools, serverless SQL pools, Apache Spark pools, and pipelines.":::
75+
76+
## Component isolation
77+
78+
Each individual component of Azure Synapse depicted in the diagram provides its own security features. Security features provide data protection, access control, authentication, network security, and threat protection for securing the compute and the associated data that’s processed. Additionally, Azure Storage, being a PaaS service, provides additional security of its own, that's set up and managed by the customer in their own storage accounts. This level of component isolation limits and minimizes the exposure if there were a security vulnerability in any one of its components.
79+
5680
## Security layers
5781

5882
Azure Synapse implements a multi-layered security architecture for end-to-end protection of your data. There are five layers:

0 commit comments

Comments
 (0)