
Commit 6752706

Merge branch 'release-ignite-arcadia' into 20200410_ria_dates
2 parents 29defdd + 266a5ad


42 files changed: +253 -220 lines

articles/synapse-analytics/data-integration/data-integration-sql-pool.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 ---
 title: Ingest into SQL pool in Azure Synapse Analytics
-description: Learn how to ingest data into a SQL analytics pool in Azure Synapse Analytics
+description: Learn how to ingest data into a SQL pool in Azure Synapse Analytics
 services: synapse-analytics
 author: djpmsft
 ms.service: synapse-analytics

articles/synapse-analytics/data-integration/linked-service.md

Lines changed: 1 addition & 1 deletion
@@ -63,6 +63,6 @@ You have now established a secure and private connection between Synapse and you
 
 ## Next steps
 
-For more understanding of Managed private endpoint in Synapse, see the [Concept around Synapse Managed private endpoint](data-integration-data-lake.md) article.
+To develop further understanding of Managed private endpoint in Synapse Analytics, see the [Concept around Synapse Managed private endpoint](data-integration-data-lake.md) article.
 
 For more information on data integration for Synapse Analytics, see the [Ingesting data into a Data Lake](data-integration-data-lake.md) article.

articles/synapse-analytics/monitoring/how-to-monitor-pipeline-runs.md

Lines changed: 1 addition & 6 deletions
@@ -46,9 +46,4 @@ To view details about your pipeline run, select the pipeline run. Then view the
 
 ## Next steps
 
-This article showed you how to monitor pipeline runs in your Azure Synapse workspace. You learned how to:
-
-> [!div class="checklist"]
-> * View the list of pipeline runs in your workspace
-> * Filter the list of pipeline runs to find the pipeline you'd like to monitor
-> * Monitor your selected pipeline run in detail.
+To learn more about monitoring applications, see the [Monitor Apache Spark applications](how-to-monitor-spark-applications.md) article.

articles/synapse-analytics/monitoring/how-to-monitor-spark-applications.md

Lines changed: 1 addition & 7 deletions
@@ -52,10 +52,4 @@ To view the details about one of your Spark applications, select the Spark appli
 
 ## Next steps
 
-This article showed you how to monitor Spark applications in your Azure Synapse workspace. You learned how to:
-
-> [!div class="checklist"]
->
-> * View the list of Spark applications in your workspace
-> * Filter the list of Spark applications to find the Spark applications you'd like to monitor
-> * Monitor your selected Spark application in detail.
+For more information on monitoring pipeline runs, see the [Monitor pipeline runs Azure Synapse Studio](how-to-monitor-pipeline-runs.md) article.

articles/synapse-analytics/security/how-to-grant-workspace-managed-identity-permissions.md

Lines changed: 1 addition & 1 deletion
@@ -114,4 +114,4 @@ You should see your managed identity listed under the **Storage Blob Data Contri
 
 ## Next steps
 
-[Workspace managed identity](./synapse-workspace-managed-identity.md)
+Learn more about [Workspace managed identity](./synapse-workspace-managed-identity.md)

articles/synapse-analytics/spark/apache-spark-concepts.md

Lines changed: 11 additions & 2 deletions
@@ -13,7 +13,9 @@ ms.reviewer: euang
 
 # Apache Spark in Azure Synapse Analytics Core Concepts
 
-Apache Spark is a parallel processing framework that supports in-memory processing to boost the performance of big-data analytic applications. Apache Spark in Azure Synapse Analytics is one of Microsoft's implementations of Apache Spark in the cloud. Azure Synapse makes it easy to create and configure Spark capabilities in Azure. Azure Synapse provides a different implementation of these Spark capabilities that are documented here.
+Apache Spark is a parallel processing framework that supports in-memory processing to boost the performance of big-data analytic applications. Apache Spark in Azure Synapse Analytics is one of Microsoft's implementations of Apache Spark in the cloud.
+
+Azure Synapse makes it easy to create and configure Spark capabilities in Azure. Azure Synapse provides a different implementation of these Spark capabilities that are documented here.
 
 ## Spark pools (preview)
 
@@ -27,7 +29,9 @@ You can read how to create a Spark pool and see all their properties here [Get s
 
 ## Spark instances
 
-Spark instances are created when you connect to a Spark pool, create a session, and run a job. As multiple users may have access to a single Spark pool, a new Spark instance is created for each user that connects. When you submit a second job, then if there is capacity in the pool, the existing Spark instance also has capacity then the existing instance will process the job; if not and there is capacity at the pool level, then a new Spark instance will be created.
+Spark instances are created when you connect to a Spark pool, create a session, and run a job. As multiple users may have access to a single Spark pool, a new Spark instance is created for each user that connects.
+
+When you submit a second job, then if there is capacity in the pool, the existing Spark instance also has capacity then the existing instance will process the job; if not and there is capacity at the pool level, then a new Spark instance will be created.
 
 ## Examples
 
@@ -50,3 +54,8 @@
 - You submit a notebook job, J1 that uses 10 nodes, a Spark instance, SI1 is created to process the job.
 - Another user, U2, submits a Job, J3, that uses 10 nodes, a new Spark instance, SI2, is created to process the job.
 - You now submit another job, J2, that uses 10 nodes because there is still capacity in the pool and the instance, J2, is processed by SI1.
+
+## Next steps
+
+- [Azure Synapse Analytics](https://docs.microsoft.com/azure/synapse-analytics)
+- [Apache Spark Documentation](https://spark.apache.org/docs/2.4.4/)
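
The instance-allocation rules that this hunk splits into two paragraphs are easier to follow as a worked model. The sketch below is a hypothetical Python illustration, not Synapse source code; the pool size, per-instance capacity, and node counts are assumptions chosen to mirror the J1/J2/J3 example above.

```python
# Hypothetical model of the allocation rules in apache-spark-concepts.md.
# Assumptions: a 40-node pool, and each Spark instance may grow to 20 nodes.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SparkInstance:
    owner: str      # a new instance is created per connecting user
    capacity: int   # maximum nodes this instance can use
    used: int = 0   # nodes consumed by its running jobs

@dataclass
class SparkPool:
    max_nodes: int
    instance_size: int
    instances: List[SparkInstance] = field(default_factory=list)

    def submit(self, user: str, nodes: int) -> SparkInstance:
        # Reuse the user's existing instance while it still has capacity.
        for inst in self.instances:
            if inst.owner == user and inst.used + nodes <= inst.capacity:
                inst.used += nodes
                return inst
        # Otherwise create a new instance if the pool itself has capacity.
        reserved = sum(i.capacity for i in self.instances)
        if reserved + self.instance_size <= self.max_nodes:
            inst = SparkInstance(user, self.instance_size, used=nodes)
            self.instances.append(inst)
            return inst
        raise RuntimeError("No capacity left in the pool")

pool = SparkPool(max_nodes=40, instance_size=20)
si1 = pool.submit("you", 10)          # J1: SI1 is created
si2 = pool.submit("U2", 10)           # J3, another user: SI2 is created
assert pool.submit("you", 10) is si1  # J2: SI1 has capacity, so it is reused
```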

articles/synapse-analytics/spark/apache-spark-history-server.md

Lines changed: 2 additions & 2 deletions
@@ -233,5 +233,5 @@ Input/output data using Resilient Distributed Datasets (RDDs) does not show in d
 
 ## Next steps
 
-* [.NET for Apache Spark documentation](https://docs.microsoft.com/dotnet/spark)
-* [Azure Synapse Analytics](https://docs.microsoft.com/azure/synapse-analytics)
+- [.NET for Apache Spark documentation](https://docs.microsoft.com/dotnet/spark)
+- [Azure Synapse Analytics](../overview-what-is.md)

articles/synapse-analytics/spark/apache-spark-job-definitions.md

Lines changed: 4 additions & 1 deletion
@@ -168,4 +168,7 @@ After creating a Spark job definition, you can submit it to a Synapse Spark pool
 
 ## Next steps
 
-This tutorial demonstrates how to use the Azure Synapse Analytics to create Spark job definitions, and then submit them to a Synapse Spark pool. Next you can use the Azure Synapse Analytics to create Power BI datasets and manage Power BI data.
+This tutorial demonstrated how to use the Azure Synapse Analytics to create Spark job definitions, and then submit them to a Synapse Spark pool. Next you can use Azure Synapse Analytics to create Power BI datasets and manage Power BI data.
+
+- [Connect to data in Power BI Desktop](https://docs.microsoft.com/power-bi/desktop-quickstart-connect-to-data)
+- [Visualize with Power BI](/sql-data-warehouse/sql-data-warehouse-get-started-visualize-with-power-bi)

articles/synapse-analytics/spark/apache-spark-notebook-create-spark-use-sql.md

Lines changed: 3 additions & 3 deletions
@@ -129,8 +129,8 @@ To ensure the Spark instance is shut down, end any connected sessions(notebooks)
 In this quickstart, you learned how to create a Synapse Analytics Apache Spark pool and run a basic Spark SQL query.
 
 - [.NET for Apache Spark documentation](https://docs.microsoft.com/dotnet/spark)
-- [Azure Synapse Analytics](https://docs.microsoft.com/azure/synapse-analytics)
+- [Azure Synapse Analytics](../overview-what-is.md)
 - [Apache Spark official documentation](https://spark.apache.org/docs/latest/)
 
-> [!NOTE]
-> Some of the official Apache Spark documentation relies on using the spark console, this is not available on Azure Synapse Spark, use the notebook or IntelliJ experiences instead
+>[!NOTE]
+> Some of the official Apache Spark documentation relies on using the Spark console, which is not available on Azure Synapse Spark. Use the [notebook](../spark/apache-spark-notebook-create-spark-use-sql.md) or [IntelliJ](../spark/intellij-tool-synapse.md) experiences instead.
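
For orientation, the "basic Spark SQL query" this quickstart refers to looks roughly like the notebook cell below. This is an invented illustration, not the quickstart's own dataset; `spark` is the SparkSession a Synapse notebook provides.

```python
# Illustrative notebook cell; the data here is made up for the example.
df = spark.createDataFrame(
    [("NYC", 42), ("Seattle", 17), ("London", 8)],
    ["city", "orders"],
)
df.createOrReplaceTempView("demo")

# A basic Spark SQL query against the temporary view.
spark.sql("SELECT city, orders FROM demo ORDER BY orders DESC").show()
```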

articles/synapse-analytics/spark/apache-spark-performance.md

Lines changed: 9 additions & 5 deletions
@@ -153,22 +153,26 @@ When running concurrent queries, consider the following:
 * Distribute queries across parallel applications.
 * Modify size based both on trial runs and on the preceding factors such as GC overhead.
 
-Monitor your query performance for outliers or other performance issues, by looking at the timeline view, SQL graph, job statistics, and so forth. Sometimes one or a few of the executors are slower than the others, and tasks take much longer to execute. This frequently happens on larger clusters (> 30 nodes). In this case, divide the work into a larger number of tasks so the scheduler can compensate for slow tasks. For example, have at least twice as many tasks as the number of executor cores in the application. You can also enable speculative execution of tasks with `conf: spark.speculation = true`.
+Monitor your query performance for outliers or other performance issues, by looking at the timeline view, SQL graph, job statistics, and so forth. Sometimes one or a few of the executors are slower than the others, and tasks take much longer to execute. This frequently happens on larger clusters (> 30 nodes). In this case, divide the work into a larger number of tasks so the scheduler can compensate for slow tasks.
+
+For example, have at least twice as many tasks as the number of executor cores in the application. You can also enable speculative execution of tasks with `conf: spark.speculation = true`.
 
 ## Optimize job execution
 
 * Cache as necessary, for example if you use the data twice, then cache it.
 * Broadcast variables to all executors. The variables are only serialized once, resulting in faster lookups.
 * Use the thread pool on the driver, which results in faster operation for many tasks.
 
-Key to Spark 2.x query performance is the Tungsten engine, which depends on whole-stage code generation. In some cases, whole-stage code generation may be disabled. For example, if you use a non-mutable type (`string`) in the aggregation expression, `SortAggregate` appears instead of `HashAggregate`. For example, for better performance, try the following and then re-enable code generation:
+Key to Spark 2.x query performance is the Tungsten engine, which depends on whole-stage code generation. In some cases, whole-stage code generation may be disabled.
+
+For example, if you use a non-mutable type (`string`) in the aggregation expression, `SortAggregate` appears instead of `HashAggregate`. For example, for better performance, try the following and then re-enable code generation:
 
 ```sql
 MAX(AMOUNT) -> MAX(cast(AMOUNT as DOUBLE))
 ```
 
 ## Next steps
 
-* [Tuning Apache Spark](https://spark.apache.org/docs/latest/tuning.html)
-* [How to Actually Tune Your Apache Spark Jobs So They Work](https://www.slideshare.net/ilganeli/how-to-actually-tune-your-spark-jobs-so-they-work)
-* [Kryo Serialization](https://github.com/EsotericSoftware/kryo)
+- [Tuning Apache Spark](https://spark.apache.org/docs/latest/tuning.html)
+- [How to Actually Tune Your Apache Spark Jobs So They Work](https://www.slideshare.net/ilganeli/how-to-actually-tune-your-spark-jobs-so-they-work)
+- [Kryo Serialization](https://github.com/EsotericSoftware/kryo)
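
Taken together, the tuning advice in this hunk (speculative execution, at least twice as many tasks as executor cores, caching reused data, broadcasting small lookups, and casting string amounts so `HashAggregate` is used) can be sketched in PySpark. This is a minimal, assumption-laden illustration, not code from the article: the storage path, the `AMOUNT` column, and the lookup table are placeholders.

```python
# A PySpark sketch of the tuning advice above; paths and names are assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    # Re-run straggler tasks speculatively on other executors.
    .config("spark.speculation", "true")
    .getOrCreate()
)

df = spark.read.parquet("abfss://data@account.dfs.core.windows.net/events")

# Aim for at least twice as many tasks as executor cores.
df = df.repartition(2 * spark.sparkContext.defaultParallelism)

# Cache data that is used more than once.
df.cache()

# Broadcast small lookup data so it is serialized to executors only once.
rates = spark.sparkContext.broadcast({"USD": 1.0, "EUR": 1.08})

# Cast the string AMOUNT to a numeric type so the plan uses HashAggregate
# (whole-stage code generation) instead of SortAggregate.
df.agg(F.max(F.col("AMOUNT").cast("double"))).show()
```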
