Commit 266a5ad

Merge pull request #110572 from mamccrea/spark-synapse
Update to Spark.NET
2 parents 3448a0e + 4fcc05b commit 266a5ad

1 file changed

+38
-26
lines changed

articles/synapse-analytics/spark/spark-dotnet.md

Lines changed: 38 additions & 26 deletions
Original file line number | Diff line number | Diff line change
@@ -5,50 +5,62 @@ author: mamccrea
55
services: synapse-analytics
66
ms.service: synapse-analytics
77
ms.topic: conceptual
8-
ms.date: 10/21/2019
8+
ms.date: 04/10/2020
99
ms.author: mamccrea
1010
ms.reviewer: jrasnick
1111
---
1212

13-
<!-- # Use .NET for Apache Spark with Azure Synapse Analytics
13+
# Use .NET for Apache Spark with Azure Synapse Analytics
1414

15-
Azure Synapse Analytics uses Spark pools (preview) for data processing. Apache Spark is a general-purpose distributed processing engine for analytics over large data sets - typically terabytes or petabytes of data. You can use Apache Spark for several popular big data scenarios, including:
15+
[.NET for Apache Spark](https://dot.net/spark) is free, open-source, cross-platform .NET support for Spark. It provides .NET bindings for Spark, which allow you to access Spark APIs through C# and F#. With .NET for Apache Spark, you can also write and execute user-defined functions for Spark in .NET. The .NET APIs for Spark enable you to access all aspects of Spark that help you analyze your data, including Spark SQL and Structured Streaming.
1616

17-
* Batch processing
18-
* Machine Learning
19-
* Impromptu querying -->
17+
You can analyze data with .NET for Apache Spark through Spark batch job definitions or with interactive Azure Synapse Analytics notebooks. In this article, you learn how to use .NET for Apache Spark with Azure Synapse using both techniques.
2018

21-
# What is .NET for Apache Spark?
19+
## Submit batch jobs using the Spark job definition
2220

23-
[.NET for Apache Spark](https://dot.net/spark) provides free, open-source, and cross-platform .NET support for Spark. .NET for Apache Spark provides .NET bindings for Spark that allow you to access Spark APIs through C# and F# and gives you the ability to write and execute user-defined functions for Spark using .NET.
21+
Visit the tutorial to learn how to use Azure Synapse Analytics to [create Apache Spark job definitions for Synapse Spark pools](apache-spark-job-definitions.md). If you have not packaged your app to submit to Azure Synapse, complete the following steps.
2422

25-
The .NET APIs for Spark enable you to access all aspects of Spark that help you analyze your data, including Spark SQL and Structured Streaming.
23+
1. Run the following commands to publish your app. Be sure to replace *mySparkApp* with the path to your app.
2624

27-
## .NET for Apache Spark in Azure Synapse Analytics
25+
**On Windows:**
2826

29-
You can analyze your data using .NET for Apache Spark through either Spark batch job definitions or with interactive Azure Synapse Analytics notebooks.
27+
```dotnetcli
28+
cd mySparkApp
29+
dotnet publish -c Release -f netcoreapp3.0 -r ubuntu.16.04-x64
30+
```
3031

31-
<!--
32-
### .NET for Apache Spark in Azure Synapse batch job definitions
33-
Jenny or someone please add details on the batch mode submission -->
32+
**On Linux:**
3433

35-
### .NET for Apache Spark in Azure Synapse Analytics notebooks
34+
```bash
35+
cd mySparkApp
36+
dotnet publish -c Release -f netcoreapp3.0 -r ubuntu.16.04-x64
37+
```
3638

37-
When creating a new notebook, you choose a language kernel that you wish to express your business logic. There is kernel support for several languages, including C#.
39+
2. Zip your published app files so that you can upload them to Azure Synapse.
3840

39-
To use .NET for Apache Spark in your Azure Synapse Analytics notebook, select **.NET Spark (C#)** as your kernel and attach the notebook to an existing Spark pool.
41+
**On Windows:**
42+
43+
Navigate to *mySparkApp/bin/Release/netcoreapp3.0/ubuntu.16.04-x64*. Then, right-click the **Publish** folder and select **Send to > Compressed (zipped) folder**. Name the new compressed folder **publish.zip**.
44+
45+
**On Linux, run the following command:**
4046

41-
The .NET Spark notebook is based on the .NET interactive experiences and provides interactive C# experiences with the ability to use .NET for Spark out of the box (with the Spark session variable `spark` already predefined). For more details on the available notebook capabilities [see below](#sparknet-c-kernel-features).
47+
```bash
48+
zip -r publish.zip publish
49+
```
4250

43-
## .NET for Apache Spark scenarios
51+
## .NET for Apache Spark in Azure Synapse Analytics notebooks
4452

45-
Notebooks are a great option for prototyping your .NET for Apache Spark pipelines and scenarios. You can start working with, understanding, filtering, displaying, and visualizing your data quickly and efficiently. Data engineers, data scientists, business analysts, and machine learning engineers are all able to collaborate over a shared, highly interactive document. You see immediate results from data exploration, and can visualize your data in the same notebook.
53+
Notebooks are a great option for prototyping your .NET for Apache Spark pipelines and scenarios. You can start working with, understanding, filtering, displaying, and visualizing your data quickly and efficiently. Data engineers, data scientists, business analysts, and machine learning engineers are all able to collaborate over a shared, interactive document. You see immediate results from data exploration, and can visualize your data in the same notebook.
4654

47-
Azure Synapse Analytics notebooks provide a smooth tooling experience with minimal setup, and allow for quick prototyping of big data queries in C# as you learn and practice solving your problems with Apache Spark.
55+
### How to use notebooks
56+
57+
When you create a new notebook, you choose a language kernel in which to express your business logic. There is kernel support for several languages, including C#.
58+
59+
To use .NET for Apache Spark in your Azure Synapse Analytics notebook, select **.NET Spark (C#)** as your kernel and attach the notebook to an existing Spark pool.
4860

49-
You can also develop a complete big data experience, such as reading in data, transforming it, and then exploring it through printed text or visualizing it through a plot or chart.
61+
The .NET Spark notebook is based on .NET Interactive and provides an interactive C# experience in which you can use .NET for Spark out of the box, with the Spark session variable `spark` already predefined.
5062

51-
## Spark.NET C# kernel features
63+
### Spark.NET C# kernel features
5264

5365
The following features are available when you use .NET for Apache Spark in the Azure Synapse Analytics notebook:
5466

@@ -64,6 +76,6 @@ The following features are available when you use .NET for Apache Spark in the A
6476

6577
## Next steps
6678

67-
- [.NET for Apache Spark documentation](https://docs.microsoft.com/dotnet/spark)
68-
- [Azure Synapse Analytics](../overview-what-is.md)
69-
<!-- need link to .NET Interactive documentation -->
79+
* [.NET for Apache Spark documentation](https://docs.microsoft.com/dotnet/spark)
80+
* [Azure Synapse Analytics](https://docs.microsoft.com/azure/synapse-analytics)
81+
* [.NET Interactive](https://devblogs.microsoft.com/dotnet/creating-interactive-net-documentation/)
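
The publish and zip steps added in this change could be combined into one script on Linux. The following is a minimal sketch, not part of the commit. It assumes `dotnet` and `zip` are on the `PATH`, and that the archive should contain the `publish` folder itself, as in the Windows step. The function names `publish_and_zip` and `zip_publish_dir` and the variables `TARGET_FRAMEWORK` and `RUNTIME_ID` are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper script (not from the article): publish a .NET for
# Apache Spark app and zip the output for upload to Azure Synapse.

# These values mirror the ones used in the article's commands.
TARGET_FRAMEWORK="netcoreapp3.0"
RUNTIME_ID="ubuntu.16.04-x64"

# Zip the "publish" folder found inside the given directory.
# Assumption: the archive should contain the folder itself.
zip_publish_dir() {
  local parent_dir="$1"
  # Run in a subshell so the caller's working directory is unchanged.
  ( cd "$parent_dir" && zip -r -q publish.zip publish )
}

# Publish the app at the given path, then zip the publish output.
publish_and_zip() {
  local app_dir="$1"
  ( cd "$app_dir" && dotnet publish -c Release -f "$TARGET_FRAMEWORK" -r "$RUNTIME_ID" ) || return 1
  zip_publish_dir "$app_dir/bin/Release/$TARGET_FRAMEWORK/$RUNTIME_ID"
}

# Only run when an app path is passed as the first argument.
if [ -n "${1:-}" ]; then
  publish_and_zip "$1"
fi
```

Saved under a name of your choice, the script would be invoked with the app path as its only argument, for example `bash package.sh mySparkApp`, and would leave `publish.zip` next to the `publish` folder.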
