## Understand the U-SQL and Spark language and processing paradigms
U-SQL's expression language is C# and it offers various ways to scale out custom .NET code with user-defined functions, user-defined operators, and user-defined aggregators.
Azure Synapse and Azure HDInsight Spark both now natively support executing .NET code with .NET for Apache Spark. This means that you can potentially reuse some or all of your [.NET user-defined functions with Spark](#transform-user-defined-scalar-net-functions-and-user-defined-aggregators). Note though that U-SQL uses the .NET Framework while .NET for Apache Spark is based on .NET Core 3.1 or later.
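
As a rough sketch (the helper method, input path, and column names here are illustrative assumptions, not from any original script), existing scalar .NET logic can be wrapped as a Spark UDF through the `Microsoft.Spark` package like this:

```csharp
// Hedged sketch: reuse an existing scalar .NET helper as a Spark UDF.
// `CleanZipCode`, the input path, and the column name are hypothetical.
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

static class Helpers
{
    // Existing .NET logic, for example ported from U-SQL code-behind.
    public static string CleanZipCode(string zip) =>
        string.IsNullOrEmpty(zip) ? "00000" : zip.Trim();
}

class Program
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().GetOrCreate();
        DataFrame df = spark.Read().Option("header", "true").Csv("/data/addresses.csv");

        // Wrap the existing method as a Spark UDF and apply it per row.
        Func<Column, Column> cleanZip = Udf<string, string>(Helpers.CleanZipCode);
        df.Select(cleanZip(df["zip"]).Alias("zip_clean")).Show();
    }
}
```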
[U-SQL user-defined operators (UDOs)](#transform-user-defined-operators-udos) use the U-SQL UDO model to provide scaled-out execution of the operator's code. Thus, UDOs have to be rewritten into user-defined functions to fit into the Spark execution model.
.NET for Apache Spark currently doesn't support user-defined aggregators. Thus, [U-SQL user-defined aggregators](#transform-user-defined-scalar-net-functions-and-user-defined-aggregators) will have to be translated into Spark user-defined aggregators written in Scala.
If you don't want to take advantage of the .NET for Apache Spark capabilities, you'll have to rewrite your expressions into an equivalent Spark, Scala, Java, or Python expression, function, aggregator, or connector.
In any case, if you have a large amount of .NET logic in your U-SQL scripts, please contact us through your Microsoft Account representative for further guidance.
In Spark, types allow NULL values by default, while in U-SQL, you explicitly mark scalar, non-object types as nullable. While Spark allows you to define a column as not nullable, it will not enforce the constraint and [may lead to wrong results](https://medium.com/@weshoffman/apache-spark-parquet-and-troublesome-nulls-28712b06f836).
In Spark, NULL indicates that the value is unknown. A Spark NULL value is different from any value, including itself. Comparisons between two Spark NULL values, or between a NULL value and any other value, return unknown because the value of each NULL is unknown.
This behavior is different from U-SQL, which follows C# semantics where `null` is different from any value but equal to itself.
Thus a SparkSQL `SELECT` statement that uses `WHERE column_name = NULL` returns zero rows even if there are NULL values in `column_name`, while in U-SQL, it would return the rows where `column_name` is set to `null`. Similarly, a Spark `SELECT` statement that uses `WHERE column_name != NULL` returns zero rows even if there are non-null values in `column_name`, while in U-SQL, it would return the rows that have non-null values. Thus, if you want the U-SQL null-check semantics, you should use [isnull](https://spark.apache.org/docs/2.3.0/api/sql/index.html#isnull) and [isnotnull](https://spark.apache.org/docs/2.3.0/api/sql/index.html#isnotnull) respectively (or their DSL equivalent).
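
A minimal sketch of these semantics through .NET for Apache Spark (the view `t` and column `column_name` are hypothetical):

```csharp
// Hedged sketch of Spark NULL semantics; the view and column are made up.
using Microsoft.Spark.Sql;

class NullSemantics
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().GetOrCreate();
        spark.Sql("CREATE OR REPLACE TEMP VIEW t AS SELECT * FROM VALUES (1), (NULL) AS v(column_name)");

        // Both return zero rows: any comparison with NULL evaluates to unknown.
        spark.Sql("SELECT * FROM t WHERE column_name = NULL").Show();
        spark.Sql("SELECT * FROM t WHERE column_name != NULL").Show();

        // U-SQL-style null checks use isnull / isnotnull instead.
        spark.Sql("SELECT * FROM t WHERE isnull(column_name)").Show();    // the NULL row
        spark.Sql("SELECT * FROM t WHERE isnotnull(column_name)").Show(); // the row with 1
    }
}
```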
### U-SQL hints
U-SQL offers several syntactic ways to provide hints to the query optimizer and execution engine:
- Setting a U-SQL system variable
- An `OPTION` clause associated with the rowset expression to provide a data or plan hint
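
Spark's cost-based query optimizer offers its own hint mechanisms. As a hedged illustration (the DataFrames, paths, and join columns are assumptions), a broadcast join hint through .NET for Apache Spark might look like this:

```csharp
// Hedged sketch of a Spark broadcast join hint; `facts`, `dims`,
// the paths, and the join columns are hypothetical.
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

class HintExample
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder().GetOrCreate();
        DataFrame facts = spark.Read().Parquet("/data/facts");
        DataFrame dims = spark.Read().Parquet("/data/dims");

        // Broadcast(...) asks the optimizer to replicate the small side
        // to every executor instead of shuffling both inputs.
        DataFrame joined = facts.Join(Broadcast(dims), facts["dim_id"].EqualTo(dims["id"]));
        joined.Show();
    }
}
```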
## Next steps
- [Understand Spark data formats for U-SQL developers](understand-spark-data-formats.md)
- [.NET for Apache Spark](/previous-versions/dotnet/spark/what-is-apache-spark-dotnet)
- [Upgrade your big data analytics solutions from Azure Data Lake Storage Gen1 to Azure Data Lake Storage Gen2](../storage/blobs/data-lake-storage-migrate-gen1-to-gen2.md)
- [Transform data using Spark activity in Azure Data Factory](../data-factory/transform-data-using-spark.md)
- [Transform data using Hadoop Hive activity in Azure Data Factory](../data-factory/transform-data-using-hadoop-hive.md)

**articles/data-lake-analytics/understand-spark-data-formats.md**

- [Understand Spark code concepts for U-SQL developers](understand-spark-code-concepts.md)
- [Upgrade your big data analytics solutions from Azure Data Lake Storage Gen1 to Azure Data Lake Storage Gen2](../storage/blobs/data-lake-storage-migrate-gen1-to-gen2.md)
- [.NET for Apache Spark](/previous-versions/dotnet/spark/what-is-apache-spark-dotnet)
- [Transform data using Spark activity in Azure Data Factory](../data-factory/transform-data-using-spark.md)
- [Transform data using Hadoop Hive activity in Azure Data Factory](../data-factory/transform-data-using-hadoop-hive.md)
- [What is Apache Spark in Azure HDInsight](../hdinsight/spark/apache-spark-overview.md)

**articles/data-lake-analytics/understand-spark-for-usql-developers.md**

- [Understand Spark data formats for U-SQL developers](understand-spark-data-formats.md)
- [Understand Spark code concepts for U-SQL developers](understand-spark-code-concepts.md)
- [Upgrade your big data analytics solutions from Azure Data Lake Storage Gen1 to Azure Data Lake Storage Gen2](../storage/blobs/data-lake-storage-migrate-gen1-to-gen2.md)
- [.NET for Apache Spark](/previous-versions/dotnet/spark/what-is-apache-spark-dotnet)
- [Transform data using Hadoop Hive activity in Azure Data Factory](../data-factory/transform-data-using-hadoop-hive.md)
- [Transform data using Spark activity in Azure Data Factory](../data-factory/transform-data-using-spark.md)
- [What is Apache Spark in Azure HDInsight](../hdinsight/spark/apache-spark-overview.md)

**articles/synapse-analytics/spark/spark-dotnet.md**

---
title: Use .NET for Apache Spark
description: Learn about using .NET and Apache Spark to do batch processing, real-time streaming, machine learning, and write ad-hoc queries in Azure Synapse Analytics notebooks.
author: juluczni
ms.author: juluczni
services: synapse-analytics
ms.service: synapse-analytics
ms.topic: conceptual
ms.subservice: spark
ms.custom: devx-track-dotnet
ms.date: 05/01/2020
ms.reviewer: sngun
---

# Use .NET for Apache Spark with Azure Synapse Analytics
[.NET for Apache Spark](https://dot.net/spark) provides free, [open-source](https://github.com/dotnet/spark), and cross-platform .NET support for Spark.
It provides .NET bindings for Spark, which allows you to access Spark APIs through C# and F#. With .NET for Apache Spark, you can also write and execute user-defined functions for Spark written in .NET. The .NET APIs for Spark enable you to access all aspects of Spark DataFrames that help you analyze your data, including Spark SQL, Delta Lake, and Structured Streaming.
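
As a rough sketch of what those bindings look like in practice (the application name, input file, and schema are illustrative assumptions):

```csharp
// Hedged sketch of the DataFrame API from C# with the Microsoft.Spark
// NuGet package; "people.json" and the "age" column are made up.
using Microsoft.Spark.Sql;

class GettingStarted
{
    static void Main()
    {
        SparkSession spark = SparkSession.Builder()
            .AppName("dotnet-spark-example")
            .GetOrCreate();

        DataFrame people = spark.Read().Json("people.json");
        people.Filter(people["age"] > 21).Show();

        spark.Stop();
    }
}
```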
>[!IMPORTANT]
> [.NET for Apache Spark](/previous-versions/dotnet/spark/what-is-apache-spark-dotnet) is an open-source project under the .NET Foundation that currently requires the .NET 3.1 library, which has reached out-of-support status. We would like to inform users of Azure Synapse Spark of the removal of the .NET for Apache Spark library in the Azure Synapse Runtime for Apache Spark version 3.3. Users may refer to the [.NET Support Policy](https://dotnet.microsoft.com/platform/support/policy/dotnet-core) for more details on this matter.
>
> As a result, it will no longer be possible for users to utilize Apache Spark APIs via C# and F#, or execute C# code in notebooks within Synapse or through Apache Spark Job definitions in Synapse. It is important to note that this change affects only Azure Synapse Runtime for Apache Spark 3.3 and above.
>
> We will continue to support .NET for Apache Spark in all previous versions of the Azure Synapse Runtime according to [their lifecycle stages](runtime-for-apache-spark-lifecycle-and-supportability.md). However, we do not have plans to support .NET for Apache Spark in Azure Synapse Runtime for Apache Spark 3.3 and future versions. We recommend that users with existing workloads written in C# or F# migrate to Python or Scala. Users are advised to take note of this information and plan accordingly.
## Submit batch jobs using the Spark job definition
:::image type="content" source="./media/apache-spark-job-definitions/net-spark-workspace-compatibility.png" alt-text="Screenshot that shows properties, including the .NET Spark version.":::
1. Create your project as a .NET console application that outputs an Ubuntu x86 executable.
3. Zip the contents of the publish folder, `publish.zip` for example, that was created as a result of Step 1. All the assemblies should be in the root of the ZIP file and there should be no intermediate folder layer. This means when you unzip `publish.zip`, all assemblies are extracted into your current working directory.
**On Windows:**
Using Windows PowerShell or PowerShell 7, create a .zip from the contents of your publish directory.
```PowerShell
Compress-Archive publish/* publish.zip -Update
```

**On Linux:**

Using a shell, create a .zip from the contents of your publish directory (run from inside that directory; the trailing `.` names the files to archive):

```bash
zip -r publish.zip .
```
## .NET for Apache Spark in Azure Synapse Analytics notebooks
Notebooks are a great option for prototyping your .NET for Apache Spark pipelines and scenarios. You can start working with, understanding, filtering, displaying, and visualizing your data quickly and efficiently.
Data engineers, data scientists, business analysts, and machine learning engineers are all able to collaborate over a shared, interactive document. You see immediate results from data exploration, and can visualize your data in the same notebook.

The following features are available when you use .NET for Apache Spark in the Azure Synapse Analytics notebook:

* Access to the standard C# library (such as System, LINQ, Enumerables, and so on).
* Support for C# 8.0 language features.
* `spark` as a pre-defined variable to give you access to your Apache Spark session.
* Support for defining [.NET user-defined functions that can run within Apache Spark](/previous-versions/dotnet/spark/how-to-guides/udf-guide). We recommend [Write and call UDFs in .NET for Apache Spark Interactive environments](/previous-versions/dotnet/spark/how-to-guides/dotnet-interactive-udf-issue) for learning how to use UDFs in .NET for Apache Spark Interactive experiences; see the sketch after this list.
* Support for visualizing output from your Spark jobs using different charts (such as line, bar, or histogram) and layouts (such as single, overlaid, and so on) using the `XPlot.Plotly` library.
* Ability to include NuGet packages into your C# notebook.
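
As referenced in the UDF item above, here's a minimal sketch of a notebook cell that combines the pre-defined `spark` variable with an inline .NET UDF (the data is illustrative):

```csharp
// Hedged sketch of a C# notebook cell; `spark` is the pre-defined
// SparkSession, so no session setup is needed.
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

DataFrame df = spark.Range(0, 5);

// An inline .NET UDF applied to a DataFrame column.
Func<Column, Column> square = Udf<long, long>(x => x * x);
df.Select(Col("id"), square(Col("id")).Alias("id_squared")).Show();
```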
## Troubleshooting
### `DotNetRunner: null` / `Futures timeout` in Synapse Spark Job Definition Run
Synapse Spark Job Definitions on Spark Pools using Spark 2.4 require `Microsoft.Spark` 1.0.0. Clear your `bin` and `obj` directories, and publish the project using 1.0.0.
### OutOfMemoryError: java heap space at org.apache.spark
.NET for Apache Spark 1.0.0 uses a different debug architecture than 1.1.1+. You'll have to use 1.0.0 for your published version and 1.1.1+ for local debugging.
## Next steps
* [.NET for Apache Spark documentation](/previous-versions/dotnet/spark/what-is-apache-spark-dotnet)
* [.NET for Apache Spark Interactive guides](/previous-versions/dotnet/spark/how-to-guides/dotnet-interactive-udf-issue)