Commit d9e958c

Merge pull request #90681 from cutecycle/patch-1
Clarity regarding dotnet Spark Job definition dependencies for Synapse

2 parents: 26bef81 + 8d36695

2 files changed (+35 −4 lines)

articles/synapse-analytics/spark/spark-dotnet.md

Lines changed: 35 additions & 4 deletions
@@ -23,18 +23,43 @@ You can analyze data with .NET for Apache Spark through Spark batch job definiti
 
 Visit the tutorial to learn how to use Azure Synapse Analytics to [create Apache Spark job definitions for Synapse Spark pools](apache-spark-job-definitions.md). If you haven't packaged your app to submit to Azure Synapse, complete the following steps.
 
-1. Run the following commands to publish your app. Be sure to replace *mySparkApp* with the path to your app.
+1. Configure your `dotnet` application dependencies for compatibility with Synapse Spark.
+
+   The required .NET for Apache Spark version is noted in Synapse Studio, in your Apache Spark pool configuration under the Manage hub.
+
+   :::image type="content" source="./media/apache-spark-job-definitions/net-spark-workspace-compatibility.png" alt-text="Screenshot that shows properties, including the .NET Spark version.":::
+
+   Create your project as a .NET console application that outputs an Ubuntu x64 executable.
+
+   ```xml
+   <Project Sdk="Microsoft.NET.Sdk">
+
+     <PropertyGroup>
+       <OutputType>Exe</OutputType>
+       <TargetFramework>netcoreapp3.1</TargetFramework>
+     </PropertyGroup>
+
+     <ItemGroup>
+       <PackageReference Include="Microsoft.Spark" Version="2.1.0" />
+     </ItemGroup>
+
+   </Project>
+   ```
+
+2. Run the following commands to publish your app. Be sure to replace *mySparkApp* with the path to your app.
 
    ```dotnetcli
    cd mySparkApp
    dotnet publish -c Release -f netcoreapp3.1 -r ubuntu.18.04-x64
    ```

-2. Zip the contents of the publish folder, `publish.zip` for example, that was created as a result of Step 1. All the assemblies should be in the first layer of the ZIP file and there should be no intermediate folder layer. This means when you unzip `publish.zip`, all assemblies are extracted into your current working directory.
+3. Zip the contents of the publish folder, `publish.zip` for example, that was created as a result of Step 2. All the assemblies should be in the root of the ZIP file, with no intermediate folder layer. This means when you unzip `publish.zip`, all assemblies are extracted into your current working directory.
 
    **On Windows:**
 
-   Use an extraction program, like [7-Zip](https://www.7-zip.org/) or [WinZip](https://www.winzip.com/), to extract the file into the bin directory with all the published binaries.
+   Use Windows PowerShell or PowerShell 7 to create a .zip from the contents of your publish directory.
+
+   ```powershell
+   Compress-Archive publish/* publish.zip -Update
+   ```
 
    **On Linux:**
 
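As a quick sanity check of the flat-layout requirement above, the packaging step can be sketched as follows. This is illustrative only: the `.dll` files are empty placeholders standing in for the published assemblies, and it assumes the Info-ZIP `zip`/`unzip` tools are installed.

```shell
# Create stand-in files for the published assemblies (names are placeholders).
mkdir -p publish
touch publish/mySparkApp.dll publish/Microsoft.Spark.dll

# Zip from inside the folder so entries land at the root of the archive,
# with no leading "publish/" path component.
(cd publish && zip -q -r ../publish.zip .)

# List the archive: entries should show no directory prefix.
unzip -l publish.zip
```

If the listing shows entries like `publish/mySparkApp.dll` instead of `mySparkApp.dll`, the job will not find the assemblies after extraction.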
@@ -48,7 +73,7 @@ Visit the tutorial to learn how to use Azure Synapse Analytics to [create Apache
 
 Notebooks are a great option for prototyping your .NET for Apache Spark pipelines and scenarios. You can start working with, understanding, filtering, displaying, and visualizing your data quickly and efficiently.
 
-Data engineers, data scientists, business analysts, and machine learning engineers are all able to collaborate over a shared, interactive document. You see immediate results from data exploration, and can visualize your data in the same notebook.
+Data engineers, data scientists, business analysts, and machine learning engineers are all able to collaborate over a shared, interactive document. You see immediate results from data exploration, and can visualize your data in the same notebook.
 
 ### How to use .NET for Apache Spark notebooks
 
@@ -79,6 +104,12 @@ The following features are available when you use .NET for Apache Spark in the A
 * Support for defining [.NET user-defined functions that can run within Apache Spark](/dotnet/spark/how-to-guides/udf-guide). We recommend [Write and call UDFs in .NET for Apache Spark Interactive environments](/dotnet/spark/how-to-guides/dotnet-interactive-udf-issue) for learning how to use UDFs in .NET for Apache Spark Interactive experiences.
 * Support for visualizing output from your Spark jobs using different charts (such as line, bar, or histogram) and layouts (such as single, overlaid, and so on) using the `XPlot.Plotly` library.
 * Ability to include NuGet packages into your C# notebook.
+## Troubleshooting
+
+### `DotNetRunner: null` / `Futures timeout` in Synapse Spark job definition runs
+Synapse Spark job definitions on Spark pools using Spark 2.4 require `Microsoft.Spark` 1.0.0. Clear your `bin` and `obj` directories, and publish the project using version 1.0.0.
+### `OutOfMemoryError: java heap space at org.apache.spark...`
+.NET for Apache Spark 1.0.0 uses a different debug architecture than 1.1.1+. You must use 1.0.0 for your published version and 1.1.1+ for local debugging.
 
 ## Next steps
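For the Spark 2.4 case in the troubleshooting note above, the pinned package reference would look like the following sketch; only the `Version` value differs from the project file shown earlier in the diff.

```xml
<ItemGroup>
  <!-- Spark 2.4 pools require Microsoft.Spark 1.0.0 (see Troubleshooting above). -->
  <PackageReference Include="Microsoft.Spark" Version="1.0.0" />
</ItemGroup>
```

Remember to clear `bin` and `obj` before republishing so stale assemblies from the newer version are not packaged.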
