These instructions will show you how to run a .NET for Apache Spark app using .NET Core on macOS.
- Download and install .NET Core 2.1 SDK
- Install Java 8
  - Select the appropriate version for your operating system, e.g., `jdk-8u231-macosx-x64.dmg`.
  - Install using the installer and verify you are able to run `java` from your command line.
- Download and install Apache Spark 2.4.4:
  - Add the necessary environment variable `SPARK_HOME`, e.g., `~/bin/spark-2.4.4-bin-hadoop2.7/`:

        export SPARK_HOME=~/bin/spark-2.4.4-bin-hadoop2.7/
        export PATH="$SPARK_HOME/bin:$PATH"
        source ~/.bashrc
- Download and install a Microsoft.Spark.Worker release:
  - Select a Microsoft.Spark.Worker release from the .NET for Apache Spark GitHub Releases page and download it to your local machine (e.g., `/bin/Microsoft.Spark.Worker/`).
  - **IMPORTANT** Create a new environment variable using `export DOTNET_WORKER_DIR=<your_path>` and set it to the directory where you downloaded and extracted the Microsoft.Spark.Worker (e.g., `/bin/Microsoft.Spark.Worker/`).
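
Before moving on, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch rather than part of the official setup; the helper names `check_command` and `check_dir` are hypothetical, and it only checks that the commands are on your `PATH` and that the two environment variables point at existing directories.

```shell
# Hypothetical helper: print "ok" if a command is found on PATH, "missing" otherwise.
check_command() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok"
  else
    echo "missing"
  fi
}

# Hypothetical helper: print "ok" if the given value is a non-empty path to an existing directory.
check_dir() {
  if [ -n "$1" ] && [ -d "$1" ]; then
    echo "ok"
  else
    echo "missing"
  fi
}

# Commands the steps above are expected to make available.
for cmd in dotnet java spark-submit; do
  echo "$cmd: $(check_command "$cmd")"
done

# Environment variables set in the steps above (an unset variable is treated as empty).
echo "SPARK_HOME: $(check_dir "${SPARK_HOME:-}")"
echo "DOTNET_WORKER_DIR: $(check_dir "${DOTNET_WORKER_DIR:-}")"
```

Any line reporting `missing` points back to the corresponding prerequisite step.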
- Use the `dotnet` CLI to create a console application:

      dotnet new console -o HelloSpark

- Install the `Microsoft.Spark` NuGet package into the project from the spark nuget.org feed (see Ways to install NuGet Package):

      cd HelloSpark
      dotnet add package Microsoft.Spark

- Replace the contents of the `Program.cs` file with the following code:

      using Microsoft.Spark.Sql;

      namespace HelloSpark
      {
          class Program
          {
              static void Main(string[] args)
              {
                  var spark = SparkSession.Builder().GetOrCreate();
                  var df = spark.Read().Json("people.json");
                  df.Show();
              }
          }
      }

- Use the `dotnet` CLI to build the application:

      dotnet build
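
After the build, the remaining steps are run from the app's output directory. As a hedged sketch, the directory containing `HelloSpark.dll` can be located by searching under the project's `bin` folder. The function name `find_output_dir` is hypothetical, and the exact layout (for example `bin/Debug/netcoreapp2.1`) is an assumption that depends on your SDK version and build configuration.

```shell
# Hypothetical helper: print the directory of the first file named $2 found below $1.
# Returns a non-zero status and prints nothing when no such file exists.
find_output_dir() {
  root="$1"
  name="$2"
  match=$(find "$root" -type f -name "$name" 2>/dev/null | head -n 1)
  if [ -n "$match" ]; then
    dirname "$match"
  else
    return 1
  fi
}

# Run from inside the HelloSpark project folder; searching only ./bin avoids
# the intermediate copies that the build leaves under ./obj.
find_output_dir ./bin HelloSpark.dll || echo "HelloSpark.dll not found; has the project been built?"
```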
- Open your terminal and navigate into your app folder:

      cd <your-app-output-directory>

- Create `people.json` with the following content:

      { "name" : "Michael" }
      { "name" : "Andy", "age" : 30 }
      { "name" : "Justin", "age" : 19 }

- Run your app:

      spark-submit \
      --class org.apache.spark.deploy.dotnet.DotnetRunner \
      --master local \
      microsoft-spark-2.4.x-<version>.jar \
      dotnet HelloSpark.dll

  Note: This command assumes you have downloaded Apache Spark and added it to your PATH environment variable to be able to use `spark-submit`; otherwise, you would have to use the full path (e.g., `~/spark/bin/spark-submit`).

- The output of the application should look similar to the output below:

      +----+-------+
      | age|   name|
      +----+-------+
      |null|Michael|
      |  30|   Andy|
      |  19| Justin|
      +----+-------+
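
The run steps above can also be gathered into a small script. This is a sketch under stated assumptions, not an official launcher: the function names `write_people_json`, `find_spark_jar`, and `run_hello_spark` are hypothetical, and it assumes the `microsoft-spark-2.4.x-<version>.jar` file sits in the same directory as `HelloSpark.dll`.

```shell
# Hypothetical helper: write the sample input to the given path, one JSON object per line.
write_people_json() {
  cat > "$1" <<'EOF'
{ "name" : "Michael" }
{ "name" : "Andy", "age" : 30 }
{ "name" : "Justin", "age" : 19 }
EOF
}

# Hypothetical helper: print the first microsoft-spark-2.4.x-<version>.jar in directory $1.
# Returns a non-zero status when no matching jar exists.
find_spark_jar() {
  for jar in "$1"/microsoft-spark-2.4.x-*.jar; do
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  return 1
}

# Hypothetical wrapper: create the input file and submit the app from the current directory.
run_hello_spark() {
  write_people_json people.json
  if jar=$(find_spark_jar .); then
    spark-submit \
      --class org.apache.spark.deploy.dotnet.DotnetRunner \
      --master local \
      "$jar" \
      dotnet HelloSpark.dll
  else
    echo "No microsoft-spark-2.4.x-<version>.jar found in $(pwd)" >&2
    return 1
  fi
}
```

To use it, define the functions in your shell, change into your app output directory, and call `run_hello_spark`.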