How to deploy Spark Standalone in Oracle Cloud Infrastructure (OCI)?

The following walk-through guides you through the steps needed to set up your environment to run Spark and Hadoop in Oracle Cloud Infrastructure.

Prerequisites

You have deployed a VM 2.1 or larger with Oracle Linux 7.9 (OEL7) in Oracle Cloud Infrastructure (OCI).

Oracle Linux 7.9 comes with a JVM installed by default.
You have root access, either directly or via sudo. By default in OCI, you connect as the "opc" user, which has sudo privileges.

[opc@xxx ~]$ java -version
java version "1.8.0_281"
Java(TM) SE Runtime Environment (build 1.8.0_281-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.281-b09, mixed mode)
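
The check above can also be scripted. The following is a minimal sketch, not part of the original setup: the helper names `java_version_string` and `is_java8` are hypothetical, and the script only assumes that the first line of the `java -version` output contains the version in double quotes, as in the output shown above.

```shell
# Hypothetical helper: extract the quoted version string from the first
# line of `java -version` output, e.g. 'java version "1.8.0_281"'.
java_version_string() {
    sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Hypothetical helper: succeed when the version string denotes Java 8 (1.8.x).
is_java8() {
    case "$1" in
        1.8.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# `java -version` writes to stderr, so redirect it into the pipe.
if command -v java >/dev/null 2>&1; then
    current="$(java -version 2>&1 | java_version_string)"
    if is_java8 "$current"; then
        echo "Java 8 detected: $current"
    else
        echo "Java found, but it is not 1.8: $current"
    fi
else
    echo "java is not on the PATH"
fi
```

If the script reports a version other than 1.8, install JDK 1.8 as described in the next section before continuing.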

Java Installation

The installation is simple: set up Java, then install the Spark and Hadoop components and libraries. Let's start by setting up the Spark and Hadoop environment.

Download the latest release of JDK 1.8, because Hadoop 2.x uses this Java version, and install the RPM as root.

sudo rpm -ivh /home/opc/jdk-8u271-linux-x64.rpm

Check the Java version.

java -version

Spark and Hadoop Setup

The next step is to install the Spark and Hadoop environment.

First, choose the version of Spark and Hadoop you want to install, then download the matching archive.

Download Spark 2.4.5 for Hadoop 2.7

cd /home/opc
wget http://apache.uvigo.es/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz

Download Spark 2.4.7 for Hadoop 2.7

wget http://apache.uvigo.es/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz

Download Spark 3.1.1 for Hadoop 3.2

wget http://apache.uvigo.es/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
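
The three download URLs above follow one pattern, so they can be built from the Spark and Hadoop version numbers. The following is a small sketch of that pattern; the helper names `spark_archive` and `spark_url` and the variable `SPARK_MIRROR` are hypothetical, and the mirror is simply the one used in this guide.

```shell
# Mirror used in this guide; replace it if it is unreachable from your VM.
SPARK_MIRROR="http://apache.uvigo.es/spark"

# Hypothetical helper: print the archive name for a Spark/Hadoop pair.
spark_archive() {
    echo "spark-$1-bin-hadoop$2.tgz"
}

# Hypothetical helper: print the full download URL for that archive.
spark_url() {
    echo "$SPARK_MIRROR/spark-$1/$(spark_archive "$1" "$2")"
}

# Print the URL for each combination listed in this guide.
spark_url 2.4.5 2.7
spark_url 2.4.7 2.7
spark_url 3.1.1 3.2
```

To download one of them, pass the printed URL to wget, for example `wget "$(spark_url 2.4.7 2.7)"`.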

Install the Spark and Hadoop Version

Install the chosen Spark and Hadoop version in the "/opt" directory. Run only the tar command that matches the archive you downloaded.

sudo -i
cd /opt
tar -zxvf /home/opc/spark-2.4.5-bin-hadoop2.7.tgz
or 
tar -zxvf /home/opc/spark-2.4.7-bin-hadoop2.7.tgz
or
tar -zxvf /home/opc/spark-3.1.1-bin-hadoop3.2.tgz
or
tar -zxvf /home/opc/spark-3.1.2-bin-hadoop3.2.tgz
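
After extracting, you may want to confirm that the directory exists where you expect it. The following is a minimal sketch, assuming the archive unpacks into a directory named like the archive without its ".tgz" suffix; the helper name `spark_home_for` is hypothetical.

```shell
# Hypothetical helper: derive the directory that tar creates under /opt
# from the archive path, by dropping the directory part and the .tgz suffix.
spark_home_for() {
    echo "/opt/$(basename "$1" .tgz)"
}

# Archive name taken from this guide; change it to the one you extracted.
ARCHIVE="/home/opc/spark-2.4.5-bin-hadoop2.7.tgz"
SPARK_DIR="$(spark_home_for "$ARCHIVE")"

# spark-submit ships in the bin directory of the Spark binary distribution,
# so its presence is a reasonable sign that the extraction completed.
if [ -x "$SPARK_DIR/bin/spark-submit" ]; then
    echo "Spark is installed in $SPARK_DIR"
else
    echo "Spark not found in $SPARK_DIR; check the tar output" >&2
fi
```

The value of `SPARK_DIR` is the path to use for SPARK_HOME later in this guide.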

Install PySpark in the Python 3 environment

/opt/Python-3.7.6/bin/pip3 install 'pyspark==2.4.7'
/opt/Python-3.7.6/bin/pip3 install findspark

Next, you can create a virtual environment and activate it.
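
This guide does not list the commands for that step. The following is a minimal sketch, assuming the standard `venv` module is available in your Python 3 installation. The variables `PYTHON_BIN` and `VENV_DIR` and the location `$HOME/spark-venv` are hypothetical; to use the interpreter from this guide, set `PYTHON_BIN=/opt/Python-3.7.6/bin/python3` first.

```shell
# Interpreter and target directory; both are assumptions, adjust as needed.
PYTHON_BIN="${PYTHON_BIN:-python3}"
VENV_DIR="${VENV_DIR:-$HOME/spark-venv}"

# Create the environment, then activate it in the current shell.
if "$PYTHON_BIN" -m venv "$VENV_DIR"; then
    . "$VENV_DIR/bin/activate"
    echo "Active environment: $VIRTUAL_ENV"
else
    echo "Could not create a virtual environment with $PYTHON_BIN" >&2
fi
```

By default a new virtual environment does not see packages installed with the global pip3, so pyspark and findspark may need to be installed again with the environment active.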

Modify your environment to use this Spark and Hadoop Version

Add the following lines to the ".bashrc" file of the "opc" user:

# Add by %OP%
export PYTHONHOME=/opt/anaconda3
export PATH=$PYTHONHOME/bin:$PYTHONHOME/condabin:$PATH

# SPARK ENV
#export JAVA_HOME=$(/usr/libexec/java_home)
export SPARK_HOME=/opt/spark-2.4.5-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_PYTHON=python3

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
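
After reloading ".bashrc" (for example with `source ~/.bashrc`), it can help to confirm that the Spark directory really is on the PATH. The following is a small sketch of such a check; the helper name `path_contains` is hypothetical, and the fallback value of SPARK_HOME is the one used in this guide.

```shell
# Hypothetical helper: succeed when directory $1 is an entry of the
# colon-separated list $2 (for example the value of PATH).
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Fall back to the directory used in this guide when SPARK_HOME is unset.
SPARK_HOME="${SPARK_HOME:-/opt/spark-2.4.5-bin-hadoop2.7}"

if path_contains "$SPARK_HOME/bin" "$PATH"; then
    echo "PATH includes $SPARK_HOME/bin"
else
    echo "PATH does not include $SPARK_HOME/bin; reload .bashrc" >&2
fi
```

Wrapping both values in colons makes the match exact, so "/opt/spark" does not match an entry such as "/opt/spark/bin".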

Test your Spark and Hadoop Environment

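With the settings above, running "pyspark" starts Jupyter Notebook. If you are working directly on the virtual machine and have a browser installed, it should take you straight into the Jupyter environment. Otherwise, connect to "http://xxx.xxx.xxx.xxx:8001/", replacing the address with that of your VM.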
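
Before opening the notebook, a quick check from the shell can show whether every tool is reachable. The following is a minimal sketch; the helper name `check_cmd` is hypothetical, and the list of commands reflects the tools this guide relies on.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check each tool and remember whether any of them is missing.
status=0
for cmd in java spark-submit pyspark jupyter; do
    check_cmd "$cmd" || status=1
done

# Print the Spark version banner only when everything is in place.
if [ "$status" -eq 0 ]; then
    spark-submit --version
else
    echo "Fix the missing commands above, then reload .bashrc" >&2
fi
```

A "missing" line usually points to a PATH entry that was not added or to a shell that has not reloaded ".bashrc" yet.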

Then upload the following notebooks:
