The following walk-through guides you through the steps needed to set up your environment to run Spark and Hadoop in Oracle Cloud Infrastructure.
Prerequisites:
- You have deployed a VM (shape 2.1 or larger) running Oracle Linux 7.9 (OEL7) in Oracle Cloud Infrastructure (OCI).
- The Oracle Linux 7.9 image includes a JVM by default.
- You have root access, either directly or through sudo. By default in OCI, you connect as the "opc" user, which has sudo privileges.

You can check the preinstalled Java version:
[opc@xxx ~]$ java -version
java version "1.8.0_281"
Java(TM) SE Runtime Environment (build 1.8.0_281-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.281-b09, mixed mode)

The installation is simple: it consists of setting up Java and installing the Spark and Hadoop components and libraries. Let's start by setting up the Spark and Hadoop environment.
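Before downloading a JDK, you can check whether the preinstalled JVM already reports Java 8 and locate its home directory. This is handy if you set JAVA_HOME, because the commented-out JAVA_HOME line in the ".bashrc" block further down relies on /usr/libexec/java_home, which exists only on macOS. The following is a minimal sketch, assuming bash with GNU sed and coreutils (readlink -f) as shipped with Oracle Linux; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: report the Java feature version and derive a JAVA_HOME candidate.

# Read `java -version` output on stdin and print the feature version:
# "1.8.0_281" -> "1.8" (legacy scheme), "11.0.10" -> "11" (modern scheme).
java_feature_version() {
  local full
  # First quoted token after the word "version", e.g. 1.8.0_281
  full=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$full" in
    1.*) echo "$full" | cut -d. -f1-2 ;;
    *)   echo "${full%%.*}" ;;
  esac
}

# Strip the trailing /bin/java (and /jre, as in Java 8 JDK layouts) from a
# resolved java binary path to obtain a JAVA_HOME candidate.
java_home_from_bin() {
  local p="${1%/bin/java}"
  echo "${p%/jre}"
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, hence 2>&1.
  ver=$(java -version 2>&1 | java_feature_version)
  # readlink -f follows the /usr/bin/java alternatives symlinks to the real binary.
  home=$(java_home_from_bin "$(readlink -f "$(command -v java)")")
  echo "Java feature version: $ver"
  echo "JAVA_HOME candidate:  $home"
  [ "$ver" = "1.8" ] || echo "Warning: Hadoop 2.x expects Java 8" >&2
fi
```

If the reported version is not 1.8, continue with the JDK installation below; the printed directory is only a candidate and should be checked before exporting it as JAVA_HOME.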
Download the latest JDK 1.8 release (Hadoop 2.x uses this Java version) and install the RPM package:
sudo rpm -ivh /home/opc/jdk-8u271-linux-x64.rpm

Check the Java version:
java -version

The next step is to install the Spark and Hadoop environment.
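The download and extraction steps that follow can also be wrapped in a few shell functions. This is a sketch, not part of the original procedure: the function names are hypothetical, the default mirror is an assumption (the Apache archive, which keeps older releases that regular mirrors drop), and the version-independent "spark" symlink is an addition. It also assumes, as with the Spark binary packages, that each .tgz unpacks into a directory with the same name as the file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for fetching and unpacking a Spark binary package.

# Assumption: the Apache archive keeps every past Spark release; any mirror
# with the same spark-<version>/<package>.tgz layout can be substituted.
SPARK_MIRROR="${SPARK_MIRROR:-https://archive.apache.org/dist/spark}"

# Package name, e.g. spark_package 2.4.5 hadoop2.7 -> spark-2.4.5-bin-hadoop2.7
spark_package() {
  echo "spark-$1-bin-$2"
}

# Download URL for a Spark version and Hadoop build profile.
spark_url() {
  echo "${SPARK_MIRROR}/spark-$1/$(spark_package "$1" "$2").tgz"
}

# Extract <tgz> into <dest> and (re)point <dest>/spark at the unpacked tree.
# Assumes the archive's top-level directory matches the file name.
install_spark() {
  local tgz="$1" dest="$2" name
  name=$(basename "$tgz" .tgz)
  tar -zxf "$tgz" -C "$dest" || return 1
  # -n replaces an existing symlink instead of descending into its target.
  ln -sfn "$dest/$name" "$dest/spark"
}
```

For example, in a root shell: `wget -P /home/opc "$(spark_url 2.4.5 hadoop2.7)"` followed by `install_spark /home/opc/spark-2.4.5-bin-hadoop2.7.tgz /opt`. With this layout, SPARK_HOME could point at /opt/spark and stay valid when you switch to another extracted version.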
First, choose the Spark and Hadoop versions you want to install and download the corresponding package (you only need one of the following):
cd /home/opc
wget http://apache.uvigo.es/spark/spark-2.4.5/spark-2.4.5-bin-hadoop2.7.tgz
wget http://apache.uvigo.es/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz
wget http://apache.uvigo.es/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz

Install the chosen Spark and Hadoop version in the "/opt" directory:
sudo -i
cd /opt
tar -zxvf /home/opc/spark-2.4.5-bin-hadoop2.7.tgz
or
tar -zxvf /home/opc/spark-2.4.7-bin-hadoop2.7.tgz
or
tar -zxvf /home/opc/spark-3.1.1-bin-hadoop3.2.tgz
or
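tar -zxvf /home/opc/spark-3.1.2-bin-hadoop3.2.tgz

Then install the pyspark and findspark Python packages. The pyspark version should match the Spark version you extracted:
/opt/Python-3.7.6/bin/pip3 install 'pyspark==2.4.7'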
/opt/Python-3.7.6/bin/pip3 install findspark

Next, we can create a virtual environment and enable it.
Add the following lines to the ".bashrc" file of the "opc" user:
# Add by %OP%
export PYTHONHOME=/opt/anaconda3
export PATH=$PYTHONHOME/bin:$PYTHONHOME/condabin:$PATH
# SPARK ENV
#export JAVA_HOME=$(/usr/libexec/java_home)
export SPARK_HOME=/opt/spark-2.4.5-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
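export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

Reload the profile with "source ~/.bashrc". Running "pyspark" then starts a Jupyter notebook server instead of the interactive PySpark shell. If you're working directly on the virtual machine and have a browser installed, it should take you straight into the Jupyter environment. Otherwise, connect to "http://xxx.xxx.xxx.xxx:8001/", replacing xxx.xxx.xxx.xxx with the public IP address of your VM.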
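If you configure several VMs, or rerun the setup, appending to ".bashrc" by hand can leave duplicate blocks. The sketch below appends the Spark part of the environment only once, using the "# SPARK ENV" comment as a marker. The function name is hypothetical, and the Jupyter options are an assumption, not part of the original setup: by default Jupyter listens on localhost and port 8888, so "--ip=0.0.0.0 --port=8001 --no-browser" is one way to make the notebook reachable at the 8001 URL above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the Spark/Jupyter environment block to an rc file
# once, using the "# SPARK ENV" comment as a marker so reruns do not duplicate it.
add_spark_env() {
  local rc="$1" spark_home="$2" port="${3:-8001}"
  if grep -qF '# SPARK ENV' "$rc" 2>/dev/null; then
    echo "Spark environment already present in $rc"
    return 0
  fi
  # Unquoted heredoc: ${spark_home} and ${port} expand now, \$ stays literal.
  cat >> "$rc" <<EOF
# SPARK ENV
export SPARK_HOME=${spark_home}
export PATH=\$SPARK_HOME/bin:\$PATH
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
# Assumption: listen on all interfaces, on the port used in the connection URL.
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --ip=0.0.0.0 --port=${port} --no-browser'
EOF
}
```

For example, `add_spark_env ~/.bashrc /opt/spark-2.4.5-bin-hadoop2.7`. To reach the notebook from outside the VM you will likely also need to open the port in the OS firewall (on Oracle Linux 7, `sudo firewall-cmd --permanent --add-port=8001/tcp` then `sudo firewall-cmd --reload`) and add a matching ingress rule to the subnet's security list in OCI.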
Then upload the following notebooks: