This script installs R and R Studio on the machines of an Amazon EMR (Hadoop) cluster. It compiles R from source and installs it under `/usr/local/`. On the main node of the cluster, it also installs and starts R Studio. Finally, it installs some common R packages used in a blog example.
- Copy and paste the script into a text file, then save the file with a `.sh` extension (e.g. `install_r_studio_hadoop.sh`).
- Make the script executable with the command `chmod +x install_r_studio_hadoop.sh`.
- Run the script with the command `./install_r_studio_hadoop.sh`.
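You can wrap the three steps above in a small helper that also saves the script's output, including the `set -x` trace, to a log file. A log is useful when a long R build fails partway through. This is a sketch and not part of the original script. The function name `run_installer` and the default log file name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): make the installer
# executable, run it, and keep a copy of its output in a log file.
run_installer() {
  local script="${1:-./install_r_studio_hadoop.sh}"
  local log="${2:-install_r_studio_hadoop.log}"

  if [[ ! -f "$script" ]]; then
    echo "installer not found: $script" >&2
    return 1
  fi

  # A bare file name would be looked up on PATH, so anchor it to the current directory.
  case "$script" in
    */*) ;;
    *) script="./$script" ;;
  esac

  chmod +x "$script"
  # set -x writes its trace to stderr, so merge stderr into the log as well.
  "$script" 2>&1 | tee "$log"
  # Return the installer's exit status rather than tee's.
  return "${PIPESTATUS[0]}"
}
```

For example, `run_installer ./install_r_studio_hadoop.sh` runs the installer and leaves `install_r_studio_hadoop.log` next to it.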
- `set -x -e`: `-x` prints each command before it runs, which helps with debugging. `-e` makes the script exit immediately if any command fails.
- `rver=4.0.3`: Sets the version of R to build.
- `rspkg=rstudio-server-rhel-1.3.1093-x86_64.rpm`: Sets the R Studio package that will be downloaded.
- `rspasswd=hadoop`: Sets the password for the R Studio user `hadoop`.
- The `grep` command checks whether the script is running on the main node.
- The `yum install` command installs additional system dependencies needed by R and by R packages.
- The `mkdir` and `cd` commands create and enter the `/tmp/R-build` directory.
- The `curl` command downloads the R source code.
- The `tar` command extracts the R source code.
- The `./configure` command configures the R build.
- The `make` command compiles R.
- The `sudo make install` command installs R under `/usr/local/`.
- The `cat` command creates a temporary file containing R environment variables for EMR.
- The `cat` and `sudo tee -a` commands append those environment variables to `/usr/local/lib64/R/etc/Renviron`.
- The `sudo /usr/local/bin/R CMD javareconf` command updates R's Java configuration before any packages are installed.
- The `curl` command downloads the R Studio package.
- The `sudo mkdir -p` command creates the `/etc/rstudio` directory.
- The `sudo sh -c` command adds an `auth-minimum-user-id` setting to `/etc/rstudio/rserver.conf`.
- The `sudo yum install` command installs R Studio from the downloaded package.
- The `sudo rstudio-server start` command starts R Studio (on the main node only).
- The `sudo sh -c` command sets the password for the `hadoop` user.
- The `sudo /usr/local/bin/R --no-save` command installs some common R packages.
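The main-node check in the list above comes down to a single `grep`. Below is a minimal sketch of that check as a reusable function. It assumes, as EMR bootstrap scripts commonly do, that each node describes itself in `/mnt/var/lib/info/instance.json` and that the main node's file contains `"isMaster": true`. The function name `is_main_node` and the `EMR_INSTANCE_INFO` override are illustrative and not taken from the original script.

```bash
#!/usr/bin/env bash
# Assumption: EMR writes node metadata to this file, with "isMaster": true
# on the main node. Override EMR_INSTANCE_INFO to point elsewhere.
EMR_INSTANCE_INFO="${EMR_INSTANCE_INFO:-/mnt/var/lib/info/instance.json}"

# Exit status 0 on the main node, non-zero otherwise (including a missing file).
is_main_node() {
  local info="${1:-$EMR_INSTANCE_INFO}"
  # -q: report only via exit status; -E: allow optional whitespace around ':'.
  grep -qE '"isMaster"[[:space:]]*:[[:space:]]*true' "$info" 2>/dev/null
}

if is_main_node; then
  echo "main node: R Studio will be installed and started"
else
  echo "not the main node: skipping R Studio"
fi
```

Wrapping the check in a function lets the R Studio steps be guarded with a plain `if is_main_node; then ...; fi`, instead of repeating the `grep` pipeline.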
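The steps from `mkdir` through `sudo make install` amount to a standard source build of R. The sketch below rests on two assumptions, because the original download URL and `configure` flags are not given here. First, it assumes the tarball comes from CRAN's `src/base/R-<major>/` directory. Second, it assumes R is configured with `--enable-R-shlib`, which builds R as a shared library; RStudio Server needs that library.

```bash
#!/usr/bin/env bash
rver="${rver:-4.0.3}"

# CRAN keeps R source tarballs under src/base/R-<major version>/.
r_source_url() {
  local version="$1"
  echo "https://cran.r-project.org/src/base/R-${version%%.*}/R-${version}.tar.gz"
}

# Download, unpack, configure, build, and install R into /usr/local.
# The body runs in a subshell with -e, so the first failing step aborts the
# build without exiting or changing the directory of the calling shell.
build_r() (
  set -e
  version="$1"
  build_dir="${2:-/tmp/R-build}"
  mkdir -p "$build_dir"
  cd "$build_dir"
  curl -fsSLO "$(r_source_url "$version")"
  tar -xzf "R-${version}.tar.gz"
  cd "R-${version}"
  # Assumed flags: --enable-R-shlib produces libR.so, which RStudio Server links against.
  ./configure --prefix=/usr/local --enable-R-shlib
  make -j "$(nproc)"
  sudo make install
)
```

Call it as a plain command, e.g. `build_r "$rver"`. Bash ignores `set -e` inside the function when the call is part of an `if` test or an `||`/`&&` list.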
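The `cat` and `sudo tee -a` steps append environment variables to R's site-wide `Renviron` file. The original does not list which variables it writes, so the sketch below takes them as arguments. It also skips any variable that is already defined, so re-running the script does not append duplicates. That check is an addition, not something the original script is described as doing. The `SUDO` override exists only so the function can be tried on a scratch file without root.

```bash
#!/usr/bin/env bash
RENVIRON="${RENVIRON:-/usr/local/lib64/R/etc/Renviron}"
# Set SUDO="" to write without sudo (for example, to a scratch file).
SUDO="${SUDO-sudo}"

# Usage: append_renviron FILE NAME=VALUE...
append_renviron() {
  local file="$1" pair
  shift
  for pair in "$@"; do
    # Append NAME=VALUE only if NAME is not already defined in the file.
    if ! grep -q "^${pair%%=*}=" "$file" 2>/dev/null; then
      # $SUDO is deliberately unquoted so that an empty value disappears.
      printf '%s\n' "$pair" | $SUDO tee -a "$file" >/dev/null
    fi
  done
}
```

On the cluster, a call such as `append_renviron "$RENVIRON" HADOOP_CONF_DIR=/etc/hadoop/conf` would come right before `sudo /usr/local/bin/R CMD javareconf`. The variable shown is only an illustration of the argument format.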
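The R Studio steps write one setting to `/etc/rstudio/rserver.conf`, install and start the server, and set a password for `hadoop`. RStudio Server rejects logins from accounts whose UID is below `auth-minimum-user-id`, which defaults to 1000. EMR's `hadoop` account likely has a lower UID, which is presumably why the script lowers the threshold.

The sketch below is hedged in two ways:
- The UID value is passed in, because the original value is not shown here.
- It sets the password with `chpasswd`, a swapped-in choice rather than whatever command the original wraps in `sudo sh -c`.

The function names are hypothetical.

```bash
#!/usr/bin/env bash
rspkg="${rspkg:-rstudio-server-rhel-1.3.1093-x86_64.rpm}"
rspasswd="${rspasswd:-hadoop}"
RSERVER_CONF="${RSERVER_CONF:-/etc/rstudio/rserver.conf}"
# Set SUDO="" to run without sudo (for example, against a scratch file).
SUDO="${SUDO-sudo}"

# Add auth-minimum-user-id=UID unless the file already sets it.
set_min_user_id() {
  local conf="$1" min_uid="$2"
  if grep -q '^auth-minimum-user-id=' "$conf" 2>/dev/null; then
    return 0
  fi
  printf 'auth-minimum-user-id=%s\n' "$min_uid" | $SUDO tee -a "$conf" >/dev/null
}

# chpasswd reads "user:password" lines on standard input.
set_user_password() {
  printf '%s:%s\n' "$1" "$2" | $SUDO chpasswd
}

# Mirrors the order of the steps above; intended for the main node only.
# Usage: install_rstudio MIN_UID
install_rstudio() {
  $SUDO mkdir -p "$(dirname "$RSERVER_CONF")"
  set_min_user_id "$RSERVER_CONF" "$1"
  $SUDO yum install -y "$rspkg"
  $SUDO rstudio-server start
  set_user_password hadoop "$rspasswd"
}
```

Checking for an existing `auth-minimum-user-id` line keeps the function safe to re-run. Appending blindly would add a second, conflicting entry on each run.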
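The last step feeds R code to `R --no-save`. When R reads commands from a pipe, it refuses to start unless told what to do with the workspace, and `--no-save` answers "don't save". The sketch below builds an `install.packages()` call from a list of package names and pipes it into R. The CRAN mirror URL is an assumption, and the blog example's package list is not reproduced here.

```bash
#!/usr/bin/env bash
R_BIN="${R_BIN:-/usr/local/bin/R}"
SUDO="${SUDO-sudo}"
CRAN_REPO="${CRAN_REPO:-https://cloud.r-project.org}"

# Print an install.packages() call for the given package names.
r_install_expr() {
  (( $# )) || return 1   # at least one package name is required
  local quoted
  quoted=$(printf '"%s",' "$@")
  printf 'install.packages(c(%s), repos = "%s")\n' "${quoted%,}" "$CRAN_REPO"
}

# Run that call non-interactively with the R built above.
install_r_packages() {
  r_install_expr "$@" | $SUDO "$R_BIN" --no-save
}
```

For example, `install_r_packages pkgA pkgB` (placeholder names) pipes `install.packages(c("pkgA","pkgB"), repos = "https://cloud.r-project.org")` into `sudo /usr/local/bin/R --no-save`.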