Commit e7bf58a

More updates for getting cluster mode working with 1.5.0

1 parent a8c7b39 commit e7bf58a

File tree

9 files changed: +53 −100 lines

cluster_setup.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+Our cluster configuration uses Docker host networking. A series of scripts brings up the containers that make up the cluster; you will likely need to tailor these scripts to your configuration.
+
+We provide several scripts:
+spark/docker/start_master_host.sh brings up the Spark master container using host networking.
+spark/docker/start_worker_host.sh brings up the Spark worker container using host networking.
+spark/docker/start_launcher_host.sh brings up the Spark launcher container using host networking. This is the container from which run_tpch.sh launches the benchmark.
+dikeHDFS/start_server_host.sh brings up the container with HDFS and NDP.
+
+The config file spark/spark.config holds the addresses and hostnames needed by the scripts above. You need to modify it for your configuration; there is an example in our repo.
+
+You also need to configure dikeHDFS/start_server_host.sh with your IP address: change the line containing --add-host=dikehdfs to include your storage server's IP address.
+
+As an example, in our configuration we typically follow this sequence:
+1) From the master node, run start_master_host.sh and start_launcher_host.sh.
+2) On each worker node, run start_worker_host.sh 1 8.
+3) Note that the "1 8" above is the number of workers followed by the number of cores to use.
+4) Launch the NDP server via dikeHDFS/start_server_host.sh.
+
+
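The scripts below read spark.config as simple KEY=value lines (they grep for DOCKER_HOSTS, LAUNCHER_IP, and WORKER_IP). A minimal sketch of such a file; the hostnames and addresses here are placeholders, not values from the repo's example:

```shell
# spark/spark.config -- hypothetical values; substitute your own hosts.
# DOCKER_HOSTS is a comma-separated list of host:ip pairs that the
# scripts expand into docker --add-host flags.
DOCKER_HOSTS=sparkmaster:192.168.1.10,sparkworker:192.168.1.11,dikehdfs:192.168.1.12
LAUNCHER_IP=192.168.1.10
WORKER_IP=192.168.1.11
```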

demo.sh

Lines changed: 10 additions & 10 deletions
@@ -4,22 +4,22 @@
 printf "\nNext Test: Spark TPC-H query with HDFS storage and with no pushdown\n"
 read -n 1 -s -r -p "Press any key to continue with test."
 cd benchmark/tpch
-./run_tpch.sh -t 6 -ds ndp --protocol ndphdfs
+./run_tpch.sh --local -t 6 -ds ndp --protocol ndphdfs
 printf "\nTest Complete: Spark TPC-H query with HDFS storage and with no pushdown\n"
 
 printf "\nNext Test: Spark TPC-H query with HDFS storage and with pushdown enabled.\n"
 read -n 1 -s -r -p "Press any key to continue with test."
-./run_tpch.sh -t 6 -ds ndp --protocol ndphdfs --pushdown
+./run_tpch.sh --local -t 6 -ds ndp --protocol ndphdfs --pushdown
 printf "\nTest Complete: Spark TPC-H query with HDFS storage and with pushdown enabled.\n"
 
 
 
-printf "\nNext Test: Spark TPC-H query with S3 storage and with no pushdown\n"
-read -n 1 -s -r -p "Press any key to continue with test."
-./run_tpch.sh -t 6 -ds ndp --protocol s3
-printf "Test Complete: Spark TPC-H query with S3 storage and with no pushdown\n"
+#printf "\nNext Test: Spark TPC-H query with S3 storage and with no pushdown\n"
+#read -n 1 -s -r -p "Press any key to continue with test."
+#./run_tpch.sh --local -t 6 -ds ndp --protocol s3
+#printf "Test Complete: Spark TPC-H query with S3 storage and with no pushdown\n"
 
-printf "\nNext Test: Spark TPC-H query with S3 and with pushdown enabled.\n"
-read -n 1 -s -r -p "Press any key to continue with test."
-./run_tpch.sh -t 6 -ds ndp --protocol s3 --pushdown
-printf "\nTest Complete: Spark TPC-H query with S3 and with pushdown enabled.\n"
+#printf "\nNext Test: Spark TPC-H query with S3 and with pushdown enabled.\n"
+#read -n 1 -s -r -p "Press any key to continue with test."
+#./run_tpch.sh --local -t 6 -ds ndp --protocol s3 --pushdown
+#printf "\nTest Complete: Spark TPC-H query with S3 and with pushdown enabled.\n"

dikeHDFS

spark/docker/start-launcher.sh

Lines changed: 5 additions & 59 deletions
@@ -11,78 +11,25 @@ rm -f "${ROOT_DIR}/volume/status/MASTER*"
 
 CMD="sleep 365d"
 RUNNING_MODE="daemon"
-START_LOCAL="NO"
-if [ ! -d spark.config ]; then
-  START_LOCAL="YES"
-else
-  DOCKER_HOSTS="$(cat spark.config | grep DOCKER_HOSTS)"
-  IFS='=' read -a IP_ARRAY <<< "$DOCKER_HOSTS"
-  DOCKER_HOSTS=${IP_ARRAY[1]}
-  HOSTS=""
-  IFS=',' read -a IP_ARRAY <<< "$DOCKER_HOSTS"
-  for i in "${IP_ARRAY[@]}"
-  do
-    HOSTS="$HOSTS --add-host=$i"
-  done
-  DOCKER_HOSTS=$HOSTS
-  echo "Docker Hosts: $DOCKER_HOSTS"
 
-  LAUNCHER_IP="$(cat spark.config | grep LAUNCHER_IP)"
-  IFS='=' read -a IP_ARRAY <<< "$LAUNCHER_IP"
-  LAUNCHER_IP=${IP_ARRAY[1]}
-  echo "LAUNCHER_IP: $LAUNCHER_IP"
-fi
-DOCKER_ID=""
 if [ $RUNNING_MODE = "interactive" ]; then
   DOCKER_IT="-i -t"
 fi
 # --cpuset-cpus="9-12" \
-if [ ${START_LOCAL} == "YES" ]; then
-DOCKER_RUN="docker run ${DOCKER_IT} --rm \
+DOCKER_RUN="docker run ${DOCKER_IT} --rm \
 -p 5006:5006 \
 --name sparklauncher \
 --network dike-net \
 -e MASTER=spark://sparkmaster:7077 \
 -e SPARK_CONF_DIR=/conf \
 -e SPARK_PUBLIC_DNS=localhost \
---mount type=bind,source=$(pwd)/spark,target=/spark \
---mount type=bind,source=$(pwd)/build,target=/build \
---mount type=bind,source=$(pwd)/examples,target=/examples \
---mount type=bind,source=$(pwd)/../data,target=/tpch-data \
---mount type=bind,source=$(pwd)/../dikeHDFS,target=/dikeHDFS \
---mount type=bind,source=$(pwd)/../benchmark/tpch,target=/tpch \
---mount type=bind,source=$(pwd)/../pyNdp,target=/pyNdp \
---mount type=bind,source=$(pwd)/../pushdown-datasource/pushdown-datasource,target=/pushdown-datasource \
--v $(pwd)/conf/master:/conf \
--v ${ROOT_DIR}/build/.m2:${DOCKER_HOME_DIR}/.m2 \
--v ${ROOT_DIR}/build/.gnupg:${DOCKER_HOME_DIR}/.gnupg \
--v ${ROOT_DIR}/build/.sbt:${DOCKER_HOME_DIR}/.sbt \
--v ${ROOT_DIR}/build/.cache:${DOCKER_HOME_DIR}/.cache \
--v ${ROOT_DIR}/build/.ivy2:${DOCKER_HOME_DIR}/.ivy2 \
--v ${ROOT_DIR}/volume/status:/opt/volume/status \
--v ${ROOT_DIR}/volume/logs:/opt/volume/logs \
--v ${ROOT_DIR}/bin/:${DOCKER_HOME_DIR}/bin \
--e "AWS_ACCESS_KEY_ID=${USER_NAME}" \
--e "AWS_SECRET_ACCESS_KEY=admin123" \
--e "AWS_EC2_METADATA_DISABLED=true" \
--e RUNNING_MODE=${RUNNING_MODE} \
--u ${USER_ID} \
-spark-run-${USER_NAME} ${CMD}"
-else
-DOCKER_RUN="docker run ${DOCKER_IT} --rm \
--p 5006:5006 \
---name sparklauncher \
---network dike-net --ip ${LAUNCHER_IP} ${DOCKER_HOSTS} \
--e MASTER=spark://sparkmaster:7077 \
--e SPARK_CONF_DIR=/conf \
--e SPARK_PUBLIC_DNS=localhost \
 -e SPARK_MASTER="spark://sparkmaster:7077" \
--e SPARK_DRIVER_HOST=${LAUNCHER_IP} \
 --mount type=bind,source=$(pwd)/spark,target=/spark \
 --mount type=bind,source=$(pwd)/build,target=/build \
 --mount type=bind,source=$(pwd)/examples,target=/examples \
 --mount type=bind,source=$(pwd)/../dikeHDFS,target=/dikeHDFS \
 --mount type=bind,source=$(pwd)/../benchmark/tpch,target=/tpch \
+--mount type=bind,source=$(pwd)/../data,target=/tpch-data \
 --mount type=bind,source=$(pwd)/../pushdown-datasource/pushdown-datasource,target=/pushdown-datasource \
 -v $(pwd)/conf/master:/conf \
 -v ${ROOT_DIR}/build/.m2:${DOCKER_HOME_DIR}/.m2 \
@@ -98,11 +45,10 @@ else
 -e "AWS_EC2_METADATA_DISABLED=true" \
 -e RUNNING_MODE=${RUNNING_MODE} \
 -u ${USER_ID} \
-spark-run-${USER_NAME} ${CMD}"
-fi
-echo "mode: $RUNNING_MODE"
+v${DIKE_VERSION}-spark-run-${USER_NAME} ${CMD}"
+
 if [ $RUNNING_MODE = "interactive" ]; then
     eval "${DOCKER_RUN}"
 else
     eval "${DOCKER_RUN}" &
-fi
+fi
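The launcher (and, below, the worker) scripts previously expanded the comma-separated DOCKER_HOSTS value from spark.config into one --add-host flag per entry; this commit removes that parsing. A standalone sketch of what that expansion did, using a made-up host list:

```shell
#!/bin/bash
# Expand a comma-separated host:ip list into docker --add-host flags,
# as the removed spark.config parsing did. Host values here are made up.
CONFIG_LINE="DOCKER_HOSTS=master:10.0.0.1,worker1:10.0.0.2"

IFS='=' read -r -a KV <<< "$CONFIG_LINE"          # split KEY=value
DOCKER_HOSTS="${KV[1]}"

HOSTS=""
IFS=',' read -r -a IP_ARRAY <<< "$DOCKER_HOSTS"   # split the comma list
for i in "${IP_ARRAY[@]}"; do
    HOSTS="$HOSTS --add-host=$i"                  # one flag per host entry
done

echo "$HOSTS"
```

With host networking the containers resolve each other through the host's network directly, so these per-container --add-host entries (and static --ip assignments) are no longer needed.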

spark/docker/start-master.sh

Lines changed: 5 additions & 4 deletions
@@ -1,6 +1,7 @@
 #!/bin/bash
 
 # Include the setup for our cached local directories. (.m2, .ivy2, etc)
+source docker/spark_version
 source docker/setup.sh
 
 mkdir -p "${ROOT_DIR}/volume/logs"
@@ -37,8 +38,8 @@ else
     fi
 fi
 echo "removing work and logs"
-rm -rf build/spark-3.1.2/work/
-rm -rf build/spark-3.1.2/logs/
+rm -rf build/spark-$SPARK_VERSION/work/
+rm -rf build/spark-$SPARK_VERSION/logs/
 
 # --cpuset-cpus="9-12" \
 if [ ${START_LOCAL} == "YES" ]; then
@@ -67,7 +68,7 @@ if [ ${START_LOCAL} == "YES" ]; then
 -v ${ROOT_DIR}/bin/:${DOCKER_HOME_DIR}/bin \
 -e RUNNING_MODE=${RUNNING_MODE} \
 -u ${USER_ID} \
-spark-run-${USER_NAME} ${CMD}"
+v${DIKE_VERSION}-spark-run-${USER_NAME} ${CMD}"
 else
 DOCKER_RUN="docker run ${DOCKER_IT} --rm \
 -p 4040:4040 -p 6066:6066 -p 7077:7077 -p 8080:8080 -p 5005:5005 -p 18080:18080 \
@@ -98,7 +99,7 @@ else
 -e "AWS_EC2_METADATA_DISABLED=true" \
 -e RUNNING_MODE=${RUNNING_MODE} \
 -u ${USER_ID} \
-spark-run-${USER_NAME} ${CMD}"
+v${DIKE_VERSION}-spark-run-${USER_NAME} ${CMD}"
 fi
 if [ $RUNNING_MODE = "interactive" ]; then
     eval "${DOCKER_RUN}"
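The docker/spark_version file sourced above is not shown in this diff. Based on how the scripts use it, it presumably defines SPARK_VERSION (used in the build/spark-$SPARK_VERSION paths, replacing the hard-coded 3.1.2) and DIKE_VERSION (used in the v${DIKE_VERSION}-spark-run-${USER_NAME} image tag; 1.5.0 per the commit message). A hypothetical sketch:

```shell
# docker/spark_version -- hypothetical contents, inferred from usage.
# The scripts reference build/spark-$SPARK_VERSION and image tags of the
# form v${DIKE_VERSION}-spark-run-${USER_NAME} after sourcing this file.
SPARK_VERSION=3.1.2
DIKE_VERSION=1.5.0
```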

spark/docker/start-worker-host.sh

Lines changed: 3 additions & 3 deletions
@@ -1,5 +1,5 @@
 #!/bin/bash
-
+source docker/spark_version
 source docker/setup.sh
 
 mkdir -p "${ROOT_DIR}/volume/logs"
@@ -20,8 +20,8 @@ if [ "$#" -ge 2 ] ; then
     CORES=$2
 fi
 echo "removing work and logs"
-rm -rf build/spark-3.1.2/work/
-rm -rf build/spark-3.1.2/logs/
+rm -rf build/spark-$SPARK_VERSION/work/
+rm -rf build/spark-$SPARK_VERSION/logs/
 
 echo "Workers: $WORKERS"
 echo "Cores: $CORES"

spark/docker/start-worker.sh

Lines changed: 5 additions & 21 deletions
@@ -1,5 +1,5 @@
 #!/bin/bash
-
+source docker/spark_version
 source docker/setup.sh
 
 mkdir -p "${ROOT_DIR}/volume/logs"
@@ -20,27 +20,11 @@ if [ "$#" -ge 2 ] ; then
     CORES=$2
 fi
 echo "removing work and logs"
-rm -rf build/spark-3.1.2/work/
-rm -rf build/spark-3.1.2/logs/
+rm -rf build/spark-$SPARK_VERSION/work/
+rm -rf build/spark-$SPARK_VERSION/logs/
 
 echo "Workers: $WORKERS"
 echo "Cores: $CORES"
-DOCKER_HOSTS="$(cat spark.config | grep DOCKER_HOSTS)"
-IFS='=' read -a IP_ARRAY <<< "$DOCKER_HOSTS"
-DOCKER_HOSTS=${IP_ARRAY[1]}
-HOSTS=""
-IFS=',' read -a IP_ARRAY <<< "$DOCKER_HOSTS"
-for i in "${IP_ARRAY[@]}"
-do
-  HOSTS="$HOSTS --add-host=$i"
-done
-DOCKER_HOSTS=$HOSTS
-echo "Docker Hosts: $DOCKER_HOSTS"
-
-WORKER_IP="$(cat spark.config | grep WORKER_IP)"
-IFS='=' read -a IP_ARRAY <<< "$WORKER_IP"
-WORKER_IP=${IP_ARRAY[1]}
-echo "WORKER_IP: $WORKER_IP"
 
 if [ $RUNNING_MODE = "interactive" ]; then
     DOCKER_IT="-i -t"
@@ -50,7 +34,7 @@ fi
 DOCKER_RUN="docker run ${DOCKER_IT} --rm -p 8081:8081 \
 --expose 7012 --expose 7013 --expose 7014 --expose 7015 --expose 8881 \
 --name sparkworker \
---network dike-net --ip ${WORKER_IP} ${DOCKER_HOSTS} \
+--network dike-net \
 -e SPARK_CONF_DIR=/conf \
 -e SPARK_WORKER_INSTANCES=$WORKERS \
 -e SPARK_WORKER_CORES=$CORES \
@@ -72,7 +56,7 @@ DOCKER_RUN="docker run ${DOCKER_IT} --rm -p 8081:8081 \
 -v ${ROOT_DIR}/bin/:${DOCKER_HOME_DIR}/bin \
 -e RUNNING_MODE=${RUNNING_MODE} \
 -u ${USER_ID} \
-spark-run-${USER_NAME} ${CMD}"
+v${DIKE_VERSION}-spark-run-${USER_NAME} ${CMD}"
 
 
 if [ $RUNNING_MODE = "interactive" ]; then

spark/start.sh

Lines changed: 4 additions & 1 deletion
@@ -2,4 +2,7 @@
 
 ./docker/start-master.sh && sleep 5 && ./docker/start-worker.sh
 
-sleep 5
+sleep 5
+./docker/start-launcher.sh
+
+sleep 5

start_hdfs.sh

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ echo $CMDSTATUS
 if [ $CMDSTATUS -ne 0 ]; then
     pushd benchmark/tpch
     echo "Initialize tpch CSV database in hdfs"
-    ./run_tpch.sh --mode initCsv --protocol hdfs || (echo "*** failed tpch init of CSV for hdfs $?" ; exit 1)
+    ./run_tpch.sh --local --mode initCsv --protocol hdfs || (echo "*** failed tpch init of CSV for hdfs $?" ; exit 1)
 popd
 fi
 
