Commit 5f63f47

Infrastructure for benchmarking and cleanup for release (#61)
This PR includes a number of scripts and changes to the library to support finer-grained measurements in benchmarking. It also includes an initial kmeans implementation and some cleanup of the C++ library code.

* The timing infrastructure consists of singleton dictionary classes in `logging.h` that associate a string key with some information to be logged. Right now there are dictionaries for logging times and for logging memory usage.
* A number of scripts have been created in `src/benchmarks` for running various kinds of benchmarks of the C++ library. The driver script is `instance_runner.bash`, which is run from the local host (i.e., not on an EC2 instance). It iterates through a list of instance types and uses the AWS CLI to shut down and bring up instances, running a specified benchmarking script on the specified instance type. `instance_runner.bash` automatically logs its output, using `tee` to write to a file whose name includes the instance type and the date and time at which it launches remote execution of the actual benchmarking script.
* The primary benchmarking script is `1b-c6a-16x-125MiB.bash`, which is run on a remote EC2 instance and iterates through a given set of instance configurations. The `setup.bash` script defines the various files required for running a benchmark, for different problems (sift, 1M, 10M, 1B for gp3, s3, and nvme). The initializations in `setup.bash` assume a certain organization of the arrays needed to run a benchmark. Users of these scripts should adjust their local structure as required in order to use the `gp3` or `nvme` array locations. The `s3` arrays should be fine. It is assumed that the executable given in the benchmarking script has been built beforehand, in the user's desired configuration. That is, the script does not check out or compile any code itself.
* The linear algebra functionality has been moved to its own `details` subdirectory.
* Several superfluous files have been removed.
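The logging behaviour described for `instance_runner.bash` can be sketched roughly as follows. This is an illustration only, not code from the PR: `run_and_log` is a hypothetical helper, the `echo` stands in for the remote `ssh` invocation, and the temporary log directory is an assumption made so the sketch runs anywhere. The timestamp format is the one used in the driver script.

```shell
#!/bin/bash
# Sketch only: run_and_log is a hypothetical helper, not part of the PR.
# It runs a command, copies its output to a timestamped log file with tee,
# and prints the name of the log file it wrote.

logdir=$(mktemp -d)   # assumption: logs go to a scratch directory

run_and_log() {
  local benchname="$1"; shift
  # Same timestamp format as the driver script: YYYYMMDD-HHMMSS
  local logname="${logdir}/${benchname}-$(date +'%Y%m%d-%H%M%S').log"
  # tee writes the command output to the log; the live copy goes to stderr
  # so that stdout carries only the log file name.
  "$@" | tee "${logname}" >&2
  echo "${logname}"
}

instance_type="c6a.16xlarge"
# Stand-in for: ssh ec2 "bash feature-vector-prototype/src/benchmarks/..."
logfile=$(run_and_log "1b-${instance_type}-125MiB" echo "pretend benchmark output")
echo "Log written to ${logfile}"
```

Because the instance type and launch time are both in the file name, repeated trials on the same instance type do not overwrite each other as long as they start at least one second apart.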
1 parent 9a6544a commit 5f63f47

46 files changed, with 2227 additions and 2028 deletions
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+#!/bin/bash
+
+dir=$(dirname $0)
+
+. ${dir}/setup.bash
+
+ivf_query=~/TileDB/feature-vector-prototype/src/cmake-build-release/src/ivf_hack
+ivf_query=/home/lums/feature-vector-prototype/src/cmake-build-release/libtiledbvectorsearch/src/ivf_hack
+
+printf "=========================================================================================================================================\n\n"
+echo "Starting benchmark run: "
+date +"%A, %B %d, %Y %H:%M:%S"
+echo $0
+printf "Benchmark program ${ivf_query}\n\n"
+uptime
+
+printf "\n\n-----------------------------------------------------------------------------------------------------------------------------------------\n\n"
+
+curl -s http://169.254.169.254/latest/meta-data/instance-type
+
+printf "\n\n-----------------------------------------------------------------------------------------------------------------------------------------\n\n"
+
+aws ec2 --region us-east-1 describe-volumes --volume-id vol-0192769447c7688d0
+
+printf "\n\n-----------------------------------------------------------------------------------------------------------------------------------------\n\n"
+
+arch
+nproc
+head -1 /proc/meminfo
+
+printf "\n\n-----------------------------------------------------------------------------------------------------------------------------------------\n\n"
+
+cat $0
+
+echo "========================================================================================================================================="
+
+for source in gp3 s3;
+do
+  init_1B_${source}
+  for blocksize in 0 1000000 10000000 ;
+  do
+    log_header
+    for nqueries in 1 10 100 ;
+    do
+      for nprobe in 1 2 4 8 16 32 64 128 ;
+      do
+        ivf_query --nqueries ${nqueries} --nprobe ${nprobe} --finite --blocksize ${blocksize}
+      done
+    done
+  done
+done
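Before launching a sweep like the one above on a paid instance, it can help to see how many runs it implies. The following is a dry-run sketch, not part of the commit: `dry_run_sweep` is a hypothetical function that prints each invocation instead of executing it, with the loop values copied from the script. The four nested loops give 2 × 3 × 3 × 8 = 144 runs.

```shell
#!/bin/bash
# Dry-run sketch of the parameter sweep. dry_run_sweep is a hypothetical
# name; it only prints the command lines and never runs ivf_query.

dry_run_sweep() {
  local source blocksize nqueries nprobe
  for source in gp3 s3; do
    for blocksize in 0 1000000 10000000; do
      for nqueries in 1 10 100; do
        for nprobe in 1 2 4 8 16 32 64 128; do
          echo "[${source}] ivf_query --nqueries ${nqueries} --nprobe ${nprobe} --finite --blocksize ${blocksize}"
        done
      done
    done
  done
}

# Count the planned invocations (tr strips the padding some wc builds add)
count=$(dry_run_sweep | wc -l | tr -d ' ')
echo "The sweep would launch ${count} runs"
```

Piping `dry_run_sweep` through `head` or `grep` is a cheap way to check a single configuration before committing to the full sweep.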

src/benchmarks/instance_runner.bash

Lines changed: 80 additions & 12 deletions
@@ -1,32 +1,72 @@
+#!/bin/bash
 
 instance_id="i-0daab006867136323"
 region="us-east-1"
+# git_branch="lums/tmp/benchmark"
+git_branch="lums/tmp/gemm2.0"
+ntrials=1
 
-for instance_type in c6a.4xlarge c6a.2xlarge;
+max_nc_tries=12
+nc_tries_sleep=8
+instance_ip=$(aws ec2 describe-instances --instance-ids i-0daab006867136323 --query 'Reservations[0].Instances[0].PublicIpAddress' --output text)
+
+if [ -f ~/.bash_awsrc ]; then
+  . ~/.bash_awsrc
+fi
+
+# ssh ec2 "cd feature-vector-prototype ; git commit -am \"Pause for benchmark [skip ci]\" ; git checkout ${git_branch}"
+# ssh ec2 "cd feature-vector-prototype/src/cmake-build-release ; make -C libtiledbvectorsearch ivf_hack"
+
+# for instance_type in c6a.4xlarge c6a.2xlarge;
+# for instance_type in c6a.16xlarge c6a.2xlarge;
+for instance_type in r6a.24xlarge c6a.16xlarge c6a.4xlarge c6a.2xlarge t3.xlarge t1.micro;
 do
+
+  benchname="1b-${instance_type}-10k-125MiB"
+  bash_script="1b-c6a-16x-10k-125MiB.bash"
+
+  echo "Benchmark name is ${benchname}, running script ${bash_script}"
+
+  echo "Preparing to run ${instance_type}"
   current_instance_type=$(aws ec2 --region ${region} describe-instances --instance-ids ${instance_id} --query 'Reservations[].Instances[].InstanceType' --output text)
   state=$(aws ec2 --region ${region} describe-instances --instance-ids ${instance_id} --query 'Reservations[].Instances[].State.Name' --output text)
 
+  echo "First stopping ${current_instance_type}"
+
   if [[ ${state} == "running" && ${current_instance_type} == "${instance_type}" ]]; then
-    echo ${current_instance_type} is ${state}
+    echo "${current_instance_type} is already ${state}"
   else
 
+    echo "${current_instance_type} is in state ${state}"
     aws ec2 --region ${region} stop-instances --instance-ids ${instance_id}
-    aws ec2 --region ${region} stop-instances --instance-ids ${instance_id}
-    ssh ec2 "sync;sync;sync;sudo shutdown -h now"
-    ssh ec2 "sync;sync;sync;sudo shutdown -h now"
-    aws ec2 --region ${region} stop-instances --instance-ids ${instance_id}
+    sleep 1
+    if nc_timeout=1 max_nc_tries=1 check_instance_status;
+    then
+      ssh ec2 "sync;sync;sync;sudo shutdown -h now"
+    fi
+    sleep 1
 
     state=$(aws ec2 --region ${region} describe-instances --instance-ids ${instance_id} --query 'Reservations[].Instances[].State.Name' --output text)
+
+    # Assume instance *will* stop (eventually)
     while [ "$state" != "stopped" ]; do
       state=$(aws ec2 --region ${region} describe-instances --instance-ids ${instance_id} --query 'Reservations[].Instances[].State.Name' --output text)
-      echo "Instance is ${state}"
+      echo "Instance ${current_instance_type} is ${state}"
       sleep 1 # Delay for 1 second
     done
 
     echo "Instance is ${state}"
 
-    aws ec2 --region ${region} modify-instance-attribute --instance-id ${instance_id} --instance-type ${instance_type}
+    # Change instance type
+    change_msg=$(aws ec2 --region ${region} modify-instance-attribute --instance-id ${instance_id} --instance-type ${instance_type})
+    sleep 1
+    current_instance_type=$(aws ec2 --region ${region} describe-instances --instance-ids ${instance_id} --query 'Reservations[].Instances[].InstanceType' --output text)
+    if [ "${current_instance_type}" != ${instance_type} ];
+    then
+      echo "Could not change to ${instance_type} because ${change_msg}. Skipping ${instance_type}."
+      continue
+    fi
+
     aws ec2 --region ${region} start-instances --instance-ids ${instance_id}
 
     state=$(aws ec2 --region ${region} describe-instances --instance-ids ${instance_id} --query 'Reservations[].Instances[].State.Name' --output text)
@@ -37,15 +77,43 @@ do
     done
 
     echo "Instance is ${state}"
+    sleep 30
   fi
   # feature-vector-prototype/experimental/benchmarks/1b-c6a-16x-125MiB.bash
   # 1b-c6a-16x-125MiB-2023-0613-1419.log
 
-  benchname="1b-${instance_type}-125MiB"
-  bash_script="1b-c6a-16x-125MiB.bash"
-  command="bash feature-vector-prototype/experimental/benchmarks/${bash_script}"
-
+
+  # Make sure remote instance is ready to accept logins
+  nc_tries=0
+
+  while true; do
+    if nc -G 2 -zv "${instance_ip}" 22 >/dev/null 2>&1; then
+      echo "EC2 instance is ready for remote logins."
+      break
+    fi
+
+    nc_tries=$((nc_tries + 1))
+
+    if [ "$nc_tries" -eq "$max_nc_tries" ]; then
+      echo "Maximum number of tries reached. EC2 instance is not ready for remote logins."
+      break
+    fi
+
+    echo "EC2 instance is not ready yet. Retrying in $nc_tries_sleep seconds..."
+    sleep "$nc_tries_sleep"
+  done
+
   for ((i=1; i<=2; i++))
+  do
+    # nuke from space, it's the only way to be sure
+    # ssh ec2 killall -u lums
+    ssh ec2 "kill \$(ps auxw | fgrep feature | awk '{ print \$2 }')"
+    sleep 1
+  done
+  ssh ec2 ps auxw | fgrep feature
+
+  command="bash feature-vector-prototype/src/benchmarks/${bash_script}"
+  for ((i=1; i<=${ntrials}; i++))
   do
     logname="${benchname}-$(date +'%Y%m%d-%H%M%S').log"
     ssh ec2 ${command} | tee ${logname}
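The readiness check in the diff above retries a probe a bounded number of times, but when the tries run out it only prints a message, breaks out of the loop, and carries on to the `ssh` commands. One possible refinement, sketched here under assumptions, is a helper that reports the outcome through its exit status so the caller can skip the instance. `retry_until` and `flaky_probe` are hypothetical names, and the probe is a local stand-in that succeeds on its third call so the sketch runs without a network.

```shell
#!/bin/bash
# Sketch only: retry_until is a hypothetical helper, not part of the commit.
# Usage: retry_until MAX_TRIES SLEEP_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_TRIES attempts have failed.
# Returns 0 on success and 1 when the tries are exhausted.
retry_until() {
  local max_tries="$1" sleep_secs="$2"; shift 2
  local tries=0
  while [ "${tries}" -lt "${max_tries}" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries + 1))
    # Sleep only if another attempt will follow
    if [ "${tries}" -lt "${max_tries}" ]; then
      sleep "${sleep_secs}"
    fi
  done
  return 1
}

# Stand-in probe: records each call in a file and succeeds from the third on
attempts_file=$(mktemp)
flaky_probe() {
  echo x >> "${attempts_file}"
  [ "$(wc -l < "${attempts_file}" | tr -d ' ')" -ge 3 ]
}

if retry_until 5 0 flaky_probe; then
  result="ready"
else
  result="not ready"
fi
echo "Instance is ${result}"
```

In the driver script the probe would be the `nc` port check, for example `retry_until "${max_nc_tries}" "${nc_tries_sleep}" nc -G 2 -z "${instance_ip}" 22 || continue`. The `-G` connection-timeout flag appears to be specific to some `nc` builds (the macOS one among them); other builds typically use `-w` for a timeout, so the flag may need adjusting on a different local host.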
