Commit fb7d1e3

szeder authored and gitster committed
test-lib: add the '--stress' option to run a test repeatedly under load

Unfortunately, we have a few flaky tests, whose failures tend to be
hard to reproduce. We've found that the best we can do to reproduce
such a failure is to run the test script repeatedly while the machine
is under load, and wait in the hope that the load creates enough
variance in the timing of the test's commands that a failure is
eventually triggered. I have a command to do that, and I noticed that
two other contributors have rolled their own scripts to do the same,
all choosing slightly different approaches.

To help reproduce failures in flaky tests, introduce the '--stress'
option to run a test script repeatedly in multiple parallel jobs until
one of them fails, thereby using the test script itself to increase
the load on the machine.

The number of parallel jobs is determined by, in order of precedence:
the number specified as '--stress=<N>', or the value of the
GIT_TEST_STRESS_LOAD environment variable, or twice the number of
available processors (as reported by the 'getconf' utility), or 8.

Make '--stress' imply '--verbose -x --immediate' to get the most
information about rare failures; there is really no point in spending
all the extra effort to reproduce such a failure, and then not know
which command failed and why.

To prevent the several parallel invocations of the same test from
interfering with each other:

  - Include the parallel job's number in the name of the trash
    directory and the various output files under 't/test-results/' as
    a '.stress-<Nr>' suffix.

  - Add the parallel job's number to the port number specified by the
    user or to the test number, so even tests involving daemons
    listening on a TCP socket can be stressed.

  - Redirect each parallel test run's verbose output to
    't/test-results/$TEST_NAME.stress-<nr>.out', because dumping the
    output of several parallel running tests to the terminal would
    create a big ugly mess.

For convenience, print the output of the failed test job at the end,
and rename its trash directory to end with the '.stress-failed'
suffix, so it's easy to find in a predictable path (OTOH, all absolute
paths recorded in the trash directory become invalid; we'll see
whether this causes any issues in practice). If, in an unlikely case,
more than one job were to fail at nearly the same time, then print the
output of all failed jobs, and rename the trash directory of only the
last one (i.e. with the highest job number), as it is the trash
directory of the test whose output will be at the bottom of the user's
terminal.

Based on Jeff King's 'stress' script.

Signed-off-by: SZEDER Gábor <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
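
The job-count precedence described in the commit message can be sketched as a small shell function. This is only an illustration: 'pick_job_count' and its two positional parameters are hypothetical names that do not appear in the commit, which performs the same checks inline in t/test-lib.sh.

```shell
#!/bin/sh
# Sketch of the precedence rule, assuming:
#   $1 = the value given as --stress=<N> (empty for a bare --stress)
#   $2 = the value of GIT_TEST_STRESS_LOAD (empty if unset)
# 'pick_job_count' is a hypothetical helper, not part of test-lib.sh.
pick_job_count () {
	if test -n "$1"
	then
		# An explicit --stress=<N> always wins.
		echo "$1"
	elif test -n "$2"
	then
		# Next, the environment variable.
		echo "$2"
	elif cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null) &&
	     test -n "$cpus"
	then
		# Then twice the number of online processors.
		echo $((2 * $cpus))
	else
		# Last resort when getconf is unavailable.
		echo 8
	fi
}

pick_job_count 4 16	# prints 4
pick_job_count "" 16	# prints 16
pick_job_count "" ""	# twice the processor count, or 8 without getconf
```

Doubling the processor count fits the stated goal of putting the machine under load: with more runnable jobs than processors, the jobs compete for CPU time and their timing varies more.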
1 parent fa84058 commit fb7d1e3

3 files changed: 130 additions, 5 deletions

t/README

Lines changed: 18 additions & 1 deletion
@@ -186,6 +186,22 @@ appropriately before running "make".
 	this feature by setting the GIT_TEST_CHAIN_LINT environment
 	variable to "1" or "0", respectively.
 
+--stress::
+--stress=<N>::
+	Run the test script repeatedly in multiple parallel jobs until
+	one of them fails. Useful for reproducing rare failures in
+	flaky tests. The number of parallel jobs is, in order of
+	precedence: <N>, or the value of the GIT_TEST_STRESS_LOAD
+	environment variable, or twice the number of available
+	processors (as shown by the 'getconf' utility), or 8.
+	Implies `--verbose -x --immediate` to get the most information
+	about the failure. Note that the verbose output of each test
+	job is saved to 't/test-results/$TEST_NAME.stress-<nr>.out',
+	and only the output of the failed test job is shown on the
+	terminal. The names of the trash directories get a
+	'.stress-<nr>' suffix, and the trash directory of the failed
+	test job is renamed to end with a '.stress-failed' suffix.
+
 You can also set the GIT_TEST_INSTALLED environment variable to
 the bindir of an existing git installation to test that installation.
 You still need to have built this git sandbox, from which various
@@ -425,7 +441,8 @@ This test harness library does the following things:
  - Creates an empty test directory with an empty .git/objects database
    and chdir(2) into it. This directory is 't/trash
    directory.$test_name_without_dotsh', with t/ subject to change by
-   the --root option documented above.
+   the --root option documented above, and a '.stress-<N>' suffix
+   appended by the --stress option.
 
  - Defines standard test helper functions for your scripts to
    use. These functions are designed to make all scripts behave
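
To make the naming scheme documented above concrete, the following sketch derives the per-job names from a job number. 'stress_suffix' and the test name 't0000-example' are hypothetical, chosen only for illustration; the commit computes the suffix once into TEST_STRESS_JOB_SFX using the same `${VAR:+word}` expansion.

```shell
#!/bin/sh
# 'stress_suffix' is a hypothetical helper mirroring the ${VAR:+word}
# expansion: it yields nothing when no job number is set, so runs
# without --stress keep their usual names.
stress_suffix () {
	job_nr=$1
	echo "${job_nr:+.stress-$job_nr}"
}

test_name=t0000-example	# illustrative name, not a real test script

# With a job number, both names carry the suffix.
echo "trash directory.$test_name$(stress_suffix 3)"
echo "test-results/$test_name$(stress_suffix 3).out"

# Without one, the name is unchanged.
echo "trash directory.$test_name$(stress_suffix "")"
```

Note that job number 0 is a non-empty string, so the first job still gets a '.stress-0' suffix.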

t/test-lib-functions.sh

Lines changed: 5 additions & 2 deletions
@@ -1288,8 +1288,6 @@ test_set_port () {
 			# root-only port, use a larger one instead.
 			port=$(($port + 10000))
 		fi
-
-		eval $var=$port
 		;;
 	*[^0-9]*|0*)
 		error >&7 "invalid port number: $port"
@@ -1298,4 +1296,9 @@ test_set_port () {
 		# The user has specified the port.
 		;;
 	esac
+
+	# Make sure that parallel '--stress' test jobs get different
+	# ports.
+	port=$(($port + ${GIT_TEST_STRESS_JOB_NR:-0}))
+	eval $var=$port
 }
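
The effect of the new port arithmetic can be seen in isolation with a minimal sketch. 'offset_port' is a hypothetical stand-in for the last step of test_set_port, and the port number 5570 is an arbitrary example value.

```shell
#!/bin/sh
# 'offset_port' is a hypothetical helper: it adds the job number to a
# base port, defaulting to 0 when the test is not run under --stress.
offset_port () {
	port=$1
	echo $(($port + ${GIT_TEST_STRESS_JOB_NR:-0}))
}

unset GIT_TEST_STRESS_JOB_NR
offset_port 5570	# prints 5570

GIT_TEST_STRESS_JOB_NR=3
offset_port 5570	# prints 5573
```

Because the diff moves `eval $var=$port` below the `esac`, the offset applies both to ports derived from the test number and to ports specified by the user.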

t/test-lib.sh

Lines changed: 107 additions & 2 deletions
@@ -139,6 +139,19 @@ do
 		verbose_log=t
 		tee=t
 		;;
+	--stress)
+		stress=t ;;
+	--stress=*)
+		stress=${opt#--*=}
+		case "$stress" in
+		*[^0-9]*|0*|"")
+			echo "error: --stress=<N> requires the number of jobs to run" >&2
+			exit 1
+			;;
+		*)	# Good.
+			;;
+		esac
+		;;
 	*)
 		echo "error: unknown test option '$opt'" >&2; exit 1 ;;
 	esac
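
The validation of `--stress=<N>` in the hunk above can be tried out with a small sketch. 'is_valid_job_count' is a hypothetical wrapper, not a function in test-lib.sh. One deliberate difference from the diff: the sketch negates the bracket expression with `!`, which is the form POSIX specifies for shell patterns, whereas the hunk uses `^`, whose behavior in shell patterns POSIX leaves unspecified.

```shell
#!/bin/sh
# 'is_valid_job_count' is a hypothetical wrapper around the same three
# alternatives: reject anything containing a non-digit, anything with
# a leading zero (including 0 itself), and the empty string.
is_valid_job_count () {
	case "$1" in
	*[!0-9]*|0*|"")
		return 1 ;;
	*)
		return 0 ;;
	esac
}

for candidate in 4 16 0 08 "" 4x -2
do
	if is_valid_job_count "$candidate"
	then
		echo "'$candidate': accepted"
	else
		echo "'$candidate': rejected"
	fi
done
```

Only 4 and 16 are accepted here; the remaining candidates each match one of the rejecting alternatives.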
@@ -160,16 +173,108 @@ then
 	test -z "$verbose_log" && verbose=t
 fi
 
+if test -n "$stress"
+then
+	verbose=t
+	trace=t
+	immediate=t
+fi
+
+TEST_STRESS_JOB_SFX="${GIT_TEST_STRESS_JOB_NR:+.stress-$GIT_TEST_STRESS_JOB_NR}"
 TEST_NAME="$(basename "$0" .sh)"
 TEST_RESULTS_DIR="$TEST_OUTPUT_DIRECTORY/test-results"
-TEST_RESULTS_BASE="$TEST_RESULTS_DIR/$TEST_NAME"
-TRASH_DIRECTORY="trash directory.$TEST_NAME"
+TEST_RESULTS_BASE="$TEST_RESULTS_DIR/$TEST_NAME$TEST_STRESS_JOB_SFX"
+TRASH_DIRECTORY="trash directory.$TEST_NAME$TEST_STRESS_JOB_SFX"
 test -n "$root" && TRASH_DIRECTORY="$root/$TRASH_DIRECTORY"
 case "$TRASH_DIRECTORY" in
 /*) ;; # absolute path is good
  *) TRASH_DIRECTORY="$TEST_OUTPUT_DIRECTORY/$TRASH_DIRECTORY" ;;
 esac
 
+# If --stress was passed, run this test repeatedly in several parallel loops.
+if test "$GIT_TEST_STRESS_STARTED" = "done"
+then
+	: # Don't stress test again.
+elif test -n "$stress"
+then
+	if test "$stress" != t
+	then
+		job_count=$stress
+	elif test -n "$GIT_TEST_STRESS_LOAD"
+	then
+		job_count="$GIT_TEST_STRESS_LOAD"
+	elif job_count=$(getconf _NPROCESSORS_ONLN 2>/dev/null) &&
+	     test -n "$job_count"
+	then
+		job_count=$((2 * $job_count))
+	else
+		job_count=8
+	fi
+
+	mkdir -p "$TEST_RESULTS_DIR"
+	stressfail="$TEST_RESULTS_BASE.stress-failed"
+	rm -f "$stressfail"
+
+	stress_exit=0
+	trap '
+		kill $job_pids 2>/dev/null
+		wait
+		stress_exit=1
+	' TERM INT HUP
+
+	job_pids=
+	job_nr=0
+	while test $job_nr -lt "$job_count"
+	do
+		(
+			GIT_TEST_STRESS_STARTED=done
+			GIT_TEST_STRESS_JOB_NR=$job_nr
+			export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR
+
+			trap '
+				kill $test_pid 2>/dev/null
+				wait
+				exit 1
+			' TERM INT
+
+			cnt=0
+			while ! test -e "$stressfail"
+			do
+				$TEST_SHELL_PATH "$0" "$@" >"$TEST_RESULTS_BASE.stress-$job_nr.out" 2>&1 &
+				test_pid=$!
+
+				if wait $test_pid
+				then
+					printf "OK %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
+				else
+					echo $GIT_TEST_STRESS_JOB_NR >>"$stressfail"
+					printf "FAIL %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
+				fi
+				cnt=$(($cnt + 1))
+			done
+		) &
+		job_pids="$job_pids $!"
+		job_nr=$(($job_nr + 1))
+	done
+
+	wait
+
+	if test -f "$stressfail"
+	then
+		echo "Log(s) of failed test run(s):"
+		for failed_job_nr in $(sort -n "$stressfail")
+		do
+			echo "Contents of '$TEST_RESULTS_BASE.stress-$failed_job_nr.out':"
+			cat "$TEST_RESULTS_BASE.stress-$failed_job_nr.out"
+		done
+		rm -rf "$TRASH_DIRECTORY.stress-failed"
+		# Move the last one.
+		mv "$TRASH_DIRECTORY.stress-$failed_job_nr" "$TRASH_DIRECTORY.stress-failed"
+	fi
+
+	exit $stress_exit
+fi
+
 # if --tee was passed, write the output not only to the terminal, but
 # additionally to the file test-results/$BASENAME.out, too.
 if test "$GIT_TEST_TEE_STARTED" = "done"
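
The core of the hunk above is a set of parallel loops that all stop once a shared marker file appears. A miniature, self-contained version of that pattern might look like the sketch below. It makes several simplifying assumptions: 'flaky' is a made-up command that fails deterministically so the demo terminates, the status lines go to per-job files rather than the terminal, and signal handling and the trash directory rename are left out.

```shell
#!/bin/sh
# Miniature version of the stress loop. All names here are
# illustrative; 'flaky' stands in for re-running the test script.
workdir=$(mktemp -d) || exit 1
failfile="$workdir/stress-failed"

# Fails only on the third run (cnt=2) of job 1, succeeds otherwise.
flaky () {
	test "$1.$2" != 1.2
}

job_count=3
job_nr=0
while test $job_nr -lt $job_count
do
	(
		cnt=0
		# Every job keeps going until some job has recorded a failure.
		while ! test -e "$failfile"
		do
			if flaky $job_nr $cnt
			then
				printf "OK   %2d.%d\n" $job_nr $cnt
			else
				# Appending the job number creates the marker file.
				echo $job_nr >>"$failfile"
				printf "FAIL %2d.%d\n" $job_nr $cnt
			fi
			cnt=$(($cnt + 1))
		done
	) >"$workdir/job-$job_nr.out" &
	job_nr=$(($job_nr + 1))
done
wait

# After the loop, the variable holds the highest failed job number,
# which is how the hunk picks the trash directory to rename.
for failed_job_nr in $(sort -n "$failfile")
do
	:
done
echo "last failed job: $failed_job_nr"
cat "$workdir/job-$failed_job_nr.out"
rm -rf "$workdir"
```

Using a file as the stop signal keeps the jobs independent: each subshell only needs to test for the file's existence, and appending job numbers to it also records which jobs failed when more than one does so at nearly the same time.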
