Commit 443c591

qa/tasks/cephadm: add option to limit what matches in log error scraping
This is specifically being added with the orch/cephadm suite in mind, where coming up with a viable ignorelist has proved difficult. The orch testing does a lot of actions that can cause things like an OSD or MON daemon to be down very briefly. The vast majority of the time we really don't want to fail the test when these pop up, as cephadm testing really only benefits from catching the CEPHADM_ errors/warnings rather than every possible one.

Rather than continuing to play whack-a-mole with the errors in the logs, this patch should allow us to limit what we fail on, to at least get the suite in a good spot again. We can always phase out the uses of this new "log-only-match" option later in a more controlled way, and adding it shouldn't affect log scraping for any of the tests that aren't facing a similar issue.

Signed-off-by: Adam King <[email protected]>
1 parent 5177c58 commit 443c591

1 file changed: +13 -3 lines changed

qa/tasks/cephadm.py

Lines changed: 13 additions & 3 deletions
@@ -430,7 +430,7 @@ def ceph_log(ctx, config):
 
     finally:
         log.info('Checking cluster log for badness...')
-        def first_in_ceph_log(pattern, excludes):
+        def first_in_ceph_log(pattern, excludes, only_match):
             """
             Find the first occurrence of the pattern specified in the Ceph log,
             Returns None if none found.
@@ -445,6 +445,8 @@ def first_in_ceph_log(pattern, excludes):
                 '/var/log/ceph/{fsid}/ceph.log'.format(
                     fsid=fsid),
             ]
+            if only_match:
+                args.extend([run.Raw('|'), 'egrep', '|'.join(only_match)])
             if excludes:
                 for exclude in excludes:
                     args.extend([run.Raw('|'), 'egrep', '-v', exclude])
@@ -460,14 +462,22 @@ def first_in_ceph_log(pattern, excludes):
                 return stdout
             return None
 
+        # NOTE: technically the first and third arg to first_in_ceph_log
+        # are serving a similar purpose here of being something we
+        # look for in the logs. The reason they are separate args is that
+        # we want '\[ERR\]|\[WRN\]|\[SEC\]' to always have to be in the thing
+        # we match even if the test yaml specifies nothing else, and then the
+        # log-only-match options are for when a test only wants to fail on
+        # a specific subset of log lines that '\[ERR\]|\[WRN\]|\[SEC\]' matches
         if first_in_ceph_log('\[ERR\]|\[WRN\]|\[SEC\]',
-                             config.get('log-ignorelist')) is not None:
+                             config.get('log-ignorelist'),
+                             config.get('log-only-match')) is not None:
             log.warning('Found errors (ERR|WRN|SEC) in cluster log')
             ctx.summary['success'] = False
             # use the most severe problem as the failure reason
             if 'failure_reason' not in ctx.summary:
                 for pattern in ['\[SEC\]', '\[ERR\]', '\[WRN\]']:
-                    match = first_in_ceph_log(pattern, config['log-ignorelist'])
+                    match = first_in_ceph_log(pattern, config['log-ignorelist'], config.get('log-only-match'))
                     if match is not None:
                         ctx.summary['failure_reason'] = \
                             '"{match}" in cluster log'.format(

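To make the filter order in the diff easier to follow, here is a rough pure-Python stand-in for the egrep pipeline. It is only a sketch: `first_in_log` is a hypothetical helper, the sample log lines are made up for illustration, and Python's `re` is assumed to behave like egrep for these simple patterns. It also assumes the first surviving line is what gets returned, as the docstring in the diff suggests. The real task runs `egrep` on the remote host rather than filtering in Python.

```python
import re

def first_in_log(lines, pattern, excludes=None, only_match=None):
    """Hypothetical in-memory equivalent of the egrep chain in the diff."""
    severity = re.compile(pattern)
    # Mirrors '|'.join(only_match): any one of the listed patterns may match.
    only = re.compile('|'.join(only_match)) if only_match else None
    skip = [re.compile(e) for e in (excludes or [])]
    for line in lines:
        # Stage 1: the severity pattern must always match.
        if not severity.search(line):
            continue
        # Stage 2: optional "only match" narrowing (egrep without -v).
        if only is not None and not only.search(line):
            continue
        # Stage 3: ignorelist entries drop the line (egrep -v).
        if any(s.search(line) for s in skip):
            continue
        return line
    return None

SEVERITY = r'\[ERR\]|\[WRN\]|\[SEC\]'

# Made-up log lines, for illustration only.
sample = [
    "cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)",
    "cluster [WRN] Health check failed: refresh failed (CEPHADM_REFRESH_FAILED)",
]

unfiltered = first_in_log(sample, SEVERITY)
filtered = first_in_log(sample, SEVERITY, only_match=['CEPHADM_'])
ignored = first_in_log(sample, SEVERITY,
                       excludes=['CEPHADM_REFRESH_FAILED'],
                       only_match=['CEPHADM_'])
```

Without `only_match`, the brief OSD warning would be picked up first. With `only_match=['CEPHADM_']`, only the CEPHADM_ line can fail the test, and the ignorelist still applies on top of that. Since the patch joins the value with `'|'`, `log-only-match` presumably needs to be a list of patterns in the test yaml rather than a single string.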