@@ -180,16 +180,15 @@ As you can see, in the summary ``Job status``, there are multiple sections:
 
 #. Line #9-16: Overall summary of number of jobs to complete,
    as well as their breakdowns: number of jobs submitted/finished/pending/running/failed;
-#. Line 18-22: Summary of failed jobs, based on the provided section **alert_log_messages** in
+#. Line #18-22: Summary of failed jobs, based on the provided section **alert_log_messages** in
    ``--container-config-yaml-file``, BABS tried to find user-defined alert messages in failed jobs' log files;
-#. Line 24-25: If there are jobs that failed but don't have defined alert message,
+#. Line #24-25: If there are jobs that failed but don't have defined alert message,
    and ``--job-account`` is requested, BABS will then run job account
    and try to extract more information and summarize.
    For each of these jobs, BABS runs job account command and extracts messages from it.
 
-   * In the above case, as ``hard_runtime_limit: "48:00:00"`` was set,
-     those 56 failed jobs without alert messages failed probably due to exceeding this runtime limit
-     (``h_rt limit`` in the line #25).
+   * In the above case, line #25 tells us that these jobs were killed by the cluster
+     because they exceeded resource limits.
    * For SGE clusters: BABS uses command ``qacct`` for job account,
      and pulls out the code and message from ``failed`` section in ``qacct``.
    * For Slurm clusters: BABS uses command ``sacct`` for job account,