Commit e66d11d

review feedback
1 parent e3ed685 commit e66d11d

1 file changed, +33 -34 lines changed

src/troubleshooting.rst

Lines changed: 33 additions & 34 deletions
@@ -225,36 +225,32 @@ To find out more, see the ``job.err`` file.
 
 .. cylc-scope::
 
-If you're struggling to track down the error, you might want to restart the
-workflow in debug mode and run the task again:
+If you're struggling to track down the error, you might want to put the
+workflow into debug mode::
 
-.. TODO Update this advice after https://github.com/cylc/cylc-flow/issues/5829
+   cylc verbosity DEBUG <workflow-id>
 
-.. code-block:: console
+When a workflow is running in debug mode, all jobs will create a ``job.xtrace``
+file when run in addition to ``job.err``. This can help you to locate the error
+within the job script.
 
-   # shut the workflow down (leave any active jobs running)
-   $ cylc stop --now --now <workflow>
-   # restart the workflow in debug mode
-   $ cylc play <workflow> --debug
-   # re-run all failed task(s)
-   $ cylc trigger '<workflow>//*:failed'
+You can also start workflows in debug mode::
 
-When a workflow is running in debug mode, all jobs will create a ``job.xtrace``
-file which can help you to locate the error within the job script.
+   cylc play --debug <workflow-id>
 
 
-My workflow shutdown unexpectedly
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+My workflow shut down unexpectedly
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 When a Cylc scheduler shuts down, it should leave behind a log message explaining why.
 
-E.G. this message means that a workflow shutdown because it was told to:
+E.G. this message means that a workflow shut down because it was told to:
 
 .. code-block::
 
    Workflow shutting down - REQUEST(CLEAN)
 
-If a workflow shutdown due to a critical problem, you should find some
+If a workflow shut down due to a critical problem, you should find some
 traceback in this log. If this traceback doesn't look like it comes from your
 system, please report it to the Cylc developers for investigation (on
 GitHub or Discourse).
@@ -276,6 +272,10 @@ Why isn't my task running?
 To find out why a task is not being run, use the ``cylc show`` command.
 This will list the task's prerequisites and xtriggers.
 
+Note, at present ``cylc show`` can only display
+:term:`active tasks <active task>`. Waiting tasks beyond the
+:term:`n=0 window <n-window>` have no satisfied prerequisites.
+
 Note, tasks which are held |task-held| will not be run, use ``cylc release``
 to release a held task.
 
@@ -296,7 +296,7 @@ If something has gone wrong during installation, an error should have been
 logged a file in this directory:
 ``$HOME/cylc-run/<workflow-id>/log/remote-install/``.
 
-``If you need to access files from a remote platform (e.g. 2-stage ``fcm_make``),
+If you need to access files from a remote platform (e.g. 2-stage ``fcm_make``),
 ensure that a task has submitted to it before you do so. If needed you can use
 a blank "dummy" task to ensure that remote installation is completed *before*
 you run any tasks which require this e.g:
@@ -312,12 +312,12 @@ Conda / Mamba environment activation fails
 Some Conda packages rely on activation scripts which are run when you call the
 activate command.
 
-Sadly, some of these scripts don't defend against command failure or unset
-environment variables causing them to fail when configured in Cylc ``*script``
-(see also :ref:`troubleshooting.my_job_failed` for details).
+Unfortunately, some of these scripts don't defend against command failure or
+unset environment variables causing them to fail when configured in Cylc
+``*script`` (see also :ref:`troubleshooting.my_job_failed` for details).
 
 To avoid this, run ``set +eu`` before activating your environment. This turns
-off some Bash safety features allowing environment activation to complete.
+off some Bash safety features, allowing environment activation to complete.
 Remember to run ``set -eu`` afterwards to turn these features back on.
 
 .. code-block:: cylc
@@ -360,7 +360,7 @@ E.G. the following error:
 
    FileNotFoundError: [Errno 2] No such file or directory: 'ssh'
 
-Means that ``ssh`` is not installed.
+Means that ``ssh`` is not installed or not in your ``$PATH``.
 
 See :ref:`non-python-requirements` for details on system requirements.
 
@@ -376,8 +376,8 @@ a remote platform.
 This either means that:
 
 1. The platform is down (e.g. all login nodes are offline).
-2. There is a network problem (e.g. you cannot connect to the login nodes).
-3. The platform is not correctly configured.
+2. Or, there is a network problem (e.g. you cannot connect to the login nodes).
+3. Or, the platform is not correctly configured.
 
 Check the scheduler log, you might find some stderr associated with this
 message.
@@ -395,15 +395,14 @@ note that this defaults to the platform name if not explicitly set.
 ``OperationalError: disk I/O error``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-This means that something was unable to write to a file when it would expect to
-have been able to.
+This means that Cylc was unable to write to the database.
 
-This error usually occurs if when you have exceeded you filesystem quota.
+This error usually occurs if when you have exceeded your filesystem quota.
 
-If a Cylc workflow cannot write to the filesystem, it will shutdown. Once
+If a Cylc scheduler cannot write to the filesystem, it will shutdown. Once
 you've cleared out enough space for the workflow to continue you should be able
-to safely restart it as you would normally using ``cylc play``, the workflow
-will continue where it left off.
+to safely restart it as you would normally using ``cylc play``. The workflow
+will continue from where it left off.
 
 
 ``socket.gaierror``
@@ -418,7 +417,7 @@ login nodes you submit jobs to).
 Cylc expects each host to have a unique and stable fully qualified domain name
 (FQDN) and to be identifiable from other hosts on the network using this name.
 
-I.E. If a host identifies itself with an FQDN, then we should be able to look it
+I.e., If a host identifies itself with an FQDN, then we should be able to look it
 from another host by this FQDN. If we can't, then Cylc can't tell which host is
 which and will not be able to function properly.
 
@@ -429,7 +428,7 @@ DNS setup is consistent.
 Sometimes we do not have control over the platforms we use and it is not
 possible to compel system administrators to address these issues. If this is
 the case, you can fall back to IP address based host identification which may
-work (i.e. use IP address rather than host names, makes logs less human
+work (i.e. use IP addresses rather than host names, which makes logs less human
 readable). As a last resort you can also hard-code the host name for each host.
 
 For more information, see
@@ -449,10 +448,10 @@ increase this limit.
 ``Cannot determine whether workflow is running on <host>``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-When a Cylc workflow runs, it creates a :term:`contact file` which tells us on
+When Cylc runs a workflow, it creates a :term:`contact file` which tells us on
 which host and port it can be contacted.
 
-If the workflow cannot be contacted, Cylc will attempt to check whether the
+If the scheduler cannot be contacted, Cylc will attempt to check whether the
 process is still running to ensure it hasn't crashed.
 
 If you are seeing this error message, it means that Cylc was unable to
