Commit 2263e55

Merge pull request #8 from BattModels/fix/links
Fix links to point to updated ITS documentation for Lighthouse and Great Lakes
2 parents 8bd04bc + 1a78474 commit 2263e55

4 files changed, 9 additions and 9 deletions

about/checkpoint.md

Lines changed: 3 additions & 3 deletions
@@ -8,11 +8,11 @@ nav_order: 4
 # Checkpointing and Requeuing Jobs
 Have a really long job that you want to run? Here's how you do it:
 1. Submit the job to the queue
-2. Run for almost the full max wall time
+2. Run for almost the full max wall time
 3. Send a kill signal to your code using [`timeout`](https://manpages.org/timeout)
 4. Your code saves a checkpoint
 5. Requeue the job with [scontrol](https://slurm.schedmd.com/scontrol.html)
-6. Repeat 2-5 until your job finishes
+6. Repeat steps 2-5 until your job finishes

 ```bash
 #!/bin/bash
@@ -49,7 +49,7 @@ if [[ $? == 124 ]]; then
 fi
 ```

-> Typically, a non-zero exit code in Linux means "something went wrong". Because we don't want to requeue a job that failed indefinetly, we need to be able to distighish between "Something went wrong" and "I need more time".
+> Typically, a non-zero exit code in Linux means "something went wrong". Because we don't want to requeue a job that failed indefinitely, we need to be able to distinguish between "Something went wrong" and "I need more time".
 >
 > Here we're checking if the exit code is 124 (`timeout` uses 124 to indicate the command timed out), but any non-zero exit code could work. Check your code's docs to see what's normal, what's an error, and how to throw a different signal
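
The exit-code check described in the note above can be sketched in isolation. This is a minimal illustration, not the script from the repository: `classify_exit` is a hypothetical helper name, `sleep 5` stands in for the real workload, and the 1-second limit is chosen only so the demo finishes quickly. The `scontrol requeue` call is guarded so it runs only inside a Slurm job, where `SLURM_JOB_ID` is set.

```bash
#!/bin/bash

# Hypothetical helper: map an exit code to the action the job script should take.
classify_exit() {
    case "$1" in
        0)   echo "done" ;;     # the command finished normally
        124) echo "requeue" ;;  # GNU timeout reports 124 when the time limit was hit
        *)   echo "failed" ;;   # any other code is treated as a real error
    esac
}

# Stand-in workload: sleep 5 is cut off by timeout after 1 second.
status=0
timeout 1 sleep 5 || status=$?

action=$(classify_exit "$status")
echo "exit code $status -> $action"

# Requeue only when running inside a Slurm job (assumes SLURM_JOB_ID is set there).
if [[ "$action" == "requeue" && -n "${SLURM_JOB_ID:-}" ]]; then
    scontrol requeue "$SLURM_JOB_ID"
fi
```

In a real job the time limit would sit just under the partition's maximum wall time, and the workload would need to write its checkpoint when it receives the signal sent by `timeout`.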

about/hardware.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ math: mathjax2
 - Bigger Nodes: Higher Core counts, More Memory, TBs of NVMe scratch
 - Faster GPUs: Between 2.5x and 36x faster
 - More Storage: Up to 100TB of Archival Storage
-- No pre-built modules will need to use [spack](https://spack.io )
+- No pre-built modules: you will need to use [Spack](https://spack.io)
 - 4 Tier Storage System: Node, Scratch, Turbo, and DataDen
 - Short queues, limited wall times

about/miscellaneous.md

Lines changed: 4 additions & 4 deletions
@@ -10,9 +10,9 @@ nav_order: 5
 ## Getting Help
 - The [Artemis Slack channel](https://eeg-group.slack.com/archives/C070HCDCY9F)
 - [UM CoderSpaces Slack](https://umich.enterprise.slack.com/archives/C02T1M5QNH3) ([join](https://documentation.its.umich.edu/node/352#JoinResign))
-- [UM Lighthouse User Guide](https://arc.umich.edu/lighthouse/user-guide/)
-- [UM Great Lakes User Guide](https://arc.umich.edu/greatlakes/user-guide/)
-- [UM Cheat Sheet](https://arc.umich.edu/wp-content/uploads/sites/4/2020/05/Great-Lakes-Cheat-Sheet.pdf)
+- [UM Lighthouse User Guide](https://documentation.its.umich.edu/arc-hpc/lighthouse/user-guide)
+- [UM Great Lakes User Guide](https://documentation.its.umich.edu/arc-hpc/greatlakes/user-guide)
+- [UM Great Lakes Cheat Sheet](https://docs.google.com/document/d/1wsr3yzkkojUMBCCneCz-l413xBzU-SZFAqcFrAAjttk/edit?usp=sharing)

 ## Tmux
 Lighthouse and Great Lakes use multiple login nodes for load balancing/redundancy. To persist a session across login nodes, change where tmux creates its sockets:
@@ -40,7 +40,7 @@ If you're moving data between clusters, use [Globus](https://www.globus.org):
 - It's way faster than scp/rclone/rsync
 - On Arjuna use [Globus Connect Personal](https://www.globus.org/globus-connect-personal)

-[UM ARC Endpoints](https://arc.umich.edu/globus/#document-4) (don't go using some rando endpoint)
+[UM ARC Endpoints](https://coerc.engin.umich.edu/globus/) (don't go using some random endpoint)
 - [DataDen](https://app.globus.org/file-manager?origin_id=ab65757f-00f5-4e5b-aa21-133187732a01)
 - [Turbo](https://app.globus.org/file-manager?origin_id=8c185a84-5c61-4bbc-b12b-11430e20010f&origin_path=%2F)
 - [/home on Lighthouse](https://app.globus.org/file-manager?origin_id=3242c149-a2b9-4dba-9406-ae3717981621)

getting_started/jupyter_notebooks.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ You must run notebooks on the worker nodes, as described, in this tutorial.
 For using Jupyter Notebooks you will need to have:

 1. Visual Studio Code installed on your local machine with [Python](https://marketplace.visualstudio.com/items?itemName=ms-python.python), [Jupyter](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter) and [Remote SSH](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-ssh) extensions enabled.
-2. Installed Jupyter notebook on Artemis (i.e. via [conda](https://docs.conda.io/en/latest/) or [spack](https://spack.readthedocs.io/en/latest/))
+2. Installed Jupyter notebook on Artemis (e.g. via [uv](https://docs.astral.sh/uv/) or [spack](https://spack.readthedocs.io/en/latest/))

 ### Instructions
 1. Allocate an interactive worker node with the resources you need, for example:
