
Commit f5bd48b

rkdarst and Gehock authored
triton/apps/jupyter-gpu: Basic page to describe jupyter+GPUs (#757)
* triton/apps/jupyter-gpu: Basic page to describe jupyter+GPUs
* triton/apps/jupyter-gpu: minor update
* triton/apps/jupyter-gpu: misc updates
* triton/apps/jupyter-gpu: `--gpu` -> `--gpus` and format minor fix

Co-authored-by: Sami Laine <sami.v.laine@aalto.fi>
1 parent 0288109 commit f5bd48b

File tree

1 file changed: +83 -0 lines changed


triton/apps/jupyter-gpu.rst

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
Jupyter with GPUs
=================

.. warning::

   Certain projects have funded hardware for Jupyter with GPUs. The resources
   are available to all Triton users, with priority given to the project
   members. Others can attempt to use it in a preemptible queue (the jobs are
   killed with no warning if a higher-priority user needs the resources).

   We are still tuning the parameters (run time, resources available,
   etc.) to balance usefulness against resource wastage. There is no service
   guarantee. Let us know what is useful or not working.

The normal OnDemand Jupyter does not include GPUs, because they are very
expensive and Jupyter interactive work by its nature involves a lot of idling.
However, some projects have ordered GPUs specifically for interactive work. The
GPUs that have been dedicated to interactive Jupyter work are divided into
separate virtual GPUs (vGPUs) with less GPU memory each.

.. highlight:: console
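
Each session gets one of these vGPU slices rather than a full card. A quick
way to see what you were actually allocated is to open a terminal in your
Jupyter session and run (a minimal check; the reported model and memory size
depend on the current vGPU configuration)::

   $ nvidia-smi --query-gpu=name,memory.total --format=csv
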
Expected use case
-----------------

Remember, Jupyter+GPUs are designed to be used for testing and
development, not production runs or real computation. The GPU memory
is limited, so you can test code but probably cannot run even
moderately-sized models. This is because any resources allocated to a
Jupyter job are mostly idle.

You should plan (from the beginning) how you will transition to batch
jobs for your main computations. For example, write and verify code
in Jupyter with tiny data, then from the command line submit the code
to run in the batch queue with much more resources::

   $ sbatch --gpus=1 --wrap 'jupyter nbconvert --to notebook --execute mynotebook.ipynb --output mynotebook.$(date -Iseconds).ipynb'
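
The same submission can also be written as a small batch script if you prefer
(a sketch; the time limit and the notebook name are placeholders to adjust):

.. code-block:: bash

   #!/bin/bash
   # Placeholder resource requests: adjust the time limit to your own run
   #SBATCH --gpus=1
   #SBATCH --time=01:00:00
   # Execute the notebook non-interactively and save a timestamped copy
   jupyter nbconvert --to notebook --execute mynotebook.ipynb \
       --output mynotebook.$(date -Iseconds).ipynb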


How it works
------------

* Use the normal OnDemand Jupyter app,
  https://ondemand.triton.aalto.fi, as described in :doc:`jupyter`.
* Select one of the interactive GPU partitions (see the table below).
* Your Jupyter session will start. Note that it has shorter timeouts than
  other Jupyter sessions, to prevent inefficiency. Once you have
  resources, don't forget to use them.
* When you are done using the resources, remember to stop the session via
  File > Shut Down.
* There is no service guarantee: resources may be stopped or adjusted
  at any time without warning. Save often.

.. list-table::
   :header-rows: 1

   * * Name
     * Who has access
     * Resources

   * * Ellis H200 GPU
     * ELLIS project staff (``ellis`` unix group). The group is still being
       formed; contact ASC for access for now.
     * 8 H200 GPUs split into a total of 56 vGPUs with 18 GB of memory each.

   * * General H200 GPU
     * Anyone, but sessions can be stopped without warning if a
       higher-priority user needs the resources.
     * Same as above.

Time limits and other parameters are visible in OnDemand (and not
copied here since they may change).
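
You can also check the partition, state, and remaining time of your running
session from any shell on Triton (a quick sketch using standard Slurm tools)::

   $ squeue -u $USER -o "%.18i %.12P %.8T %.10L"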


Contact
-------

Contact ASC/Science-IT via the normal means, or the people in the
table above, for access to the resources.
