Jupyter with GPUs
=================

.. warning::

   Certain projects have funded hardware for Jupyter with GPUs. The resources
   are available to all Triton users, with priority given to the project
   members. Others can use a preemptible queue (jobs are killed without
   warning if a higher-priority user needs the resources).

   We are still tuning the parameters (run time, resources available,
   etc.) to balance usefulness against resource wastage. There is no
   service guarantee. Let us know what is useful or not working.
| 14 | + |
| 15 | +The normal OnDemand Jupyter does not include GPUs, because they are very |
| 16 | +expensive and Jupyter interactive work by its nature has lots of idling. |
| 17 | +However, some projects have ordered GPUs specifically for interactive work. The |
| 18 | +GPUs that have been dedicated for interactive Jupyter work are divided into |
| 19 | +separate virtual GPUs with less GPU-memory. |
| 20 | + |
| 21 | + |
| 22 | +.. highlight:: console |
| 23 | + |
| 24 | + |
Expected use case
-----------------

Remember, Jupyter+GPUs are designed for testing and development, not
production runs or real computation. The GPU memory is limited, so you
can test code but probably not run even moderately-sized models. The
limits are kept small because any resources allocated to a Jupyter job
are mostly idle.

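As a rough back-of-the-envelope check (illustrative numbers only, not a
measured limit), you can estimate whether a model's training state fits
in an 18 GB vGPU slice:

.. code-block:: python

   # Illustrative estimate only: the 4x training factor (weights +
   # gradients + Adam optimizer moments) is a common rule of thumb,
   # not an exact figure for any particular framework.
   params = 1e9                # a 1-billion-parameter model
   bytes_per_param = 4         # float32
   training_factor = 4         # weights + gradients + Adam moments (rough)
   needed_gib = params * bytes_per_param * training_factor / 2**30
   print(f"{needed_gib:.1f} GiB")  # ~14.9 GiB: already close to 18 GB

So even a 1B-parameter model nearly fills the slice during training,
which is why these vGPUs suit testing rather than real runs.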
You should plan (from the beginning) how you will transition to batch
jobs for your main computations. For example, write and verify code
in Jupyter with tiny data, then submit the code from the command line
to run in the batch queue with more resources::

   $ sbatch --gpus=1 --wrap 'jupyter nbconvert --to notebook --execute mynotebook.ipynb --output mynotebook.$(date -Iseconds).ipynb'


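One way to make the same notebook work in both settings (a hypothetical
pattern, not a Triton convention; the variable name ``N_SAMPLES`` is our
own) is to read the problem size from an environment variable, so the
interactive session uses a tiny default and the batch job overrides it:

.. code-block:: python

   import os

   # Hypothetical pattern: a tiny default for interactive testing in
   # Jupyter; the batch job overrides it, e.g. with
   #   sbatch --gpus=1 --export=ALL,N_SAMPLES=1000000 ...
   n_samples = int(os.environ.get("N_SAMPLES", "100"))
   print(f"running with {n_samples} samples")

This keeps one notebook as the single source of truth for both the
quick Jupyter check and the full batch run.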
How it works
------------

* Use the normal OnDemand Jupyter app,
  https://ondemand.triton.aalto.fi, as described in :doc:`jupyter`.
* Select one of the interactive partitions (see the table below).
* Your Jupyter session will start. Note that it has shorter timeouts
  than other Jupyter sessions, to prevent inefficiency. Once you have
  resources, don't forget to use them.
* When you are done using the resources, remember to stop the session
  via File > Shut Down.
* There is no service guarantee; resources may be stopped or adjusted
  at any time without warning. Save often.

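Once the session has started, you can confirm what the vGPU slice looks
like from inside the notebook. A minimal sketch using only the standard
library and the ``nvidia-smi`` tool (on a vGPU it reports only your
share of the hardware):

.. code-block:: python

   import shutil
   import subprocess

   def gpu_visible():
       """Return True and print the device list if nvidia-smi sees a GPU."""
       if shutil.which("nvidia-smi") is None:
           print("nvidia-smi not found: no GPU in this session?")
           return False
       result = subprocess.run(["nvidia-smi", "-L"],
                               capture_output=True, text=True)
       print(result.stdout)
       return result.returncode == 0

   gpu_visible()

If no GPU shows up, check that you picked one of the interactive GPU
partitions when starting the session.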
.. list-table::
   :header-rows: 1

   * * Name
     * Who has access
     * Resources

   * * Ellis H200 GPU
     * ELLIS project staff (``ellis`` unix group). The group is still
       being formed; contact ASC for access for now.
     * 8 H200 GPUs split into a total of 56 vGPUs with 18G memory each.

   * * General H200 GPU
     * Anyone, but sessions can be stopped without warning if a
       higher-priority user needs the resources.
     * Same as above.

Time limits and other parameters are visible in OnDemand (and not
copied here, since they may change).



Contact
-------

Contact ASC/Science-IT via the normal means, or the people in the
table above, for access to the resources.